Started in January 1974 (Monthly)
Supervised and Sponsored by Chongqing Southwest Information Co., Ltd.
ISSN 1002-137X
CN 50-1075/TP
CODEN JKIEBK
    Computer Graphics & Multimedia content in our journal
    Study on Time-varying Brain State Based on fMRI Data: A Review
    LIN Qiye, XIA Jianan, ZHOU Xuezhong
    Computer Science    2024, 51 (4): 182-192.   DOI: 10.11896/jsjkx.230700059
    Functional magnetic resonance imaging (fMRI) has been widely applied in the study of human brain activity. Recently, the use of brain states to investigate brain dynamics has attracted extensive attention from researchers. Previous reviews on brain states typically compare and summarize from the perspective of state-definition methods, neglecting the inconsistency in underlying data formats, which may result in diverse interpretations of brain states. Furthermore, these reviews also lack discussion of the analytical approaches for brain states. Here, we review various methods for defining brain states based on different data formats, provide an overview of different approaches for analyzing brain dynamics based on brain states, and summarize typical research methods in the application of brain states to cognition, psychiatric disorders, physiological states, and other aspects. Finally, we find similarities between the definition of brain meta-states and feature extraction in deep learning. Therefore, we believe that deep learning is a promising approach for studying brain states.
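    Many of the sliding-window studies this kind of review covers follow a common pipeline: estimate windowed functional connectivity, then cluster the connectivity patterns into discrete states. The sketch below is a hypothetical minimal illustration of that pipeline; the window length, step, and state count are arbitrary choices, not values from the review.

```python
# Hypothetical sketch of a common brain-state pipeline: sliding-window
# functional connectivity followed by k-means clustering into states.
import numpy as np
from sklearn.cluster import KMeans

def sliding_window_states(ts, win=30, step=5, n_states=5):
    """ts: (time, regions) ROI time series; returns a state label per window."""
    T, R = ts.shape
    fc_vectors = []
    for start in range(0, T - win + 1, step):
        window = ts[start:start + win]              # (win, regions)
        fc = np.corrcoef(window.T)                  # (regions, regions) correlation FC
        iu = np.triu_indices(R, k=1)
        fc_vectors.append(fc[iu])                   # vectorize the upper triangle
    fc_vectors = np.asarray(fc_vectors)
    labels = KMeans(n_clusters=n_states, n_init=10, random_state=0).fit_predict(fc_vectors)
    return labels                                   # brain-state sequence over windows

# Toy usage: 300 time points, 10 regions of synthetic data.
states = sliding_window_states(np.random.randn(300, 10))
print(states[:20])
```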
    Review of Vision-based Neural Network 3D Dynamic Gesture Recognition Methods
    WANG Ruiping, WU Shihong, ZHANG Meihang, WANG Xiaoping
    Computer Science    2024, 51 (4): 193-208.   DOI: 10.11896/jsjkx.230200205
    Dynamic gesture recognition, as an important means of human-computer interaction, has received widespread attention. Among the available approaches, vision-based recognition has become the preferred choice for the new generation of human-computer interaction due to its convenience and low cost. Centered on artificial neural networks, this paper reviews the research progress of vision-based gesture recognition methods, analyzes the development status of different types of artificial neural networks in gesture recognition, and investigates and summarizes the types and characteristics of the data to be recognized and of the training datasets. In addition, through performance comparison experiments, different types of artificial neural networks are objectively evaluated and the results are analyzed. Finally, based on a summary of the research content, the challenges and problems faced in this field are elaborated, and the development trend of dynamic gesture recognition technology is discussed.
    Metal Surface Defect Detection Method Based on Dual-stream YOLOv4
    XU Hao, LI Fengrun, LU Lu
    Computer Science    2024, 51 (4): 209-216.   DOI: 10.11896/jsjkx.230100141
    Currently, many researchers use deep learning for surface defect detection. However, most of these studies follow mainstream object detection algorithms and focus on high-level semantic features while neglecting the importance of low-level semantic information (color, shape) for surface defect detection, resulting in unsatisfactory defect detection. To address this issue, a metal surface defect detection network called the dual-stream YOLOv4 network is proposed. The backbone network is split into two branches, whose inputs are high-resolution and low-resolution images. The shallow branch is responsible for extracting low-level features from the high-resolution image, while the deep branch is responsible for extracting high-level features from the low-resolution image. The model's total parameter volume is reduced by cutting down the number of layers and channels in both branches. To enhance the low-level semantic features, a tree-structured multi-scale feature fusion method (TMFF) is proposed, and a feature fusion module with a polarized self-attention mechanism and spatial pyramid pooling (FFM-PSASPP) is designed and applied to the TMFF. The algorithm's mAP@50 on the test sets of the Northeastern University hot-rolled strip surface defect dataset (NEU-DET), the metal surface defect dataset (GC10-DET), and the enaiter rice cooker inner-pot defect dataset is 0.80, 0.66, and 0.57, respectively. Compared to most mainstream object detection algorithms used for defect detection, this is an improvement, and the model's parameter volume is only half that of the original YOLOv4, with a speed close to YOLOv4, making it suitable for practical use.
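    The dual-stream split described above can be illustrated with a minimal sketch; the layer counts and channel widths here are placeholders, not the paper's configuration.

```python
# Minimal PyTorch sketch of the dual-stream idea: a shallow branch extracts
# low-level features from the high-resolution image while a deep branch
# extracts high-level features from a low-resolution copy.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn(cin, cout, stride=1):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride, 1, bias=False),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class DualStreamBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.shallow = nn.Sequential(conv_bn(3, 16), conv_bn(16, 32, 2))   # few layers, high res
        self.deep = nn.Sequential(conv_bn(3, 32, 2), conv_bn(32, 64, 2),
                                  conv_bn(64, 128, 2))                      # more layers, low res

    def forward(self, img_hi):
        img_lo = F.interpolate(img_hi, scale_factor=0.5, mode='bilinear',
                               align_corners=False)
        low_feat = self.shallow(img_hi)    # preserves color/shape cues
        high_feat = self.deep(img_lo)      # semantic cues
        return low_feat, high_feat

low, high = DualStreamBackbone()(torch.randn(1, 3, 256, 256))
print(low.shape, high.shape)  # torch.Size([1, 32, 128, 128]) torch.Size([1, 128, 16, 16])
```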
    Video and Image Salient Object Detection Based on Multi-task Learning
    LIU Zeyu, LIU Jianwei
    Computer Science    2024, 51 (4): 217-228.   DOI: 10.11896/jsjkx.231000051
    Salient object detection (SOD) can quickly identify high-value salient objects in complex scenes; it simulates human attention and lays the foundation for further vision understanding tasks. Currently, mainstream methods for image salient object detection are usually trained on the DUTS-TR dataset, while video salient object detection (VSOD) methods are trained on the DAVIS, DAVSOD, and DUTS-TR datasets. Because image and video salient object detection tasks have both shared and task-specific characteristics, independent models need to be deployed and trained separately, which greatly increases computational resources and training time. Current research typically focuses on independent solutions for a single task, whereas a unified method for both image and video salient object detection remains underexplored. To address these issues, this paper proposes a multi-task learning-based method for image and video salient object detection, aiming to build a universal framework that adapts to both tasks with a single training process and further bridges the performance gap between image and video salient object detection models. Qualitative and quantitative experimental results on 12 datasets show that the proposed method not only adapts to both tasks, but also achieves better detection results than single-task models.
    Algorithm of Stereo Matching Based on GAANET
    SONG Hao, MAO Kuanmin, ZHU Zhou
    Computer Science    2024, 51 (4): 229-235.   DOI: 10.11896/jsjkx.230100137
    End-to-end stereo matching algorithms have become increasingly popular in stereo matching tasks due to their advantages in computational time and matching accuracy. However, feature extraction in such algorithms can produce redundant features, information loss, and insufficient multi-scale feature fusion, thereby increasing computational complexity and decreasing matching accuracy. To address these challenges, an improved ghost adaptive aggregation network (GAANET) is proposed based on the adaptive aggregation network (AANET), whose feature extraction module is improved to make it more suitable for stereo matching tasks. Multi-scale features are extracted in the G-Ghost stage, and partial features are generated through low-cost operations to reduce feature redundancy and preserve shallow features. An efficient channel attention mechanism is implemented to allocate weights to each channel, and an improved feature pyramid structure is introduced to mitigate the channel information loss of traditional pyramids and optimize feature fusion, thus enhancing information exchange across feature scales. The proposed GAANET model is trained and evaluated on the SceneFlow, KITTI2015, and KITTI2012 datasets. Experimental results demonstrate that GAANET outperforms the baseline method, with accuracy improvements of 0.92%, 0.25%, and 0.20%, respectively, while reducing parameter volume by 13.75% and computational complexity by 4.8%.
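    An efficient channel attention block of the kind mentioned above can be sketched as follows (ECA-style); the 1D kernel size is an assumed hyperparameter, not a value from the paper.

```python
# Sketch of an efficient channel attention (ECA-style) block that weights
# feature channels: global average pool, 1D conv across channels, sigmoid gate.
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                         # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                    # global average pool -> (B, C)
        w = self.conv(w.unsqueeze(1)).squeeze(1)  # 1D conv across the channel axis
        return x * self.sigmoid(w)[:, :, None, None]

y = ECA()(torch.randn(2, 64, 32, 32))
print(y.shape)  # torch.Size([2, 64, 32, 32])
```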
    Human Action Recognition Algorithm Based on Adaptive Shifted Graph Convolutional Neural Network with 3D Skeleton Similarity
    YAN Wenjie, YIN Yiying
    Computer Science    2024, 51 (4): 236-242.   DOI: 10.11896/jsjkx.221200120
    Graph convolutional neural networks (GCNs) have achieved good results in the field of human action recognition based on 3D skeletons. However, in most existing GCN methods, the construction of the action graph is based on a manually set physical structure of the human body. In the training stage, each graph node can only establish connections according to this manual setting and cannot perceive new connections that arise between skeleton joints during an action, resulting in an unreasonable and inflexible graph topology. The shifted graph convolutional neural network (Shift-GCN) makes the receptive field more flexible by changing its structure and achieves satisfactory results with global shifts. To tackle the above problems of graph structure, an adaptive shift graph convolutional neural network (AS-GCN) is proposed. AS-GCN draws on the idea of the shifted graph convolutional network and uses the characteristics of each human action to guide the graph network's shift operation, so as to select the nodes that need an expanded receptive field as accurately as possible. On the widely used skeleton-based action recognition dataset NTU-RGBD, AS-GCN is verified by extensive experiments both with and without physical skeleton constraints. Compared with existing advanced algorithms, with skeleton physical constraints the action recognition accuracy of AS-GCN improves by 12% and 4.84% on average in the CV and CS settings, respectively; without skeleton physical constraints, the average improvement is 20% and 14.49% in the CV and CS settings, respectively.
    Progressive Multi-stage Image Denoising Algorithm Combining Convolutional Neural Network and Multi-layer Perceptron
    XUE Jinqiang, WU Qin
    Computer Science    2024, 51 (4): 243-253.   DOI: 10.11896/jsjkx.230100140
    Among existing deep learning-based image denoising methods, there are problems at the network-architecture level: a single-stage network can hardly represent feature dependencies and struggles to reconstruct clear images in complex scenarios, while the internal features of multi-stage networks are not tightly connected and original image details are easily lost. At the basic building-block level, convolutional layers have difficulty handling cross-level features at large noise levels, and fully connected layers have difficulty capturing the local spatial details of an image. To solve the above problems, this paper proposes solutions from two aspects. On the one hand, a novel cross-stage gating feature fusion is proposed at the architecture level to better connect the shallow features of the first-stage network with the deep features of the second-stage network, promote the interaction of information flow, and make the internal correlation of the denoising network tighter while avoiding the loss of original spatial details. On the other hand, a dual-axis shifted block combining a convolutional neural network (CNN) and a multi-layer perceptron (MLP) is proposed, which is applied to low-resolution feature maps with many channels to alleviate the insufficient ability of CNNs to learn cross-level feature dependencies in complex noise scenarios, while the CNN focuses on high-resolution feature maps with few channels to fully extract the spatial local dependencies of noisy images. Extensive quantitative and qualitative experiments show that the proposed algorithm achieves the best peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) denoising metrics with a small number of parameters and low computational cost in real-world image denoising and Gaussian noise removal tasks.
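    One plausible minimal form of the cross-stage gating feature fusion (an assumption, since the paper's exact design is not given here) is a learned per-pixel gate that mixes shallow first-stage features with deep second-stage features.

```python
# Minimal sketch (assumed form) of cross-stage gated feature fusion: a learned
# gate decides, per pixel, how much shallow first-stage detail to mix with
# deep second-stage features.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, shallow, deep):
        g = self.gate(torch.cat([shallow, deep], dim=1))  # (B, C, H, W) gate in [0, 1]
        return g * shallow + (1 - g) * deep               # keeps spatial detail where needed

fused = GatedFusion(32)(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64))
print(fused.shape)
```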
    Global Covariance Pooling Based on Fast Maximum Singular Value Power Normalization
    ZENG Ruiren, XIE Jiangtao, LI Peihua
    Computer Science    2024, 51 (4): 254-261.   DOI: 10.11896/jsjkx.230200140
    Recent research shows that matrix normalization plays a key role in global covariance pooling, helping to generate more discriminative representations and thus improving performance on image recognition tasks. Among normalization methods, matrix structure-wise normalization can make full use of the geometric structure of the covariance matrix and therefore obtains better performance. However, structure-wise normalization generally depends on singular value decomposition (SVD) or eigenvalue decomposition (EIG) with high computational cost, which limits the parallel computing ability of GPUs and becomes a computational bottleneck. Iterative matrix square root normalization (iSQRT) uses Newton-Schulz iteration to normalize the covariance matrix and is faster than SVD- and EIG-based methods. However, as the number of iterations and the dimensionality increase, the time and memory of iSQRT grow significantly, and the method cannot perform normalization with a general power, which limits its scope of application. To solve the above problems, a covariance matrix normalization method based on the maximum singular value power is proposed: the covariance matrix is divided by a power of its maximum singular value, which only requires the iterative power method to estimate the maximum singular value of the matrix. Detailed ablation experiments show that, compared with iSQRT, the proposed method is faster and occupies less memory, is superior to iSQRT in terms of time and space complexity, and its performance is comparable to or better than that of iSQRT. The proposed method achieves state-of-the-art performance on a large-scale image classification dataset and on fine-grained visual recognition datasets, including Aircraft, Cars, and Indoor67, where its accuracy is 90.7%, 93.3%, and 83.9%, respectively. The results fully demonstrate the robustness and generalization of the proposed method.
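    The core normalization step is straightforward to sketch: estimate the largest singular value of the (symmetric, positive semi-definite) covariance matrix with the power method, then divide by a power of it. The exponent alpha and the iteration count below are assumptions for illustration.

```python
# Sketch of the core idea: estimate the covariance matrix's largest singular
# value with the power method and normalize by a power of it.
import torch

def max_sv_power_normalize(cov, alpha=0.5, iters=10):
    """cov: (C, C) symmetric PSD covariance; returns cov / sigma_max**alpha."""
    v = torch.randn(cov.shape[0], 1)
    for _ in range(iters):                  # power iteration
        v = cov @ v
        v = v / v.norm()
    sigma_max = (v.T @ cov @ v).squeeze()   # Rayleigh quotient (v is unit norm)
    return cov / sigma_max**alpha

x = torch.randn(64, 256)                    # 64 features of dimension 256
cov = (x.T @ x) / x.shape[0]                # (256, 256) covariance
print(max_sv_power_normalize(cov).shape)
```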
    Speech Emotion Recognition Based on Voice Rhythm Differences
    ZHANG Jiahao, ZHANG Zhaohui, YAN Qi, WANG Pengwei
    Computer Science    2024, 51 (4): 262-269.   DOI: 10.11896/jsjkx.230200063
    Speech emotion recognition has important application prospects in financial anti-fraud and other fields, but further improving its accuracy is increasingly difficult. Existing spectrogram-based speech emotion recognition methods have difficulty capturing rhythm-difference features, which limits recognition performance. Based on differences in voice rhythm, this paper proposes a speech emotion recognition method based on energy frames and time-frequency fusion. The key is to screen high-energy regions of the spectrum in the speech and to reflect individual voice rhythm differences through the distribution of high-energy speech frames and their time-frequency changes. On this basis, an emotion recognition model based on a convolutional neural network (CNN) and a recurrent neural network (RNN) is established to extract and fuse the temporal and frequency changes of the spectrum. Experiments on the open dataset IEMOCAP show that, compared with the spectrogram-based method, the weighted accuracy (WA) and unweighted accuracy (UA) of speech emotion recognition based on voice rhythm differences increase by 1.05% and 1.9% on average, respectively. The results also show that individual voice rhythm differences play an important role in improving speech emotion recognition.
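    The high-energy frame screening idea can be sketched as follows; the frame length, hop, and energy-threshold ratio are illustrative assumptions, not the paper's settings.

```python
# Illustrative sketch of screening high-energy frames: frame the waveform,
# compute short-time energy, and keep frames above a fraction of the maximum.
import numpy as np

def high_energy_frames(wav, frame_len=400, hop=160, ratio=0.3):
    """wav: 1-D waveform (e.g., 16 kHz -> 25 ms frames, 10 ms hop)."""
    n = 1 + (len(wav) - frame_len) // hop
    frames = np.stack([wav[i * hop:i * hop + frame_len] for i in range(n)])
    energy = (frames**2).sum(axis=1)          # short-time energy per frame
    keep = energy > ratio * energy.max()      # boolean mask of high-energy frames
    return frames[keep], keep

frames, mask = high_energy_frames(np.random.randn(16000))
print(frames.shape, mask.sum())
```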
    Multi-view Autoencoder-based Functional Alignment of Multi-subject fMRI
    HUANG Shuo, SUN Liang, WANG Meiling, ZHANG Daoqiang
    Computer Science    2024, 51 (3): 141-146.   DOI: 10.11896/jsjkx.230600166
    One of the major challenges in functional magnetic resonance imaging (fMRI) research is the heterogeneity of fMRI data across different subjects. On the one hand, analyzing multi-subject data is crucial for determining the generalizability and effectiveness of the generated results across subjects. On the other hand, analyzing multi-subject fMRI data requires accurate anatomical and functional alignment among the neural activities of different subjects to enhance the performance of the final results. However, most existing functional alignment studies employ shallow models to handle the complex relationships among multiple subjects, severely limiting the capacity for modeling multi-subject information. To solve this problem, this paper proposes a multi-view autoencoder functional alignment (MAFA) method. Specifically, the method learns node embeddings by reconstructing the response spaces of different subjects, capturing shared feature representations among subjects and creating a common response space. A graph clustering process is also introduced, with self-training clustering objectives that use high-confidence nodes as soft labels. Experimental results on four datasets demonstrate that the proposed method achieves the best decoding accuracy compared with other multi-subject fMRI functional alignment methods.
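    A minimal sketch of the multi-view idea, with all architecture sizes assumed: each subject gets its own autoencoder, reconstruction losses keep the embeddings informative, and an alignment term pulls the two subjects' latent codes for the same stimuli together.

```python
# Sketch (assumed architecture) of aligning two subjects' fMRI responses in a
# shared latent space via per-subject autoencoders plus an alignment loss.
import torch
import torch.nn as nn

class SubjectAE(nn.Module):
    def __init__(self, dim_in, dim_latent=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim_in, 128), nn.ReLU(), nn.Linear(128, dim_latent))
        self.dec = nn.Sequential(nn.Linear(dim_latent, 128), nn.ReLU(), nn.Linear(128, dim_in))

    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

ae1, ae2 = SubjectAE(500), SubjectAE(500)
x1, x2 = torch.randn(200, 500), torch.randn(200, 500)   # same 200 stimuli, two subjects
z1, r1 = ae1(x1)
z2, r2 = ae2(x2)
loss = nn.functional.mse_loss(r1, x1) + nn.functional.mse_loss(r2, x2) \
     + nn.functional.mse_loss(z1, z2)                   # reconstruction + alignment
print(float(loss))
```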
    Unsupervised Low-light Image Enhancement Model with Adaptive Noise Suppression and Detail Preservation
    GAO Ren, HAO Shijie, GUO Yanrong
    Computer Science    2024, 51 (3): 147-154.   DOI: 10.11896/jsjkx.221200074
    The visual quality of images taken in low-light environments is usually low due to factors such as low lightness and imaging noise. Current low-light image enhancement methods share a common limitation: they focus only on improving lightness and suppressing noise while neglecting to preserve image details. To solve this problem, an unsupervised low-light image enhancement method is proposed in this paper, aiming to improve visibility and preserve image fidelity with good efficiency. The model consists of two stages, i.e., low-light enhancement and noise suppression. In the first stage, an unsupervised image decomposition module and a lightness enhancement module are constructed to improve visibility. In the second stage, under the guidance of the image's illumination distribution, pairwise training data are synthesized and a denoising network is trained to suppress the imaging noise in originally dim regions while preserving the image details of originally bright regions. Experimental results show that, compared with other methods, our method achieves a better balance between visibility improvement and fidelity preservation. In addition, the method is attractive for real-world applications, as it does not require collecting bright-dim image pairs and has a small model size and fast calculation speed.
    Appearance Fusion Based Motion-aware Architecture for Moving Object Segmentation
    XU Bangwu, WU Qin, ZHOU Haojie
    Computer Science    2024, 51 (3): 155-164.   DOI: 10.11896/jsjkx.221200153
    Moving object segmentation aims to segment all moving objects in the current scene and is of critical significance for many computer vision applications. At present, many moving object segmentation methods use the motion information from 2D optical flow maps to segment moving objects, which has several defects: for objects moving in the epipolar plane, or objects whose 3D motion direction is consistent with the background, it is difficult to identify the objects from 2D optical flow maps alone, and incorrect 2D optical flow also affects the segmentation result. To solve the above problems, this paper proposes different motion costs to improve the performance of moving object segmentation. To detect moving objects with coplanar and collinear motion, this paper proposes a balanced reprojection cost and a multi-angle optical flow contrast cost, which measure the difference between the 2D optical flow of moving objects and that of the background. For ego-motion degeneracy, this paper designs a differential homography cost. To segment moving objects in complex scenes, this paper proposes an appearance fusion based motion-aware architecture. In this architecture, in order to effectively fuse the appearance and motion features of objects, a multi-modality co-attention gate is adopted to achieve better interaction between appearance and motion cues. Besides, to emphasize moving objects, this paper introduces a multi-level motion-based attention module to suppress redundant and misleading information. Extensive experiments are conducted on the KITTI, JNU-UISEE, KittiMoSeg, and Davis-2016 datasets, and the proposed method achieves excellent performance.
    Object Detection Method with Multi-scale Feature Fusion for Remote Sensing Images
    ZHANG Yang, XIA Ying
    Computer Science    2024, 51 (3): 165-173.   DOI: 10.11896/jsjkx.230200030
    Object detection in remote sensing images is an important research direction in the field of computer vision and is widely used in military and civil fields. Objects in remote sensing images have the characteristics of multiple scales, dense arrangement, and inter-class similarity, so object detection methods designed for natural images produce many omissions and false detections on remote sensing images. To address this problem, this paper proposes an object detection method with multi-scale feature fusion based on YOLOv5 for remote sensing images. Firstly, a residual unit fusing multi-head self-attention is introduced into the backbone network, through which multi-level feature information is fully extracted and semantic differences among different scales are reduced. Secondly, a feature pyramid network fusing lightweight upsampling operators is introduced to obtain high-level semantic features and low-level detail features; feature maps with richer information are acquired by feature fusion, which improves the feature resolution of objects at different scales. The performance of the proposed method is evaluated on the DOTA and NWPU VHR-10 datasets, and its accuracy (mAP) improves by 1.5% and 2.0%, respectively, compared with the baseline model.
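    A residual unit fusing multi-head self-attention can be sketched as attention over flattened spatial tokens with a residual connection; the head count and normalization placement below are assumptions, not the paper's exact design.

```python
# Sketch of a residual unit with multi-head self-attention over spatial
# positions, widening the backbone's context beyond local convolutions.
import torch
import torch.nn as nn

class MHSAResidual(nn.Module):
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C) spatial tokens
        out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + out)        # residual connection
        return tokens.transpose(1, 2).reshape(b, c, h, w)

y = MHSAResidual(64)(torch.randn(1, 64, 16, 16))
print(y.shape)
```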
    Combined Road Segmentation and Contour Extraction for Remote Sensing Images Based on Cascaded U-Net
    LI Yu, YANG Xiangli, ZHANG Le, LIANG Yalin, GAO Xian, YANG Jianxi
    Computer Science    2024, 51 (3): 174-182.   DOI: 10.11896/jsjkx.221200032
    Aiming at the problems that deep-learning-based road information extraction models can only output single-task results and make inadequate use of the correlation between multiple tasks, a combined road segmentation and contour extraction method based on a cascaded U-Net is proposed, which extracts the road contour after fusing the road semantic segmentation feature map with the original image. Firstly, the U-Net network structure is used to extract hierarchical features of optical remote sensing images, and a cascaded U-Net structure is introduced to concatenate the features so as to extract pixel-level road labels and road contours, respectively. Secondly, an attention mechanism module is added to each stage of the U-Net to extract spatial context information and deep-level features, improving the detection sensitivity to details. Finally, a joint loss function composed of the Dice coefficient and cross-entropy error is used for overall training to simultaneously extract road semantic segmentation and contour results. On the optical remote sensing dataset of the urban area of Ottawa, Canada, the cascaded U-Net-based joint road information extraction method achieves 42% precision, 58% recall, 48.2% F1 score, and 71.6% mIoU on the segmentation metrics, and achieves a global optimal threshold (ODS) of 0.896 on the road detection metric. The results show that the model can meet the requirements of joint extraction of multi-task road information and has good detection accuracy.
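    The joint loss composed of the Dice coefficient and cross-entropy error can be written compactly; the sketch below shows the binary-mask case (the same form would apply to the contour head).

```python
# Sketch of the joint loss: binary cross-entropy plus a Dice term.
import torch

def dice_ce_loss(logits, target, eps=1e-6):
    """logits, target: (B, 1, H, W); target is a binary road mask."""
    bce = torch.nn.functional.binary_cross_entropy_with_logits(logits, target)
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum()
    dice = 1 - (2 * inter + eps) / (prob.sum() + target.sum() + eps)
    return bce + dice

loss = dice_ce_loss(torch.randn(2, 1, 64, 64), torch.randint(0, 2, (2, 1, 64, 64)).float())
print(float(loss))
```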
    Review of Transformer in Computer Vision
    CHEN Luoxuan, LIN Chengchuang, ZHENG Zhaoliang, MO Zefeng, HUANG Xinyi, ZHAO Gansen
    Computer Science    2023, 50 (12): 130-147.   DOI: 10.11896/jsjkx.221100076
    Transformer is an attention-based encoder-decoder architecture. Due to its long-range sequence modeling and parallel computing capability, Transformer has made significant breakthroughs in natural language processing and is gradually expanding into computer vision (CV), becoming an important research direction in CV tasks. This paper focuses on three sorts of vision-Transformer-based CV tasks, namely classification, object detection, and segmentation, and summarizes their applications and modifications. Starting from image classification, it first analyzes the existing issues of vision Transformer, including data size, structure, and computational efficiency, then sorts out the corresponding solutions. Besides, it provides a literature review on object detection and segmentation, organizing the methods according to their structures and motivations and summarizing their respective pros and cons. Finally, the challenges and future development trends of Transformer in computer vision are summarized and discussed.
    Prior-guided Blind Iris Image Restoration Algorithm
    WANG Jia, XIANG Liuyu, HUANG Yubo, XIA Yufeng, TIAN Qing, HE Zhaofeng
    Computer Science    2023, 50 (12): 148-155.   DOI: 10.11896/jsjkx.230500217
    As one of the most promising biometric technologies, iris recognition has been widely used in various industries. However, existing iris recognition systems are easily disturbed by external factors during image acquisition, and the acquired iris images suffer from insufficient resolution and blurring. To address these challenges, a prior-guided blind iris image restoration method is proposed, which utilizes a generative adversarial network and iris priors to recover unknown degraded iris images affected by mixed degradation factors such as low resolution, motion blur, and out-of-focus blur. The network includes a degradation removal sub-network, a prior estimation sub-network, and a prior fusion sub-network. The prior estimation sub-network models the distribution of the input's style information as prior knowledge to guide the generative network, and the prior fusion sub-network uses an attentive fusion mechanism to integrate multi-level style features, improving the utilization of information. Experimental results show that the proposed method outperforms other methods in both qualitative and quantitative indexes, achieves blind recovery of degraded irises, and improves the robustness of iris recognition.
    Improved Fast Image Translation Model Based on Spatial Correlation and Feature Level Interpolation
    LI Yuqiang, LI Huan, LIU Chun
    Computer Science    2023, 50 (12): 156-165.   DOI: 10.11896/jsjkx.221100027
    In recent years, with the popularity of deep learning algorithms, image translation tasks have achieved remarkable results. Much research is devoted to reducing model running time while maintaining image generation quality, of which the ASAPNet model is a typical representative. However, the feature-level loss function of this model cannot completely decouple image features and appearance, and most of its computation is performed at extremely low resolution, resulting in poor image quality. In response to these issues, this paper proposes an improved ASAPNet model, SRFIT, based on spatial correlation and feature-level interpolation. Specifically, according to the principle of self-similarity, a spatially-correlative loss is used to replace the feature matching loss of the original model to alleviate scene structure differences during image translation and improve translation accuracy. In addition, inspired by the data augmentation method in ReMix, the amount of data is increased at the image feature level through linear interpolation, which addresses the generator's overfitting problem. Finally, comparative experiments on two public datasets, facades and cityscapes, show that, compared with current mainstream models, the proposed method performs better and can effectively improve the quality of generated images while maintaining a fast running speed.
    Feature Fusion and Boundary Correction Network for Salient Object Detection
    CHEN Hui, PENG Li
    Computer Science    2023, 50 (12): 166-174.   DOI: 10.11896/jsjkx.221100203
    Salient object detection aims to find visually significant areas in an image. Existing salient object detection methods have shown strong advantages, but they are still limited in scale perception and boundary prediction. First, salient objects appear at many scales across scenes, which makes it difficult for an algorithm to adapt to different scale changes. Second, salient objects often have complex contours, which makes detecting boundary pixels difficult. To solve these problems, this paper proposes a feature fusion and boundary correction network for salient object detection. The network extracts salient features at different levels of a feature pyramid. For the scale diversity of objects, a feature fusion decoder composed of multi-scale feature decoding modules is designed; by fusing the features of adjacent layers level by level, the network's ability to perceive scale is improved. At the same time, a boundary correction module is designed to learn the contour features of salient objects and generate high-quality saliency maps with clear boundaries. Experimental results on five commonly used salient object detection datasets show that the proposed algorithm achieves better results on mean absolute error, F-measure, and S-measure.
    Multi-temporal Hyperspectral Anomaly Change Detection Based on Dual Space Conjugate Autoencoder
    LI Shasha, XING Hongjie, LI Gang
    Computer Science    2023, 50 (12): 175-184.   DOI: 10.11896/jsjkx.221100092
    Hyperspectral anomaly change detection finds anomalous changes in multi-temporal hyperspectral remote sensing images. These anomalous changes are rare, differ from the overall background change trend, and are difficult to find, but are of great interest. Owing to small datasets, noise disturbance, and the limitations of linear prediction models, the detection performance of conventional hyperspectral anomaly change detection methods is greatly degraded. Autoencoders have been successfully applied to hyperspectral anomaly change detection; however, when processing multi-temporal hyperspectral images, a single autoencoder focuses only on the reconstruction quality of images and usually ignores the complex spectral changes in these images when obtaining bottleneck features. To tackle this problem, a multi-temporal hyperspectral anomaly change detection method based on a dual space conjugate autoencoder (DSCAE) is proposed. The method contains two conjugate autoencoders that construct their latent features from different directions. In training, the two hyperspectral images at different times first obtain their corresponding feature representations in the latent space through their encoders, and the predicted image at the other time is obtained through the decoders. Different constraints are then imposed in the sample space and the latent space, and the corresponding loss functions are minimized in the two spaces. Finally, an anomaly loss map is obtained for each image by the conjugate autoencoders, and a minimization operation over the two anomaly loss maps yields the final anomaly change intensity map, which simultaneously decreases the background spectral difference between the two input images and highlights anomalous changes. Experimental results on benchmark datasets for hyperspectral anomaly change detection demonstrate that DSCAE achieves better detection performance than ten pertinent methods.
    Stereo Visual Localization and Mapping for Mobile Robot in Agricultural Environments
    YU Tao, XIONG Shengwu
    Computer Science    2023, 50 (12): 185-191.   DOI: 10.11896/jsjkx.230300116
    Visual localization and mapping is a key technology for autonomous robots. In agricultural environments it faces particular challenges, including few distinguishable landmarks for tracking, large-scale scenes, and unstable movement. To address these problems, a stereo visual localization and mapping method is proposed. Static stereo matching points are used to increase the number and coverage of map points, which improves the accuracy of depth calculation. A point selection method is proposed to further improve accuracy and efficiency by sampling the dense map points and removing outliers. Scale estimation is then proposed to reduce the scale error of localization and mapping in large-scale agricultural scenes. The keyframe criteria are adapted to avoid the impact of large far-away objects that could cause abnormal keyframe distribution. Finally, a new motion assumption is proposed to recover the system from tracking failure, which improves robustness under unstable movement. Experimental results show that the proposed method achieves better performance than other state-of-the-art visual localization and mapping systems. By addressing each challenge individually, the proposed system is more accurate and robust in agricultural environments.
    Following Method of Mobile Robot Based on Fusion of Stereo Camera and UWB
    FU Yong, WU Wei, WAN Zeqing
    Computer Science    2023, 50 (12): 192-202.   DOI: 10.11896/jsjkx.221000188
    This paper studies autonomous following robots in a mixed human-machine environment. In particular, a stable and effective method is presented for the robot to determine the desired following target and to re-identify the target after it is lost; that is, visual tracking and positioning of pedestrians is achieved based on stereo camera images and point cloud data. The location information of UWB is then introduced to determine the target pedestrian, and a filter algorithm is used to fuse the sensor data to obtain coordinates in the camera coordinate system, which a coordinate transformation converts into the robot coordinate system. An improved dynamic window algorithm (MDWA) is also proposed to improve the robot's following behavior. In addition, based on sensor data, a behavior decision module including following, recovery, and transition behaviors is proposed. Through switching between behaviors, the robot can retrieve the target when it is lost due to the target turning or due to changes in ambient lighting that invalidate the camera. Experimental results show that the proposed following system can automatically determine the desired following target at start-up, and the robot achieves good obstacle-avoidance following in scenes with static obstacles and in dynamic scenes with other, non-target pedestrians in view. In particular, the robot can independently retrieve the following target in a turning scene or in a scene with varying lighting conditions, with a success rate of 81% in turning scenes.
    Hierarchical Graph Convolutional Network for Image Sentiment Analysis
    TAN Qianhui, WEN Jiaxuan, TANG Jihui, SUN Yubao
    Computer Science    2023, 50 (12): 203-211.   DOI: 10.11896/jsjkx.221100177
    The image sentiment analysis task aims to use machine learning models to automatically predict an observer's emotional response to images. At present, sentiment analysis methods based on deep networks have attracted wide attention, mainly learning deep image features automatically through convolutional neural networks. However, image emotion is a comprehensive reflection of the global contextual features of an image; due to the limited receptive field of convolution kernels, CNNs cannot effectively capture dependencies between long-distance emotional features, and emotional features at different levels of the network cannot be effectively fused and utilized, which affects the accuracy of image sentiment analysis. To solve these problems, this paper proposes a hierarchical graph convolutional network model and constructs a spatial context graph convolution (SCGCN) and a dynamic fusion graph convolution (DFGCN). The spatial and channel dimensions are mapped respectively to learn the global context association within each level of emotional features and the dependencies between features at different levels, improving sentiment classification accuracy. The network is composed of four hierarchical prediction branches and one fusion prediction branch: the hierarchical prediction branches use SCGCN to learn the emotional context of single-level features, and the fusion prediction branch uses DFGCN to adaptively aggregate the contextual emotion features of different semantic levels for fusion reasoning and classification. Experimental results on four emotion datasets show that the proposed method outperforms existing image emotion classification models in both emotion polarity classification and fine-grained emotion classification.
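    A minimal sketch of a spatial context graph convolution of the kind SCGCN describes, with the adjacency built from feature similarity; the exact formulation in the paper may differ.

```python
# Sketch (assumed form): treat feature-map positions as graph nodes, build a
# normalized affinity from feature similarity, and propagate features along
# it to capture long-range context.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialContextGCN(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Linear(channels, channels)

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        nodes = x.flatten(2).transpose(1, 2)     # (B, N, C), N = H*W
        adj = F.softmax(nodes @ nodes.transpose(1, 2) / c**0.5, dim=-1)  # (B, N, N)
        out = F.relu(self.proj(adj @ nodes)) + nodes                     # graph conv + residual
        return out.transpose(1, 2).reshape(b, c, h, w)

print(SpatialContextGCN(64)(torch.randn(1, 64, 14, 14)).shape)
```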
    Continuous Dense Normalized Flow Model for Anomaly Detection in Industrial Images
    ZHANG Zouquan, ZHANG Hui, WU Tianyue, CHEN Tiancai
    Computer Science    2023, 50 (12): 212-220.   DOI: 10.11896/jsjkx.221000183
    Anomaly detection on the surface of industrial products is an indispensable link in manufacturing. In actual industrial production, abnormal samples commonly make up a low proportion of the data and unknown anomalies are complex and variable, which causes a series of negative effects such as overfitting and poor generalization on few-shot datasets. In recent years, the idea of normalized flow has brought a new approach to deep-learning-based industrial image anomaly detection, but the inherent architecture of normalized flows easily leads to insufficient model expressiveness. Aiming at these difficulties, a continuous dense normalized flow model for industrial image anomaly detection is proposed. First, a feature extraction network pre-training strategy based on contrastive learning is designed, which involves simulated abnormal data and a small amount of real abnormal data in the contrastive learning task, training the feature backbone network AlexNet to narrow or widen the distance between specific samples. Second, a continuous dense normalized flow model is designed, using a composite architecture of reversible transformations to construct a dense flow module that enhances the generative model's ability to fit the distribution. The experimental datasets include MVTec AD, Magnetic Tile Defects, and a self-made industrial cloth dataset. Compared with other anomaly detection models, our method achieves optimal or sub-optimal detection performance on the three datasets.
    Low-dose CT Reconstruction Algorithm Based on Iterative Asymmetric Blind Spot Network
    GUO Guangxing, YIN Guimei, LIU Chenxu, DUAN Yonghong, QIANG Yan, WANG Yanfei, WANG Tao
    Computer Science    2023, 50 (12): 221-228.   DOI: 10.11896/jsjkx.230300014
    Aiming at the problem that machine-learning-based low-dose CT reconstruction relies too heavily on paired images, a low-dose CT reconstruction algorithm based on an iterative asymmetric blind-spot network is proposed. Firstly, low-dose CT is self-supervised through a pixel-shuffle downsampling blind-spot network, and preliminarily reconstructed CT images are obtained. Secondly, an iterative model is established: the result image from the previous network is used as the low-dose input of the network for training to obtain the final network model. Finally, an asymmetric method is used to adjust the stride of the pixel-shuffle downsampling to minimize aliasing artifacts and obtain the final usable model. Theoretical analysis and experimental results show that, compared with traditional low-dose CT reconstruction algorithms, the iterative asymmetric blind-spot network greatly reduces the dependence on paired images and can generate images similar to, or even better than, those of traditional methods in terms of image quality, texture features, and structure.
    PSwin:Edge Detection Algorithm Based on Swin Transformer
    HU Mingyang, GUO Yan, JIN Yangshuang
    Computer Science    2023, 50 (6): 194-199.   DOI: 10.11896/jsjkx.220700145
    As a traditional computer vision task, edge detection has been widely used in real-world scenarios such as license plate recognition and optical character recognition. Edge detection also serves as the basis for higher-level algorithms such as object detection and semantic segmentation, and can be applied to urban security, autonomous driving, and other fields. A good edge detection algorithm can effectively improve the efficiency and accuracy of the above computer vision tasks. The difficulty of edge extraction lies in variations of target size and edge detail, so an edge extraction algorithm needs to handle edges of different scales effectively. In this paper, the Transformer is applied to the edge extraction task for the first time, and a novel feature pyramid network is proposed to make full use of the multi-scale, multi-level features of the backbone network. PSwin uses a self-attention mechanism, which can extract global structural information from images more efficiently than convolutional neural network architectures. When evaluated on the BSDS500 dataset, the proposed PSwin edge detection algorithm achieves the best performance, with an ODS F-measure of 0.826 and an OIS of 0.841.
    Adaptive Image Dehazing Algorithm Based on Dynamic Convolution Kernels
    LIU Zhe, LIANG Yudong, LI Jiaying
    Computer Science    2023, 50 (6): 200-208.   DOI: 10.11896/jsjkx.220400288
    Existing image dehazing methods generally suffer from incomplete dehazing and color distortion. Image dehazing methods based on traditional deep learning models mostly use static inference at test time, applying the same fixed parameters to different samples, which inhibits the expressive ability of the model and degrades dehazing performance. Aiming at these problems, this paper proposes an adaptive image dehazing algorithm based on dynamic convolution kernels. The proposed model includes three parts: an encoding network, an adaptive feature enhancement network, and a decoding network. This paper combines dynamic convolutions, dense residual connections, and attention mechanisms in the adaptive feature enhancement network, which mainly includes a dynamic residual component and a dynamic skip-connected feature fusion component. The dynamic residual component is composed of a dynamic residual dense block, a convolutional layer, and a dual attention module. The dynamic residual dense block introduces dynamic convolutions into the residual dense block, and an attention-based weight dynamic aggregator is designed to dynamically generate adaptive convolution kernel parameters; the dynamic convolutions reduce information loss and enhance the expressive ability of the model. The dual attention module combines channel attention and pixel attention so that the model pays more attention to differences between image channels and to areas with unevenly distributed haze. The dynamic skip-connected feature fusion component learns rich contextual information by dynamically fusing features of different stages via skip connections, preventing early network features from being forgotten as information flows into deeper layers; meanwhile, the feature representations are greatly enriched, which benefits the restoration of details in haze-free images. Extensive experiments on synthetic and real datasets show that our method not only achieves better objective evaluation scores but also reconstructs dehazed images with better visual effects, surpassing the compared methods.
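    An attention-based weight dynamic aggregator can be sketched as routing over K candidate kernels; the batch-as-groups trick below is a common implementation device, and K plus the routing design are assumptions rather than the paper's exact aggregator.

```python
# Sketch (assumed form) of dynamic convolution: per-sample attention scores
# mix K candidate kernels into one adaptive kernel, applied via grouped conv.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    def __init__(self, cin, cout, k=3, n_kernels=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_kernels, cout, cin, k, k) * 0.02)
        self.route = nn.Linear(cin, n_kernels)   # attention over candidate kernels
        self.k = k

    def forward(self, x):                        # x: (B, Cin, H, W)
        b, cin, h, w = x.shape
        attn = F.softmax(self.route(x.mean(dim=(2, 3))), dim=1)      # (B, K)
        kernel = torch.einsum('bk,koihw->boihw', attn, self.weight)  # per-sample kernel
        x = x.reshape(1, b * cin, h, w)
        out = F.conv2d(x, kernel.reshape(-1, cin, self.k, self.k),
                       padding=self.k // 2, groups=b)                # batch as groups
        return out.reshape(b, -1, h, w)

print(DynamicConv(16, 32)(torch.randn(2, 16, 24, 24)).shape)  # torch.Size([2, 32, 24, 24])
```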
    Hyperspectral Image Denoising Based on Group Sparse and Constraint Smooth Rank Approximation
    ZHANG Lihong, YE Jun
    Computer Science    2023, 50 (6): 209-215.   DOI: 10.11896/jsjkx.220300236
    In the process of hyperspectral image (HSI) acquisition, many kinds of noise are produced, and the more noise there is, the less effective information the HSI retains. To recover HSI's effective information more effectively from heavy mixed noise, a constrained smoothing rank approximation method for HSI recovery based on group sparse regularization is proposed in this paper. The group sparse regularization is defined as spatial-spectral total variation (SSTV) based on a weighted $\ell_{2,1}$-norm. This regularization not only utilizes the information of the spatial-spectral dimensions, but also considers the group sparsity inside the HSI, which enhances the model's removal of mixed noise and the smoothness of the spatial-spectral dimensions. In addition, a constrained smoothing function is used to approximate the rank function, which makes better use of the low-rank property of HSI and improves the efficiency of the algorithm. The optimization problem is solved by an iterative algorithm based on the alternating direction method of multipliers. The results of two simulated-data experiments and one real-data experiment show that, compared with five current mainstream methods, the proposed method achieves clear improvements in visual effect and evaluation indexes.
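    One plausible form of the weighted $\ell_{2,1}$ SSTV term (axis weights assumed) computes first-order differences along the two spatial axes and the spectral axis, takes an l2 norm over the group of difference directions at each entry, and then sums.

```python
# Sketch of a group-sparse SSTV term: differences along the two spatial axes
# and the spectral axis, penalized with a weighted l_{2,1} norm
# (l2 across difference directions, l1 over entries).
import numpy as np

def sstv_l21(X, w=(1.0, 1.0, 0.5)):
    """X: (H, W, B) hyperspectral cube; w: weights for the three difference axes."""
    dh = np.diff(X, axis=0, append=X[-1:, :, :])
    dw = np.diff(X, axis=1, append=X[:, -1:, :])
    db = np.diff(X, axis=2, append=X[:, :, -1:])
    stacked = np.stack([w[0] * dh, w[1] * dw, w[2] * db])   # (3, H, W, B)
    return np.sqrt((stacked**2).sum(axis=0)).sum()

print(sstv_l21(np.random.rand(32, 32, 10)))
```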
    GAN-generated Face Detection Based on Space-Frequency Convolutional Neural Network
    WANG Jinwei, ZENG Kehui, ZHANG Jiawei, LUO Xiangyang, MA Bin
    Computer Science    2023, 50 (6): 216-224.   DOI: 10.11896/jsjkx.220400268
    The rapid development of generative adversarial networks (GANs) has led to unprecedented success in the field of image generation. The emergence of new GANs such as StyleGAN makes generated images more realistic and deceptive, posing a greater threat to national security, social stability, and personal privacy. In this paper, a detection algorithm based on a space-frequency joint two-stream convolutional neural network is proposed. Since the up-sampling operations in the GAN generation process leave clearly discernible artifacts on the spectrum, a learnable frequency-domain filter kernel and a frequency-domain network are designed to fully learn and extract frequency-domain features. To reduce the influence of information discarded when transforming the image to the frequency domain, a spatial-domain network is also designed to learn the discriminative spatial features of the image content itself. Finally, the two features are fused to detect GAN-generated face images. Experimental results on multiple datasets show that the proposed model outperforms existing algorithms in detection accuracy on high-quality generated datasets and in cross-dataset generalization, and it is more robust to JPEG compression, random cropping, Gaussian blur, and other operations. In addition, the proposed method also performs well on the local face dataset generated by GAN, which further proves its generality and broad application prospects.
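    A learnable frequency-domain filter kernel can be sketched as a trainable mask applied to the Fourier half-spectrum; the log-amplitude feature and the mask shape below are assumptions about one plausible form, not the paper's exact design.

```python
# Sketch (assumed form) of a learnable frequency-domain filter: take the image
# to the Fourier domain, multiply by a learnable mask, pass on log amplitude.
import torch
import torch.nn as nn

class FreqFilter(nn.Module):
    def __init__(self, h, w):
        super().__init__()
        self.mask = nn.Parameter(torch.ones(h, w // 2 + 1))  # rfft2 half-spectrum

    def forward(self, x):                  # x: (B, C, H, W)
        spec = torch.fft.rfft2(x)          # complex half-spectrum
        spec = spec * self.mask            # learnable filtering
        return torch.log1p(spec.abs())     # log amplitude feature for the CNN stream

f = FreqFilter(128, 128)(torch.randn(2, 3, 128, 128))
print(f.shape)  # torch.Size([2, 3, 128, 65])
```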
    Stratified Pseudo-label Based Image Clustering
    CAI Shaotian, CHEN Xiaojun, CHEN Longteng, QIU Liping
    Computer Science    2023, 50 (6): 225-235.   DOI: 10.11896/jsjkx.220900197
    Image clustering is an important open problem in image processing. Recently, some methods combine the powerful representation ability of contrastive learning to perform end-to-end clustering and use pseudo-label techniques to improve the robustness of clustering. Existing pseudo-label methods need a large threshold parameter to obtain highly confident samples for generating one-hot pseudo-labels, and often cannot obtain enough highly confident samples. To make up for these defects, we propose a stratified pseudo-label clustering (SPC) method, which trains and refines the classification model using both structure information and pseudo-labels. We first introduce three assumptions for designing deep clustering methods: the local smoothness assumption, the self-training assumption, and the low-density separation assumption. The method consists of two stages: 1) manifold-based consistency learning, which initializes the classification model in the training stage; and 2) stratified pseudo-label based model refinement, which generates stratified pseudo-labels to improve the robustness of the clustering model. We first generate a strong pseudo-label dataset and a weak pseudo-label dataset with a threshold parameter, then propose a label-propagation method and a mix-up method to improve the weak pseudo-label dataset, and finally use both datasets to refine the clustering model. Compared with the best baseline, the average ACC of SPC improves by 7.6% and 5.0% on the STL10 and CIFAR100-20 benchmark datasets, respectively.
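    The stratified split itself is simple to sketch: one-hot pseudo-labels come from the classifier's most confident class, and two thresholds (values assumed) divide samples into strong and weak pseudo-label sets.

```python
# Sketch of the stratified pseudo-label split: a confidence threshold divides
# samples into a strong set and a weak set (threshold values are assumptions).
import torch

def stratify_pseudo_labels(probs, tau=0.95, tau_weak=0.6):
    """probs: (N, K) softmax outputs; returns strong/weak index sets and labels."""
    conf, labels = probs.max(dim=1)
    strong = (conf >= tau).nonzero(as_tuple=True)[0]
    weak = ((conf >= tau_weak) & (conf < tau)).nonzero(as_tuple=True)[0]
    return strong, weak, labels

probs = torch.softmax(torch.randn(1000, 20), dim=1)
strong, weak, labels = stratify_pseudo_labels(probs)
print(len(strong), len(weak))
```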
    Study on Volume Cloud Simulation Based on Weather Data and Multi-noise Fusion
    LU Chunhai, XU Xinhai, ZHANG Shuai, LI Hao
    Computer Science    2023, 50 (6): 236-242.   DOI: 10.11896/jsjkx.220500070
    To build a realistic simulation environment for a smart drone swarm simulation system, clouds need to be modeled and rendered based on weather data. At present, cloud simulations based on real weather data generally adopt physical modeling methods, such as solving the Navier-Stokes equations or using particle systems, which carry heavy differential-equation-solving workloads and therefore suffer from large computational cost and cannot achieve real-time simulation in large-scale scenes. Aiming at this problem, a volumetric cloud modeling method is proposed. Firstly, weather data is used to generate a texture; then height-dependent functions define how cloud shape and density change with altitude; finally, the cloud is modeled in combination with multiple noises, effectively combining weather data with non-physical modeling. In rendering, the color and transparency of each sample point are calculated using a raymarching algorithm that accumulates cloud density along the view direction and toward the sun, combined with the laws of light absorption and scattering, and the cloud is finally drawn. Experiments show that the simulated volumetric clouds are consistent with the cloud information in the weather data, highly efficient, and close to real clouds in shape and color.
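    The raymarching accumulation described above can be sketched in a few lines; the density field here is a stand-in for the weather-texture-plus-noise model, and the absorption coefficient is arbitrary.

```python
# Sketch of raymarching: step along the view ray, sample cloud density, and
# attenuate with Beer-Lambert absorption.
import numpy as np

def density(p):
    """Placeholder cloud density at 3-D point p (real model: weather texture + noise)."""
    return max(0.0, 1.0 - np.linalg.norm(p) / 2.0)

def raymarch(origin, direction, steps=64, step_size=0.1, absorption=1.5):
    transmittance, radiance = 1.0, 0.0
    p = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    for _ in range(steps):
        rho = density(p)
        if rho > 0:
            attenuation = np.exp(-absorption * rho * step_size)  # Beer-Lambert law
            radiance += transmittance * (1 - attenuation)        # light scattered to eye
            transmittance *= attenuation
        p = p + step_size * d
    return radiance, transmittance

print(raymarch(origin=[-2.0, 0.0, 0.0], direction=[1.0, 0.0, 0.0]))
```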