Started in January 1974 (Monthly)
Supervised and Sponsored by Chongqing Southwest Information Co., Ltd.
ISSN 1002-137X
CN 50-1075/TP
CODEN JKIEBK
    Content of Image Processing & Multimedia Technology in our journal
    Motion Contrast Enhancement-based Crowd Motion Segmentation Method
    ZHANG Xinfeng, NI Qili, CHEN Shuhan, YANG Baoqing, LI Bin
    Computer Science    2023, 50 (6A): 211200205-7.   DOI: 10.11896/jsjkx.211200205
    In surveillance videos of public places, crowd movement states are varied and complex, and it is difficult to analyze the movement state of the whole crowd by detecting or segmenting every individual. An effective alternative is to divide the crowd into regions whose movement state is essentially the same. Supervised crowd motion segmentation methods require pixel-level training sets with high labeling costs, so unsupervised clustering methods are more promising for crowd motion segmentation. However, the local features that describe crowd movement usually change gradually, so unsupervised methods based on clustering algorithms need different parameters for different crowd scenarios and are difficult to adapt to a variety of application scenarios. To this end, this paper proposes a motion contrast enhancement-based crowd motion segmentation method. The method is an unsupervised model that first enhances the contrast between different motion states based on the distribution law of movement and noise in the motion field, and then combines an adaptive threshold segmentation algorithm with the marker watershed algorithm to extract the essentially consistent region for each motion state, avoiding the difficulty of parameter selection in unsupervised clustering methods. Based on the crowd motion segmentation results, this paper presents an energy model to describe the stability of crowd movement. The energy model enables early warning of abnormal crowd motion states by deducing how the motion state of the whole crowd changes. Experiments on crowd motion segmentation are conducted in different types of complex crowd motion scenes. Experimental results verify the effectiveness and segmentation accuracy of the proposed method and the validity of the proposed energy model.
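    The abstract does not say which adaptive threshold algorithm is used. As a minimal sketch, assuming Otsu's method as a stand-in, the following shows how a parameter-free threshold can split motion magnitudes into "moving" and "static" samples. The function names and the bin count are illustrative, not taken from the paper.

```python
def otsu_threshold(values, bins=64):
    """Return the threshold that maximizes between-class variance (Otsu's method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # constant input: nothing to separate
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))

    best_var, best_t = -1.0, lo
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, lo + (i + 1) * width
    return best_t


def segment_motion(magnitudes):
    """Label each motion magnitude as moving (1) or static (0)."""
    t = otsu_threshold(magnitudes)
    return [1 if m > t else 0 for m in magnitudes]
```

    In a full pipeline the resulting binary mask would typically seed a marker-based watershed step; that part is omitted here.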
    Fusion Multi-feature Fuzzy Model for Target Recognition and Its Application
    RUAN Wang, HAO Guosheng, WANG Xia, HU Xiaoting, YANG Zihao
    Computer Science    2023, 50 (6A): 220100138-7.   DOI: 10.11896/jsjkx.220100138
    In natural recognition scenes, image features are often complex, diverse and fuzzy, and the relationships between features are rarely considered when multiple features are used for image recognition. To address this, a fuzzy target recognition model that integrates multiple image features is proposed. Firstly, image features are extracted, the feature values are taken as the fuzzy sets of the model, and the corresponding membership functions are given. Secondly, an evaluation index for the model is given, and the feasibility of the model is demonstrated according to this index. Thirdly, the particle swarm optimization algorithm is used to optimize the parameters of the membership functions of the image features. Finally, a target recognition algorithm based on the feature fusion fuzzy model is proposed and applied to filling-mark recognition and hot-rolled strip surface defect recognition. Experimental results show that the designed model performs well under the evaluation index, and that the algorithm significantly improves the accuracy and robustness of target recognition and the rationality of feature fusion.
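    The abstract does not give the form of the membership functions or the fusion rule. The sketch below assumes Gaussian membership functions and a weighted-average fusion purely for illustration; the parameters `center` and `sigma` are the kind of values a particle swarm optimizer could tune.

```python
import math


def gaussian_membership(x, center, sigma):
    """Degree (0..1) to which feature value x belongs to the fuzzy set."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def fused_membership(features, params, weights):
    """Weighted average of per-feature memberships (assumed fusion rule).

    features: list of feature values
    params:   list of (center, sigma) pairs, one per feature
    weights:  list of non-negative feature weights
    """
    total_w = sum(weights)
    score = sum(
        w * gaussian_membership(x, c, s)
        for x, (c, s), w in zip(features, params, weights)
    )
    return score / total_w
```

    A candidate would then be accepted as the target when the fused membership exceeds a chosen cut-off, which is also an assumption of this sketch.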
    Remote Sensing Image Classification Based on Improved ResNeXt Network Structure
    YANG Xing, SONG Lingling, WANG Shihui
    Computer Science    2023, 50 (6A): 220100158-6.   DOI: 10.11896/jsjkx.220100158
    Remote sensing image classification is one of the key directions of remote sensing image information processing, and its classification accuracy greatly limits the overall development of remote sensing technology. Traditional machine learning algorithms and model structures cannot quickly extract feature maps from remote sensing images, and their classification results are not accurate enough. To address this problem, an improved model is proposed that combines the ResNeXt network with the attention mechanism and replaces the fully connected layer with an optimized SVM (support vector machine) algorithm. Firstly, the attention mechanism from computer vision is introduced to assign different weights to different features, improving the ability to extract the information in the image that is useful for classification. The attention mechanism is then combined with the ResNeXt network. Finally, the fully connected layer at the end of the convolutional neural network is replaced with the optimized SVM algorithm to improve the classification effect while optimizing network performance without increasing the overall number of hyperparameters of the model. Experimental results on the AID dataset show that the improved network model is significantly better at extracting deep features and that the optimized network model achieves a better classification effect on multi-class classification tasks.
    Font Transfer Based on Glyph Perception and Attentive Normalization
    LYU Wenrui, PU Yuanyuan, ZHAO Zhengpeng, XU Dan, QIAN Wenhua
    Computer Science    2023, 50 (6A): 220100205-6.   DOI: 10.11896/jsjkx.220100205
    Font style transfer is a very challenging task whose aim is to transfer the style of the target font to the source font through a mapping, thereby converting one font into another. Existing glyph transfer methods have limited robustness, which shows in their poor preservation of the structural integrity of the generated fonts. None of these methods gives satisfactory results, especially when glyph styles differ greatly. To address this problem, an end-to-end font transfer network framework is proposed. Attentive normalization is introduced in the model to better extract the high-level semantic features of the font images, thus improving the quality of the generated images. In addition, feature fusion is performed using adaptive instance normalization for font transformation. To maintain the integrity of the glyph structure, a perception loss and a context loss are designed to constrain the generation of the glyph structure. A regularization term is added to the adversarial loss function to stabilize GAN training. To verify the validity of the model, it is trained and tested in multiple groups of experiments on the publicly available datasets from FET-GAN and compared with the recent methods FET-GAN, CycleGAN and StarGANv2. The experiments verify that the model achieves mutual transfer of fonts between a given number of font domains, and that both its transfer effect and its generalization ability have some advantages over recent work.
    Study on Phased Target Detection in CT Image
    WANG Xiaotian, LI Bo, KANG Xiaodong, LIU Hanqing, HAN Junling, YANG Jingyi
    Computer Science    2023, 50 (6A): 220200063-10.   DOI: 10.11896/jsjkx.220200063
    CT is one of the most commonly used imaging examinations in clinical practice, and computer-aided diagnosis of CT images has important clinical significance. In order to optimize target detection in CT images, eight different target detection algorithms are applied to enhanced CT images of hepatic hemangioma, CTA images of cerebral artery stenosis and CT images of colonic polyps, and the applicability of the different algorithms is compared. Firstly, the enhanced CT images of hepatic hemangioma, CTA images of cerebral artery stenosis and CT images of colonic polyps are labeled and datasets are built. Secondly, different parameter optimization algorithms are used, and AP-epoch and AP-FPS curves are drawn to compare the detection performance of the different algorithms. Experimental results show that PPYOLOv2 achieves the best AP, AP50, AP75 and Recall on the different datasets; its predicted bounding boxes are close to the targets, its prediction confidence is high, and it shows good generalization ability and robustness.
    Real-time Detection of Motorcycle Lanes Based on Deep Learning
    WAN Haibo, JIANG Lei, WANG Xiao
    Computer Science    2023, 50 (6A): 220200066-5.   DOI: 10.11896/jsjkx.220200066
    Motorcycle driving is more dangerous than other forms of driving but lacks effective driving assistance systems such as lane assist, obstacle detection and pre-collision systems. The position of the lane line while driving is often used to determine whether the motorcycle has deviated from its lane, so lane line detection is very important for developing assisted driving systems. This paper therefore proposes a real-time detection algorithm for motorcycle lanes based on deep learning. It makes three improvements to the LaneNet architecture: 1) using the absolute position of the lane coordinates as an input feature; 2) using the K-means algorithm instead of the Mean-Shift algorithm; 3) removing the H-Net structure. Because no public motorcycle lane dataset is available, motorcycle lane data collected by the authors is used to fit the model. Experimental results prove the effectiveness of the proposed algorithm. The detection speed reaches 47.6 fps and the intersection over union (IoU) reaches 0.71560. Compared with the algorithm in reference [3], the accuracy improves by 15.5% and the speed improves by 53.3%.
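    To illustrate the second improvement, here is a minimal K-means sketch for grouping per-pixel lane embeddings into lane instances. It is a generic K-means, not the authors' implementation; the farthest-point initialization and the requirement that the number of lanes `k` is known in advance are assumptions of this sketch.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Cluster points (tuples of floats) into k groups; returns (labels, centers)."""
    # Deterministic farthest-point initialization (an assumption of this sketch).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))

    labels = []
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

    Unlike Mean-Shift, K-means needs no bandwidth search at inference time, which is one plausible reason for a speed gain, at the cost of fixing `k` beforehand.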
    Person Re-identification Method Based on Progressive Attention Pyramid
    ZHANG Shuaiyu, PENG Li, DAI Feifei
    Computer Science    2023, 50 (6A): 220200084-8.   DOI: 10.11896/jsjkx.220200084
    Existing person re-identification algorithms do not fully extract person features, which leads to low accuracy in scenes with occlusion and posture change. To address this problem, a person re-identification method based on a progressive attention pyramid is proposed. The method designs a progressive feature pyramid structure based on the attention mechanism, embeds channel and spatial attention modules into the feature pyramid structure, and applies them to the channel and spatial dimensions of the features. The channel attention pyramid aggregates the salient features in different channel dimensions at each level of the backbone network, and the spatial attention pyramid extracts the salient features in different spatial dimensions. Each level of the pyramid follows the "split-attend-concat" principle and continuously learns the person feature map at different segmentation levels from the bottom up. Attention allows the network to fully mine key features from different channel dimensions and different spatial dimensions. At the same time, multi-level feature alignment is achieved through the cascade structure and deformable convolution, which further improves the re-identification accuracy of the model. The method is tested on two mainstream datasets, Market-1501 and DukeMTMC-reID. Experimental results show that the method allows the model to focus on richer person features. Compared with the baseline network, the Rank-1 index of the model increases by 3.2% and 5.8%, and the mAP index increases by 6.8% and 6.6%, respectively.
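    The two reported metrics can be computed as in the sketch below. It assumes each query already has a gallery ranking sorted by descending similarity, encoded as booleans (True for a same-identity match); dataset-specific rules such as excluding same-camera samples are left out.

```python
def rank1_and_map(ranked_matches):
    """Compute Rank-1 accuracy and mean average precision (mAP).

    ranked_matches: one list per query; each list holds booleans for the
    gallery items in ranked order (True means same identity as the query).
    """
    rank1_hits = 0
    ap_sum = 0.0
    for matches in ranked_matches:
        if matches and matches[0]:
            rank1_hits += 1
        hits = 0
        precisions = []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / rank)  # precision at each correct hit
        ap_sum += sum(precisions) / len(precisions) if precisions else 0.0
    n = len(ranked_matches)
    return rank1_hits / n, ap_sum / n
```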
    Study on BGA Packaging Void Rate Detection Based on Active Learning and U-Net++ Segmentation
    QI Xuanlong, CHEN Hongyang, ZHAO Wenbing, ZHAO Di, GAO Jingyang
    Computer Science    2023, 50 (6A): 220200092-6.   DOI: 10.11896/jsjkx.220200092
    Bump voids are among the most common physical defects in BGA packaging and may lead to electrical failures and shortened lifetime. At present, quality inspection commonly relies on manual checks of X-ray images, which has low accuracy and is time-consuming. Automated chip inspection methods based on deep learning are therefore drawing increasing attention in industry. This paper proposes a void rate detection network based on active learning and U-Net++. Following the active learning approach, the whole dataset is partitioned into equal parts, and for each sub-dataset a training-prediction-labeling-extension cycle is used to optimize the U-Net++ network. The average Dice coefficient of the separated model sets reaches 80.99% on the test set, while the overall accuracy reaches 94.89%. This work applies active learning to in-line defect detection, and the results show that it effectively improves the labeling standard of the data and the segmentation precision of the model.
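    For reference, the Dice coefficient quoted above can be computed from two binary masks as sketched below. The `void_rate` helper uses an assumed definition (void pixels divided by solder-ball pixels), since the abstract does not define the void rate formula.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A and B| / (|A| + |B|) for flat binary masks of 0/1 values."""
    intersection = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * intersection / total


def void_rate(void_mask, ball_mask):
    """Fraction of the solder-ball area covered by voids (assumed definition)."""
    ball_area = sum(ball_mask)
    if ball_area == 0:
        return 0.0
    return sum(v & b for v, b in zip(void_mask, ball_mask)) / ball_area
```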
    Defect Detection of Transmission Line Bolt Based on Region Attention Mechanism and Multi-scale Feature Fusion
    WU Liuchen, ZHANG Hui, LIU Jiaxuan, ZHAO Chenyang
    Computer Science    2023, 50 (6A): 220200096-7.   DOI: 10.11896/jsjkx.220200096
    Bolts fix the connections between lines in transmission lines. Once they become loose or detached, they may cause power transmission failures and large-scale power outages. Regular inspection of bolts in transmission lines is therefore essential to ensure the safety and stability of the entire power system. Most existing detection methods are based on deep convolutional neural networks. However, the indistinct features and small size of bolts pose a challenge for detection. To address these problems, this paper proposes a bolt defect detection method for transmission lines based on a region attention mechanism and multi-scale feature fusion. Firstly, a region attention module suitable for object detection is proposed and embedded in the residual blocks of ResNet50 to enhance the network's feature extraction for bolts. Secondly, the feature pyramid network (FPN) is extended with a bottom-up path, and shallow features are fully utilized to improve the detection accuracy for small objects. Finally, to alleviate the imbalance between samples, the PrIme Sample Attention (PISA) soft sampling strategy is introduced. Experimental results show that the proposed method achieves a mean average precision (mAP) of 74.3% and an average recall (AR) of 86.4% at a detection speed of 8.2 FPS when detecting transmission line bolts. Compared with other detection networks, the proposed method improves the detection accuracy of bolt defects without sacrificing too much detection speed.
    Multimodal MRI Brain Tumor Segmentation Based on Multi-encoder Architecture
    DAI Tianhong, SONG Jieqi
    Computer Science    2023, 50 (6A): 220200108-6.   DOI: 10.11896/jsjkx.220200108
    Glioma is a primary tumor originating from glial cells in the brain, accounting for about 45% of all intracranial tumors. Accurate segmentation of brain tumors in magnetic resonance imaging (MRI) images is of great clinical significance. In this paper, an automatic brain tumor segmentation method based on a multi-encoder architecture is proposed. The model adopts a U-shaped network structure that expands the single contracting path into multiple paths to deeply exploit the semantic information of the different modalities. To obtain multiscale image features, an inception module combined with dilated convolution is designed as the basic convolutional layer. A lightweight attention mechanism, the efficient channel attention (ECA) block, is then introduced into the bottleneck layer and the decoder so that the model pays more attention to segmentation-related information and ignores redundancy in the channel dimension, further improving the segmentation results. Verified on the Brain Tumor Segmentation Challenge 2018 (BraTS 2018) dataset, the proposed model obtains average Dice coefficient values of 0.880, 0.784 and 0.757 for the whole tumor, tumor core and enhancing tumor, respectively. Experimental results show that the proposed method achieves accurate and effective multimodal MRI brain tumor segmentation.
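    As a rough illustration of why the ECA block is lightweight, the sketch below follows the original ECA-Net formulation rather than anything stated in this abstract: a 1D convolution over the globally pooled channel descriptors, with a kernel size derived from the channel count using the defaults gamma=2 and b=1. How the block is wired into this particular model is not specified here.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size: nearest odd value from log2(C) (ECA-Net defaults)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca_gates(channel_means, kernel):
    """Channel gates from pooled descriptors via a zero-padded 1D convolution.

    channel_means: globally average-pooled value per channel
    kernel:        1D convolution weights (odd length), shared by all channels
    """
    k, n = len(kernel), len(channel_means)
    pad = k // 2
    gates = []
    for i in range(n):
        acc = 0.0
        for j in range(k):
            idx = i + j - pad
            if 0 <= idx < n:
                acc += kernel[j] * channel_means[idx]
        gates.append(1.0 / (1.0 + math.exp(-acc)))  # sigmoid
    return gates
```

    Each feature map channel is then multiplied by its gate; the only learned parameters are the few kernel weights.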
    Image Retrieval Based on Independent Attention Mechanism
    ZHANG Shunyao, LI Huawang, ZHANG Yonghe, WANG Xinyu, DING Guopeng
    Computer Science    2023, 50 (6A): 220300092-6.   DOI: 10.11896/jsjkx.220300092
    In recent years, deep learning methods have taken a dominant position in the field of content-based image retrieval. To improve the features extracted by off-the-shelf backbones and enable the network to produce more discriminative image descriptors, the attention module ICSA (independent channel-wise and spatial attention), which is independent of the features input into the module, is proposed. The attention weights of the proposed module stay the same when the input features change, whereas in other attention mechanisms the attention weights are usually computed from the input features; this is the main difference between ICSA and other attention modules. This property also makes the module quite small (only 6.7 kB, 5.2% of the size of SENet and 2.6% of the size of CBAM) and relatively fast (similar to SENet in speed and 14.9% of the time of CBAM). The attention of ICSA is divided into two parts, channel-wise attention and spatial attention, which store weights along orthogonal directions. Experiments on the Pittsburgh dataset show that ICSA improves Recall@1 by 0.1% to 2.4% with different backbones.
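    The key idea, attention weights stored as parameters rather than computed from the input, can be sketched as follows. Combining the channel and spatial weights by element-wise multiplication is an assumption of this sketch; the abstract only says the two parts store weights along orthogonal directions.

```python
def apply_independent_attention(features, channel_w, spatial_w):
    """Rescale a C x H x W feature map with fixed (input-independent) weights.

    features:  nested lists, features[c][h][w]
    channel_w: one stored weight per channel, length C
    spatial_w: one stored weight per location, shape H x W
    """
    return [
        [
            [value * channel_w[c] * spatial_w[h][w] for w, value in enumerate(row)]
            for h, row in enumerate(channel)
        ]
        for c, channel in enumerate(features)
    ]
```

    Because `channel_w` and `spatial_w` do not depend on `features`, the module stores only C + H*W numbers and needs no extra sub-network at inference time.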
    Review of Research on Denoising Algorithms of ECG Signal
    HOU Yanrong, LIU Ruixia, SHU Minglei, CHEN Changfang, SHAN Ke
    Computer Science    2023, 50 (6A): 220300094-11.   DOI: 10.11896/jsjkx.220300094
    One of the most common signal processing problems with the electrocardiogram (ECG), an important indicator for identifying cardiac abnormalities in humans, is the elimination of unwanted noise. Such noise can distort the clean signal and thereby affect the diagnosis and analysis of the human heart. This paper reviews five different frameworks of ECG signal denoising techniques and the latest research results within these frameworks, then summarizes the best noise reduction models of the last five years and compares them using performance evaluation criteria such as signal-to-noise ratio. The comparison shows that deep learning models perform well in ECG denoising for both single and composite noise. Finally, the problems with current denoising models are discussed and an outlook on future research is given.
    Pavement Crack Detection Based on Attention Mechanism and Deformable Convolution
    LONG Tao, DONG Anguo, LIU Laijun
    Computer Science    2023, 50 (6A): 220300214-6.   DOI: 10.11896/jsjkx.220300214
    Deep learning-based image segmentation algorithms give unsatisfactory results for pavement crack detection under complex backgrounds, and the pixel categories in crack images are themselves imbalanced. To address this, this paper proposes a pavement crack detection network based on the attention mechanism and deformable convolution, built on an encoder-decoder structure. To handle crack detection in complex backgrounds, firstly, deformable convolution is used to improve the learning of the linear features of cracks with different shapes. Secondly, a dense connection mechanism is used to strengthen the feature information. Then, in the decoder stage, feature fusion through transposed convolution and bridge connections is adopted and combined with multi-stage feature fusion to improve the detection accuracy of the network. Finally, the attention module SimAM is introduced to focus on the extraction of target features and suppress background features without increasing the number of network parameters. Experiments are carried out on two open crack datasets to verify the effectiveness of the algorithm. The experimental results show that the algorithm outperforms the comparison algorithms on the performance evaluation criteria. The mean pixel accuracy and mean intersection over union reach 92.12% and 84.79% on the BCrack dataset, and 91.02% and 74.75% on the CFD dataset, respectively. The algorithm performs well in crack detection under complex backgrounds and can be applied to pavement maintenance engineering.
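    The two reported metrics can be derived from a confusion matrix as sketched below. This is the standard definition of mean pixel accuracy and mean intersection over union, not code from the paper; classes with no ground-truth pixels are skipped when averaging, which is a choice of this sketch.

```python
def confusion_matrix(pred, target, num_classes):
    """cm[t][p] counts pixels whose true class is t and predicted class is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def mpa_and_miou(cm):
    """Return (mean pixel accuracy, mean intersection over union)."""
    n = len(cm)
    accuracies, ious = [], []
    for c in range(n):
        true_pos = cm[c][c]
        gt_pixels = sum(cm[c])                         # row sum: ground truth
        pred_pixels = sum(cm[r][c] for r in range(n))  # column sum: predictions
        union = gt_pixels + pred_pixels - true_pos
        if gt_pixels:
            accuracies.append(true_pos / gt_pixels)
        if union:
            ious.append(true_pos / union)
    return sum(accuracies) / len(accuracies), sum(ious) / len(ious)
```

    With heavily imbalanced crack and background pixels, these class-averaged metrics are more informative than plain overall pixel accuracy.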
    Fabric Defect Detection Algorithm Based on Improved Cascade R-CNN
    BAI Mingli, WANG Mingwen
    Computer Science    2023, 50 (6A): 220300224-6.   DOI: 10.11896/jsjkx.220300224
    Automatic detection of fabric defects is a difficult problem in the textile industry. Current fabric defect detection algorithms perform poorly on samples with large variations in scale and aspect ratio and with numerous small targets. To solve this problem, a fabric defect detection algorithm based on an improved Cascade R-CNN network is proposed. The main improvements are as follows. Firstly, deformable convolution is incorporated into the feature extraction network ResNet-50 to adaptively extract more shape and scale features of defects. Secondly, a balanced feature pyramid is introduced into the feature pyramid network before sampling to narrow the semantic gap between feature layers before feature fusion and obtain more expressive multi-scale features. Then, more suitable initial anchor boxes are redesigned according to the scale and aspect ratio of the defects. Finally, the scale-invariant GIoU loss is used as the regression loss of the cascade detector to obtain more accurate predicted bounding boxes for defects. Experimental results show that, compared with the original Cascade R-CNN algorithm, the improved algorithm significantly improves the average precision of fabric defect detection.
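    The GIoU loss mentioned above follows the standard generalized IoU definition, sketched here for two axis-aligned boxes given as (x1, y1, x2, y2). The function names are illustrative; the sketch assumes both boxes have positive area.

```python
def giou(box_a, box_b):
    """Generalized IoU: IoU minus the share of the enclosing box not covered by the union."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (hull - union) / hull


def giou_loss(box_a, box_b):
    """Regression loss in [0, 2]; zero only when the boxes coincide."""
    return 1.0 - giou(box_a, box_b)
```

    Unlike plain IoU, GIoU still changes with the distance between boxes that do not overlap, which helps when small defects are initially missed by the predicted box.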
    Attentional Feature Fusion Approach for Siamese Network Based Object Tracking
    LUO Huilan, LONG Jun, LIANG Miaomiao
    Computer Science    2023, 50 (6A): 220300237-9.   DOI: 10.11896/jsjkx.220300237
    To solve the problems of tracking drift caused by target occlusion and tracking failure caused by background interference during target tracking, this paper proposes a siamese network-based object tracking method with multi-feature integration, in which feature fusion and the attention mechanism are introduced to build multiple tracking modules based on region proposal networks. Firstly, the features of two adjacent residual blocks are processed by squeeze-and-excitation and then effectively fused to strengthen the feature information. Secondly, a parallel convolution attention module is used to filter out the interference contained in the channel and spatial information. Finally, an approach similar to ensemble learning is proposed: two different trackers are constructed, which receive the deep semantic features and the aforementioned fused features respectively, and they are weighted and trained for the final object tracking. In addition, to verify the effectiveness of the algorithm, this paper investigates the effects of different fusion schemes, different training weights for each tracker and different ways of combining the modules in the proposed model. Experimental results on the VOT2016 and VOT2018 datasets show that, compared with other siamese network-based object tracking algorithms, the proposed multi-feature integration method effectively improves the robustness of object tracking while maintaining high accuracy.
    Endoscopic Image Enhancement Algorithm Based on Luminance Correction and Fusion Channel Prior
    AN Ziheng, XU Chao, FENG Bo, HAN Jubao
    Computer Science    2023, 50 (6A): 220300265-7.   DOI: 10.11896/jsjkx.220300265
    To solve the problems of uneven illumination, blurred blood vessels in submucosal tissue and low contrast in medical endoscopic images, a novel endoscopic image enhancement algorithm is proposed in this paper. The method is divided into two parts. The first part achieves brightness enhancement using gamma correction based on a quadrant-clipped histogram. In this part, the histogram of the brightness channel is first clipped by quadrant to obtain a smooth cumulative distribution function (CDF), and the truncated CDF is then used to control the size of the gamma parameter. The second part enhances the contrast and sharpness of the image based on the fusion channel prior. This part first uses the discrete wavelet transform to fuse the green and red channels of the image into a layer with rich details, which is used to generate the initial transmission map of the Image Formation Model (IFM). The initial transmission map is then corrected by the proposed ideal function model, and a clear image is obtained. Finally, the contrast between tissue and blood vessels is enhanced by combining the result with CLAHE. The proposed method and several existing methods are analyzed subjectively and objectively on the MEDS dataset built by the laboratory. The results show that the proposed method improves the contrast of blood vessels and tissues while avoiding excessive image enhancement.
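    The exact quadrant clipping rule is not given in the abstract. The sketch below therefore substitutes a single global clip limit and a simple CDF-driven gamma (gamma = 1 - CDF at the pixel's level), only to show how a clipped histogram can steer gamma correction on a brightness channel with integer levels 0..255. All names and the clipping rule are assumptions.

```python
def clipped_cdf(pixels, clip_limit, levels=256):
    """CDF of the brightness histogram after capping each bin at clip_limit."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    clipped = [min(h, clip_limit) for h in hist]  # limits the influence of dominant bins
    total = sum(clipped)
    cdf, running = [], 0
    for h in clipped:
        running += h
        cdf.append(running / total)
    return cdf


def gamma_correct(pixels, clip_limit, levels=256):
    """Brighten pixels with a per-level gamma of 1 - CDF(level)."""
    cdf = clipped_cdf(pixels, clip_limit, levels)
    top = levels - 1
    out = []
    for p in pixels:
        if p == 0:
            out.append(0)  # keep pure black unchanged
        else:
            out.append(round(top * (p / top) ** (1.0 - cdf[p])))
    return out
```

    Because every gamma lies between 0 and 1, no pixel gets darker; clipping the histogram keeps large uniform regions from dominating the CDF and over-brightening the image.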
    Maximum Overlap Single Target Tracking Algorithm Based on Attention Mechanism
    SUN Kaiwei, WANG Zhihao, LIU Hu, RAN Xue
    Computer Science    2023, 50 (6A): 220400023-5.   DOI: 10.11896/jsjkx.220400023
    With the development of artificial intelligence, deep learning has attracted extensive attention in computer vision research, and deep learning-based algorithms have been studied for single target tracking. Deep learning algorithms have relatively high complexity. Completely separating target classification from target state estimation allows each task to be studied in depth. However, current single target tracking algorithms cannot handle complex tracking environments well: in such environments, the model often tracks a region of the background or a similar target nearby. To solve these problems, this paper adds different attention mechanisms to the target classification task and the target state estimation task respectively, so that the model can better handle background clutter and occlusion by similar targets. To verify the effectiveness of the method, extensive comparative experiments are carried out on multiple datasets against previous deep learning-based single target tracking algorithms. The proposed algorithm improves the EAO index by 3.1% and the Robustness index by 2.3%, which demonstrates the effectiveness and advancement of the method.
    Robot Visual Inertial Optical Measurement Method Based on Improved PL-VIO
    WANG Haifang, LI Mingfei, LI Guangyu, CUI Yangyang
    Computer Science    2023, 50 (6A): 220400171-5.   DOI: 10.11896/jsjkx.220400171
    An improved point-line visual inertial measurement algorithm (PL-VIO) is proposed to address the large volume of inertial measurement and visual track identification and the imprecise image pose and edge accuracy in map object pose recognition. In the visual front end, a sub-pixel edge extraction method is used to iteratively improve the accuracy of image edge corners, and edge constraints are applied to the extracted corners to prevent sub-pixel edge detection from crossing the boundary. To improve extraction accuracy and reduce repeated extraction of line features in the visual back end, the point features and the line features extracted by LSD are optimized. After SFM, the extracted line features are merged and redundant lines are deleted. Experiments are carried out on the EuRoC dataset on the ROS platform, and the experimental data are imported into Evo, which is used to analyze and plot the data and evaluate the error parameters. The overall reduction of the error parameters in the experimental results verifies the superiority and accuracy of the improved PL-VIO algorithm.
    Cross-dataset Learning Combining Multi-object Tracking and Human Pose Estimation
    ZENG Zehua, LUO Huilan
    Computer Science    2023, 50 (6A): 220400199-7.   DOI: 10.11896/jsjkx.220400199
    In recent years, multi-object tracking has made significant progress, especially for pedestrians. Performing joint pose estimation on pedestrians can improve the motion prediction of multi-object tracking algorithms while providing more information for higher-order tasks such as autonomous driving. However, in current multi-object tracking datasets that contain human pose estimation labels, the videos are short and the targets are sparse, which limits research on multi-object tracking. In this paper, cross-dataset learning is performed using the multi-object tracking dataset MOT17 and the multi-person pose estimation dataset COCO, which contains more pedestrians. Based on a round-robin training strategy, the performance of the multi-object tracking algorithm under joint human pose estimation is effectively improved. The simultaneous use of polarized self-attention down-sampling and attention up-sampling enhances the human pose estimation performance of the algorithm while improving its training speed.
    Review of 3D Target Detection Methods Based on LiDAR Point Clouds
    QIN Jing, WANG Weibin, ZOU Qijie, WANG Zumin, JI Changqing
    Computer Science    2023, 50 (6A): 220400214-7.   DOI: 10.11896/jsjkx.220400214
    In recent years, 3D target detection using LiDAR point clouds has been a research hotspot in computer vision and has attracted much attention in the field of autonomous driving. Compared with 2D data, 3D data incorporates depth information and better reflects the characteristics of the real world, which helps to solve practical problems such as path planning, motion prediction and target detection. This paper introduces the development background of 3D target detection, summarizes the workflow of 3D target detection frameworks based on LiDAR point cloud data, compares several common datasets containing point cloud information, and classifies the main research methods. The performance and limitations of the different methods are analyzed and compared. Finally, the current technical difficulties are summarized and the future development prospects of the field are discussed.
    Ultrasonic Image Segmentation Based on SegFormer
    YANG Jingyi, LI Fang, KANG Xiaodong, WANG Xiaotian, LIU Hanqing, HAN Junling
    Computer Science    2023, 50 (6A): 220400273-6.   DOI: 10.11896/jsjkx.220400273
    Ultrasonic image segmentation is not only an important part of medical image processing, but also a common technical means of clinical diagnosis. In this paper, the SegFormer network model is used to achieve accurate segmentation of medical ultrasound images. On the one hand, the ultrasonic label images are converted to a single channel and binarized to complete the preprocessing of the data set; on the other hand, a pre-trained model is loaded and its parameters are fine-tuned, and a stochastic gradient descent optimizer with momentum is selected to accelerate convergence and reduce oscillation. Experimental results show that, compared with FCN, UNet and DeepLabV3, the proposed model achieves the best results on all evaluation indexes on the breast nodule ultrasound image data set, with mIoU, Acc, DSC and Kappa of 81.32%, 96.22%, 88.91% and 77.85%, respectively. The experimental results also show that the model is robust across different ultrasonic image data sets.
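    The two preprocessing and optimization steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the mean-of-channels grayscale conversion, the threshold of 127, and the toy objective f(x) = x^2 are all assumptions chosen for illustration.

```python
def binarize_label(rgb_label, threshold=127):
    """Collapse an RGB label image to a single-channel 0/1 mask.

    Assumption: the channel mean is used as the gray value.
    """
    return [
        [1 if sum(pixel) / 3 > threshold else 0 for pixel in row]
        for row in rgb_label
    ]


def sgd_momentum_step(params, grads, velocity, lr=0.1, momentum=0.9):
    """One SGD-with-momentum update: v <- momentum * v + g, p <- p - lr * v."""
    velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    params = [p - lr * v for p, v in zip(params, velocity)]
    return params, velocity


# Toy usage: a 1x2 label image, then minimising f(x) = x^2 (gradient 2x).
mask = binarize_label([[(255, 255, 255), (0, 0, 0)]])
params, velocity = [5.0], [0.0]
for _ in range(500):
    grads = [2 * p for p in params]
    params, velocity = sgd_momentum_step(params, grads, velocity)
```

    The velocity term accumulates past gradients, which is what damps oscillation on this toy problem; real fine-tuning would apply the same update to the network weights.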
    Semi-supervised Semantic Segmentation for High-resolution Remote Sensing Images Based on Data Fusion
    GU Yuhang, HAO Jie, CHEN Bing
    Computer Science    2023, 50 (6A): 220500001-6.   DOI: 10.11896/jsjkx.220500001
    Due to the need for pixel-wise annotation, semantic segmentation usually requires higher labor costs than tasks such as classification and object recognition. Especially in land classification based on high-resolution remote sensing images, complex backgrounds and dense targets make semantic annotation intolerably expensive, which seriously limits the practicability of semantic segmentation algorithms. In addition, although traditional semi-supervised and weakly supervised learning methods can effectively reduce training costs, the low quality of their segmentation results limits their practical value. To solve these two problems, this paper proposes a new semi-supervised semantic segmentation model using a self-correcting fusion strategy. By introducing data fusion technology and a self-correction mechanism, the dependence of the segmentation model on pixel-wise annotation can be effectively reduced. Our method obtains mean F1-scores of 86.5% and 81.7% on the Potsdam and Vaihingen datasets with only 15% pixel-wise annotation. Experimental results show that the proposed model can greatly reduce the cost of the training process and achieve high-quality segmentation results comparable to fully supervised prediction.
    Electric Bike Helmet Wearing Detection Algorithm Based on Improved YOLOv5
    XIE Puxuan, CUI Jinrong, ZHAO Min
    Computer Science    2023, 50 (6A): 220500005-6.   DOI: 10.11896/jsjkx.220500005
    In electric vehicle traffic accidents, craniocerebral injury is the main cause of rider deaths, and most electric vehicle riders seldom wear helmets. It is therefore of strong practical significance to monitor helmet wearing among electric vehicle riders by combining a target detection algorithm with road cameras. To address two current problems in electric vehicle helmet wearing detection, namely the high miss rate for mutually occluded targets and the high miss rate for small targets, this paper proposes an improved YOLOv5 target detection algorithm. The method first adds the channel attention mechanism ECA-Net to the YOLOv5 network so that the model can better focus on target features, thus improving detection performance; the Bi-FPN weighted bidirectional feature pyramid module is then used to balance the importance of features at different levels, which helps reduce missed detections of small targets; finally, the Alpha-CIoU loss function is used to improve localization accuracy. Experimental results show that the method achieves higher detection accuracy than other models for helmet wearing by electric vehicle riders in all three scenarios, with an average accuracy of 95.8%, which is higher than that of the original network, and thus achieves high-accuracy detection of electric vehicle helmet wearing.
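    As background for the localization loss mentioned above, the following is a minimal sketch of the plain CIoU loss for a single pair of boxes. It deliberately omits the alpha power used by Alpha-CIoU, and the function name, the (x1, y1, x2, y2) box layout, and the `eps` stabilizer are assumptions made for this illustration rather than details from the paper.

```python
import math


def ciou_loss(box_a, box_b, eps=1e-9):
    """Plain CIoU loss for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    # Overlap term: intersection over union.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = wa * ha + wb * hb - inter
    iou = inter / (union + eps)

    # Centre-distance term: squared centre distance over the squared
    # diagonal of the smallest box enclosing both inputs.
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2 + eps

    # Aspect-ratio term; atan2(w, h) equals atan(w / h) for positive sizes.
    v = (4.0 / math.pi ** 2) * (math.atan2(wb, hb) - math.atan2(wa, ha)) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return 1.0 - iou + rho2 / c2 + alpha * v


same = ciou_loss((0, 0, 2, 2), (0, 0, 2, 2))
apart = ciou_loss((0, 0, 1, 1), (2, 2, 3, 3))
```

    Unlike a pure IoU loss, the centre-distance term still varies when the boxes do not overlap at all, as in the second call.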
    Remote Sensing Image Change Detection of Construction Land Based on Siamese Attention Network
    LI Tao, WANG Hairui
    Computer Science    2023, 50 (6A): 220500040-5.   DOI: 10.11896/jsjkx.220500040
    To address the problems of under-segmentation, over-segmentation and rough edge segmentation that arise when traditional semantic segmentation networks are used for urban construction land change detection, this paper proposes a high-resolution remote sensing image change detection method based on a Siamese attention network. In the encoding part, a Siamese neural network is used for feature extraction to retain more image features from the different time phases. In the deep encoding stage, an atrous (dilated) convolution feature pyramid is introduced to extract and fuse multi-scale features and enlarge the receptive field of the network. In the decoding part, the attention mechanism CBAM is used to highlight useful features and enhance useful information, improving the accuracy of edge segmentation. Finally, experiments are carried out on a land use change data set of Loudi City. On this data set, the method achieves an accuracy of 92.56%, a precision of 89.15%, a recall of 85.61%, an IoU of 77.53%, an mIoU of 83.76%, an F1 score of 87.34%, and a kappa coefficient of 31.42%. These performance indexes are better than those of the FCN, U-net and CBAM U-net networks. Experimental results show that the method can effectively alleviate under-segmentation, over-segmentation and rough edge segmentation in change detection results.
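    The reported F1 score can be checked against the other figures with a short worked example. It assumes that the 89.15% value is the precision and the 85.61% value is the recall; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract, read as precision and recall (assumption).
f1 = f1_score(0.8915, 0.8561)
f1_percent = round(f1 * 100, 2)
```

    Under that reading, the harmonic mean comes out at about 87.34%, which agrees with the F1 score quoted in the abstract.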
    Graph Neural Network Few Shot Image Classification Network Based on Residual and Self-attention Mechanism
    LI Fan, JIA Dongli, YAO Yumin, TU Jun
    Computer Science    2023, 50 (6A): 220500104-5.   DOI: 10.11896/jsjkx.220500104
    Few-shot learning is proposed to address the problem that, in deep learning, the data set available for model training is small or data annotation is costly. Image classification has always been an important research topic, and annotated data may be insufficient. To address the lack of annotated image data, researchers have put forward many solutions, one of which is to classify few-shot images using graph neural networks. To make better use of graph neural networks in few-shot learning and to address the instability of their convolution operations, a residual graph convolution network is designed to improve the stability of the graph neural network. Based on the residual graph convolution network, a residual graph self-attention mechanism is designed by incorporating self-attention; it mines the relationships between nodes in depth to improve the efficiency of information transmission and the classification accuracy of the model. Tests show that the training efficiency of the improved Res-GNN increases. Its classification accuracy is 1.1% higher than that of the GNN model in the 5way-1shot task and 1.42% higher in the 5way-5shot task. In the 5way-1shot task, the classification accuracy of ResAT-GNN is 1.62% higher than that of the GNN model.
    Overview of Application of Virtual Reality in Sports Simulation:New Developments Since 2003
    JI Qingge, CHEN Haodong, HE Suishen, ZHU Yonglin, ZHU Jiefu, ZHANG Huankai
    Computer Science    2023, 50 (6A): 220500168-10.   DOI: 10.11896/jsjkx.220500168
    Many sports are difficult to carry out due to the limitations of venues, weather, cost and other factors. With the rapid development of computer technology, virtual reality technology is widely used in sports, aiming to break through these limitations. This paper reviews techniques that combine virtual reality and sports simulation from 2003 to the present, and classifies them by sports type into competitive sports simulation, entertainment sports simulation, medical sports simulation and sports-related scene simulation. Examples of sports simulation from recent years show that VR-based sports simulation is no longer limited to competitive sports and is steadily developing towards mass entertainment and interdisciplinary applications. VR-based entertainment sports simulation and medical sports simulation are more mature and can bring the public a VR experience closer to daily life. Although sports-related scene simulation is a relatively new field, its commercial nature means that applications such as virtual advertising and event systems are developing rapidly. Finally, an outlook on the future of virtual reality technology in sports simulation is presented.
    Target Detection Algorithm Based on Compound Scaling Deep Iterative CNN by Regression Converging and Scaling Mixture
    WANG Guogang, WU Yan, LIU Yibo
    Computer Science    2023, 50 (6A): 220500230-9.   DOI: 10.11896/jsjkx.220500230
    A novel target detection algorithm based on a compound scaling deep iterative CNN with regression converging and scaling mixture is proposed to overcome the low robustness, label marginalization and poor convergence of the regression loss function in the EfficientDet algorithm. By using a 2×2 scaling mixture regularization strategy to augment the training samples, the proposed method avoids overfitting and improves the generalization ability of the model. The convergence speed, the positioning accuracy and the CNN regression accuracy are improved because the aspect ratio and the center distance are taken into account in the penalty terms of the CIoU loss function, which predicts the bounding box coordinates and suppresses redundant boxes. The proposed method improves the label fault tolerance rate because a cross-entropy loss with label smoothing is established for classification after generating the label smoothing regularization distribution, which is a weighted sum of the marginal label distribution and the uniform distribution controlled by a smoothing parameter. Experiments are performed on the PASCAL VOC 2007 and 2012 datasets, and the results show that, while the number of network model parameters remains unchanged, the mean average precision of the proposed algorithm reaches 88.31%, which is 3.29% higher than that of the original network (EfficientDet-D2, 84.12%). Compared with YOLOv4, YOLOv3, SSD, Faster R-CNN and Fast R-CNN, the mean average precision increases by 5.2%, 10.71%, 14.01%, 15.11% and 18.30%, respectively, and the number of network model parameters is reduced by 55.94×10^6, 52.91×10^6, 16.09×10^6, 55.18×10^6 and 53.11×10^6, respectively. The algorithm not only improves the detection accuracy and the F1 score, but also takes 0.73 s to detect each test image, which meets the real-time requirements of the detection phase.
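    The label smoothing step described above can be sketched as follows. This is a generic illustration, not the paper's implementation: it assumes the original label distribution is one-hot, and the function names, the smoothing value 0.1, and the example logits are all hypothetical.

```python
import math


def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Weighted sum of a one-hot label and the uniform distribution."""
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if k == true_class else uniform
        for k in range(num_classes)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, probs):
    """Cross entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs))


# Hypothetical 4-class example with true class 1.
target = smooth_labels(true_class=1, num_classes=4, epsilon=0.1)
loss = cross_entropy(target, softmax([0.5, 2.0, -1.0, 0.1]))
```

    Because every class keeps a small share of the target mass, a single mislabelled sample pulls the prediction less strongly towards the wrong class than a hard one-hot target would.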
    Cardiac MRI Image Segmentation Based on Faster R-CNN and U-net
    HAN Junling, LI Bo, KANG Xiaodong, YANG Jingyi, LIU Hanqing, WANG Xiaotian
    Computer Science    2023, 50 (6A): 220600047-9.   DOI: 10.11896/jsjkx.220600047
    To address the loss of segmentation accuracy that existing MRI segmentation neural networks suffer because of the diversity of input image information, an MRI image segmentation method combining Faster R-CNN and U-net is proposed. The public cardiac MRI segmentation challenge datasets ACDC and SCD are selected, cleaned and converted in format, and then fed to the subsequent neural networks. First, Faster R-CNN is applied for target detection to preprocess the original input image and remove redundant background information. Second, U-net segmentation is performed on the preprocessed images. To test whether the performance and accuracy of the segmentation network improve after the introduction of Faster R-CNN, ablation experiments and comparison experiments are conducted. In the ablation experiments, the detection and cropping module is removed from the U-net segmentation network, and the U-net and its improved network are each evaluated in a set of ablation experiments. Experiments show that the mean intersection over union (IoU) and Dice coefficient of the new method are 0.89 and 0.94 on the ACDC dataset, which are 7.3% and 5% higher, respectively. On the SCD dataset, they are 0.96 and 0.98, which are 5% and 3% higher, respectively. Automatic preprocessing and segmentation of MRI images is thus achieved.
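    The two evaluation metrics quoted above can be computed for a pair of binary masks as in the following minimal sketch. The function name, the nested-list mask representation, and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two binary masks given as nested lists of 0/1."""
    pred_flat = [p for row in pred for p in row]
    target_flat = [t for row in target for t in row]

    intersection = sum(1 for p, t in zip(pred_flat, target_flat) if p and t)
    pred_area = sum(1 for p in pred_flat if p)
    target_area = sum(1 for t in target_flat if t)
    union = pred_area + target_area - intersection

    if union == 0:
        # Both masks are empty; treat this as a perfect match (assumption).
        return 1.0, 1.0
    return intersection / union, 2 * intersection / (pred_area + target_area)


# Hypothetical 2x3 masks.
iou, dice = iou_and_dice(
    [[1, 1, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 1, 1]],
)
```

    For a single pair of masks the two metrics are linked by Dice = 2·IoU / (1 + IoU); the reported pairs (0.89, 0.94) and (0.96, 0.98) are roughly consistent with this relation, although averages over many images need not satisfy it exactly.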
    Superpixel Segmentation Iterative Algorithm Based on Ball-k-means Clustering
    LIU Yao, GUAN Lihe
    Computer Science    2023, 50 (6A): 220600114-7.   DOI: 10.11896/jsjkx.220600114
    To address the problem of superpixel segmentation, this paper proposes an iterative superpixel segmentation algorithm based on Ball-k-means clustering to further improve the edge fit of superpixels. Firstly, the superpixels are regarded as five-dimensional hyperspheres, and the image is evenly segmented to obtain the initial superpixels. Secondly, the neighbor superpixels are searched according to the radius and the distance between the centers of adjacent superpixels. Then, using the distances between each superpixel and the centers of its neighbor superpixels, the superpixel is divided into a stable region and multiple annular active regions. Finally, the pixels in each annular active region are assigned to the nearest neighbor superpixel according only to their distances from the centers of some neighbor superpixels, so that the superpixel segmentation is carried out iteratively. To reduce distance calculations and speed up convergence, a judgment theorem for the nearest-neighbor relation between superpixels is given, and an adaptive partition updating strategy is designed for the superpixel class labels of pixels. Experimental comparison and analysis on the BSD500 data set show that the proposed algorithm segments different types of images better, with a higher degree of edge fit, less sensitivity to parameters, and more stable segmentation results.
    Two-stage Method for Restoration of Heritage Images Based on Multi-scale Attention Mechanism
    LIU Haowei, YAO Jingchi, LIU Bo, BI Xiuli, XIAO Bin
    Computer Science    2023, 50 (6A): 220600129-8.   DOI: 10.11896/jsjkx.220600129
    The use of virtual technology is important for the restoration of relics, which are often damaged by improper preservation or physical restoration methods. Existing traditional image restoration techniques and deep learning-based restoration methods are mainly suitable for images with simple structural textures, small damaged areas, or natural images with regular damage, and cannot be directly applied to heritage images. Using landscape painting image restoration as an example, this paper proposes a two-stage method for the restoration of heritage images based on a multi-scale attention mechanism to address the complex structural textures, discreet colouring and small size of existing heritage image datasets. The method first performs coarse restoration of the overall structure and base tones of the image based on a global attention mechanism. On the result of the coarse restoration, it then performs local fine restoration of small structures and fine textures using a local attention mechanism and a residual module, as well as global fine restoration of large structures and textures using a contextual attention mechanism that accurately borrows information from distant regions. Finally, the local and global fine restoration results are fused to achieve the restoration of heritage images. On average, the proposed method improves the peak signal-to-noise ratio by 3.76 dB and the structural similarity by 0.034 over the compared methods. Both the subjective and objective analyses of the experimental results show that the method has advantages in semantic rationality, information accuracy and visual naturalness over existing methods, and has high potential for application in the field of heritage restoration.