Started in January 1974 (Monthly)
Supervised and Sponsored by Chongqing Southwest Information Co., Ltd.
ISSN 1002-137X
CN 50-1075/TP
CODEN JKIEBK
    Contents of the special issue "Computer Vision: Theory and Application"
    Computer Science    2022, 49 (2): 1-3.   DOI: 10.11896/jsjkx.qy20220201
    Micro-expression Recognition Method Combining Feature Fusion and Attention Mechanism
    LI Xing-ran, ZHANG Li-yan, YAO Shu-jing
    Computer Science    2022, 49 (2): 4-11.   DOI: 10.11896/jsjkx.210900028
    Micro-expressions are uncontrollable facial muscle movements that occur when people try to hide or suppress their true emotions. Their short duration, small motion range, and the difficulty of concealing or restraining them limit the accuracy with which such emotional expressions can be recognized. To cope with these challenges, this paper proposes a novel micro-expression recognition method that combines feature fusion with an attention mechanism: it considers both optical-flow features and face features, and further adds attention to improve recognition performance. The method proceeds in three steps: 1) extract the optical flow and optical strain between the onset and apex frames of each micro-expression segment, and feed the vertical optical flow, horizontal optical flow, and optical strain into a shallow 3D CNN to extract optical-flow features; 2) with the deep convolutional neural network ResNet-10 as the backbone, add a convolutional attention module to extract face features; 3) combine the two feature vectors for classification. Experimental results show that the proposed method outperforms both traditional methods and existing deep learning methods in micro-expression recognition.
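The optical strain used in step 1 can be sketched in a few lines of numpy (a minimal illustration, not the authors' implementation; it assumes the horizontal and vertical flow fields between the onset and apex frames have already been estimated):

```python
import numpy as np

def optical_strain(u, v):
    """Compute the optical-strain magnitude from a dense flow field.

    u, v: 2-D arrays holding the horizontal and vertical flow components.
    The strain tensor is e = 0.5 * (grad(F) + grad(F)^T); its magnitude
    sqrt(e_xx^2 + e_yy^2 + 2 * e_xy^2) highlights subtle facial deformation.
    """
    du_dy, du_dx = np.gradient(u)   # np.gradient returns (d/axis0, d/axis1)
    dv_dy, dv_dx = np.gradient(v)
    e_xx = du_dx
    e_yy = dv_dy
    e_xy = 0.5 * (du_dy + dv_dx)
    return np.sqrt(e_xx ** 2 + e_yy ** 2 + 2.0 * e_xy ** 2)

# A rigid translation produces zero strain everywhere.
assert np.allclose(optical_strain(np.ones((8, 8)), np.zeros((8, 8))), 0.0)
```

A pure shear, by contrast, yields a constant non-zero strain, which is why the strain map emphasizes local deformation rather than head motion.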
    Survey on Generalization Methods of Face Forgery Detection
    DONG Lin, HUANG Li-qing, YE Feng, HUANG Tian-qiang, WENG Bin, XU Chao
    Computer Science    2022, 49 (2): 12-30.   DOI: 10.11896/jsjkx.210900146
    The rapid development of deep learning provides powerful tools for deepfake research. Forged videos and images are increasingly difficult for human eyes to distinguish from real ones, and such content on the internet can have a huge negative impact on social life, e.g. financial fraud, the spread of fake news, and personal bullying. At present, deep-learning-based fake face detection has reached high accuracy on several benchmark databases such as FaceForensics++, but cross-database detection accuracy is much lower than accuracy on the source database; that is, many detection methods generalize poorly to different or unknown types of forgeries, which motivates more scholars to focus on generalization methods. Research on the generalization of face forgery detection centers on deep-learning-based methods. This paper first discusses and compares the commonly used forgery-detection datasets, including real-world datasets and multi-task datasets. It then classifies and summarizes generalization methods for video and image tampering detection from three aspects: data, features, and learning strategies. Data refers to data augmentation in deepfake detection; features include single-domain features, such as frequency-domain features, and multi-domain features; learning strategies cover transfer learning, multi-task learning, meta-learning, and incremental learning. The advantages and shortcomings of the three categories are analyzed. Finally, future development directions and challenges of generalization in face tampering detection are discussed.
    Generation Model of Gender-forged Face Image Based on Improved CycleGAN
    SHI Da, LU Tian-liang, DU Yan-hui, ZHANG Jian-ling, BAO Yu-xuan
    Computer Science    2022, 49 (2): 31-39.   DOI: 10.11896/jsjkx.210600012
    Deepfake techniques can combine human voices, faces, and body movements into fake content, switch gender, change age, and so on. Gender-forged face image generation based on generative adversarial image-translation networks suffers from problems such as unwanted changes in irrelevant image regions and insufficient facial detail in the generated images. To solve these problems, a generation model of gender-forged face images based on an improved CycleGAN is proposed. First, the generator is optimized with an attention mechanism and adaptive residual blocks to extract richer facial features. Then, to improve the discriminator, the loss function is modified following the idea of relativistic loss. Finally, a model training strategy based on age constraints is proposed to reduce the impact of age changes on the generated images. Experiments on the CelebA and IMDB-WIKI datasets show that, compared with the original CycleGAN and the UGATIT method, the proposed method generates more realistic gender-forged face images: the average content accuracy of fake male and fake female images is 82.65% and 78.83%, and the average FID score is 32.14 and 34.50, respectively.
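The "idea of relativistic loss" applied to the discriminator is commonly realized as a relativistic average loss; the abstract does not give the exact formulation, so the standard RaGAN-style form is sketched here in numpy as an assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relativistic_avg_d_loss(d_real, d_fake):
    """Relativistic average discriminator loss (RaGAN-style).

    Instead of asking "is this sample real?", the discriminator is trained
    to judge whether a real sample looks *more* realistic than the average
    fake sample (and vice versa). d_real, d_fake: raw discriminator logits.
    """
    eps = 1e-12  # avoid log(0)
    real_rel = sigmoid(d_real - d_fake.mean())
    fake_rel = sigmoid(d_fake - d_real.mean())
    return -np.mean(np.log(real_rel + eps)) - np.mean(np.log(1.0 - fake_rel + eps))
```

When real and fake logits are well separated, the loss approaches zero; when the discriminator cannot tell them apart, it sits at 2·log 2.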
    Review of 3D Face Reconstruction Based on Single Image
    HE Jia-yu, HUANG Hong-bo, ZHANG Hong-yan, SUN Mu-ye, LIU Ya-hui, ZHOU Zhe-hai
    Computer Science    2022, 49 (2): 40-50.   DOI: 10.11896/jsjkx.210500215
    In computer vision, 3D face reconstruction is a valuable research direction. High-quality 3D face reconstruction finds applications in face recognition, anti-spoofing, animation, and medical cosmetology. Although great progress has been made in single-image 3D face reconstruction over the last two decades, results obtained with traditional algorithms still struggle with facial expression, occlusion, and ambient light, and suffer from poor reconstruction accuracy and robustness. With the rapid development of deep learning in 3D face reconstruction, various methods superior to traditional reconstruction algorithms have emerged. This paper first focuses on deep-learning-based reconstruction algorithms, dividing them into four categories according to network architecture and describing the most popular methods in detail. Commonly used 3D face datasets are then introduced, and the performance of representative methods is evaluated. Finally, conclusions and prospects for single-image 3D face reconstruction are given.
    Research Progress of Face Editing Based on Deep Generative Model
    TANG Yu-xiao, WANG Bin-jun
    Computer Science    2022, 49 (2): 51-61.   DOI: 10.11896/jsjkx.210400108
    Face editing is widely used in public security, face beautification, and other fields. Traditional statistical methods and prototype-based methods were long the main means of face editing, but they face problems such as difficult operation and high computational cost. In recent years, the development of deep learning, and especially the emergence of generative networks, has provided a brand-new approach to face editing. Face editing with deep generative models offers fast speed and strong model generalization. To summarize and review recent theory and research on using deep generative models for face editing, we first introduce the network frameworks and principles adopted by deep-generative-model-based face editing. We then describe the methods in detail, grouping them into three aspects: image translation, introduction of conditional information into the network, and manipulation of the latent space. Finally, we summarize the challenges facing this technology, including identity consistency, attribute decoupling, and attribute-editing accuracy, and point out the issues that urgently need to be resolved in the future.
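Manipulation of the latent space, the third category above, typically amounts to shifting a latent code of a pretrained generator along a learned attribute direction; a minimal hypothetical sketch (the direction vector and strength are illustrative, not taken from any surveyed method):

```python
import numpy as np

def edit_latent(z, direction, alpha):
    """Move a latent code along a semantic attribute direction.

    z: latent code of a pretrained generator; direction: a learned
    attribute direction (e.g. "smile" or "age"); alpha: edit strength.
    The direction is unit-normalized so alpha is comparable across
    attributes: z_edited = z + alpha * direction / ||direction||.
    """
    d = direction / np.linalg.norm(direction)
    return z + alpha * d
```

Feeding the edited code back through the generator then produces the face with the attribute changed, ideally leaving identity intact.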
    Human Skeleton Action Recognition Algorithm Based on Dynamic Topological Graph
    XIE Yu, YANG Rui-ling, LIU Gong-xu, LI De-yu, WANG Wen-jian
    Computer Science    2022, 49 (2): 62-68.   DOI: 10.11896/jsjkx.210900059
    Traditional human skeleton action recognition algorithms manually construct topological graphs to model the action sequence contained in multiple video frames, and learn from each video frame to reflect data changes, which may lead to high computational cost, low generalization performance, and catastrophic forgetting. To solve these problems, a human skeleton action recognition algorithm based on dynamic topological graphs is proposed, in which the skeleton topological graph is constructed dynamically through continual learning. Specifically, skeleton sequence data with multi-relationship characteristics are recoded into relationship triplets, and feature embeddings are learned in a decoupled manner via a long short-term memory network. When new skeleton relationship triplets arrive, the skeleton topological graph is dynamically constructed through a partial-update mechanism and then fed to a skeleton action recognition algorithm based on spatio-temporal graph convolutional networks. Experimental results demonstrate that the proposed algorithm achieves 40%, 85%, and 90% recognition accuracy on three benchmark datasets, Kinetics-Skeleton, NTU-RGB+D (X-Sub), and NTU-RGB+D (X-View), respectively, improving the accuracy of human skeleton action recognition.
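The recoding of skeleton edges into relationship triplets and the partial-update mechanism can be illustrated as follows (a simplified sketch; joint indices and the adjacency-matrix representation are illustrative assumptions, not the paper's data format):

```python
import numpy as np

def triplets_to_adjacency(triplets, num_joints, adj=None):
    """Build or partially update a skeleton topological graph.

    Each relationship triplet (head_joint, relation, tail_joint) encodes one
    edge of the skeleton. Passing an existing matrix as `adj` updates only
    the entries named by the new triplets (a partial-update mechanism)
    instead of rebuilding the whole graph from scratch.
    """
    if adj is None:
        adj = np.zeros((num_joints, num_joints))
    for head, _relation, tail in triplets:
        adj[head, tail] = 1.0
        adj[tail, head] = 1.0  # skeleton edges are undirected
    return adj
```

The resulting adjacency matrix is what a spatio-temporal graph convolution consumes to aggregate each joint's neighborhood.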
    Predicting Tumor-related Indicators Based on Deep Learning and H&E Stained Pathological Images: A Survey
    YAN Rui, LIANG Zhi-yong, LI Jin-tao, REN Fei
    Computer Science    2022, 49 (2): 69-82.   DOI: 10.11896/jsjkx.210900140
    Accurate tumor diagnosis is essential for customizing treatment plans and predicting prognosis. Pathological diagnosis is considered the "gold standard" of tumor diagnosis, but pathology still faces great challenges. The shortage of pathologists, especially in underdeveloped areas and small hospitals, has left pathologists overloaded for a long time; meanwhile, pathological diagnosis relies heavily on pathologists' professional knowledge and diagnostic experience, and this subjectivity leads to a surge in diagnostic inconsistency. Breakthroughs in whole slide image (WSI) technology and deep learning methods provide new opportunities for computer-aided diagnosis and prognosis prediction. Histopathological sections stained with hematoxylin-eosin (H&E) show cell morphology and tissue structure very well, and are simple to make, inexpensive, and widely used. What can be predicted from pathological images alone? After deep learning was applied to pathological images, this question received a new answer. In this paper, we first summarize the overall research framework of predicting tumor-related indicators from deep learning and pathological images. In order of development, the framework can be summarized into three progressive stages: WSI prediction based on a manually selected single patch, WSI prediction based on majority voting, and generally applicable WSI prediction. Secondly, four supervised or weakly supervised learning methods commonly used in WSI prediction are briefly introduced: convolutional neural networks (CNN), recurrent neural networks (RNN), graph neural networks (GNN), and multiple instance learning (MIL). We then review the deep learning methods used in this field, the tumor-related indicators that can be predicted from pathological images, and the latest research progress, covering two aspects: predicting indicators that pathologists can read and recognize (tumor classification, tumor grading, tumor-area recognition), and predicting indicators that pathologists cannot (genetic-variation prediction, molecular-subtype prediction, treatment-effect evaluation, survival-time prediction). Finally, general problems in this field are summarized and possible future directions are suggested.
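The second stage, WSI prediction based on majority voting, can be sketched as follows (a generic illustration of the aggregation idea, not any specific paper's pipeline):

```python
import numpy as np

def slide_label_by_majority_vote(patch_predictions):
    """Aggregate patch-level class predictions into one slide-level label.

    patch_predictions: 1-D integer sequence, one predicted class per patch
    cropped from the whole slide image (WSI). The slide label is the class
    predicted for the largest number of patches (ties resolve to the
    smallest class label).
    """
    classes, counts = np.unique(np.asarray(patch_predictions), return_counts=True)
    return int(classes[np.argmax(counts)])
```

A per-patch classifier (e.g. a CNN) produces the patch predictions; the vote then turns thousands of patch decisions into a single slide-level call.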
    Multi-target Category Adversarial Example Generating Algorithm Based on GAN
    LI Jian, GUO Yan-ming, YU Tian-yuan, WU Yu-lun, WANG Xiang-han, LAO Song-yang
    Computer Science    2022, 49 (2): 83-91.   DOI: 10.11896/jsjkx.210800130
    Although deep neural networks perform well in many areas, research shows that they are vulnerable to attacks from adversarial examples. Many algorithms exist for attacking neural networks, but most are slow, so rapidly generating adversarial examples has gradually become a research focus in this area. AdvGAN uses one network to attack another and can generate adversarial examples far faster than other methods; however, for a targeted attack, AdvGAN must train one network per target class, so its attack efficiency is low. In this article, we propose a multi-target attack network (MTA) based on the generative adversarial network, which can carry out multi-target attacks and quickly generate adversarial examples after training only once. Experiments show that MTA achieves a higher targeted-attack success rate than AdvGAN on the CIFAR10 and MNIST datasets. We also conduct adversarial-example transfer experiments and attack experiments under defense; the results show that the adversarial examples generated by MTA transfer better than those of other multi-target attack algorithms, and that MTA also achieves a higher attack success rate under defense.
    Survey of Research Progress on Adversarial Examples in Images
    CHEN Meng-xuan, ZHANG Zhen-yong, JI Shou-ling, WEI Gui-yi, SHAO Jun
    Computer Science    2022, 49 (2): 92-106.   DOI: 10.11896/jsjkx.210800087
    With the development of deep learning theory, deep neural networks have made a series of breakthroughs and are widely applied in various fields, with image applications such as image classification among the most popular. However, research suggests that deep neural networks carry many security risks, especially the threat of adversarial examples, which seriously hinders the application of image classification. To address this challenge, many recent research efforts have been dedicated to adversarial examples in images, producing a large number of results. This paper first introduces the concepts and terminology of adversarial examples in images, then reviews adversarial attack methods and defense methods based on existing research, classifying them by the attacker's capability and by the design rationale of the defenses, and analyzing the characteristics of and connections between the different categories. It then briefly describes adversarial attacks in the physical world. Finally, it discusses the challenges of adversarial examples in images and potential future research directions.
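As a concrete example of the attacks such surveys cover, the classic fast gradient sign method (FGSM) perturbs an input one step along the sign of the loss gradient; the toy model and gradient below are illustrative assumptions, not from the survey:

```python
import numpy as np

def fgsm_perturb(x, grad_wrt_x, epsilon):
    """Fast Gradient Sign Method, one of the earliest adversarial attacks.

    Moves the input one epsilon-sized step in the direction that increases
    the loss fastest (the sign of the input gradient):
        x_adv = x + epsilon * sign(dL/dx)
    """
    return x + epsilon * np.sign(grad_wrt_x)

# Toy linear model: for logit = w . x, the loss gradient w.r.t. x is
# proportional to w, so w stands in for dL/dx here.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, 0.1, -0.3])
x_adv = fgsm_perturb(x, grad_wrt_x=w, epsilon=0.1)
```

The perturbation is bounded by epsilon in every pixel, which is why FGSM adversarial examples look unchanged to the human eye.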
    Text-to-Image Generation Technology Based on Transformer Cross Attention
    TAN Xin-yue, HE Xiao-hai, WANG Zheng-yong, LUO Xiao-dong, QING Lin-bo
    Computer Science    2022, 49 (2): 107-115.   DOI: 10.11896/jsjkx.210600085
    In recent years, research on text-to-image generation based on generative adversarial networks (GAN) has continued to grow in popularity and has made some progress. The key to text-to-image generation is to build a bridge between textual and visual information, prompting the model to generate realistic images consistent with the text description. The mainstream approach is to encode the input text description with a pre-trained text encoder, but such methods do not consider semantic alignment with the corresponding image: the input text is encoded independently, ignoring the semantic gap between the language space and the image space. To address this problem, this paper proposes a generative adversarial network based on a cross-attention encoder (CAE-GAN). The network uses a cross-attention encoder to translate and align textual information with visual information and captures the cross-modal mapping between text and image, so as to improve the fidelity of the generated images and their match to the input text description. Experimental results show that, compared with the DM-GAN model, the inception score (IS) of CAE-GAN increases by 2.53% and 1.54% on the CUB and COCO datasets, respectively, and the Fréchet inception distance (FID) decreases by 15.10% and 5.54%, indicating that the images generated by CAE-GAN are finer in detail and higher in quality.
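The cross-attention at the heart of such an encoder is standard scaled dot-product attention with queries from one modality and keys/values from the other; a minimal numpy sketch (shapes are illustrative):

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention between two modalities.

    queries: (Nq, d) features from one modality (e.g. image regions);
    keys, values: (Nk, d), (Nk, dv) from the other (e.g. word embeddings).
    Each query position gathers the text information most relevant to it.
    """
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ values
```

Because the attention weights couple the two modalities during encoding, the text representation is aligned with image content rather than computed in isolation.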
    Study on Super-resolution Reconstruction Algorithm of Remote Sensing Images in Natural Scene
    CHEN Gui-qiang, HE Jun
    Computer Science    2022, 49 (2): 116-122.   DOI: 10.11896/jsjkx.210700095
    Because the field of remote sensing image super-resolution lacks paired datasets, current methods obtain low-resolution images by bicubic interpolation, an overly idealized degradation model that yields unsatisfactory reconstructions on real low-resolution remote sensing images. This paper proposes a super-resolution reconstruction algorithm for real remote sensing images. For datasets lacking paired images, we build a more reasonable degradation model in which priors on degradations in the imaging process (blur, noise, down-sampling, etc.) are applied in randomly shuffled order to generate realistic low-resolution images for training, simulating how real low-resolution remote sensing images arise. We also improve a reconstruction algorithm based on generative adversarial networks (GAN), introducing an attention mechanism to enhance texture details. On the UC Merced dataset, the method improves PSNR/SSIM by 1.4071 dB/0.0672 and 0.8211 dB/0.0235 over ESRGAN and RCAN, respectively; on the Alsat2B dataset it improves by 1.7584 dB/0.0485 over the baseline, demonstrating the effectiveness of the degradation model and the reconstruction architecture.
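The randomly shuffled degradation pipeline can be sketched as follows (kernel size, noise level, and 2x scale factor are illustrative assumptions, not the paper's exact settings):

```python
import numpy as np

def degrade(image, rng):
    """Produce a low-resolution image from a high-resolution one.

    Applies blur, noise, and 2x down-sampling in a randomly shuffled order,
    mimicking the idea that a randomly ordered degradation pipeline is
    closer to real imaging than plain bicubic down-sampling.
    """
    def blur(img):
        # 3x3 box blur via edge padding + neighbourhood mean
        p = np.pad(img, 1, mode="edge")
        h, w = img.shape
        return sum(p[i:i + h, j:j + w]
                   for i in range(3) for j in range(3)) / 9.0

    def noise(img):
        return img + rng.normal(0.0, 0.01, img.shape)

    def down(img):
        return img[::2, ::2]

    ops = [blur, noise, down]
    for idx in rng.permutation(len(ops)):  # random degradation order
        image = ops[idx](image)
    return image
```

Pairing each high-resolution image with such a synthesized low-resolution version yields the training pairs the real-world setting does not provide.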
    Survey on Video Super-resolution Based on Deep Learning
    LENG Jia-xu, WANG Jia, MO Meng-jing-cheng, CHEN Tai-yue, GAO Xin-bo
    Computer Science    2022, 49 (2): 123-133.   DOI: 10.11896/jsjkx.211000007
    Video super-resolution (VSR) aims to reconstruct a high-resolution video from its low-resolution counterpart. Recently, VSR has made great progress driven by deep learning. To further promote VSR, this survey comprehensively summarizes the field and provides a taxonomy, analysis, and comparison of existing algorithms. First, since the framework matters greatly for VSR, we group VSR approaches into two categories, iterative-network-based and recurrent-network-based, and further compare and analyze the advantages and disadvantages of the different networks. Second, we comprehensively introduce VSR datasets, summarize existing algorithms, and compare them on several benchmark datasets. Finally, the key challenges and applications of VSR methods are analyzed and prospected.
    Ray Tracing Checkerboard Rendering in Molecular Visualization
    LI Jia-zhen, JI Qing-ge, ZHU Yong-lin
    Computer Science    2022, 49 (2): 134-141.   DOI: 10.11896/jsjkx.210900126
    Rendering images with advanced ray tracing in molecular visualization can greatly enhance researchers' observation and perception of molecular structure, but existing ray tracing methods suffer from insufficient real-time performance and poor rendering quality. This paper proposes a ray tracing checkerboard rendering method that optimizes ray tracing with checkerboard rendering technology. The method has four phases: reprojection, rendering, reconstruction, and hole filling. Within these phases, several improvements to checkerboard rendering are proposed, including forward reprojection, a molecular shading bounding box, dynamic image reconstruction, and an eight-neighbor-interpolation hole-filling strategy. Experiments are carried out on six molecules with different atom counts. Comparisons with current advanced methods on supercomputers show that the real-time frame rate of the proposed method is significantly higher than that of the CPU-based Tachyon-OSPRay method, by a factor of 1.58 to 1.86, and that the proposed method also achieves better frame rates than the GPU-accelerated Tachyon-OptiX method when the number of atoms is relatively small.
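The eight-neighbor-interpolation hole-filling strategy can be sketched as follows (a simplified single-channel version; the paper's actual implementation details are not reproduced here):

```python
import numpy as np

def fill_holes_eight_neighbor(frame, hole_mask):
    """Fill rendering holes with the mean of their valid eight-neighbors.

    Checkerboard rendering ray-traces only half of the pixels each frame;
    pixels that could not be reprojected from the previous frame are holes.
    frame: (H, W) intensities; hole_mask: boolean (H, W), True at holes.
    """
    out = frame.copy()
    h, w = frame.shape
    for y, x in zip(*np.nonzero(hole_mask)):
        acc, n = 0.0, 0
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dy == 0 and dx == 0:
                    continue
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not hole_mask[ny, nx]:
                    acc += frame[ny, nx]
                    n += 1
        if n:  # leave the pixel unchanged if every neighbor is also a hole
            out[y, x] = acc / n
    return out
```

In the full pipeline this runs after reprojection and reconstruction, so only the few pixels with no valid history need interpolation.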
    Video Anomaly Detection Based on Implicit View Transformation
    LENG Jia-xu, TAN Ming-pi, HU Bo, GAO Xin-bo
    Computer Science    2022, 49 (2): 142-148.   DOI: 10.11896/jsjkx.210900266
    Existing deep-learning-based video anomaly detection methods all detect anomalies in video clips from a single view, ignoring the importance of view information in video anomaly detection. Under a single view, when anomalies are occluded or not obvious, the performance of existing algorithms drops. To avoid this problem, this paper first introduces the concept of view transformation into video anomaly detection, improving the robustness of the model by judging abnormality from multiple views. However, because datasets lack multi-view supervision, explicit view transformation is difficult to achieve. Therefore, to realize the idea of view transformation, this paper proposes a video anomaly detection method based on implicit view transformation: the optical flow between frames is used to warp the implicit view information of the previous frame to the target frame, realizing an implicit view transformation from the target frame to the previous frame, and secondary anomaly detection is then performed on the view-transformed target frame. Experimental results show that the proposed method responds more sensitively to abnormal data and fits normal data more robustly, with AUC values of 97.0% and 88.9% on the UCSD Ped2 and CUHK Avenue datasets, respectively.
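The flow-based warping of the previous frame to the target frame can be sketched as follows (nearest-neighbour sampling for a dependency-free illustration; real implementations typically use bilinear sampling):

```python
import numpy as np

def warp_with_flow(prev, flow):
    """Backward-warp the previous frame to the target frame using flow.

    prev: (H, W) frame; flow: (H, W, 2) per-pixel (dy, dx) displacements
    mapping each target-frame coordinate back into the previous frame.
    Source coordinates are rounded and clipped to the image bounds.
    """
    h, w = prev.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, w - 1)
    return prev[src_y, src_x]
```

Applying the same warp to feature maps instead of raw pixels is what carries the previous frame's implicit view information onto the target frame.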
    Graph Convolutional Skeleton-based Action Recognition Method for Intelligent Behavior Analysis
    MIAO Qi-guang, XIN Wen-tian, LIU Ru-yi, XIE Kun, WANG Quan, YANG Zong-kai
    Computer Science    2022, 49 (2): 156-161.   DOI: 10.11896/jsjkx.220100061
    Smart education is a new education model built on modern information technology, and intelligent behavior analysis is its core component. In complex classroom scenarios, traditional action recognition algorithms fall seriously short in accuracy and timeliness. A graph convolutional method based on separation and attention mechanisms (DSA-GCN) is proposed to solve these problems. First, to address traditional algorithms' inherent weakness in aggregating information in the channel domain, multidimensional channel mapping is performed by point-wise convolution, combining the ability of spatio-temporal graph convolution (ST-GC) to preserve the original spatio-temporal information with the ability of depthwise-separable convolution to separate spatial and channel feature learning, enhancing the model's feature learning and abstract expressivity. Second, a multi-dimensional fused attention mechanism is used: self-attention and channel attention enhance the model's dynamic sensitivity in the spatial convolution domain, and fused temporal and channel attention enhances key-frame discrimination in the temporal convolution domain. Experimental results show that DSA-GCN achieves better accuracy and efficiency on the NTU RGB+D and N-UCLA datasets, demonstrating its improved ability to aggregate channel information.
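The depthwise-separable split of a graph convolution, with a point-wise (1x1) convolution mixing channels per node, can be sketched as follows (a simplified single-frame version; the shapes and the exact depthwise form are illustrative assumptions):

```python
import numpy as np

def separable_graph_conv(x, adj, w_depth, w_point):
    """One separable graph-convolution step on skeleton features.

    x: (N, C_in) per-joint features; adj: (N, N) normalized adjacency.
    Depthwise step: aggregate each joint's neighbours and scale every
    channel independently ((adj @ x) * w_depth, with w_depth of shape
    (C_in,)). Point-wise step: mix channels per node with a (C_in, C_out)
    matrix, i.e. the 1x1 convolution that performs channel mapping.
    """
    depthwise = (adj @ x) * w_depth  # spatial aggregation, per-channel
    return depthwise @ w_point       # per-node channel mixing
```

Splitting spatial aggregation from channel mixing is what lets the channel-domain mapping be strengthened without inflating the spatial kernel.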