Computer Science

Advances in End-to-End Optimized Image Compression Technologies

LIU Dong, WANG Ye-fei, LIN Jian-ping, MA Hai-chuan, YANG Run-yu

Computer Science 2021, 48 (3): 1-8. DOI: 10.11896/jsjkx.201100134

Image compression applies data compression technologies to digital images, aiming to reduce redundancy in image data so that it can be stored and transmitted in a more efficient format. In traditional image compression methods, compression is divided into several steps, such as prediction, transform, quantization and entropy coding, and each step is optimized separately by a manually designed algorithm. In recent years, end-to-end image compression methods based on deep neural networks have achieved fruitful results. Unlike the traditional methods, end-to-end image compression can be optimized jointly, which often yields higher compression efficiency. This paper introduces end-to-end image compression methods and network structures, and describes their key technologies, including quantization, probability modeling and entropy coding, and encoder-side bit allocation. It then introduces research on extended applications of end-to-end image compression, including scalable coding, variable bit rate compression, and compression oriented to visual perception and machine perception. Finally, the compression efficiency of end-to-end image compression is compared with that of the traditional methods. Experimental results show that the compression efficiency of the state-of-the-art end-to-end image compression method is much higher than that of traditional image coding methods including JPEG, JPEG2000 and HEVC intra. Compared with the newest coding standard, VVC intra, the end-to-end image compression method can save up to 48.40% of the coding rate while maintaining the same MS-SSIM.

Research Progress on Deep Learning-based Image Deblurring

PAN Jin-shan

Computer Science 2021, 48 (3): 9-13. DOI: 10.11896/jsjkx.201200043

With the development of portable and smart digital imaging devices, capturing photos has become more convenient and flexible. Digital images play an important role in video surveillance, medical diagnosis, space exploration, and so on. However, captured images usually contain significant blur and noise due to the limited quality of camera sensors, the skill of photographers, and the imaging environment. Restoring clear images from blurry ones so that they can support subsequent intelligent analysis tasks is important but challenging. Image deblurring is a classical ill-posed problem. Representative methods for this problem include statistical prior-based methods and data-driven methods. However, conventional statistical prior-based methods have limited ability to model the inherent properties of clear images. Data-driven methods, especially deep learning methods, provide an effective way to solve image deblurring. This paper focuses on deep learning-based image deblurring methods. It first introduces the research progress on the image deblurring problem, then analyzes its challenges, and finally discusses its research trends.

Survey on Image Inpainting Research Progress

ZHAO Lu-lu, SHEN Ling, HONG Ri-chang

Computer Science 2021, 48 (3): 14-26. DOI: 10.11896/jsjkx.210100048

Image inpainting is a challenging research topic in the field of computer vision. In recent years, the development of deep learning has significantly improved the performance of image inpainting, so that this traditional subject is once again attracting extensive attention from scholars. This paper reviews the key technologies of image inpainting research. Given the important role and far-reaching impact of deep learning in solving “large-area missing image inpainting”, this paper first briefly introduces traditional image inpainting methods and then focuses on inpainting models based on deep learning, covering model classification, comparison of advantages and disadvantages, scope of application, and performance comparison on commonly used datasets. Finally, potential research directions and development trends of image inpainting are analyzed.

Adversarial Attacks and Defenses on Multimedia Models: A Survey

CHEN Kai, WEI Zhi-peng, CHEN Jing-jing, JIANG Yu-gang

Computer Science 2021, 48 (3): 27-39. DOI: 10.11896/jsjkx.210100079

In recent years, with the rapid development and wide application of deep learning, artificial intelligence has been profoundly changing all aspects of social life. However, artificial intelligence models are also vulnerable to well-designed “adversarial examples”. By adding subtle perturbations that are imperceptible to humans to clean image or video samples, it is possible to generate adversarial examples that deceive the model. This leads the multimedia model to make wrong decisions during inference and poses a serious security threat to the practical application and deployment of multimedia models. In view of this, adversarial example generation and defense methods for multimedia models have attracted widespread attention from both academia and industry. This paper first introduces the basic principles and relevant background knowledge of adversarial example generation and defense. Then, it reviews recent progress on both adversarial attacks and defenses on multimedia models. Finally, it summarizes the current challenges as well as future directions for adversarial attacks and defenses.

Advances on Visual Object Tracking in Past Decade

ZHANG Kai-hua, FAN Jia-qing, LIU Qing-shan

Computer Science 2021, 48 (3): 40-49. DOI: 10.11896/jsjkx.201100186

Visual object tracking is a task in which the target region in the first frame of a video sequence is given, and the target region is then automatically matched in subsequent frames. Owing to complex factors such as scene occlusion, illumination change and object deformation, the appearance of the target and the scene can change dramatically, which makes the tracking task extremely challenging. In the past decade, with the extensive application of deep learning in computer vision, the field of object tracking has also developed rapidly, resulting in a series of excellent algorithms. In view of this rapid development, this paper aims to provide a comprehensive review of visual object tracking research, covering improvements to the basic tracking framework, target representation, spatial context, temporal context, and datasets and evaluation metrics. This paper also analyzes the advantages and disadvantages of these methods and puts forward possible future research trends.

Survey on Video-based Face Recognition

BAI Zi-yi, MAO Yi-rong, WANG Rui-ping

Computer Science 2021, 48 (3): 50-59. DOI: 10.11896/jsjkx.210100210

Face recognition is a key technology in the field of biometrics and has received wide attention from researchers in the past decades. The video-based face recognition task refers specifically to extracting the key information of human faces from a video to identify a person. Compared with the image-based face recognition task, the changing patterns of faces in videos are much more diverse, and there are also great differences among the frames of a video. Current research focuses on how to extract the key features of faces from lengthy videos. This paper first introduces the research value and challenges of video-based face recognition, and then traces the development of current research work. Based on how the video is modeled, traditional image set based methods are divided into four categories: linear subspace modeling, affine subspace modeling, nonlinear manifold modeling and statistical modeling. In addition, methods based on image fusion in the context of deep learning are introduced. This paper also briefly reviews existing datasets for video-based face recognition and the commonly used performance metrics. Finally, grayscale features and deep features are used to evaluate representative works on the YTC dataset and the IJB-A dataset. Experimental results show that a deep neural network can extract robust features from each frame after being trained on large-scale data, which greatly improves the performance of video-based face recognition. Moreover, effective video modeling can help to identify the underlying changing patterns of human faces. As a result, more discriminative information can be found from the large number of samples contained in a video sequence, and the interference of noisy samples can be eliminated, which suggests that video-based face recognition is well suited to a wide range of practical application scenarios.

Review of Sign Language Recognition, Translation and Generation

GUO Dan, TANG Shen-geng, HONG Ri-chang, WANG Meng

Computer Science 2021, 48 (3): 60-70. DOI: 10.11896/jsjkx.210100227

Sign language research is a typical cross-disciplinary research topic, involving computer vision, natural language processing, cross-media computing and human-computer interaction. Sign language research mainly includes isolated sign language recognition, continuous sign language translation and sign language video generation. Sign language recognition and translation aim to convert sign language videos into textual words or sentences, while sign language generation synthesizes sign videos from spoken or textual sentences. In other words, sign language translation and generation are inverse processes. This paper reviews the latest progress of sign language research, introduces its background and challenges, and reviews typical methods and cutting-edge research on sign language recognition, translation and generation tasks. In light of the problems in current methods, future research directions for sign language are discussed.

Survey of Cross-media Question Answering and Reasoning Based on Vision and Language

WU A-ming, JIANG Pin, HAN Ya-hong

Computer Science 2021, 48 (3): 71-78. DOI: 10.11896/jsjkx.201100176

Cross-media question answering and reasoning based on vision and language is one of the research hotspots of artificial intelligence. It aims to return a correct answer based on an understanding of the given visual content and related questions. With the rapid development of deep learning and its wide application in computer vision and natural language processing, cross-media question answering and reasoning based on vision and language has also developed rapidly. This paper systematically surveys current research on cross-media question answering and reasoning based on vision and language, and specifically introduces the research progress of image-based visual question answering and reasoning, video-based visual question answering and reasoning, and visual commonsense reasoning. In particular, image-based visual question answering and reasoning is subdivided into three categories, i.e., multi-modal fusion, attention mechanism, and reasoning based methods. Meanwhile, visual commonsense reasoning is subdivided into reasoning based and pre-training based methods. Moreover, this paper summarizes the commonly used datasets for question answering and reasoning, as well as the experimental results of representative methods. Finally, this paper looks ahead to future development directions of cross-media question answering and reasoning based on vision and language.

Overview of Research on Cross-media Analysis and Reasoning Technology

WANG Shu-hui, YAN Xu, HUANG Qing-ming

Computer Science 2021, 48 (3): 79-86. DOI: 10.11896/jsjkx.210200086

Cross-media data presents complex correlation characteristics across modalities and data sources. Cross-media analysis and reasoning technology is aimed at multimodal information understanding and interaction tasks. By constructing cross-modal and cross-platform semantic transformation mechanisms, together with further question-and-answer interactions, it continually approaches complex cognitive goals and models the high-level logical reasoning process over cross-modal information, ultimately realizing multimodal artificial intelligence. This paper summarizes the research background and development history of cross-media analysis and reasoning technology, as well as the key technologies of cross-modal tasks involving vision and language. Based on existing research, this paper analyzes the open problems in the field of multimedia analysis and finally discusses future development trends.

Survey on Visual Question Answering and Dialogue

NIU Yu-lei, ZHANG Han-wang

Computer Science 2021, 48 (3): 87-96. DOI: 10.11896/jsjkx.201200174

Visual question answering and dialogue are important research tasks in artificial intelligence and representative problems at the intersection of computer vision and natural language processing. These tasks require the machine to answer single-round or multi-round questions based on the specified visual content. They demand abilities of perception, cognition and reasoning from the machine, and have application prospects in cross-modal human-computer interaction. This paper reviews recent research progress on visual question answering and dialogue, and summarizes datasets, algorithms, challenges, and open problems. Finally, this paper discusses future research trends in visual question answering and dialogue.

Survey of Multimedia Social Events Analysis

QIAN Sheng-sheng, ZHANG Tian-zhu, XU Chang-sheng

Computer Science 2021, 48 (3): 97-112. DOI: 10.11896/jsjkx.210200023

With the rapid development of network technology, various Internet-based communication channels, such as self-media, Weibo and BBS, are becoming ideal platforms for people to easily generate and share rich social multimedia content online. Social event data are multi-platform, multi-modal, large-scale and noisy, which brings huge challenges to the analysis of multimedia social events. Therefore, how to process social media data, study social event analysis methods, and design effective social event analysis models have become key issues in social event analysis research. This paper reviews relevant research on multimedia social event analysis in recent years, focusing on multimedia social event representation methods and their applications in fake news detection, multimedia hot event detection, tracking and evolution analysis, and social media crisis event response. In addition, the datasets involved in different applications are introduced in detail. In the last section, this paper discusses possible future research topics in multimedia social event analysis.
