Started in January 1974 (Monthly)
Supervised and Sponsored by Chongqing Southwest Information Co., Ltd.
ISSN 1002-137X
CN 50-1075/TP
CODEN JKIEBK
Editors
    Content of ChinaMM2018 in our journal
    Deep Residual Network Based HEVC Compressed Videos Enhancement
    HE Xiao-yi, DUAN Ling-yu, LIN Wei-yao
    Computer Science    2019, 46 (3): 88-91.   DOI: 10.11896/j.issn.1002-137X.2019.03.011
    This paper proposes an enhancement method for HEVC-compressed videos based on a deep residual network. The method uses several stacked residual blocks for feature extraction, followed by feature enhancement and reconstruction. Compared with existing methods that use only a few convolutional layers, the proposed method captures the features of the input compressed frames in a more distinctive and stable way. Experimental results show that the proposed method achieves a BD-rate saving of over 6.92% on 20 benchmark sequences and performs best among the compared methods.
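As a rough illustration of the stacked residual blocks mentioned in this abstract, the sketch below applies the generic residual form y = x + F(x) to a 1-D signal. The abstract does not give the network layout, so the conv-ReLU-conv body, the function names and the odd-length kernels are all assumptions made for illustration, not the authors' architecture.

```python
def conv1d_same(signal, kernel):
    """Zero-padded 1-D correlation that keeps the input length (odd-length kernel assumed)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def residual_block(signal, kernel_a, kernel_b):
    """y = x + conv(relu(conv(x))): the block only has to model a correction to x."""
    correction = conv1d_same(relu(conv1d_same(signal, kernel_a)), kernel_b)
    return [x + c for x, c in zip(signal, correction)]


def enhance(signal, blocks):
    """Apply a stack of residual blocks; `blocks` is a list of (kernel_a, kernel_b) pairs."""
    for kernel_a, kernel_b in blocks:
        signal = residual_block(signal, kernel_a, kernel_b)
    return signal
```

With all-zero kernels every block reduces to the identity, which is the property that makes deep stacks of such blocks easy to train: each block starts from "do nothing" and learns only the residual.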
    Perceptual Model Based on GLCM Combined with Depth
    YE Peng, WANG Yong-fang, XIA Yu-meng, AN Ping
    Computer Science    2019, 46 (3): 92-96.   DOI: 10.11896/j.issn.1002-137X.2019.03.012
    The just noticeable distortion (JND) model is a perceptual model and one of the most effective means of removing visual redundancy in image and video compression. Because existing JND models compute the contrast masking (CM) effect imperfectly and account for depth information inaccurately, this paper proposes a JND model that combines depth information with the gray-level co-occurrence matrix (GLCM). First, the image is decomposed into an edge part and a texture part by the total variation (TV) method; the edge part is processed by the Canny operator and the texture part by the GLCM. Combining the two parts yields a more accurate CM model. A new GLCM-based JND model is then established by incorporating the background luminance masking effect. In addition, a novel depth weighting model is proposed based on human depth perception. Finally, a new perceptual model that combines depth with the GLCM is established. Experimental results show that the proposed model is more consistent with human visual perception. Compared with existing JND models, it tolerates more distortion while delivering much better perceptual quality.
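For readers unfamiliar with the GLCM, the following minimal sketch shows how a co-occurrence matrix and its contrast statistic can be computed for a small quantized image. The function names, the single horizontal offset and the choice of the contrast statistic are assumptions for illustration; the abstract does not say which GLCM statistics the paper uses for texture masking.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for pixel pairs separated by (dx, dy).

    `image` is a list of rows holding integer gray levels in [0, levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:  # image too small for this offset
        return [[0.0] * levels for _ in range(levels)]
    return [[n / total for n in row] for row in counts]


def glcm_contrast(p):
    """Contrast statistic: large when neighbouring gray levels differ strongly."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))
```

A horizontally uniform image gives zero contrast, while an image whose horizontal neighbours always differ by one gray level gives a contrast of 1.0, which matches the intuition that busier texture can mask more distortion.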
    Improved MDP Tracking Method by Combining 2D and 3D Information
    WANG Zheng-ning, ZHOU Yang, LV Xia, ZENG Fan-wei, ZHANG Xiang, ZHANG Feng-jun
    Computer Science    2019, 46 (3): 97-102.   DOI: 10.11896/j.issn.1002-137X.2019.03.013
    Online multi-object tracking (MOT) plays an important role in autonomous driving and advanced driver assistance systems (ADAS). Most recent MOT methods concentrate on tracking in the image domain. Although they solve most problems by building adaptive online models or optimizing energy functions, handling mutual occlusion in complex traffic scenes remains an obstacle. This paper proposes an improved tracking method that introduces 3D information into the Markov decision process (MDP) tracker. The original MDP similarity feature is extended from the image domain to the spatial domain with a combined 2D-3D feature, and a new optical flow descriptor, called the multi-image FB error, replaces the original multi-aspect FB error. The method was tested on the KITTI benchmark, and the results verify that its comprehensive performance improves significantly over the original method.
    Profit Optimization for Multi-content Video Streaming over Mobile Network Based on User Preference
    XU Jing-ce, LIANG Bing, LI Meng-nan, JI Wen, CHEN Yi-qiang
    Computer Science    2019, 46 (3): 103-107.   DOI: 10.11896/j.issn.1002-137X.2019.03.014
    In recent years, the emergence of 4G and 5G networks has greatly increased the data transmission bandwidth of mobile devices, and the performance of video playback devices has also improved, which has gradually raised users' demands on video streaming quality. Improving the profit of video streaming over mobile networks is therefore becoming increasingly important. This paper analyzes the effect of user preference on the profit of a multi-content video streaming system. It proposes a profit model for end users based on user preference that takes traffic cost into account, and formulates the optimization of total system profit as a weighted profit optimization problem. Because users with different preferences affect the total profit of the video streaming system differently, the paper proposes an end-user weight selection algorithm based on the preference-bitrate ratio to select the optimal weights under the current user preferences. The optimal bitrate under the current user preferences is then obtained by solving the weighted profit optimization problem for the end users. Experimental results show that the proposed method improves the total system profit by 5% to 10% compared with the existing method.
    Adaptive Weighted Bi-prediction Method Based on Reference Quality
    YANG Min-jie, ZHU Ce, GUO Hong-wei, JIANG Ni
    Computer Science    2019, 46 (3): 108-112.   DOI: 10.11896/j.issn.1002-137X.2019.03.015
    In modern video codecs, bi-prediction plays a significant role in removing temporal redundancy by exploiting temporal correlations between pictures. The bi-prediction signal is formed simply by averaging two uni-prediction signals with a fixed weight of 0.5. However, this produces serious distortion when illumination changes rapidly from one reference picture to another, or when the prediction quality of one motion-compensated prediction block differs from that of the other because of factors such as quantization. To solve these problems, this paper proposes an adaptive weighted bi-prediction method based on reference quality. In this scheme, the reference block of better quality is assigned the greater weight, and vice versa. Simulation results show that, compared with JEM5.0.1, the proposed weighted bi-prediction achieves average Bjøntegaard delta (BD) bitrate savings of about 0.25% and 0.3% under the random access main (RA) and low-delay B main (LDB) configurations, respectively, with a moderate increase in encoding and decoding complexity.
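The difference between fixed and quality-weighted bi-prediction can be sketched in a few lines. The abstract does not state how reference quality is measured or mapped to a weight, so the `quality_weight` function below (a simple proportional rule on hypothetical quality scores) is purely an assumption for illustration; only the fixed 0.5 default comes from the text.

```python
def bi_predict(pred0, pred1, w0=0.5):
    """Blend two uni-prediction blocks; w0 = 0.5 is the conventional fixed average."""
    return [w0 * a + (1.0 - w0) * b for a, b in zip(pred0, pred1)]


def quality_weight(quality0, quality1):
    """Hypothetical rule: weight reference 0 in proportion to its quality score.

    Higher score means better quality; equal scores fall back to 0.5.
    """
    total = quality0 + quality1
    return 0.5 if total == 0 else quality0 / total
```

For example, with quality scores 3.0 and 1.0 the first reference receives weight 0.75, so blending sample values 100 and 120 yields 105 instead of the plain average of 110.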
    Deep Learning Based Fast Video Transcoding Algorithm
    XU Jing-yao, WANG Zu-lin, XU Mai
    Computer Science    2019, 46 (3): 113-118.   DOI: 10.11896/j.issn.1002-137X.2019.03.016
    Owing to its good rate-distortion performance, high efficiency video coding (HEVC), the latest video compression standard, has been adopted by more and more terminals. However, a large number of H.264 streams still exist in the field of video compression. H.264 to HEVC video transcoding is therefore a meaningful research issue. The simplest way to transcode from H.264 to HEVC is to cascade an H.264 decoder and an HEVC encoder directly. Because the HEVC coding process is highly complex, this approach is time-consuming. This paper therefore proposes a fast H.264 to HEVC transcoding method that uses deep learning to predict the coding tree unit (CTU) partition of HEVC, avoiding the brute-force search over CTU partitions in rate-distortion optimization (RDO). First, a large-scale H.264 to HEVC transcoding database is built to support training of the deep learning model. Second, the correlation between the HEVC CTU partition and H.264-domain features is analyzed, and the similarity of CTU partitions across frames is identified. Then, a three-level classifier based on long short-term memory (LSTM) is designed to predict the CTU partition. Experimental results show that the proposed fast transcoding algorithm reduces complexity by 60% compared with the original transcoder while lowering the peak signal-to-noise ratio by only 0.039 dB, outperforming state-of-the-art transcoding methods.
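To show why predicting the CTU partition saves time, the sketch below builds a quadtree partition from predicted split flags instead of testing every candidate with RDO. The 64x64 CTU and 8x8 minimum CU sizes follow common HEVC settings, which give three split decisions per branch (64 to 32, 32 to 16, 16 to 8). The `predict_split` callback is a hypothetical stand-in for the paper's LSTM-based classifier, whose inputs and structure are not described in the abstract.

```python
def partition_ctu(predict_split, x=0, y=0, size=64, min_size=8):
    """Return leaf coding units as (x, y, size) tuples.

    `predict_split(x, y, size)` is a hypothetical predictor returning True when
    the block should be split into four quadrants; no RDO search is performed.
    """
    if size > min_size and predict_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += partition_ctu(predict_split, x + dx, y + dy, half, min_size)
        return leaves
    return [(x, y, size)]
```

A predictor that never splits returns the whole CTU as one coding unit, and one that always splits returns 64 units of size 8, so each block costs one classifier call rather than a full rate-distortion comparison of its sub-partitions.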
    Deep Convolutional Prior Guided Robust Image Separation Method and Its Applications
    JIANG Zhi-ying, LIU Ri-sheng
    Computer Science    2019, 46 (3): 119-124.   DOI: 10.11896/j.issn.1002-137X.2019.03.017
    Single-image layer separation aims to divide an observed image into two independent, practical components according to the requirements of the task. Many computer vision tasks, such as single-image rain streak removal, intrinsic image decomposition and reflection removal, can essentially be understood as the separation of two different layers. An excellent image layer decomposition method would therefore greatly advance the solution of these problems. Because two variables must be recovered from only one known variable, the problem is fundamentally ill-posed. Most existing approaches design complex priors according to the different characteristics of the two separated layers. However, a loss function with complex prior regularization is hard to optimize. Performance is also compromised by fixed iteration schemes and limited data-fitting ability. More importantly, these conventional prior-based methods generalize poorly and can each be applied to only one specific task. To partially mitigate these limitations, this paper develops a flexible optimization technique that incorporates deep architectures into the optimization iterations for adaptive image layer separation. A convolutional neural network is a network structure composed of convolutions and other non-linear operations. The convolution operation uses different convolution kernels to extract different features from a given image, so convolution kernels are highly capable feature extractors. Recently, the advantages of deep learning in feature extraction have become increasingly evident, and it is increasingly used in low-level image processing. The proposed method therefore uses a deep convolutional prior instead of a traditional model prior to characterize the different layers. At the same time, to reduce the network's dependence on training data and improve the effectiveness of the algorithm on different tasks, deep information is combined with a traditional optimization framework. Specifically, an energy function is built using maximum a posteriori (MAP) estimation, and the model is then split into three subproblems based on the alternating direction method of multipliers (ADMM). The first two subproblems estimate two approximate separated layers, and the third solves for the final result. In other words, deep convolutional networks guide the model optimization process. In this way, the proposed method not only retains the feature extraction advantage of deep structures, but also maintains the stability of traditional model optimization and improves the effectiveness of the networks. Finally, the method is applied to a variety of image restoration tasks, including single-image rain streak removal and reflection removal. Compared with several task-specific methods, including both conventional model-based methods and deep learning methods, it shows great advantages in both visual effects and numerical results. This indicates that the method generalizes well across tasks and outperforms the other methods in each task.
    Liver CT Image Feature Extraction Method Based on Improved Multi-scale LBP Algorithm
    LIU Xiao-hong, ZHU Yu-quan, LIU Zhe, SONG Yu-qing, ZHU Yan, YUAN De-qi
    Computer Science    2019, 46 (3): 125-130.   DOI: 10.11896/j.issn.1002-137X.2019.03.018
    Liver cancer, that is, malignant tumors of the liver, can be divided into primary and secondary categories. Recent census data show that the annual mortality of liver cancer ranks third in the world. Early diagnosis of liver disease benefits the treatment of liver cancer. The local binary pattern (LBP) algorithm has been widely used in the diagnosis of liver lesions. Although the traditional LBP method is simple, efficient and easy to understand, it lacks multi-scale information, which leads to an incomplete description that misses key information. To address the loss of key information in the high-order directional derivative local binary pattern (DLBP) algorithm, an extended multi-scale LBP algorithm (MSLBP) is proposed. The method first preprocesses the liver CT image to extract the region of interest (ROI), then extracts features with the extended multi-scale LBP method. This method fuses the information of each high-order sampling point with that of its neighboring points, and uses the result as the final information of the sampling point in the computation. At the same time, averaging the diagonal regions highlights the neighborhood and describes the texture of the liver image over a larger range. Finally, the classification algorithm is executed. Experimental results show that the accuracy of the proposed method reaches 90.1%, which is 8.7% higher than that of the original LBP feature extraction method. The method has clinical application value and can help doctors with diagnosis. In the image preprocessing stage, because medical images differ from natural images, the DICOM images obtained from the hospital cannot be used directly. The first preprocessing step sets the Pixel Padding Value to zero. The second step converts pixel values to CT values using equation 7 in section 2.1, according to the header information of the DICOM image. Then, the improved multi-scale LBP feature extraction is performed. Multi-scale features are extracted while the relationship between neighboring pixels is taken into account. The LBP model used in this paper is a uniform LBP with a total of 59 features. To demonstrate the effectiveness of the improved multi-scale algorithm, this paper uses the complete local binary pattern (CLBP), four-patch LBP (FPLBP), dominant rotated local binary pattern (drLBP), local binary pattern (LBP) and other feature extraction methods to extract the texture features of liver CT images, and then compares the experimental results, as shown in Table 1 in Section 4.2. Statistics on the feature dimensions of all methods show that the proposed multi-scale LBP method has low dimensionality and high efficiency. The experimental results show that the proposed method extends the multi-scale characteristics of LBP well and describes the macro-texture structure of a larger area while keeping the same dimension. At the same time, the relationship between adjacent pixels is taken into account, which compensates for the insufficient information description and improves the accuracy of the algorithm.
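The figure of 59 features for a uniform LBP can be checked with a short sketch of the basic 8-neighbour operator. This is the classic single-scale LBP, not the extended multi-scale variant the paper proposes; the function names and the clockwise neighbour order are assumptions for illustration.

```python
# Neighbour offsets (row, column), read clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """Basic 8-neighbour LBP code of the interior pixel at (r, c)."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def transitions(code, bits=8):
    """Number of 0/1 changes when the bit pattern is read circularly."""
    rotated = (code >> 1) | ((code & 1) << (bits - 1))
    return bin(code ^ rotated).count("1")


def is_uniform(code):
    """A pattern is 'uniform' when it has at most two circular transitions."""
    return transitions(code) <= 2


# 58 uniform patterns each get their own histogram bin;
# all non-uniform patterns share one extra bin, giving 59 features.
UNIFORM_CODES = [code for code in range(256) if is_uniform(code)]
HISTOGRAM_BINS = len(UNIFORM_CODES) + 1
```

Grouping all non-uniform codes into a single bin is what reduces the 256 possible patterns to the 59-dimensional descriptor quoted in the abstract.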
    Video Advertisement Classification Method Based on Shot Segmentation and Spatial Attention Model
    TAN Kai, WU Qing-bo, MENG Fan-man, XU Lin-feng
    Computer Science    2019, 46 (3): 131-136.   DOI: 10.11896/j.issn.1002-137X.2019.03.019
    As video advertisements are increasingly used in areas such as search and user recommendation, advertisement video classification has become an important issue and poses a significant challenge for computer vision. Unlike existing video classification tasks, advertisement video classification faces two challenges. First, advertised products appear in an advertisement video aperiodically and sparsely. Most frames are therefore irrelevant to the advertisement category and can interfere with classification models. Second, advertisement videos have complex backgrounds, which make it hard to extract useful information about the product. To solve these problems, this paper proposes an advertisement video classification method based on shot segmentation and a spatial attention model (SSSA). To counter the interference of irrelevant frames, a shot-based partitioning method is used to sample frames. To reduce the influence of complex backgrounds on feature extraction, an attention mechanism is embedded into SSSA to locate products and extract discriminative features from the attention area most closely related to the advertised products. An attention prediction network (APN) is trained to predict the attention map. To verify the proposed model, this paper introduces TAV, a new thousand-level dataset for advertisement video classification, and also collects gaze data to train the APN. Experiments on the TAV dataset demonstrate that the proposed model improves performance by about 10% compared with state-of-the-art video classification methods.
    Real-time High-confidence Update Complementary Learner Tracking
    FAN Rong-rong, FAN Jia-qing, LIU Qing-shan
    Computer Science    2019, 46 (3): 137-141.   DOI: 10.11896/j.issn.1002-137X.2019.03.020
    To address the poor performance of the complementary learner tracking algorithm (Staple) when the target is severely occluded, a high-confidence update complementary learner tracker (HCLT) is proposed. First, for each input frame, a standard correlation filter is used to compute the correlation filter (CF) response. Second, a confidence value is calculated from the CF response, and the update of the correlation filter is stopped when the current confidence value exceeds the mean confidence value. Then, if the number of consecutive frames without an update reaches ten, the tracker is forced to update the filter. Finally, the final response is obtained by combining the CF response with the color response, and the location of the maximum response is taken as the tracking result. Experimental results show that, compared with several state-of-the-art trackers, including the complementary learner (Staple), the end-to-end representation correlation filter network tracker (CFNet), the attentional correlation filter network tracker (ACFN) and hedged deep tracking (HDT), the proposed algorithm achieves the best success rate. It outperforms the baseline tracker Staple by 1.0 percentage points in success rate on the OTB100 dataset and by 0.4 percentage points in expected average overlap (EAO) on the VOT2016 dataset. In addition, its performance on sequences with heavy occlusion and severe illumination variation demonstrates the effectiveness of the proposed tracker in handling drastic appearance variations.
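The update rule described in this abstract can be sketched as a small gate that decides, frame by frame, whether the filter is updated. The abstract does not define the confidence measure or how the mean is maintained, so the sketch takes the confidence as a given number and, as an assumption, compares it with the mean of all previous frames. The comparison direction follows the abstract's wording, and the class and parameter names are hypothetical.

```python
class UpdateGate:
    """Decide per frame whether the correlation filter should be updated."""

    def __init__(self, max_skipped=10):
        self.max_skipped = max_skipped  # forced update after this many skipped frames
        self.history = []               # confidence values of earlier frames
        self.skipped = 0                # consecutive frames without an update

    def should_update(self, confidence):
        # Mean over previous frames; the first frame is compared with itself.
        mean = sum(self.history) / len(self.history) if self.history else confidence
        self.history.append(confidence)
        # As worded in the abstract: the update stops when the current
        # confidence exceeds the mean confidence.
        skip = confidence > mean
        if skip and self.skipped < self.max_skipped:
            self.skipped += 1
            return False
        # Either the rule allows an update, or the skip limit forces one.
        self.skipped = 0
        return True
```

In this reading, ten frames may pass without an update and the next frame is then updated unconditionally, which prevents the filter from becoming stale during long skip runs.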
    Improved R-λ Model Based Rate Control Algorithm
    GUO Hong-wei, LUO Hong-jun, LIU Shuai, NIU Lin, YANG Bo
    Computer Science    2019, 46 (3): 142-147.   DOI: 10.11896/j.issn.1002-137X.2019.03.021
    Rate control is an important module in video coding systems: it makes an encoder output a specific bit rate while minimizing the distortion of the encoded video. R-λ model based rate control is recommended in the high efficiency video coding (HEVC) international standard and mainly includes two schemes, namely fixed ratio bit allocation and adaptive ratio bit allocation. To improve both the accuracy of rate control and the rate-distortion (R-D) performance, this paper proposes an improved R-λ model based rate control algorithm. First, an accurate R-D model update method is designed according to the coding structure of the group of pictures (GOP). Second, the GOP-level bit allocation scheme is improved according to the R-D dependency relationship. Finally, formulas are proposed for the dynamic Lagrange multiplier at the GOP level and the dynamic bit weight of the frame to be encoded. Experimental results demonstrate that, under the low-delay configurations with P and B frames, the proposed method has relative bit rate errors of only about 0.006% and 0.005%, and achieves average R-D performance gains of 1.2% and 1.3% over the adaptive ratio bit allocation scheme, respectively.
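As background for this abstract, the baseline R-λ model relates the Lagrange multiplier to the target bits per pixel as λ = α · bpp^β and then maps λ to a quantization parameter. The sketch below uses the initial parameters and the λ-to-QP mapping commonly cited for the HEVC reference software; treat these constants as assumptions rather than values taken from this paper, whose improved update and allocation formulas are not given in the abstract.

```python
import math


def bits_per_pixel(bitrate_bps, fps, width, height):
    """Target bits per pixel for a given bit rate, frame rate and resolution."""
    return bitrate_bps / (fps * width * height)


def lagrange_multiplier(bpp, alpha=3.2003, beta=-1.367):
    """Baseline R-lambda model: lambda = alpha * bpp ** beta (assumed initial values)."""
    return alpha * bpp ** beta


def qp_from_lambda(lam):
    """Commonly cited lambda-to-QP mapping, rounded to an integer QP."""
    return round(4.2005 * math.log(lam) + 13.7122)


def bitrate_error_percent(target, actual):
    """Relative bit rate error, the accuracy measure quoted in the abstract."""
    return abs(actual - target) / target * 100.0
```

Because β is negative, a smaller bit budget per pixel gives a larger λ and hence a larger QP, which is how the controller trades distortion for rate.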
    3-D Model Retrieval Algorithm Based on Residual Network
    LI Yin-min, XUE Kai-xin, GAO Zan, XUE Yan-bin, XU Guang-ping, ZHANG Hua
    Computer Science    2019, 46 (3): 148-153.   DOI: 10.11896/j.issn.1002-137X.2019.03.022
    In recent years, view-based 3D model retrieval has become a key research direction in computer vision. A 3D model retrieval algorithm consists of feature extraction and model retrieval, and robust features play a decisive role in it. Both traditional hand-crafted features and deep learning features have been proposed, but very few studies compare them systematically. This work therefore evaluates and analyzes the performance of different hand-crafted features and deep learning features. For a thorough comparison, experiments are conducted with multiple datasets, multiple evaluation criteria and different retrieval algorithms. The effects of different layers of a deep network on performance are further compared, and a 3D model retrieval algorithm based on a residual network is proposed. Several conclusions can be drawn from the experimental results on multiple public datasets. 1) Compared with traditional hand-crafted features, the deep learning features of the VGG network and the residual network improve comprehensive performance by 3% to 20%. 2) Compared with the deep features extracted by the VGG network, those of the residual network improve comprehensive performance by 1% to 5%. 3) Features from different layers of the VGG network perform differently, and the comprehensive performance of the deep and shallow features is increased by 1% to 6%. 4) As the network depth increases, the overall performance of the features extracted by the residual network improves only to a limited extent, and these features are more robust than the other features compared.
    Deinterlacing Algorithm Based on Scene Change and Content Characteristics Detection
    ZHU Xiao-tao, LI Yan-ping, HUANG Yuan, HUANG Qian
    Computer Science    2019, 46 (3): 154-158.   DOI: 10.11896/j.issn.1002-137X.2019.03.023
    This paper proposes a deinterlacing algorithm based on scene change and content characteristics detection. First, scene changes and video content characteristics are detected. Second, optimized motion estimation is performed based on the scene change detection results. Third, the image blocks are locally partitioned and different interpolation methods are applied. Experimental results show that the algorithm not only improves the vertical image resolution at lower algorithmic complexity, but also obtains high-quality progressive sequences from interlaced video sequences with different content.
    Detection Method of Insulator in Aerial Inspection Image Based on Modified R-FCN
    ZHAO Zhen-bing, CUI Ya-ping, QI Yin-cheng, DU Li-qun, ZHANG Ke, ZHAI Yong-jie
    Computer Science    2019, 46 (3): 159-163.   DOI: 10.11896/j.issn.1002-137X.2019.03.024
    When insulator targets in aerial inspection images are partially occluded, detection with the region-based fully convolutional network (R-FCN) model performs poorly, and the detection box cannot fit the target completely. This paper therefore proposes an insulator detection method for aerial inspection images based on a modified R-FCN. First, according to the aspect ratio characteristics of insulator targets, the aspect ratios of the proposals in the R-FCN model are changed to 1:4, 1:2, 1:1, 2:1 and 4:1. Then, to deal with occlusion in insulator images, an adversarial spatial dropout network (ASDN) layer is introduced into the R-FCN model. It generates samples with incomplete target features by masking part of the feature map, which improves the detection performance of the model on samples with poor target features. On a dataset containing 7433 insulator targets, the average detection rate of the original R-FCN model is 77.27%, while that of the modified R-FCN method is 84.29%, an improvement of 7.02 percentage points, and the detection box fits the target more closely.
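The five aspect ratios listed in this abstract can be turned into proposal (anchor) sizes as in the sketch below. The abstract does not say whether each ratio is width:height or height:width, nor which anchor areas are used, so the width:height reading, the `base_area` parameter and the function name are assumptions for illustration.

```python
import math

# Aspect ratios from the abstract, read here as (width part, height part).
ASPECT_RATIOS = [(1, 4), (1, 2), (1, 1), (2, 1), (4, 1)]


def anchor_sizes(base_area, ratios=ASPECT_RATIOS):
    """Return (width, height) pairs that share `base_area` but differ in shape."""
    sizes = []
    for w_part, h_part in ratios:
        ratio = w_part / h_part
        width = math.sqrt(base_area * ratio)
        height = width / ratio
        sizes.append((width, height))
    return sizes
```

For a base area of 64 x 64 pixels this yields boxes from 32 x 128 to 128 x 32, so long, thin insulator strings in either orientation have a proposal shape close to their own.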