Computer Science ›› 2019, Vol. 46 ›› Issue (3): 113-118. doi: 10.11896/j.issn.1002-137X.2019.03.016

• 2018 China Multimedia Conference •

Deep Learning Based Fast Video Transcoding Algorithm

XU Jing-yao, WANG Zu-lin, XU Mai

  1. (College of Electronics and Information Engineering, Beihang University, Beijing 100191, China)
  • Received: 2018-07-11 Revised: 2018-09-15 Online: 2019-03-15 Published: 2019-03-22
  • Corresponding author: XU Mai (born 1981), male, Ph.D., professor, CCF member. His main research interests include video communication, image processing, and computer vision. E-mail: maixu@buaa.edu.cn
  • About the authors: XU Jing-yao (born 1994), female, master's student. Her main research interests include video transcoding and deep learning. WANG Zu-lin (born 1965), male, Ph.D., professor. His main research interests include image processing and communication signal processing.
  • Funding:
    Supported by the National Natural Science Foundation of China (61573037)

Abstract: Owing to its good rate-distortion performance, High Efficiency Video Coding (HEVC), the new-generation video compression standard, is supported by a growing number of terminal devices. However, a large number of H.264 bitstreams are still in use, so efficient H.264 to HEVC video transcoding is of practical importance. The simplest way to transcode from H.264 to HEVC is to cascade an H.264 decoder directly with an HEVC encoder, but because HEVC encoding is highly complex, this approach is time-consuming. To address this problem, this paper proposes a deep learning based method that predicts the CTU (Coding Tree Unit) partition of HEVC. This avoids the exhaustive search over all CTU partitions that HEVC performs to find the rate-distortion optimal structure, and thus enables fast H.264 to HEVC transcoding. First, a large-scale H.264 to HEVC transcoding database is built to provide data for training the deep learning model. Second, the correlation between H.264 compressed-domain features and the HEVC CTU partition is analyzed, and the temporal similarity of CTU partitions across frames is identified. On this basis, a three-level classifier based on LSTM (Long Short-Term Memory), a recurrent neural network, is designed to predict the CTU partition. Experimental results show that, compared with the directly cascaded transcoder, the proposed fast transcoding algorithm saves 60% of the transcoding time while the peak signal-to-noise ratio (PSNR) drops by only 0.039 dB, outperforming recent transcoding algorithms.

Key words: Deep learning, H.264, HEVC, Video transcoding
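
The abstract does not spell out how the predicted decisions replace the exhaustive partition search, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a 64x64 CTU with a minimum CU size of 8x8, and it assumes that the three classifier levels correspond to the split decisions for 64x64, 32x32, and 16x16 blocks. The names `partition_ctu`, `should_split`, and `toy_classifier` are hypothetical; `toy_classifier` is a hand-written stand-in for the learned LSTM predictor.

```python
from typing import Callable, List, Tuple

CU = Tuple[int, int, int]  # (x, y, size) of a coding unit inside the CTU


def partition_ctu(should_split: Callable[[int, int, int], bool],
                  ctu_size: int = 64, min_cu_size: int = 8) -> List[CU]:
    """Build a CTU quadtree from per-block split decisions.

    `should_split` is a hypothetical stand-in for the learned classifier.
    It is queried once per block instead of evaluating every partition.
    """
    leaves: List[CU] = []

    def visit(x: int, y: int, size: int) -> None:
        # Blocks at the minimum size cannot be split, so the classifier
        # is only consulted at three levels: 64, 32 and 16.
        if size > min_cu_size and should_split(x, y, size):
            half = size // 2
            for dy in (0, half):
                for dx in (0, half):
                    visit(x + dx, y + dy, half)
        else:
            leaves.append((x, y, size))

    visit(0, 0, ctu_size)
    return leaves


def toy_classifier(x: int, y: int, size: int) -> bool:
    # Assumption for illustration only: split the CTU, then split
    # just the top-left 32x32 block, and nothing below that.
    if size == 64:
        return True
    if size == 32:
        return (x, y) == (0, 0)
    return False


cus = partition_ctu(toy_classifier)
for cu in cus:
    print(cu)
```

With this toy predictor the CTU is divided into four 16x16 CUs in the top-left quadrant and three 32x32 CUs elsewhere. In a real transcoder the HEVC encoder would then evaluate only the predicted partition, which is where the time saving over the cascaded transcoder would come from.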

CLC Number:

  • TN919.81