Computer Science (计算机科学), 2019, Vol. 46, Issue (3): 113-118. doi: 10.11896/j.issn.1002-137X.2019.03.016

• 2018 China Multimedia Conference •

Deep Learning Based Fast Video Transcoding Algorithm

XU Jing-yao, WANG Zu-lin, XU Mai

  1. (College of Electronics and Information Engineering, Beihang University, Beijing 100191, China)
  • Received: 2018-07-11 Revised: 2018-09-15 Online: 2019-03-15 Published: 2019-03-22
  • Corresponding author: XU Mai (born 1981), male, Ph.D., professor, CCF member. His main research interests include video communication, image processing, and computer vision. E-mail: maixu@buaa.edu.cn
  • About the authors: XU Jing-yao (born 1994), female, master's student; her main research interests include video transcoding and deep learning. WANG Zu-lin (born 1965), male, Ph.D., professor; his main research interests include image processing and communication signal processing.
  • Supported by:
    National Natural Science Foundation of China (61573037)

Abstract: Owing to its good rate-distortion performance, High Efficiency Video Coding (HEVC), the new-generation video compression standard, is supported by a growing number of terminal devices. However, a large number of H.264 bitstreams still exist, so efficient H.264 to HEVC video transcoding is of considerable practical significance. The simplest way to transcode from H.264 to HEVC is to directly cascade an H.264 decoder with an HEVC encoder. Because the HEVC encoding process is highly complex, this approach is time-consuming. To address this problem, this paper proposes a deep learning based method that predicts the coding tree unit (CTU) partition of HEVC. This avoids the exhaustive search over all CTU partitions that HEVC performs to find the rate-distortion optimal partition structure, and thus achieves fast H.264 to HEVC transcoding. First, a large-scale H.264 to HEVC transcoding database is built to provide the data needed to train the deep learning model. Second, the correlation between H.264 compressed-domain features and the HEVC CTU partition is analyzed, and the temporal similarity of CTU partitions across frames is identified. On this basis, a three-level classifier based on long short-term memory (LSTM), a recurrent neural network, is proposed to predict the HEVC CTU partition. Experimental results show that, compared with the directly cascaded transcoder, the proposed fast H.264 to HEVC transcoding algorithm saves 60% of the transcoding time while the peak signal-to-noise ratio (PSNR) decreases by only 0.039 dB, outperforming recent transcoding algorithms.

Key words: HEVC, H.264, Video transcoding, Deep learning
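
The idea of replacing the exhaustive CTU partition search with per-level split predictions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 64x64 CTU and 8x8 minimum CU sizes, the reading of "three-level" as one split decision at each of the 64, 32, and 16 sizes, and the names `predict_partition`, `exhaustive_cu_count`, and `toy_predictor` are all assumptions made for illustration. The toy predictor is a hand-written rule standing in for a trained LSTM classifier.

```python
from typing import Callable, List, Tuple

CTU_SIZE = 64     # assumed CTU size
MIN_CU_SIZE = 8   # assumed minimum CU size

CU = Tuple[int, int, int]                  # (x, y, size) of a coding unit
SplitPredictor = Callable[[int, CU], bool]  # (level, cu) -> split or not


def exhaustive_cu_count(size: int = CTU_SIZE) -> int:
    """Count every CU in the full quadtree of one CTU (all candidate CUs)."""
    count, cus_at_level = 0, 1
    while size >= MIN_CU_SIZE:
        count += cus_at_level
        cus_at_level *= 4
        size //= 2
    return count


def predict_partition(predict_split: SplitPredictor, x: int = 0, y: int = 0,
                      size: int = CTU_SIZE, level: int = 0) -> List[CU]:
    """Return the leaf CUs selected by a per-level split predictor."""
    cu = (x, y, size)
    # Stop at the minimum CU size or when the predictor says "do not split".
    if size == MIN_CU_SIZE or not predict_split(level, cu):
        return [cu]
    half = size // 2
    leaves: List[CU] = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += predict_partition(predict_split, x + dx, y + dy,
                                        half, level + 1)
    return leaves


def toy_predictor(level: int, cu: CU) -> bool:
    """Hypothetical stand-in for a trained classifier (hand-written rule)."""
    x, y, _ = cu
    if level == 0:
        return True                 # always split the 64x64 CTU
    if level == 1:
        return x == 0 and y == 0    # split only the top-left 32x32 CU
    return False                    # never split 16x16 CUs


leaves = predict_partition(toy_predictor)
print(len(leaves), "predicted leaf CUs vs", exhaustive_cu_count(),
      "candidate CUs in the full quadtree")
```

Under these assumptions the predicted partition contains 7 leaf CUs, whereas the full quadtree contains 85 candidate CUs. The sketch only counts CUs; it says nothing about the actual encoding time, which depends on the encoder and on how the predictions are integrated.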

CLC Number: 

  • TN919.81