Computer Science, 2020, Vol. 47, Issue (5): 120-123. doi: 10.11896/jsjkx.190900111
ZHENG Wei-zhe (郑伟哲)1, QIU Peng (仇鹏)2, WEI Juan (韦娟)2
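Abstract: Most current research on sound recognition and detection relies on strongly labeled datasets. In real-world sound recognition and detection tasks, however, audio labels are incomplete and contain a large amount of noise, which makes strongly labeled audio data difficult to obtain and in turn degrades recognition and detection accuracy. To address this, a multi-scale attention fusion mechanism is proposed on top of a convolutional recurrent neural network model. The mechanism uses attention gating units to reduce the influence of noise in the time-frequency features of the sound while making fuller use of the effective features. Convolution kernels of several sizes are also combined for feature fusion, which further improves the extraction of sound features. In addition, a weighting method that incorporates frame-level detection results is used to recognize the sound signal. Finally, in a weak-label setting, a weakly labeled dataset containing 17 classes of urban vehicle sounds is selected from the AudioSet database for detection and recognition. On the test set, the proposed model reaches an F1 score of 58.9% for recognition and 43.7% for detection. The results show that, on the weakly labeled urban vehicle sound dataset, the network model achieves higher recognition and detection accuracy than traditional sound recognition and detection models, and that both the importance-weighted recognition method and the multi-scale attention fusion method improve the model's recognition and detection accuracy.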
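The abstract does not state the formula behind the weighting method that combines frame-level detection results. As an illustration only, the sketch below assumes a common attention-style weighted pooling, in which each frame's detection probability contributes to the clip-level recognition score in proportion to a learned importance weight. The function name `weighted_clip_probability` and both inputs are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def weighted_clip_probability(frame_logits, weight_logits, eps=1e-8):
    """Pool frame-level scores for one class into a clip-level probability.

    frame_logits:  per-frame detection scores (hypothetical network output).
    weight_logits: per-frame importance scores (hypothetical attention output).
    Assumption: sigmoid weights normalized over time; the paper may differ.
    """
    frame_probs = [sigmoid(z) for z in frame_logits]
    weights = [sigmoid(a) for a in weight_logits]
    total = sum(weights) + eps  # eps guards against division by zero
    clip_prob = sum(w * p for w, p in zip(weights, frame_probs)) / total
    return clip_prob, frame_probs


# One class, two frames: the first frame is both confident and heavily weighted.
clip_prob, frame_probs = weighted_clip_probability([4.0, -4.0], [4.0, -4.0])
```

In this sketch the frame-level probabilities serve as the detection output and the pooled value as the recognition output, so a frame with a low importance weight has little effect on the clip-level decision.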