Computer Science, 2021, Vol. 48, Issue (6A): 299-305. doi: 10.11896/jsjkx.200500157

• Intelligent Computing •

Recognition and Transformation for Complex Noun Phrases Based on Boundary Perception

LIU Xiao-die

  1. Beijing Union University, Beijing 100101, China
  • Online: 2021-06-10 Published: 2021-06-17
  • Corresponding author: LIU Xiao-die (liuxiaodie@buu.edu.cn)
  • About author: LIU Xiao-die, born in 1984, Ph.D, lecturer. Her main research interests include Chinese information processing and corpus linguistics.
  • Supported by:
    National Natural Science Foundation of China (71974095).

Abstract: To improve the translation of complex noun phrases in patent machine translation, this paper proposes a rule-based method for recognizing and transforming them. Guided by a boundary-perception strategy, it analyzes the semantic chunks and combination units of Chinese and English complex noun phrases, extracts Chinese feature words, compiles 57 recognition rules for identifying the boundaries of combination units within Chinese complex noun phrases, and designs merging strategies that yield a formalized structure for each phrase. By comparing Chinese and English complex noun phrases, it identifies the differences between them and determines the corresponding transformation strategies. Finally, the recognition rules, merging strategies, and transformation strategies are applied in an existing machine translation system. Test results show that the method recognizes and transforms complex noun phrases effectively and improves their machine translation in patent texts.

Key words: Boundary perception, Machine translation, Noun phrase, Patent, Recognition, Rules, Transformation
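
The pipeline summarized in the abstract (detect unit boundaries from feature words, merge tokens into units, then reorder for English) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the feature-word set `BOUNDARY_WORDS`, the function names `split_units` and `reorder_de`, and the single toy reordering rule are hypothetical and do not reproduce the paper's 57 recognition rules or its actual transformation strategies.

```python
# Hypothetical feature words that mark unit boundaries (an assumption;
# the paper's real feature-word list is not given on this page).
BOUNDARY_WORDS = {"的", "和", "或"}


def split_units(tokens):
    """Merge consecutive non-feature tokens into units, split at feature words.

    Returns a list of (kind, tokens) pairs, where kind is "unit" or "marker".
    """
    chunks = []
    current = []
    for tok in tokens:
        if tok in BOUNDARY_WORDS:
            if current:
                chunks.append(("unit", current))
                current = []
            chunks.append(("marker", [tok]))
        else:
            current.append(tok)
    if current:
        chunks.append(("unit", current))
    return chunks


def reorder_de(chunks):
    """Toy transformation: [modifier, 的, head] becomes head + "of" + modifier.

    Any other pattern is returned flattened and unchanged.
    """
    if len(chunks) == 3 and chunks[1] == ("marker", ["的"]):
        modifier, _, head = chunks
        return head[1] + ["of"] + modifier[1]
    return [tok for _, toks in chunks for tok in toks]


# Usage: a pre-segmented phrase "数据 传输 的 方法" (method of data transmission).
chunks = split_units(["数据", "传输", "的", "方法"])
reordered = reorder_de(chunks)
```

The sketch keeps recognition (`split_units`) separate from transformation (`reorder_de`), mirroring the two stages named in the abstract; a real system would need many more rules per stage.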

CLC Number:

  • TP391