Research on an Automatic Word-Building Algorithm Based on Association Rules

doi:10.11896/j.issn.1002-137X.2014.11.049

Abstract

Words are the basic elements of Chinese text, and the Chinese language model plays a key role in Chinese text mining. Text classification is a high-dimensional data mining task, and most classification algorithms are sensitive to dimensionality. As a result, classification depends on the size of the vocabulary. Moreover, most current Chinese language models are based on statistical theory, such as the N-gram model and its improved variants. However, these statistical models suffer from high computational complexity. To improve both vocabulary quantity and efficiency, this paper gives Chinese words a new definition based on association rules and proposes the Auto-word algorithm, which constructs a word vocabulary automatically for use in Chinese text mining. Finally, experiments demonstrate the efficiency of the Auto-word algorithm.

Key words: Constructing words automatically, Statistical language model, Association rules, Longest common subsequence, Text classification
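
The abstract does not spell out how association rules define a word, so the following is only an illustrative sketch under stated assumptions, not the paper's Auto-word algorithm. It assumes that an adjacent character pair "ab" is treated as the rule a → b, with support taken as the raw pair count and confidence as count(ab) / count(a). The function name `candidate_words`, the thresholds, and the toy corpus are all hypothetical.

```python
from collections import Counter


def candidate_words(sentences, min_support=2, min_confidence=0.8):
    """Return adjacent character pairs that pass both thresholds.

    Assumption (not from the paper): support is the raw count of the
    pair, and confidence is count(pair) / count(first character).
    """
    char_counts = Counter()
    pair_counts = Counter()
    for s in sentences:
        char_counts.update(s)
        # Every adjacent two-character window is a candidate rule a -> b.
        pair_counts.update(s[i:i + 2] for i in range(len(s) - 1))

    result = {}
    for pair, n in pair_counts.items():
        confidence = n / char_counts[pair[0]]
        if n >= min_support and confidence >= min_confidence:
            result[pair] = (n, confidence)
    return result


# Hypothetical toy corpus, for illustration only.
corpus = ["数据挖掘技术", "文本数据挖掘", "数据分类算法"]
words = candidate_words(corpus)
```

On this toy corpus the pairs 数据 and 挖掘 pass both thresholds, while 据挖 is frequent enough but fails the confidence test (2 of 3 occurrences of 据). A real vocabulary builder would also need to handle longer words and a much larger corpus.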

WANG Jian-quan and JI Shao-bo. Research and Application on Auto-word Building[J]. Computer Science, 2014, 41(11): 256-259.

