Computer Science ›› 2019, Vol. 46 ›› Issue (6A): 417-422.

• Big Data & Data Mining •

Temporal Text Data Stream Feature Trend Model and Algorithm

MENG Zhi-qing, XU Wei-wei

  1. School of Management, Zhejiang University of Technology, Hangzhou 310023, China
  • Online: 2019-06-14 Published: 2019-07-02
  • Corresponding author: MENG Zhi-qing (born 1962), male, Ph.D., professor; main research interests: data mining and optimal decision theory. E-mail: mengzhiqing@zjut.edu.cn
  • About the author: XU Wei-wei (born 1992), female, master's student; main research interests: data mining and natural language processing.
  • Supported by:
    This work was supported by the Natural Science Foundation of Zhejiang Province (LY15G010007).

Abstract: E-commerce and social networking platforms generate large volumes of text data streams every day. Quickly extracting features from these streams and using them to detect trend changes is important for guiding business operations. Clothing enterprises, for example, must perceive fashion information as quickly and accurately as possible, because trends in clothing features are critical to design, production and operation. Taking the text data streams of online goods as the research object and combining them with real-time online sales text data streams, this paper defines a feature trend model for the temporal text data streams of goods, and then proposes a real-time mining algorithm for discovering feature trends in text data streams. Applied to the text descriptions of clothing sales to extract popular features, the algorithm yields effective fashion trends and provides decision support for enterprises in formulating production plans and selecting marketing strategies. Experiments on real sales data from an e-commerce platform show that the algorithm extracts popular features with high accuracy and speed, and that it therefore has theoretical and practical significance.

Key words: Feature extraction, Real-time mining algorithm, Temporal text model, Text data stream
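
The abstract does not spell out the trend model, so the following is only a minimal sketch of one way feature trends might be tracked in a timestamped text stream; it is not the authors' algorithm. It assumes each record is already segmented into feature tokens, that time is an integer (for example, a day number), and that a "trend" is simply the change in a feature's share of all tokens between the two most recent fixed-size windows. The names `window_shares`, `rising_features`, `window` and `min_gain` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def window_shares(
    stream: Iterable[Tuple[int, List[str]]], window: int
) -> Dict[int, Dict[str, float]]:
    """Group (timestamp, tokens) records into fixed-size time windows and
    return, per window index, each feature's share of that window's tokens."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for timestamp, tokens in stream:
        counts[timestamp // window].update(tokens)
    shares: Dict[int, Dict[str, float]] = {}
    for idx, counter in counts.items():
        total = sum(counter.values())
        shares[idx] = {tok: n / total for tok, n in counter.items()}
    return shares


def rising_features(
    shares: Dict[int, Dict[str, float]], min_gain: float = 0.0
) -> List[Tuple[str, float]]:
    """Return (feature, gain) pairs whose share grew by more than min_gain
    between the two most recent windows, largest gain first."""
    idxs = sorted(shares)
    if len(idxs) < 2:
        return []  # a trend needs at least two windows to compare
    prev, last = shares[idxs[-2]], shares[idxs[-1]]
    gains = {tok: s - prev.get(tok, 0.0) for tok, s in last.items()}
    return sorted(
        ((tok, g) for tok, g in gains.items() if g > min_gain),
        key=lambda item: (-item[1], item[0]),
    )


# Illustrative (made-up) product descriptions, already split into features.
stream = [
    (0, ["lace", "dress"]),
    (1, ["denim", "jacket"]),
    (7, ["lace", "dress"]),
    (8, ["lace", "skirt"]),
]
trending = rising_features(window_shares(stream, window=7))
```

In this toy stream, `lace` and `skirt` gain share in the second window while `dress` stays flat. A real-time variant would update the per-window counters incrementally as records arrive rather than re-reading the whole stream, and would need a proper word segmenter for Chinese text.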

CLC Number:

  • TP311