Computer Science ›› 2011, Vol. 38 ›› Issue (12): 182-186.
Abstract: Mining user identity information from emails is an important research topic in data mining. Most approaches extract users' names only from email headers, but names appearing in email bodies are usually more suitable for representing the sender's or recipient's identity. This paper focuses on extracting users' name aliases from the bodies of plain-text emails. First, to effectively locate the salutation and signature blocks in email bodies, a locating algorithm that combines statistical methods with rule-based constraints is proposed. Then, to extract all valid aliases from the salutation and signature lines, a novel approach is proposed based on a name boundary word template built from the characteristics of the words neighboring aliases; the template can verify and correct aliases identified by named entity recognition or part-of-speech tagging tools. Results on the Enron corpus indicate that the proposed approaches can efficiently and automatically extract users' aliases from email bodies.
Key words: Entity resolution, Email body, Alias extraction, Salutation and signature block locating, Name boundary word template
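
The two-step idea in the abstract (locate the salutation and signature lines, then use neighboring boundary words to pick out aliases) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract gives neither the statistical features nor the actual template, so the word lists `SALUTATION_WORDS`, `CLOSING_WORDS`, and `OTHER_BOUNDARY_WORDS` and the functions `locate_blocks` and `extract_aliases` are hypothetical stand-ins chosen only for illustration.

```python
import re

# Hypothetical boundary words; the paper's real template is not given in the abstract.
SALUTATION_WORDS = {"hi", "hello", "dear", "hey"}
CLOSING_WORDS = {"thanks", "regards", "best", "cheers", "sincerely"}
OTHER_BOUNDARY_WORDS = {"and", "all", "thank", "you"}


def first_word(line):
    """Return the lowercased first word of a line, without punctuation."""
    return re.split(r"[\s,:!.]+", line.lower())[0]


def locate_blocks(body):
    """Rule-only stand-in for block locating.

    Returns (salutation_line_or_None, signature_lines).
    """
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    salutation = None
    if lines and first_word(lines[0]) in SALUTATION_WORDS:
        salutation = lines[0]
    signature = []
    # Scan upward from the end for the last line that opens with a closing word.
    for i in range(len(lines) - 1, -1, -1):
        if first_word(lines[i]) in CLOSING_WORDS:
            signature = lines[i:]
            break
    return salutation, signature


def extract_aliases(line):
    """Keep capitalized tokens that are not boundary words."""
    boundary = SALUTATION_WORDS | CLOSING_WORDS | OTHER_BOUNDARY_WORDS
    tokens = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", line)
    return [t for t in tokens if t[0].isupper() and t.lower() not in boundary]


body = "Hi John and Mary,\n\nPlease review the attached report.\n\nThanks,\nKen\n"
salutation, signature = locate_blocks(body)
recipient_aliases = extract_aliases(salutation) if salutation else []
sender_aliases = [a for ln in signature for a in extract_aliases(ln)]
```

For the sample message, the salutation line yields the recipient aliases and the lines after the closing word yield the sender alias. A real system would replace the capitalization check with output from a named entity recognition or part-of-speech tagging tool and use the boundary words only to verify and correct that output, as the abstract describes.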
URL: https://www.jsjkx.com/EN/
https://www.jsjkx.com/EN/Y2011/V38/I12/182