Junk E-mail Filtering Method Based on Data Mining
Abstract: The authors analyze the essence of existing algorithms for generating junk e-mail filtering rules and give a method for representing an e-mail as a transaction. Once the junk e-mail training set has been expressed as a set of transactions, frequent feature sets can be mined from it with the FP-Tree algorithm. Finally, the paper discusses how to express a frequent feature set as a rule and how to apply such rules.
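The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It makes three assumptions: an e-mail's features are its distinct lowercase word tokens, a rule fires when a message contains every feature of a frequent feature set, and a brute-force counter stands in for the FP-Tree algorithm. On a toy training set the brute-force counter finds the same frequent sets, but it does not scale the way FP-Tree does. The names `to_transaction`, `frequent_feature_sets`, and `is_junk` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def to_transaction(text):
    """Represent an e-mail as a transaction: its set of distinct word tokens.

    Assumption: features are lowercase alphanumeric tokens.
    """
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def frequent_feature_sets(transactions, min_support, max_size=3):
    """Return feature sets contained in at least `min_support` transactions.

    Brute-force stand-in for FP-Tree mining: every combination of up to
    `max_size` features is counted directly, which is fine for toy data only.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {fs: n for fs, n in counts.items() if n >= min_support}


def is_junk(text, rules):
    """Assumed rule semantics: a rule fires if all of its features occur."""
    transaction = to_transaction(text)
    return any(rule <= transaction for rule in rules)


# Toy junk training set (made up for illustration).
training_set = [
    "Win free money now",
    "Free money offer",
    "Claim your free prize now",
]
transactions = [to_transaction(message) for message in training_set]
frequent = frequent_feature_sets(transactions, min_support=2)

# Keep only multi-feature sets as rules; single words are too unspecific here.
rules = [fs for fs in frequent if len(fs) >= 2]

print(sorted(sorted(rule) for rule in rules))
print(is_junk("Get free money today", rules))
print(is_junk("Now is the time for the meeting", rules))
```

For a realistic training set, the counting step would be replaced by an actual FP-Tree implementation, and the support threshold and rule-matching policy would need tuning against legitimate mail as well as junk.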