Email Spam Classifier using Naïve Bayes Algorithm

Prepesh Tuladhar
2021
BSc.CSIT
Semester 7

Electronic mail (email) is one of the most convenient, reliable and inexpensive methods of communication, regardless of distance. However, the increasing volume of unsolicited email is degrading productivity. Electronic spamming is the use of electronic messaging systems to send unsolicited messages (spam), especially advertisements, as well as the sending of messages repeatedly on the same website. While the most widely recognized form of spam is email spam, the term is applied to similar abuses in other media: instant messaging spam, Web search engine spam, spam in blogs, wiki spam, mobile phone messaging spam, junk fax transmissions, social spam, spam mobile apps, etc. This project uses Naïve Bayesian spam classification, designed to detect spam emails and separate them from ham (legitimate) emails. A Bayesian network is a representation of probabilistic relationships. The classifier was trained on the Enron dataset, a well-known spam email dataset. This project shows that Bayesian filtering can be implemented simply as a reasonably accurate text classifier, and that it can be modified in ways that significantly affect the accuracy of the filter. A web application was developed that takes an email entered by the user and returns the predicted probability that the email is spam or ham. The output indicates whether the email is classified as spam or ham, based on the training dataset. Experimental results were collected on the Enron dataset, which contains a total of 33,316 emails, both spam and ham. The accuracy obtained using the Naïve Bayes classifier is 98.81%.
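The project's own code is not shown here, so the following is only a minimal sketch of how a Naïve Bayes spam filter of this kind typically works, assuming a multinomial model with Laplace (add-one) smoothing and simple lowercase word tokenization. Each email is scored per class as log P(class) plus the sum of log P(word | class) over its words, and the class with the highest score wins. The class name `NaiveBayesSpamFilter`, the `tokenize` helper and the toy training emails are hypothetical illustrations, not taken from the project or from the Enron dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase the text and keep simple word tokens only.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical multinomial Naive Bayes classifier with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (alpha = 1 is add-one smoothing)

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # number of training emails per class
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior P(c)
            denom = self.total_words[c] + self.alpha * v
            for w in tokenize(text):
                if w not in self.vocab:
                    continue  # skip words never seen during training
                # log likelihood P(w | c) with Laplace smoothing
                score += math.log((self.word_counts[c][w] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict_proba(self, text):
        # Normalise log scores into probabilities (shift by max for numerical stability).
        scores = self.log_scores(text)
        m = max(scores.values())
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        total = sum(exp.values())
        return {c: e / total for c, e in exp.items()}

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data (hypothetical, for illustration only).
train_texts = [
    "win a free prize now",
    "cheap meds free offer",
    "meeting schedule for monday",
    "please review the attached report",
]
train_labels = ["spam", "spam", "ham", "ham"]

clf = NaiveBayesSpamFilter().fit(train_texts, train_labels)
print(clf.predict("free prize offer"))         # spam
print(clf.predict_proba("review the report"))  # ham has the higher probability
```

Working in log space avoids the numerical underflow that multiplying many small word probabilities would cause on long emails, and the smoothing constant keeps a single unseen word-class pair from forcing a class probability to zero. A web front end like the one described above could call `predict_proba` to report the spam/ham probability for a user's email.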

Email Classification
Naïve Bayes Classifier
Enron Dataset
spam
ham
emails
