Email phishing and spam pose considerable cybersecurity risks and call for detection methods that are trustworthy, effective, and feasible to deploy. This work proposes an artificial intelligence (AI) based methodology for detecting spam and phishing emails. The system performs binary classification in two stages. In the first stage, it classifies email content as malicious or non-malicious. In the second stage, it scans embedded URLs, which may or may not be phishing hooks. This modular design reduces the complexity of the feature space and allows the email and URL analyses to be optimized separately. The data comprise 18,650 email samples and 549,346 URL samples from publicly accessible datasets, split 70% for training and 30% for testing. Preprocessing consisted of removing duplicates and null values, normalizing text, balancing classes, stemming, and extracting features with TF-IDF for emails and CountVectorizer for URLs. Four lightweight machine learning (ML) algorithms were evaluated: Naive Bayes, Decision Tree, Random Forest, and K-Nearest Neighbors. Naive Bayes achieved the highest baseline accuracy: 96% in email classification and 97% in URL classification. Random Forest, on the other hand, was more resilient to adversarial attacks and generalized better. The selected model was integrated with Gmail for real-time inbox detection, where it achieved an accuracy of 85% in real-world use. The results demonstrate that effective, scalable detectors for both phishing and spam email can be constructed by combining lightweight machine learning, modular design, and relatively clean preprocessing.
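To make the two-stage design concrete, the following is a minimal, dependency-free sketch and not the implementation evaluated in this work. It makes several assumptions: both stages use a small hand-written multinomial Naive Bayes over raw token counts (in place of TF-IDF for emails, and with stemming omitted), the second stage always runs regardless of the first stage's outcome, and a message is flagged if either stage fires. The names `TinyNaiveBayes`, `scan_email`, `URL_PATTERN`, and the toy training samples are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Assumed, simplified patterns; real URL extraction would need more care.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")
TOKEN_PATTERN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return TOKEN_PATTERN.findall(text.lower())


class TinyNaiveBayes:
    """Multinomial Naive Bayes over raw token counts (0 = benign, 1 = malicious)."""

    def fit(self, texts, labels):
        self.token_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter()
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
            self.doc_counts[label] += 1
        self.vocab = set(self.token_counts[0]) | set(self.token_counts[1])
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for label in (0, 1):
            total = sum(self.token_counts[label].values())
            # Log prior plus Laplace-smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[label] / n_docs)
            for tok in tokens:
                count = self.token_counts[label][tok]
                score += math.log((count + 1) / (total + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


def scan_email(body, email_model, url_model):
    """Stage 1 classifies the message text; stage 2 classifies each embedded URL."""
    content_malicious = email_model.predict(body) == 1
    urls = URL_PATTERN.findall(body)
    phishing_urls = [u for u in urls if url_model.predict(u) == 1]
    return {
        "content_malicious": content_malicious,
        "phishing_urls": phishing_urls,
        # Assumed combination rule: flag the message if either stage fires.
        "flagged": content_malicious or bool(phishing_urls),
    }


# Toy, hand-written samples (hypothetical; not drawn from the paper's datasets).
email_model = TinyNaiveBayes().fit(
    [
        "win a free prize now click here",
        "urgent verify your account password now",
        "meeting agenda for monday attached",
        "lunch tomorrow with the project team",
    ],
    [1, 1, 0, 0],
)
url_model = TinyNaiveBayes().fit(
    [
        "http://secure-login.example-bank.verify-account.test/update",
        "http://free-prize.test/claim/login",
        "https://docs.example.org/guide/install",
        "https://www.example.org/blog/news",
    ],
    [1, 1, 0, 0],
)

report = scan_email(
    "Urgent: verify your password at http://verify-account.test/login now",
    email_model,
    url_model,
)
```

Because the two models are trained and called independently, either one could be replaced (for example, by a Random Forest) without touching the other, which is the practical benefit of the modular design described above.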