A Multi-Modal Phishing Detection System

2026 · 0 citations · Journal Article · Green Open Access

Authors

Amrit Lal P Siva · Federal Ministry of Innovation, Science and Technology

Abstract

Phishing attacks have evolved significantly, employing visual mimicry, semantic deception, and network-level manipulation to bypass traditional detection systems. Conventional approaches based on URL blacklists or single-modal feature analysis often fail against zero-day and dynamically generated phishing pages. This paper presents a multi-modal phishing detection framework that integrates URL lexical features, HTML/DOM structural attributes, visual cues, semantic content, and network-based indicators. Structured features are processed using a stacked ensemble model comprising Logistic Regression, LightGBM, and Linear SVM classifiers. Webpage screenshots are analyzed using a fine-tuned EfficientNet-B0 model to extract visual embeddings, while semantic representations are generated using DeBERTa-v3 Base to identify deceptive language patterns. These heterogeneous features are fused through dense neural layers to produce a final phishing probability score. The system incorporates cost-sensitive learning to address class imbalance and integrates explainability mechanisms, including Grad-CAM visualization and DOM-level feature highlighting. The proposed architecture aims to deliver a scalable, adaptive, and interpretable solution for detecting modern phishing attacks across multiple content modalities.
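The fusion and cost-sensitive steps described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not give layer sizes, weights, or the exact loss, so the two-layer fusion head, the toy parameters in `TOY_PARAMS`, and the use of a positively weighted binary cross-entropy as the cost-sensitive objective are all assumptions made for illustration. The inputs stand in for the stacked-ensemble score, the EfficientNet-B0 visual embedding, and the DeBERTa-v3 text embedding.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relu(z: float) -> float:
    return max(0.0, z)


def dense(x: Sequence[float], weights: List[List[float]],
          bias: List[float], activation) -> List[float]:
    """One fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def fuse(structured_score: float, visual_emb: Sequence[float],
         text_emb: Sequence[float], params: Dict[str, list]) -> float:
    """Concatenate per-modality outputs and map them to a phishing probability."""
    x = [structured_score, *visual_emb, *text_emb]
    hidden = dense(x, params["W1"], params["b1"], relu)
    (prob,) = dense(hidden, params["W2"], params["b2"], sigmoid)
    return prob


def weighted_bce(y_true: int, p: float, pos_weight: float) -> float:
    """Binary cross-entropy that penalizes missed phishing pages (y=1) more.

    pos_weight > 1 is an assumed way to realize cost-sensitive learning.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -(pos_weight * y_true * math.log(p)
             + (1 - y_true) * math.log(1.0 - p))


# Hypothetical toy parameters: 4 inputs -> 2 hidden units -> 1 output.
TOY_PARAMS = {
    "W1": [[0.8, 0.1, 0.4, 0.3], [-0.2, 0.5, 0.1, 0.6]],
    "b1": [0.0, -0.1],
    "W2": [[1.2, 0.7]],
    "b2": [-0.5],
}

if __name__ == "__main__":
    p = fuse(0.9, [0.2, 0.7], [0.5], TOY_PARAMS)
    print("phishing probability:", p)
    print("loss if page is phishing:", weighted_bce(1, p, pos_weight=5.0))
```

In a real system the embeddings would have hundreds of dimensions and the fusion weights would be learned jointly with the loss above; the sketch only shows how the heterogeneous outputs meet in a single probability score.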

Topics & Keywords

Spam and Phishing Detection · Misinformation and Its Impacts · Advanced Malware Detection Techniques

UN Sustainable Development Goals

Peace, Justice and Strong Institutions

Publication Details

Published in: Zenodo (CERN European Organization for Nuclear Research)

DOI: 10.5281/zenodo.19335618

Field-Weighted Citation Impact: 0.00