Cyber attacks are growing more sophisticated, and the privacy concerns raised by data sharing are growing with them; together these call for a decentralized approach. Conventional centralized intrusion detection systems (IDS) threaten data privacy and data sovereignty. Federated learning, by contrast, lets several clients train collaboratively without sharing their sensitive data, making it one of the most promising approaches to studying cyber threats in real time. The proposed framework further strengthens IDS by incorporating cyber threat intelligence (CTI) into the training process, improving detection accuracy while preserving privacy. Each client trains a local random forest model on its own dataset without sharing raw data. Several aggregation methods (FedAvg, FedOPT, FedProx, and FedXGBoost) are then used to combine the local models into a global model, and each is evaluated by accuracy and Cohen’s Kappa score. Experiments were run on the NF-UNSW-NB15-v2 dataset. The local models achieved accuracies of 0.9934–0.9941 with Kappa scores of 0.8088–0.8336, showing strong performance across configurations. Among the aggregation methods, the FedXGBoost-aggregated global model performed best, with an accuracy of 99.22% (Kappa score of 0.8417). Further experiments compared the DFedForest and DFedForest++ models. DFedForest++, which accounts for diversity among local models alongside validation accuracy, achieved 99.76% accuracy, surpassing DFedForest (with 71% accuracy in local models). The framework operationalizes CTI through feature augmentation: three CTI-derived features (is_known_malicious_ip, is_suspicious_port, and ttp_match_score, drawn from MITRE ATT&CK v14 and AlienVault OTX) are appended to each NetFlow record locally at each client before federated training begins.
These results highlight the advantages of federated learning in providing collaborative, privacy-preserving solutions for cyber threat detection and emphasize the potential of CTI integration for improving the accuracy and robustness of IDS models across decentralized environments.
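The client-side feature augmentation step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The three output feature names come from the text; everything else is an assumption made for illustration: the NetFlow field names (`IPV4_SRC_ADDR`, `IPV4_DST_ADDR`, `L4_DST_PORT`), the contents of the lookup sets, and in particular the way `ttp_match_score` is computed, which the text does not specify. Here it is taken to be the fraction of locally stored technique indicators that a flow's destination port matches.

```python
from typing import Dict, List, Set

# Hypothetical CTI snapshot held locally by one client. In practice these
# would be built from threat feeds; the values below are placeholders
# (the IPs are from reserved documentation ranges).
MALICIOUS_IPS: Set[str] = {"203.0.113.7", "198.51.100.23"}
SUSPICIOUS_PORTS: Set[int] = {23, 4444, 31337}

# Hypothetical mapping from a technique ID to destination ports treated as
# indicators of that technique. The mapping itself is an assumption.
TTP_PORT_INDICATORS: Dict[str, Set[int]] = {
    "T1021": {22, 445, 3389},
    "T1071": {53, 80, 443},
}


def ttp_match_score(record: dict) -> float:
    """Assumed scoring rule: share of techniques whose indicators match the flow."""
    port = int(record["L4_DST_PORT"])
    hits = sum(1 for ports in TTP_PORT_INDICATORS.values() if port in ports)
    return hits / len(TTP_PORT_INDICATORS)


def augment_record(record: dict) -> dict:
    """Return a copy of one NetFlow record with the three CTI features appended."""
    out = dict(record)  # copy so the raw record is left unchanged
    ips = {record["IPV4_SRC_ADDR"], record["IPV4_DST_ADDR"]}
    out["is_known_malicious_ip"] = int(bool(ips & MALICIOUS_IPS))
    out["is_suspicious_port"] = int(int(record["L4_DST_PORT"]) in SUSPICIOUS_PORTS)
    out["ttp_match_score"] = ttp_match_score(record)
    return out


def augment_flows(records: List[dict]) -> List[dict]:
    """Augment every local record; nothing leaves the client in this step."""
    return [augment_record(r) for r in records]


if __name__ == "__main__":
    flows = [
        {"IPV4_SRC_ADDR": "203.0.113.7", "IPV4_DST_ADDR": "10.0.0.5", "L4_DST_PORT": 22},
        {"IPV4_SRC_ADDR": "10.0.0.9", "IPV4_DST_ADDR": "10.0.0.5", "L4_DST_PORT": 4444},
    ]
    for row in augment_flows(flows):
        print(row)
```

Because the lookups run entirely against data the client already holds, the augmented records can feed local random forest training without any raw flow being shared, which is the property the framework relies on.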