Malware continues to evolve in ways that reduce the effectiveness of signature matching and evade analysis environments, creating demand for early detection methods that can flag suspicious binaries before any runtime behavior is observed. This article surveys information-theoretic approaches to early malware detection, focusing on the progression from classical uncertainty measures to algorithmic notions of complexity. We first discuss Shannon entropy as a lightweight indicator of packing, encryption, and obfuscation, and explain how entropy computed over whole files, executable sections, or sliding windows can localize anomalous regions in Portable Executable (PE) binaries. We then examine algorithmic complexity through the lens of Kolmogorov complexity, which is uncomputable, and outline practical approximations based on general-purpose compressors. Compression-based measures estimate structural regularities in binaries that frequency statistics alone do not capture. Building on this idea, we introduce normalized compression distance as a featureless similarity measure that enables clustering and nearest-neighbor-style detection without handcrafted features. The survey highlights how entropy, compressibility, and compression-based similarity can be combined into hybrid pipelines that support triage, prioritization, and explainable inspection, while also noting key limitations. High entropy is not unique to malware: it also arises in legitimate packed installers, multimedia resources, and encrypted payloads, so entropy used in isolation leads to false alarms. Compression-based methods can be computationally demanding and are sensitive to file size, compressor choice, and adversarial manipulation. By synthesizing these techniques and their practical considerations, this article provides guidance for designing robust early-warning detectors and for integrating information-theoretic signals with complementary static and learning-based components in operational settings.
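To make the two core measures concrete, the following is a minimal Python sketch rather than the article's implementation. It computes byte-level Shannon entropy over a whole buffer and over sliding windows, and it approximates normalized compression distance with the standard formula NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The function names, the 256-byte window, the 128-byte step, and the choice of `zlib` as the compressor are all assumptions made for illustration.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # H = sum over byte values of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def window_entropies(data: bytes, window: int = 256, step: int = 128):
    """Yield (offset, entropy) for each sliding window over data.

    The window and step sizes are illustrative defaults, not values
    taken from the article.
    """
    last_start = max(len(data) - window, 0)
    for offset in range(0, last_start + 1, step):
        yield offset, shannon_entropy(data[offset:offset + window])


def compressed_size(data: bytes) -> int:
    """Compressed length under zlib, used as a stand-in for C(x)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance approximated with zlib."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Synthetic buffer: a constant region followed by a region in which
    # every byte value is equally frequent (maximal byte entropy).
    sample = b"\x00" * 1024 + bytes(range(256)) * 4
    for offset, h in window_entropies(sample):
        print(f"offset {offset:5d}  entropy {h:.2f} bits/byte")
```

Windows with entropy close to 8 bits per byte are candidates for packed or encrypted content, but, as the abstract cautions, that signal alone does not indicate malice. The NCD helper also inherits the limits of its compressor: DEFLATE, which `zlib` implements, uses a 32 KiB history window, so for larger inputs the concatenated stream cannot reuse material from the first input, and a compressor with a larger window would be a more suitable choice.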
Published in: Asian Journal of Research in Computer Science
Volume 19, Issue 3, pp. 24-36