We examine the hypothesis that the decision boundary between malware and non-malware is fractal. We introduce a novel encoding method, derived from text mining, that first converts disassembled programs into opstrings and then filters these into a reduced opcode alphabet. The opcodes are enumerated and encoded as floating-point numbers, which are used to characterize the frequency of occurrence and distribution properties of malware functions for comparison with non-malware functions. We use the concept of invariant moments to characterize the highly non-Gaussian structure of the opcode distributions. We then derive Data Model based classifiers from the identified features, and we interpolate and extrapolate the parameter sample space of the derived Data Models in order to examine the nature of the classification boundary in parameter space between families of malware and the general non-malware category. Preliminary results strongly support the fractal boundary hypothesis. A summary of our methods and results is presented here.
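As a rough illustration of the encoding step described above, the following Python sketch filters an opstring to a reduced opcode alphabet, maps each opcode to a floating-point value, and computes relative frequencies and standardized central moments. It is not the authors' implementation. The alphabet `REDUCED_ALPHABET`, the integer-to-float enumeration, and all function names are assumptions made for this example, and the standardized moments (skewness and kurtosis) are only a simple stand-in for the invariant moments mentioned in the abstract, whose exact definition is not given here.

```python
from collections import Counter

# Hypothetical reduced opcode alphabet (assumption: the real one is not given here).
REDUCED_ALPHABET = ("mov", "push", "pop", "call", "jmp", "add", "xor", "ret")

# Assumed enumeration: each retained opcode maps to a distinct float 1.0, 2.0, ...
OPCODE_TO_VALUE = {op: float(i + 1) for i, op in enumerate(REDUCED_ALPHABET)}


def filter_opstring(opstring):
    """Split a whitespace-separated opstring and keep only opcodes in the alphabet."""
    return [op for op in opstring.lower().split() if op in OPCODE_TO_VALUE]


def encode_opstring(opstring):
    """Encode the filtered opcodes as a list of floating-point values."""
    return [OPCODE_TO_VALUE[op] for op in filter_opstring(opstring)]


def opcode_frequencies(opstring):
    """Relative frequency of each alphabet opcode within one function's opstring."""
    counts = Counter(filter_opstring(opstring))
    total = sum(counts.values())
    return {op: (counts[op] / total if total else 0.0) for op in REDUCED_ALPHABET}


def standardized_moments(values, orders=(3, 4)):
    """Standardized central moments (order 3 = skewness, order 4 = kurtosis).

    These are invariant to shifting and scaling the data, and they are used
    here only as a stand-in for the paper's invariant moments.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one encoded opcode")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0.0:
        # A constant sequence has no spread, so report zero for every order.
        return {k: 0.0 for k in orders}
    std = variance ** 0.5
    return {k: sum(((v - mean) / std) ** k for v in values) / n for k in orders}


if __name__ == "__main__":
    sample = "push mov nop call add xor ret"  # "nop" is dropped by the filter
    encoded = encode_opstring(sample)
    print(encoded)
    print(opcode_frequencies(sample))
    print(standardized_moments(encoded))
```

The resulting frequency and moment values could serve as a feature vector for one function. How such features feed the Data Model classifiers is outside what this sketch covers.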
Published in: Proceedings of SPIE, the International Society for Optical Engineering
DOI: 10.1117/12.941769