Salvatore J. Stolfo, Ke Wang, Wei-Jen Li, Columbia University
Malcode can be easily hidden in document files and embedded in application executables. We demonstrate the feasibility of such stealthy malcode insertion in several experiments using a standard COTS Anti-Virus (AV) scanner. In the case of zero-day malicious exploit code, signature-based AV scanners would fail to detect the malcode even if the scanner knew where to look. We propose statistical analysis of binary file content to detect anomalous file segments that may suggest infection by malcode. Experiments are performed to determine whether n-gram analysis can provide useful evidence that a file is infected, so that the file can then be subjected to further scrutiny. Our goal is to develop an efficient means of detecting suspect infected files, applicable to online network communication or to scanning a large store of collected information, such as a data warehouse of shared documents.
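To make the idea of n-gram analysis of binary content concrete, the following is a minimal sketch for the simplest case, n = 1 (byte-value frequencies). It splits a file into fixed-size segments and flags those whose byte distribution differs strongly from that of the file as a whole. The function names, the segment size, the Manhattan distance, and the threshold are all illustrative assumptions and are not taken from the experiments described here.

```python
from collections import Counter


def byte_distribution(data: bytes) -> list:
    """Relative frequency of each of the 256 byte values (1-gram distribution)."""
    if not data:
        return [0.0] * 256
    counts = Counter(data)  # iterating over bytes yields ints in 0..255
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


def manhattan_distance(p, q) -> float:
    """Sum of absolute differences between two distributions (range 0..2)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def suspicious_segments(data: bytes, segment_size: int = 1024,
                        threshold: float = 1.0) -> list:
    """Return (offset, distance) for segments that deviate from the whole file.

    Assumption: the whole-file distribution serves as the reference model.
    A trained model of normal files of the same type could be used instead.
    """
    reference = byte_distribution(data)
    flagged = []
    for offset in range(0, len(data), segment_size):
        segment = data[offset:offset + segment_size]
        distance = manhattan_distance(byte_distribution(segment), reference)
        if distance > threshold:
            flagged.append((offset, distance))
    return flagged
```

For example, calling `suspicious_segments` on a buffer of repeated `A` bytes with one 1024-byte block of uniformly distributed byte values in the middle flags only that block. A flagged segment is evidence for further scrutiny, not proof of infection.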