Kaspersky Lab today announces that it has successfully patented a system and method for the efficient and exact comparison of software elements. Patent №8499167, granted by the United States Patent and Trademark Office, describes a method that determines the degree of similarity between files more efficiently. The method also detects malicious objects that have been modified in an attempt to bypass the detection mechanisms in security software.
Kaspersky Lab experts detect about 200,000 new malware samples every day; a year ago that figure was 125,000. With the volume of malware rising so rapidly, the ability to detect new threats quickly and accurately is increasingly important to antivirus vendors.
One way to check whether a file is malicious is to compare it with an existing collection of malicious objects. If the comparison shows that the new file is very similar to one or more files in the collection, in most cases that file turns out to be malicious. Comparing the incoming stream of new files with a collection of known samples helps to combat the huge number of malicious programs that appear every day. However, the comparison mechanism itself is not ideal.
Two files can be compared using knowledge of their structure; this method is widely used in the antivirus industry. However, cybercriminals often litter their files with random data. This changes the file structure, so the files no longer show any similarity to the malware samples in existing collections.
Emulation is also often used to determine a file’s functions and to check whether there are any similar files in the malware collection. In this approach the file is run in a virtual environment, where information about its behavior is collected. This data is then compared with existing information on malware behavior. If there are similarities, the file is considered malicious. However, emulation is a resource-intensive and comparatively slow process. Moreover, some files can recognise when they are being launched in a virtual environment and stop functioning.
The experts at Kaspersky Lab took all these characteristics into account when developing the new file comparison technology.
The Strings Theory
The newly patented technology is based on the idea that a file’s functionality can be determined by analysing the strings it contains, even before the file is executed. Strings such as file names, registry keys and web links provide information on what the file will do when it runs in the operating system; together they form a ‘synopsis’ of the file. Historically, practical implementation of this idea ran into several problems. Which strings in a file should be analysed? How do you identify those which indicate that a file has malicious functionality? How should the search be performed to produce results in reasonable time and with minimal consumption of resources?
The technology patented by Kaspersky Lab describes an algorithm for comparing files string by string to determine how similar their functionality is. When the antivirus lab receives an unknown file, a special program extracts the strings it contains, filters out those that a set of rules marks as irrelevant, and compares the remaining strings with those of a collection of malicious files that have been analysed in the same way.
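The three steps described above (extract strings, filter them by rules, compare against a collection) can be illustrated with a minimal Python sketch. It is not the patented algorithm, whose rules and similarity measure are not given here. The relevance rule `RELEVANT`, the helper names and the use of Jaccard similarity on string sets are all assumptions made purely for illustration.

```python
import re

# Hypothetical relevance rule (an assumption, not the patent's rule set):
# keep strings that look like URLs, registry keys, paths or executable names.
RELEVANT = re.compile(rb"https?://|hkey_|\\|\.exe|\.dll", re.IGNORECASE)


def extract_strings(data: bytes, min_len: int = 5) -> set[bytes]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    return set(re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data))


def synopsis(data: bytes) -> set[bytes]:
    """Extract strings, drop irrelevant ones, and normalise case."""
    return {s.lower() for s in extract_strings(data) if RELEVANT.search(s)}


def similarity(a: set[bytes], b: set[bytes]) -> float:
    """Jaccard similarity of two string sets (an illustrative stand-in metric)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(unknown: bytes, collection: dict[str, set[bytes]]):
    """Return (name, score) of the most similar known sample, or (None, 0.0)."""
    syn = synopsis(unknown)
    scored = ((name, similarity(syn, known)) for name, known in collection.items())
    return max(scored, key=lambda pair: pair[1], default=(None, 0.0))
```

In this sketch, padding a file with random printable junk does not change its synopsis, because the junk strings fail the relevance rule and are discarded before comparison. A production system would need far more careful rules and an index that scales to a very large collection.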
“What sets our technology apart is that it can quickly compare file ‘synopses’ to an enormous malware database and ‘knows’ where in the file to look for the key elements, an analysis of which can provide an insight into the file’s functionality,” commented Alexey Malanov, a Kaspersky Lab malware expert and developer of the newly patented technology.
Although the technology was not patented until recently, Kaspersky Lab experts have been using it to detect new malware samples for a long time.