Classification of malware using clustering that orders events in accordance with the time of occurance

التفاصيل البيبلوغرافية
العنوان: Classification of malware using clustering that orders events in accordance with the time of occurance
Patent Number: 7,809,670
تاريخ النشر: October 05, 2010
Appl. No: 11/608625
Application Filed: December 08, 2006
مستخلص: The present invention is directed to a method and system for automatically classifying an application into an application group which is previously classified in a knowledge base. More specifically, a runtime behavior of an application is captured as a series of events which are monitored and recorded during the execution of the application. The series of events are analyzed to find a proper application group which shares common runtime behavior patterns with the application. The knowledge base of application groups is previously constructed based on a large number of sample applications. The construction of the knowledge base is done in such a manner that each sample application can be classified into application groups based on a set of classification rules in the knowledge base. The set of classification rules are applied to a new application in order to classify the new application into one of the application groups.
Inventors: Lee, Tony (San Francisco, CA, US); Mody, Jigar J. (Bellevue, WA, US); Lin, Ying Lena (Sammamish, WA, US); Marinescu, Adrian M. (Sammamish, WA, US); Polyakov, Alexey A. (Sammamish, WA, US)
Assignees: Microsoft Corporation (Redmond, WA, US)
Claim: 1. A computer system for an application group classification, the computer system comprising: one or more databases for including a plurality of application groups and a set of application classifying rules, wherein each application group has been classified based on the set of application classifying rules and includes one or more member applications sharing a set of common behavior patterns according to an event sequence, wherein each event includes information about the event's action including event object information, event subject information, action parameters and status of the action; and a computing device, in communication with the one or more databases, that performs the following: receives a request to classify an application into a corresponding application group among the plurality of application groups; obtains an event sequence according to a predefined specific set of events that identify two or more specified malware features, the predefined events including at least one of “registry open,” “registry write,” “registry query,” “file open,” “file write,” “virtual memory allocation,” “virtual memory write,” “network access,” and “process open,” wherein the event sequence is collected while the application is being executed, and wherein the collected events are ordered according to the time of the occurrence of program actions and environment state transitions; determines the corresponding application group for the application by applying the set of application classifying rules to the obtained event sequence; and provides information about the determined application group as a response to the request.
Claim: 2. The computer system of claim 1 , wherein sample applications are classified into the plurality of application groups through a clustering process.
Claim: 3. The computer system of claim 1 , wherein the computing device adds the application into the determined application group as a member application.
Claim: 4. The computer system of claim 1 , wherein the set of application classifying rules calculate and compare a degree of similarity between the application and each application group to determine the application group for the application.
Claim: 5. The computer system of claim 1 , wherein each member of the corresponding application group has an event sequence similar to the obtained event sequence.
Claim: 6. The computer system of claim 5 , wherein the degree of similarity is measured via an event edit cost analysis.
Claim: 7. A computer implemented method for classifying an application into an application group based on behavior patterns of the application, the computer implemented method comprising: collecting an event sequence of an application; obtaining a set of application groups, wherein each application group includes one or more member applications sharing a set of common behavior patterns according to an event sequence, wherein each event includes information about the event's action including event object information, event subject information, action parameters and status of the action; determining an application group corresponding to the application based on the collected event sequence, wherein the event sequence is obtained according to a predefined specific set of events that identify two or more specified malware features, the predefined events including at least one of “registry open,” “registry write,” “registry query,” “file open,” “file write,” “virtual memory allocation,” “virtual memory write,” “network access,” and “process open,” wherein the event sequence is collected while the application is being executed, and wherein the collected events are ordered according to the time of the occurrence of program actions and environment state transitions; and updating the determined application group to include the application; and providing information about the determined application group.
Claim: 8. The computer implemented method of claim 7 , wherein the set of common behavior patterns is represented by a representative event sequence.
Claim: 9. The computer implemented method of claim 8 , wherein determining an application group corresponding to the application includes calculating a similarity distance between the application and each application group; and wherein the similarity distance is calculated by comparing the representative event sequence of each application group and the collected event sequence.
Claim: 10. The computer implemented method of claim 9 , further comprising: identifying an application group which has a closest similarity distance from the application, wherein the identified application group is selected to be the application group corresponding to the application.
Claim: 11. The computer implemented method of claim 10 , further comprising: if no application group is determined to correspond to the application, determining whether a new application group needs to be created; and if a new application group needs to be created, creating the new application group to include the application.
Claim: 12. The computer implemented method of claim 10 , wherein when the similarity distances between the application and the set of application groups exceed a threshold, no application group is determined to correspond to the application.
Claim: 13. The computer implemented method of claim 11 , further comprising: if a new group needs not to be created, providing information indicating that no application group is determined for the application.
Claim: 14. The computer implemented method of claim 10 , further comprising: if more than one application groups are determined to have the closest similarity distance, reorganizing the set of application groups so that only one application group has the closest similarity distance from the application.
Claim: 15. The computer implemented method of claim 12 , wherein the similarity distance is calculated via a distance measure which defines a similarity distance among event sequences.
Claim: 16. An application classification system for automatically classifying an application in response to a classification request, the application classification system comprising: a knowledge base component for providing information about a plurality of application classifications and a set of classification rules; an event sequence component for collecting an runtime event sequence of an application in response to a classification request, wherein each event includes information about the event's action including event object information, event subject information, action parameters and status of the action, and wherein the event sequence is obtained according to a predefined specific set of events that identify two or more specified malware features, the predefined events including at least one of “registry open,” “registry write,” “registry query,” “file open,” “file write,” “virtual memory allocation,” “virtual memory write,” “network access,” and “process open,” wherein the event sequence is collected while the application is being executed, and wherein the collected events are ordered according to the time of the occurrence of program actions and environment state transitions; and a classification component for identifying an application classification in which the application is to be classified based on the collected runtime event sequence and for adding the application to the identified application classification, wherein the identified application classification has a reprehensive event sequence similar to the collected runtime event sequence; and wherein the information about the determined application group includes an indication that the application is a variant of a known malware application.
Claim: 17. The application classification system of claim 16 , wherein the set of classification rules are defined for an enterprise system.
Claim: 18. The application classification system of claim 17 , wherein the knowledge base component builds a local knowledge base representing the information about a plurality of new application classifications based on the set of classification rules.
Claim: 19. The application classification system of claim 18 , wherein the local knowledge base is utilized to determine the application has runtime behaviors which are allowed within the enterprise system.
Claim: 20. The application classification system of claim 16 , wherein the identified application classification is provided as a response to the classification request.
Current U.S. Class: 706/59
Patent References Cited: 7574409 August 2009 Patinkin
2004/0111708 June 2004 Calder et al.
2004/0199827 October 2004 Muttik et al.
2006/0212942 September 2006 Barford et al.
2006/0282425 December 2006 Aggarwal et al.
2009/0276383 November 2009 Salahshour et al.



Other References: Wei et al., Profiling and Clustering Internet Hosts, 2006, International Conference on Data Mining, pp. 1-7. cited by examiner
Yang et al., A Study on Retrospective and On-line Event Detection, 1998, ACM 1-58113-015-5. cited by examiner
Guan et al., Y-Means: A clustering method for intrusion detection, 2003, CCECE 2003 CCGEI, MOntreal, IEEE. cited by examiner
Lee, T., and J.J. Mody, “Behavioral Classification,” Proceedings of the European Institute for Computer Antivirus Research Conference, Hamburg, Germany, Apr. 29-May 3, 2006, pp. 1-17. cited by other
Primary Examiner: Vincent, David R
Attorney, Agent or Firm: Workman Nydegger
رقم الانضمام: edspgr.07809670
قاعدة البيانات: USPTO Patent Grants