Topology-based reasoning apparatus for root-cause analysis of network faults

Bibliographic Details
Title: Topology-based reasoning apparatus for root-cause analysis of network faults
Patent Number: 7,043,661
Publication Date: May 09, 2006
Appl. No: 09/978846
Application Filed: October 17, 2001
Abstract: A root cause analysis system operative in conjunction with fault management apparatus, the system including a topology-based reasoning system (TRS) operative to topologically analyze alarms in order to identify at least one root cause thereof.
Inventors: Valadarsky, Doron (Hod Hasharon, IL); Pridor, Adir (Beer Sheva, IL); Kaftan, Nir (Hod Hasharon, IL); Mallal, Shlomo (Jerusalem, IL); Pavlenko, Alexander (Bat Yam, IL); Lifshitz, Yuval (Menorah, IL); Bregman, Lev (Beer Sheva, IL); Virtser, Alexander (Tel Aviv, IL); Kouperchimdt, Igor (Petach Tikva, IL)
Assignees: TTI-Team Telecom International Ltd. (Petach Tikva, IL)
Claim: 1. A root cause analysis system operative in conjunction with fault management apparatus, the system comprising: a topology-based reasoning system (TRS) operative to topologically analyze alarms in order to identify at least one root cause thereof, wherein conditional probabilities derived from assuming that various probable causes are the root cause, are compared, in order to select one of the probable causes as the root cause.
Claim: 2. A system according to claim 1 wherein said probable causes comprise faults in a network.
Claim: 3. A system according to claim 1 wherein said probable causes comprise faults in a data communications network.
Claim: 4. A system according to claim 1 and also comprising fault management apparatus.
Claim: 5. A system according to claim 1 and also comprising a rule-based expert system (ES) operative to analyze alarms in order to identify root causes thereof.
Claim: 6. A system according to claim 1 wherein the operation of TRS is at least partly rule-based.
Claim: 7. A system according to claim 6 wherein at least one rule governing operation of the TRS can be changed without disrupting TRS operation.
Claim: 8. A system according to claim 1 wherein the operation of TRS is at least partly relation-based.
Claim: 9. A system according to claim 1 wherein the TRS is operative to represent the topology of a network being analyzed for faults as a graph.
Claim: 10. A system according to claim 9 wherein said graph is held in memory.
Claim: 11. A system according to claim 10 wherein said memory comprises random access memory.
Claim: 12. A system according to claim 9 wherein said TRS is operative to traverse said graph in order to find the root cause and in order to find all the alarms that belong to a root cause.
Claim: 13. A system according to claim 1 and also comprising a large network whose topology is being analyzed.
Claim: 14. A system according to claim 13 wherein said large network comprises a national network.
Claim: 15. A system according to claim 13 wherein said large network comprises a network operated by PTT (post-telecom and telegraph).
Claim: 16. A system according to claim 1 wherein said TRS is operative to identify root causes for more than one network fault simultaneously.
Claim: 17. A system according to claim 1 wherein said TRS is operative to check all probable causes before selecting a root-cause.
Claim: 18. A system according to claim 1 wherein said TRS is operative to make at least one root-cause decision based at least partly on the following parameter: the distance between the location in the network of the suspected root-cause and the point of origin of each alarm related to it.
Claim: 19. A system according to claim 1 wherein said TRS is operative to make at least one root-cause decision based at least partly on the following parameter: the amount of alarms in the incoming group of alarms that are explained by that root-cause.
Claim: 20. A system according to claim 1 wherein said TRS is operative to make at least one root-cause decision based at least partly on the following parameter: the amount of alarms received out of all the alarms that system expects for that root cause.
Claim: 21. A system according to claim 1 wherein said TRS is operative to anticipate at least one expected alarm associated with at least one respective fault type, and wherein said TRS is capable of deciding that at least one particular fault type has occurred, even if less than all of the expected alarms associated with said at least one particular fault type have actually occurred within an incoming alarm stream.
Claim: 22. A system according to claim 1 wherein said TRS is operative to identify at least one incoming alarm generated by at least one network maintenance activity.
Claim: 23. A system according to claim 1 wherein said TRS is application independent.
Claim: 24. A system according to claim 1 wherein said probable causes comprise faults in a telecommunications network.
Claim: 25. A system according to claim 1 wherein said TRS is operative to analyze at least one incoming alarm to determine a network element associated therewith.
Claim: 26. A system according to claim 25 wherein said TRS comprises a table storing the manufacturer of each network element and wherein, upon encountering an incoming alarm associated with a network element, said TRS is operative to look up the manufacturer associated with said alarm.
Claim: 27. A system according to claim 25 wherein said network element comprises a logical object.
Claim: 28. A system according to claim 25 wherein said network element comprises a physical object.
Claim: 29. A system according to claim 1 wherein said TRS is operative to identify a first root cause of at least one alarm and subsequently identify a second root cause based on at least one additional alarm which has arrived subsequently.
Claim: 30. A system according to claim 1 wherein the TRS is operative to represent, using a graph, the topology of an expected alarm flow pattern through a network being analyzed for faults.
Claim: 31. A system according to claim 1 wherein said TRS is operative to cluster incoming alarms into groups based at least partly on the alarms' time of arrival.
Claim: 32. A system according to claim 1 wherein said TRS is operative to utilize knowledge regarding a bell-shaped distribution of alarms associated with each fault, in order to cluster alarms into groups each associated with a different fault.
Claim: 33. A system according to claim 1 wherein said TRS is operative to cluster incoming alarms into groups based at least partly on the topologic distance between the alarms.
Claim: 34. A system according to claim 1 wherein said TRS is operative to update network topology data on-line, without disruption to TRS operation.
Claim: 35. A system according to claim 1 and also comprising a rule definition GUI operative to provide a user with a set of rule component options presented in natural language from which a rule can be composed.
Claim: 36. A system according to claim 1 wherein said TRS automatically adjusts to changing network topology.
Claim: 37. A system according to claim 1 wherein said TRS is part of a network control system.
Claim: 38. A system according to claim 1 and also comprising an event stream including a sequence of events, less than all of which are deemed alarms, and wherein said TRS can be connected to the event stream directly.
Claim: 39. A system according to claim 1 wherein said TRS imitates the flow of alarms in the network, thereby allowing anyone with good knowledge of the alarm flow in a given network type to generate the rules for that network type.
Claim: 40. A system according to claim 1 wherein said TRS issues ‘derived’ alarms to describe root causes when no incoming alarm accurately does so.
Claim: 41. A system according to claim 1 wherein said TRS results are sent to a fault management GUI without requiring the operator to look at a separate correlation screen.
Claim: 42. A system according to claim 1 wherein said TRS stores its decision results in a history database, thereby allowing users to review the decision results and associated alarm groups, after the faults that generated these decisions and alarm groups have already been resolved.
Claim: 43. A system according to claim 1 wherein said TRS comprises a rule set.
Claim: 44. A root cause analysis method operative in conjunction with fault management apparatus, the method comprising: performing topology-based reasoning including topologically analyzing alarms in order to identify at least one root cause thereof; and adapting said topology-based reasoning for use with a different type of network by adding at least one rule to an existing rule set employed in the course of said performing topology-based reasoning.
Claim: 45. A method according to claim 44 and also comprising providing an output indication of at least one identified root cause.
Claim: 46. A system according to claim 45 wherein said rule set comprises a first rule associated with a first network element manufacturer and a second rule associated with a second network element manufacturer.
Claim: 47. A root cause analysis system operative in conjunction with fault management apparatus, the system comprising: a topology-based reasoning system (TRS) operative to topologically analyze alarms in order to identify at least one root cause thereof, wherein said TRS is operative to make at least one root-cause decision based at least partly on the following parameter: the distance between the location in the network of the suspected root-cause and the point of origin of each alarm related to it.
Claim: 48. A root cause analysis system operative in conjunction with fault management apparatus, the system comprising: a topology-based reasoning system (TRS) operative to topologically analyze alarms in order to identify at least one root cause thereof, wherein said TRS is operative to make at least one root-cause decision based at least partly on the following parameter: the amount of alarms received out of all the alarms that system expects for that root cause.
Claim: 49. A root cause analysis system operative in conjunction with fault management apparatus, the system comprising: a topology-based reasoning system (TRS) operative to topologically analyze alarms in order to identify at least one root cause thereof, wherein said TRS is operative to identify at least one incoming alarm generated by at least one network maintenance activity.
Claim: 50. A root cause analysis system operative in conjunction with fault management apparatus, the system comprising: a topology-based reasoning system (TRS) operative to topologically analyze alarms in order to identify at least one root cause thereof; and an event stream including a sequence of events, less than all of which are deemed alarms, wherein said TRS can be connected to the event stream directly.
Claim: 51. A root cause analysis system operative in conjunction with fault management apparatus, the system comprising: a topology-based reasoning system (TRS) operative to topologically analyze alarms in order to identify at least one root cause thereof, wherein said TRS imitates the flow of alarms in the network, thereby allowing anyone with good knowledge of the alarm flow in a given network type to generate the rules for that network type.
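Illustrative note (not part of the patent record): claims 9, 12 and 17 to 20 describe holding the topology as a graph, checking every probable cause, and weighing three parameters: the distance from the suspected root cause to each related alarm, the share of incoming alarms it explains, and the share of its expected alarms that actually arrived. The Python sketch below shows one possible way to combine those parameters. The function names, the adjacency-dictionary graph, the per-cause expected-alarm sets, and the tuple ranking order are all assumptions made for illustration. The record does not disclose the actual scoring method, and the conditional-probability comparison of claim 1 is not modeled here.

```python
from collections import deque


def hop_distances(graph, source):
    """Breadth-first hop counts from source to every reachable element."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def score_candidate(graph, candidate, expected, alarms):
    """Return (coverage, completeness, -mean_distance) for one candidate.

    coverage:      share of incoming alarms the candidate explains (claim 19)
    completeness:  share of its expected alarms that arrived (claim 20)
    mean_distance: average hops from candidate to explained alarms (claim 18)
    The tuple ordering is a hypothetical ranking, not the patented method.
    """
    explained = alarms & expected
    if not explained:
        return (0.0, 0.0, float("-inf"))
    dist = hop_distances(graph, candidate)
    mean_distance = sum(dist.get(a, float("inf")) for a in explained) / len(explained)
    coverage = len(explained) / len(alarms)
    completeness = len(explained) / len(expected)
    return (coverage, completeness, -mean_distance)


def select_root_cause(graph, expected_by_cause, alarms):
    """Score every probable cause (claim 17) and return the best one."""
    return max(
        expected_by_cause,
        key=lambda c: score_candidate(graph, c, expected_by_cause[c], alarms),
    )


# Hypothetical four-element chain A - B - C - D.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
# Assumed rule output: elements expected to alarm if the key element fails.
expected_by_cause = {"B": {"C", "D"}, "C": {"D"}}
alarms = {"C", "D"}

print(select_root_cause(graph, expected_by_cause, alarms))  # prints B
```

In this toy case a fault at B explains both incoming alarms, while a fault at C explains only one, so B ranks first even though C is closer to the alarm it explains. Because expected alarms need not all arrive for a candidate to be scored, the sketch also loosely reflects claim 21.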
Current U.S. Class: 714/4
Patent References Cited: 5309448 May 1994 Bouloutas et al.
5392328 February 1995 Schmidt et al.
5646864 July 1997 Whitney
5661668 August 1997 Yemini et al.
5737319 April 1998 Croslin et al.
5748098 May 1998 Grace
5771274 June 1998 Harris
5864662 January 1999 Brownmiller et al.
5946373 August 1999 Harris
5995485 November 1999 Croslin
6026442 February 2000 Lewis et al.
6072777 June 2000 Bencheck et al.
6118936 September 2000 Lauer et al.
6249755 June 2001 Yemini et al.
6430712 August 2002 Lewis
6445774 September 2002 Kidder et al.
6707795 March 2004 Noorhosseini et al.
6766368 July 2004 Jakobson et al.
2001/0051937 December 2001 Ross et al.
2323195 April 2001
549937 July 1993
686329 December 1995
2318479 April 1998
2000020428 January 2000
WO 94/19887 September 1994
WO 94/19912 September 1994
WO 95/32411 November 1995
WO 99/13427 March 1999
WO 99/49474 September 1999
WO 00/33205 June 2000
WO 00/39674 July 2000
WO 01/22226 March 2001
WO 01/31411 May 2001
Other References: Derwent English Abstract of JP 2000 20428 dated Jan. 21, 2000. cited by other
Assistant Examiner: McCarthy, Christopher
Primary Examiner: Beausoliel, Robert
Attorney, Agent or Firm: Ladas & Parry
Accession Number: edspgr.07043661
Database: USPTO Patent Grants