Editing Distance Clusters
Matthew Mattias Luke John James Thomas Nick Zack Mark
A greedy string classification algorithm
This algorithm attempts to classify strings by their editing distance. It first computes the case-insensitiveediting distances between all input strings in O( n ² ab ), where a and b the maximum string lengths. Normalized editing distances are stored in a square matrix, M. Normalized, in this context, means that the editing distance is divided by the length of the longer of the two strings. The value M i,j = d means that the normalized editing distance between elements i and j is equal to d . The diagonal of this matrix should consist of all zeros.
Next, the algorithm sorts the columns of this distance matrix by the simple invariant that the element immediately to the right of the diagonal is the least element on that row right of the diagonal. Symbolically, the invariant states that for matrix M of size m×n , M i , i +1 ≤ M i,j for all i +1 < j ≤ n . This "sorting" is basically an insertion sort and occurs in Θ( n ²).
Finally, the algorithm assigns clusters by the normalized editing
distance.
This is a simple comparison.
The threshold for differentiating clusters configurable by the user.
If M
i
, i
+1
> threshold
(more
different) then the elements i
and i
+1 are classified in
different clusters.
My goal with all of this is to automatically classify Syslog messages. This application looks promising! Try pasting the below test inputs into the text area above to see what interesting clusters you can come up with.
Test inputs and recommended thresholds
"Nice" clusters (0.50)
Thomas Zack John Matthew Luke Mattias Nick Mark James
Even nicer clusters (.40)
space spouse cats dot house cat dig dog mouse horse
Low variation (.45)
Sunday Monday Tuesday Wednesday Thursday Friday Saturday
High variation (.75)
3505220008376247808 4148830745941488640 4385269805240825344 3925546062580223488 6963321067607564288 1081202811808172672 6074028453849956352 3620542548649959936 5167561983569581056 2577125115152971776
Very high variation (.80)
white green blue purple magenta orange cyan pink effervescent
Long input (notice Grover Cleveland) (.50)
George Washington John Adams Thomas Jefferson James Madison James Monroe John Quincy Adams Andrew Jackson Martin Van Buren William Henry Harrison John Tyler James K. Polk Zachary Taylor Millard Fillmore Franklin Pierce James Buchanan Abraham Lincoln Andrew Johnson Ulysses S. Grant Rutherford B. Hayes James Garfield Chester A. Arthur Grover Cleveland Benjamin Harrison Grover Cleveland William McKinley Theodore Roosevelt William Howard Taft Woodrow Wilson Warren G. Harding Calvin Coolidge Herbert Hoover Franklin D. Roosevelt Harry S. Truman Dwight D. Eisenhower John F. Kennedy Lyndon B. Johnson Richard M. Nixon Gerald R. Ford James Carter Ronald Reagan George H. W. Bush William J. Clinton George W. Bush Barack Obama Donald J. Trump
Interface state change Syslog messages (.20)
00:00:46: %LINK-3-UPDOWN: Interface Port-channel1, changed state to up 00:00:47: %LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to up 00:00:47: %LINK-3-UPDOWN: Interface GigabitEthernet0/2, changed state to up 00:00:48: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan1, changed state to down 00:00:48: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/1, changed state to down *Mar 1 18:46:11: %SYS-5-CONFIG_I: Configured from console by vty2 (10.34.195.36) 18:47:02: %SYS-5-CONFIG_I: Configured from console by vty2 (10.34.195.36) *Mar 1 18:48:50.483 UTC: %SYS-5-CONFIG_I: Configured from console by vty2 (10.34.195.36)
Router adjacency change Syslog messages (.20)
*Mar 15 20:30:22.475: %SPANTREE-2-PVSTSIM_FAIL: Blocking designated port Fa0/1: Inconsitent superior PVST BPDU received on VLAN 10, claiming root 24586:0009.b752.d780 *Mar 15 20:30:22.479: %SPANTREE-2-PVSTSIM_FAIL: Blocking designated port Fa0/2: Inconsitent superior PVST BPDU received on VLAN 10, claiming root 24586:0009.b752.d780 *Mar 15 20:30:22.479: %SPANTREE-2-PVSTSIM_FAIL: Blocking designated port Fa0/3: Inconsitent superior PVST BPDU received on VLAN 10, claiming root 24586:0009.b752.d780 *Mar 15 20:30:22.479: %SPANTREE-2-PVSTSIM_FAIL: Blocking designated port Fa0/4: Inconsitent superior PVST BPDU received on VLAN 10, claiming root 24586:0009.b752.d780 *Mar 15 20:30:22.667: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan1, changed state to down *Mar 15 20:30:22.667: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan50, changed state to down *Mar 15 20:30:22.671: %DUAL-5-NBRCHANGE: EIGRP-IPv4:(1) 1: Neighbor 172.16.50.50 (Vlan50) is down: interface down *Mar 15 20:30:22.671: %DUAL-5-NBRCHANGE: EIGRP-IPv4:(1) 1: Neighbor 172.16.50.101 (Vlan50) is down: interface down *Mar 15 20:30:22.683: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan10, changed state to down *Mar 15 20:30:22.687: %DUAL-5-NBRCHANGE: EIGRP-IPv4:(1) 1: Neighbor 192.168.1.101 (Vlan10) is down: interface down *Mar 15 20:30:24.387: %SYS-5-CONFIG_I: Configured from console by console *Mar 15 20:30:42.479: %SPANTREE-2-PVSTSIM_OK: PVST Simulation inconsistency cleared on port FastEthernet0/2. *Mar 15 20:30:52.651: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan1, changed state to up *Mar 15 20:30:52.651: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan10, changed state to up *Mar 15 20:30:52.679: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan50, changed state to up *Mar 15 20:30:54.043: %DUAL-5-NBRCHANGE: EIGRP-IPv4:(1) 1: Neighbor 172.16.50.50 (Vlan50) is up: new adjacency 2d16h: %SYS-5-CONFIG_I: Configured from console by console 2d16h: %SPANTREE-2-PVSTSIM_FAIL: Blocking root port Fa0/5: Inconsitent inferior PVST BPDU received on VLAN 20, claiming root 24596:0009.b752.d780 2d16h: %DUAL-5-NBRCHANGE: EIGRP-IPv4:(1) 1: Neighbor 172.16.50.50 (Vlan50) is up: new adjacency 2d16h: %DUAL-5-NBRCHANGE: EIGRP-IPv4:(1) 1: Neighbor 192.168.1.100 (Vlan10) is up: new adjacency 2d16h: %DUAL-5-NBRCHANGE: EIGRP-IPv4:(1) 1: Neighbor 172.16.50.100 (Vlan50) is up: new adjacency
Spanning tree state change Syslog messages (.20)
Jan 3 19:49:29:I:MSTP: MST 1 Port 1/1/9:1 - Bridge TC Event Jan 3 19:49:12:I:MSTP: MST 0 Port 1/1/9:1 - DISCARDING Jan 3 19:49:29:I:MSTP: MST 0 Port 1/1/9:1 - LEARNING Jan 3 19:49:29:I:MSTP: MST 0 Port 1/1/9:1 - FORWARDING Jan 3 19:49:12:I:MRP: Interface ethernet 1/1/9:1 of ring 51 Vlan 51, changing to disabled Jan 3 19:49:29:I:MSTP: MST 0 Port 1/1/11:1 - Bridge TC Event Jan 3 19:49:29:I:MRP: Interface ethernet 1/1/9:1 of ring 51 Vlan 51, changing to preforwarding Jan 3 19:50:43:I:MSTP: MST 0 Port 1/1/11:1 - Bridge TC Event Jan 3 19:49:12:I:System: Interface ethernet 1/1/9:3, state down Jan 3 19:49:29:I:MSTP: MST 1 Port 1/1/9:1 - FORWARDING Jan 3 19:49:29:I:MSTP: MST 0 Port 1/1/9:1 - Bridge TC Event Jan 3 19:49:29:I:System: Logical link on dynamic lag interface ethernet 1/1/9:4 is up. Jan 3 19:49:29:I:System: Logical link on dynamic lag interface ethernet 1/1/9:1 is up. Jan 3 20:23:10:I:Security: startup-config was changed by operator from console Jan 3 19:49:29:I:System: Logical link on dynamic lag interface ethernet 1/1/9:3 is up. Jan 3 19:49:12:I:System: Logical link on dynamic lag interface ethernet 1/1/9:4 is down. Jan 3 19:49:29:I:Trunk: Group (1/1/9:1, 1/1/9:2, 1/1/9:3, 1/1/9:4) created by 802.3ad link-aggregation module. Jan 3 19:49:29:I:System: Interface ethernet 1/1/9:2, state up Jan 3 19:49:12:I:System: Logical link on dynamic lag interface ethernet 1/1/9:3 is down. Jan 3 19:49:29:I:System: Interface ethernet 1/1/9:3, state up Jan 3 19:49:29:I:MSTP: MST 0 Port 1/1/9:1 - DISCARDING Jan 3 19:49:29:I:MSTP: MST 1 Port 1/1/9:1 - LEARNING Jan 3 19:49:12:I:MSTP: MST 1 Port 1/1/9:1 - DISCARDING Jan 3 19:49:12:I:Trunk: Group (1/1/9:1, 1/1/9:2, 1/1/9:3, 1/1/9:4) removed by 802.3ad link-aggregation module. Jan 3 19:49:12:I:System: Logical link on dynamic lag interface ethernet 1/1/9:1 is down. Jan 3 19:49:29:I:System: Logical link on dynamic lag interface ethernet 1/1/9:2 is up. Jan 3 19:49:29:I:MSTP: MST 1 Port 1/1/11:1 - Bridge TC Event Jan 3 19:49:12:I:System: Interface ethernet 1/1/9:4, state down Jan 3 19:49:12:I:System: Logical link on dynamic lag interface ethernet 1/1/9:2 is down. Jan 3 20:17:25:I:Security: startup-config was changed by operator from console Jan 3 19:49:29:I:MSTP: MST 1 Port 1/1/9:1 - DISCARDING Jan 3 19:50:43:I:MSTP: MST 1 Port 1/1/11:1 - Bridge TC Event Jan 3 19:49:29:I:System: Interface ethernet 1/1/9:4, state up Jan 3 19:49:29:I:System: Interface ethernet 1/1/9:1, state up Jan 3 19:49:12:I:System: Interface ethernet 1/1/9:2, state down Jan 3 19:49:29:I:MRP: Interface ethernet 1/1/9:1 of ring 51 Vlan 51, changing to forwarding Jan 3 19:49:12:I:System: Interface ethernet 1/1/9:1, state down
以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。