====== September 11th Reuters terror news ======
The Reuters terror news network was obtained from the CRA (Centering Resonance Analysis) networks produced by
Steve Corman and Kevin Dooley at Arizona State University.
The network is based on all the stories released during 66 consecutive days
by the news agency Reuters concerning the September 11 attack on the U.S.,
beginning at 9:00 AM EST 9/11/01.
The nodes of this network are important words (terms). There is an edge between
two words if and only if they appear in the same utterance (for details see the paper \cite{CRA}). The
weight of an edge is the frequency of this co-occurrence. The network has n = 13332
nodes (different words in the news) and m = 243447 edges, of which 50859
have a value larger than 1. There are no loops in the network.
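
The construction described above can be illustrated with a minimal sketch. It assumes that each utterance is already reduced to a list of important words; the function name ''cooccurrence_network'' and the toy utterances are hypothetical, and the actual CRA procedure described in the cited paper is more involved.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(utterances):
    """Map each unordered word pair to the number of utterances containing both words."""
    weights = Counter()
    for words in utterances:
        # Distinct words only, so a pair is counted once per utterance
        # and a word is never linked to itself (no loops).
        for u, v in combinations(sorted(set(words)), 2):
            weights[(u, v)] += 1
    return weights


# Toy data, invented for illustration only.
utterances = [
    ["attack", "plane", "tower"],
    ["attack", "plane"],
    ["taliban", "attack"],
]
net = cooccurrence_network(utterances)
heavy = sum(1 for w in net.values() if w > 1)  # edges with value larger than 1
```

Sorting the words before pairing them gives every undirected edge a single canonical key, so the counts for ''(u, v)'' and ''(v, u)'' are never split.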
The Reuters terror news network was used as a case network for the
Viszards visualization session at the Sunbelt XXII International
Sunbelt Social Network Conference, New Orleans, USA, 13-17 February 2002.
We transformed the Pajek version of the network into the Ianus format used in TQ.
To identify important terms we computed their aggregated frequencies
and extracted the subnetwork of the 50 nodes used most frequently over the 66 days.
They are listed in the following table:
**50 most frequent terms in the Terror news network.**
^ n ^ term ^ ∑freq ^ n ^ term ^ ∑freq ^
| 1 | united_states | 15000 | 26 | terrorism | 2212 |
| 2 | attack | 10348 | 27 | day | 2128 |
| 3 | taliban | 6266 | 28 | week | 2017 |
| 4 | people | 5286 | 29 | worker | 1983 |
| 5 | afghanistan | 5176 | 30 | office | 1967 |
| 6 | bin_laden | 4885 | 31 | group | 1966 |
| 7 | new_york | 4832 | 32 | air | 1962 |
| 8 | pres_bush | 4506 | 33 | minister | 1919 |
| 9 | washington | 4047 | 34 | time | 1898 |
| 10 | official | 3902 | 35 | hijack | 1884 |
| 11 | anthrax | 3563 | 36 | strike | 1818 |
| 12 | military | 3394 | 37 | afghan | 1775 |
| 13 | plane | 3078 | 38 | flight | 1775 |
| 14 | world_trade_ctr | 3006 | 39 | tell | 1746 |
| 15 | security | 2906 | 40 | terrorist | 1745 |
| 16 | american | 2825 | 41 | airport | 1741 |
| 17 | country | 2794 | 42 | pakistan | 1714 |
| 18 | city | 2689 | 43 | tower | 1685 |
| 19 | war | 2679 | 44 | bomb | 1674 |
| 20 | tuesday | 2635 | 45 | new | 1650 |
| 21 | pentagon | 2620 | 46 | building | 1634 |
| 22 | force | 2516 | 47 | wednesday | 1593 |
| 23 | government | 2380 | 48 | nation | 1589 |
| 24 | leader | 2375 | 49 | police | 1587 |
| 25 | world | 2213 | 50 | foreign | 1558 |
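
A ranking of this kind can be sketched as follows. The sketch assumes, purely for illustration, that the temporal frequency of each term is stored as a dictionary from day number to count; this representation, the function name ''top_terms'' and the toy values are hypothetical and are not the TQ data structures.

```python
import heapq


def top_terms(daily_freq, k):
    """Return the k terms with the largest frequency summed over all days."""
    totals = {term: sum(series.values()) for term, series in daily_freq.items()}
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


# Toy data, invented for illustration only.
daily_freq = {
    "attack": {1: 5, 2: 3},
    "anthrax": {30: 4},
    "tower": {1: 2},
}
ranking = top_terms(daily_freq, 2)
```

The subnetwork is then the one induced by the selected terms, that is, it keeps only the edges whose both endpoints are in the ranking.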
When drawn, this subnetwork turns out to be almost
a complete graph. To obtain something readable we removed all
temporal edges with a value smaller than 10 and then removed the
isolated nodes. The corresponding underlying graph is presented
in the following figure.
{{tq:pics:sept11.png?600}}
**September 11th.** Subnetwork of the most frequently used terms.
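
The pruning step can be sketched under the same hypothetical representation, with each edge carrying a dictionary from day number to value. The function name ''prune'' and the toy network are assumptions made for this example.

```python
def prune(edges, nodes, threshold):
    """Drop temporal edge values below threshold, then drop isolated nodes."""
    kept = {}
    for (u, v), series in edges.items():
        # Keep only the time points at which the edge is strong enough.
        strong = {t: w for t, w in series.items() if w >= threshold}
        if strong:
            kept[(u, v)] = strong
    # A node survives only if it is an endpoint of some remaining edge.
    active = {x for pair in kept for x in pair}
    return kept, [n for n in nodes if n in active]


# Toy data, invented for illustration only.
edges = {
    ("attack", "plane"): {1: 25, 2: 4},
    ("attack", "week"): {3: 2},
}
nodes = ["attack", "plane", "week"]
kept, remaining = prune(edges, nodes, 10)
```

The underlying graph shown in the figure corresponds to the keys of ''kept'': an edge is drawn if it remains present at some time point.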
For each of the 50 nodes we determined its temporal activity and
drew it. By visual inspection we identified 6 typical activity patterns
(types of terms). All charts in the figure below
display values in the interval [0,200], although the largest
activity value for the term ''Wednesday'' exceeds 200.
The **primary** terms are the terms with a very high frequency of appearance in
the first week after September 11th and smaller, slowly declining
values in the following period. The representative of this group
in the figure is **''hijack''** and other members are:
''airport'', ''american'', ''attack'', ''city'', ''day'', ''flight'', ''nation'',
''New York'', ''official'', ''Pentagon'', ''people'', ''plane'', ''police'', ''president Bush'', ''security'',
''tower'', ''United States'', ''Washington'', ''world'', ''World Trade Center''.
These are the terms describing the event.
The **secondary** terms are a reaction to the event. There are
no big changes in their values. We identified three subgroups:
a) **slowly declining**, represented by **''bin Laden''**
(''country'', ''foreign'', ''government'', ''military'', ''minister'', ''new'',
''Pakistan'', ''tell'', ''terrorism'', ''terrorist'', ''time'', ''war'', ''week'');
b) **stationary**, represented by
**''taliban''** (''afghan'', ''Afghanistan'', ''force'', ''group'', ''leader''); and
c) **occasional**, with several peaks, represented by
**''bomb''** (''air'', ''building'', ''office'', ''strike'', ''worker'').
There are also three special patterns: two **periodic** terms,
**''Wednesday''** and ''Tuesday'', and one **episodic** term, **''anthrax''**.
| hijack |{{tq:pics:picb35.png?400}}|
| bin Laden |{{tq:pics:picb6.png?400}} |
| taliban |{{tq:pics:picb3.png?400}} |
| bomb |{{tq:pics:picb44.png?400}}|
| Wednesday |{{tq:pics:picb47.png?400}}|
| anthrax |{{tq:pics:picb11.png?400}}|
**Types of activity.**
To take the node's position in the network into account when measuring
the importance of a node u ∈ V, we constructed the
attraction coefficient att(u).
Let **A** = [a<sub>uv</sub>] be a network matrix of temporal quantities with
positive real values. We define the **node activity** act(u) as (see Section~\ref{activ})
act(u) = act({u}, V\{u}) = ∑<sub>v∈V\{u}</sub> a<sub>uv</sub> .
Then the **attraction** of the node u is defined as
att(u) = (1/Δ) ∑<sub>v∈V\{u}</sub> a<sub>vu</sub> / act(v) .
Note that the fraction a<sub>vu</sub> / act(v) measures the
proportion of the activity of the node v that is shared with the node u.
From 0 ≤ a<sub>vu</sub> / act(v) ≤ 1 and deg(v) = 0 ⇒ a<sub>vu</sub> = 0
it follows that
∑<sub>v∈V\{u}</sub> a<sub>vu</sub> / act(v) ≤ deg(u) ≤ Δ
where Δ denotes the maximum degree. Therefore we have 0 ≤ att(u) ≤ 1 for all u ∈ V.
The maximum possible attraction value 1 is attained exactly by the following nodes:
a) in an undirected network: the root of a star;
b) in a directed network: the root of a directed in-star, that is, a node
that is the only out-neighbor of each of its in-neighbors.
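
The definition can be checked on a small example. The sketch below is restricted to a single time point and to an undirected network stored as a symmetric nested dictionary of positive weights; this storage format, the function names and the toy star are assumptions made for the example, not the TQ implementation. In a network without any edges the maximum degree is 0 and the sketch would fail with a division by zero.

```python
def activity(a, u):
    """act(u): sum of the weights a[u][v] over all nodes v different from u."""
    return sum(w for v, w in a[u].items() if v != u)


def max_degree(a):
    """Largest number of neighbors of any node (the quantity called Delta in the text)."""
    return max(
        sum(1 for v, w in row.items() if v != u and w > 0)
        for u, row in a.items()
    )


def attraction(a, u):
    """att(u) = (1/Delta) * sum over v != u of a[v][u] / act(v)."""
    total = 0.0
    for v in a:
        if v == u:
            continue
        w = a[v].get(u, 0)
        if w > 0:  # a node without a link to u contributes nothing
            total += w / activity(a, v)
    return total / max_degree(a)


# Toy undirected star with root "c", invented for illustration only.
star = {
    "c": {"x": 3, "y": 1, "z": 2},
    "x": {"c": 3},
    "y": {"c": 1},
    "z": {"c": 2},
}
att_root = attraction(star, "c")
att_leaf = attraction(star, "x")
```

Each leaf shares all of its activity with the root, so every term of the sum for the root equals 1 and the sum equals Δ = 3. A leaf receives only the part of the root's activity that is directed to it, here 3/6, which is then divided by Δ.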
We computed the temporal attraction and the corresponding aggregated attraction
values for all the nodes in our network, and selected the 30 nodes with the
largest aggregated attraction values. They are listed in the following table:
**30 most attractive terms in the Terror news network.**
^ n ^ term ^ ∑att ^ n ^ term ^ ∑att ^
| 1 | united_states | 12.216 | 16 | war | 2.758 |
| 2 | taliban | 7.096 | 17 | force | 2.596 |
| 3 | attack | 7.070 | 18 | new_york | 2.590 |
| 4 | afghanistan | 5.142 | 19 | government | 2.496 |
| 5 | people | 5.023 | 20 | day | 2.338 |
| 6 | bin_laden | 4.660 | 21 | leader | 2.305 |
| 7 | anthrax | 4.601 | 22 | terrorism | 2.202 |
| 8 | pres_bush | 4.374 | 23 | time | 2.182 |
| 9 | country | 3.317 | 24 | group | 2.072 |
| 10 | washington | 3.067 | 25 | afghan | 2.040 |
| 11 | security | 2.939 | 26 | world | 1.995 |
| 12 | american | 2.922 | 27 | week | 1.961 |
| 13 | official | 2.831 | 28 | pakistan | 1.943 |
| 14 | city | 2.798 | 29 | letter | 1.866 |
| 15 | military | 2.793 | 30 | new | 1.851 |
Again we visually explored them. In the following figure we present
temporal attraction coefficients for the 6 selected terms. For all charts in the figure
the displayed attraction values are in the interval [0,0.2].
| pres Bush |{{tq:pics:pica8.png?400}} |
| Pakistan |{{tq:pics:pica28.png?400}}|
| taliban |{{tq:pics:pica2.png?400}} |
| Kabul |{{tq:pics:pica32.png?400}}|
| bomb |{{tq:pics:pica33.png?400}}|
| anthrax |{{tq:pics:pica7.png?400}} |
**Attraction patterns.**
Comparing, for the common terms (''taliban'', ''bomb'', ''anthrax''), the activity
charts in the previous figure with the corresponding attraction charts in this
figure, we see that they are "correlated" (obviously
act(a;t) = 0 implies att(a;t) = 0) but differ
in details.
For example, the terms ''taliban'' and ''bomb'' have small attraction values at
the beginning of the time window, where they were overshadowed by the
primary terms. On the other hand, the attraction of the terms ''taliban'' and ''Kabul''
increases towards the end of the time window.
In preparation. Not finished!!!