====== 1248th and 1249th Wednesday seminar ======
20 and 27 May 2015
===== Analysis of a WoS bibliography on blockmodeling and clustering in networks =====
==== Data collection ====
=== Web of Science ===
http://home.izum.si/izum/ft_baze/wos.asp
"block model*" or "network cluster*" or "graph cluster*" or "community detect*" or "blockmodel*" or "block-model*" or "structural equival*" or "regular equival*"
We did not use the option of extending the result set with "Citing papers".
**!!! To save full records (all fields), the search must be restricted to the "WoS Core Collection" database.**
=== Scopus ===
http://home.izum.si/izum/ft_baze/scopus.asp
http://blog.scopus.com/posts/chrome-42-issues-with-scopus-document-download-manager
== RIS files ==
http://en.wikipedia.org/wiki/RIS_(file_format)
https://www.researcherid.com/resources/html/help_upload.htm
https://jira.sakaiproject.org/secure/attachment/21845/RIS+Format+Specifications.pdf
http://archive.refman.com/support/risformat_intro.asp
==== Conversion to networks ====
WoS2Pajek: the names of works try to strike a balance with respect to the identification problem.
The latest version, WoS2PajekT, adds titles and index.
==== First analyses ====
Pajek
Basic cleaning: Info; remove multiple links and loops:
read cite.net
Network/Create new network/Transform/Remove/Multiple lines/Single line [no]
Network/Create new network/Transform/Remove/Loops [no]
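The effect of the two cleaning commands above can be sketched outside Pajek as follows. This is only an illustration on a hypothetical arc list; the function name `simplify` and the sample data are assumptions, not part of Pajek or WoS2Pajek.

```python
def simplify(arcs):
    """Drop loops and collapse multiple arcs into one, keeping first-seen order."""
    seen = set()
    result = []
    for u, v in arcs:
        if u == v:            # a loop: the work "cites itself"
            continue
        if (u, v) in seen:    # a repeated arc between the same pair
            continue
        seen.add((u, v))
        result.append((u, v))
    return result


# Hypothetical toy citation arcs (citing, cited).
arcs = [(1, 2), (1, 2), (2, 2), (2, 3), (3, 1), (2, 3)]
clean = simplify(arcs)
print(clean)
```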
Identification problems:
* units with different names - a lot of work; we create equivalence partitions and shrink the networks according to them. For keywords we partly solve this with lemmatization
* the same name belongs to several units - currently we do not deal with this.
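Shrinking a network according to an equivalence partition can be sketched as below. The mapping `representative` (variant name to chosen representative) and the work names are hypothetical; in practice the partition has to be built by hand or with heuristics.

```python
def shrink(arcs, representative):
    """Replace every endpoint by the representative of its equivalence class.

    Units missing from `representative` are treated as their own class.
    """
    return [(representative.get(u, u), representative.get(v, v)) for u, v in arcs]


# Hypothetical example: two spellings of the same citing work.
arcs = [
    ("SMITH_J(2001)5:10", "DOE_A(1999)3:1"),
    ("SMITH_JA(2001)5:10", "DOE_A(1999)3:1"),
]
representative = {"SMITH_JA(2001)5:10": "SMITH_J(2001)5:10"}
shrunk = shrink(arcs, representative)
print(shrunk)
```

Shrinking typically creates multiple links (and possibly loops), so the basic cleaning step has to be repeated afterwards.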
=== Network boundaries ===
Units with a full description : reference units.
Partition DC ; Info DC
Omitted units : superfluous units.
== Omitted units ==
Input degrees (indegree). List the vertices (e.g. 1000) with the largest degrees. Manually add the most frequently cited ones.
read DC.clu
Network/Create partition/Degree/Input
select DC as Second partition
Partitions/Extract subpartition [0]
Select DC as First partition
Operations/Network+Partition/Extract subnetwork [0]
select extracted subpartition as First
partition Info button [1 +1000]
We supplement the data from the database for the missing units and recreate the corresponding networks.
We repeat the procedure until we are satisfied with the resulting network.
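The search for omitted units (frequently cited works without a full description) can be sketched as follows. The sketch assumes that `dc[v]` is 1 for works with a full description and 0 otherwise; the function name and the toy data are hypothetical.

```python
from collections import Counter


def most_cited_missing(arcs, dc, n=1000):
    """Return up to n (work, indegree) pairs with DC=0, most cited first."""
    indeg = Counter(cited for _, cited in arcs)
    missing = [(v, d) for v, d in indeg.items() if dc.get(v, 0) == 0]
    # Sort by decreasing indegree; ties are broken by name for reproducibility.
    missing.sort(key=lambda pair: (-pair[1], pair[0]))
    return missing[:n]


# Hypothetical toy data: arcs are (citing, cited).
arcs = [("a", "x"), ("b", "x"), ("c", "x"), ("a", "y"), ("b", "y"), ("a", "b")]
dc = {"a": 1, "b": 1, "c": 1, "x": 0, "y": 0}
candidates = most_cited_missing(arcs, dc, n=10)
print(candidates)
```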
== Superfluous units ==
Among the terminal units (DC=0 and outdeg=0) we remove those that receive fewer than k citations. In our case k=2.
Network/Create partition/Degree/Input
Partition/Binarize partition [2-*]
select DC as Second partition
Partitions/Max
File/Partition/Change Label [boundary partition]
Save Partition Button [boundary.clu]
select simplified network
Operations/Network+Partition/Extract subnetwork [1-*]
File/Network/Change Label [bounded network]
Save Network Button [citeB.net]
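The boundary rule implemented by the commands above (keep a unit if it has a full description or if its indegree is at least k) can be sketched as below. The names `bounded_network`, `dc` and the toy data are assumptions for illustration only.

```python
from collections import defaultdict


def bounded_network(arcs, dc, k=2):
    """Keep units with DC=1 or indegree >= k, and the arcs among kept units."""
    indeg = defaultdict(int)
    for _, cited in arcs:
        indeg[cited] += 1
    # Mirrors "Binarize [k-*]" followed by "Partitions/Max" with DC.
    keep = {v for v in dc if dc[v] == 1 or indeg[v] >= k}
    kept_arcs = [(u, v) for u, v in arcs if u in keep and v in keep]
    return keep, kept_arcs


# Hypothetical toy data: x is cited twice, y only once.
arcs = [("a", "x"), ("b", "x"), ("a", "y"), ("b", "a")]
dc = {"a": 1, "b": 1, "x": 0, "y": 0}
keep, kept_arcs = bounded_network(arcs, dc, k=2)
print(sorted(keep), kept_arcs)
```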
=== Distributions ===
We can inspect various distributions: number of papers per year; number of references per paper; number of authors per paper; ...
select cite
Network/Create Partition/Degree/input
The units with the largest values in each distribution are of interest: the most cited works; the authors with the most works; ...
From WJ.net (simplify, remove ******, indegree) we obtain the journals in which the most papers were published.
Info button [1 +200]
Rank Vertex Cluster Id
---------------------------------------
1 80843 862 NATURE
2 80837 845 P NATL ACAD SCI USA
3 80856 794 SCIENCE
4 80931 519 PHYS REV LETT
5 80848 485 PHYS REV E
6 81195 477 J GEOPHYS RES-SOL EA
7 80844 451 LECT NOTES COMPUT SC
8 80897 437 NUCLEIC ACIDS RES
9 80847 435 PHYSICA A
10 80855 420 SOC NETWORKS
11 80932 406 B SEISMOL SOC AM
12 80905 404 BIOINFORMATICS
13 82268 358 TECTONOPHYSICS
14 81344 354 GEOPHYS J INT
15 81352 318 J GEOPHYS RES
16 81227 304 J MOL BIOL
17 80930 297 PHYS REV B
18 81115 281 J BIOL CHEM
19 80846 254 AM J SOCIOL
20 81906 246 AM SOCIOL REV
21 81343 245 GEOPHYS RES LETT
22 81189 233 GEOLOGY
23 81031 233 NEUROIMAGE
24 80832 233 IEEE T PATTERN ANAL
25 81376 201 J CHEM PHYS
26 81229 198 BIOCHEMISTRY-US
27 82796 196 APPL ENVIRON MICROB
28 81198 192 J GEOPHYS RES-SOLID
29 81080 191 PATTERN RECOGN
30 81228 189 J AM CHEM SOC
---------------------------------------
For the degrees in Cite.net:
We transfer the data to R.
Partition/Copy to Vector
Tools/R/Send to R/Current Vector
alternatively, we can save the data to a file in Pajek and read it into R
> setwd("C:/Users/batagelj/work/Python/WoS/BM1")
> d <- read.table(file="indegAll.clu",skip=1)
> t <- table(d)
> head(t)
d
0 1 2 3 4 5
2154 66431 6790 2155 959 585
> x <- as.numeric(names(t))
> plot(x,t)
> plot(x,t,log='xy',main="indeg distribution",xlab='deg',ylab='freq',pch=16)
{{event:seminar:pics:indegall.pdf}}; {{event:seminar:pics:outdegall.pdf}}
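The same frequency table can be obtained without R. The sketch below assumes the usual Pajek partition layout (a header line starting with `*`, then one value per line); the sample lines are hypothetical, not the real indegAll.clu.

```python
from collections import Counter


def read_clu_lines(lines):
    """Parse partition values, skipping blank lines and the '*Vertices n' header."""
    values = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("*"):
            continue
        values.append(int(line))
    return values


# Hypothetical sample; for a real file use: open("indegAll.clu").
sample = ["*Vertices 6", "0", "1", "1", "3", "1", "0"]
freq = Counter(read_clu_lines(sample))
distribution = sorted(freq.items())   # (degree, frequency) pairs
print(distribution)
```

The pairs in `distribution` can be passed to any plotting tool; degree 0 must be dropped before drawing on log-log axes.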
Largest indegrees in Cite.net:
Rank Vertex Cluster Id
---------------------------------------
1 52 505 GIRVAN_M(2002)99:7821
2 295 472 NEWMAN_M(2004)69:026113
3 163 368 FORTUNAT_S(2010)486:75
4 727 280 FORTUNAT_S(2007)104:36
5 84 270 NEWMAN_M(2006)103:8577
6 213 267 PALLA_G(2005)435:814
7 149 251 WASSERMA_S(1994):
8 73 249 CLAUSET_A(2004)70:066111
9 173 245 WATTS_D(1998)393:440
10 174 233 ZACHARY_W(1977)33:452
11 70 221 BLONDEL_V(2008):P10008
12 28 184 NEWMAN_M(2006)74:036104
13 399 184 NEWMAN_M(2003)45:167
14 200 180 LANCICHI_A(2008)78:046110
15 1592 179 ALBERT_R(2002)74:47
16 158 177 BARABASI_A(1999)286:509
17 217 171 RADICCHI_F(2004)101:2658
18 745 170 LORRAIN_F(1971)1:49
19 75 168 GUIMERA_R(2005)433:895
20 1809 167 DANON_L(2005):P09008
21 1692 162 WHITE_H(1976)81:730
22 27 157 NEWMAN_M(2004)69:066133
23 185 156 DUCH_J(2005)72:027104
24 219 151 ROSVALL_M(2008)105:1118
25 340 134 BURRIDGE_R(1967)57:341
26 59 133 NEWMAN_M(2004)38:321
27 865 127 REICHARD_J(2006)74:016110
28 199 126 LANCICHI_A(2009)11:033015
29 1105 126 LANCICHI_A(2009)80:056117
30 1395 111 BREIGER_R(1975)12:328
---------------------------------------
Largest outdegrees in Cite.net:
Rank Vertex Cluster Id
---------------------------------------
1 1008 863 BOCCALET_S(2006)424:175
2 9139 456 TURCOTTE_D(1999)62:1377
3 399 417 NEWMAN_M(2003)45:167
4 163 399 FORTUNAT_S(2010)486:75
5 30604 321 SIBLEY_C(2012)12:505
6 65979 310 FRANK_K(1998)23:171
7 8184 297 KAWAMURA_H(2012)84:839
8 2239 281 DOROGOVT_S(2002)51:1079
9 8401 275 ARENAS_A(2008)469:93
10 10842 255 BURT_R(1980)6:79
11 759 254 WU_F(1982)54:235
12 1592 208 ALBERT_R(2002)74:47
13 4965 204 JAIN_A(1999)31:264
14 42301 200 GRABHER_G(2006)30:163
15 66177 198 AXT_V(1998)70:145
16 14176 186 FOGGIA_P(2014)28:1450001
17 1546 178 ROSSI_R(2015)27:1112
18 207 175 LU_L(2011)390:1150
19 69171 174 DAHMEN_K(1996)53:14872
20 10474 169 AGGARWAL_C(2014)47:10
21 32083 168 MARKA_S(2012)1260:55
22 57714 168 RUNDLE_J(2003)41:1019
23 17049 168 ROBINS_G(2013)57:261
24 66555 167 FOOKES_P(1997)30:293
25 38192 166 PAVLOPOU_G(2011)4:
26 29794 160 MARSDEN_P(1990)16:435
27 802 158 MALLIARO_F(2013)533:95
28 45680 155 XU_P(2009)27:636
29 135 153 MCPHERSO_M(2001)27:415
30 28532 152 GULATI_R(1999)104:1439
---------------------------------------
==== Analysis of the citation network ====
To identify the important parts we use SPC (Search Path Count) weights. Computing them requires an acyclic network.
Network/Create Partition/Components/Strong [2]
If any exist, we extract the strong components and inspect them.
Operations/Network+Partition/Extract subnetwork [1-*]
Draw/Network + First Partition
-> Cyc.net
{{event:seminar:pics:cyc.pdf}}
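Finding the nontrivial strong components (size at least 2) can be sketched with Kosaraju's two-pass algorithm. This is an independent illustration, not Pajek's implementation; the function name and the toy network are assumptions.

```python
def strong_components(vertices, arcs, min_size=2):
    """Return the strong components with at least min_size vertices."""
    succ = {v: [] for v in vertices}
    pred = {v: [] for v in vertices}
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)

    # Pass 1: iterative DFS, recording vertices in order of completion.
    order, visited = [], set()
    for root in vertices:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:                      # all successors handled
                order.append(node)
                stack.pop()

    # Pass 2: search the reversed network in decreasing completion order.
    assigned, components = set(), []
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        comp, todo = [], [root]
        while todo:
            node = todo.pop()
            comp.append(node)
            for prv in pred[node]:
                if prv not in assigned:
                    assigned.add(prv)
                    todo.append(prv)
        if len(comp) >= min_size:
            components.append(sorted(comp))
    return components


# Hypothetical toy network: b and c cite each other.
vertices = ["a", "b", "c", "d"]
arcs = [("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")]
cycles = strong_components(vertices, arcs)
print(cycles)
```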
We eliminate cycles either by shrinking each component or by replacing it with a complete bipartite network (the Preprint transformation).
>>> import sys; wdir = r'C:\Users\Batagelj\work\python\WoS\BM1'; sys.path.append(wdir)
>>> import Preprint3; Preprint3.run(wdir,'.','citeB.net','cyc.net')
We compute the SPC weights and also obtain the main path. We draw it using the layers macro.
The CPM (Critical Path Method) procedure usually gives a better solution.
{{event:seminar:pics:MP.pdf}}; {{event:seminar:pics:CPM.pdf}}
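The SPC weight of an arc (u,v) is the number of source-to-sink paths passing through it: the number of paths from sources to u multiplied by the number of paths from v to sinks. A minimal sketch, assuming a simplified acyclic network given as a list of arcs (the function name and the toy data are hypothetical):

```python
from collections import defaultdict


def spc_weights(arcs):
    """Search Path Count weights for the arcs of an acyclic network."""
    succ, pred = defaultdict(list), defaultdict(list)
    vertices = set()
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)
        vertices.update((u, v))

    # Topological order (Kahn's algorithm); fails if a cycle remains.
    indeg = {v: len(pred[v]) for v in vertices}
    queue = sorted(v for v in vertices if indeg[v] == 0)
    order = []
    while queue:
        u = queue.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(vertices):
        raise ValueError("network is not acyclic")

    # n_in[v]: paths from any source to v; n_out[v]: paths from v to any sink.
    n_in, n_out = {}, {}
    for v in order:
        n_in[v] = sum(n_in[u] for u in pred[v]) if pred[v] else 1
    for v in reversed(order):
        n_out[v] = sum(n_out[w] for w in succ[v]) if succ[v] else 1
    return {(u, v): n_in[u] * n_out[v] for u, v in arcs}


# Hypothetical toy network with two source-to-sink paths.
arcs = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
weights = spc_weights(arcs)
print(weights)
```

Both paths pass through the arc (d,e), so it gets the largest weight; a main path follows such heavy arcs.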
For a quick interpretation of the obtained solutions, we save them and apply the function description to them in R:
setwd("C:\\Users\\batagelj\\work\\Python\\WoS\\BM1\\results")
<<< description
T <- read.csv('titles.csv',sep=";",colClasses="character")
T$code <- 1
head(T)
d <- description("CPM.net","CPMnew.csv",T)
head(d)
They can be inspected in Excel. We sort them by publication year.
The islands procedure offers a more detailed insight into the structure of the citation network.
We determine islands [20,150] and islands [20,200]. Only the largest, main island changes.
If we draw them all
{{event:seminar:pics:islands.pdf}}
we notice that several of them are star-shaped and therefore uninteresting. We extract individually only those with a varied structure.
islands
partition/canonical
extract largest islands
draw KK components
select interesting
select an island
extract it
macro layers
manual improvement
export EPS -> PDF
description in R
{{event:seminar:pics:island1b.pdf}}
==== Author collaboration ====
We restrict the WA network to works with full descriptions.
Degrees. Works with the most authors; authors with the most works.
!!! The paper Aad, G; A neural network clustering algorithm for the ATLAS silicon ... has 2847 co-authors !!!
read WA.net
remove multiple links
Info button -> Rows=80822, Cols=46658
Partition/Create Constant Partition [46658 2]
read DC partition
select constant partition as Second
Partitions/Fuse Partitions
Operations/Network+Partition/Extract Subnetwork [Yes][1-*]
File/Network/Change Label [WA bounded]
normalization macro
transpose
select WAb as Second
Networks/Multiply [Yes]
remove loops
pS cores
partition from vector 1.5
extract
draw
{{event:seminar:pics:collabor.pdf}}
==== Citations among authors ====
cite.net -> citeD.net
ACi = AWb * CiteD * WAb
islands [4 50]
{{event:seminar:pics:aucite.pdf}}
55 53 54 48 47 23 45 42
52 49 50 51 22
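The product ACi = AWb * CiteD * WAb can be illustrated on a toy example. The sketch assumes that AWb is the transpose of WAb and stores each matrix as a dictionary of rows; the helper names and the data are hypothetical.

```python
from collections import defaultdict


def transpose(m):
    """Transpose a sparse matrix stored as {row: {col: value}}."""
    t = defaultdict(dict)
    for r, row in m.items():
        for c, val in row.items():
            t[c][r] = val
    return dict(t)


def multiply(a, b):
    """Sparse matrix product of two {row: {col: value}} matrices."""
    out = defaultdict(lambda: defaultdict(int))
    for r, row in a.items():
        for k, x in row.items():
            for c, y in b.get(k, {}).items():
                out[r][c] += x * y
    return {r: dict(row) for r, row in out.items()}


# Hypothetical data: works w1..w3, authors A, B, C.
WAb = {"w1": {"A": 1, "B": 1}, "w2": {"B": 1}, "w3": {"C": 1}}
CiteD = {"w1": {"w3": 1}, "w2": {"w3": 1, "w1": 1}}
AWb = transpose(WAb)                      # assumption: AWb = transpose(WAb)
ACi = multiply(multiply(AWb, CiteD), WAb)
print(ACi)
```

The entry ACi[a][b] counts the citations from works of author a to works of author b; the diagonal holds self-citations.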
==== Groups and keywords ====
island 54 and island 55
WK.net -> WKd.net
AK = AWb * WKd
Cut out the part belonging to an individual island and look at the weighted indegrees:
i - island number; k = |K|
Partition/Create Constant Partition [k 999] = CC
select islands partition CI as First partition
select partition CC as Second partition
fuse CI and CC
extract [i 999] -> Ni, Ci
Network/Create Vector/Centrality/Weighted Degree/Input
Operations/Network+Partition/Extract Subnetwork [999] -> NiK, CiK
select partition Ci
Operations/Vector+Partition/Extract Subvector [999] -> ViK
Vector Info Button [+50]
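The result of these steps, the weighted indegree of each keyword restricted to the authors of island i, can be sketched as below. The sketch assumes AK is stored as {author: {keyword: weight}} and that `island_of` maps authors to island numbers; all names and data are hypothetical.

```python
from collections import defaultdict


def island_keyword_weights(ak, island_of, island, top=None):
    """Sum the AK weights over the authors of one island, largest first."""
    totals = defaultdict(float)
    for author, row in ak.items():
        if island_of.get(author) != island:
            continue
        for keyword, weight in row.items():
            totals[keyword] += weight
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if top is None else ranked[:top]


# Hypothetical toy data.
ak = {
    "A": {"network": 3, "community": 2},
    "B": {"network": 1, "block": 4},
    "C": {"earthquake": 5},
}
island_of = {"A": 55, "B": 55, "C": 53}
ranked = island_keyword_weights(ak, island_of, 55)
print(ranked)
```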
Island 55
Rank Vertex Value Id
--------------------------------------------------------
1 4 548.0000 network
2 11 446.0000 community
3 29 387.0000 complex
4 19 261.0000 detection
5 31 196.0000 structure
6 5 190.0000 algorithm
7 47 185.0000 model
8 68 155.0000 modularity
9 26 149.0000 graph
10 14 108.0000 social
11 93 97.0000 cluster
12 212 78.0000 organization
13 242 76.0000 metabolic
14 60 73.0000 dynamics
15 185 70.0000 analysis
16 222 63.0000 optimization
17 71 61.0000 base
18 239 59.0000 random
19 76 54.0000 overlap
20 226 53.0000 detect
21 299 50.0000 identification
22 577 50.0000 use
23 147 49.0000 web
24 428 48.0000 hierarchical
25 24 47.0000 method
26 40 47.0000 prediction
27 560 47.0000 small-world
28 434 46.0000 stochastic
29 67 43.0000 genetic
30 437 42.0000 blockmodel
Island 54
Rank Vertex Value Id
--------------------------------------------------------
1 4 123.0000 network
2 14 83.0000 social
3 130 53.0000 structural
4 432 50.0000 equivalence
5 437 47.0000 blockmodel
6 216 46.0000 blockmodeling
7 219 45.0000 partition
8 26 42.0000 graph
9 179 40.0000 generalized
10 541 38.0000 balance
11 7 36.0000 datum
12 5 35.0000 algorithm
13 47 34.0000 model
14 185 33.0000 analysis
15 224 28.0000 two-mode
16 93 28.0000 cluster
17 225 24.0000 matrix
18 1315 22.0000 sign
19 885 18.0000 relational
20 433 16.0000 role
21 31 15.0000 structure
22 238 15.0000 multiple
23 1052 15.0000 relation
24 133 14.0000 position
25 434 13.0000 stochastic
26 3889 12.0000 relax
27 222 12.0000 optimization
28 606 12.0000 decomposition
29 71 12.0000 base
30 1615 11.0000 constraint
Island 53
Rank Vertex Value Id
--------------------------------------------------------
1 47 80.0000 model
2 106 78.0000 earthquake
3 146 54.0000 fault
4 824 51.0000 self-organized
5 823 47.0000 criticality
6 60 36.0000 dynamics
7 2972 29.0000 slider-block
8 1286 20.0000 threshold
9 1466 18.0000 seismicity
10 52 17.0000 scale
11 506 15.0000 san-andreas
12 685 15.0000 simulation
13 1764 14.0000 critical
14 6241 14.0000 nucleation
15 46 13.0000 behavior
16 1700 12.0000 mechanical
17 172 12.0000 fluctuation
18 4599 12.0000 cellular-automaton
19 112 11.0000 fracture
20 105 11.0000 friction
21 2775 11.0000 phenomenon
22 1004 10.0000 statistical
23 109 10.0000 time
24 1012 9.0000 stress
25 190 9.0000 block
26 1474 9.0000 numerical
27 4 9.0000 network
28 3266 8.0000 forest-fire
29 23 8.0000 approach
30 1405 8.0000 failure
--------------------------------------------------------
Island 23
Rank Vertex Value Id
--------------------------------------------------------
1 190 67.0000 block
2 998 57.0000 drug
3 47 55.0000 model
4 1510 53.0000 canine
5 1214 51.0000 potential
6 1515 50.0000 atrioventricular
7 2328 50.0000 action
8 511 49.0000 heart
9 2895 47.0000 chronic
10 5867 43.0000 av
11 1411 41.0000 repolarization
12 1666 37.0000 susceptibility
13 2912 37.0000 ventricular
14 2909 37.0000 arrhythmia
15 6914 34.0000 atrioventricular-block
16 6742 34.0000 antiarrhythmic
17 1389 34.0000 syndrome
18 1506 32.0000 qt
19 1939 31.0000 dog
20 968 31.0000 long
21 4003 29.0000 vivo
22 3209 29.0000 hypertrophy
23 1511 29.0000 torsade
24 5872 28.0000 torsades-de-pointes
25 41 28.0000 effect
26 257 25.0000 duration
27 736 24.0000 profile
28 8342 23.0000 torsade-de-pointes
29 8178 22.0000 monophasic
30 2910 22.0000 prolongation
--------------------------------------------------------
==== Journals ====
It appears that WoS has largely standardized the recorded names of journals (works).
AJ = AW * WJ
author a published aj[a,j] papers in journal j
JJ = JA * bin(AJ)
in journal i, the authors who also published in journal j published jj[i,j] papers
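A small worked example of AJ = AW * WJ and JJ = JA * bin(AJ). It assumes JA is the transpose of AJ and that bin() replaces every nonzero entry by 1; the matrices are hypothetical toy data.

```python
def matmul(a, b):
    """Product of two dense matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def binarize(m):
    return [[1 if x else 0 for x in row] for row in m]


# Hypothetical data: authors a1, a2; works w1..w4; journals j1, j2.
AW = [[1, 1, 0, 0],      # a1 wrote w1, w2
      [0, 1, 1, 1]]      # a2 wrote w2, w3, w4
WJ = [[1, 0],            # w1, w2 appeared in j1
      [1, 0],
      [0, 1],            # w3, w4 appeared in j2
      [0, 1]]

AJ = matmul(AW, WJ)                  # papers of each author per journal
JA = transpose(AJ)                   # assumption: JA = transpose(AJ)
JJ = matmul(JA, binarize(AJ))
print(AJ, JJ)
```

Here JJ[0][1] = 1: the only author of j1 who also published in j2 is a2, with one paper in j1. Note that a paper with several qualifying co-authors is counted once per author.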