====== 1248th and 1249th Wednesday Seminar ======

20 and 27 May 2015

===== Analysis of the WoS bibliography on blockmodeling and clustering in networks =====

==== Data collection ====

=== Web of Science ===

http://home.izum.si/izum/ft_baze/wos.asp

Search query:

<code>
"block model*" or "network cluster*" or "graph cluster*" or "community detect*" or "blockmodel*" or "block-model*" or "structural equival*" or "regular equival*"
</code>

We did not use the option of extending the data set with "Citing papers".

**!!! To save full records (all fields), we must restrict the search to the "WoS Core Collection" database.**

=== Scopus ===

  * http://home.izum.si/izum/ft_baze/scopus.asp
  * http://blog.scopus.com/posts/chrome-42-issues-with-scopus-document-download-manager

== RIS files ==

  * http://en.wikipedia.org/wiki/RIS_(file_format)
  * https://www.researcherid.com/resources/html/help_upload.htm
  * https://jira.sakaiproject.org/secure/attachment/21845/RIS+Format+Specifications.pdf
  * http://archive.refman.com/support/risformat_intro.asp

==== Conversion into networks ====

WoS2Pajek: the names assigned to works try to strike a balance with respect to the identification problem. The latest version, WoS2PajekT, adds titles and index.

==== First analyses ====

We use Pajek.

Basic cleaning: Info; remove multiple lines and loops:

<code>
read cite.net
Network/Create new network/Transform/Remove/Multiple lines/Single line [no]
Network/Create new network/Transform/Remove/Loops [no]
</code>

Identification problems:
  * one unit appears under different names: this requires a lot of work; we create equivalence partitions and shrink the networks according to them. For keywords, the problem is partly solved by lemmatization.
  * the same name belongs to several units: currently we do not deal with this.

=== Network boundaries ===

Units with a full description are the citing units; they are marked in the partition DC (Info on DC). Two boundary problems remain: overlooked units and redundant units.

== Overlooked units ==

Input degrees: we list the vertices (e.g., 1000) with the largest input degrees among the units without a full description and manually add the most frequent ones.
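Outside Pajek, the same selection can be sketched in a few lines of Python. This is a minimal illustration, not part of the original workflow: it assumes the citation network is available as a list of (citing, cited) arcs and the fully described works (DC = 1) as a set; `overlooked_candidates` is a hypothetical helper name.

```python
from collections import Counter


def overlooked_candidates(arcs, described, k=1000):
    """Hypothetical helper: the k works without a full description (DC = 0)
    with the largest input degrees, as (work, in_degree) pairs."""
    # Drop multiple arcs and loops, as in the basic cleaning step.
    clean = {(u, v) for u, v in arcs if u != v}
    indeg = Counter(v for _, v in clean)
    candidates = [(w, d) for w, d in indeg.items() if w not in described]
    # Largest input degree first; ties broken by name for a stable listing.
    candidates.sort(key=lambda wd: (-wd[1], wd[0]))
    return candidates[:k]


# Toy data (made up): A, B and C are fully described, X and Y are not.
arcs = [("A", "X"), ("B", "X"), ("C", "X"), ("A", "Y"), ("B", "Y"), ("C", "A")]
print(overlooked_candidates(arcs, described={"A", "B", "C"}, k=2))
# prints [('X', 3), ('Y', 2)]
```

On the toy data, the described work A is skipped even though it is cited once, so only the overlooked works X and Y are listed.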
In Pajek:

<code>
read DC.clu
Network/Create partition/Degree/Input
select DC as Second partition
Partitions/Extract subpartition [0]
select DC as First partition
Operations/Network+Partition/Extract subnetwork [0]
select extracted subpartition as First partition
Info button [1 +1000]
</code>

We add the data for the missing units from the database and recreate the corresponding networks. The procedure is repeated until we are satisfied with the resulting network.

== Redundant units ==

Among the terminal units (DC=0 and outdeg=0), we remove those that are cited fewer than k times. In our case, k=2.

<code>
Network/Create partition/Degree/Input
Partition/Binarize partition [2-*]
select DC as Second partition
Partitions/Max
File/Partition/Change Label [boundary partition]
Save Partition Button [boundary.clu]
select simplified network
Operations/Network+Partition/Extract subnetwork [1-*]
File/Network/Change Label [bounded network]
Save Network Button [citeB.net]
</code>

=== Distributions ===

We can look at various distributions: number of articles per year, number of references per article, number of authors per article, ...

<code>
select cite
Network/Create Partition/Degree/Input
</code>

Of interest are the units with the largest values in each distribution: the most cited works, the authors with the most works, ...

From WJ.net (simplify, remove ******, input degree), we obtain the journals in which the most articles were published.
Info button [1 +200]:

^ Rank ^ Vertex ^ Cluster (in-degree) ^ Id ^
| 1 | 80843 | 862 | NATURE |
| 2 | 80837 | 845 | P NATL ACAD SCI USA |
| 3 | 80856 | 794 | SCIENCE |
| 4 | 80931 | 519 | PHYS REV LETT |
| 5 | 80848 | 485 | PHYS REV E |
| 6 | 81195 | 477 | J GEOPHYS RES-SOL EA |
| 7 | 80844 | 451 | LECT NOTES COMPUT SC |
| 8 | 80897 | 437 | NUCLEIC ACIDS RES |
| 9 | 80847 | 435 | PHYSICA A |
| 10 | 80855 | 420 | SOC NETWORKS |
| 11 | 80932 | 406 | B SEISMOL SOC AM |
| 12 | 80905 | 404 | BIOINFORMATICS |
| 13 | 82268 | 358 | TECTONOPHYSICS |
| 14 | 81344 | 354 | GEOPHYS J INT |
| 15 | 81352 | 318 | J GEOPHYS RES |
| 16 | 81227 | 304 | J MOL BIOL |
| 17 | 80930 | 297 | PHYS REV B |
| 18 | 81115 | 281 | J BIOL CHEM |
| 19 | 80846 | 254 | AM J SOCIOL |
| 20 | 81906 | 246 | AM SOCIOL REV |
| 21 | 81343 | 245 | GEOPHYS RES LETT |
| 22 | 81189 | 233 | GEOLOGY |
| 23 | 81031 | 233 | NEUROIMAGE |
| 24 | 80832 | 233 | IEEE T PATTERN ANAL |
| 25 | 81376 | 201 | J CHEM PHYS |
| 26 | 81229 | 198 | BIOCHEMISTRY-US |
| 27 | 82796 | 196 | APPL ENVIRON MICROB |
| 28 | 81198 | 192 | J GEOPHYS RES-SOLID |
| 29 | 81080 | 191 | PATTERN RECOGN |
| 30 | 81228 | 189 | J AM CHEM SOC |

For the degrees in Cite.net, we transfer the data to R:

<code>
Partition/Copy to Vector
Tools/R/Send to R/Current Vector
</code>

Alternatively, we can save the partition to a file in Pajek and read it in R:

<code>
> setwd("C:/Users/batagelj/work/Python/WoS/BM1")
> d <- read.table(file="indegAll.clu",skip=1)
> t <- table(d)
> head(t)
d
    0     1     2     3     4     5 
 2154 66431  6790  2155   959   585 
> x <- as.numeric(names(t))
> plot(x,t)
> plot(x,t,log='xy',main="indeg distribution",xlab='deg',ylab='freq',pch=16)
</code>

{{event:seminar:pics:indegall.pdf}}; {{event:seminar:pics:outdegall.pdf}}

Largest input degrees in Cite.net:

^ Rank ^ Vertex ^ Cluster (in-degree) ^ Id ^
| 1 | 52 | 505 | GIRVAN_M(2002)99:7821 |
| 2 | 295 | 472 | NEWMAN_M(2004)69:026113 |
| 3 | 163 | 368 | FORTUNAT_S(2010)486:75 |
| 4 | 727 | 280 | FORTUNAT_S(2007)104:36 |
| 5 | 84 | 270 | NEWMAN_M(2006)103:8577 |
| 6 | 213 | 267 | PALLA_G(2005)435:814 |
| 7 | 149 | 251 | WASSERMA_S(1994): |
| 8 | 73 | 249 | CLAUSET_A(2004)70:066111 |
| 9 | 173 | 245 | WATTS_D(1998)393:440 |
| 10 | 174 | 233 | ZACHARY_W(1977)33:452 |
| 11 | 70 | 221 | BLONDEL_V(2008):P10008 |
| 12 | 28 | 184 | NEWMAN_M(2006)74:036104 |
| 13 | 399 | 184 | NEWMAN_M(2003)45:167 |
| 14 | 200 | 180 | LANCICHI_A(2008)78:046110 |
| 15 | 1592 | 179 | ALBERT_R(2002)74:47 |
| 16 | 158 | 177 | BARABASI_A(1999)286:509 |
| 17 | 217 | 171 | RADICCHI_F(2004)101:2658 |
| 18 | 745 | 170 | LORRAIN_F(1971)1:49 |
| 19 | 75 | 168 | GUIMERA_R(2005)433:895 |
| 20 | 1809 | 167 | DANON_L(2005):P09008 |
| 21 | 1692 | 162 | WHITE_H(1976)81:730 |
| 22 | 27 | 157 | NEWMAN_M(2004)69:066133 |
| 23 | 185 | 156 | DUCH_J(2005)72:027104 |
| 24 | 219 | 151 | ROSVALL_M(2008)105:1118 |
| 25 | 340 | 134 | BURRIDGE_R(1967)57:341 |
| 26 | 59 | 133 | NEWMAN_M(2004)38:321 |
| 27 | 865 | 127 | REICHARD_J(2006)74:016110 |
| 28 | 199 | 126 | LANCICHI_A(2009)11:033015 |
| 29 | 1105 | 126 | LANCICHI_A(2009)80:056117 |
| 30 | 1395 | 111 | BREIGER_R(1975)12:328 |

Largest output degrees in Cite.net:

^ Rank ^ Vertex ^ Cluster (out-degree) ^ Id ^
| 1 | 1008 | 863 | BOCCALET_S(2006)424:175 |
| 2 | 9139 | 456 | TURCOTTE_D(1999)62:1377 |
| 3 | 399 | 417 | NEWMAN_M(2003)45:167 |
| 4 | 163 | 399 | FORTUNAT_S(2010)486:75 |
| 5 | 30604 | 321 | SIBLEY_C(2012)12:505 |
| 6 | 65979 | 310 | FRANK_K(1998)23:171 |
| 7 | 8184 | 297 | KAWAMURA_H(2012)84:839 |
| 8 | 2239 | 281 | DOROGOVT_S(2002)51:1079 |
| 9 | 8401 | 275 | ARENAS_A(2008)469:93 |
| 10 | 10842 | 255 | BURT_R(1980)6:79 |
| 11 | 759 | 254 | WU_F(1982)54:235 |
| 12 | 1592 | 208 | ALBERT_R(2002)74:47 |
| 13 | 4965 | 204 | JAIN_A(1999)31:264 |
| 14 | 42301 | 200 | GRABHER_G(2006)30:163 |
| 15 | 66177 | 198 | AXT_V(1998)70:145 |
| 16 | 14176 | 186 | FOGGIA_P(2014)28:1450001 |
| 17 | 1546 | 178 | ROSSI_R(2015)27:1112 |
| 18 | 207 | 175 | LU_L(2011)390:1150 |
| 19 | 69171 | 174 | DAHMEN_K(1996)53:14872 |
| 20 | 10474 | 169 | AGGARWAL_C(2014)47:10 |
| 21 | 32083 | 168 | MARKA_S(2012)1260:55 |
| 22 | 57714 | 168 | RUNDLE_J(2003)41:1019 |
| 23 | 17049 | 168 | ROBINS_G(2013)57:261 |
| 24 | 66555 | 167 | FOOKES_P(1997)30:293 |
| 25 | 38192 | 166 | PAVLOPOU_G(2011)4: |
| 26 | 29794 | 160 | MARSDEN_P(1990)16:435 |
| 27 | 802 | 158 | MALLIARO_F(2013)533:95 |
| 28 | 45680 | 155 | XU_P(2009)27:636 |
| 29 | 135 | 153 | MCPHERSO_M(2001)27:415 |
| 30 | 28532 | 152 | GULATI_R(1999)104:1439 |

==== Analysis of the citation network ====

To identify the important parts of the network, we use SPC (Search Path Count) weights. To compute them, the network must be acyclic.
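Why acyclicity is needed can be seen from a minimal Python sketch of SPC weights. Following the usual definition, the weight of an arc (u, v) is the number of paths from initial vertices (input degree 0) to terminal vertices (output degree 0) that pass through it, i.e. n_minus(u) * n_plus(v). The counts are accumulated along a topological order, which exists only for acyclic networks; with cycles the number of paths is unbounded. The sketch assumes a simple network given as a list of arcs; `spc_weights` is a hypothetical name, and this is not Pajek's implementation.

```python
from collections import defaultdict, deque


def spc_weights(arcs):
    """Hypothetical helper: SPC weights of the arcs of an acyclic network.

    w(u, v) = n_minus[u] * n_plus[v], where n_minus[u] counts the paths from
    initial vertices (input degree 0) to u and n_plus[v] counts the paths
    from v to terminal vertices (output degree 0)."""
    succ, pred, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))
    # Kahn's algorithm: a topological order exists only for acyclic networks.
    remaining = {x: len(pred[x]) for x in nodes}
    queue = deque(x for x in nodes if remaining[x] == 0)
    order = []
    while queue:
        x = queue.popleft()
        order.append(x)
        for y in succ[x]:
            remaining[y] -= 1
            if remaining[y] == 0:
                queue.append(y)
    if len(order) < len(nodes):
        raise ValueError("the network has cycles; shrink them or apply Preprint first")
    n_minus, n_plus = {}, {}
    for x in order:  # forward pass: paths from initial vertices to x
        n_minus[x] = sum(n_minus[p] for p in pred[x]) if pred[x] else 1
    for x in reversed(order):  # backward pass: paths from x to terminal vertices
        n_plus[x] = sum(n_plus[s] for s in succ[x]) if succ[x] else 1
    return {(u, v): n_minus[u] * n_plus[v] for u, v in arcs}


# Toy network (made up): initial vertices a, b; terminal vertices d, e.
print(spc_weights([("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")]))
# prints {('a', 'c'): 2, ('b', 'c'): 2, ('c', 'd'): 2, ('c', 'e'): 2}
```

In the toy network, there are four initial-to-terminal paths (a-c-d, a-c-e, b-c-d, b-c-e), and each arc lies on two of them.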
<code>
Network/Create Partition/Components/Strong [2]
</code>

If there are any strong components (of size at least 2), we extract them and inspect them:

<code>
Operations/Network+Partition/Extract subnetwork [1-*]
Draw/Network + First Partition -> Cyc.net
</code>

{{event:seminar:pics:cyc.pdf}}

We eliminate the cycles either by shrinking each component or by replacing it with a complete bipartite network (the Preprint transformation):

<code>
>>> import sys
>>> wdir = r'C:\Users\Batagelj\work\python\WoS\BM1'
>>> sys.path.append(wdir)
>>> import Preprint3
>>> Preprint3.run(wdir, '.', 'citeB.net', 'cyc.net')
</code>

We compute the SPC weights and extract the main path, which we draw using the layers macro. The CPM (Critical Path Method) procedure usually gives a better solution.

{{event:seminar:pics:MP.pdf}}; {{event:seminar:pics:CPM.pdf}}

For a quick interpretation of the obtained solutions, we save them and apply the R function description to them:

<code>
setwd("C:\\Users\\batagelj\\work\\Python\\WoS\\BM1\\results")
# <<< description: the definition of description() must be loaded first (it is not listed on this page)
T <- read.csv('titles.csv',sep=";",colClasses="character")
T$code <- 1
head(T)
d <- description("CPM.net","CPMnew.csv",T)
head(d)
</code>

We can view the results in Excel, sorted by year of publication.

The islands procedure offers a more detailed insight into the structure of the citation network. We determine islands [20,150] and islands [20,200]; only the largest, main island changes. If we draw all of them {{event:seminar:pics:islands.pdf}}, we notice that several of them are star-shaped and therefore uninteresting. We extract individually only those with a more varied structure:

<code>
islands partition/canonical
extract largest islands
draw KK components
select interesting
select an island
extract it
macro layers
manual improvement
export EPS -> PDF
description in R
</code>

{{event:seminar:pics:island1b.pdf}}

==== Collaboration among authors ====

We restrict the network WA to the works with full descriptions. Degrees: the works with the most authors; the authors with the most works.

!!! The article Aad, G; A neural network clustering algorithm for the ATLAS silicon ... has 2847 coauthors !!!
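The effect of such hyper-authored works can be illustrated with a small Python sketch of a fractional collaboration network. It assumes that the normalization macro divides each row of WA by the number of authors of the work, so that multiplying the transposed normalized network by WA and removing loops gives, for each pair of authors, the sum of 1/n over their joint works; `fractional_collaboration` is a hypothetical name. Under this weighting, each pair of coauthors of the 2847-author article receives only 1/2847 from it.

```python
from collections import defaultdict
from itertools import permutations


def fractional_collaboration(work_authors):
    """Hypothetical helper: co[(a, b)] = sum of 1/n_w over the works w that
    a and b coauthored, where n_w is the number of authors of w. This
    corresponds to n(WA)^T * WA with loops removed, assuming n() divides
    each row of WA by its row sum."""
    co = defaultdict(float)
    for authors in work_authors.values():
        authors = set(authors)  # ignore repeated author entries
        n = len(authors)
        # Ordered pairs of distinct coauthors: the result is symmetric and
        # single-author works contribute nothing (loops removed).
        for a, b in permutations(authors, 2):
            co[(a, b)] += 1.0 / n
    return dict(co)


# Toy data (made up): three works by authors a, b, c.
wa = {"w1": ["a", "b", "c"], "w2": ["a", "b"], "w3": ["a"]}
co = fractional_collaboration(wa)
print(round(co[("a", "b")], 4), round(co[("a", "c")], 4))
# prints 0.8333 0.3333
```

Here a and b collect 1/3 from the three-author work w1 and 1/2 from w2, while the single-author work w3 adds nothing.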
In Pajek:

<code>
read WA.net
remove multiple lines
Info button -> Rows=80822, Cols=46658
Partition/Create Constant Partition [46658 2]
read DC partition
select constant partition as Second
Partitions/Fuse Partitions
Operations/Network+Partition/Extract Subnetwork [Yes][1-*]
File/Network/Change Label [WA bounded]
normalization macro
transpose
select WAb as Second
Networks/Multiply [Yes]
remove loops
pS cores
partition from vector 1.5
extract
draw
</code>

{{event:seminar:pics:collabor.pdf}}

==== Citation among authors ====

<code>
cite.net -> citeD.net
ACi = AWb * CiteD * WAb
islands [4 50]
</code>

{{event:seminar:pics:aucite.pdf}}

Islands: 55 53 54 48 47 23 45 42 52 49 50 51 22

==== Groups and keywords ====

Islands 54 and 55:

<code>
WK.net -> WKd.net
AK = AWb * WKd
</code>

We cut out the part that belongs to an individual island and look at its weighted input degrees; i is the island number and k = |K| is the number of keywords.

<code>
Partition/Create Constant Partition [k 999] = CC
select islands partition CI as First partition
select partition CC as Second partition
fuse CI and CC
extract [i 999] -> Ni, Ci
Network/Create Vector/Centrality/Weighted Degree/Input
Operations/Network+Partition/Extract Subnetwork [999] -> NiK, CiK
select partition Ci
Operations/Vector+Partition/Extract Subvector [999] -> ViK
Vector Info Button [+50]
</code>

**Island 55**

^ Rank ^ Vertex ^ Value (weighted in-degree) ^ Id ^
| 1 | 4 | 548.0000 | network |
| 2 | 11 | 446.0000 | community |
| 3 | 29 | 387.0000 | complex |
| 4 | 19 | 261.0000 | detection |
| 5 | 31 | 196.0000 | structure |
| 6 | 5 | 190.0000 | algorithm |
| 7 | 47 | 185.0000 | model |
| 8 | 68 | 155.0000 | modularity |
| 9 | 26 | 149.0000 | graph |
| 10 | 14 | 108.0000 | social |
| 11 | 93 | 97.0000 | cluster |
| 12 | 212 | 78.0000 | organization |
| 13 | 242 | 76.0000 | metabolic |
| 14 | 60 | 73.0000 | dynamics |
| 15 | 185 | 70.0000 | analysis |
| 16 | 222 | 63.0000 | optimization |
| 17 | 71 | 61.0000 | base |
| 18 | 239 | 59.0000 | random |
| 19 | 76 | 54.0000 | overlap |
| 20 | 226 | 53.0000 | detect |
| 21 | 299 | 50.0000 | identification |
| 22 | 577 | 50.0000 | use |
| 23 | 147 | 49.0000 | web |
| 24 | 428 | 48.0000 | hierarchical |
| 25 | 24 | 47.0000 | method |
| 26 | 40 | 47.0000 | prediction |
| 27 | 560 | 47.0000 | small-world |
| 28 | 434 | 46.0000 | stochastic |
| 29 | 67 | 43.0000 | genetic |
| 30 | 437 | 42.0000 | blockmodel |

**Island 54**

^ Rank ^ Vertex ^ Value (weighted in-degree) ^ Id ^
| 1 | 4 | 123.0000 | network |
| 2 | 14 | 83.0000 | social |
| 3 | 130 | 53.0000 | structural |
| 4 | 432 | 50.0000 | equivalence |
| 5 | 437 | 47.0000 | blockmodel |
| 6 | 216 | 46.0000 | blockmodeling |
| 7 | 219 | 45.0000 | partition |
| 8 | 26 | 42.0000 | graph |
| 9 | 179 | 40.0000 | generalized |
| 10 | 541 | 38.0000 | balance |
| 11 | 7 | 36.0000 | datum |
| 12 | 5 | 35.0000 | algorithm |
| 13 | 47 | 34.0000 | model |
| 14 | 185 | 33.0000 | analysis |
| 15 | 224 | 28.0000 | two-mode |
| 16 | 93 | 28.0000 | cluster |
| 17 | 225 | 24.0000 | matrix |
| 18 | 1315 | 22.0000 | sign |
| 19 | 885 | 18.0000 | relational |
| 20 | 433 | 16.0000 | role |
| 21 | 31 | 15.0000 | structure |
| 22 | 238 | 15.0000 | multiple |
| 23 | 1052 | 15.0000 | relation |
| 24 | 133 | 14.0000 | position |
| 25 | 434 | 13.0000 | stochastic |
| 26 | 3889 | 12.0000 | relax |
| 27 | 222 | 12.0000 | optimization |
| 28 | 606 | 12.0000 | decomposition |
| 29 | 71 | 12.0000 | base |
| 30 | 1615 | 11.0000 | constraint |

**Island 53**

^ Rank ^ Vertex ^ Value (weighted in-degree) ^ Id ^
| 1 | 47 | 80.0000 | model |
| 2 | 106 | 78.0000 | earthquake |
| 3 | 146 | 54.0000 | fault |
| 4 | 824 | 51.0000 | self-organized |
| 5 | 823 | 47.0000 | criticality |
| 6 | 60 | 36.0000 | dynamics |
| 7 | 2972 | 29.0000 | slider-block |
| 8 | 1286 | 20.0000 | threshold |
| 9 | 1466 | 18.0000 | seismicity |
| 10 | 52 | 17.0000 | scale |
| 11 | 506 | 15.0000 | san-andreas |
| 12 | 685 | 15.0000 | simulation |
| 13 | 1764 | 14.0000 | critical |
| 14 | 6241 | 14.0000 | nucleation |
| 15 | 46 | 13.0000 | behavior |
| 16 | 1700 | 12.0000 | mechanical |
| 17 | 172 | 12.0000 | fluctuation |
| 18 | 4599 | 12.0000 | cellular-automaton |
| 19 | 112 | 11.0000 | fracture |
| 20 | 105 | 11.0000 | friction |
| 21 | 2775 | 11.0000 | phenomenon |
| 22 | 1004 | 10.0000 | statistical |
| 23 | 109 | 10.0000 | time |
| 24 | 1012 | 9.0000 | stress |
| 25 | 190 | 9.0000 | block |
| 26 | 1474 | 9.0000 | numerical |
| 27 | 4 | 9.0000 | network |
| 28 | 3266 | 8.0000 | forest-fire |
| 29 | 23 | 8.0000 | approach |
| 30 | 1405 | 8.0000 | failure |

**Island 23**

^ Rank ^ Vertex ^ Value (weighted in-degree) ^ Id ^
| 1 | 190 | 67.0000 | block |
| 2 | 998 | 57.0000 | drug |
| 3 | 47 | 55.0000 | model |
| 4 | 1510 | 53.0000 | canine |
| 5 | 1214 | 51.0000 | potential |
| 6 | 1515 | 50.0000 | atrioventricular |
| 7 | 2328 | 50.0000 | action |
| 8 | 511 | 49.0000 | heart |
| 9 | 2895 | 47.0000 | chronic |
| 10 | 5867 | 43.0000 | av |
| 11 | 1411 | 41.0000 | repolarization |
| 12 | 1666 | 37.0000 | susceptibility |
| 13 | 2912 | 37.0000 | ventricular |
| 14 | 2909 | 37.0000 | arrhythmia |
| 15 | 6914 | 34.0000 | atrioventricular-block |
| 16 | 6742 | 34.0000 | antiarrhythmic |
| 17 | 1389 | 34.0000 | syndrome |
| 18 | 1506 | 32.0000 | qt |
| 19 | 1939 | 31.0000 | dog |
| 20 | 968 | 31.0000 | long |
| 21 | 4003 | 29.0000 | vivo |
| 22 | 3209 | 29.0000 | hypertrophy |
| 23 | 1511 | 29.0000 | torsade |
| 24 | 5872 | 28.0000 | torsades-de-pointes |
| 25 | 41 | 28.0000 | effect |
| 26 | 257 | 25.0000 | duration |
| 27 | 736 | 24.0000 | profile |
| 28 | 8342 | 23.0000 | torsade-de-pointes |
| 29 | 8178 | 22.0000 | monophasic |
| 30 | 2910 | 22.0000 | prolongation |

==== Journals ====

It seems that WoS has mostly standardized the journal names of the works.

  * AJ = AW * WJ: author a published aj[a,j] articles in journal j.
  * JJ = JA * bin(AJ): in journal i, the authors who also published in journal j published jj[i,j] articles.
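The two products can be sketched in plain Python with sparse matrices stored as nested dictionaries. This is a minimal illustration on made-up toy data (authors a, b; works w1 to w3; journals J1, J2), not Pajek's implementation; `matmul`, `transpose` and `binarize` are helper names introduced here, AW is the transpose of WA, and JA is obtained as the transpose of AJ.

```python
from collections import defaultdict


def matmul(x, y):
    """Product of sparse matrices stored as {row: {col: value}}."""
    z = defaultdict(lambda: defaultdict(float))
    for i, row in x.items():
        for k, x_ik in row.items():
            for j, y_kj in y.get(k, {}).items():
                z[i][j] += x_ik * y_kj
    return {i: dict(r) for i, r in z.items()}


def transpose(x):
    """Swap rows and columns of a sparse matrix."""
    t = defaultdict(dict)
    for i, row in x.items():
        for j, v in row.items():
            t[j][i] = v
    return dict(t)


def binarize(x):
    """bin(X): replace every nonzero entry by 1."""
    return {i: {j: 1 for j, v in row.items() if v != 0} for i, row in x.items()}


# Toy data (made up): author a wrote w1 and w2, author b wrote w2 and w3;
# w1 appeared in journal J1, w2 and w3 in J2.
AW = {"a": {"w1": 1, "w2": 1}, "b": {"w2": 1, "w3": 1}}
WJ = {"w1": {"J1": 1}, "w2": {"J2": 1}, "w3": {"J2": 1}}
AJ = matmul(AW, WJ)                       # aj[a, j]: articles of a in journal j
JJ = matmul(transpose(AJ), binarize(AJ))  # JA * bin(AJ)
print(AJ)
print(JJ)
```

Here jj[J2, J1] = 1 because one article in J2 (w2) is by an author (a) who also published in J1, and jj[J2, J2] = 3 because the coauthored work w2 is counted once for each of its two authors.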