1248th and 1249th Wednesday seminar (Sredin seminar)

20 and 27 May 2015

Partially translated into English

Analysis of the WoS bibliography on blockmodeling and clustering in networks

Data collection

Web of Science

http://home.izum.si/izum/ft_baze/wos.asp

“block model*” or “network cluster*” or “graph cluster*” or “community detect*” or “blockmodel*” or “block-model*” or “structural equival*” or “regular equival*”

We did not use the option of extending the data set with “Citing papers”.

!!! To save full records (all fields), we must restrict the search to the “WoS Core Collection” database.

Scopus

RIS files

Transforming bibliography into a collection of networks

WoS2Pajek: the names of works try to strike a balance with respect to the identification problem.

In the latest version, WoS2PajekT, titles and index were added.

First analyses

Pajek

Basic cleaning: Info; we remove multiple lines and loops:

read cite.net
Network/Create new network/Transform/Remove/Multiple lines/Single line [no]
Network/Create new network/Transform/Remove/Loops [no]
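Outside Pajek, the same cleaning step can be sketched in a few lines of Python. This is only an illustration under assumptions: the network is taken to be a plain list of (source, target) arcs, and the function name simplify is hypothetical, not part of Pajek or WoS2Pajek.

```python
def simplify(arcs):
    """Return the arcs without loops, keeping each (source, target) pair once."""
    seen = set()
    result = []
    for u, v in arcs:
        if u == v:            # a loop: the work "cites" itself
            continue
        if (u, v) in seen:    # a parallel arc already kept once
            continue
        seen.add((u, v))
        result.append((u, v))
    return result


# Toy arc list (hypothetical data): one duplicate arc and one loop.
arcs = [(1, 2), (1, 2), (2, 2), (2, 3), (3, 1)]
print(simplify(arcs))  # [(1, 2), (2, 3), (3, 1)]
```

The first occurrence of each arc is kept, so the order of the remaining arcs is preserved.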

Identification problems:

  • units with different names - a lot of work; we create equivalence partitions and shrink the networks according to them. For keywords we partly solve this by stemming (lemmatization)
  • the same name belongs to several units - currently we do not deal with this.

Network boundaries

Units with a full description: citing units.

Partition DC; Info DC

Units not taken into account: redundant units.

Missing works

Input degrees. List of vertices (1000) with largest degrees. Search for each (degree >= 20) in WoS and add it if found.

read DC.clu
Network/Create partition/Degree/Input
select DC as Second partition
Partitions/Extract subpartition [0]
Select DC as First partition
Operations/Network+Partition/Extract subnetwork [0]
select extracted subpartition as First
Partition Info button [1 +1000]
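The selection made by the commands above can be sketched in Python as follows. The sketch assumes the citation network is a list of (citing, cited) pairs and that DC is a dictionary with value 1 for works with a full description and 0 otherwise; the function name missing_candidates and its parameters are hypothetical.

```python
from collections import Counter


def missing_candidates(arcs, dc, min_indegree=20, limit=1000):
    """List cited-only works (DC = 0) ordered by decreasing indegree."""
    indeg = Counter(v for _, v in arcs)
    found = [(v, d) for v, d in indeg.items()
             if dc.get(v, 0) == 0 and d >= min_indegree]
    # Largest indegree first; ties are broken by the work's name.
    found.sort(key=lambda pair: (-pair[1], str(pair[0])))
    return found[:limit]


# Toy data (hypothetical): x is cited three times but has no full description.
arcs = [("a", "x"), ("b", "x"), ("c", "x"), ("a", "y"), ("b", "c")]
dc = {"a": 1, "b": 1, "c": 1, "x": 0, "y": 0}
print(missing_candidates(arcs, dc, min_indegree=2))  # [('x', 3)]
```

Each work returned this way is a candidate to look up in WoS by hand.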

We add the data from the database for the missing units and re-create the corresponding networks.

We repeat the procedure until we are satisfied with the resulting network.

Removing unimportant works

We look at the citation network (multiple lines and loops removed).

From the set of “terminal” works (DC=0 and outdeg=0) we remove those with indeg < k. In our case k = 2.

Network/Create partition/Degree/Input
Partition/Binarize partition [2-*]
select DC as Second partition
Partitions/Max
File/Partition/Change Label [boundary partition]
Save Partition Button [boundary.clu]
select simplified network
Operations/Network+Partition/Extract subnetwork [1-*]
File/Network/Change Label [bounded network]
Save Network Button [citeB.net]
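The boundary rule implemented by these commands (keep a work if it has a full description or if its indegree is at least k) can be sketched in Python. The data layout, a list of (citing, cited) pairs plus a DC dictionary, and the function name bounded_network are assumptions made for illustration.

```python
from collections import Counter


def bounded_network(arcs, dc, k=2):
    """Keep works with DC = 1 or indegree >= k and the arcs among them."""
    indeg = Counter(v for _, v in arcs)
    vertices = {x for arc in arcs for x in arc} | set(dc)
    # Mirrors "binarize [2-*]" followed by the maximum with DC.
    keep = {v for v in vertices if dc.get(v, 0) == 1 or indeg[v] >= k}
    kept_arcs = [(u, v) for u, v in arcs if u in keep and v in keep]
    return keep, kept_arcs


# Toy data (hypothetical): y is cited only once and has no full description.
arcs = [("a", "x"), ("b", "x"), ("a", "y")]
dc = {"a": 1, "b": 1}
keep, kept_arcs = bounded_network(arcs, dc, k=2)
print(sorted(keep))   # ['a', 'b', 'x']
print(kept_arcs)      # [('a', 'x'), ('b', 'x')]
```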

Distributions

We can look at various distributions: the number of papers per year; the number of references per paper; the number of authors per paper; …

select cite
Network/Create Partition/Degree/Input

Of interest are the units with the largest values in each distribution: the most cited works; the authors with the most works; …

From WJ.net (simplify, remove **, indegree) we get the journals in which the largest number of papers was published:

Partition Info button [1 +200]

      Rank    Vertex   Cluster     Id
---------------------------------------
         1     80843       862     NATURE
         2     80837       845     P NATL ACAD SCI USA
         3     80856       794     SCIENCE
         4     80931       519     PHYS REV LETT
         5     80848       485     PHYS REV E
         6     81195       477     J GEOPHYS RES-SOL EA
         7     80844       451     LECT NOTES COMPUT SC
         8     80897       437     NUCLEIC ACIDS RES
         9     80847       435     PHYSICA A
        10     80855       420     SOC NETWORKS
        11     80932       406     B SEISMOL SOC AM
        12     80905       404     BIOINFORMATICS
        13     82268       358     TECTONOPHYSICS
        14     81344       354     GEOPHYS J INT
        15     81352       318     J GEOPHYS RES
        16     81227       304     J MOL BIOL
        17     80930       297     PHYS REV B
        18     81115       281     J BIOL CHEM
        19     80846       254     AM J SOCIOL
        20     81906       246     AM SOCIOL REV
        21     81343       245     GEOPHYS RES LETT
        22     81189       233     GEOLOGY
        23     81031       233     NEUROIMAGE
        24     80832       233     IEEE T PATTERN ANAL
        25     81376       201     J CHEM PHYS
        26     81229       198     BIOCHEMISTRY-US
        27     82796       196     APPL ENVIRON MICROB
        28     81198       192     J GEOPHYS RES-SOLID
        29     81080       191     PATTERN RECOGN
        30     81228       189     J AM CHEM SOC
---------------------------------------

The degree partition for Cite.net can be exported to R:

Partition/Copy to Vector
Tools/R/Send to R/Current Vector

or saved from Pajek to a file and read into R:

> setwd("C:/Users/batagelj/work/Python/WoS/BM1")
> d <- read.table(file="indegAll.clu",skip=1)
> t <- table(d)
> head(t)
d
    0     1     2     3     4     5
 2154 66431  6790  2155   959   585
> x <- as.numeric(names(t))
> plot(x,t)
> plot(x,t,log='xy',main="indeg distribution",xlab='deg',ylab='freq',pch=16)

indegall.pdf; outdegall.pdf
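The frequency table computed in the R session can also be obtained with a short Python sketch. It assumes, as the skip=1 argument above suggests, that the partition file starts with one header line followed by one integer per vertex; the function name degree_distribution and the sample text are hypothetical.

```python
from collections import Counter


def degree_distribution(clu_text):
    """Return sorted (degree, frequency) pairs from the text of a partition file."""
    lines = clu_text.strip().splitlines()
    # Skip the header line, then read one integer per vertex.
    values = [int(line) for line in lines[1:] if line.strip()]
    return sorted(Counter(values).items())


# Made-up sample with six vertices, not the real indegAll.clu.
sample = "*Vertices 6\n0\n1\n1\n2\n1\n0\n"
print(degree_distribution(sample))  # [(0, 2), (1, 3), (2, 1)]
```

For a log-log plot, degree 0 has to be dropped first, since it has no logarithm.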

Largest indegrees in Cite.net:

      Rank    Vertex   Cluster     Id
---------------------------------------
         1        52       505     GIRVAN_M(2002)99:7821
         2       295       472     NEWMAN_M(2004)69:026113
         3       163       368     FORTUNAT_S(2010)486:75
         4       727       280     FORTUNAT_S(2007)104:36
         5        84       270     NEWMAN_M(2006)103:8577
         6       213       267     PALLA_G(2005)435:814
         7       149       251     WASSERMA_S(1994):
         8        73       249     CLAUSET_A(2004)70:066111
         9       173       245     WATTS_D(1998)393:440
        10       174       233     ZACHARY_W(1977)33:452
        11        70       221     BLONDEL_V(2008):P10008
        12        28       184     NEWMAN_M(2006)74:036104
        13       399       184     NEWMAN_M(2003)45:167
        14       200       180     LANCICHI_A(2008)78:046110
        15      1592       179     ALBERT_R(2002)74:47
        16       158       177     BARABASI_A(1999)286:509
        17       217       171     RADICCHI_F(2004)101:2658
        18       745       170     LORRAIN_F(1971)1:49
        19        75       168     GUIMERA_R(2005)433:895
        20      1809       167     DANON_L(2005):P09008
        21      1692       162     WHITE_H(1976)81:730
        22        27       157     NEWMAN_M(2004)69:066133
        23       185       156     DUCH_J(2005)72:027104
        24       219       151     ROSVALL_M(2008)105:1118
        25       340       134     BURRIDGE_R(1967)57:341
        26        59       133     NEWMAN_M(2004)38:321
        27       865       127     REICHARD_J(2006)74:016110
        28       199       126     LANCICHI_A(2009)11:033015
        29      1105       126     LANCICHI_A(2009)80:056117
        30      1395       111     BREIGER_R(1975)12:328
---------------------------------------

Largest outdegrees in Cite.net:

      Rank    Vertex   Cluster     Id
---------------------------------------
         1      1008       863     BOCCALET_S(2006)424:175
         2      9139       456     TURCOTTE_D(1999)62:1377
         3       399       417     NEWMAN_M(2003)45:167
         4       163       399     FORTUNAT_S(2010)486:75
         5     30604       321     SIBLEY_C(2012)12:505
         6     65979       310     FRANK_K(1998)23:171
         7      8184       297     KAWAMURA_H(2012)84:839
         8      2239       281     DOROGOVT_S(2002)51:1079
         9      8401       275     ARENAS_A(2008)469:93
        10     10842       255     BURT_R(1980)6:79
        11       759       254     WU_F(1982)54:235
        12      1592       208     ALBERT_R(2002)74:47
        13      4965       204     JAIN_A(1999)31:264
        14     42301       200     GRABHER_G(2006)30:163
        15     66177       198     AXT_V(1998)70:145
        16     14176       186     FOGGIA_P(2014)28:1450001
        17      1546       178     ROSSI_R(2015)27:1112
        18       207       175     LU_L(2011)390:1150
        19     69171       174     DAHMEN_K(1996)53:14872
        20     10474       169     AGGARWAL_C(2014)47:10
        21     32083       168     MARKA_S(2012)1260:55
        22     57714       168     RUNDLE_J(2003)41:1019
        23     17049       168     ROBINS_G(2013)57:261
        24     66555       167     FOOKES_P(1997)30:293
        25     38192       166     PAVLOPOU_G(2011)4:
        26     29794       160     MARSDEN_P(1990)16:435
        27       802       158     MALLIARO_F(2013)533:95
        28     45680       155     XU_P(2009)27:636
        29       135       153     MCPHERSO_M(2001)27:415
        30     28532       152     GULATI_R(1999)104:1439
---------------------------------------

Citation network analysis

To identify the important parts we use SPC (Search Path Count) weights. To compute them the network must be acyclic.

Network/Create Partition/Components/Strong [2]

If there are any, we extract the strong components and inspect them.

Operations/Network+Partition/Extract subnetwork [1-*]
Draw/Network + First Partition

→ Cyc.net

cyc.pdf
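To check for cycles outside Pajek, the strong components with at least two vertices can be found with a standard two-pass depth-first search (Kosaraju's algorithm). The sketch below assumes a list of (source, target) arcs with comparable vertex labels; it is not Pajek's implementation, and the function name strong_components is hypothetical.

```python
from collections import defaultdict


def strong_components(arcs):
    """Return the strongly connected components with at least two vertices."""
    succ, pred = defaultdict(list), defaultdict(list)
    vertices = set()
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)
        vertices.update((u, v))

    # First pass: record vertices in order of DFS completion.
    order, visited = [], set()
    for root in sorted(vertices):
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:  # all successors handled: the vertex is finished
                order.append(node)
                stack.pop()

    # Second pass: sweep the reversed network in reverse completion order.
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        comp, todo = [], [root]
        while todo:
            node = todo.pop()
            comp.append(node)
            for prv in pred[node]:
                if prv not in assigned:
                    assigned.add(prv)
                    todo.append(prv)
        if len(comp) >= 2:  # same minimum size as the [2] option above
            components.append(sorted(comp))
    return components


# Toy network (hypothetical): works 1, 2 and 3 cite each other in a cycle.
print(strong_components([(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)]))  # [[1, 2, 3]]
```

An empty result means that the network is already acyclic.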

We eliminate cycles either by shrinking each component into a single vertex or by replacing it with a complete bipartite network (the Preprint transformation).

>>> import sys; wdir = r'C:\Users\Batagelj\work\python\WoS\BM1'; sys.path.append(wdir)
>>> import Preprint3;  Preprint3.run(wdir,'.','citeB.net','cyc.net')
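The other option, shrinking each strong component into a single vertex, can be sketched as follows. This illustrates the shrinking alternative only, not what Preprint3 does; the representation (a list of arcs and a list of components) and the function name shrink_components are assumptions.

```python
def shrink_components(arcs, components):
    """Replace every vertex of a component by one representative vertex."""
    representative = {}
    for comp in components:
        leader = min(comp)          # arbitrary but deterministic choice
        for v in comp:
            representative[v] = leader

    shrunk, seen = [], set()
    for u, v in arcs:
        a = representative.get(u, u)
        b = representative.get(v, v)
        # Arcs inside a component become loops and are dropped;
        # parallel arcs created by the shrinking are kept only once.
        if a != b and (a, b) not in seen:
            seen.add((a, b))
            shrunk.append((a, b))
    return shrunk


# Toy network (hypothetical): the cycle 1 -> 2 -> 3 -> 1 becomes the vertex 1.
arcs = [(1, 2), (2, 3), (3, 1), (3, 4), (0, 2)]
print(shrink_components(arcs, [[1, 2, 3]]))  # [(1, 4), (0, 1)]
```

Shrinking merges the works of a component into one unit, whereas the Preprint transformation keeps them as separate vertices.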

We compute the SPC weights and also obtain the main path. We draw it using the macro layers. The CPM (Critical Path Method) procedure usually gives a better solution.

mp.pdf; cpm.pdf
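For reference, the SPC weight of an arc (u, v) in an acyclic network is the number of source-to-sink paths that pass through it, that is, the number of paths from any source to u times the number of paths from v to any sink. A minimal Python sketch, assuming a simplified network given as a list of arcs (the function name spc_weights is hypothetical, and normalization of the weights is left out):

```python
from collections import defaultdict


def spc_weights(arcs):
    """Return {arc: number of source-to-sink paths through the arc}."""
    succ, pred = defaultdict(list), defaultdict(list)
    vertices = set()
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)
        vertices.update((u, v))

    # Topological order (Kahn's algorithm); fails if a cycle remains.
    waiting = {v: len(pred[v]) for v in vertices}
    ready = sorted(v for v in vertices if waiting[v] == 0)
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succ[u]:
            waiting[v] -= 1
            if waiting[v] == 0:
                ready.append(v)
    if len(order) != len(vertices):
        raise ValueError("network is not acyclic")

    # paths_in[v]: number of paths from any source to v (1 for a source).
    paths_in = {}
    for v in order:
        paths_in[v] = sum(paths_in[u] for u in pred[v]) if pred[v] else 1
    # paths_out[v]: number of paths from v to any sink (1 for a sink).
    paths_out = {}
    for v in reversed(order):
        paths_out[v] = sum(paths_out[w] for w in succ[v]) if succ[v] else 1

    return {(u, v): paths_in[u] * paths_out[v] for u, v in arcs}


# Toy network (hypothetical): arc (3, 4) lies on 4 of the 5 source-to-sink paths.
weights = spc_weights([(1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (1, 7)])
print(weights[(3, 4)], weights[(1, 7)])  # 4 1
```

The sketch expects multiple lines to have been removed beforehand, because duplicate arcs would inflate the path counts.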

To interpret the obtained solutions we save the corresponding subnetworks in Pajek and, in R, apply to them the function description, which creates an (Excel) CSV file with basic data (author, title, journal, year) about the included works.

setwd("C:\\Users\\batagelj\\work\\Python\\WoS\\BM1\\results")
# <<< description  (load the definition of the function description here; its code is not reproduced on this page)
T <- read.csv('titles.csv',sep=";",colClasses="character")
T$code <- 1
head(T)
d <- description("CPM.net","CPMnew.csv",T)
head(d)

We can inspect it in Excel. We reorder the works by the publication year.

The islands procedure offers a more detailed insight into the structure of the citation network. We determine islands [20,150] and islands [20,200]; only the largest (main) island differs between the two. If we draw them all (all islands in the ranges [20,150] and [20,200]: islands.pdf), we notice that many of them are star-shaped and therefore uninteresting. We extract individually only those with a richer structure.

islands
partition/canonical
extract largest islands
draw KK components
select interesting
select an island
extract it
macro layers
manual improvement
export EPS -> PDF
description in R

island1b.pdf

Collaboration among authors

We limit the WA network to works with complete descriptions (DC=1).

Degrees. Works with the most authors; authors with the most works.

!!! The paper: Aad, G; A neural network clustering algorithm for the ATLAS silicon … has 2847 co-authors !!!

read WA.net
remove multiple lines
Network Info button -> Rows=80822, Cols=46658
Partition/Create Constant Partition [46658 2]
read DC partition
select constant partition as Second
Partitions/Fuse Partitions
Operations/Network+Partition/Extract Subnetwork [Yes][1-*]
File/Network/Change Label [WA bounded]
normalization macro
transpose
select WAb as Second
Networks/Multiply [Yes]
remove loops
pS cores
partition from vector 1.5
extract
draw

collabor.pdf

Citations among authors

cite.net → citeD.net

ACi = AWb * CiteD * WAb
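Read entry by entry, the product above counts, for each ordered pair of authors (a, b), the citations from works of a to works of b. A sketch with plain dictionaries, assuming binary WA and citation networks given as lists of pairs (the function name author_citations is hypothetical, and any weighting or restriction hidden in CiteD is not modelled):

```python
from collections import defaultdict


def author_citations(wa, cite):
    """wa: (work, author) pairs; cite: (citing work, cited work) pairs."""
    authors_of = defaultdict(set)
    for work, author in wa:
        authors_of[work].add(author)

    aci = defaultdict(int)
    for citing, cited in cite:
        # Every author of the citing work cites every author of the cited work.
        for a in authors_of.get(citing, ()):
            for b in authors_of.get(cited, ()):
                aci[(a, b)] += 1
    return dict(aci)


# Toy data (hypothetical): B wrote w1 and w2, both of which cite C's work w3.
wa = [("w1", "A"), ("w1", "B"), ("w2", "B"), ("w3", "C")]
cite = [("w1", "w3"), ("w2", "w3"), ("w2", "w1")]
print(author_citations(wa, cite)[("B", "C")])  # 2
```

Self-citations show up on the diagonal, for example the pair (B, B) in the toy data.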

islands [4 50]

aucite.pdf

Numbering of islands (see picture)

55 53 54 48 47 23 45 42
   52 49 50       51 22

Islands and the corresponding keywords

island 54 and island 55

WK.net → WKd.net

AK = AWb * WKd

Cut out the subnetwork for the authors from the island and look at the weighted indegrees in it:

i - island; k = |K|

Partition/Create Constant Partition [k 999] = CC
select islands partition CI as First partition
select partition CC as Second partition
fuse CI and CC
extract [i 999] -> Ni, Ci
Network/Create Vector/Centrality/Weighted Degree/Input
Operations/Network+Partition/Extract Subnetwork [999] -> NiK, CiK
select partition Ci
Operations/Vector+Partition/Extract Subvector [999] -> ViK
Vector Info Button [+50]
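What these commands compute can be sketched in Python: restrict AK to the authors of one island and sum the weights arriving at each keyword. The dictionary layout {(author, keyword): weight} and the function name island_keywords are assumptions made for illustration.

```python
from collections import Counter


def island_keywords(ak, island_authors, top=50):
    """Return the keywords with the largest weighted indegree within the island."""
    totals = Counter()
    for (author, keyword), weight in ak.items():
        if author in island_authors:
            totals[keyword] += weight
    # Largest total first; ties are broken alphabetically.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:top]


# Toy data (hypothetical weights): author C is outside the island.
ak = {("A", "network"): 3, ("B", "network"): 2, ("A", "model"): 1, ("C", "fault"): 5}
print(island_keywords(ak, {"A", "B"}))  # [('network', 5), ('model', 1)]
```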

Island 55

      Rank    Vertex                       Value   Id
--------------------------------------------------------
         1         4                    548.0000   network
         2        11                    446.0000   community
         3        29                    387.0000   complex
         4        19                    261.0000   detection
         5        31                    196.0000   structure
         6         5                    190.0000   algorithm
         7        47                    185.0000   model
         8        68                    155.0000   modularity
         9        26                    149.0000   graph
        10        14                    108.0000   social
        11        93                     97.0000   cluster
        12       212                     78.0000   organization
        13       242                     76.0000   metabolic
        14        60                     73.0000   dynamics
        15       185                     70.0000   analysis
        16       222                     63.0000   optimization
        17        71                     61.0000   base
        18       239                     59.0000   random
        19        76                     54.0000   overlap
        20       226                     53.0000   detect
        21       299                     50.0000   identification
        22       577                     50.0000   use
        23       147                     49.0000   web
        24       428                     48.0000   hierarchical
        25        24                     47.0000   method
        26        40                     47.0000   prediction
        27       560                     47.0000   small-world
        28       434                     46.0000   stochastic
        29        67                     43.0000   genetic
        30       437                     42.0000   blockmodel

Island 54

      Rank    Vertex                       Value   Id
--------------------------------------------------------
         1         4                    123.0000   network
         2        14                     83.0000   social
         3       130                     53.0000   structural
         4       432                     50.0000   equivalence
         5       437                     47.0000   blockmodel
         6       216                     46.0000   blockmodeling
         7       219                     45.0000   partition
         8        26                     42.0000   graph
         9       179                     40.0000   generalized
        10       541                     38.0000   balance
        11         7                     36.0000   datum
        12         5                     35.0000   algorithm
        13        47                     34.0000   model
        14       185                     33.0000   analysis
        15       224                     28.0000   two-mode
        16        93                     28.0000   cluster
        17       225                     24.0000   matrix
        18      1315                     22.0000   sign
        19       885                     18.0000   relational
        20       433                     16.0000   role
        21        31                     15.0000   structure
        22       238                     15.0000   multiple
        23      1052                     15.0000   relation
        24       133                     14.0000   position
        25       434                     13.0000   stochastic
        26      3889                     12.0000   relax
        27       222                     12.0000   optimization
        28       606                     12.0000   decomposition
        29        71                     12.0000   base
        30      1615                     11.0000   constraint

Island 53

      Rank    Vertex                       Value   Id
--------------------------------------------------------
         1        47                     80.0000   model
         2       106                     78.0000   earthquake
         3       146                     54.0000   fault
         4       824                     51.0000   self-organized
         5       823                     47.0000   criticality
         6        60                     36.0000   dynamics
         7      2972                     29.0000   slider-block
         8      1286                     20.0000   threshold
         9      1466                     18.0000   seismicity
        10        52                     17.0000   scale
        11       506                     15.0000   san-andreas
        12       685                     15.0000   simulation
        13      1764                     14.0000   critical
        14      6241                     14.0000   nucleation
        15        46                     13.0000   behavior
        16      1700                     12.0000   mechanical
        17       172                     12.0000   fluctuation
        18      4599                     12.0000   cellular-automaton
        19       112                     11.0000   fracture
        20       105                     11.0000   friction
        21      2775                     11.0000   phenomenon
        22      1004                     10.0000   statistical
        23       109                     10.0000   time
        24      1012                      9.0000   stress
        25       190                      9.0000   block
        26      1474                      9.0000   numerical
        27         4                      9.0000   network
        28      3266                      8.0000   forest-fire
        29        23                      8.0000   approach
        30      1405                      8.0000   failure
--------------------------------------------------------

Island 23

      Rank    Vertex                       Value   Id
--------------------------------------------------------
         1       190                     67.0000   block
         2       998                     57.0000   drug
         3        47                     55.0000   model
         4      1510                     53.0000   canine
         5      1214                     51.0000   potential
         6      1515                     50.0000   atrioventricular
         7      2328                     50.0000   action
         8       511                     49.0000   heart
         9      2895                     47.0000   chronic
        10      5867                     43.0000   av
        11      1411                     41.0000   repolarization
        12      1666                     37.0000   susceptibility
        13      2912                     37.0000   ventricular
        14      2909                     37.0000   arrhythmia
        15      6914                     34.0000   atrioventricular-block
        16      6742                     34.0000   antiarrhythmic
        17      1389                     34.0000   syndrome
        18      1506                     32.0000   qt
        19      1939                     31.0000   dog
        20       968                     31.0000   long
        21      4003                     29.0000   vivo
        22      3209                     29.0000   hypertrophy
        23      1511                     29.0000   torsade
        24      5872                     28.0000   torsades-de-pointes
        25        41                     28.0000   effect
        26       257                     25.0000   duration
        27       736                     24.0000   profile
        28      8342                     23.0000   torsade-de-pointes
        29      8178                     22.0000   monophasic
        30      2910                     22.0000   prolongation
--------------------------------------------------------

Journals (to do)

It seems that WoS has largely unified the way journal names (of works) are written.

AJ = AW * WJ

author a published aj[a,j] papers in journal j

JJ = JA * bin(AJ)

in journal i, the authors who also published in journal j published jj[i,j] papers
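The two products can be sketched with dictionaries. The sketch assumes that WA and WJ are given as lists of (work, author) and (work, journal) pairs, that every work has a single journal, and that JA is the transpose of AJ; the function name journal_links is hypothetical.

```python
from collections import defaultdict


def journal_links(wa, wj):
    """Return (aj, jj) with AJ = AW * WJ and JJ = JA * bin(AJ)."""
    journal_of = dict(wj)
    aj = defaultdict(int)
    for work, author in wa:
        if work in journal_of:
            aj[(author, journal_of[work])] += 1

    per_author = defaultdict(dict)
    for (author, journal), count in aj.items():
        per_author[author][journal] = count

    jj = defaultdict(int)
    for counts in per_author.values():
        for i, count in counts.items():
            for j in counts:   # bin(AJ): the author published in j at least once
                jj[(i, j)] += count
    return dict(aj), dict(jj)


# Toy data (hypothetical): A published twice in J1 and once in J2, B once in J1.
wa = [("w1", "A"), ("w2", "A"), ("w3", "A"), ("w4", "B")]
wj = [("w1", "J1"), ("w2", "J1"), ("w3", "J2"), ("w4", "J1")]
aj, jj = journal_links(wa, wj)
print(jj[("J1", "J2")], jj[("J2", "J1")])  # 2 1
```

With this formula a paper with several authors is counted once per author, so jj[i,j] counts author-paper pairs rather than distinct papers.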

seminar/en1248.txt · Last modified: 2015/05/28 08:13 by vlado
 