====== Analysis: authors ======
[[notes:imfm:corona:s2orcmeta|S2ORC metadata networks]]
===== WA indegrees =====
From the list of the 100 largest entries in the indegree vector ([[.:WAin|WA indegrees]]) we see:
* many Chinese authors; their names can be homonyms (different authors with the same name)
* empty names; replace them with ''Anonymous''
* strange names: ''O039'', ''D039'', ''Prevention, Centers for Disease Control and''
* ''O039'' is also the ICD-10 code for "Complete or unspecified spontaneous abortion without complication" [[https://1up.health/health-data/icd10/id/O039|O039]]
* ''D039'' is also the hazardous waste code for tetrachloroethylene [[https://www.epa.gov/sites/production/files/2015-09/documents/hwid05.pdf|Hazardous Waste Codes]]
Solution:
* compute the last name ''name'' of the first author with
firstAu = Au[0].strip() if len(Au)>0 else "Anonymous"
name = firstAu.split(",")[0] if len(firstAu)>0 else "Anonymous"
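The two lines above can be wrapped in a small function. This is a sketch assuming that ''Au'' is a list of author strings of the form ''Last, First''; the name ''first_author_lastname'' is hypothetical. Unlike the original lines, it also returns ''Anonymous'' when the part before the comma is empty.

```python
def first_author_lastname(authors):
    """Return the last name of the first author, or "Anonymous" if there is none.

    Assumes author names are stored as "Last, First ..." strings.
    """
    first = authors[0].strip() if len(authors) > 0 else ""
    if not first:
        return "Anonymous"
    # keep only the part before the first comma (the last name)
    last = first.split(",")[0].strip()
    return last if last else "Anonymous"


if __name__ == "__main__":
    print(first_author_lastname(["Memish, Ziad A.", "Al-Tawfiq, Jaffar A."]))
    print(first_author_lastname([]))
```

Note that an organization stored as ''Prevention, Centers for Disease Control and'' is mapped to ''Prevention'', so such names need separate treatment.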
===== Distribution of the number of authors =====
In Pajek (the number of authors of a work is its outdegree in WA):
Select network WA
Network/2-mode Network/Partition 2-mode
Network/Create Vector/Centrality/Degree/Output
Operations/Vector+Partition/Extract Subvector [1]
Save vector [Aout.vec]
and in R (''na'' is the number of authors of each work, ''f'' its frequency table):
> wdir <- "C:/Users/batagelj/Documents/2020/corona/MetaTit"
> setwd(wdir)
> na <- read.table('Aout.vec',skip=1)[,1]
> length(na)
[1] 375094
> f <- table(na)
> length(f)
[1] 163
> n <- as.integer(names(f))
> plot(n,f)
> plot(n,f,log='xy',pch=16)
{{notes:imfm:corona:pics:aout.png?400}}
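The same frequency table can be sketched in Python. The sketch assumes the Pajek ''.vec'' layout used above (a ''*Vertices n'' header line followed by one value per line, which is why R skips one line); the function names are hypothetical.

```python
from collections import Counter


def read_vec(lines):
    """Read the values of a Pajek .vec file given as an iterable of lines."""
    it = iter(lines)
    next(it)  # skip the '*Vertices n' header (as skip=1 does in R)
    return [float(line.split()[0]) for line in it if line.strip()]


def degree_distribution(values):
    """Return sorted (number of authors, number of works) pairs."""
    counts = Counter(int(v) for v in values)
    return sorted(counts.items())


if __name__ == "__main__":
    sample = "*Vertices 5\n1\n2\n2\n3\n1\n"
    print(degree_distribution(read_vec(sample.splitlines())))
```

With a real file one would pass ''open("Aout.vec")'' to ''read_vec''. The resulting pairs can be plotted on log-log axes, for example with matplotlib's ''loglog''.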
===== Normalized collaboration network =====
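The normalized network WAn = norm(WA) is used to compute the normalized collaboration network Cn = WAn<sup>T</sup> * WAn (see [[https://link.springer.com/article/10.1007%2Fs11192-012-0940-1|BC]]).
In Pajek (using the macro norm2.mcr from [[https://github.com/bavla/biblio/tree/master/Pajek/macro|Biblio]]):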
Select network WA
Macro/Play [norm2.mcr] [375094]
Network/2-mode Network/Transpose 2-mode
Select normalized network as Second network
Networks/Multiply Networks
Network/Create Vector/Get Loops
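A minimal sketch of this computation for a sparse network stored as nested dictionaries (work -> author -> weight). It assumes that norm() divides each row (work) of WA by its row sum, so that every work contributes a total weight of 1; the normalization in norm2.mcr may differ. All function names are hypothetical.

```python
from collections import defaultdict


def normalize_rows(WA):
    """Divide every row (work) of a sparse 2-mode network by its row sum."""
    WAn = {}
    for w, authors in WA.items():
        total = sum(authors.values())
        if total > 0:
            WAn[w] = {a: x / total for a, x in authors.items()}
    return WAn


def collaboration(WAn):
    """Cn = WAn^T * WAn: Cn[a][b] = sum over works w of WAn[w][a] * WAn[w][b]."""
    Cn = defaultdict(lambda: defaultdict(float))
    for authors in WAn.values():
        for a, xa in authors.items():
            for b, xb in authors.items():
                Cn[a][b] += xa * xb
    return Cn


def loops(Cn):
    """Diagonal values Cn[a][a] (what Network/Create Vector/Get Loops extracts)."""
    return {a: row[a] for a, row in Cn.items()}


if __name__ == "__main__":
    WA = {"w1": {"A": 1, "B": 1}, "w2": {"A": 1}}
    print(loops(collaboration(normalize_rows(WA))))
```

Under this normalization the sum of all entries of Cn equals the number of works. The top 100 authors can then be obtained with ''heapq.nlargest(100, L.items(), key=lambda t: t[1])'' where ''L = loops(Cn)''.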
==== Top 100 authors for fractional contribution ====
Cn[a,a] = the fractional contribution of the author a to the works he or she coauthored.
[[.:CnLoops|Top 100 authors for fractional contribution]]
We see:
* the Chinese authors disappear from the top list.
* strange author names: Anonymous,; Prevention, Centers for Disease Control and; Organization, World Health; Facharztmagazine, Redaktion; O039; Agency, U. S. Federal Emergency Management; The Lancet Infectious Diseases; Brasil. Ministério da Saúde. Secretaria de Ciência, Tecnologia Inovação e Insumos Estratégicos em Saúde.
* a BOM (byte order mark) should be added to the .net files.
==== Ps cores ====
First, transform Cn into an undirected network:
Select network Cn
Network/Create New Network/Transform/Remove Loops
Network/Create New Network/Transform/Arcs to Edges/Bidirected Only/Sum Values
Network/Create Vector/Generalized Cores/Sum/All
Info Ps vector [+200]
Vector/Make Partition/by Intervals/Selected [5.0]
Operations/Network+Partition/Extract/Subnetwork Induced ... [2]
In the Ps core at level 5, the sum of the weights of the edges linking a node to other nodes of the core (its weighted degree inside the core) is at least 5:
{{notes:imfm:corona:pics:ApsCore.pdf|Ps core at level 5.0}}
We see:
* synonyms (variants of the same name): Joob, Beuy - Wiwanitkit, Viroj; Joob, B. - Wiwanitkit, V.; Joob, B - Wiwanitkit, V; Memish, Ziad A. - Al-Tawfiq, Jaffar A.; Memish, Ziad A - Memish, Ziad A
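Ps cores can be computed by repeatedly removing the vertex with the smallest sum of weights to the remaining vertices (the usual peeling approach for generalized cores). The following is a simple heap-based sketch for an undirected network given as a dictionary of edge weights; it is not Pajek's implementation, and the names are hypothetical. The value returned for a vertex is the largest level t such that the vertex belongs to the Ps core at level t.

```python
import heapq
from collections import defaultdict


def ps_cores(edges):
    """Ps core number of every vertex of an undirected weighted network.

    edges: {(u, v): weight} with nonnegative weights; loops (u == v) are ignored.
    Vertex labels must be mutually comparable (they break ties in the heap).
    """
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        if u == v:
            continue
        adj[u][v] = adj[u].get(v, 0) + w  # parallel edges are summed
        adj[v][u] = adj[v].get(u, 0) + w
    deg = {u: sum(nb.values()) for u, nb in adj.items()}  # weighted degree
    heap = [(d, u) for u, d in deg.items()]
    heapq.heapify(heap)
    removed, core, level = set(), {}, 0
    while heap:
        d, u = heapq.heappop(heap)
        if u in removed or d != deg[u]:
            continue  # outdated heap entry
        level = max(level, d)  # the core level never decreases
        core[u] = level
        removed.add(u)
        for v, w in adj[u].items():
            if v not in removed:
                deg[v] -= w  # v loses its edge to u
                heapq.heappush(heap, (deg[v], v))
    return core


if __name__ == "__main__":
    core = ps_cores({("a", "b"): 3, ("b", "c"): 3, ("a", "c"): 3, ("a", "d"): 1})
    print({v for v, c in core.items() if c >= 5.0})
```

The Ps core at level 5.0 is then the set of vertices with core value at least 5.0, as in the last line.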
==== Simple islands ====
Simple islands are islands with a single peak; each of them usually corresponds to a single topic.
Network/Create Partition/Islands/Line Weights (simple) [2,100] [yes]
Operations/Network+Partition/Extract/Subnetwork Induced ... [92040-92158]
Partition/Canonical Partition/with Decreasing Freq
Info Partition [+30]
Operations/Network+Partition/Extract/Subnetwork Induced ... [1-18]
There are 92158 simple islands with sizes in [2,100].
{{notes:imfm:corona:pics:aislaSM.pdf|Top Cn simple islands by height}}
{{notes:imfm:corona:pics:aislaSLn.pdf|Top Cn simple islands by size}}
We see:
* names in all capitals: STRAUSS, JAMES H.; STRAUSS, ELLEN G.
* academic degrees in names: Allison E. James, PhD; Megan Wallace, DrPH; Theresa Sokol, M. P. H.; Catherine M. Brown, D. V. M.; Ellen Shelley, D. N. P.; Grace Philips, J. D.; David Selvage, M. H. S.; Soliman Hesham, M. D.; Marie E. Killerby, VetMB
* very long Spanish names with mis-encoded characters (Unicode?): Carballada Gonzˆ¡lez, Francisco Nˆ”ˆ–ez Orjales, Ramˆ‡n Martin Lˆ¡zaro; Mar Abad Garcˆ�a, Marˆ�a Gloria ˆlvarez Silveiro Marˆ�a Carmen Coria Abel
* Chinese names with the family name in capitals: LIU, Zhirong; Xiaoyan, WU; ZHANG, Yi; Liya, MA
* a group as author: COVID-19 PPC group
* synonyms: Bootsma, M. C.; Bootsma, M.; Bonten, M. J.; Bonten, M.
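Some of these problems could be reduced by simple rules. This is a hypothetical sketch: the set of degree abbreviations is built only from the examples above and the function names are made up. Swapped name order (''Xiaoyan, WU'') cannot be fixed this way, and acronyms such as ''WHO'' would also be rewritten by ''fix_caps''.

```python
import re

# Hypothetical list of degree suffixes, taken from the examples above
# after removing dots and spaces and converting to upper case.
DEGREES = {"PHD", "DRPH", "MPH", "DVM", "DNP", "JD", "MHS", "MD", "VETMB"}


def strip_degree(name):
    """Drop a trailing academic degree such as ', PhD' or ', M. P. H.'."""
    parts = [p.strip() for p in name.split(",")]
    if len(parts) > 1 and re.sub(r"[.\s]", "", parts[-1]).upper() in DEGREES:
        parts = parts[:-1]
    return ", ".join(parts)


def fix_caps(name):
    """Rewrite words of two or more capital letters, e.g. 'STRAUSS' -> 'Strauss'."""
    return re.sub(r"\b[A-Z]{2,}\b", lambda m: m.group(0).capitalize(), name)


if __name__ == "__main__":
    print(strip_degree("Theresa Sokol, M. P. H."))
    print(fix_caps("STRAUSS, JAMES H."))
```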
Solutions:
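* Unicode BOM: when writing the .net files from Python, replace the encoding with ''encoding="utf-8-sig"'' (the UTF-8 codec that writes a BOM at the start of the file).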
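A minimal sketch of writing a Pajek ''.net'' file with a BOM from Python; ''write_net'' is a hypothetical helper, and names containing double quotes would need extra handling. The codec ''utf-8-sig'' writes the three bytes ''EF BB BF'' at the start of the file and strips them again when the file is read with the same encoding.

```python
import os
import tempfile


def write_net(path, names, edges):
    """Write a small Pajek .net file in UTF-8 with a BOM.

    names: vertex labels (vertex i is names[i-1]); edges: (u, v, weight) triples.
    """
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write(f"*Vertices {len(names)}\n")
        for i, name in enumerate(names, start=1):
            f.write(f'{i} "{name}"\n')
        f.write("*Edges\n")
        for u, v, w in edges:
            f.write(f"{u} {v} {w}\n")


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.net")
    write_net(path, ["Brasil. Ministério da Saúde", "Memish, Ziad A."], [(1, 2, 0.5)])
    with open(path, "rb") as f:
        print(f.read(3))  # the UTF-8 BOM: b'\xef\xbb\xbf'
```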
==== Islands ====
Network/Create Partition/Islands/Line Weights [2,50] [yes]
Save Islands partition to the file NcIsla2.clu
Save Islands heights vector to the file NcIslaH2.vec
There are 81934 islands with size in [2,50].
For each island we know its size and its height. I selected the 50 top islands according to the weight w = sqrt(size - 1) * height.
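wdir <- "C:/Users/batagelj/Documents/2020/corona/MetaTit"
setwd(wdir)
h <- read.table('NcIslaH2.vec',skip=1)[,1]  # island heights, indexed by vertex
C <- read.table('NcIsla2.clu',skip=1)[,1]   # island of each vertex (0 = not in an island)
s <- table(C)                               # island sizes; the first entry is class 0
N <- length(C); k <- max(C); H <- rep(-1,k)
for(i in 1:N){ j <- C[i]                    # H[j] = height of island j
  if(j > 0){if(H[j] < 0) H[j] <- h[i]}
}
w <- sqrt(s[2:(k+1)]-1)*H                   # weight = sqrt(size-1)*height
r <- cbind(s[2:(k+1)],H,w)
q <- order(w,decreasing=TRUE)
R <- r[q,]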
> R[1:50,]
H w
81934 2 7.2500000 7.250000
77577 42 0.4952564 3.171188
81933 2 3.0000000 3.000000
...
76851 42 0.2630385 1.684269
62911 49 0.2405513 1.666588
62883 50 0.2361111 1.652778
71695 44 0.2500000 1.639360
...
81913 2 1.5000000 1.500000
81914 2 1.5000000 1.500000
> paste(row.names(R[1:50,]),collapse=",")
[1] "81934,77577,81933,81932,81931,81924,81918,81927,81928,81929,81930,77063,76992,81903,79495,81925,81926,
81901,76826,76782,67555,81916,81917,81919,81920,81921,81922,81923,76851,62911,62883,71695,62902,62930,
62834,62835,62316,64394,65412,81904,81905,81906,81907,81908,81909,81910,81911,81912,81913,81914"
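The same selection can be sketched in Python. It mirrors the R code above under the same assumptions (vertices outside islands have class 0, and the height of an island is taken from its first vertex); the function name is hypothetical.

```python
import math


def rank_islands(partition, heights, top=50):
    """Return (island, size, height, weight) tuples for the top islands.

    partition[i]: island of vertex i (0 = vertex not in an island)
    heights[i]:   height value stored for vertex i
    weight = sqrt(size - 1) * height
    """
    size, height = {}, {}
    for c, h in zip(partition, heights):
        if c > 0:
            size[c] = size.get(c, 0) + 1
            height.setdefault(c, h)  # first height seen, as in the R loop
    rows = [(c, size[c], height[c], math.sqrt(size[c] - 1) * height[c]) for c in size]
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:top]


if __name__ == "__main__":
    part = [0, 1, 1, 2, 2, 2, 0]
    hts = [0, 7.25, 7.25, 0.5, 0.5, 0.5, 0]
    for row in rank_islands(part, hts):
        print(row)
```

The island numbers of the result, joined with commas, give the list used in the Extract step below.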
Operations/Network+Partition/Extract/Subnetwork Induced ... [list from R]
Network/Create New Network/Transform/Line Values/Absolute+Sqrt
{{notes:imfm:corona:pics:aisla.pdf|Top Cn islands}}
Because of the large range of edge weights, the weights were transformed with sqrt for the visualization.
The picture still has to be improved manually.
We see:
* synonyms: Wiwantikit, Viroj; Wiwanikit, Viroj; Wiwanitkit, Viroj; Mungmunpuntipantip, Rujittika; Mungmungpuntipantip, Rujittika; Beuy, Joob; Joob, Beuy; Smith, Everett Clinton; Smith, Everett C; Denison, Mark R.; Denison, Mark R; Coleman, Christopher M; Coleman, Christopher M.; Frieman, Matthew B.; Frieman, Matthew; Frieman, Matthew B; Baric, Ralph; Baric, Ralph S