====== Analysis: authors ======

[[notes:imfm:corona:s2orcmeta|S2ORC metadata networks]]

===== WA indegrees =====

From the list of the 100 largest entries in the indegree vector [[.:WAin|WA indegrees]] we see:
  * many Chinese authors; these can be homonyms
  * an empty name -> correct it to ''Anonymous''
  * strange names: ''O039'', ''D039'', ''Prevention, Centers for Disease Control and''
    * O039: Complete or unspecified spontaneous abortion without complication [[https://1up.health/health-data/icd10/id/O039|O039]]
    * D039: Tetrachloroethylene [[https://www.epa.gov/sites/production/files/2015-09/documents/hwid05.pdf|Hazardous Waste Codes]]

Solutions:
  * The last name ''name'' of the first author is computed with

  firstAu = Au[0].strip() if len(Au)>0 else "Anonymous"
  name = firstAu.split(",")[0] if len(firstAu)>0 else "Anonymous"

===== Distribution of the number of authors =====

In Pajek

  Select network WA
  Network/2-mode Network/Partition 2-mode
  Network/Create Vector/Centrality/Degree/Output
  Operations/Vector+Partition/Extract Subvector [1]
  Save vector [Aout.vec]

and in R

  > wdir <- "C:/Users/batagelj/Documents/2020/corona/MetaTit"
  > setwd(wdir)
  > na <- read.table('Aout.vec',skip=1)[,1]
  > length(na)
  [1] 375094
  > f <- table(na)
  > length(f)
  [1] 163
  > n <- as.integer(names(f))
  > plot(n,f)
  > plot(n,f,log='xy',pch=16)

{{notes:imfm:corona:pics:aout.png?400}}

===== Normalized collaboration network =====

WAn = norm(WA), Cn = WAn<sup>T</sup> * WAn (see [[https://link.springer.com/article/10.1007%2Fs11192-012-0940-1|BC]]).

In Pajek (norm2.mcr from [[https://github.com/bavla/biblio/tree/master/Pajek/macro|Biblio]])

  Select network WA
  Macro/Play [norm2.mcr] [375094]
  Network/2-mode Network/Transpose 2-mode
  Select normalized network as Second network
  Networks/Multiply Networks
  Network/Create Vector/Get Loops

==== Top 100 authors for fractional contribution ====

Cn[a,a] = fractional contribution of the author a to coauthored works.

[[.:CnLoops|Top 100 authors for fractional contribution]]

We see:
  * Chinese authors disappear.
  * strange authors: Anonymous,; Prevention, Centers for Disease Control and; Organization, World Health; Facharztmagazine, Redaktion; O039; Agency, U. S. Federal Emergency Management; The Lancet Infectious Diseases; Brasil. Ministério da Saúde. Secretaria de Ciência, Tecnologia Inovação e Insumos Estratégicos em Saúde.
  * add BOM to net files.

==== Ps cores ====

First transform Cn to an undirected network.

  Select network Cn
  Network/Create New Network/Transform/Remove Loops
  Network/Create New Network/Transform/Arcs to Edges/Bidirected Only/Sum Values
  Network/Create Vector/Generalized Cores/Sum/All
  Info Ps vector [+200]
  Vector/Make Partition/by Intervals/Selected [5.0]
  Operations/Network+Partition/Extract/Subnetwork Induced ... [2]

Each node of the Ps core has an internal sum of weights (weighted degree) of at least 5:

{{notes:imfm:corona:pics:ApsCore.pdf|Ps core at level 5.0}}

We see:
  * synonyms: Joob, Beuy - Wiwanitkit, Viroj; Joob, B. - Wiwanitkit, V.; Joob, B - Wiwanitkit, V; Memish, Ziad A. - Al-Tawfiq, Jaffar A.; Memish, Ziad A - Memish, Ziad A;

==== Simple islands ====

Single peak islands - single topic.

  Network/Create Partition/Islands/Line Weights (simple) [2,100] [yes]
  Operations/Network+Partition/Extract/Subnetwork Induced ... [92040-92158]
  Partition/Canonical Partition/with Decreasing Freq
  Info Partition [+30]
  Operations/Network+Partition/Extract/Subnetwork Induced ... [1-18]

There are 92158 simple islands with size in [2,100].

{{notes:imfm:corona:pics:aislaSM.pdf|Top Cn simple islands for heights}}

{{notes:imfm:corona:pics:aislaSLn.pdf|Top Cn simple islands for size}}

We see:
  * All caps names: STRAUSS, JAMES H.; STRAUSS, ELLEN G.
  * PhD in names: Allison E. James, PhD; Megan Wallace, DrPH; Theresa Sokol, M. P. H.; Catherine M. Brown, D. V. M.; Ellen Shelley, D. N. P.; Grace Philips, J. D.; David Selvage, M. H. S.; Soliman Hesham, M. D.; Marie E. Killerby, VetMB
  * Very long Spanish names (Unicode?): Carballada Gonzˆ¡lez, Francisco Nˆ”ˆ–ez Orjales, Ramˆ‡n Martin Lˆ¡zaro; Mar Abad Garcˆ�a, Marˆ�a Gloria ˆlvarez Silveiro Marˆ�a Carmen Coria Abel
  * Semi-caps Chinese names: LIU, Zhirong; Xiaoyan, WU; ZHANG, Yi; Liya, MA
  * COVID-19 PPC group
  * Bootsma, M. C.; Bootsma, M.; Bonten, M. J.; Bonten, M.;

Solutions:
  * Unicode BOM: replace the encoding with ''encoding="utf-8-sig"''.

==== Islands ====

  Network/Create Partition/Islands/Line Weights [2,50] [yes]
  Save Islands partition to the file NcIsla2.clu
  Save Islands heights vector to the file NcIslaH2.vec

There are 81934 islands with size in [2,50]. For each island we know its size and its height. I selected the top 50 islands according to the weight = sqrt(size-1)*height.

  wdir <- "C:/Users/batagelj/Documents/2020/corona/MetaTit"
  setwd(wdir)
  h <- read.table('NcIslaH2.vec',skip=1)[,1]
  C <- read.table('NcIsla2.clu',skip=1)[,1]
  # s: island sizes; its definition is missing from these notes, table(C) is an
  # assumption (entry 1 counts cluster 0, the nodes that belong to no island)
  s <- table(C)
  N <- length(C); k <- max(C); H <- rep(-1,k)
  # H[j]: height of island j, taken from the first node found in it
  for(i in 1:N){ j <- C[i]
    if(j > 0){if(H[j] < 0) H[j] <- h[i]}
  }
  w <- sqrt(s[2:(k+1)]-1)*H
  r <- cbind(s[2:(k+1)],H,w)
  q <- order(w,decreasing=TRUE)
  R <- r[q,]
  > R[1:50,]
                   H        w
  81934  2 7.2500000 7.250000
  77577 42 0.4952564 3.171188
  81933  2 3.0000000 3.000000
  ...
  76851 42 0.2630385 1.684269
  62911 49 0.2405513 1.666588
  62883 50 0.2361111 1.652778
  71695 44 0.2500000 1.639360
  ...
  81913  2 1.5000000 1.500000
  81914  2 1.5000000 1.500000
  > paste(row.names(R[1:50,]),collapse=",")
  [1] "81934,77577,81933,81932,81931,81924,81918,81927,81928,81929,81930,77063,76992,81903,79495,81925,81926,
  81901,76826,76782,67555,81916,81917,81919,81920,81921,81922,81923,76851,62911,62883,71695,62902,62930,
  62834,62835,62316,64394,65412,81904,81905,81906,81907,81908,81909,81910,81911,81912,81913,81914"

  Operations/Network+Partition/Extract/Subnetwork Induced ... [list from R]
  Network/Create New Network/Transform/Line Values/Absolute+Sqrt

{{notes:imfm:corona:pics:aisla.pdf|Top Cn islands}}

Because of the large range of the edge weights, they were transformed using sqrt for visualization. The picture has to be improved manually.

We see:
  * synonyms: Wiwantikit, Viroj; Wiwanikit, Viroj; Wiwanitkit, Viroj; Mungmunpuntipantip, Rujittika; Mungmungpuntipantip, Rujittika; Beuy, Joob; Joob, Beuy; Smith, Everett Clinton; Smith, Everett C; Denison, Mark R.; Denison, Mark R; Coleman, Christopher M; Coleman, Christopher M.; Frieman, Matthew B.; Frieman, Matthew; Frieman, Matthew B; Baric, Ralph; Baric, Ralph S
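
Many of the synonyms listed above differ only in trailing periods, in capitalization, or in initials versus full given names. A possible way to flag such candidates is sketched below. It is not part of the original workflow: the functions ''name_key'' and ''group_synonyms'' are hypothetical helpers, and the key (lower-cased last name plus first initial) is an assumption that can also merge distinct authors, which is the homonym problem noted for the WA indegrees. Misspellings (Wiwantikit / Wiwanitkit) and swapped name parts (Beuy, Joob / Joob, Beuy) are not caught by this key, so the output is only a list of pairs to inspect manually.

```python
import re
from collections import defaultdict


def name_key(name):
    """Hypothetical matching key: lower-cased last name plus first initial."""
    # Names are assumed to have the form "Last, Given ..." as in the notes.
    last, _, given = name.partition(",")
    last = last.strip().lower()
    # Collect the letters of the given names (Unicode aware) and keep the first.
    letters = re.findall(r"[^\W\d_]", given)
    first = letters[0].lower() if letters else ""
    return f"{last}, {first}" if first else last


def group_synonyms(names):
    """Return the keys shared by more than one distinct spelling."""
    groups = defaultdict(set)
    for n in names:
        groups[name_key(n)].add(n)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


# A few spellings taken from the lists above.
names = [
    "Denison, Mark R.", "Denison, Mark R",
    "Frieman, Matthew B.", "Frieman, Matthew", "Frieman, Matthew B",
    "Baric, Ralph", "Baric, Ralph S",
    "Joob, Beuy", "Joob, B.",
    "Wiwanitkit, Viroj", "Wiwantikit, Viroj",
]
candidates = group_synonyms(names)
for key, variants in sorted(candidates.items()):
    print(key, "<-", "; ".join(variants))
```

Each reported group is only a candidate for merging. The final decision still requires checking the works of the authors involved.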