WA: works × authors network

May 6-7, 2017

I first converted the source file ListOfAuthors_Authorship affiliation.txt into a semicolon-separated CSV file readable by Excel. I replaced every ";" with "§" and then every tab ("\t") with ";". There was an error in line 4691. I also added a header line and saved the result as affil.csv. It opens in Excel.
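
The two replacements described above could also be done in R itself. The following is only a sketch: the helper name tsv_to_csv and the two sample lines are made up for illustration, and "§" is written as the escape "\u00a7" so that the script stays plain ASCII.

```r
# Hypothetical helper: turn tab-separated lines into ";"-separated lines.
# Literal ";" characters are first replaced by the section sign so that
# they cannot be mistaken for field separators afterwards.
tsv_to_csv <- function(lines, header = NULL) {
  out <- gsub(";", "\u00a7", lines, fixed = TRUE)  # protect existing semicolons
  out <- gsub("\t", ";", out, fixed = TRUE)        # tabs become the separator
  c(header, out)                                   # optional header line first
}

# Made-up sample input; the first record has a ";" inside the name field.
tsv <- c("1\t10\tSmith; J.\t100\tInst A",
         "2\t11\tJones\t101\tInst B")
res <- tsv_to_csv(tsv, header = "workID;authID;authName;instID;instName")
print(res)
```

The order of the two substitutions matters: if the tabs were replaced first, the original semicolons could no longer be told apart from separators.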

When I tried to read the file affil.csv in R, it reported some problems. To bypass them I read the file line by line and split the fields myself:

> setwd("C:/Users/batagelj/Documents/2017/malceva/elib")
> csv <- file("affil.csv","r")
> lines <- readLines(csv)
> length(lines)
[1] 4979
> close(csv)
> Encoding(lines) <- "UTF-8"
> S <- strsplit(lines,"§")
> S[[1]]
[1] "workID;authID;authName;instID;instName"
> S <- strsplit(lines,";")
> S[[1]]
[1] "workID"    "authID"   "authName" "instID"   "instName"
> S[[2]]
[1] "10229737"                        
[2] "185451"            
[3] "ЗЫРЯНОВ СЕРГЕЙ ГРИГОРЬЕВИЧ"       
[4] "994084882"            
[5] "Российская академия естественных наук по Южно-Уральскому центру геополитики и управления 
     Челябинский институт (филиал) Уральской академии государственной службы"
> 
> n <- length(S); nm <- n-1
> wId <- vector(mode="character",length=nm)
> aId <- vector(mode="character",length=nm)
> aNm <- vector(mode="character",length=nm)
> iId <- vector(mode="character",length=nm)
> iNm <- vector(mode="character",length=nm)
> for(i in 2:n){wId[i-1] <- S[[i]][1]; aId[i-1] <- S[[i]][2];
+   aNm[i-1] <- S[[i]][3]; iId[i-1] <- S[[i]][4]; iNm[i-1] <- S[[i]][5] }
> wIn <- factor(wId); wlev <- levels(wIn)
> aIn <- factor(aId); alev <- levels(aIn)
> nw <- length(wlev); na <- length(alev)
> aname <- vector(mode="character",length=na)
> for(i in 1:nm){
+    ina <- as.integer(aIn[i])
+    if(aname[ina]=="") aname[ina] <- aNm[i] else
+    if(aname[ina]!=aNm[i]) cat("***",i,ina,aId[i],aname[ina],aNm[i],"\n",sep=" ")
+ }

Some authors appear under different names; see the list.
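
As an alternative to the loop, the name variants can be collected per author ID in one step. This is a minimal sketch on made-up toy data; the vectors toy_ids and toy_names stand in for aId and aNm and are not taken from the real file.

```r
# Toy stand-ins for the aId and aNm vectors (invented values).
toy_ids   <- c("185451", "185451", "200001", "200001", "300002")
toy_names <- c("IVANOV I", "IVANOV I I", "PETROV P", "PETROV P", "SIDOROV S")

# Distinct names recorded for each author ID.
variants <- lapply(split(toy_names, toy_ids), unique)

# Keep only the IDs that occur under more than one name.
multi <- variants[lengths(variants) > 1]
print(multi)
```

Unlike the loop, which keeps the first name seen and reports each conflicting record, this groups all variants of an ID together, which may be easier to review by hand.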

Now we are ready to export the WA network file in Pajek format:

> Encoding(aname) <- "UTF-8"
> net <- file("WA.net","w")
> writeLines(paste("*vertices ",nw+na," ",nw,sep=""),net,useBytes=T)
> for(i in 1:nw) writeLines(paste(i,' "',wlev[i],'"',sep=""),net,useBytes=T)
> for(i in 1:na) writeLines(paste(nw+i,' "',aname[i],'"',sep=""),net,useBytes=T)
> writeLines("*arcs",net,useBytes=T)
> for(i in 1:nm) writeLines(paste(as.integer(wIn[i]),nw+as.integer(aIn[i]),sep=" "),net,useBytes=T)
> close(net)
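
To make the layout of the resulting file easier to see, here is a small sketch that builds the same kind of two-mode Pajek description for three made-up authorship records. All names in it (works, authors, k_w, k_a, pajek) are illustrative only.

```r
# Toy two-mode data (invented): three authorship records.
works   <- c("w1", "w1", "w2")
authors <- c("A", "B", "A")

wf <- factor(works);  af <- factor(authors)
k_w <- nlevels(wf);   k_a <- nlevels(af)

pajek <- c(
  paste0("*vertices ", k_w + k_a, " ", k_w),         # total count, then number of works
  paste0(seq_len(k_w), ' "', levels(wf), '"'),       # works get numbers 1..k_w
  paste0(k_w + seq_len(k_a), ' "', levels(af), '"'), # authors follow after the works
  "*arcs",
  paste(as.integer(wf), k_w + as.integer(af))        # one line per work-author pair
)
writeLines(pajek)
```

The second number in the *vertices line tells Pajek how many vertices belong to the first mode, so author numbers are shifted by the number of works.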

It took some searching on Google to learn how to write a UTF-8 encoded file from R.
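
The pattern used in the export is writeLines(..., useBytes=T) on strings that are already UTF-8. The sketch below checks on a temporary file that the bytes arrive unchanged; the test string and the use of enc2utf8() are additions for illustration, not part of the original session.

```r
txt <- "\u0417\u042b\u0420"      # three Cyrillic letters, given as escapes
tmp <- tempfile(fileext = ".txt")

con <- file(tmp, "w")
# enc2utf8() ensures the string is UTF-8; useBytes = TRUE writes its bytes
# as they are, without re-encoding to the native locale.
writeLines(enc2utf8(txt), con, useBytes = TRUE)
close(con)

# Inspect the raw bytes: each letter should take two bytes in UTF-8.
bytes <- readBin(tmp, "raw", n = file.size(tmp))
print(bytes)
unlink(tmp)
```

Without useBytes = TRUE, R may try to translate the text to the native encoding of the session, which is where non-ASCII characters tend to get damaged on Windows.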

In a text editor I added some comments and saved the file as UTF-8 with BOM (signature).
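
The signature could presumably also be written directly from R, by emitting the three BOM bytes before the text. This is only a sketch on a temporary file, not what was done here; the file content is a made-up one-line example.

```r
tmp <- tempfile(fileext = ".net")

con <- file(tmp, "wb")                        # binary mode, so raw bytes can be written
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)    # UTF-8 byte order mark (signature)
writeLines("*vertices 3 1", con, useBytes = TRUE)
close(con)

# The file should now start with the bytes EF BB BF.
head3 <- readBin(tmp, "raw", n = 3)
print(head3)
unlink(tmp)
```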

We also prepare a CSV file linking each author's ID with his or her name:

> lst <- file("authList.csv","w")
> writeLines("index;authID;authName",lst,useBytes=T)
> for(i in 1:na) writeLines(paste(i,alev[i],aname[i],sep=";"),lst,useBytes=T)
> close(lst)
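
Since paste() is vectorized, the same file can be produced without a loop. A minimal sketch, assuming toy stand-ins toy_lev and toy_name for alev and aname and a temporary output file:

```r
# Toy stand-ins for alev and aname (invented values).
toy_lev  <- c("185451", "200001")
toy_name <- c("IVANOV I", "PETROV P")

# One row per author: running index, author ID, author name.
rows <- paste(seq_along(toy_lev), toy_lev, toy_name, sep = ";")

tmp <- tempfile(fileext = ".csv")
lst <- file(tmp, "w")
writeLines(c("index;authID;authName", rows), lst, useBytes = TRUE)
close(lst)

check <- readLines(tmp, encoding = "UTF-8")
print(check)
unlink(tmp)
```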
notes/net/dm/wa0.txt · Last modified: 2017/05/08 18:26 by vlado
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported