May 6-7, 2017
I first tried to convert the source file ListOfAuthors_Authorship affiliation.txt into a CSV file readable by Excel. I replaced all ";" with "§" and afterwards all tabs "\t" with ";". There was an error in line 4691. I also added a header and saved the result to the file affil.csv. It opens in Excel.
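The two replacements can also be done in R itself. The following is only a minimal sketch, not the procedure actually used for affil.csv: the function name tabs_to_semicolons and the sample line are made up for illustration.

```r
# Hypothetical helper (not part of the original notes): protect existing
# semicolons, then turn tabs into the field delimiter.
tabs_to_semicolons <- function(lines) {
  lines <- gsub(";", "\u00a7", lines, fixed = TRUE)  # step 1: ";" becomes "§"
  gsub("\t", ";", lines, fixed = TRUE)               # step 2: tab becomes ";"
}

# Toy input line; the field values are invented.
demo <- "1\tA1\tSmith; J.\tI1\tSome Institute"
out <- tabs_to_semicolons(demo)
print(out)

# The result splits into exactly five fields on ";".
fields <- strsplit(out, ";", fixed = TRUE)[[1]]
print(length(fields))
```

The order of the two steps matters: if tabs were replaced first, the semicolons already present in the data could no longer be told apart from delimiters.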
Reading the file affil.csv in R reported some problems. To bypass them I read the file line by line with readLines and split the fields myself with strsplit:
> setwd("C:/Users/batagelj/Documents/2017/malceva/elib")
> csv <- file("affil.csv","r")
> lines <- readLines(csv)
> length(lines)
[1] 4979
> close(csv)
> Encoding(lines) <- "UTF-8"
> S <- strsplit(lines,"§")
> S[[1]]
[1] "workID;authID;authName;instID;instName"
> S <- strsplit(lines,";")
> S[[1]]
[1] "workID"   "authID"   "authName" "instID"   "instName"
> S[[2]]
[1] "10229737"
[2] "185451"
[3] "ЗЫРЯНОВ СЕРГЕЙ ГРИГОРЬЕВИЧ"
[4] "994084882"
[5] "Российская академия естественных наук по Южно-Уральскому центру геополитики и управления Челябинский институт (филиал) Уральской академии государственной службы"
>
> n <- length(S); nm <- n-1
> wId <- vector(mode="character",length=nm)
> aId <- vector(mode="character",length=nm)
> aNm <- vector(mode="character",length=nm)
> iId <- vector(mode="character",length=nm)
> iNm <- vector(mode="character",length=nm)
> for(i in 2:n){wId[i-1] <- S[[i]][1]; aId[i-1] <- S[[i]][2];
+   aNm[i-1] <- S[[i]][3]; iId[i-1] <- S[[i]][4]; iNm[i-1] <- S[[i]][5] }
> wIn <- factor(wId); wlev <- levels(wIn)
> aIn <- factor(aId); alev <- levels(aIn)
> nw <- length(wlev); na <- length(alev)
> aname <- vector(mode="character",length=na)
> for(i in 1:nm){
+   ina <- as.integer(aIn[i])
+   if(aname[ina]=="") aname[ina] <- aNm[i] else
+   if(aname[ina]!=aNm[i]) cat("***",i,ina,aId[i],aname[ina],aNm[i],"\n",sep=" ")
+ }
The loop prints a line starting with *** whenever the same authID occurs with a different authName. There are authors that use different names; see the list.
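The same check can be written without an explicit loop. This is a sketch on made-up data, assuming vectors shaped like aId and aNm above; the IDs and names below are invented.

```r
# Toy stand-ins for the aId and aNm vectors (values are invented).
aId <- c("a1", "a2", "a1", "a3", "a2")
aNm <- c("IVANOV I.", "PETROV P.", "IVANOV IVAN", "SIDOROV S.", "PETROV P.")

# Group the names by author ID and keep the distinct ones in each group.
variants <- lapply(split(aNm, aId), unique)

# Keep only the IDs recorded under more than one name.
multi <- variants[lengths(variants) > 1]
print(multi)
```

Unlike the loop, which compares each record only with the first name seen for that ID, this collects all distinct variants per author in one place.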
Now we are ready to export the WA (works × authors) network file in Pajek format. The first nw vertices are works, the remaining na vertices are authors:
> Encoding(aname) <- "UTF-8"
> net <- file("WA.net","w")
> writeLines(paste("*vertices ",nw+na," ",nw,sep=""),net,useBytes=T)
> for(i in 1:nw) writeLines(paste(i,' "',wlev[i],'"',sep=""),net,useBytes=T)
> for(i in 1:na) writeLines(paste(nw+i,' "',aname[i],'"',sep=""),net,useBytes=T)
> writeLines("*arcs",net,useBytes=T)
> for(i in 1:nm) writeLines(paste(as.integer(wIn[i]),nw+as.integer(aIn[i]),sep=" "),net,useBytes=T)
> close(net)
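For larger files the same content could be assembled as one character vector and written in a single call. The sketch below uses invented toy data and a temporary file rather than WA.net. It only illustrates the file layout and is not the code that produced the actual network.

```r
# Toy data (invented) standing in for wId, aId and the per-author names.
wId <- c("w1", "w1", "w2")
aId <- c("a1", "a2", "a1")
wIn <- factor(wId); wlev <- levels(wIn)
aIn <- factor(aId); alev <- levels(aIn)
nw <- length(wlev); na <- length(alev)
aname <- c("AUTHOR ONE", "AUTHOR TWO")   # one name per level of aIn

# Build all lines of the two-mode Pajek file at once.
out <- c(
  paste0("*vertices ", nw + na, " ", nw),         # total vertices, then number of works
  paste0(seq_len(nw), ' "', wlev, '"'),           # works: vertices 1..nw
  paste0(nw + seq_len(na), ' "', aname, '"'),     # authors: vertices nw+1..nw+na
  "*arcs",
  paste(as.integer(wIn), nw + as.integer(aIn))    # one arc per (work, author) record
)

# Write the lines as UTF-8 bytes, as in the transcript above.
path <- tempfile(fileext = ".net")
con <- file(path, "w")
writeLines(enc2utf8(out), con, useBytes = TRUE)
close(con)
print(out)
```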
It took some searching on Google to learn how to write a UTF-8 encoded file from R: mark the strings as UTF-8 with Encoding and call writeLines with useBytes=T.
In a text editor I added some comments and saved the file as UTF-8 encoded with BOM (signature).
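The BOM could in principle also be written from R instead of a text editor. This is a hedged sketch, not what was done here: the helper name write_utf8_bom is hypothetical, and it writes to a temporary file with invented content.

```r
# Hypothetical helper: write lines as UTF-8 preceded by the 3-byte BOM.
write_utf8_bom <- function(lines, path) {
  con <- file(path, "wb")                       # binary mode: bytes go out unchanged
  on.exit(close(con))
  writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)    # the UTF-8 signature EF BB BF
  writeLines(enc2utf8(lines), con, useBytes = TRUE)
}

# Toy content (invented) with one non-ASCII vertex label.
path <- tempfile(fileext = ".net")
write_utf8_bom(c("*vertices 1", '1 "\u0417"'), path)

# Check that the file really starts with the signature.
first3 <- readBin(path, "raw", n = 3)
print(first3)
```

When such a file is read back into R, opening it with file(path, encoding = "UTF-8-BOM") removes the signature again.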
We also prepare a CSV file linking each author's ID with his/her name:
> lst <- file("authList.csv","w")
> writeLines("index;authID;authName",lst,useBytes=T)
> for(i in 1:na) writeLines(paste(i,alev[i],aname[i],sep=";"),lst,useBytes=T)
> close(lst)