https://github.com/bavla/Rnet
https://figshare.com/authors/Loet_Leydesdorff/684994
https://www.leydesdorff.net/
https://documenter.getpostman.com/view/7840038/SzYaVdeo?version=latest
https://www.leydesdorff.net/jcr07/cited/
https://www.aminer.org/citation
https://www.citnetexplorer.nl/
https://harzing.com/resources/publish-or-perish
https://linqs.soe.ucsc.edu/data
https://www.slideshare.net/MasoudMohammadi5/citation-analysis-85720471
https://github.com/rcsb/BioCaddiePilot32/blob/master/src/main/resources/NetworkAnalysis.md
https://www.bibliometrix.org/
https://networkrepository.com/cit.php
Combine networks from https://www.leydesdorff.net/jcr07/cited/ into a single network for each year.
At first sight the files seem to be numbered consecutively, but this assumption turned out not to hold.
<b>Cited 2007:</b><br>
<A HREF="v1.txt">Aapg Bulletin</A><BR>
<A HREF="v2.txt">Aaps Journal</A><BR>
<A HREF="v3.txt">Aaps Pharmscitech</A><BR>
<A HREF="v4.txt">Aatcc Review</A><BR>
<A HREF="v10001.txt">Abacus A Journal Of Accounting Finance And Business Studies</A><BR>
<A HREF="v5.txt">Abdominal Imaging</A><BR>
...
<A HREF="v6416.txt">Zuckerindustrie</A><BR>
<A HREF="v11865.txt">Zygon</A><BR>
<A HREF="v6417.txt">Zygote</A><BR>
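The numbering can be checked directly on the excerpt above. The following is only a minimal sketch in R: the variable names (`lines`, `ids`, `consecutive`) are illustrative, and the input is just the first few sample lines, not the full list.

```r
# Sample lines copied from the excerpt above (not the full list)
lines <- c(
  '<A HREF="v1.txt">Aapg Bulletin</A><BR>',
  '<A HREF="v2.txt">Aaps Journal</A><BR>',
  '<A HREF="v3.txt">Aaps Pharmscitech</A><BR>',
  '<A HREF="v4.txt">Aatcc Review</A><BR>',
  '<A HREF="v10001.txt">Abacus A Journal Of Accounting Finance And Business Studies</A><BR>',
  '<A HREF="v5.txt">Abdominal Imaging</A><BR>'
)

# Pull the number out of HREF="v<number>.txt"
ids <- as.integer(sub('.*HREF="v([0-9]+)\\.txt".*', '\\1', lines,
                      ignore.case = TRUE))

# Consecutive numbering would mean that every step equals 1
consecutive <- all(diff(ids) == 1)
print(ids)
print(consecutive)
```

For these sample lines the check fails because of v10001.txt, so the file names have to be read from the list itself rather than generated from a counter.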
I cut the list of all files out of the HTML source of the page https://www.leydesdorff.net/jcr07/cited/ and saved it in the file list07.txt. From it I extracted the corresponding file names in R.
> wdir <- "C:/Users/vlado/work2/data/nets/loet"
> setwd(wdir)
> source("Pajek.R")
>
> # https://www.leydesdorff.net/jcr07/cited/v4.txt
>
> cat("Start:",format(Sys.time(), "%H:%M:%S"),"\n")
> durl <- "https://www.leydesdorff.net/jcr07/cited/"
> L <- readLines("list07.txt"); L <- L[tolower(substr(L,1,2))=="<a"]
> S <- unlist(strsplit(L,'"')); F <- tolower(S[3*(1:length(L))-1])
> m <- 0; mmax <- 800000; ner <- 0
> U <- rep(NA,mmax); V <- rep(NA,mmax); W <- rep(NA,mmax)
> N <- c(); N["§{@@@@@@@@}§"] <- 0; j <- 0
> for(f in F){
+   j <- j+1
+   if(j %% 100==0) {cat(j,":",format(Sys.time(), "%H:%M:%S"),"\n"); flush.console()}
+   page <- paste(durl,f,sep='')
+   M <- net2matrix(page,warn=-1)
+   if(is.na(M)){ner <- ner+1
+     cat("\n",j,"error in file",page,"\n"); flush.console(); next
+   }
+   nn <- nrow(M); Nam <- row.names(M)
+   for(u in 1:nn) for(v in u:nn){
+     if(M[u,v]!=0){
+       indu <- N[Nam[u]]; if(is.na(indu)) indu <- N[Nam[u]] <- length(N)
+       indv <- N[Nam[v]]; if(is.na(indv)) indv <- N[Nam[v]] <- length(N)
+       m <- m+1; U[m] <- indu; V[m] <- indv; W[m] <- M[u,v]
+     }
+   }
+ }
> cat("end:",format(Sys.time(), "%H:%M:%S"),"\n",ner," errors\n")
>
> uvLab2net(names(N)[2:length(N)],U[1:m],V[1:m],W[1:m],Net="JCR07.net")
The program was interrupted by an error:
Error in file(file, "r") : cannot open the connection
To continue the network construction from the point of interruption, I entered a slightly modified part of the program (it also works for any subsequent interruptions):
> cat("Start:",format(Sys.time(), "%H:%M:%S"),"\n")
> F1 <- F[j:length(F)]; j <- j-1
> for(f in F1){
+   j <- j+1
    ...
+   }
+ }
> cat("end:",format(Sys.time(), "%H:%M:%S"),"\n",ner," errors\n")
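An alternative to restarting by hand would be to catch the connection error and retry the download a few times. The sketch below is not part of the original program: `read_with_retry` and the toy reader `flaky` are hypothetical names, and the reader is a stand-in that simulates a failing connection instead of calling `net2matrix`.

```r
# Hypothetical helper: call reader(page) and retry on error;
# returns NULL if all attempts fail.
read_with_retry <- function(page, reader, tries = 3, wait = 0) {
  for (k in seq_len(tries)) {
    res <- tryCatch(reader(page), error = function(e) NULL)
    if (!is.null(res)) return(res)
    Sys.sleep(wait)   # pause before the next attempt
  }
  NULL
}

# Toy reader that fails twice with a connection-style error, then succeeds
calls <- 0
flaky <- function(page) {
  calls <<- calls + 1
  if (calls < 3) stop("cannot open the connection")
  matrix(1, 2, 2)
}

M <- read_with_retry("v1.txt", flaky)
print(calls)
print(dim(M))
```

In the loop above one could then pass `function(p) net2matrix(p, warn=-1)` as the reader and count a `NULL` result as an error before moving to the next file. A positive `wait` (in seconds) gives the server some time between attempts.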
The resulting network JCR*.net has many multiple links with different weights. In Pajek I transformed it into the corresponding simple network JCR*s.net, keeping the maximum weight on each link.
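The simplification itself was done in Pajek. For illustration only, the same idea (keep one link per pair of nodes, with the largest weight) can be sketched in R on a toy link list. The data frame `links` and its columns `u`, `v`, `w` are made-up names, and the sketch treats links as ordered pairs.

```r
# Toy link list with multiple links between the same pairs of nodes
links <- data.frame(
  u = c(1, 1, 2, 1, 2),
  v = c(2, 2, 3, 2, 3),
  w = c(5, 9, 4, 7, 4)
)

# One row per (u, v) pair, keeping the largest weight
simple <- aggregate(w ~ u + v, data = links, FUN = max)
simple <- simple[order(simple$u, simple$v), ]
print(simple)
```

If the links should be treated as undirected, the end nodes would first have to be put into a canonical order, for example with `pmin(u, v)` and `pmax(u, v)`.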
Network  Nodes  LinksC  LinksS  AvDegree
JCR4      7251  463056  109036    30.075
JCR5      7397  549421  120835    32.671
JCR6      7487  543443  119713    31.979
JCR7      7769  552335  124208    31.975
LinksC is the number of links in the network JCR*.net and LinksS is the number of links in the network JCR*s.net. AvDegree is the average degree in the network JCR*s.net.
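The AvDegree column agrees with the usual formula 2 * links / nodes applied to the simple networks. A short check in R, using only the Nodes and LinksS values from the table (the vector names are illustrative):

```r
# Values taken from the table above
nodes  <- c(JCR4 = 7251,   JCR5 = 7397,   JCR6 = 7487,   JCR7 = 7769)
linksS <- c(JCR4 = 109036, JCR5 = 120835, JCR6 = 119713, JCR7 = 124208)

# Average degree: every link contributes to the degree of two nodes
avdeg <- round(2 * linksS / nodes, 3)
print(avdeg)
```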
The Pajek NET files are available at Github/Bavla.