====== Cluster analysis ======

  * ''C:\Users\batagelj\Documents\papers\2017\sreda\slides''
  * ''C:\Users\batagelj\work\Python\WoS\BM\results\cluster''
  * ''C:\Users\batagelj\Documents\papers\2012\cladag\slides\cladag12VB.pdf''

Look at http://vladowiki.fmf.uni-lj.si/doku.php?id=notes:clu:counties:pajek

===== Bibliographic coupling =====

In ''C:\Users\batagelj\work\Python\WoS\BM'' we read the file ''citeC.net'' into Pajek. The citation network has many components, 110 of them nontrivial. We extract the main component, ''citeMain.net'' (3994 nodes).

First we compute the bibliographic coupling network **biCo**:

  - select / read the citation network
  - ''Network/Create new network/Transform/Transpose 1-mode''
  - select the transposed network as the Second network
  - select the citation network as the First network
  - ''Networks/Multiply networks''
  - ''Network/Create new network/Transform/Remove/Loops''
  - ''Network/Create new network/Transform/Arcs->Edges/Bidirect only/Min''

The commands are saved as the macro ''biCo''. After inspecting the distribution of its link values, we decided to cut it at level 25. The cut **biCo** network has 177 components, 5 of them nontrivial (sizes: 3808, 6, 4, 2, 2).

===== Fractional bibliographic coupling =====

Using the Pajek macro ''biCon'' we produce the networks ''Jaccard'' and ''Hammond''. We analyze the Jaccard network using islands (see the slides).

==== Clustering ====

We restrict our analysis to the largest connected component, ''HammondMain'', of the Hammond network.

  - ''Network/Create Hierarchy/Clustering with Relational Constraints/Run'' [Max/Tolerant]
  - save the partition ''Clustering with relational constraint (tree) [Max/Tolerant]'' to ''MaxTol.clu''
  - save the vector ''Clustering with relational constraint (heights) [Max/Tolerant]'' to ''heightMaxTol.vec''
  - save the vector ''Clustering with relational constraint (size) [Max/Tolerant]'' to ''sizeMaxTol.vec''
  - select ''HammondMain''
  - ''Network/Create new network/Transform/Remove/All edges''
  - save the network (now only the list of labeled vertices) to ''HammondNam.net''

We continue in R.
<code>
> setwd("C:\\Users\\batagelj\\work\\Python\\WoS\\BM\\results\\cluster")
> source("C:\\Users\\batagelj\\work\\Python\\WoS\\BM\\results\\cluster\\Pajek2R.R")
> source("C:\\Users\\batagelj\\work\\Python\\WoS\\BM\\results\\cluster\\varCutTree.R")
> RC <- Pajek2R("MaxTol.clu")
> n <- RC$n; nm <- n-1; np <- n+1
> rCount <- varCutree(RC,rep(1,n),5,400)
> t <- table(rCount$part)
> out <- file("CMaxTot1.clu","w"); cat(paste("*vertices ",n),rCount$part,sep="\n",file=out); close(out)
> t

  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
375  85  35  28  65 354  94  48  55 386  23 228  34   6  80 103  16  11   8  27
 21  22  23  24  25  26  27  28  29  30  31  32  33  34
 82 335  17 265   6  37 204  28 234  68  80 159  23 209
> orDendro <- function(m,i){if(i<0) return(-i)
+ return(c(orDendro(m,m[i,1]),orDendro(m,m[i,2])))}
>
> orSize <- function(m,i){if(i<0) return(1)
+ s[i] <<- orSize(m,m[i,1])+orSize(m,m[i,2])
+ return(s[i])}
> HC <- read.csv("heightMaxTol.vec",header=FALSE,skip=np)[[1]]
> colT <- c("integer","character","numeric","numeric","numeric")
> nam <- read.csv("HammondNam.net",header=FALSE,skip=1,sep="",colClasses=colT,nrows=n)
> RC$height <- HC
> RC$method <- "Maximum/Tolerant"
> RC$dist.method <- "Normalized Hamming"
> RC$labels <- nam
> class(RC) <- "hclust"
> RC$call <- "Pajek.data"
> RC$order <- orDendro(RC$merge,nm)
> plot(RC,hang=-1,cex.axis=4,main="Maximum/Tolerant",lwd=0.01,oma=c(5,5,5,5))
Error in graphics:::plotHclust(n, merge, height, order(x$order), hang,  :
  invalid dendrogram input
</code>

The error is most likely caused by ''RC$labels <- nam'': ''nam'' is a data frame with five columns, and ''plot.hclust'' converts the labels with ''as.character'', which for a data frame gives one string per column instead of one per vertex. The labels must be a character vector of length ''n'', for example ''RC$labels <- nam$V2''. Once this is fixed, ''plot.hclust'' also builds its default subtitle and x-axis label from ''RC$call'', and it needs the second element of the call, which the string ''"Pajek.data"'' does not have; passing ''sub=""'' and ''xlab=""'' to ''plot'' (or leaving ''RC$call'' unset) should avoid this.
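A minimal R sketch of what the ''biCo'' macro computes, on a hypothetical toy citation matrix ''Cite'' (not the WoS data): multiplying the citation network (First) by its transpose (Second) counts, for each pair of works, the references they share, and removing loops zeroes the diagonal. The cut is read here as keeping the links with value at least the level, which is an assumption; ''level <- 2'' is a toy value standing in for the 25 used on the real network.

<code r>
# Toy citation matrix (hypothetical): Cite[u, v] = 1 if work u cites work v
Cite <- matrix(0, 5, 5, dimnames = list(letters[1:5], letters[1:5]))
Cite["a", c("d", "e")] <- 1
Cite["b", c("d", "e")] <- 1
Cite["c", "e"] <- 1

# First = Cite, Second = transpose(Cite): biCo[u, v] = number of shared references
biCo <- Cite %*% t(Cite)
diag(biCo) <- 0                 # Transform/Remove/Loops
stopifnot(isSymmetric(biCo))    # symmetric, so Arcs->Edges/Min keeps the values

# Cut (assumption: keep links with value >= level; the notes use 25)
level <- 2
strong <- which(biCo >= level & upper.tri(biCo), arr.ind = TRUE)
strong                          # only the pair (a, b) shares 2 references
</code>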
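The notes do not say how the macro ''biCon'' normalizes the coupling weights. Assuming the ''Jaccard'' network uses the standard Jaccard index of the reference sets, J(u,v) = |R(u) ∩ R(v)| / |R(u) ∪ R(v)|, it can be obtained from the coupling counts and the outdegrees, since the size of the union is d(u) + d(v) minus the shared references. The helper ''jaccardCoupling'' and the toy matrix are hypothetical; the ''Hammond'' normalization is not sketched.

<code r>
# Hypothetical helper: Jaccard index of reference sets from a 0/1 citation matrix
jaccardCoupling <- function(Cite) {
  b <- tcrossprod(Cite)          # b[u, v] = number of shared references
  d <- rowSums(Cite)             # d[u] = number of references of u
  un <- outer(d, d, "+") - b     # size of the union of the two reference sets
  J <- ifelse(un > 0, b / un, 0) # pairs without any references get 0
  diag(J) <- 0                   # no loops
  J
}

Cite <- matrix(0, 5, 5, dimnames = list(letters[1:5], letters[1:5]))
Cite["a", c("d", "e")] <- 1
Cite["b", c("d", "e")] <- 1
Cite["c", "e"] <- 1
J <- jaccardCoupling(Cite)
round(J, 2)
</code>

Unlike the raw counts in **biCo**, these values lie in [0, 1], so works with long reference lists no longer dominate the coupling.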
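To illustrate the idea behind ''Clustering with Relational Constraints'' with the [Max/Tolerant] options, here is a naive R sketch (hypothetical function ''rcMaxTolerant'', slow, only for small examples). At each step it joins the closest pair of clusters, measured by the maximum dissimilarity between their members, but only if at least one link of the network connects them. It returns a ''merge'' matrix and heights in the ''hclust'' convention. Pajek uses a much faster algorithm and may compute the cluster dissimilarities differently, so this is only a sketch of the principle.

<code r>
# Hypothetical sketch: relational constraint, "tolerant" strategy (clusters may
# join if at least one link connects them), "max" = complete-linkage dissimilarity.
# D: symmetric dissimilarity matrix; A: symmetric 0/1 adjacency of the network.
# Returns merge (-i = vertex i, k = cluster formed at step k) and heights.
rcMaxTolerant <- function(D, A) {
  n <- nrow(D)
  id <- -seq_len(n)               # hclust code of each active cluster
  members <- as.list(seq_len(n))  # vertices in each cluster
  active <- rep(TRUE, n)
  merge <- matrix(0L, 0L, 2L)
  height <- numeric(0)
  repeat {
    best <- Inf; bp <- NA; bq <- NA
    act <- which(active)
    for (p in act) for (q in act) {
      if (p >= q) next
      # tolerant constraint: at least one link between the two clusters
      if (!any(A[members[[p]], members[[q]]] != 0)) next
      d <- max(D[members[[p]], members[[q]]])   # maximum linkage
      if (d < best) { best <- d; bp <- p; bq <- q }
    }
    if (is.na(bp)) break          # no linked pair left (disconnected network)
    merge <- rbind(merge, c(id[bp], id[bq]))
    height <- c(height, best)
    members[[bp]] <- c(members[[bp]], members[[bq]])
    id[bp] <- nrow(merge)
    active[bq] <- FALSE
  }
  list(merge = merge, height = height)
}

# Toy example: four points on a line, the network is the path 1-2-3-4
x <- c(0, 1, 5, 6)
D <- abs(outer(x, x, "-"))
A <- matrix(0, 4, 4); A[cbind(1:3, 2:4)] <- 1; A <- A + t(A)
res <- rcMaxTolerant(D, A)
res$merge
res$height
</code>

For a connected network such as ''HammondMain'' the result is a full tree with n-1 merges; on a disconnected network the loop stops earlier.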
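A sketch of building a valid ''hclust'' object from a Pajek hierarchy, under stated assumptions. The hypothetical helper ''parentToMerge'' assumes the ''tree'' partition in ''MaxTol.clu'' gives, for each of the 2n-1 nodes, the index of its parent (0 for the root), with vertices 1..n as leaves and internal nodes n+1..2n-1 numbered in merge order; this is not necessarily what ''Pajek2R'' does. ''leafOrder'' follows the same idea as ''orDendro'' in the session above. The toy hierarchy, heights and labels are made up; the points that matter are that ''labels'' is a character vector of length n and that ''call'' is left unset.

<code r>
# Hypothetical helper: parent[k] is the parent of node k (0 = root);
# nodes 1..n are vertices, n+1..2n-1 internal nodes in merge order (assumption).
parentToMerge <- function(parent) {
  n <- (length(parent) + 1L) %/% 2L
  merge <- matrix(0L, n - 1L, 2L)
  for (k in seq_len(n - 1L)) {
    ch <- which(parent == n + k)                 # children of internal node n+k
    stopifnot(length(ch) == 2L)
    merge[k, ] <- ifelse(ch <= n, -ch, ch - n)   # vertex i -> -i, node n+j -> j
    stopifnot(all(merge[k, ] < k))               # children must exist earlier
  }
  merge
}

# Leaf order by a depth-first walk from the root (like orDendro above)
leafOrder <- function(merge, i = nrow(merge)) {
  if (i < 0) return(-i)
  c(leafOrder(merge, merge[i, 1]), leafOrder(merge, merge[i, 2]))
}

parent <- c(5, 5, 6, 6, 7, 7, 0)                 # toy hierarchy with n = 4
nam <- data.frame(V1 = 1:4, V2 = c("A", "B", "C", "D"),
                  stringsAsFactors = FALSE)      # like the vertex table 'nam'
hc <- list(merge = parentToMerge(parent),
           height = c(0.2, 0.3, 0.9),            # toy heights, length n-1
           labels = nam$V2,                      # character vector, length n
           method = "Maximum/Tolerant",
           dist.method = "Normalized Hamming")   # no 'call': plot.hclust would deparse it
hc$order <- leafOrder(hc$merge)
class(hc) <- "hclust"
plot(hc, hang = -1, main = "Maximum/Tolerant")
</code>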