====== Cluster analysis ======
* ''C:\Users\batagelj\Documents\papers\2017\sreda\slides''
* ''C:\Users\batagelj\work\Python\WoS\BM\results\cluster''
* ''C:\Users\batagelj\Documents\papers\2012\cladag\slides\cladag12VB.pdf''
Look at http://vladowiki.fmf.uni-lj.si/doku.php?id=notes:clu:counties:pajek
===== Bibliographic coupling =====
In ''C:\Users\batagelj\work\Python\WoS\BM'' we read the file ''citeC.net'' into Pajek. It has many components, 110 of them nontrivial. We extract the main component ''citeMain.net'' (3994 nodes).
First we compute the bibliographic coupling network **biCo**:
select / read the citation network
Network/Create new network/Transform/Transpose 1-mode
select transposed as Second network
select citation network as First
Networks/Multiply networks
Network/Create new network/Transform/Remove/Loops
Network/Create new network/Transform/Arcs->Edges/Bidirect only/Min
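In matrix terms, the menu sequence above multiplies the citation matrix by its transpose and then drops the diagonal. The following is a minimal base R sketch on a made-up four-paper example; the matrix ''Ci'' and its values are hypothetical, and it assumes that an arc from p to q means "p cites q".

```r
# Toy citation matrix (hypothetical data): Ci[p, q] = 1 if paper p cites paper q.
Ci <- matrix(c(0, 0, 1, 1,
               0, 0, 1, 1,
               0, 0, 0, 1,
               0, 0, 0, 0),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("p", 1:4), paste0("p", 1:4)))

# Citation network times its transpose: entry [p, q] counts the
# references shared by papers p and q.
biCo <- Ci %*% t(Ci)

# Remove loops (the diagonal holds each paper's own reference count).
diag(biCo) <- 0

# The product is symmetric, so taking the minimum of the two arc values
# when converting bidirected arcs to edges leaves the weights unchanged.
stopifnot(isSymmetric(biCo))
print(biCo)
```

In this toy example p1 and p2 share two references, so their coupling value is 2, while p4 cites nothing and stays isolated.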
The commands are saved as the macro ''biCo''. After inspecting the distribution of its link values, we decided to cut the network at level 25.
The **biCo** network has 177 components, 5 nontrivial (sizes: 3808, 6, 4, 2, 2).
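The cut and the component count can be mimicked in base R. This is only a sketch: it assumes that "cutting at a level" keeps the links whose value is at least that level, and both the helper name ''componentSizes'' and the five-node matrix are hypothetical.

```r
# Sizes of the connected components of an undirected weighted network after
# dropping links with value below `level`. W is a symmetric weight matrix.
# Assumption: "cutting at a level" keeps links with value >= level.
componentSizes <- function(W, level) {
  A <- W >= level            # logical adjacency matrix after the cut
  diag(A) <- FALSE
  n <- nrow(A)
  comp <- integer(n)         # component label per node, 0 = not yet visited
  k <- 0
  for (start in seq_len(n)) {
    if (comp[start] != 0) next
    k <- k + 1
    queue <- start
    comp[start] <- k
    while (length(queue) > 0) {        # breadth-first search
      v <- queue[1]; queue <- queue[-1]
      nb <- which(A[v, ] & comp == 0)  # unvisited neighbours of v
      comp[nb] <- k
      queue <- c(queue, nb)
    }
  }
  sort(as.vector(table(comp)), decreasing = TRUE)
}

# Hypothetical 5-node example: a strong chain 1-2-3 and a weak link 4-5.
W <- matrix(0, 5, 5)
W[1, 2] <- W[2, 1] <- 30
W[2, 3] <- W[3, 2] <- 25
W[4, 5] <- W[5, 4] <- 10
print(componentSizes(W, 25))   # nodes 1-3 stay together; 4 and 5 become isolated
```

Components of size 1 correspond to the trivial components mentioned above.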
===== Fractional bibliographic coupling =====
Using the Pajek macro ''biCon'' we produce the networks ''Jaccard'' and ''Hammond''. We analyze the Jaccard network using islands; see the slides.
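The ''biCon'' macro itself is not listed on this page. As a rough illustration of what a Jaccard-normalized coupling could look like, the sketch below divides the number of shared references by the size of the union of the two reference lists. The formula is the standard Jaccard index and is an assumption about the macro, not a transcription of it; the data are made up.

```r
# Jaccard-normalised bibliographic coupling (sketch; the actual biCon macro
# is not shown here and may differ in detail).
# Ci[p, q] = 1 if paper p cites paper q (hypothetical toy data).
Ci <- matrix(c(0, 0, 1, 1,
               0, 0, 1, 1,
               0, 0, 0, 1,
               0, 0, 0, 0),
             nrow = 4, byrow = TRUE)

shared    <- Ci %*% t(Ci)                            # size of the intersection
outdeg    <- rowSums(Ci)                             # number of references per paper
unionSize <- outer(outdeg, outdeg, "+") - shared     # size of the union

# Pairs where neither paper has references get similarity 0 instead of 0/0.
jaccard <- ifelse(unionSize > 0, shared / unionSize, 0)
diag(jaccard) <- 0                                   # drop loops
print(round(jaccard, 2))
```

Papers 1 and 2 cite exactly the same references, so their similarity is 1; papers 1 and 3 share one of two distinct references, giving 0.5.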
==== Clustering ====
We restrict our analysis to the largest connected component, ''HammondMain'', of the Hammond network.
Network/Create Hierarchy/Clustering with Relational Constraints/Run [Max/Tolerant]
save partition Clustering with relational constraint (tree) [Max/Tolerant] to MaxTol.clu
save vector Clustering with relational constraint (heights) [Max/Tolerant] to heightMaxTol.vec
save vector Clustering with relational constraint (size) [Max/Tolerant] to sizeMaxTol.vec
select HammondMain
Network/Create new network/Transform/Remove/All edges
save network to HammondNam.net
We continue in R.
> setwd("C:\\Users\\batagelj\\work\\Python\\WoS\\BM\\results\\cluster")
> source("C:\\Users\\batagelj\\work\\Python\\WoS\\BM\\results\\cluster\\Pajek2R.R")
> source("C:\\Users\\batagelj\\work\\Python\\WoS\\BM\\results\\cluster\\varCutTree.R")
> RC <- Pajek2R("MaxTol.clu")
> n <- RC$n; nm <- n-1; np <- n+1
> rCount <- varCutree(RC,rep(1,n),5,400)
> t <- table(rCount$part)
> out <- file("CMaxTot1.clu","w"); cat(paste("*vertices ",n),rCount$part,sep="\n",file=out); close(out)
> t
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
375 85 35 28 65 354 94 48 55 386 23 228 34 6 80 103 16 11 8 27
21 22 23 24 25 26 27 28 29 30 31 32 33 34
82 335 17 265 6 37 204 28 234 68 80 159 23 209
> orDendro <- function(m,i){if(i<0) return(-i)
+ return(c(orDendro(m,m[i,1]),orDendro(m,m[i,2])))}
>
> orSize <- function(m,i){if(i<0) return(1)
+ s[i] <<- orSize(m,m[i,1])+orSize(m,m[i,2])
+ return(s[i])}
> HC <- read.csv("heightMaxTol.vec",header=FALSE,skip=np)[[1]]
> colT <- c("integer","character","numeric","numeric","numeric")
> nam <- read.csv("HammondNam.net",header=FALSE,skip=1,sep="",colClasses=colT,nrows=n)
> RC$height <- HC
> RC$method <- "Maximum/Tolerant"
> RC$dist.method <- "Normalized Hamming"
> RC$labels <- nam
> class(RC) <- "hclust"
> RC$call <- "Pajek.data"
> RC$order <- orDendro(RC$merge,nm)
> plot(RC,hang=-1,cex.axis=4,main="Maximum/Tolerant",lwd=0.01,oma=c(5,5,5,5))
Error in graphics:::plotHclust(n, merge, height, order(x$order), hang, :
invalid dendrogram input
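This error suggests that some component of ''RC'' does not have the shape documented for ''hclust'' objects: ''merge'' is an (n-1) x 2 matrix, ''height'' has n-1 values, ''order'' is a permutation of 1..n, and ''labels'' holds one label per observation. One candidate in the session above is ''RC$labels <- nam'', which stores the whole data frame rather than a single column such as the second one (declared as character in ''colT''); this is a guess and has not been checked against the data. The sketch below uses a hypothetical helper, ''checkHclustParts'', to list such deviations.

```r
# Hypothetical helper: list the ways an object deviates from the structure
# documented for "hclust" objects (see ?hclust). Returns character(0) if none.
checkHclustParts <- function(x) {
  problems <- character(0)
  m <- x$merge
  if (!is.matrix(m) || ncol(m) != 2) {
    return("merge is not a two-column matrix")
  }
  n <- nrow(m) + 1                                   # number of observations
  if (!is.numeric(x$height) || length(x$height) != n - 1)
    problems <- c(problems, "height must hold n-1 numbers")
  if (length(x$order) != n || !setequal(x$order, seq_len(n)))
    problems <- c(problems, "order must be a permutation of 1..n")
  if (!is.null(x$labels) &&
      (!is.atomic(x$labels) || length(x$labels) != n))
    problems <- c(problems, "labels must be a vector of length n")
  # Each row may refer to observations (-1..-n) or to earlier rows only.
  rows <- row(m)
  ok <- (m < 0 & m >= -n) | (m > 0 & m < rows)
  if (!isTRUE(all(ok)))
    problems <- c(problems, "merge refers to invalid observations or rows")
  problems
}

# Usage on a small valid tree built from made-up data.
hc <- hclust(dist(c(1, 2, 6, 7, 15)), method = "complete")
print(checkHclustParts(hc))          # no problems reported

# Storing a data frame as labels, as in RC$labels <- nam, is flagged.
bad <- hc
bad$labels <- data.frame(id = 1:5, name = letters[1:5])
print(checkHclustParts(bad))
```

Running ''checkHclustParts(RC)'' before ''plot(RC, ...)'' would show which of these conditions, if any, the object imported from Pajek violates.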