Cluster analysis

  • C:\Users\batagelj\Documents\papers\2017\sreda\slides
  • C:\Users\batagelj\work\Python\WoS\BM\results\cluster
  • C:\Users\batagelj\Documents\papers\2012\cladag\slides\cladag12VB.pdf

Look at http://vladowiki.fmf.uni-lj.si/doku.php?id=notes:clu:counties:pajek

Bibliographic coupling

In C:\Users\batagelj\work\Python\WoS\BM we read the file citeC.net into Pajek. The network has many components, 110 of them nontrivial. We extract the main component as citeMain.net (3994 nodes).

First we compute the bibliographic coupling network biCo:

select / read the citation network
Network/Create new network/Transform/Transpose 1-mode
select transposed as Second network
select citation network as First
Networks/Multiply networks
Network/Create new network/Transform/Remove/Loops
Network/Create new network/Transform/Arcs->Edges/Bidirect only/Min
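
In matrix terms, the menu steps above multiply the citation matrix by its transpose and discard the diagonal. A minimal R sketch on a toy matrix, assuming that Cite[i, j] = 1 means paper i cites paper j; the matrix and the paper names p1..p4 are made up for illustration and are not taken from citeMain.net:

```r
# Toy citation matrix (hypothetical data): Cite[i, j] = 1 when paper i cites paper j.
papers <- paste0("p", 1:4)
Cite <- matrix(0, 4, 4, dimnames = list(papers, papers))
Cite["p1", c("p3", "p4")] <- 1
Cite["p2", c("p3", "p4")] <- 1
Cite["p3", "p4"] <- 1

# Citation network times its transpose: entry [i, j] counts the
# references shared by papers i and j.
biCo <- Cite %*% t(Cite)

# Remove loops (the diagonal holds each paper's own number of references).
diag(biCo) <- 0

# The product is symmetric, so turning arc pairs into edges with Min
# would leave the values unchanged.
stopifnot(isSymmetric(biCo))
biCo
```

Here p1 and p2 share both of their references, each of them shares one reference with p3, and p4 cites nothing, so it stays isolated.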

The commands are saved as the macro biCo. After inspecting the distribution of link values in the resulting network, we decided to cut it at level 25.

The biCo network has 177 components, 5 nontrivial (sizes: 3808, 6, 4, 2, 2).
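
The notes do not spell out how the cut was made or how the components were counted. The following R sketch shows one way to do both on a toy network, assuming that "cut at level 25" means dropping links with value below 25; the matrix W and its values are invented for illustration:

```r
# Toy weighted, symmetric network (values are made up for illustration).
W <- matrix(0, 5, 5)
W[1, 2] <- W[2, 1] <- 40
W[2, 3] <- W[3, 2] <- 30
W[3, 4] <- W[4, 3] <- 10
W[4, 5] <- W[5, 4] <- 26

# Distribution of the nonzero link values (each edge counted once).
table(W[upper.tri(W) & W > 0])

# Assumed meaning of the cut: keep only links with value >= level.
level <- 25
Wcut <- W * (W >= level)

# Label connected components by repeatedly spreading the smallest label
# over each node and its neighbours until nothing changes.
comp <- seq_len(nrow(Wcut))
repeat {
  old <- comp
  for (i in seq_along(comp)) {
    nb <- which(Wcut[i, ] > 0)
    comp[c(i, nb)] <- min(comp[c(i, nb)])
  }
  if (identical(comp, old)) break
}

# Component sizes, largest first.
sizes <- sort(as.vector(table(comp)), decreasing = TRUE)
sizes
```

In the toy network the link of value 10 is dropped, which splits the five nodes into components of sizes 3 and 2.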

Fractional bibliographic coupling

Using the Pajek macro biCon we produce the networks Jaccard and Hammond. We analyze the Jaccard network using islands (see the slides).
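
The macro biCon itself is not shown in these notes. As a rough illustration of one common definition, the Jaccard similarity of two papers is the number of shared references divided by the size of the union of their reference lists. A minimal R sketch under that assumption, on hypothetical toy data; it does not reproduce the Hammond network:

```r
# Toy citation matrix (hypothetical data): Cite[i, j] = 1 when paper i cites paper j.
papers <- paste0("p", 1:4)
Cite <- matrix(0, 4, 4, dimnames = list(papers, papers))
Cite["p1", c("p3", "p4")] <- 1
Cite["p2", c("p3", "p4")] <- 1
Cite["p3", "p4"] <- 1

nRef <- rowSums(Cite)                  # number of references of each paper
biCo <- Cite %*% t(Cite)               # number of shared references
uni  <- outer(nRef, nRef, "+") - biCo  # size of the union of the reference lists

# Jaccard similarity; pairs with an empty union get the value 0.
Jaccard <- ifelse(uni > 0, biCo / uni, 0)
diag(Jaccard) <- 0
Jaccard
```

With these toy data p1 and p2 have identical reference lists and get similarity 1, while p1 and p3 share one of two distinct references and get 0.5.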

Clustering

We restrict our analysis to the largest connected component, HammondMain, of the Hammond network.

Network/Create Hierarchy/Clustering with Relational Constraints/Run [Max Tolerant]
save partition Clustering with relational constraint (tree) [Max/Tolerant] to MaxTol.clu
save vector Clustering with relational constraint (heights) [Max/Tolerant] to heightMaxTol.vec
save vector Clustering with relational constraint (size) [Max/Tolerant] to sizeMaxTol.vec
select HammondMain
Network/Create new network/Transform/Remove/All edges
save network to HammondNam.net

We continue in R.

> setwd("C:\\Users\\batagelj\\work\\Python\\WoS\\BM\\results\\cluster")
> source("C:\\Users\\batagelj\\work\\Python\\WoS\\BM\\results\\cluster\\Pajek2R.R")
> source("C:\\Users\\batagelj\\work\\Python\\WoS\\BM\\results\\cluster\\varCutTree.R")
> RC <- Pajek2R("MaxTol.clu")
> n <- RC$n; nm <- n-1; np <- n+1
> rCount <- varCutree(RC,rep(1,n),5,400)
> t <- table(rCount$part)
> out <- file("CMaxTot1.clu","w"); cat(paste("*vertices ",n),rCount$part,sep="\n",file=out); close(out)
> t
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
375  85  35  28  65 354  94  48  55 386  23 228  34   6  80 103  16  11   8  27 
 21  22  23  24  25  26  27  28  29  30  31  32  33  34 
 82 335  17 265   6  37 204  28 234  68  80 159  23 209  
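> # orDendro: leaf order of the subtree rooted at merge row i (negative entries are leaves)
> orDendro <- function(m,i){if(i<0) return(-i)
+   return(c(orDendro(m,m[i,1]),orDendro(m,m[i,2])))}
> 
> # orSize: number of leaves below merge row i; it fills the global vector s, which must exist before the call
> s <- integer(nm)
> orSize <- function(m,i){if(i<0) return(1)
+   s[i] <<- orSize(m,m[i,1])+orSize(m,m[i,2])
+   return(s[i])}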

> HC <- read.csv("heightMaxTol.vec",header=FALSE,skip=np)[[1]]
> colT <- c("integer","character","numeric","numeric","numeric") 
> nam <- read.csv("HammondNam.net",header=FALSE,skip=1,sep="",colClasses=colT,nrows=n)
> RC$height <- HC
> RC$method <- "Maximum/Tolerant"
> RC$dist.method <- "Normalized Hamming"
> RC$labels <- nam
> class(RC) <- "hclust"
> RC$call <- "Pajek.data"
> RC$order <- orDendro(RC$merge,nm)
> plot(RC,hang=-1,cex.axis=4,main="Maximum/Tolerant",lwd=0.01,oma=c(5,5,5,5))
  Error in graphics:::plotHclust(n, merge, height, order(x$order), hang,  : 
  invalid dendrogram input
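
The session ends with this error unresolved. One possible cause, not verified against the actual data: as far as we can tell, plotting an hclust object needs a two-column merge matrix with n-1 rows, n-1 heights, an order of length n and n labels. Since nam was read as a five-column data frame, RC$labels may not coerce to one string per node; the label column alone (for example nam[[2]]) may be what is needed. The R sketch below uses a hypothetical helper, checkHclust, that is not part of the original session, to report such mismatches on a toy tree:

```r
# Leaf order of the subtree rooted at merge row i (same definition as in the session).
orDendro <- function(m, i) {
  if (i < 0) return(-i)
  c(orDendro(m, m[i, 1]), orDendro(m, m[i, 2]))
}

# Hypothetical helper: list the components of an hclust-like list
# whose lengths do not fit its merge matrix.
checkHclust <- function(h) {
  if (!is.matrix(h$merge) || ncol(h$merge) != 2)
    return("merge is not a two-column matrix")
  n1 <- nrow(h$merge)   # number of merges
  n  <- n1 + 1          # number of leaves
  problems <- character(0)
  if (length(h$height) != n1)
    problems <- c(problems, "height does not have n-1 entries")
  if (length(h$order) != n)
    problems <- c(problems, "order does not have n entries")
  if (!is.null(h$labels) && length(as.character(h$labels)) != n)
    problems <- c(problems, "labels do not coerce to n strings")
  problems
}

# Toy tree on 3 leaves: merge leaves 1 and 2, then join leaf 3.
mrg <- rbind(c(-1, -2), c(-3, 1))
hGood <- list(merge = mrg, height = c(1, 2),
              order = orDendro(mrg, nrow(mrg)),
              labels = c("a", "b", "c"))
okProblems <- checkHclust(hGood)

# A five-column data frame coerces to five strings, not one per leaf.
hBad <- hGood
hBad$labels <- data.frame(id = 1:3, lab = c("a", "b", "c"), x = 0, y = 0, z = 0)
badProblems <- checkHclust(hBad)

okProblems
badProblems
```

Running checkHclust(RC) just before the plot call would show whether the labels, the heights or the order are the component that does not fit.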
notes/bm2/clan.txt · Last modified: 2017/04/19 16:19 by vlado
 