Read in Pajek the project file USC3110.paj
and file with 75 standardized variables USC3110.vec
.
To determine the clustering with relational constraint we have first to compute the dissimilarity between linked vertices based on selected variables. The vectors/variables to be used in the dissimilarity computation are determined by the current Pajek's cluster. In our case this cluster should contain indices from 4 to 78. We produce it as follows
Cluster/Create complete cluster [78]
Then click on the Edit Cluster Button
. In the displayed list double click on vertex 1 and confirm its deletion. Repeat this also for vertices 2 and 3. We have the required cluster.
Now we can add to the links of US counties network the dissimilarities as weights
Operations/Dissimilarity*/Vector Based/Euclidean [Yes]
and run clustering with relational constraint afterward
Net/Hierarchical Decomposition/Clustering with Relational Constraint/Options [Average,Tolerant] Net/Hierarchical Decomposition/Clustering with Relational Constraint/Run
We get a vector of cluster heights, a vector of cluster sizes, a permutation of vertices with respect to the clustering, and a “double partition”. The “double partition” contains the description of the clustering tree in the form p[i] = j , meaning: the father of vertex i in the clustering tree is the vertex j . The leaves of the tree are the vertices of our network; the internal vertices of the tree are numbered as they are produced with indices from NumVertices+1 on.
Inspecting the distribution of heights values we selected the threshold 11. Now we can produce the corresponding partition:
[select the heights vector] Net/Hierarchical Decomposition/Clustering with Relational Constraint/Make Partition/Using Threshold [11]
We obtain the following partition avgtol11.pdf that consists of a very large red cluster, several small clusters and a “cluster” of unclassified cyan vertices.
This partition is of little use for our analysis. See a discussion about what can be wrong and how we can try to bypass the problems.
We first try to extract from the dendrogram the clusters with number of units in some selected interval - for example [5,400]
> RC <- Pajek2R("./cluster/AvgTolNew.clu") > rCount <- varCutree(RC,rep(1,3110),5,400) > t <- table(rCount$part) > t 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 745 18 398 5 5 12 8 25 9 6 5 13 22 6 19 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 44 10 9 5 6 29 6 10 347 79 5 151 7 8 7 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 130 5 9 97 17 7 13 9 5 7 6 5 10 337 40 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 7 5 20 12 6 5 25 19 11 8 5 32 116 42 5 60 61 62 63 64 65 999999 5 5 10 18 16 14 8 > out <- file("CRAveTot3.clu","w"); cat("*vertices 3110",rCount$part,sep="\n",file=out); close(out)
We read the “double partition” in R and using R procedures determine the partition, that is finally exported as Pajek's clustering to be displayed using Pajek. The obtained clustering has 745 unclassified units, but we can see some structure avgtol-3.pdf.
We can also determine rank-height of the clustering tree and cut it at selected level.
levCutree <- function(R,hT){ mark <- function(t,c){ if(t>0) vis[t] <<- TRUE if(t<0) part[-t] <<- c else {mark(R$merge[t,1],c); mark(R$merge[t,2],c)} } n <- nrow(R$merge)+1; part <- rep(999999,n) vis <- rep(FALSE,n-1); nclust <- 0 for(i in (n-1):1) if (!vis[i]) { vis[i] <- TRUE; hi <- R$height[i] l <- R$merge[i,1]; if (l == 0) next r <- R$merge[i,2]; if (r == 0) next if (hi <= hT) { nclust <- nclust+1; mark(l,nclust); mark(r,nclust) } else { if (l < 0) mark(l,0); if (r < 0) mark(r,0) } } return(part) }
Procedure
> RC$height <- rep(1,3109) > rRank <- rankTree(RC) > RC$height <- rRank > rankPart <- levCutree(RC,70) > out <- file("AveTotRank70.clu","w"); cat("*vertices 3110",rankPart,sep="\n",file=out); close(out)
This gives another clustering avgtotrank70.pdf.
Using in Pajek the option Options/Mark Vertices Using/Partition Clusters
we identify the cluster numbers of interesting clusters and look in R to their characteristics
> dat <- read.csv(file="varsN.csv",stringsAsFactors=FALSE) > dim(dat) > da <- dat[-2916,] > dim(da) [1] 3110 92 > south <- rankPart==87 > east <- rankPart==115 > west <- rankPart==27 > north <- rankPart==170 > da[south,"Areaname"] > summary(da[south,]) > da[north,"Areaname"] > summary(da[north,]) > da[east,"Areaname"] > summary(da[east,]) > da[west,"Areaname"] > summary(da[west,])
rankres.zip (rankPart.clu
, south.txt
, north.txt
, east.txt
, west.txt
, rankDat.RData
(rankPart, da, s))
> M <- cbind(apply(da[south,-c(1,2)],2,mean),apply(da[north,-c(1,2)],2,mean), apply(da[east,-c(1,2)],2,mean),apply(da[west,-c(1,2)],2,mean)) > M [,1] [,2] [,3] [,4] AGE050200D 3.705131e+01 3.811625e+01 3.777956e+01 3.718879e+01 BZA115203D 2.238562e-01 -4.332500e+00 -2.267291e+00 1.110345e+00 CLF040200D 4.632680e+00 3.657500e+00 4.416184e+00 6.244828e+00 EDU635200D 7.361373e+01 8.225594e+01 7.482572e+01 8.114224e+01 EDU685200D 1.457680e+01 1.431656e+01 1.542521e+01 1.936810e+01 HSG045200D 8.044444e+00 6.816562e+00 1.575571e+01 1.633362e+01 HSG495200D 5.846732e+04 7.602125e+04 8.654140e+04 1.307491e+05 INC610199D 3.725614e+04 4.543871e+04 4.188305e+04 4.402873e+04 INC910199D 1.574440e+04 1.819005e+04 1.762827e+04 1.819972e+04 IPE010200D 3.188551e+04 3.947356e+04 3.588853e+04 3.821586e+04 IPE120200D 1.699837e+01 9.140313e+00 1.329523e+01 1.319052e+01 IPE220200D 2.394739e+01 1.266219e+01 1.872641e+01 1.882414e+01 LFE305200D 2.401340e+01 2.149219e+01 2.579881e+01 2.211379e+01 PIN020200D 2.094700e+04 2.417972e+04 2.321806e+04 2.403807e+04 POP050200D 8.283987e+00 4.062500e+00 1.024055e+01 1.616207e+01 POP060200D 6.590621e+01 9.249344e+01 1.828421e+02 1.647716e+02 POP165200D 5.081536e+01 5.074594e+01 5.101295e+01 4.982759e+01 POP225200D 7.758333e+01 9.593219e+01 8.562027e+01 8.376983e+01 POP255200D 1.336667e+01 1.692813e+00 1.145741e+01 1.559483e+00 POP285200D 2.348366e+00 2.543750e-01 3.321976e-01 1.993966e+00 POP325200D 4.490196e-01 4.378125e-01 6.032368e-01 2.432759e+00 POP405200D 9.479739e+00 2.018750e+00 2.408007e+00 1.334052e+01 POP645200D 3.341176e+00 1.557187e+00 2.626405e+00 8.694828e+00 VST020200D 1.353856e+01 1.227937e+01 1.239046e+01 1.262672e+01 VST220200D 1.151863e+01 1.079156e+01 1.044395e+01 9.142241e+00 VST420200D 7.472222e+00 6.154687e+00 7.596252e+00 6.198276e+00 WAT130200D 1.611786e+03 2.109659e+03 1.112694e+03 6.616555e+03 P.pop.under5 6.580324e+00 6.232957e+00 6.118504e+00 6.348849e+00 P.pop.under18 2.635719e+01 2.549562e+01 2.434618e+01 2.607041e+01 P.pop.over85 2.103039e+00 2.360762e+00 1.679386e+00 1.672105e+00 P.ind.fam.farms 9.076461e+01 8.849999e+01 9.043491e+01 8.426045e+01 P.land.farms 6.290727e+01 7.931597e+01 3.275108e+01 3.458045e+01 P.irrigatedLand Inf 3.381284e+00 Inf 2.006879e+01 P.violent.crime 3.207766e-01 8.674755e-02 2.946077e-01 3.348652e-01 P.murders 4.446644e-03 9.991130e-04 3.495106e-03 2.931687e-03 P.rapes 2.683564e-02 1.064653e-02 1.955211e-02 3.005992e-02 P.16to19.notHighSc 1.056589e+01 7.645125e+00 1.062883e+01 9.078797e+00 P.Hisp.Latin 9.407646e+00 1.978180e+00 2.341986e+00 1.327710e+01 R.Hisp.Lat.maleFemale 1.229879e+00 Inf NA 1.171553e+00 P.emply.ind.AGR.FOR.FISH.HUNT.MINING 8.323722e+00 5.990339e+00 3.431304e+00 7.956955e+00 P.emply.ind.CONSTRUCTION 7.450183e+00 6.311400e+00 7.353999e+00 6.644653e+00 P.emply.ind.MANUFACTORING 1.275459e+01 2.152131e+01 1.834340e+01 9.776601e+00 P.emply.ind.WHOLESALEtrade 2.766558e+00 3.053673e+00 2.866999e+00 2.833837e+00 P.emply.ind.RETAILtrade 1.084854e+01 1.095023e+01 1.131198e+01 1.064729e+01 P.emply.ind.TRANSPORT.WAREHOUSING 5.410541e+00 5.163809e+00 4.880563e+00 4.484460e+00 P.emply.ind.INFORMATION 1.441289e+00 1.651531e+00 1.763538e+00 1.860243e+00 P.emply.ind.FINANC.INSUR 4.144753e+00 4.189244e+00 4.194833e+00 4.113244e+00 P.emply.ind.PROFscientTECH 4.304369e+00 4.257937e+00 5.330816e+00 6.102132e+00 P.emply.ind.EDUC.HEALTH 1.988487e+01 1.870545e+01 1.899052e+01 1.862793e+01 P.emply.ind.ARTSaccomFOOD 5.721664e+00 5.774690e+00 6.271828e+00 8.140394e+00 P.emply.ind.PUBLICaddmin 5.223214e+00 3.521698e+00 4.666341e+00 5.762203e+00 P.25overLESS9thGRADE 1.031324e+01 6.652107e+00 9.706479e+00 7.133818e+00 P.employ.FARMING 1.298267e+01 9.958493e+00 4.789943e+00 8.400648e+00 P.employ.AGRIC.FOREST.FISH.HUNT 1.110473e+00 1.823078e-01 7.170553e-01 2.357110e+00 F.GOV.EXP.perCapita 5.468467e+02 4.645189e+02 5.004825e+02 5.314188e+02 P.employ.GOV 1.879330e+01 1.389505e+01 1.545659e+01 2.064506e+01 P.employ.GOV.stateLoc 1.625734e+01 1.222596e+01 1.334276e+01 1.687476e+01 P.vacantHousingUnits 1.482792e+01 8.548903e+00 1.285020e+01 1.328897e+01 P.occupiedHousingUnits 8.517208e+01 9.145110e+01 8.714980e+01 8.671103e+01 P.occupiedHousingUnitsBLACK 1.043022e+01 1.119232e+00 8.972250e+00 1.249519e+00 P.occupiedHousingUnitsHisLat 5.798692e+00 1.182373e+00 1.376569e+00 8.194458e+00 P.OWNERoccupiedHousingUnits 6.313439e+01 6.966030e+01 6.510236e+01 5.789313e+01 P.RENTERoccupiedHousingUnits 2.203769e+01 2.179080e+01 2.204743e+01 2.881790e+01 P.occupiedHousingUnitsLackingPlumb 7.908735e-01 4.948386e-01 9.446285e-01 7.176409e-01 P.URBANpopul 4.077331e+01 4.130698e+01 4.030167e+01 6.026730e+01 P.RURALpopul 5.922669e+01 5.869302e+01 5.969833e+01 3.973270e+01 P.CHANGErural90to00 1.010434e+01 -3.028793e+00 -1.562482e+00 -3.422337e+00 P.CHANGEurban90to00 8.787050e+00 1.106016e+01 2.283582e+01 3.857946e+01 P.LAND 9.727151e+01 9.870546e+01 9.587915e+01 9.542434e+01 P.WATER 2.728487e+00 1.294513e+00 4.120870e+00 4.575628e+00 P.BELOWpovertyLevel 1.764538e+01 9.208135e+00 1.432859e+01 1.419735e+01 P.ABOVEpovertyLevel 8.235462e+01 9.079186e+01 8.567141e+01 8.580265e+01 P.WHITE.BELOWpovertyLevel 1.314581e+01 8.742184e+00 1.188328e+01 1.230589e+01 P.aINDIAN.BELOWpovertyLevel NA NA NA 2.388984e+01 P.BLACK.BELOWpovertyLevel NA NA NA NA P.ASIAN.BELOWpovertyLevel NA NA NA NA P.HisLat.BELOWpovertyLevel 2.806823e+01 1.892611e+01 NA 2.553416e+01 CHANGEperCapitaIncome89to99 5.676222e+03 6.653450e+03 6.233325e+03 5.874371e+03 P.CHANGEemployIndustry90to00 1.878123e+01 1.559929e+01 1.633401e+01 2.284493e+01 P.IrrigationGROUNDwaterUse NA NA NA 2.796188e+01 GroundWaterUsePerCapita 1.018104e-03 1.067929e-03 1.556404e-04 1.129905e-03 P.NET.DOMESTIC.MIGRATIONS -5.273263e-01 -4.457541e-01 1.212542e-01 -1.495736e-01 P.NativePopulationBornInStateOfRes 7.388104e+01 7.706141e+01 7.259531e+01 5.158747e+01 R.LABOR.FORCEmaleFemale 1.205602e+00 1.164036e+00 1.160966e+00 1.211234e+00 R.VOTING.DEMOCRATESoverREPUBLICANS 6.720723e+01 7.530776e+01 8.824383e+01 7.642053e+01 P.PUBLIC.SCHOOL.ENROLNEMT 7.128929e+01 6.774728e+01 6.893168e+01 6.875286e+01 TOTAL.DEPOSITSperCapita 1.117644e+01 1.447363e+01 1.121010e+01 8.799865e+00 CROPvaluePerFARM 3.048392e+01 6.127429e+01 2.972905e+01 1.145691e+02 LIFESTOCKvaluePerFARM 5.498766e+01 4.841435e+01 4.143259e+01 4.736536e+01 P.CHANGEpverty95to00 -1.399711e+01 -8.650349e+00 -1.150927e+01 -7.111793e+00 >
The first two characters in the county id identify the state to which the county belongs.
> d <- readLines("usc3110.paj")[14:3123] > n <- 3110; s <- character(n) > for(i in 1:n) s[i] <- as.integer(substr(strsplit(dd[i],'\"')[[1]][2],1,2)) > out <- file("states3110.clu","w"); cat("*vertices 3110",s,sep="\n",file=out); close(out)
This partition is already included in the USC3110.paj
file:
------------------------------------------------------------------------------ Reading Partition --- "US states" ------------------------------------------------------------------------------ To which state a county belongs? 01 - Alabama 21 - Kentucky 38 - North Dakota 02 - Alaska 22 - Louisiana 39 - Ohio 04 - Arizona 23 - Maine 40 - Oklahoma 05 - Arkansas 24 - Maryland 41 - Oregon 06 - California 25 - Massachusetts 42 - Pennsylvania 08 - Colorado 26 - Michigan 44 - Rhode Island 09 - Connecticut 27 - Minnesota 45 - South Carolina 10 - Delaware 28 - Mississippi 46 - South Dakota 11 - Dist. of Columbia 29 - Missouri 47 - Tennessee 12 - Florida 30 - Montana 48 - Texas 13 - Georgia 31 - Nebraska 49 - Utah 15 - Hawaii 32 - Nevada 50 - Vermont 16 - Idaho 33 - New Hampshire 51 - Virginia 17 - Illinois 34 - New Jersey 53 - Washington 18 - Indiana 35 - New Mexico 54 - West Virginia 19 - Iowa 36 - New York 55 - Wisconsin 20 - Kansas 37 - North Carolina 56 - Wyoming
File/Partition/Read [states3110.clu] File/Partition/Read [regions4.clu] File/Partition/Read [regions9.clu] [select states partition as partition 1] [select regions9 partition as partition 2] Operations/Functional Composition/Partition*Partition File/Partition/Save [reg3310-9.clu] [select states partition as partition 1] [select regions4 partition as partition 2] Operations/Functional Composition/Partition*Partition File/Partition/Save [reg3310-4.clu]
> getwd() [1] "D:/Data/counties" > load("rankDat.RData") > C <- read.csv("./reg3110-9.clu",header=FALSE,skip=1)$V1 > table(C) C 1 2 3 4 5 6 7 8 9 133 281 618 437 67 150 470 364 590 > library(MASS) > V <- da[,-c(1,2)] > names(V) [33] "P.irrigatedLand" [39] "R.Hisp.Lat.maleFemale" [74] "P.aINDIAN.BELOWpovertyLevel" [75] "P.BLACK.BELOWpovertyLevel" [76] "P.ASIAN.BELOWpovertyLevel" [80] "P.IrrigationGROUNDwaterUse" > U <- V[,-c(33,39,74,75,76,80)] > L <- lda(C~.,U) Warning message: In lda.default(x, grouping, ...) : variables are collinear > names(L) [1] "prior" "counts" "means" "scaling" "lev" "svd" [7] "N" "call" "terms" "xlevels" "na.action" > L$means AGE050200D BZA115203D CLF040200D EDU635200D EDU685200D HSG045200D HSG495200D 1 37.27970 2.4563910 6.190977 81.26241 20.69248 16.362406 152539.85 2 36.79319 2.1946237 4.670609 82.71935 20.14050 19.684588 106231.90 .... 9 68.83642 9.874268 41.91874 LIFESTOCKvaluePerFARM P.CHANGEpverty95to00 1 50.48722 -3.836304 2 65.86691 -1.174104 3 74.07023 -13.616674 4 26.20308 -6.648226 5 29.61547 -9.345561 6 36.08322 -10.412122 7 60.42837 -12.549059 8 29.61485 -12.168219 9 48.31252 -9.428939 > L$counts 1 2 3 4 5 6 7 8 9 133 279 608 437 67 148 470 362 551 > L$scaling LD1 LD2 LD3 AGE050200D -1.850965e-01 6.327903e-02 -1.865893e-02 BZA115203D 1.561909e-03 5.040118e-04 -1.502661e-03 CLF040200D -2.216119e-01 1.937772e-01 -1.191955e-01 .... > L$N [1] 3055 > L$lev [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" > L$prior 1 2 3 4 5 6 7 0.04353519 0.09132570 0.19901800 0.14304419 0.02193126 0.04844517 0.15384615 8 9 0.11849427 0.18036007 > L$svd [1] 35.981057 29.047318 23.855715 16.392248 14.756034 13.592597 10.619947 [8] 7.094417 > plot(L) >