====== Notes ======
===== Entering Pajek file =====
{{ru:7iss:labs:graph.jpg?400}}
We describe the network from the picture with three files:
''ExNet.net''
% Example network - 7ISS
% Moscow, June 2017
*vertices 4
1 "A"
2 "B"
3 "C"
4 "D"
*arcs
1 2 2
2 3 1
4 1 2
4 1 4
*edges
1 3 3
2 3 5
2 4 1
4 4 5
''shape.clu''
% 1 - circle 2 - square
*vertices 4
2
2
1
1
and ''value.vec''
*vertices 4
27
17
14
36
We can combine them into a single file
''ExNet.paj''
% Example network - 7ISS
% Moscow, June 2017
*Network exNet.net
*Vertices 4
1 "A" 0.6472 0.4527 0.5000
2 "B" 0.2408 0.5353 0.5000
3 "C" 0.4770 0.7659 0.5000
4 "D" 0.4678 0.1910 0.5000
*Arcs
1 2 2
2 3 1
4 1 2
4 1 4
*Edges
1 3 3
2 3 5
2 4 1
4 4 5
*Partition shape.clu
% 1 - circle 2 - square
*Vertices 4
2
2
1
1
*Vector value.vec
*Vertices 4
27
17
14
36
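The project file above is essentially the three files placed one after another, each preceded by a ''*Network'', ''*Partition'' or ''*Vector'' header line. A minimal R sketch of that concatenation follows; the helper name ''makePaj'' and the extension-to-header mapping are assumptions made for illustration, not part of Pajek.

```r
# Hypothetical helper: concatenate Pajek files into one project (.paj) file.
# The header keyword is chosen from the file extension (an assumption here).
makePaj <- function(files, paj) {
  kinds <- c(net = "*Network", clu = "*Partition", vec = "*Vector")
  out <- character(0)
  for (f in files) {
    ext <- tolower(tools::file_ext(f))
    # header line, e.g. "*Partition shape.clu", followed by the file content
    out <- c(out, paste(kinds[[ext]], basename(f)), readLines(f))
  }
  writeLines(out, paj)
  invisible(out)
}

# Small demonstration with the partition and vector from the notes.
d <- tempfile("exnet"); dir.create(d)
clu <- file.path(d, "shape.clu"); vec <- file.path(d, "value.vec")
writeLines(c("*vertices 4", "2", "2", "1", "1"), clu)
writeLines(c("*vertices 4", "27", "17", "14", "36"), vec)
paj <- makePaj(c(clu, vec), file.path(d, "ExNet.paj"))
```

The network file would be passed first in ''files'' in the same way.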
A file with alternative node names
''ExNet.nam''
% Example network names - 7ISS
% Moscow, June 2017
*vertices 4
1 "Владо"
2 "Борис"
3 "Валя"
4 "Дарья"
has to be saved in Unicode UTF-8 with signature (BOM). It can be used in Pajek to rename the nodes:
select the network
Network/Create New Network/Transform/Add/Vertex Labels/Default [No]
Network/Create New Network/Transform/Add/Vertex Labels/From File(s) [ExNet.nam]
{{pub:zip:exnet.zip}}
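If the names file is produced from R rather than from a text editor, the UTF-8 signature has to be written explicitly. The following is a minimal sketch under that assumption; ''writeNamBOM'' is a hypothetical helper, and the labels are given as Unicode escapes only to keep the script ASCII.

```r
# Hypothetical helper: write a Pajek names file as UTF-8 with BOM.
writeNamBOM <- function(labels, path) {
  lines <- c(paste("*vertices", length(labels)),
             paste0(seq_along(labels), ' "', labels, '"'))
  txt <- paste0(paste(enc2utf8(lines), collapse = "\n"), "\n")
  con <- file(path, "wb")                      # binary mode: no re-encoding
  on.exit(close(con))
  writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)   # UTF-8 signature (BOM)
  writeBin(charToRaw(txt), con)                # the UTF-8 encoded text
  invisible(path)
}

# First two labels from ExNet.nam, written as Unicode escapes.
labels <- c("\u0412\u043b\u0430\u0434\u043e", "\u0411\u043e\u0440\u0438\u0441")
f <- tempfile(fileext = ".nam")
writeNamBOM(labels, f)
head3 <- readBin(f, "raw", n = 3)              # the three signature bytes
```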
===== Pajek data sets =====
[[http://vlado.fmf.uni-lj.si/pub/networks/data/|Pajek data sets]]
===== Transforming migration matrix into Pajek network =====
https://www.imi.ox.ac.uk/data/demig-data/demig-c2c-data/download-the-data/demig-c2c-data-downloads
http://www.worldbank.org/en/topic/migrationremittancesdiasporaissues/brief/migration-remittances-data
Open the matrix in Excel and save it as CSV under the name "bilateralmigrationmatrix20130.csv".\\
Move the last lines to the front.\\
Replace "," with "" (that is, delete all commas).\\
Save the result as "migration2013.csv".
> setwd("C:/Users/batagelj/Downloads/data/migration")
> D <- read.csv2("migration2013.csv",row.names=1,skip=3)
> A <- as.matrix(D)
> dim(A)
[1] 218 218
> A[1:10,1:10]
Afghanistan Albania Algeria American.Samoa Andorra Angola Antigua.and.Barbuda Argentina Armenia
Afghanistan 0 0 0 0 0 0 0 9 0
Albania 0 0 0 0 0 0 0 77 0
Algeria 0 0 0 0 0 0 0 210 0
American Samoa 0 0 0 0 0 0 0 0 0
Andorra 0 0 0 0 0 0 0 47 0
Angola 0 0 0 0 0 0 0 81 0
Antigua and Barbuda 0 0 0 0 0 0 0 0 0
Argentina 0 0 0 0 708 0 0 0 0
Armenia 0 0 0 0 0 0 0 939 0
Aruba 0 0 0 0 0 0 5 0 0
Aruba
Afghanistan 0
Albania 4
Algeria 3
American Samoa 0
Andorra 0
Angola 0
Antigua and Barbuda 5
Argentina 71
Armenia 0
Aruba 0
> A[214:218,214:218]
Zimbabwe Other.North Other.South World X
Zimbabwe 0 1 2 973247 NA
Other North 13164 38330 1168 2713351 NA
Other South 26920 31434 2895 4946635 NA
World 360992 470548 95518 247245059 NA
NA NA NA NA NA
> W <- A[1:217,1:217]
> A <- A[1:216,1:216]
> A[210:216,210:216]
Virgin.Islands..U.S.. West.Bank.and.Gaza Yemen.Rep. Zambia Zimbabwe Other.North Other.South
Virgin Islands (U.S.) 0 0 0 0 0 28 529
West Bank and Gaza 0 0 3740 0 0 0 0
Yemen Rep. 0 1473 0 0 0 0 0
Zambia 0 0 0 0 26909 11 10
Zimbabwe 0 0 0 5149 0 1 2
Other North 2886 2015 2483 861 13164 38330 1168
Other South 2692 20874 34508 6777 26920 31434 2895
>
> n <- nrow(A)
> net <- file("migration2013.net","w"); cat('*vertices ',n,'\n',file=net)
> for(v in 1:n) cat(v,' "',row.names(A)[v],'"\n',sep='',file=net)
> cat('*arcs\n',file=net)
> for(v in 1:n) {
+ for(u in 1:n) if(A[v,u]>0) cat(v,' ',u,' ',A[v,u],'\n',sep='',file=net)
+ }
> close(net)
{{pub:zip:migration2013.zip}}
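The double loop above can also be written without explicit loops. The sketch below is one possible variant, not the code used to produce the archive; ''matrixToPajek'' is a hypothetical helper name, and ''format(..., scientific=FALSE)'' is added so that large flows are not written in exponent notation.

```r
# Hypothetical helper: write the positive entries of a square matrix as Pajek arcs.
matrixToPajek <- function(A, path) {
  idx <- which(A > 0, arr.ind = TRUE)                    # NA entries are skipped
  idx <- idx[order(idx[, 1], idx[, 2]), , drop = FALSE]  # row by row, as in the loop
  val <- format(A[idx], scientific = FALSE, trim = TRUE)
  writeLines(c(paste("*vertices", nrow(A)),
               paste0(seq_len(nrow(A)), ' "', rownames(A), '"'),
               "*arcs",
               paste(idx[, 1], idx[, 2], val)), path)
  invisible(nrow(idx))                                   # number of arcs written
}

# Toy 3x3 matrix (made-up values, only to show the output format).
M <- matrix(c(0, 9, 0,
              0, 0, 4,
              7, 0, 0), nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
f <- tempfile(fileext = ".net")
matrixToPajek(M, f)
```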
===== Pathfinder =====
Pathfinder determines a skeleton of a weighted network. The weights should be **dissimilarities**. A migration flow is a similarity s. A simple way to transform it into a dissimilarity d is d = 1/s.
read migration2013.net
Network/Create new network/Transform/Line values/Power [-1]
Network/Create new network/Transform/Reduction/Pathfinder* [0]
We obtain a simplified network. We draw it using some automatic procedure (Kamada-Kawai/Free) and manually improve the picture.
{{pub:zip:pfmig.zip}} In PF1.net the nodes Other South and Other North are removed.
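To make the two steps concrete, here is a small R sketch on a made-up flow matrix. It applies d = 1/s and then one common Pathfinder variant (r = infinity, q = n-1), in which the length of a path is its largest step and an arc is kept only if no path has a smaller length. Whether this corresponds exactly to the option chosen in Pajek above is an assumption.

```r
# Toy flow matrix (made-up similarities); 0 means "no arc".
S <- matrix(c(0, 10, 1, 0,
              0,  0, 10, 0,
              0,  0, 0, 5,
              2,  0, 0, 0), nrow = 4, byrow = TRUE)

# Dissimilarity d = 1/s on existing arcs; missing arcs get Inf.
D <- ifelse(S > 0, 1 / S, Inf)

# Minimax path lengths by a Floyd-Warshall style closure:
# a path is as long as its largest step.
M <- D
n <- nrow(M)
for (k in 1:n) for (i in 1:n) for (j in 1:n)
  M[i, j] <- min(M[i, j], max(M[i, k], M[k, j]))

# Keep an arc only if no alternative path is shorter in the minimax sense.
keep <- is.finite(D) & D <= M
which(keep, arr.ind = TRUE)
```

In this example the weak arc 1 -> 3 is dropped, because the path 1 -> 2 -> 3 consists of two strong flows.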
===== Population size =====
http://www.prb.org/pdf13/2013-population-data-sheet_eng.pdf
http://www.photius.com/rankings/population/population_2013_0.html
https://www.cia.gov/library/publications/the-world-factbook/rankorder/2119rank.html
We download the file https://www.cia.gov/library/publications/the-world-factbook/rankorder/rawdata_2119.txt and transform it into CSV format: add a header, remove the commas from the numbers, and put the separator ; between columns. We save it as popCnt.csv.
> P <- read.csv2("popCnt.csv",row.names=2,strip.white=TRUE)
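The manual conversion can also be scripted. The sketch below assumes that each raw line holds a rank, a country name and a population figure separated by runs of at least two spaces; that layout and the header names ''rank'' and ''country'' are assumptions (only the column name ''pop'' is used later in these notes).

```r
# Two sample lines in the ASSUMED raw layout: rank, name, value.
raw <- c("1      China                 1,373,541,278",
         "2      India                 1,266,883,598")

# Split on runs of two or more spaces, so names with single spaces survive.
parts <- strsplit(trimws(raw), " {2,}")
tab <- do.call(rbind, parts)

# Add a header, drop the commas from the numbers, join columns with ";".
csv <- c("rank;country;pop",
         paste(tab[, 1], tab[, 2], gsub(",", "", tab[, 3]), sep = ";"))

# Same call as in the notes, reading from text instead of a file.
P <- read.csv2(text = csv, row.names = 2, strip.white = TRUE)
```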
===== Fusing the data =====
> Pnames <- tolower(rownames(P))
> head(Pnames)
[1] "china" "india" "european union" "united states"
[5] "indonesia" "brazil"
> Anames <- tolower(rownames(A))
> head(Anames)
[1] "afghanistan" "albania" "algeria" "american samoa"
[5] "andorra" "angola"
> p <- match(Anames,Pnames)
> q <- match(Pnames,Anames)
> cbind(which(is.na(p)),Anames[is.na(p)])
> cbind(which(is.na(q)),Pnames[is.na(q)])
Manually find the matchings:
> cbind(which(is.na(p)),Anames[is.na(p)])
[,1] [,2]
[1,] "14" "bahamas the"
[26,] "180" "bahamas, the"
[2,] "28" "brunei darussalam"
[25,] "175" "brunei"
[3,] "39" "channel islands"
[31,] "197" "jersey"
[32,] "205" "guernsey"
[4,] "44" "congo dem. rep."
[5,] "18" "congo, democratic republic of the"
[5,] "45" "congo rep."
[18,] "125" "congo, republic of the"
[6,] "52" "czech republic"
[13,] "87" "czechia"
[7,] "58" "egypt arab rep."
[3,] "16" "egypt"
[8,] "64" "faeroe islands"
[34,] "212" "faroe islands"
[9,] "70" "gambia the"
[21,] "147" "gambia, the"
[10,] "84" "hong kong sar china"
[14,] "101" "hong kong"
[11,] "89" "iran islamic rep."
[4,] "17" "iran"
[12,] "101" "korea dem. rep."
[10,] "51" "korea, north"
[13,] "102" "korea rep."
[7,] "28" "korea, south"
[14,] "105" "kyrgyz republic"
[16,] "115" "kyrgyzstan"
[15,] "106" "lao pdr"
[15,] "104" "laos"
[16,] "115" "macao sar china"
[23,] "170" "macau"
[17,] "116" "macedonia fyr"
[20,] "146" "macedonia"
[18,] "127" "micronesia fed. sts."
[28,] "194" "micronesia, federated states of"
[19,] "134" "myanmar"
[6,] "25" "burma"
[20,] "158" "russian federation"
[2,] "10" "russia"
[21,] "169" "sint maarten (dutch part)"
[35,] "213" "sint maarten"
[22,] "170" "slovak republic"
[17,] "119" "slovakia"
[23,] "178" "st. kitts and nevis"
[33,] "210" "saint kitts and nevis"
[24,] "179" "st. lucia"
[27,] "187" "saint lucia"
[25,] "180" "st. martin (french part)"
[37,] "217" "saint martin"
[26,] "181" "st. vincent and the grenadines"
[30,] "196" "saint vincent and the grenadines"
[27,] "187" "syrian arab republic"
[12,] "66" "syria"
[28,] "208" "venezuela rb"
[8,] "43" "venezuela"
[29,] "210" "virgin islands (u.s.)"
[29,] "195" "virgin islands"
[30,] "211" "west bank and gaza"
[19,] "142" "west bank"
[22,] "153" "gaza strip"
[31,] "212" "yemen rep."
[9,] "48" "yemen"
[32,] "215" "other north"
[33,] "216" "other south"
> cbind(which(is.na(q)),Pnames[is.na(q)])
[,1] [,2]
[1,] "3" "european union"
[11,] "55" "taiwan"
[24,] "171" "western sahara"
[36,] "215" "british virgin islands"
[38,] "219" "gibraltar"
[39,] "221" "anguilla"
[40,] "222" "wallis and futuna"
[41,] "224" "nauru"
[42,] "225" "cook islands"
[43,] "226" "saint helena, ascension, and tristan da cunha"
[44,] "227" "saint barthelemy"
[45,] "228" "saint pierre and miquelon"
[46,] "229" "montserrat"
[47,] "230" "falkland islands (islas malvinas)"
[48,] "231" "norfolk island"
[49,] "232" "christmas island"
[50,] "233" "svalbard"
[51,] "234" "tokelau"
[52,] "235" "niue"
[53,] "236" "holy see (vatican city)"
[54,] "237" "cocos (keeling) islands"
[55,] "238" "pitcairn islands"
and construct the population number vector
> pNA <- c(
+ 14, 28, 39, 44, 45, 52, 58, 64, 70, 84,
+ 89, 101, 102, 105, 106, 115, 116, 127, 134, 158,
+ 169, 170, 178, 179, 180, 181, 187, 208, 210, 211,
+ 212 )
>
> qNA <- c(
+ 180, 175, 197, 18, 125, 87, 16, 212, 147, 101,
+ 17, 51, 28, 115, 104, 170, 146, 194, 25, 10,
+ 213, 119, 210, 187, 217, 196, 66, 43, 195, 142,
+ 48 )
> popP <- P$pop
> head(popP)
[1] 1373541278 1266883598 513949445 323995528 258316051 205823665
> pn <- p
> pn[pNA] <- qNA
> Anames[is.na(pn)]
[1] "other north" "other south"
> popP[142] <- popP[142]+popP[153]
> popP[197] <- popP[197]+popP[205]
> pop <- popP[pn]
> Anames[is.na(pop)]
[1] "other north" "other south"
> n <- nrow(A)-2
> net <- file("migration2013B.net","w"); cat('*vertices ',n,'\n',file=net)
> vec <- file("migration2013pop.vec","w"); cat('*vertices ',n,'\n',file=vec)
> for(v in 1:n) cat(v,' "',row.names(A)[v],'"\n',sep='',file=net)
> cat('*arcs\n',file=net)
> for(v in 1:n) {
+ cat(pop[v],'\n',file=vec)
+ for(u in 1:n) if(A[v,u]>0) cat(v,' ',u,' ',A[v,u],'\n',sep='',file=net)
+ }
> close(net); close(vec)
> names(pop) <- Anames
> save(A,pop,file="migration2013.RData")
{{pub:zip:mig2013pop.zip}}
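The matching logic used above can be tried on a toy example: ''match'' finds the names that agree exactly, and the manually determined index pairs fill the gaps. The names are taken from the lists above; the population values are placeholders, not real data.

```r
# Toy versions of the two name lists.
Anames <- c("albania", "czech republic", "russian federation", "other north")
Pnames <- c("russia", "albania", "czechia")
popP   <- c(30, 10, 20)          # placeholder values, in the order of Pnames

p <- match(Anames, Pnames)       # exact matches only: 2 NA NA NA

# Manually found pairs: position in Anames -> position in Pnames.
pNA <- c(2, 3)
qNA <- c(3, 1)
pn <- p
pn[pNA] <- qNA

pop <- popP[pn]                  # population in the order of Anames
names(pop) <- Anames
Anames[is.na(pop)]               # names that remain without a population
```

Only "other north" stays unmatched, as in the real data.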
===== Clustering the migration network =====
To make countries (described by rows in the migration matrix) comparable, we have to normalize them. There are at least two options:
* divide each row by the sum of its entries: conditional probability that a migrant from the first country will migrate to the second country;
* divide each row by the size of a country population: probability that a citizen of the first country will migrate to the second country.
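Both normalizations are one-liners in R, because dividing a matrix by a vector of length ''nrow'' divides row i by the i-th element. A minimal sketch on a made-up 3x3 matrix; the names ''M'', ''size'', ''prof'' and ''inten'' are chosen here for illustration only.

```r
# Toy migration matrix and population sizes (made-up values).
M <- matrix(c(0, 3, 1,
              2, 0, 2,
              5, 5, 0), nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
size <- c(a = 100, b = 50, c = 200)

# Option 1: row profiles; each row sums to 1.
# A row with sum 0 would give NaN and needs separate treatment.
prof <- M / rowSums(M)

# Option 2: intensities; flow per inhabitant of the origin country.
inten <- M / size

rowSums(prof)
inten["b", "a"]
```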
{{ru:7iss:labs:corrected.png?600}}
The function netDist(A) implements the corrected Euclidean dissimilarity between rows (but not columns!!!) of a matrix A.
> n <- nrow(A)
> S <- apply(A,1,sum)
> T <- A/S
> netDist <- function(A){ n <- nrow(A)
+ D <- matrix(nrow=n,ncol=n,dimnames=dimnames(A)); diag(D) <- 0
+ for(v in 2:n){
+ for(u in 1:(v-1)) {
+ d <- sum((A[v,]-A[u,])**2) - (A[v,u]-A[u,u])**2 - (A[v,v]-A[u,v])**2 +
+ (A[v,u]-A[u,v])**2 + (A[v,v]-A[u,u])**2
+ D[v,u] <- D[u,v] <- sqrt(d)
+ }
+ }
+ return(as.dist(D))
+ }
> D <- netDist(T)
> r <- hclust(D,method="ward.D")
> plot(r,hang=-1,cex=0.2,main="Migrations 2013, profiles")
> head(Anames[r$order],n=10)
[1] "belize" "guatemala"
[3] "cayman islands" "el salvador"
[5] "marshall islands" "mexico"
[7] "puerto rico" "palau"
[9] "micronesia fed. sts." "northern mariana islands"
> per <- file("migration2013ward.per","w"); cat('*vertices ',n,'\n',file=per)
> for(v in 1:n) cat(r$order[v],'\n',file=per); close(per)
>
> n <- n-2
> Ap <- A[1:n,1:n]
> po <- pop[1:n]
> N <- Ap/po
> D <- netDist(N)
> r <- hclust(D,method="ward.D")
> plot(r,hang=-1,cex=0.2,main="Migrations 2013, intense")
> Nt <- t(N)
> D <- netDist(Nt)
> r <- hclust(D,method="ward.D")
> plot(r,hang=-1,cex=0.2,lwd=0.5,main="Migrations 2013, intense/transpose")
> B <- Ap
> B[B>0] <- 1
> D <- netDist(B)
> r <- hclust(D,method="ward.D")
> plot(r,hang=-1,cex=0.2,lwd=0.5,main="Migrations 2013, binary")
> Bt <- t(B)
> D <- netDist(Bt)
> r <- hclust(D,method="ward.D")
> plot(r,hang=-1,cex=0.2,lwd=0.5,main="Migrations 2013, binary/transpose")
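To see what the correction in netDist does, the dissimilarity for a single pair can be computed directly: columns u and v are left out of the ordinary sum and replaced by the two terms that compare the mutual flows and the diagonal entries. This is a sketch with a hypothetical helper ''corrD2'' and a made-up matrix; it restates the formula implemented above for one pair and is not a replacement for netDist.

```r
# Hypothetical helper: squared corrected Euclidean dissimilarity of rows u and v.
corrD2 <- function(M, u, v) {
  rest <- setdiff(seq_len(ncol(M)), c(u, v))   # all columns except u and v
  sum((M[v, rest] - M[u, rest])^2) +
    (M[v, u] - M[u, v])^2 +                    # mutual flows u->v and v->u
    (M[v, v] - M[u, u])^2                      # diagonal entries
}

# Toy matrix (made-up values).
M <- matrix(c(0, 2, 1,
              3, 0, 1,
              4, 5, 0), nrow = 3, byrow = TRUE)

d12     <- sqrt(corrD2(M, 1, 2))               # corrected dissimilarity
plain12 <- sqrt(sum((M[1, ] - M[2, ])^2))      # ordinary Euclidean distance
c(corrected = d12, plain = plain12)
```

Rows 1 and 2 differ mainly in their flows to each other, so the corrected value (1) is much smaller than the ordinary Euclidean distance (sqrt(13)).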
\\ \\
[[ru:7iss#labs|Back to 7ISS Labs]]