In the note downloading genealogies I described how to download a collection of genealogies from the Genealogy Forum.
Mimo (Domenico De Stefano) would like to get, for each genealogy, its pattern spectrum (the frequencies of appearance of selected patterns) for the subnetworks stored in the file Frag_new.paj. This can be done by combining Pajek and R, calling Pajek from R (see Hints). In September 2012 I prepared an example. During the ESRA conference in Ljubljana in July 2013 I met Mimo and prepared the following specific solution:
July 19, 2013
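First, inside Pajek, using the option Macro/Record, I prepared a template Pajek command file PajekLog.temp containing the commands for reading the patterns file Frag_new.paj and a genealogy and then, for each of the 19 patterns, searching for its fragments in the genealogy and saving the resulting partition to a *.clu file. It is important to add the command EXIT at the end of the template.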
PajekLog.temp:

NETBEGIN 1
CLUBEGIN 1
PERBEGIN 1
CLSBEGIN 1
HIEBEGIN 1
VECBEGIN 1
N 9999 RDPAJ "C:\Users\batagelj\test\mimo\ged\Frag_new.paj"
N 20 RDN "C:\Users\batagelj\test\mimo\ged\ged1.ged" (32)
C 1 FRAGNSNL 1 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 1 WC "C:\Users\batagelj\test\mimo\ged\1.clu" (32)
C 2 FRAGNSNL 2 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 2 WC "C:\Users\batagelj\test\mimo\ged\2.clu" (32)
C 3 FRAGNSNL 3 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 3 WC "C:\Users\batagelj\test\mimo\ged\3.clu" (32)
C 4 FRAGNSNL 4 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 4 WC "C:\Users\batagelj\test\mimo\ged\4.clu" (32)
C 5 FRAGNSNL 5 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 5 WC "C:\Users\batagelj\test\mimo\ged\5.clu" (32)
C 6 FRAGNSNL 6 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 6 WC "C:\Users\batagelj\test\mimo\ged\6.clu" (32)
C 7 FRAGNSNL 7 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 7 WC "C:\Users\batagelj\test\mimo\ged\7.clu" (32)
C 8 FRAGNSNL 8 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 8 WC "C:\Users\batagelj\test\mimo\ged\8.clu" (32)
C 9 FRAGNSNL 9 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 9 WC "C:\Users\batagelj\test\mimo\ged\9.clu" (32)
C 10 FRAGNSNL 10 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 10 WC "C:\Users\batagelj\test\mimo\ged\10.clu" (32)
C 11 FRAGNSNL 11 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 11 WC "C:\Users\batagelj\test\mimo\ged\11.clu" (32)
C 12 FRAGNSNL 12 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 12 WC "C:\Users\batagelj\test\mimo\ged\12.clu" (32)
C 13 FRAGNSNL 13 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 13 WC "C:\Users\batagelj\test\mimo\ged\13.clu" (32)
C 14 FRAGNSNL 14 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 14 WC "C:\Users\batagelj\test\mimo\ged\14.clu" (32)
C 15 FRAGNSNL 15 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 15 WC "C:\Users\batagelj\test\mimo\ged\15.clu" (32)
C 16 FRAGNSNL 16 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 16 WC "C:\Users\batagelj\test\mimo\ged\16.clu" (32)
C 17 FRAGNSNL 17 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 17 WC "C:\Users\batagelj\test\mimo\ged\17.clu" (32)
C 18 FRAGNSNL 18 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 18 WC "C:\Users\batagelj\test\mimo\ged\18.clu" (32)
C 19 FRAGNSNL 19 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 19 WC "C:\Users\batagelj\test\mimo\ged\19.clu" (32)
EXIT
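The 19 pattern blocks of the template above differ only in the pattern number, so the same lines could also be generated instead of recorded. The following is a minimal sketch under that assumption: the function make_template and its parameters dir_path, ged_name and n_patterns are hypothetical names, and the command strings (including the trailing "(32)") are copied verbatim from the recorded template rather than taken from any Pajek documentation.

```r
# Hypothetical helper (not part of the original note): rebuilds the lines of
# a template like PajekLog.temp from the pattern seen in the recorded file.
make_template <- function(dir_path, ged_name = "ged1.ged", n_patterns = 19L) {
  # In the recorded template the genealogy is network 20, i.e. n_patterns + 1
  ged_net <- as.integer(n_patterns) + 1L
  header <- c(
    "NETBEGIN 1", "CLUBEGIN 1", "PERBEGIN 1",
    "CLSBEGIN 1", "HIEBEGIN 1", "VECBEGIN 1",
    paste0("N 9999 RDPAJ \"", dir_path, "Frag_new.paj\""),
    paste0("N ", ged_net, " RDN \"", dir_path, ged_name, "\" (32)")
  )
  # Option string copied unchanged from the recorded FRAGNSNL commands
  flags <- "FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)"
  # One search command and one save command per pattern
  body <- unlist(lapply(seq_len(n_patterns), function(i) c(
    paste("C", i, "FRAGNSNL", i, ged_net, flags),
    paste0("C ", i, " WC \"", dir_path, i, ".clu\" (32)")
  )))
  c(header, body, "EXIT")
}

tpl <- make_template("C:\\Users\\batagelj\\test\\mimo\\ged\\")
# Written to a temporary location here so the sketch has no side effects
writeLines(tpl, file.path(tempdir(), "PajekLog.temp"))
```

Recording the macro in Pajek remains the safer route, because it captures exactly the options that were selected in the dialogs.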
From the template file we make, for each genealogy file, a command file Pajek.log, which Pajek uses to process the genealogy, produce the corresponding partitions, and save them to files. Using R, these partitions are then transformed into the frequency vector and stored in the file count.dat.
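The two R-side steps of this paragraph can be tried in isolation, without Pajek. The sketch below is only an illustration: instantiate_template and clu_sum are hypothetical helper names, the partition values are made up, and the .clu layout (one header line followed by one value per vertex) is assumed from the way the partitions are read in the complete procedure.

```r
# Hypothetical helper: turn the template lines into the commands for one
# genealogy by replacing the placeholder file name.
instantiate_template <- function(template_lines, ged_file,
                                 placeholder = "ged1.ged") {
  # fixed = TRUE matches the placeholder literally, so the "." in the
  # file name is not treated as a regular-expression wildcard
  gsub(placeholder, ged_file, template_lines, fixed = TRUE)
}

# Hypothetical helper: frequency of one pattern, taken as the sum of the
# partition values after the header line.
clu_sum <- function(clu_path) {
  u <- readLines(clu_path)
  sum(as.numeric(u[-1]))
}

# Two template lines stand in for the full PajekLog.temp
template <- c(
  "N 20 RDN \"C:\\Users\\batagelj\\test\\mimo\\ged\\ged1.ged\" (32)",
  "EXIT"
)
cmds <- instantiate_template(template, "ged27.ged")

# A made-up partition with four vertices, written to a temporary file
clu <- tempfile(fileext = ".clu")
writeLines(c("*Vertices 4", "0", "2", "0", "1"), clu)
freq <- clu_sum(clu)
```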
The complete procedure is prepared in the file GedCount.R:

# GED patterns counting
# V. Batagelj, July 2013
# ------------------------------------------------------
cat("GED patterns counting\nV. Batagelj, July 2013\n\n",date(),"\n",sep="")
setwd("C:/Users/batagelj/test/mimo/ged")
# pattern is a regular expression, not a wildcard: match names ending in .ged
files <- dir(pattern="\\.ged$")
freq <- integer(19)
L <- readLines("PajekLog.temp",n=-1)
rez <- file("count.dat","w")
for(f in files){
  # replace the placeholder genealogy name literally (fixed=TRUE)
  N <- gsub("ged1.ged",f,L,fixed=TRUE)
  writeLines(N,con="Pajek.log")
  # Windows only (invisible=): run Pajek and wait until it finishes
  system("C:/programi/pajek/pajek.exe",invisible=TRUE,wait=TRUE)
  file.remove("log1.log")
  if(!file.exists("1.clu")) {
    # no partition was written: report the file and skip it
    cat(f,"*** problems\n"); flush.console(); next
  } else {
    for(i in 1:19){
      clu <- paste(i,".clu",sep="")
      u <- readLines(con=clu,n=-1)
      # skip the header line and sum the partition values
      v <- as.numeric(u[2:length(u)])
      freq[i] <- sum(v)
    }
    len <- length(v)
    cat(f,len,freq,"\n"); flush.console()
    cat(f,len,freq,"\n",file=rez)
    # 1.clu serves as the success marker, so remove it before the next file
    file.remove("1.clu")
  }
}
cat(date(),"\n"); close(rez)
for(i in 2:19) file.remove(paste(i,".clu",sep=""))
Running it in R we get the following report:

> source("C:\\Users\\batagelj\\test\\mimo\\ged\\GedCount.R")
GED patterns counting
V. Batagelj, July 2013

Sat Jul 20 13:01:51 2013
ged1.ged 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 120
ged148.ged 1655 0 0 48 32 0 10 0 0 156 0 0 0 0 36 36 0 0 0 800
ged27.ged 244 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 880
ged30.ged 171 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 160
ged65.ged *** problems
gedr1255.ged 2234 0 0 8 96 4 0 0 5 36 0 0 0 0 0 24 0 0 0 6488
gedr1445.ged 2283 4 3 24 32 0 45 0 25 108 18 0 0 0 24 60 0 0 0 7440
gedr1477.ged 2283 4 3 24 32 0 45 0 25 108 18 0 0 0 24 60 0 0 0 7440
Sat Jul 20 13:02:06 2013
>
The results (frequencies) are saved in the file count.dat.
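For further analysis the saved frequencies can be read back into R. The sketch below assumes that each line of count.dat has the same layout as a row of the report (file name, number of vertices, 19 frequencies); the column names file, n and p1 to p19 are hypothetical, and two rows copied from the report stand in for the real file.

```r
# Two rows copied from the report, written to a temporary file that stands
# in for count.dat
sample_rows <- c(
  "ged1.ged 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 120 ",
  "ged148.ged 1655 0 0 48 32 0 10 0 0 156 0 0 0 0 36 36 0 0 0 800 "
)
path <- tempfile(fileext = ".dat")
writeLines(sample_rows, path)

# Hypothetical column names: file, n (vertices), p1..p19 (pattern counts)
pattern_cols <- paste0("p", 1:19)
counts <- read.table(path, header = FALSE,
                     col.names = c("file", "n", pattern_cols),
                     stringsAsFactors = FALSE)

# Pattern spectra as a numeric matrix with one row per genealogy
spectrum <- as.matrix(counts[, pattern_cols])
rownames(spectrum) <- counts$file
```

Files that Pajek could not process are only reported on the console and do not appear in count.dat, so every row read this way has the full 21 fields.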
Note: before running the GedCount.R procedure, delete all *.log files from its directory. If a GED file contains errors, a Pajek window and some messages will appear; simply close them and the procedure will continue processing the files (skipping the file with errors).
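The cleanup mentioned in the note could also be scripted. This is a minimal sketch; remove_logs is a hypothetical helper, and the demonstration runs in a temporary directory with empty placeholder files rather than in the real working directory.

```r
# Hypothetical helper: delete all *.log files in a directory and return,
# invisibly, the paths that were actually removed.
remove_logs <- function(work_dir) {
  logs <- list.files(work_dir, pattern = "\\.log$", full.names = TRUE)
  removed <- file.remove(logs)
  invisible(logs[removed])
}

# Demonstration on empty placeholder files in a temporary directory
work_dir <- file.path(tempdir(), "ged_demo")
dir.create(work_dir, showWarnings = FALSE)
file.create(file.path(work_dir, c("log1.log", "Pajek.log", "ged1.ged")))
remove_logs(work_dir)
remaining <- list.files(work_dir)
```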
While testing the GedCount.R procedure there were problems with the file ged65.ged. I tried to import it into Brother's Keeper 6.6. It reported two problems. Replacing \n\n with \n, and sorting the lines on their first character, I also identified the breaks and removed them.
I corrected the file ged65.ged. Here is the corrected version.
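A repair of this kind could be sketched in R as follows. This is an illustration and not the procedure actually used on ged65.ged: repair_ged_lines is a hypothetical helper, it relies on the GEDCOM convention that every line starts with a level number, and it assumes that a line not starting with a digit is the tail of a broken line that should be joined to the previous one with a space. A continuation that happens to start with a digit would be missed. The sample lines are made up.

```r
# Hypothetical helper: drop empty lines and re-join lines that were broken
# in the middle, assuming every proper GEDCOM line starts with a digit.
repair_ged_lines <- function(lines) {
  lines <- lines[nzchar(trimws(lines))]        # remove empty lines
  is_start <- grepl("^\\s*[0-9]", lines)       # proper line beginnings
  group <- cumsum(is_start)                    # tails share the previous group
  joined <- vapply(split(lines, group), paste, character(1), collapse = " ")
  unname(joined)
}

# Made-up fragment with one empty line and one broken line
broken <- c("0 @I1@ INDI", "1 NAME John /Smith/", "",
            "1 NOTE Lived in", "Ljubljana", "0 TRLR")
fixed <- repair_ged_lines(broken)
```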
September 2012
Subject: Re: gen
From: "Vladimir Batagelj" <vladimir.batagelj@fmf.uni-lj.si>
Date: Fri, September 7, 2012 01:11
To: DOMENICO.DESTEFANO@econ.units.it
Cc: vladimir.batagelj@fmf.uni-lj.si

Mimo,

here is the program in R:
-------------------------------------------------------
> setwd("C:/Users/Batagelj/Documents/papers/2012/capri/net/mimo")
> files <- c("ged1","bouchard","ged2")
> L <- readLines("PajekLog.temp",n=-1)
> for(f in files){
+   N <- gsub("bouchard",f,L)
+   # print(N)
+   writeLines(N,con="Pajek.log")
+   system("C:/programi/pajek/pajek.exe",invisible=TRUE)
+ }
-------------------------------------------------------
It assumes that in the directory mimo there are three GED files
ged1.ged, bouchard.ged and ged2.ged. The file PajekLog.temp contains
a template of commands to be executed by Pajek. The important addition
is that it should be terminated by a line with command EXIT.
See the attached file.

This solution works, but it can be further improved - deleting log files, ...

Vlado
--
Vladimir Batagelj, University of Ljubljana, FMF, Department of Mathematics
Jadranska 19, 1000 Ljubljana, Slovenia
http://vlado.fmf.uni-lj.si

Attachments: pajekLog.temp
NETBEGIN 1
CLUBEGIN 1
PERBEGIN 1
CLSBEGIN 1
HIEBEGIN 1
VECBEGIN 1
Msg Reading Network --- C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\pattern1.net
N 1 RDN "C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\pattern1.net" (3)
Msg Reading Network --- C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.ged
N 2 RDN "C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.ged" (259)
Msg All degree centrality of 2. C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.ged (259)
C 1 DEGC 2 [2] (259)
Msg Saving partition to file --- C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.clu
C 1 WC "C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.clu" (259)
EXIT