====== Running Pajek from R ======
In the note [[notes:gendl|downloading genealogies]] I described how to download a collection of genealogies from the [[http://www.genealogyforum.com/gedcom/|Genealogy Forum]].
Mimo (Domenico De Stefano) would like to get, for each genealogy, its pattern spectrum (the frequencies of occurrences of selected patterns) for the subnetworks stored in the file ''Frag_new.paj''. This can be done by combining Pajek and R, that is, by calling Pajek from R (see [[http://vlado.fmf.uni-lj.si/pub/networks/pajek/howto/execute.htm|Hints]]). In September 2012 I prepared an [[notes:runr#first_hints|example]]. During the ESRA conference in Ljubljana in July 2013 I met Mimo and prepared the following specific solution:
===== Improved solution =====
July 19, 2013
First, inside Pajek, using the option ''Macro/Record'' I prepared a template Pajek command file
''PajekLog.temp'' containing the commands for:
  * reading the patterns file
  * reading the genealogy file
  * for each pattern network:
    * determining the partition describing the appearances of the pattern in the genealogy's p-graph
    * saving the partition to a ''*.clu'' file
It is important to add the command ''EXIT'' at the end of the template.
''PajekLog.temp'':
NETBEGIN 1
CLUBEGIN 1
PERBEGIN 1
CLSBEGIN 1
HIEBEGIN 1
VECBEGIN 1
N 9999 RDPAJ "C:\Users\batagelj\test\mimo\ged\Frag_new.paj"
N 20 RDN "C:\Users\batagelj\test\mimo\ged\ged1.ged" (32)
C 1 FRAGNSNL 1 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 1 WC "C:\Users\batagelj\test\mimo\ged\1.clu" (32)
C 2 FRAGNSNL 2 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 2 WC "C:\Users\batagelj\test\mimo\ged\2.clu" (32)
C 3 FRAGNSNL 3 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 3 WC "C:\Users\batagelj\test\mimo\ged\3.clu" (32)
C 4 FRAGNSNL 4 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 4 WC "C:\Users\batagelj\test\mimo\ged\4.clu" (32)
C 5 FRAGNSNL 5 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 5 WC "C:\Users\batagelj\test\mimo\ged\5.clu" (32)
C 6 FRAGNSNL 6 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 6 WC "C:\Users\batagelj\test\mimo\ged\6.clu" (32)
C 7 FRAGNSNL 7 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 7 WC "C:\Users\batagelj\test\mimo\ged\7.clu" (32)
C 8 FRAGNSNL 8 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 8 WC "C:\Users\batagelj\test\mimo\ged\8.clu" (32)
C 9 FRAGNSNL 9 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 9 WC "C:\Users\batagelj\test\mimo\ged\9.clu" (32)
C 10 FRAGNSNL 10 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 10 WC "C:\Users\batagelj\test\mimo\ged\10.clu" (32)
C 11 FRAGNSNL 11 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 11 WC "C:\Users\batagelj\test\mimo\ged\11.clu" (32)
C 12 FRAGNSNL 12 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 12 WC "C:\Users\batagelj\test\mimo\ged\12.clu" (32)
C 13 FRAGNSNL 13 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 13 WC "C:\Users\batagelj\test\mimo\ged\13.clu" (32)
C 14 FRAGNSNL 14 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 14 WC "C:\Users\batagelj\test\mimo\ged\14.clu" (32)
C 15 FRAGNSNL 15 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 15 WC "C:\Users\batagelj\test\mimo\ged\15.clu" (32)
C 16 FRAGNSNL 16 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 16 WC "C:\Users\batagelj\test\mimo\ged\16.clu" (32)
C 17 FRAGNSNL 17 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 17 WC "C:\Users\batagelj\test\mimo\ged\17.clu" (32)
C 18 FRAGNSNL 18 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 18 WC "C:\Users\batagelj\test\mimo\ged\18.clu" (32)
C 19 FRAGNSNL 19 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 19 WC "C:\Users\batagelj\test\mimo\ged\19.clu" (32)
EXIT
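The 19 command pairs in the template differ only in the pattern number, so the file could also be generated instead of recorded. A minimal sketch in R, assuming the same directory layout as above; ''make_template'' and its parameters are hypothetical names, and the command lines are copied literally from the listing, including the trailing ''(32)''.

```r
# Hypothetical helper: build the lines of PajekLog.temp.
# dir_path, patterns_file, ged_file and n_patterns are illustrative parameters.
make_template <- function(dir_path, patterns_file = "Frag_new.paj",
                          ged_file = "ged1.ged", n_patterns = 19) {
  header <- c("NETBEGIN 1", "CLUBEGIN 1", "PERBEGIN 1",
              "CLSBEGIN 1", "HIEBEGIN 1", "VECBEGIN 1")
  # Windows-style full path, as used in the recorded template
  full <- function(name) paste0(dir_path, "\\", name)
  # In the listing the genealogy is read as network 20 = n_patterns + 1
  net <- n_patterns + 1
  read_cmds <- c(
    sprintf('N 9999 RDPAJ "%s"', full(patterns_file)),
    sprintf('N %d RDN "%s" (32)', net, full(ged_file))
  )
  i <- seq_len(n_patterns)
  frag <- sprintf(
    "C %d FRAGNSNL %d %d FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)",
    i, i, net)
  save <- sprintf('C %d WC "%s" (32)', i, full(paste0(i, ".clu")))
  # rbind + as.vector interleaves the FRAGNSNL and WC lines pattern by pattern
  body <- as.vector(rbind(frag, save))
  c(header, read_cmds, body, "EXIT")
}

tpl <- make_template("C:\\Users\\batagelj\\test\\mimo\\ged")
```

The result can be written with ''writeLines(tpl, "PajekLog.temp")''. Keeping ''ged1.ged'' as the placeholder file name keeps the generated template compatible with the substitution done in ''GedCount.R''.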
From the template file we make, for each genealogy file, a command file ''Pajek.log'', which Pajek uses to process the genealogy, produce the corresponding partitions, and save them to files. R then transforms these partitions into a frequency vector, which is stored in the file ''count.dat''.
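The transformation of a single partition file can be sketched on its own. The helper ''clu_sum'' is a hypothetical name and the sample partition is made up for illustration; the sketch assumes that the first line of a ''*.clu'' file is the ''*Vertices n'' header and that each following line holds one value.

```r
# Hypothetical helper: length and sum of the entries of a Pajek partition file.
clu_sum <- function(path) {
  u <- readLines(path)
  u <- u[nzchar(trimws(u))]   # ignore blank lines, e.g. at the end of the file
  v <- as.numeric(u[-1])      # drop the "*Vertices n" header line
  list(n = length(v), total = sum(v))
}

# Made-up partition with 5 vertices, written to a temporary file
tmp <- tempfile(fileext = ".clu")
writeLines(c("*Vertices 5", "0", "2", "2", "0", "1"), tmp)
res <- clu_sum(tmp)
print(res)
```

Here ''res$total'' plays the role of one entry of the frequency vector and ''res$n'' the role of the vertex count written in front of it.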
The complete procedure is prepared in the file ''GedCount.R'':
# GED patterns counting
# V. Batagelj, July 2013
# ------------------------------------------------------
cat("GED patterns counting\nV. Batagelj, July 2013\n\n",date(),"\n",sep="")
setwd("C:/Users/batagelj/test/mimo/ged")
# 'pattern' is a regular expression, not a wildcard
files <- dir(pattern="\\.ged$")
freq <- integer(19)
L <- readLines("PajekLog.temp",n=-1)
rez <- file("count.dat","w")
for(f in files){
  # put the current genealogy file name into the template
  N <- gsub("ged1.ged",f,L,fixed=TRUE)
  writeLines(N,con="Pajek.log")
  system("C:/programi/pajek/pajek.exe",invisible=TRUE,wait=TRUE)
  file.remove("log1.log")
  if(!file.exists("1.clu")) {
    # Pajek produced no partition for this genealogy
    cat(f,"*** problems\n"); flush.console(); next
  } else {
    for(i in 1:19){
      clu <- paste(i,".clu",sep="")
      u <- readLines(con=clu,n=-1)
      # skip the header line of the partition file
      v <- as.numeric(u[2:length(u)])
      freq[i] <- sum(v)
    }
    len <- length(v)
    cat(f,len,freq,"\n"); flush.console()
    cat(f,len,freq,"\n",file=rez)
    file.remove("1.clu")
  }
}
cat(date(),"\n"); close(rez)
for(i in 2:19) file.remove(paste(i,".clu",sep=""))
Running it in R we get the following report.
> source("C:\\Users\\batagelj\\test\\mimo\\ged\\GedCount.R")
GED patterns counting
V. Batagelj, July 2013
Sat Jul 20 13:01:51 2013
ged1.ged 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 120
ged148.ged 1655 0 0 48 32 0 10 0 0 156 0 0 0 0 36 36 0 0 0 800
ged27.ged 244 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 880
ged30.ged 171 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 160
ged65.ged *** problems
gedr1255.ged 2234 0 0 8 96 4 0 0 5 36 0 0 0 0 0 24 0 0 0 6488
gedr1445.ged 2283 4 3 24 32 0 45 0 25 108 18 0 0 0 24 60 0 0 0 7440
gedr1477.ged 2283 4 3 24 32 0 45 0 25 108 18 0 0 0 24 60 0 0 0 7440
Sat Jul 20 13:02:06 2013
>
The results (frequencies) are saved in the file ''count.dat''.
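For further analysis the contents of ''count.dat'' can be read back into R as a data frame. A minimal sketch, in which the column names ''file'', ''n'' and ''p1'' ... ''p19'' are hypothetical and two rows of the report above stand in for the real file:

```r
# Two rows rebuilt from the report above (ged1.ged and ged27.ged)
demo <- c(
  paste("ged1.ged", 32, paste(c(rep(0, 18), 120), collapse = " ")),
  paste("ged27.ged", 244, paste(c(0, 3, rep(0, 16), 880), collapse = " "))
)

# Hypothetical column names: file name, partition length, 19 pattern counts
pcols <- paste0("p", 1:19)
cols <- c("file", "n", pcols)

# For the real data use read.table("count.dat", col.names = cols) instead
counts <- read.table(text = demo, col.names = cols, stringsAsFactors = FALSE)

# Total over all 19 patterns for each genealogy
counts$total <- rowSums(counts[, pcols])
print(counts[, c("file", "n", "p2", "p19", "total")])
```

Genealogies reported with ''*** problems'' appear only in the console report, not in ''count.dat'', so they are absent from the data frame.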
**Note:** before running the ''GedCount.R'' procedure, delete all ''*.log'' files from its directory. If a GED file contains errors, the Pajek window will appear with some messages; simply close them and the procedure will continue processing the files (skipping the file with errors).
* {{pub:zip:mimo.zip}}
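The clean-up asked for in the note can also be done from R before the script is sourced. A minimal sketch; ''clean_logs'' is a hypothetical helper, not part of ''GedCount.R''.

```r
# Hypothetical helper: remove all *.log files from a directory
# and return (invisibly) the paths that were actually removed.
clean_logs <- function(dir_path = ".") {
  logs <- list.files(dir_path, pattern = "\\.log$", full.names = TRUE)
  removed <- file.remove(logs)   # logical vector, one entry per file
  invisible(logs[removed])
}
```

Calling ''clean_logs("C:/Users/batagelj/test/mimo/ged")'' before ''source(...)'' would remove the old log files; ''Pajek.log'' is written anew by the script for every genealogy, so deleting it does no harm.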
===== Problems with GED files =====
While testing the ''GedCount.R'' procedure, there were problems with the file ''ged65.ged''. I tried to import it into [[http://www.bkwin.org/|Brother's Keeper 6.6]]. It reported two problems:
- strange line breaks and empty lines; I imported the file into TextPad, replaced ''\n\n'' with ''\n'', and, by sorting the lines on their first character, also identified the breaks and removed them.
- in the file ''ged65.ged'' long GEDCOM tags (keywords) are used. They have to be replaced by short tags (see [[http://genealogy.about.com/library/weekly/aa110100d.htm|list 1]], [[http://www.tamurajones.net/GEDCOMTags.xhtml|list 2]]).
I corrected the file ''ged65.ged''. Here is the {{pub:zip:ged65.zip|corrected version}}.
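The corrections were made by hand in TextPad. Part of them could be automated in R; the following is only a sketch under stated assumptions. The helper ''clean_ged'' and the table ''long2short'' are hypothetical, the table covers only a few common tags and would have to be extended from the lists linked above, and lines broken in the middle are not repaired.

```r
# Assumed (incomplete) mapping of long GEDCOM tag names to short tags
long2short <- c(INDIVIDUAL = "INDI", FAMILY = "FAM", HUSBAND = "HUSB",
                CHILD = "CHIL", BIRTH = "BIRT", DEATH = "DEAT",
                MARRIAGE = "MARR", HEADER = "HEAD", TRAILER = "TRLR")

# Hypothetical helper: drop empty lines and shorten long tags.
clean_ged <- function(lines) {
  lines <- lines[nzchar(trimws(lines))]   # remove empty lines
  for (long in names(long2short)) {
    # Match the tag only in tag position: after the level number
    # and an optional @xref@ identifier, as a whole word.
    pat <- paste0("^(\\s*[0-9]+\\s+(@[^@]+@\\s+)?)", long, "\\b")
    lines <- sub(pat, paste0("\\1", long2short[[long]]), lines, perl = TRUE)
  }
  lines
}

# Made-up fragment with long tags and empty lines
raw <- c("0 HEADER", "", "0 @I1@ INDIVIDUAL", "1 NAME John /Smith/",
         "1 BIRTH", "", "0 @F1@ FAMILY", "1 HUSBAND @I1@", "0 TRAILER")
fixed <- clean_ged(raw)
print(fixed)
```

For a real file one would use ''writeLines(clean_ged(readLines("ged65.ged")), "ged65fix.ged")'', where the output file name is again only an example.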
===== First hints =====
September 2012
Subject: Re: gen
From: "Vladimir Batagelj"
Date: Fri, September 7, 2012 01:11
To: DOMENICO.DESTEFANO@econ.units.it
Cc: vladimir.batagelj@fmf.uni-lj.si
Mimo,
here is the program in R:
-------------------------------------------------------
> setwd("C:/Users/Batagelj/Documents/papers/2012/capri/net/mimo")
> files <- c("ged1","bouchard","ged2")
> L <- readLines("PajekLog.temp",n=-1)
> for(f in files){
+ N <- gsub("bouchard",f,L)
+ # print(N)
+ writeLines(N,con="Pajek.log")
+ system("C:/programi/pajek/pajek.exe",invisible=TRUE)
+ }
-------------------------------------------------------
It assumes that in the directory mimo there are three GED files
ged1.ged, bouchard.ged and ged2.ged.
The file PajekLog.temp contains a template of commands to be
executed by Pajek. The important addition is that it should be
terminated by a line with command EXIT. See the attached file.
This solution works, but it can be further improved - deleting
log files, ...
Vlado
--
Vladimir Batagelj, University of Ljubljana, FMF, Department of Mathematics
Jadranska 19, 1000 Ljubljana, Slovenia
http://vlado.fmf.uni-lj.si
Attachments:
pajekLog.temp (1 k, application/octet-stream)
NETBEGIN 1
CLUBEGIN 1
PERBEGIN 1
CLSBEGIN 1
HIEBEGIN 1
VECBEGIN 1
Msg Reading Network --- C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\pattern1.net
N 1 RDN "C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\pattern1.net" (3)
Msg Reading Network --- C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.ged
N 2 RDN "C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.ged" (259)
Msg All degree centrality of 2. C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.ged (259)
C 1 DEGC 2 [2] (259)
Msg Saving partition to file --- C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.clu
C 1 WC "C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.clu" (259)
EXIT