====== Running Pajek from R ======

In the note [[notes:gendl|downloading genealogies]] I described how to download a collection of genealogies from the [[http://www.genealogyforum.com/gedcom/|Genealogy Forum]]. Mimo (Domenico De Stefano) would like to get, for each genealogy, its pattern spectrum (the frequencies of appearances of selected patterns) for the subnetworks stored in the file ''Frag_new.paj''.

This can be done by combining Pajek and R: calling Pajek from R (see [[http://vlado.fmf.uni-lj.si/pub/networks/pajek/howto/execute.htm|Hints]]). In September 2012 I prepared an [[notes:runr#first_hints|example]]. During the ESRA conference in Ljubljana in July 2013 I met Mimo and prepared the following specific solution.

===== Improved solution =====

July 19, 2013

First, inside Pajek, using the option ''Macro/Record'', I prepared a template Pajek command file ''PajekLog.temp'' containing the commands for:
  * reading the patterns file
  * reading the genealogy file
  * for each pattern network:
    * determining the partition describing the appearances of the pattern in the genealogy's p-graph
    * saving the partition to a ''*.clu'' file

It is important to add the command ''EXIT'' at the end of the template.
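The per-pattern commands in the template differ only in the pattern number, so a template of this shape could also be generated instead of recorded. The following is only a sketch under assumptions: the function ''make_template'' and its arguments are hypothetical names, and the command syntax, including the trailing ''(32)'', is copied verbatim from the log recorded with ''Macro/Record'' for this particular setup. A different Pajek version or different options may record different commands, so a freshly recorded log should be compared before relying on it.

```r
# Sketch: build the lines of a Pajek command template for n_patterns patterns.
# make_template and its arguments are hypothetical; the command text is copied
# from a recorded Pajek log, not taken from any Pajek documentation.
make_template <- function(dir, pattern_file, ged_file,
                          n_patterns = 19, net_id = 20, suffix = "(32)") {
  # quote a file name together with its directory, using Windows separators
  p <- function(name) paste0('"', dir, "\\", name, '"')
  header <- paste(c("NET", "CLU", "PER", "CLS", "HIE", "VEC"), "BEGIN 1", sep = "")
  read <- c(paste("N 9999 RDPAJ", p(pattern_file)),
            paste("N", net_id, "RDN", p(ged_file), suffix))
  # one command to find the fragments and one to save the resulting partition
  body <- unlist(lapply(seq_len(n_patterns), function(i) c(
    paste("C", i, "FRAGNSNL", i, net_id,
          "FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE", suffix),
    paste("C", i, "WC", p(paste0(i, ".clu")), suffix)
  )))
  c(header, read, body, "EXIT")   # the template must end with EXIT
}

tmpl <- make_template("C:\\Users\\batagelj\\test\\mimo\\ged",
                      "Frag_new.paj", "ged1.ged")
cat(head(tmpl, 10), sep = "\n")
```

The resulting character vector can be written with ''writeLines(tmpl, "PajekLog.temp")''. Keeping the pattern count as an argument means the same sketch would cover a patterns file with a different number of subnetworks.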
''PajekLog.temp'':

  NETBEGIN 1
  CLUBEGIN 1
  PERBEGIN 1
  CLSBEGIN 1
  HIEBEGIN 1
  VECBEGIN 1
  N 9999 RDPAJ "C:\Users\batagelj\test\mimo\ged\Frag_new.paj"
  N 20 RDN "C:\Users\batagelj\test\mimo\ged\ged1.ged" (32)
  C 1 FRAGNSNL 1 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
  C 1 WC "C:\Users\batagelj\test\mimo\ged\1.clu" (32)
  C 2 FRAGNSNL 2 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
  C 2 WC "C:\Users\batagelj\test\mimo\ged\2.clu" (32)
  C 3 FRAGNSNL 3 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
  C 3 WC "C:\Users\batagelj\test\mimo\ged\3.clu" (32)
  C 4 FRAGNSNL 4 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
  C 4 WC "C:\Users\batagelj\test\mimo\ged\4.clu" (32)
  C 5 FRAGNSNL 5 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
  C 5 WC "C:\Users\batagelj\test\mimo\ged\5.clu" (32)
  C 6 FRAGNSNL 6 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
  C 6 WC "C:\Users\batagelj\test\mimo\ged\6.clu" (32)
  C 7 FRAGNSNL 7 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
  C 7 WC "C:\Users\batagelj\test\mimo\ged\7.clu" (32)
  C 8 FRAGNSNL 8 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
  C 8 WC "C:\Users\batagelj\test\mimo\ged\8.clu" (32)
  C 9 FRAGNSNL 9 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
  C 9 WC "C:\Users\batagelj\test\mimo\ged\9.clu" (32)
  C 10 FRAGNSNL 10 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
  C 10 WC "C:\Users\batagelj\test\mimo\ged\10.clu" (32)
  C 11 FRAGNSNL 11 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
  C 11 WC "C:\Users\batagelj\test\mimo\ged\11.clu" (32)
  C 12 FRAGNSNL 12 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
  C 12 WC "C:\Users\batagelj\test\mimo\ged\12.clu" (32)
  C 13 FRAGNSNL 13 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
  C 13 WC "C:\Users\batagelj\test\mimo\ged\13.clu" (32)
  C 14 FRAGNSNL 14 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
  C 14 WC "C:\Users\batagelj\test\mimo\ged\14.clu" (32)
  C 15 FRAGNSNL 15 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
  C 15 WC "C:\Users\batagelj\test\mimo\ged\15.clu" (32)
  C 16 FRAGNSNL 16 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
  C 16 WC "C:\Users\batagelj\test\mimo\ged\16.clu" (32)
  C 17 FRAGNSNL 17 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
  C 17 WC "C:\Users\batagelj\test\mimo\ged\17.clu" (32)
  C 18 FRAGNSNL 18 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
  C 18 WC "C:\Users\batagelj\test\mimo\ged\18.clu" (32)
  C 19 FRAGNSNL 19 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
  C 19 WC "C:\Users\batagelj\test\mimo\ged\19.clu" (32)
  EXIT

From the template file we make, for each genealogy file, a command file ''Pajek.log'', which Pajek uses to process the genealogy, produce the corresponding partitions, and save them to files. These partitions are then transformed in R into the frequency vector and stored in the file ''count.dat''. The complete procedure is prepared in the file ''GedCount.R'':

  # GED patterns counting
  # V. Batagelj, July 2013
  # ------------------------------------------------------
  cat("GED patterns counting\nV. Batagelj, July 2013\n\n",date(),"\n",sep="")
  setwd("C:/Users/batagelj/test/mimo/ged")
  # 'pattern' is a regular expression, not a wildcard
  files <- dir(pattern="\\.ged$")
  freq <- integer(19)
  L <- readLines("PajekLog.temp",n=-1)
  rez <- file("count.dat","w")
  for(f in files){
    # put the current file name into the template (literal match)
    N <- gsub("ged1.ged",f,L,fixed=TRUE)
    writeLines(N,con="Pajek.log")
    # run Pajek and wait until it finishes
    system("C:/programi/pajek/pajek.exe",invisible=TRUE,wait=TRUE)
    file.remove("log1.log")
    # no partition file means that Pajek could not process the genealogy
    if(!file.exists("1.clu")) {
      cat(f,"*** problems\n"); flush.console(); next
    } else {
      for(i in 1:19){
        clu <- paste(i,".clu",sep="")
        u <- readLines(con=clu,n=-1)
        # skip the first line of the file and sum the partition values
        v <- as.numeric(u[2:length(u)])
        freq[i] <- sum(v)
      }
      len <- length(v)
      cat(f,len,freq,"\n"); flush.console()
      cat(f,len,freq,"\n",file=rez)
      # remove 1.clu so that a failure on the next file is detected
      file.remove("1.clu")
    }
  }
  cat(date(),"\n"); close(rez)
  # clean up the remaining partition files
  for(i in 2:19) file.remove(paste(i,".clu",sep=""))

Running it in R we get the following report.
  > source("C:\\Users\\batagelj\\test\\mimo\\ged\\GedCount.R")
  GED patterns counting
  V. Batagelj, July 2013
  
  Sat Jul 20 13:01:51 2013
  ged1.ged 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 120
  ged148.ged 1655 0 0 48 32 0 10 0 0 156 0 0 0 0 36 36 0 0 0 800
  ged27.ged 244 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 880
  ged30.ged 171 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 160
  ged65.ged *** problems
  gedr1255.ged 2234 0 0 8 96 4 0 0 5 36 0 0 0 0 0 24 0 0 0 6488
  gedr1445.ged 2283 4 3 24 32 0 45 0 25 108 18 0 0 0 24 60 0 0 0 7440
  gedr1477.ged 2283 4 3 24 32 0 45 0 25 108 18 0 0 0 24 60 0 0 0 7440
  Sat Jul 20 13:02:06 2013
  >

Each row lists the file name, the number of values in a partition, and the 19 pattern frequencies. The results (frequencies) are saved to the file ''count.dat''.

**Note:** before running the ''GedCount.R'' procedure, delete all ''*.log'' files from its directory. If a GED file contains errors, a Pajek window with some messages will appear; simply close them and the procedure will continue processing the remaining files (skipping the file with errors).

  * {{pub:zip:mimo.zip}}

===== Problems with GED files =====

While testing the ''GedCount.R'' procedure I had problems with the file ''ged65.ged''. I tried to import it into [[http://www.bkwin.org/|Brother's Keeper 6.6]], which reported two problems:
  - Strange line breaks and empty lines. I opened the file in TextPad and replaced ''\n\n'' with ''\n''; by sorting the lines on their first character I also identified the remaining breaks and removed them.
  - Long GEDCOM tags (keywords) are used in the file ''ged65.ged''. They have to be replaced by short tags (see [[http://genealogy.about.com/library/weekly/aa110100d.htm|list 1]], [[http://www.tamurajones.net/GEDCOMTags.xhtml|list 2]]).

I corrected the file ''ged65.ged''. Here is the {{pub:zip:ged65.zip|corrected version}}.
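The first of the two repairs (empty lines and stray line breaks) could probably also be done in R instead of a text editor. The sketch below is not the procedure I used on ''ged65.ged''; ''repair_ged_lines'' is a hypothetical helper. It assumes that every genuine GEDCOM line starts with a level number (a digit), so any other non-empty line is treated as the tail of a broken line and is glued to the previous line with a single space. Both assumptions can fail: a tail that happens to start with a digit would be kept as a separate line, and a break in the middle of a word would get a spurious space.

```r
# Sketch: remove empty lines and rejoin broken lines of a GEDCOM file.
# Assumption: a genuine GEDCOM line starts with its level number (a digit).
repair_ged_lines <- function(lines) {
  lines <- lines[nzchar(trimws(lines))]        # drop empty lines
  if (length(lines) == 0) return(character(0))
  starts <- grepl("^\\s*[0-9]", lines)         # TRUE where a new GEDCOM line begins
  group <- cumsum(starts)                      # tails share the number of their head
  joined <- vapply(split(lines, group),
                   function(x) paste(trimws(x), collapse = " "),
                   character(1))
  unname(joined)
}

# small made-up example with two broken lines and two empty lines
x <- c("0 @I1@ INDI", "", "1 NAME John /Smith/",
       "1 NOTE Born in", "Ljubljana", "",
       "1 BIRT", "2 DATE 12", "MAR 1850")
print(repair_ged_lines(x))
```

For a real file the lines would come from ''readLines("ged65.ged")'' and the result would be written with ''writeLines'' to a new file, so that the original stays available for comparison. Replacing long tags by short ones is not covered by this sketch.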
===== First hints =====

September 2012

  Subject: Re: gen
  From: "Vladimir Batagelj"
  Date: Fri, September 7, 2012 01:11
  To: DOMENICO.DESTEFANO@econ.units.it
  Cc: vladimir.batagelj@fmf.uni-lj.si
  
  Mimo,
  
  here is the program in R:
  -------------------------------------------------------
  > setwd("C:/Users/Batagelj/Documents/papers/2012/capri/net/mimo")
  > files <- c("ged1","bouchard","ged2")
  > L <- readLines("PajekLog.temp",n=-1)
  > for(f in files){
  +   N <- gsub("bouchard",f,L)
  +   # print(N)
  +   writeLines(N,con="Pajek.log")
  +   system("C:/programi/pajek/pajek.exe",invisible=TRUE)
  + }
  -------------------------------------------------------
  It assumes that in the directory mimo there are three GED files
  ged1.ged, bouchard.ged and ged2.ged.
  The file PajekLog.temp contains a template of commands to be executed
  by Pajek. The important addition is that it should be terminated by
  a line with command EXIT. See the attached file.
  
  This solution works, but it can be further improved - deleting log files, ...
  
  Vlado
  --
  Vladimir Batagelj, University of Ljubljana, FMF, Department of Mathematics
  Jadranska 19, 1000 Ljubljana, Slovenia
  http://vlado.fmf.uni-lj.si
  
  Attachments:
  pajekLog.temp 1 k [ application/octet-stream ]

Contents of the attached ''pajekLog.temp'':

  NETBEGIN 1
  CLUBEGIN 1
  PERBEGIN 1
  CLSBEGIN 1
  HIEBEGIN 1
  VECBEGIN 1
  Msg Reading Network --- C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\pattern1.net
  N 1 RDN "C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\pattern1.net" (3)
  Msg Reading Network --- C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.ged
  N 2 RDN "C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.ged" (259)
  Msg All degree centrality of 2. C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.ged (259)
  C 1 DEGC 2 [2] (259)
  Msg Saving partition to file --- C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.clu
  C 1 WC "C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.clu" (259)
  EXIT