Running Pajek from R

In the note downloading genealogies I described how to download a collection of genealogies from the Genealogy Forum.

Mimo (Domenico De Stefano) would like to obtain, for each genealogy, its pattern spectrum (the frequencies of appearances of selected patterns) for the subnetworks stored in the file Frag_new.paj. This can be done by combining Pajek and R, that is, by calling Pajek from R (see First hints below). In September 2012 I prepared an example. During the ESRA conference in Ljubljana in July 2013 I met Mimo and prepared the following specific solution:

Improved solution

July 19, 2013

First, inside Pajek, using the option Macro/Record I prepared a template Pajek command file PajekLog.temp containing the commands for:

  • reading the patterns file
  • reading the genealogy file
  • for each pattern network:
    • determining the partition describing the appearances of the pattern in the genealogy's p-graph
    • saving the partition to a *.clu file

It is important to add the command EXIT at the end of the template.

PajekLog.temp:

NETBEGIN 1
CLUBEGIN 1
PERBEGIN 1
CLSBEGIN 1
HIEBEGIN 1
VECBEGIN 1

N 9999 RDPAJ "C:\Users\batagelj\test\mimo\ged\Frag_new.paj"
N 20 RDN "C:\Users\batagelj\test\mimo\ged\ged1.ged" (32)
C 1 FRAGNSNL  1 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 1 WC "C:\Users\batagelj\test\mimo\ged\1.clu" (32)
C 2 FRAGNSNL  2 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 2 WC "C:\Users\batagelj\test\mimo\ged\2.clu" (32)
C 3 FRAGNSNL  3 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 3 WC "C:\Users\batagelj\test\mimo\ged\3.clu" (32)
C 4 FRAGNSNL  4 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 4 WC "C:\Users\batagelj\test\mimo\ged\4.clu" (32)
C 5 FRAGNSNL  5 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 5 WC "C:\Users\batagelj\test\mimo\ged\5.clu" (32)
C 6 FRAGNSNL  6 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 6 WC "C:\Users\batagelj\test\mimo\ged\6.clu" (32)
C 7 FRAGNSNL  7 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 7 WC "C:\Users\batagelj\test\mimo\ged\7.clu" (32)
C 8 FRAGNSNL  8 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 8 WC "C:\Users\batagelj\test\mimo\ged\8.clu" (32)
C 9 FRAGNSNL  9 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 9 WC "C:\Users\batagelj\test\mimo\ged\9.clu" (32)
C 10 FRAGNSNL  10 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 10 WC "C:\Users\batagelj\test\mimo\ged\10.clu" (32)
C 11 FRAGNSNL  11 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 11 WC "C:\Users\batagelj\test\mimo\ged\11.clu" (32)
C 12 FRAGNSNL  12 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 12 WC "C:\Users\batagelj\test\mimo\ged\12.clu" (32)
C 13 FRAGNSNL  13 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 13 WC "C:\Users\batagelj\test\mimo\ged\13.clu" (32)
C 14 FRAGNSNL  14 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 14 WC "C:\Users\batagelj\test\mimo\ged\14.clu" (32)
C 15 FRAGNSNL  15 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 15 WC "C:\Users\batagelj\test\mimo\ged\15.clu" (32)
C 16 FRAGNSNL  16 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 16 WC "C:\Users\batagelj\test\mimo\ged\16.clu" (32)
C 17 FRAGNSNL  17 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 17 WC "C:\Users\batagelj\test\mimo\ged\17.clu" (32)
C 18 FRAGNSNL  18 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 18 WC "C:\Users\batagelj\test\mimo\ged\18.clu" (32)
C 19 FRAGNSNL  19 20 FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE (32)
C 19 WC "C:\Users\batagelj\test\mimo\ged\19.clu" (32)
EXIT
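The nineteen FRAGNSNL/WC pairs differ only in the pattern number, so the template could also be generated instead of recorded in full. A minimal sketch in R, assuming the line layout shown above; the helper make_template and its arguments are hypothetical, and the flags and the "(32)" suffix are copied verbatim from the recorded macro without interpreting them:

```r
# Hypothetical helper: rebuild the lines of PajekLog.temp for n patterns.
# dir is the working directory written with backslashes, as Pajek recorded it.
make_template <- function(dir, ged = "ged1.ged", n = 19, gedNet = 20,
                          suffix = "(32)") {
  header <- c("NETBEGIN 1", "CLUBEGIN 1", "PERBEGIN 1",
              "CLSBEGIN 1", "HIEBEGIN 1", "VECBEGIN 1", "")
  # flags copied verbatim from the recorded macro
  flags <- "FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE"
  read <- c(
    sprintf('N 9999 RDPAJ "%s\\Frag_new.paj"', dir),
    sprintf('N %d RDN "%s\\%s" %s', gedNet, dir, ged, suffix)
  )
  i <- seq_len(n)
  find <- sprintf("C %d FRAGNSNL  %d %d %s %s", i, i, gedNet, flags, suffix)
  save <- sprintf('C %d WC "%s\\%d.clu" %s', i, dir, i, suffix)
  # interleave the commands: find 1, save 1, find 2, save 2, ...
  body <- as.vector(rbind(find, save))
  c(header, read, body, "EXIT")
}

tmpl <- make_template("C:\\Users\\batagelj\\test\\mimo\\ged")
# writeLines(tmpl, "PajekLog.temp")   # uncomment to write the template
```

The final EXIT line is kept because Pajek has to terminate before R can continue.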

For each genealogy file we make from the template a command file Pajek.log, in which the name of the genealogy file is replaced by the current one. Pajek uses this file to process the genealogy, produce the corresponding partitions and save them to files. R then transforms these partitions into a frequency vector, which is stored in the file count.dat.
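The two R steps can be tried in isolation before running Pajek at all. A minimal sketch, assuming that a *.clu file consists of a "*Vertices n" header line followed by one value per line; the helper clu_sum and the toy partition are hypothetical:

```r
# Two lines of the template, as recorded for ged1.ged
L <- c('N 20 RDN "C:\\Users\\batagelj\\test\\mimo\\ged\\ged1.ged" (32)',
       'C 1 WC "C:\\Users\\batagelj\\test\\mimo\\ged\\1.clu" (32)')

# Step 1: replace the recorded file name with the current genealogy.
# fixed = TRUE treats "ged1.ged" as a literal string, not a regular expression.
N <- gsub("ged1.ged", "ged148.ged", L, fixed = TRUE)

# Step 2: sum the values of a partition file.
# Assumption: the first line is the "*Vertices n" header.
clu_sum <- function(path) {
  u <- readLines(path)
  v <- as.numeric(u[-1])          # drop the header line
  c(len = length(v), total = sum(v))
}

# Toy partition with 4 vertices, written to a temporary file
tmp <- tempfile(fileext = ".clu")
writeLines(c("*Vertices 4", "0", "2", "1", "0"), tmp)
res <- clu_sum(tmp)   # len = 4, total = 3
```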

The complete procedure is prepared in the file GedCount.R:

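# GED patterns counting
# V. Batagelj, July 2013 
# ------------------------------------------------------
cat("GED patterns counting\nV. Batagelj, July 2013\n\n",date(),"\n",sep="")
setwd("C:/Users/batagelj/test/mimo/ged")
# all GED files in the working directory (pattern is a regular expression)
files <- dir(pattern="\\.ged$")
freq <- integer(19)
# template of Pajek commands recorded for ged1.ged
L <- readLines("PajekLog.temp",n=-1)
rez <- file("count.dat","w")
for(f in files){ 
  # put the current file name into the template; fixed=TRUE matches it literally
  N <- gsub("ged1.ged",f,L,fixed=TRUE)
  writeLines(N,con="Pajek.log")
  # run Pajek on the command file Pajek.log and wait until it exits
  system("C:/programi/pajek/pajek.exe",invisible=TRUE,wait=TRUE)
  file.remove("log1.log")
  # a missing 1.clu means that Pajek could not process the genealogy
  if(!file.exists("1.clu")) {
    cat(f,"*** problems\n"); flush.console();next
  } else {
    for(i in 1:19){
      clu <- paste(i,".clu",sep="")
      # skip the first (header) line and sum the partition values
      u <- readLines(con=clu,n=-1) 
      v <- as.numeric(u[2:length(u)])
      freq[i] <- sum(v)
    }
    # length of the partition
    len <- length(v)
    cat(f,len,freq,"\n"); flush.console()
    cat(f,len,freq,"\n",file=rez)
    # remove 1.clu so that a failure on the next file is detected
    file.remove("1.clu")
  }
}
cat(date(),"\n"); close(rez)
# remove the remaining partition files
for(i in 2:19) file.remove(paste(i,".clu",sep=""))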

Running it in R we get the following report.

> source("C:\\Users\\batagelj\\test\\mimo\\ged\\GedCount.R")
GED patterns counting
V. Batagelj, July 2013

Sat Jul 20 13:01:51 2013
ged1.ged 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 120 
ged148.ged 1655 0 0 48 32 0 10 0 0 156 0 0 0 0 36 36 0 0 0 800 
ged27.ged 244 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 880 
ged30.ged 171 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 160 
ged65.ged *** problems
gedr1255.ged 2234 0 0 8 96 4 0 0 5 36 0 0 0 0 0 24 0 0 0 6488 
gedr1445.ged 2283 4 3 24 32 0 45 0 25 108 18 0 0 0 24 60 0 0 0 7440 
gedr1477.ged 2283 4 3 24 32 0 45 0 25 108 18 0 0 0 24 60 0 0 0 7440 
Sat Jul 20 13:02:06 2013 
>  

The results (frequencies) are saved in the file count.dat.
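Each line of count.dat holds the file name, the partition length and the 19 frequencies, so it can be read back as a table. A minimal sketch that uses two lines copied from the report above in place of the real file; the column names are hypothetical:

```r
# Two lines in the format written to count.dat (copied from the report)
txt <- c(
  "ged1.ged 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 120 ",
  "ged148.ged 1655 0 0 48 32 0 10 0 0 156 0 0 0 0 36 36 0 0 0 800 "
)

# For the real file, replace text = txt with file = "count.dat"
counts <- read.table(text = txt, header = FALSE,
                     col.names = c("file", "len", paste0("p", 1:19)),
                     stringsAsFactors = FALSE)

# Number of patterns with a nonzero frequency in each genealogy
present <- rowSums(counts[, -(1:2)] > 0)
```

Files reported with "*** problems" appear only in the console report and not in count.dat, so they do not need special handling here.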

Note: before running the GedCount.R procedure, delete all *.log files from its directory. If a GED file contains errors, a Pajek window and some messages will appear; simply close them and the procedure will continue with the remaining files (skipping the file with errors).
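The cleanup mentioned in the note can also be done from R. A minimal sketch, run in a temporary directory with made-up empty files so that nothing real is deleted; for a real run, d would be the directory that holds GedCount.R:

```r
# Temporary directory with made-up files (assumption, for demonstration only)
d <- tempfile("gedcount-demo")
dir.create(d)
file.create(file.path(d, c("log1.log", "Pajek.log", "ged1.ged")))

# Remove every file whose name ends in ".log"
logs <- list.files(d, pattern = "\\.log$", full.names = TRUE)
removed <- file.remove(logs)

remaining <- list.files(d)   # only ged1.ged is left
```

Deleting Pajek.log as well is harmless, because GedCount.R writes it again for every genealogy.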

Problems with GED files

While testing the GedCount.R procedure I ran into problems with the file ged65.ged. I tried to import it into Brother's Keeper 6.6, which reported two problems:

  1. strange line breaks and empty lines; I opened the file in TextPad, replaced \n\n with \n and, by sorting the lines on their first character, identified the remaining breaks and removed them.
  2. the file ged65.ged uses long GEDCOM tags (keywords); they have to be replaced by short tags (see list 1, list 2).
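Both repairs could also be scripted. A minimal sketch in R, under several assumptions: a well-formed GEDCOM line starts with a level number, so any other line is treated as the tail of a broken line; the pieces are glued without a separator; and the tag table is only an illustrative subset, since the note does not say which long tags ged65.ged actually contained. The helper repair_ged is hypothetical:

```r
# Hypothetical helper: apply the two repairs to the lines of a GED file.
repair_ged <- function(lines, tags, glue = "") {
  # 1a. drop empty lines
  lines <- lines[nzchar(trimws(lines))]
  # 1b. lines that do not start with a level number continue the previous line
  starts <- grepl("^[ \t]*[0-9]", lines)
  group <- cumsum(starts)
  lines <- unname(vapply(split(lines, group), paste, character(1),
                         collapse = glue))
  # 2. replace a long tag by its short form, only in the tag position
  #    (after the level number and an optional @xref@)
  for (long in names(tags)) {
    pat <- paste0("^([ \t]*[0-9]+[ \t]+(@[^@]+@[ \t]+)?)", long, "\\b")
    lines <- sub(pat, paste0("\\1", tags[[long]]), lines, perl = TRUE)
  }
  lines
}

# Illustrative subset of long -> short GEDCOM tags
tags <- c(INDIVIDUAL = "INDI", FAMILY = "FAM", HUSBAND = "HUSB",
          CHILD = "CHIL", BIRTH = "BIRT", DEATH = "DEAT")

# Made-up fragment with an empty line, a broken line and long tags
broken <- c("0 @I1@ INDIVIDUAL", "", "1 NAME John /Smi", "th/", "1 BIRTH",
            "0 @F1@ FAMILY", "1 HUSBAND @I1@")
fixed <- repair_ged(broken, tags)
```

A continuation fragment that happens to start with a digit would not be detected by this rule, so the result should still be checked by importing it into a genealogy program.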

I corrected the file ged65.ged. Here is the corrected version.

First hints

September 2012

Subject:   	Re: gen
From:   	"Vladimir Batagelj" <vladimir.batagelj@fmf.uni-lj.si>
Date:   	Fri, September 7, 2012 01:11
To:   	DOMENICO.DESTEFANO@econ.units.it
Cc:   	vladimir.batagelj@fmf.uni-lj.si

Mimo,

here is the program in R:
-------------------------------------------------------
> setwd("C:/Users/Batagelj/Documents/papers/2012/capri/net/mimo")
> files <- c("ged1","bouchard","ged2")
> L <- readLines("PajekLog.temp",n=-1)
> for(f in files){
+   N <- gsub("bouchard",f,L)
+ #  print(N)
+   writeLines(N,con="Pajek.log")
+   system("C:/programi/pajek/pajek.exe",invisible=TRUE)
+ }
-------------------------------------------------------

It assumes that in the directory mimo there are three GED files
ged1.ged, bouchard.ged and ged2.ged.
The file PajekLog.temp contains a template of commands to be
executed by Pajek. The important addition is that it should be
terminated by a line with command EXIT. See the attached file.

This solution works, but it can be further improved - deleting
log files, ...

Vlado
-- 
Vladimir Batagelj, University of Ljubljana, FMF, Department of Mathematics
  Jadranska 19, 1000 Ljubljana, Slovenia
http://vlado.fmf.uni-lj.si

Attachments:
pajekLog.temp (1 k, application/octet-stream):
NETBEGIN 1
CLUBEGIN 1
PERBEGIN 1
CLSBEGIN 1
HIEBEGIN 1
VECBEGIN 1

Msg Reading Network   ---    C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\pattern1.net
N 1 RDN "C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\pattern1.net" (3)
Msg Reading Network   ---    C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.ged
N 2 RDN "C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.ged" (259)
Msg All degree centrality of 2. C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.ged (259)
C 1 DEGC 2 [2] (259)
Msg Saving partition to file   ---    C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.clu
C 1 WC "C:\Users\Batagelj\Documents\papers\2012\capri\net\mimo\bouchard.clu" (259)
EXIT
notes/runr.txt · Last modified: 2015/07/14 11:45 by vlado
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported