Temporal cores in bibliographic networks

Normalization

We first implemented a temporal version of the normalized authorship network from Batagelj, V, Cerinšek, M: On bibliographic networks. Scientometrics 96 (2013) 3, 845-864.

N = n(WA)

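    def TQnormal(self,key='tq'):
        # work on a copy so that the original network stays unchanged
        N = deepcopy(self)
        for u in N.nodesMode(1):
            # qu = inverse of the temporal weighted out-degree of node u
            qu = TQ.TQ.invert(N.TQnetOutDeg(u),vZero=1)
            # multiply the temporal quantity on every link leaving u by qu
            for p in N.outStar(u):
                N._links[p][4][key] = TQ.TQ.prod(qu,N._links[p][4][key])
        return N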
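
To make the effect of the normalization concrete, here is a minimal stand-alone sketch that does not use the TQ library. It assumes that a temporal quantity is a list of (start, finish, value) triples valid on the half-open interval [start, finish), and that normalizing means dividing each link weight by the sum of the weights of its work at the same time. The names value_at, combine and normalize are hypothetical helpers introduced only for this illustration.

```python
def value_at(tq, t):
    """Value of a temporal quantity at time t (0 where it is undefined)."""
    for s, f, v in tq:
        if s <= t < f:
            return v
    return 0

def combine(a, b, op):
    """Apply op pointwise to two temporal quantities; zero results are dropped."""
    points = sorted({t for s, f, _ in a + b for t in (s, f)})
    out = []
    for s, f in zip(points, points[1:]):
        v = op(value_at(a, s), value_at(b, s))
        if v == 0:
            continue
        if out and out[-1][1] == s and out[-1][2] == v:
            out[-1] = (out[-1][0], f, v)   # merge adjacent pieces with equal value
        else:
            out.append((s, f, v))
    return out

def normalize(WA):
    """WA: dict work -> {author: temporal quantity}; returns the normalized copy."""
    N = {}
    for w, row in WA.items():
        total = []                          # temporal out-degree (row sum) of work w
        for tq in row.values():
            total = combine(total, tq, lambda x, y: x + y)
        N[w] = {a: combine(tq, total, lambda x, y: x / y if y else 0)
                for a, tq in row.items()}
    return N

# toy cumulative network: w1 has two authors, w2 has one
WA = {
    "w1": {"a1": [(2000, 2009, 1)], "a2": [(2000, 2009, 1)]},
    "w2": {"a1": [(2005, 2009, 1)]},
}
N = normalize(WA)
print(N["w1"]["a1"])   # [(2000, 2009, 0.5)]
print(N["w2"]["a1"])   # [(2005, 2009, 1.0)]
```

In the normalized network the weights of each work sum to 1 at every time point at which the work exists.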

The program normalize.py reads a temporal two-mode network from the file named in the variable file, normalizes it, and saves the corresponding normalized collaboration network Ct to the file named in the variable fjson:

gdir = 'c:/users/batagelj/work/python/graph/graph'
# wdir = 'c:/users/batagelj/work/python/graph/JSON/test'
wdir = 'c:/users/batagelj/work/python/graph/JSON/SN5'
import sys, os, datetime, json
sys.path = [gdir]+sys.path; os.chdir(wdir)
from GraphNew import Graph
# from copy import deepcopy
import TQ
# file = 'WAtestInst.json'
# file = 'WAtestCum.json'
file = 'WAcCum.json'
t1 = datetime.datetime.now()
print("started: ",t1.ctime(),"\n")
WA = Graph.loadNetJSON(file)
t2 = datetime.datetime.now()
print("\nloaded: ",t2.ctime(),"\ntime used: ", t2-t1)
N = WA.TQnormal()
t3 = datetime.datetime.now()
print("\nnormalized: ",t3.ctime(),"\ntime used: ", t3-t2)
# Cc = N.TQtwo2oneCols(lType='arc')
Cc = N.TQtwo2oneCols()
t4 = datetime.datetime.now()
print("\nnormalized collaboration: ",t4.ctime(),"\ntime used: ", t4-t3)
# fjson = 'CcITest.json'
# fjson = 'CcCtest.json'
fjson = 'CcCumSN5.json'
Cc.saveNetJSON(fjson,indent=2)
t5 = datetime.datetime.now()
print("\nsave to file: ",t5.ctime(),"\ntime used: ", t5-t4)
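
The step from the normalized network N to the collaboration network can be illustrated with a small sketch. It assumes that TQtwo2oneCols computes the column projection, that is, the product of the transposed N with N; this is my reading of the method name and not something the listing confirms, and the way symmetric arcs are merged into edges may differ. To keep the sketch short, temporal quantities are expanded to one value per year. The functions expand and project_columns are hypothetical.

```python
from collections import defaultdict

def expand(s, f, v):
    """Expand the triple (s, f, v) into a {year: value} dict on [s, f)."""
    return {y: v for y in range(s, f)}

def project_columns(N):
    """N: dict work -> {author: {year: weight}}; returns C[a][b][year].

    C[a][b][year] is the sum over works of N[w][a][year] * N[w][b][year].
    The diagonal entries C[a][a] are the loops.
    """
    C = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for row in N.values():
        for a, qa in row.items():
            for b, qb in row.items():
                for year in qa.keys() & qb.keys():
                    C[a][b][year] += qa[year] * qb[year]
    return C

# two works by the same two authors, already normalized (weights 1/2 each)
N = {
    "w1": {"a1": expand(2000, 2004, 0.5), "a2": expand(2000, 2004, 0.5)},
    "w2": {"a1": expand(2002, 2004, 0.5), "a2": expand(2002, 2004, 0.5)},
}
C = project_columns(N)
print(dict(sorted(C["a1"]["a2"].items())))
# {2000: 0.25, 2001: 0.25, 2002: 0.5, 2003: 0.5}
```

In this cumulative toy example the collaboration weight between a1 and a2 grows when the second joint work appears in 2002.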

As a real-data example we used the SN5 (2008) data set. The temporal authorship network WA (restricted to works with DC=1) has two versions: the instant WAcInst.json and the cumulative WAcCum.json. Both networks have the same size: |W| = 7950, |A| = 12458 and |L| = 19488. The run below processes the cumulative version.

>>> 
====== RESTART: C:/Users/batagelj/work/Python/graph/graph/normalize.py ======
started:  Tue Oct 25 18:56:04 2016 

loaded:  Tue Oct 25 18:56:04 2016 
time used:  0:00:00.392023

normalized:  Tue Oct 25 18:56:06 2016 
time used:  0:00:01.801103

normalized collaboration:  Tue Oct 25 18:56:08 2016 
time used:  0:00:01.385079

save to file:  Tue Oct 25 18:56:12 2016 
time used:  0:00:04.236242
>>>

For testing, the files WAtestInst.json and WAtestCum.json can be used.

pS cores

To compute the pS cores I applied the program PsCoresTQ.py. At first the program ran into an endless cycle. I initially suspected rounding errors, but it turned out that the statement

   cCore = TQ.TQ.complement(Core[u],Tmin,Tmax)

should be changed to

   cCore = TQ.TQ.complement(Core[u],Tmin,Tmax+1)
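
One plausible reading of why the upper bound matters, not confirmed beyond the fix itself: if temporal quantities live on half-open intervals [start, finish) and the last interval ends at Tmax+1, then a complement taken only up to Tmax never covers the final time point, so that piece can never be moved into the core. The sketch below illustrates this with hypothetical stand-ins complement_sketch and extract_sketch; they are not the TQ library functions, and the values Tmin = 1970, Tmax = 2008 are assumed for illustration.

```python
def complement_sketch(tq, lo, hi):
    """Parts of [lo, hi) not covered by tq, as triples with value 1."""
    out, t = [], lo
    for s, f, _ in sorted(tq):
        if s > t:
            out.append((t, min(s, hi), 1))
        t = max(t, f)
        if t >= hi:
            break
    if t < hi:
        out.append((t, hi, 1))
    return out

def extract_sketch(mask, tq):
    """Restrict tq to the intervals of mask."""
    out = []
    for ms, mf, _ in mask:
        for s, f, v in tq:
            a, b = max(ms, s), min(mf, f)
            if a < b:
                out.append((a, b, v))
    return sorted(out)

Tmin, Tmax = 1970, 2008            # assumed time range of the network
d = [(2006, 2009, 3.0)]            # piece of D[u] ending at Tmax+1

# upper bound Tmax: the year 2008 is cut off and stays in D[u]
print(extract_sketch(complement_sketch([], Tmin, Tmax), d))      # [(2006, 2008, 3.0)]
# upper bound Tmax+1: the whole piece is moved to the core
print(extract_sketch(complement_sketch([], Tmin, Tmax + 1), d))  # [(2006, 2009, 3.0)]

# next step with bound Tmax: nothing can be extracted from the leftover,
# so D[u] would never become empty
core = [(2006, 2008, 3.0)]
left = [(2008, 2009, 3.0)]
print(extract_sketch(complement_sketch(core, Tmin, Tmax), left)) # []
```

Under these assumptions the leftover piece produces an empty core in every later step, which would explain a loop that never terminates.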

The program with this correction applied (gdir and wdir are assumed to be set as in normalize.py):

import sys, os, datetime, json
sys.path = [gdir]+sys.path; os.chdir(wdir)
import GraphNew as Graph
import TQ
fJSON = 'CcCumSN5.json'
G = Graph.Graph.loadNetJSON(fJSON)
G.delLoops()
print("Temporal Ps cores in: ",fJSON)
t1 = datetime.datetime.now()
print("started: ",t1.ctime(),"\n")
Tmin,Tmax = G._graph['time']
D = { u: G.TQnetSum(u) for u in G._nodes }
Core = { u: [d for d in D[u] if d[2]==0] for u in G.nodes() }
D = { u: [d for d in D[u] if d[2]>0] for u in G.nodes() }
D = { u: d for u,d in D.items() if d!=[] }
Dmin = { u: min([e[2] for e in d]) for u,d in D.items() }
step = 0
while len(D)>0:
   step += 1
   dmin,u = min( (v,k) for k,v in Dmin.items() )
   if step % 100 == 1:
      print("{0:3d}. dmin={1:10.4f}   node={2:4d}".format(step,dmin,u))
   cCore = TQ.TQ.complement(Core[u],Tmin,Tmax+1)
   core = TQ.TQ.extract(cCore,[d for d in D[u] if d[2] == dmin])
   if core!=[]:
      Core[u] = TQ.TQ.sum(Core[u],core)
      D[u] = TQ.TQ.cutGE(TQ.TQ.sum(D[u],TQ.TQ.minus(core)),dmin) 
      for link in G.star(u):
         v = G.twin(u,link)
         if not(v in D): continue
         chLink = TQ.TQ.minus(TQ.TQ.extract(core,G.getLink(link,'tq')))
         if chLink==[]: continue
         diff = TQ.TQ.cutGE(TQ.TQ.sum(D[v],chLink),0)  
         D[v] = [ (sd,fd,max(vd,dmin)) for sd,fd,vd in diff ]
         if len(D[v])==0: del D[v]; del Dmin[v]
         else: Dmin[v] = min([e[2] for e in D[v]])
   if len(D[u])==0: del D[u]; del Dmin[u]
   else: Dmin[u] = min([e[2] for e in D[u]])
print("{0:3d}. dmin={1:10.4f}   node={2:4d}".format(step,dmin,u))
t2 = datetime.datetime.now()
print("\nfinished: ",t2.ctime(),"\ntime used: ", t2-t1)
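
For orientation, the same idea without time can be sketched in a few lines: repeatedly remove the node with the smallest current sum of link weights, record that value as its core value, and lower the sums of its remaining neighbours, but never below the current level. This is a simplified static analogue, assuming an undirected weighted network without loops stored as a dict of dicts; the function ps_cores is hypothetical and not part of the Graph or TQ libraries.

```python
def ps_cores(adj):
    """Return {node: core value} for the sum-of-weights (pS) core property.

    adj: dict node -> {neighbour: weight}; undirected, no loops.
    """
    p = {u: sum(nb.values()) for u, nb in adj.items()}  # current weighted degrees
    core = {}
    while p:
        u = min(p, key=p.get)        # node with the smallest current value
        dmin = p.pop(u)
        core[u] = dmin
        for v, w in adj[u].items():
            if v in p:
                # removing u lowers the sum of v, but not below the current level
                p[v] = max(p[v] - w, dmin)
    return core

# triangle a-b-c with unit weights and a pendant node d attached to a
adj = {
    "a": {"b": 1, "c": 1, "d": 0.5},
    "b": {"a": 1, "c": 1},
    "c": {"a": 1, "b": 1},
    "d": {"a": 0.5},
}
print(ps_cores(adj))   # {'d': 0.5, 'a': 2.0, 'b': 2, 'c': 2}
```

The max with dmin plays the same role as max(vd,dmin) in the temporal program, where the values are temporal quantities and the removal is done interval by interval.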

The pS cores procedure turned out to be quite fast: it produced the cores in about 19.6 s, using 15261 steps.

>>> 
====== RESTART: C:\Users\batagelj\work\Python\graph\graph\PsCoresTQ.py ======
Temporal Ps cores in:  CcCSN5.json
started:  Tue Oct 25 17:23:26 2016 

  1. dmin=    0.0408   node= 548
101. dmin=    0.0476   node=5079
201. dmin=    0.1244   node=9917
301. dmin=    0.1528   node=4316
401. dmin=    0.1653   node=7756
501. dmin=    0.1800   node=7123
601. dmin=    0.1975   node= 873
...
14201. dmin=    0.9444   node=5560
14301. dmin=    1.0000   node=1056
14401. dmin=    1.0000   node=4485
14501. dmin=    1.0072   node=1358
14601. dmin=    1.1389   node=2051
14701. dmin=    1.2850   node= 337
14801. dmin=    1.3889   node=4560
14901. dmin=    1.5000   node=4045
15001. dmin=    1.7222   node=3130
15101. dmin=    2.1428   node=1924
15201. dmin=    3.0000   node= 796
15261. dmin=    9.7917   node=  20

finished:  Tue Oct 25 17:23:45 2016 
time used:  0:00:19.644124
>>>
>>> C = TQ.TQ.TQdictCut(Core,3)
>>> for v in C:
   print("{0:3d} : {1:11s} ".format(v,G.getNode(v,'lab')),C[v])

   
  20 : BORGATTI_S   [(1991, 1992, 3.1667), (1992, 1993, 4.1667), (1993, 1994, 5.1667), 
    (1994, 1996, 6.1667), (1996, 1997, 6.6667), (1997, 1999, 7.1667), (1999, 2003, 8.6667),
    (2003, 2005, 8.7917), (2005, 2006, 9.2917), (2006, 2009, 9.7917)]
3169 : EVERETT_M    [(1991, 1992, 3.1667), (1992, 1993, 4.1667), (1993, 1994, 5.1667), 
    (1994, 1996, 6.1667), (1996, 1997, 6.6667), (1999, 2003, 8.6667), (2003, 2005, 8.7917), 
    (2005, 2006, 9.2917), (2006, 2009, 9.7917)]
 317 : BERNARD_H    [(1990, 1991, 3.0244), (1991, 1995, 3.1494), (1995, 1997, 3.3094), 
    (1997, 1998, 3.3894), (1998, 2001, 3.5494), (2001, 2003, 3.6294), (2003, 2006, 3.685), 
    (2006, 2009, 4.0706)]
2232 : KILLWORT_P   [(1990, 1991, 3.0244), (1991, 1995, 3.1494), (1995, 1997, 3.3094), 
    (2003, 2006, 3.685), (2006, 2009, 4.0706)]
4551 : STEINHAU_H   [(2003, 2005, 3.0), (2005, 2006, 3.2222), (2006, 2009, 3.6667)]
4860 : METZKE_C     [(2003, 2005, 3.0), (2005, 2006, 3.2222), (2006, 2009, 3.6667)]
3125 : SHELLEY_G    [(2006, 2009, 3.4767)]
1673 : MCCARTY_C    [(2006, 2009, 3.4767)]
1677 : JOHNSEN_E    [(2006, 2009, 3.4767)]
  75 : HOLLAND_P    [(1981, 1983, 3.0), (1983, 2009, 3.2222)]
  78 : LEINHARD_S   [(1981, 1983, 3.0), (1983, 2009, 3.2222)]
 925 : BONACICH_P   [(1997, 2009, 3.2222)]
3840 : BIENENST_E   [(1997, 2009, 3.2222)]
  69 : WASSERMA_S   [(2007, 2009, 3.0174)]
1164 : DOREIAN_P    [(2007, 2009, 3.0174)]
1166 : HUMMON_N     [(2007, 2009, 3.0174)]
1680 : PATTISON_P   [(2007, 2009, 3.0174)]
3225 : FARARO_T     [(2007, 2009, 3.0174)]
1056 : FAUST_K      [(2007, 2009, 3.0174)]
3170 : FERLIGOJ_A   [(2007, 2009, 3.0174)]
2083 : ROBINS_G     [(2007, 2009, 3.0174)]
2084 : SKVORETZ_J   [(2007, 2009, 3.0174)]
 949 : BATAGELJ_V   [(2007, 2009, 3.0174)]
  79 : NEWMAN_M     [(2005, 2009, 3.0)]
 796 : PARK_J       [(2005, 2009, 3.0)]
>>> 
>>> D = { u: G.TQnetSum(u) for u in G._nodes }
>>> S = [ 20, 3169, 1164, 3170, 949, 79 ]
>>> for v in S:
   print("{0:3d} : {1:11s} ".format(v,G.getNode(v,'lab')),D[v])

   
  20 : BORGATTI_S   [(1970, 1988, 0), (1988, 1989, 0.5), (1989, 1990, 1.4444), 
   (1990, 1991, 3.3333), (1991, 1992, 4.2778), (1992, 1993, 5.2778), (1993, 1994, 6.2778),
   (1994, 1996, 7.7778), (1996, 1997, 8.2778), (1997, 1998, 9.2222), (1998, 1999, 9.5972),
   (1999, 2001, 11.0972), (2001, 2002, 11.9167), (2002, 2003, 12.3611), (2003, 2005, 14.1111),
   (2005, 2006, 15.1111), (2006, 2007, 16.0556), (2007, 2009, 16.9204)]
3169 : EVERETT_M    [(1970, 1988, 0), (1988, 1989, 1.0), (1989, 1990, 1.9444), 
   (1990, 1991, 3.8333), (1991, 1992, 4.3333), (1992, 1993, 5.3333), (1993, 1994, 6.3333),
   (1994, 1996, 7.3333), (1996, 1997, 7.8333), (1997, 1999, 8.3333), (1999, 2003, 10.3333), 
   (2003, 2004, 10.7083), (2004, 2005, 11.1528), (2005, 2006, 11.6528), (2006, 2007, 12.1528), 
   (2007, 2009, 12.5972)]
1164 : DOREIAN_P    [(1970, 1984, 0), (1984, 1985, 0.5), (1985, 1989, 1.0), (1989, 1990, 1.5),
   (1990, 1992, 2.4444), (1992, 1994, 3.8333), (1994, 1995, 5.2778), (1995, 1996, 5.7222), 
   (1996, 2000, 7.0972), (2000, 2001, 7.5417), (2001, 2003, 8.0417), (2003, 2004, 8.5417), 
   (2004, 2007, 9.4306), (2007, 2009, 9.4714)]
3170 : FERLIGOJ_A   [(1970, 1982, 0), (1982, 1983, 0.5), (1983, 1992, 1.0), (1992, 1994, 2.3889),
   (1994, 1999, 2.8333), (1999, 2000, 3.3333), (2000, 2001, 3.7778), (2001, 2002, 4.2778), 
   (2002, 2004, 4.6528), (2004, 2005, 6.6389), (2005, 2007, 7.1389), (2007, 2009, 7.1797)]
 949 : BATAGELJ_V   [(1970, 1982, 0), (1982, 1983, 0.5), (1983, 1992, 1.0), 
   (1992, 1994, 2.3889), (1994, 1999, 2.8333), (1999, 2000, 3.7222), (2000, 2001, 4.6667),
   (2001, 2002, 5.1667), (2002, 2004, 5.6667), (2004, 2005, 6.5556), (2005, 2007, 7.0556), 
   (2007, 2009, 7.5964)]
  79 : NEWMAN_M     [(1970, 1999, 0), (1999, 2000, 1.0), (2000, 2001, 2.3194), 
   (2001, 2002, 3.5283), (2002, 2003, 5.80611), (2003, 2004, 7.1811), (2004, 2005, 10.3206),
   (2005, 2006, 12.01556), (2006, 2007, 15.7239), (2007, 2009, 17.0989)]
>>> 
>>> u = 3174  # Mrvar
>>> for p in G.star(u): v = G.twin(u,p); print(v,G.getNode(v,'lab'),':',G._links[p][4]['tq'])

1164 DOREIAN_P :  [[1996, 2009, 0.5]]
4300 ZAVERSNI_M : [[1999, 2009, 0.2222]]
 302 WHITE_D :    [[1999, 2009, 0.2222]]
 949 BATAGELJ_V : [[1999, 2000, 0.4444], [2000, 2001, 0.9444], [2001, 2002, 1.4444], [2002, 2009, 1.9444]]
>>> 
tq/work/cores/bib.txt · Last modified: 2016/11/08 12:34 by vlado
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported