WoS2Pajek

sn9

Running

>>> import sys; wdir = r'd:\data\WoS'; sys.path.append(wdir)
>>> MLdir = r'c:\Python25\Lib\site-packages\MontyLingua-2.1\Python'
>>> import os; os.chdir(wdir); sys.path.append(MLdir)
>>> import WoS2Pajek

In the Tk window that opens, enter the specifications (the parameters listed at the start of each trace below).

SN9 first run

Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import sys; wdir = r'C:\Users\Batagelj\work\Python\WoS'; sys.path.append(wdir)
>>> MLdir = r'c:\Python25\Lib\site-packages\MontyLingua-2.1\Python'
>>> sys.path.append(MLdir)
>>> import WoS2Pajek
Module Wos2Pajek imported.

*** WoS2Pajek 1.0 by V. Batagelj, October 24, 2011 / March 23, 2007

WoS2Pajek parameters
WoS  dir:  C:\Users\Batagelj\work\Python\WoS
ML   dir:  c:\Python25\Lib\site-packages\MontyLingua-2.1\Python
Proj dir:  C:/Users/Batagelj/work/Python/WoS/SN9
WoS file:  C:/Users/Batagelj/work/Python/WoS/SN9/SN9.wos
MaxNum  :  3800000
step    :  1000
ISI name:  0
clean   :  True
keywords:  [True, True, True, False]

*** WoS2Pajek 1.0
by V. Batagelj, October 24, 2011 / March 23, 2007

started: Sat May 11 23:48:39 2013

>>>  --------------------------------------------------------
>>>  Data on social networks
>>>  TS="social network*" OR SO=(Social Networks)
>>>  May 9th 2013
>>>  by Monika Cerinsek and Vladimir Batagelj
>>>  --------------------------------------------------------


69852 : ODEN_S(1977)48:495  -  2013-05-12 00:53:39.851000

>>> End of processing of WoS file
number of works      =  1625956
number of authors    =  416266
number of journals   =  63611
number of keywords   =  46659
number of records    =  69852
number of duplicates =  5785
clean WoS data: clean.WoS

*** FILES:
year of publication partition: C:/Users/Batagelj/work/Python/WoS/SN9\Year.clu
described / cited only partition: C:/Users/Batagelj/work/Python/WoS/SN9\DC.clu
number of pages vector: C:/Users/Batagelj/work/Python/WoS/SN9\NP.vec
citation network: C:/Users/Batagelj/work/Python/WoS/SN9\Cite.net
works X journals network: C:/Users/Batagelj/work/Python/WoS/SN9\WJ.net
works X keywords network: C:/Users/Batagelj/work/Python/WoS/SN9\WK.net
works X authors  network: C:/Users/Batagelj/work/Python/WoS/SN9\WA.net
finished: Sun May 12 00:57:54 2013
time used:  1:09:15.692000

The trace contained some errors:

*** Error  de  SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42  :  DE
*** Error  de  Silva K. M., 2005, HIST SRI LANKA  :  DE
*** Error  Van  Dijck J, 2007, MEDIATED MEMORIES DI  :  VAN
*** Error  Stewart  TA, 2000, FORTUNE, V142, P390  :  STEWART
*** Error  FURSTENBERG  Jr Franck F, 1976, UNPLANNED PARENTHOOD  :  FURSTENB
...

Similar errors had already appeared in the trace for SN8:

*** Error  REUTERS         0531  :  REUTERS
*** Error  TRIBUNE         0502, P9  :  TRIBUNE
*** Error  VORBOTE         0317, P8  :  VORBOTE
*** Error  RAZON           0518, A8  :  RAZON
*** Error  ECONOMIST       1022, P65  :  ECONOMIS

The common cause is multiple consecutive spaces in the names. I removed them in an editor. In the next version of WoS2Pajek this will be handled in the name routines.
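A minimal sketch of why a double space trips the name routine, assuming (as in the nameG fragment shown further down) that the author part is split on single spaces and the first character of the second piece is used as the initial. The helper name `split_pieces` is hypothetical and is not part of WoS2Pajek.

```python
def split_pieces(author_field):
    """Split an author field on single spaces, as str.split(" ") does."""
    return author_field.split(" ")

# A double space leaves an empty string between the two words.
pieces = split_pieces("Van  Dijck J")
print(pieces)            # ['Van', '', 'Dijck', 'J']

# Taking the first character of that empty piece raises IndexError,
# which is the situation the trace reports as "*** Error".
try:
    initial = pieces[1][0]
except IndexError:
    initial = None
print(initial)           # None
```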

The current SN9 is smaller than SN8. For comparison, the end of the SN8 trace:

147377 : ODEN_S(1977)48:495  -  2011-12-04 14:06:22.571705
>>> End of processing of WoS file
number of works      =  2357436
number of authors    =  527079
number of journals   =  88518
number of keywords   =  64993
number of records    =  147377
number of duplicates =  35574

*** FILES:
year of publication partition: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\Year.clu
described / cited only partition: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\DC.clu
number of pages vector: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\NP.vec
citation network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\Cite.net
works X journals network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\WJ.net
works X keywords network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\WK.net
works X authors  network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\WA.net
finished: Sun Dec  4 14:10:46 2011
time used:  2 days, 3:30:49.275828
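To see how much smaller SN9 is, the counts from the two traces can be compared directly. The sketch below only copies the numbers reported above (SN9 first run and SN8) and prints the SN9/SN8 ratio for each count. The variable names are mine.

```python
# Counts copied from the SN9 first-run trace and the SN8 trace above.
sn9 = {"works": 1625956, "authors": 416266, "journals": 63611,
       "keywords": 46659, "records": 69852, "duplicates": 5785}
sn8 = {"works": 2357436, "authors": 527079, "journals": 88518,
       "keywords": 64993, "records": 147377, "duplicates": 35574}

# Ratio SN9 / SN8 for every reported count.
ratios = {key: float(sn9[key]) / sn8[key] for key in sn9}
for key in sorted(ratios):
    print("%-10s %.3f" % (key, ratios[key]))
```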

SN9 second run

Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import sys; wdir = r'C:\Users\Batagelj\work\Python\WoS'; sys.path.append(wdir)
>>> MLdir = r'c:\Python25\Lib\site-packages\MontyLingua-2.1\Python'
>>> import os; os.chdir(wdir); sys.path.append(MLdir)
>>> import WoS2Pajek
Module Wos2Pajek imported.

*** WoS2Pajek 1.1 by V. Batagelj, May 12, 2013 / March 23, 2007

WoS2Pajek parameters
WoS  dir:  C:\Users\Batagelj\work\Python\WoS
ML   dir:  c:\Python25\Lib\site-packages\MontyLingua-2.1\Python
Proj dir:  C:/Users/Batagelj/work/Python/WoS/SN9
WoS file:  C:/Users/Batagelj/work/Python/WoS/SN9/SN9clean.WoS
MaxNum  :  2500000
step    :  1000
ISI name:  0
clean   :  True
keywords:  [True, True, True, True]

****** MontyLingua v.2.1 ******
***** by hugo@media.mit.edu *****

*** WoS2Pajek 1.1 
by V. Batagelj, May 12, 2013 / March 23, 2007

started: Sun May 12 03:48:56 2013

64067 : ODEN_S(1977)48:495  -  2013-05-12 07:27:43.063000
>>> End of processing of WoS file
number of works      =  1625722
number of authors    =  416231
number of journals   =  63591
number of keywords   =  124057
number of records    =  64067
number of duplicates =  0
clean WoS data: clean.WoS

*** FILES:
year of publication partition: C:/Users/Batagelj/work/Python/WoS/SN9\Year.clu
described / cited only partition: C:/Users/Batagelj/work/Python/WoS/SN9\DC.clu
number of pages vector: C:/Users/Batagelj/work/Python/WoS/SN9\NP.vec
citation network: C:/Users/Batagelj/work/Python/WoS/SN9\Cite.net
works X journals network: C:/Users/Batagelj/work/Python/WoS/SN9\WJ.net
works X keywords network: C:/Users/Batagelj/work/Python/WoS/SN9\WK.net
works X authors  network: C:/Users/Batagelj/work/Python/WoS/SN9\WA.net
finished: Sun May 12 07:34:09 2013
time used:  3:45:13.804000
>>>
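The record counts of the two SN9 runs are consistent with each other: the second run reads the cleaned file and reports no duplicates, and its record count equals the first run's records minus the duplicates found there. This presumably means the clean file simply omits the duplicates. A quick check with the numbers copied from the traces:

```python
# Counts copied from the two SN9 traces above.
first_run_records = 69852
first_run_duplicates = 5785
second_run_records = 64067

# Records left once the duplicates found in the first run are dropped.
remaining = first_run_records - first_run_duplicates
print(remaining == second_run_records)   # True
```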

Replacing multiple whitespace characters with a single space

While processing SN9.WoS, some errors appeared in the function nameG. They are caused by multiple consecutive spaces at some places in the ISI names.

>>> a = "de  SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42  :  DE"
>>> b = "FURSTENBERG  Jr Franck F, 1976, UNPLANNED PARENTHOOD  :  FURSTENB"
>>> c = "TRIBUNE         0502, P9  :  TRIBUNE"
>>> import re
>>> p = re.compile(r'\s+')
>>> p.sub(' ',a)
'de SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42 : DE'
>>> p.sub(' ',b)
'FURSTENBERG Jr Franck F, 1976, UNPLANNED PARENTHOOD : FURSTENB'
>>> p.sub(' ',c)
'TRIBUNE 0502, P9 : TRIBUNE'
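The same substitution can be wrapped in a small helper so that it can be applied to each name before further processing. This is only a sketch: the name `squeeze_spaces` is hypothetical and is not part of WoS2Pajek. Note that `\s` also matches tabs and newlines, not only blanks.

```python
import re

_WS = re.compile(r"\s+")

def squeeze_spaces(text):
    """Collapse every run of whitespace to one space and trim the ends."""
    return _WS.sub(" ", text).strip()

print(squeeze_spaces("TRIBUNE         0502, P9"))   # TRIBUNE 0502, P9
```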

In nameG I replaced the part

  else:
    q = s[0].split(" ")
    n = q[0][:8]
    try:
      if len(q) > 1 : n = n+'_'+q[1][0]
    except:
      print "*** Error ", name, " : ", n

with

  else:
    q = s[0].replace('  ','',1).split(" "); lq = len(q)
    n = q[0][:8]
    if lq > 1 :
      if len(q[1])>0: n = n+'_'+q[1][0]
      else: n = n+'_'+q[lq-1]

A test of the modified nameG on the three examples:

>>> a = "de  SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42"
>>> b = "FURSTENBERG  Jr Franck F, 1976, UNPLANNED PARENTHOOD"
>>> c = "TRIBUNE         0502, P9"
>>> wdir = r'C:\Users\Batagelj\work\Python\WoS'
>>> import os; os.chdir(wdir); import sys; sys.path = [wdir]+sys.path; from names import *
>>> nameG(a)
'DESOUZA_F(2010):42'
>>> nameG(b)
'FURSTENB_F(1976):'
>>> nameG(c)
'TRIBUNE_0502(0):P9' 
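The replaced fragment can be tried in isolation. The sketch below reproduces only that fragment as a standalone function. It assumes that `s[0]` holds the author part of the name (the text before the first comma), and it leaves out everything else nameG does, such as the conversion to upper case and the year and page parts seen in the results above. The function name `short_name_prefix` is hypothetical.

```python
def short_name_prefix(author_field):
    """Sketch of the replaced fragment only: first 8 characters of the
    first piece plus '_' and an initial (or the last piece as fallback)."""
    # Drop the first double space, then split on single spaces.
    q = author_field.replace("  ", "", 1).split(" ")
    n = q[0][:8]
    if len(q) > 1:
        if len(q[1]) > 0:
            n = n + "_" + q[1][0]   # usual case: first letter of 2nd piece
        else:
            n = n + "_" + q[-1]     # 2nd piece empty: use the last piece
    return n

print(short_name_prefix("FURSTENBERG  Jr Franck F"))           # FURSTENB_F
print(short_name_prefix("de  SOUZA Francoise Jean Oliveira"))  # deSOUZA_F
print(short_name_prefix("TRIBUNE" + " " * 9 + "0502"))         # TRIBUNE_0502
```

Unlike the regular expression above, `replace('  ','',1)` removes the first double space entirely, so `de  SOUZA` is joined into one word. This is why the result for the first example starts with DESOUZA.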
notes/net/wos2paj.1531212896.txt.gz · Last modified: 2018/07/10 10:54 by vlado
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported