====== WoS2Pajek ======
===== Running =====
>>> import sys; wdir = r'd:\data\WoS'; sys.path.append(wdir)
>>> MLdir = r'c:\Python25\Lib\site-packages\MontyLingua-2.1\Python'
>>> import os; os.chdir(wdir); sys.path.append(MLdir)
>>> import WoS2Pajek
Enter the specifications in the Tk window that opens.
===== SN9 first run =====
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import sys; wdir = r'C:\Users\Batagelj\work\Python\WoS'; sys.path.append(wdir)
>>> MLdir = r'c:\Python25\Lib\site-packages\MontyLingua-2.1\Python'
>>> sys.path.append(MLdir)
>>> import WoS2Pajek
Module Wos2Pajek imported.
*** WoS2Pajek 1.0 by V. Batagelj, October 24, 2011 / March 23, 2007
WoS2Pajek parameters
WoS dir: C:\Users\Batagelj\work\Python\WoS
ML dir: c:\Python25\Lib\site-packages\MontyLingua-2.1\Python
Proj dir: C:/Users/Batagelj/work/Python/WoS/SN9
WoS file: C:/Users/Batagelj/work/Python/WoS/SN9/SN9.wos
MaxNum : 3800000
step : 1000
ISI name: 0
clean : True
keywords: [True, True, True, False]
*** WoS2Pajek 1.0
by V. Batagelj, October 24, 2011 / March 23, 2007
started: Sat May 11 23:48:39 2013
>>> --------------------------------------------------------
>>> Data on social networks
>>> TS="social network*" OR SO=(Social Networks)
>>> May 9th 2013
>>> by Monika Cerinsek and Vladimir Batagelj
>>> --------------------------------------------------------
69852 : ODEN_S(1977)48:495 - 2013-05-12 00:53:39.851000
>>> End of processing of WoS file
number of works = 1625956
number of authors = 416266
number of journals = 63611
number of keywords = 46659
number of records = 69852
number of duplicates = 5785
clean WoS data: clean.WoS
*** FILES:
year of publication partition: C:/Users/Batagelj/work/Python/WoS/SN9\Year.clu
described / cited only partition: C:/Users/Batagelj/work/Python/WoS/SN9\DC.clu
number of pages vector: C:/Users/Batagelj/work/Python/WoS/SN9\NP.vec
citation network: C:/Users/Batagelj/work/Python/WoS/SN9\Cite.net
works X journals network: C:/Users/Batagelj/work/Python/WoS/SN9\WJ.net
works X keywords network: C:/Users/Batagelj/work/Python/WoS/SN9\WK.net
works X authors network: C:/Users/Batagelj/work/Python/WoS/SN9\WA.net
finished: Sun May 12 00:57:54 2013
time used: 1:09:15.692000
The trace contained some errors:
*** Error de SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42 : DE
*** Error de Silva K. M., 2005, HIST SRI LANKA : DE
*** Error Van Dijck J, 2007, MEDIATED MEMORIES DI : VAN
*** Error Stewart TA, 2000, FORTUNE, V142, P390 : STEWART
*** Error FURSTENBERG Jr Franck F, 1976, UNPLANNED PARENTHOOD : FURSTENB
...
Similar errors had already appeared in the trace for SN8:
*** Error REUTERS 0531 : REUTERS
*** Error TRIBUNE 0502, P9 : TRIBUNE
*** Error VORBOTE 0317, P8 : VORBOTE
*** Error RAZON 0518, A8 : RAZON
*** Error ECONOMIST 1022, P65 : ECONOMIS
The common cause is double spaces in the names. I removed them in an editor.
In the next version of WoS2Pajek this cleanup will be included in the name routines.
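The manual cleanup could also be scripted. The following is only a minimal sketch, not part of WoS2Pajek: the function names are hypothetical, the double space in the sample line is assumed for illustration, and leading indentation is deliberately left alone on the assumption that it may be significant in the export format.

```python
import re

# Two or more spaces that follow a non-space character. Leading
# indentation is left untouched (assumption: it may be significant).
MULTI_SPACE = re.compile(r"(?<=\S) {2,}")

def collapse_inner_spaces(line):
    """Replace every inner run of spaces with a single space."""
    return MULTI_SPACE.sub(" ", line)

def find_multi_space_lines(lines):
    """Return (line number, line) pairs that contain an inner run of spaces."""
    return [(i, line) for i, line in enumerate(lines, 1)
            if MULTI_SPACE.search(line)]

# Hypothetical sample: the first line has an assumed double space,
# the second line is the same reference already cleaned.
sample = [
    "   Van Dijck  J, 2007, MEDIATED MEMORIES DI",
    "   Van Dijck J, 2007, MEDIATED MEMORIES DI",
]

for number, line in find_multi_space_lines(sample):
    print("%d: %r" % (number, collapse_inner_spaces(line)))
```

Listing the affected lines before rewriting them makes it possible to review what would change before touching the data file.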
The current SN9 is smaller than SN8. For comparison, the end of the SN8 trace:
147377 : ODEN_S(1977)48:495 - 2011-12-04 14:06:22.571705
>>> End of processing of WoS file
number of works = 2357436
number of authors = 527079
number of journals = 88518
number of keywords = 64993
number of records = 147377
number of duplicates = 35574
*** FILES:
year of publication partition: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\Year.clu
described / cited only partition: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\DC.clu
number of pages vector: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\NP.vec
citation network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\Cite.net
works X journals network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\WJ.net
works X keywords network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\WK.net
works X authors network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\WA.net
finished: Sun Dec 4 14:10:46 2011
time used: 2 days, 3:30:49.275828
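To see how much smaller SN9 is, the counts from the two traces above can be compared directly. This is a small sketch; the dictionary and function names are mine, and the numbers are copied from the SN9 first run and the SN8 trace.

```python
# Counts copied from the traces above (SN9 first run and SN8).
sn9 = {"works": 1625956, "authors": 416266, "journals": 63611,
       "keywords": 46659, "records": 69852, "duplicates": 5785}
sn8 = {"works": 2357436, "authors": 527079, "journals": 88518,
       "keywords": 64993, "records": 147377, "duplicates": 35574}

def size_ratios(new, old):
    """Return new/old for every count present in both dictionaries."""
    return {key: float(new[key]) / old[key] for key in new if key in old}

ratios = size_ratios(sn9, sn8)
for key in sorted(ratios):
    print("%-10s %9d %9d  %.3f" % (key, sn9[key], sn8[key], ratios[key]))
```

Every ratio is below 1, so SN9 is smaller than SN8 in all six counts.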
===== SN9 second run =====
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import sys; wdir = r'C:\Users\Batagelj\work\Python\WoS'; sys.path.append(wdir)
>>> MLdir = r'c:\Python25\Lib\site-packages\MontyLingua-2.1\Python'
>>> import os; os.chdir(wdir); sys.path.append(MLdir)
>>> import WoS2Pajek
Module Wos2Pajek imported.
*** WoS2Pajek 1.1 by V. Batagelj, May 12, 2013 / March 23, 2007
WoS2Pajek parameters
WoS dir: C:\Users\Batagelj\work\Python\WoS
ML dir: c:\Python25\Lib\site-packages\MontyLingua-2.1\Python
Proj dir: C:/Users/Batagelj/work/Python/WoS/SN9
WoS file: C:/Users/Batagelj/work/Python/WoS/SN9/SN9clean.WoS
MaxNum : 2500000
step : 1000
ISI name: 0
clean : True
keywords: [True, True, True, True]
****** MontyLingua v.2.1 ******
***** by hugo@media.mit.edu *****
*** WoS2Pajek 1.1
by V. Batagelj, May 12, 2013 / March 23, 2007
started: Sun May 12 03:48:56 2013
64067 : ODEN_S(1977)48:495 - 2013-05-12 07:27:43.063000
>>> End of processing of WoS file
number of works = 1625722
number of authors = 416231
number of journals = 63591
number of keywords = 124057
number of records = 64067
number of duplicates = 0
clean WoS data: clean.WoS
*** FILES:
year of publication partition: C:/Users/Batagelj/work/Python/WoS/SN9\Year.clu
described / cited only partition: C:/Users/Batagelj/work/Python/WoS/SN9\DC.clu
number of pages vector: C:/Users/Batagelj/work/Python/WoS/SN9\NP.vec
citation network: C:/Users/Batagelj/work/Python/WoS/SN9\Cite.net
works X journals network: C:/Users/Batagelj/work/Python/WoS/SN9\WJ.net
works X keywords network: C:/Users/Batagelj/work/Python/WoS/SN9\WK.net
works X authors network: C:/Users/Batagelj/work/Python/WoS/SN9\WA.net
finished: Sun May 12 07:34:09 2013
time used: 3:45:13.804000
>>>
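The two SN9 runs can be cross-checked against each other. The sketch below (the helper name parse_counts is hypothetical, it is not part of WoS2Pajek) reads the "number of ..." lines from a trace and verifies that the records of the first run minus its duplicates equal the records of the second run, which is what one would expect if the cleaned file simply omits the duplicates.

```python
import re

# Matches trace lines such as "number of records = 69852".
COUNT_LINE = re.compile(r"^number of (\w+)\s*=\s*(\d+)$")

def parse_counts(trace):
    """Collect 'number of X = N' lines of a trace into a dictionary."""
    counts = {}
    for line in trace.splitlines():
        match = COUNT_LINE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts

# Lines copied from the first and the second SN9 run.
first_run = """\
number of works = 1625956
number of authors = 416266
number of journals = 63611
number of keywords = 46659
number of records = 69852
number of duplicates = 5785
"""
second_run = """\
number of works = 1625722
number of authors = 416231
number of journals = 63591
number of keywords = 124057
number of records = 64067
number of duplicates = 0
"""

first = parse_counts(first_run)
second = parse_counts(second_run)
expected = first["records"] - first["duplicates"]
print("expected records: %d, second run: %d" % (expected, second["records"]))
```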
===== Replacing multiple whitespaces with a single space =====
While processing SN9.WoS, some errors appeared in the function nameG. They are caused by multiple consecutive spaces in some ISI names.
>>> a = "de SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42 : DE"
>>> b = "FURSTENBERG Jr Franck F, 1976, UNPLANNED PARENTHOOD : FURSTENB"
>>> c = "TRIBUNE 0502, P9 : TRIBUNE"
>>> import re
>>> p = re.compile(r'\s+')
>>> p.sub(' ',a)
'de SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42 : DE'
>>> p.sub(' ',b)
'FURSTENBERG Jr Franck F, 1976, UNPLANNED PARENTHOOD : FURSTENB'
>>> p.sub(' ',c)
'TRIBUNE 0502, P9 : TRIBUNE'
In nameG I replaced the part
  else:
      q = s[0].split(" ")
      n = q[0][:8]
      try:
          if len(q) > 1 : n = n+'_'+q[1][0]
      except:
          print "*** Error ", name, " : ", n
with
  else:
      q = s[0].replace(' ','',1).split(" "); lq = len(q)
      n = q[0][:8]
      if lq > 1 :
          if len(q[1])>0: n = n+'_'+q[1][0]
          else: n = n+'_'+q[lq-1]
>>> a = "de SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42"
>>> b = "FURSTENBERG Jr Franck F, 1976, UNPLANNED PARENTHOOD"
>>> c = "TRIBUNE 0502, P9"
>>> wdir = r'C:\Users\Batagelj\work\Python\WoS'
>>> import os; os.chdir(wdir); import sys; sys.path = [wdir]+sys.path; from names import *
>>> nameG(a)
'DESOUZA_F(2010):42'
>>> nameG(b)
'FURSTENB_F(1976):'
>>> nameG(c)
'TRIBUNE_0502(0):P9'
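The mechanism behind the original errors can be reproduced in isolation. Splitting on a single space turns a double space into an empty token, and taking the first character of that empty token, as q[1][0] did in the old code, raises an IndexError. The sketch below is not the replacement used in nameG, which treats the extra spaces differently; split_name is a hypothetical helper and the double space in the sample is assumed for illustration.

```python
import re

WHITESPACE = re.compile(r"\s+")

def split_name(text, normalize=True):
    """Split the author part of a reference into space-separated tokens."""
    author = text.split(",")[0]
    if normalize:
        # Collapse every run of whitespace before splitting.
        author = WHITESPACE.sub(" ", author).strip()
    return author.split(" ")

raw = "TRIBUNE  0502, P9"  # double space assumed for illustration

tokens = split_name(raw, normalize=False)
print(tokens)  # the second token is the empty string
try:
    tokens[1][0]
except IndexError:
    print("IndexError: the second token is empty")

print(split_name(raw))  # no empty tokens after normalization
```

After normalization no token can be empty, so indexing the first character of the second token is safe whenever there are at least two tokens.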