====== WoS2Pajek ======

===== Running =====

<code>
>>> import sys; wdir = r'd:\data\WoS'; sys.path.append(wdir)
>>> MLdir = r'c:\Python25\Lib\site-packages\MontyLingua-2.1\Python'
>>> import os; os.chdir(wdir); sys.path.append(MLdir)
>>> import WoS2Pajek
</code>

Enter the specifications in the Tk window.

===== SN9 first run =====

<code>
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import sys; wdir = r'C:\Users\Batagelj\work\Python\WoS'; sys.path.append(wdir)
>>> MLdir = r'c:\Python25\Lib\site-packages\MontyLingua-2.1\Python'
>>> sys.path.append(MLdir)
>>> import WoS2Pajek
Module Wos2Pajek imported.
*** WoS2Pajek 1.0 by V. Batagelj, October 24, 2011 / March 23, 2007
WoS2Pajek parameters
WoS dir: C:\Users\Batagelj\work\Python\WoS
ML dir: c:\Python25\Lib\site-packages\MontyLingua-2.1\Python
Proj dir: C:/Users/Batagelj/work/Python/WoS/SN9
WoS file: C:/Users/Batagelj/work/Python/WoS/SN9/SN9.wos
MaxNum : 3800000
step : 1000
ISI name: 0
clean : True
keywords: [True, True, True, False]
*** WoS2Pajek 1.0 by V. Batagelj, October 24, 2011 / March 23, 2007
started: Sat May 11 23:48:39 2013
>>> --------------------------------------------------------
>>> Data on social networks
>>> TS="social network*" OR SO=(Social Networks)
>>> May 9th 2013
>>> by Monika Cerinsek and Vladimir Batagelj
>>> --------------------------------------------------------
69852 : ODEN_S(1977)48:495 - 2013-05-12 00:53:39.851000
>>> End of processing of WoS file
number of works = 1625956
number of authors = 416266
number of journals = 63611
number of keywords = 46659
number of records = 69852
number of duplicates = 5785
clean WoS data: clean.WoS
*** FILES:
year of publication partition: C:/Users/Batagelj/work/Python/WoS/SN9\Year.clu
described / cited only partition: C:/Users/Batagelj/work/Python/WoS/SN9\DC.clu
number of pages vector: C:/Users/Batagelj/work/Python/WoS/SN9\NP.vec
citation network: C:/Users/Batagelj/work/Python/WoS/SN9\Cite.net
works X journals network: C:/Users/Batagelj/work/Python/WoS/SN9\WJ.net
works X keywords network: C:/Users/Batagelj/work/Python/WoS/SN9\WK.net
works X authors network: C:/Users/Batagelj/work/Python/WoS/SN9\WA.net
finished: Sun May 12 00:57:54 2013
time used: 1:09:15.692000
</code>

There were some errors in the trace.

<code>
*** Error de SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42 : DE
*** Error de Silva K. M., 2005, HIST SRI LANKA : DE
*** Error Van Dijck J, 2007, MEDIATED MEMORIES DI : VAN
*** Error Stewart TA, 2000, FORTUNE, V142, P390 : STEWART
*** Error FURSTENBERG Jr Franck F, 1976, UNPLANNED PARENTHOOD : FURSTENB
...
</code>

Similar errors had already appeared in the trace for SN8.

<code>
*** Error REUTERS 0531 : REUTERS
*** Error TRIBUNE 0502, P9 : TRIBUNE
*** Error VORBOTE 0317, P8 : VORBOTE
*** Error RAZON 0518, A8 : RAZON
*** Error ECONOMIST 1022, P65 : ECONOMIS
</code>

The common cause is double spaces in the names. I removed them in an editor. In the next version of WoS2Pajek this will be handled in the name routines.

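The failure mode can be reproduced in isolation. The sketch below assumes that the name routine splits the text before the first comma on single spaces, as the error trace suggests. The helper names ''split_first_field'' and ''first_initial'' and the sample string are illustrative and not part of WoS2Pajek.

```python
def split_first_field(name):
    """Split the text before the first comma on single spaces (assumed behaviour)."""
    return name.split(",")[0].split(" ")


def first_initial(tokens):
    """Return the first character of the second token, or None if it is missing or empty."""
    try:
        return tokens[1][0]
    except IndexError:
        # Raised when there is no second token, or when the second token
        # is the empty string produced by two adjacent spaces.
        return None


# Hypothetical input; the two spaces are written explicitly so they stay visible.
name = "TRIBUNE" + 2 * " " + "0502, P9"
tokens = split_first_field(name)
print(tokens)                 # ['TRIBUNE', '', '0502']
print(first_initial(tokens))  # None
```

Splitting on an explicit single space keeps an empty string between two adjacent spaces, so indexing the first character of that token fails.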
The current SN9 is smaller than SN8. The trace for SN8:

<code>
147377 : ODEN_S(1977)48:495 - 2011-12-04 14:06:22.571705
>>> End of processing of WoS file
number of works = 2357436
number of authors = 527079
number of journals = 88518
number of keywords = 64993
number of records = 147377
number of duplicates = 35574
*** FILES:
year of publication partition: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\Year.clu
described / cited only partition: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\DC.clu
number of pages vector: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\NP.vec
citation network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\Cite.net
works X journals network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\WJ.net
works X keywords network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\WK.net
works X authors network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\WA.net
finished: Sun Dec 4 14:10:46 2011
time used: 2 days, 3:30:49.275828
</code>

===== SN9 second run =====

<code>
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import sys; wdir = r'C:\Users\Batagelj\work\Python\WoS'; sys.path.append(wdir)
>>> MLdir = r'c:\Python25\Lib\site-packages\MontyLingua-2.1\Python'
>>> import os; os.chdir(wdir); sys.path.append(MLdir)
>>> import WoS2Pajek
Module Wos2Pajek imported.
*** WoS2Pajek 1.1 by V. Batagelj, May 12, 2013 / March 23, 2007
WoS2Pajek parameters
WoS dir: C:\Users\Batagelj\work\Python\WoS
ML dir: c:\Python25\Lib\site-packages\MontyLingua-2.1\Python
Proj dir: C:/Users/Batagelj/work/Python/WoS/SN9
WoS file: C:/Users/Batagelj/work/Python/WoS/SN9/SN9clean.WoS
MaxNum : 2500000
step : 1000
ISI name: 0
clean : True
keywords: [True, True, True, True]
****** MontyLingua v.2.1 ******
***** by hugo@media.mit.edu *****
*** WoS2Pajek 1.1 by V. Batagelj, May 12, 2013 / March 23, 2007
started: Sun May 12 03:48:56 2013
64067 : ODEN_S(1977)48:495 - 2013-05-12 07:27:43.063000
>>> End of processing of WoS file
number of works = 1625722
number of authors = 416231
number of journals = 63591
number of keywords = 124057
number of records = 64067
number of duplicates = 0
clean WoS data: clean.WoS
*** FILES:
year of publication partition: C:/Users/Batagelj/work/Python/WoS/SN9\Year.clu
described / cited only partition: C:/Users/Batagelj/work/Python/WoS/SN9\DC.clu
number of pages vector: C:/Users/Batagelj/work/Python/WoS/SN9\NP.vec
citation network: C:/Users/Batagelj/work/Python/WoS/SN9\Cite.net
works X journals network: C:/Users/Batagelj/work/Python/WoS/SN9\WJ.net
works X keywords network: C:/Users/Batagelj/work/Python/WoS/SN9\WK.net
works X authors network: C:/Users/Batagelj/work/Python/WoS/SN9\WA.net
finished: Sun May 12 07:34:09 2013
time used: 3:45:13.804000
>>>
</code>

===== Replacing multiple whitespaces with a single space =====

While processing SN9.WoS, some errors appeared in the function nameG. They are caused by multiple spaces in some places in ISI names.

<code>
>>> a = "de SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42 : DE"
>>> b = "FURSTENBERG Jr Franck F, 1976, UNPLANNED PARENTHOOD : FURSTENB"
>>> c = "TRIBUNE 0502, P9 : TRIBUNE"
>>> import re
>>> p = re.compile(r'\s+')
>>> p.sub(' ',a)
'de SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42 : DE'
>>> p.sub(' ',b)
'FURSTENBERG Jr Franck F, 1976, UNPLANNED PARENTHOOD : FURSTENB'
>>> p.sub(' ',c)
'TRIBUNE 0502, P9 : TRIBUNE'
</code>

In nameG I replaced the part

<code>
else:
    q = s[0].split(" ")
    n = q[0][:8]
    try:
        if len(q) > 1 : n = n+'_'+q[1][0]
    except: print "*** Error ", name, " : ", n
</code>

with

<code>
else:
    q = s[0].replace(' ','',1).split(" "); lq = len(q)
    n = q[0][:8]
    if lq > 1 :
        if len(q[1])>0: n = n+'_'+q[1][0]
        else: n = n+'_'+q[lq-1]
</code>

With the new version of nameG:

<code>
>>> a = "de SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42"
>>> b = "FURSTENBERG Jr Franck F, 1976, UNPLANNED PARENTHOOD"
>>> c = "TRIBUNE 0502, P9"
>>> wdir = r'C:\Users\Batagelj\work\Python\WoS'
>>> import os; os.chdir(wdir); import sys; sys.path = [wdir]+sys.path; from names import *
>>> nameG(a)
'DESOUZA_F(2010):42'
>>> nameG(b)
'FURSTENB_F(1976):'
>>> nameG(c)
'TRIBUNE_0502(0):P9'
</code>
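
Another way to make such a routine robust is to collapse whitespace before splitting, so that no empty tokens can appear. The following is a minimal sketch, not the nameG code: ''normalize_spaces'' and ''short_label'' are hypothetical helpers, the sample string is made up from the trace above, and the labels produced here are not meant to match the ones nameG returns.

```python
import re

_WS = re.compile(r"\s+")


def normalize_spaces(text):
    """Collapse each run of whitespace to one space and trim both ends."""
    return _WS.sub(" ", text).strip()


def short_label(name, width=8):
    """Hypothetical stand-in for a name routine.

    Takes the text before the first comma, normalizes its whitespace,
    and joins the first token (cut to `width` characters) with the
    initial of the second token, if there is one.
    """
    tokens = normalize_spaces(name.split(",")[0]).split(" ")
    label = tokens[0][:width]
    if len(tokens) > 1:
        label += "_" + tokens[1][0]
    return label.upper()


# Hypothetical input; the two spaces are written explicitly so they stay visible.
raw = "FURSTENBERG" + 2 * " " + "Jr Franck F, 1976, UNPLANNED PARENTHOOD"
print(short_label(raw))  # FURSTENB_J
```

Because the whitespace is normalized first, every token after the split is non-empty, and no exception handling is needed around the indexing.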