Differences

This shows you the differences between two versions of the page.

notes:net:wos2paj [2018/07/10 10:54]
vlado
notes:net:wos2paj [2018/07/10 14:10] (current)
vlado
Line 1:
 ====== WoS2Pajek ======
 
-[[notes:net:wos:sn9]]
+  * [[notes:net:wos:sn9]]
+  * {{pajek:data:zip:bm2-citeb.zip|CiteB for BM2}}
+  * {{pub:pdf:bm2-chapter2.pdf|Chapter 2 from BM2}}
  
-===== Running ===== 
- 
-<code> 
->>> import sys; wdir = r'd:\data\WoS'; sys.path.append(wdir) 
->>> MLdir = r'c:\Python25\Lib\site-packages\MontyLingua-2.1\Python' 
->>> import os; os.chdir(wdir); sys.path.append(MLdir) 
->>> import WoS2Pajek 
-</code> 
- 
-Enter the specifications in the Tk window that opens.
- 
-===== SN9 first run ===== 
- 
-<code> 
-Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32 
-Type "copyright", "credits" or "license()" for more information. 
->>> import sys; wdir = r'C:\Users\Batagelj\work\Python\WoS'; sys.path.append(wdir) 
->>> MLdir = r'c:\Python25\Lib\site-packages\MontyLingua-2.1\Python' 
->>> sys.path.append(MLdir) 
->>> import WoS2Pajek 
-Module Wos2Pajek imported. 
- 
-*** WoS2Pajek 1.0 by V. Batagelj, October 24, 2011 / March 23, 2007 
- 
-WoS2Pajek parameters 
-WoS  dir:  C:\Users\Batagelj\work\Python\WoS 
-ML   dir:  c:\Python25\Lib\site-packages\MontyLingua-2.1\Python 
-Proj dir:  C:/Users/Batagelj/work/Python/WoS/SN9 
-WoS file:  C:/Users/Batagelj/work/Python/WoS/SN9/SN9.wos 
-MaxNum  :  3800000 
-step    :  1000 
-ISI name:  0 
-clean   :  True 
-keywords:  [True, True, True, False] 
- 
-*** WoS2Pajek 1.0 
-by V. Batagelj, October 24, 2011 / March 23, 2007 
- 
-started: Sat May 11 23:48:39 2013 
- 
->>>  -------------------------------------------------------- 
->>>  Data on social networks 
->>>  TS="social network*" OR SO=(Social Networks) 
->>>  May 9th 2013 
->>>  by Monika Cerinsek and Vladimir Batagelj 
->>>  -------------------------------------------------------- 
- 
- 
-69852 : ODEN_S(1977)48:495  -  2013-05-12 00:53:39.851000 
- 
->>> End of processing of WoS file 
-number of works      =  1625956 
-number of authors    =  416266 
-number of journals   =  63611 
-number of keywords   =  46659 
-number of records    =  69852 
-number of duplicates =  5785 
-clean WoS data: clean.WoS 
- 
-*** FILES: 
-year of publication partition: C:/Users/Batagelj/work/Python/WoS/SN9\Year.clu 
-described / cited only partition: C:/Users/Batagelj/work/Python/WoS/SN9\DC.clu 
-number of pages vector: C:/Users/Batagelj/work/Python/WoS/SN9\NP.vec 
-citation network: C:/Users/Batagelj/work/Python/WoS/SN9\Cite.net 
-works X journals network: C:/Users/Batagelj/work/Python/WoS/SN9\WJ.net 
-works X keywords network: C:/Users/Batagelj/work/Python/WoS/SN9\WK.net 
-works X authors  network: C:/Users/Batagelj/work/Python/WoS/SN9\WA.net 
-finished: Sun May 12 00:57:54 2013 
-time used:  1:09:15.692000 
-</code> 
- 
-There were some errors in the trace. 
-<code> 
-*** Error  de  SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42  :  DE 
-*** Error  de  Silva K. M., 2005, HIST SRI LANKA  :  DE 
-*** Error  Van  Dijck J, 2007, MEDIATED MEMORIES DI  :  VAN 
-*** Error  Stewart  TA, 2000, FORTUNE, V142, P390  :  STEWART 
-*** Error  FURSTENBERG  Jr Franck F, 1976, UNPLANNED PARENTHOOD  :  FURSTENB 
-... 
-</code> 
-Similar errors had already appeared in the trace for SN8.
-<code> 
-*** Error  REUTERS         0531  :  REUTERS 
-*** Error  TRIBUNE         0502, P9  :  TRIBUNE 
-*** Error  VORBOTE         0317, P8  :  VORBOTE 
-*** Error  RAZON           0518, A8  :  RAZON 
-*** Error  ECONOMIST       1022, P65  :  ECONOMIS 
-</code> 
-The common cause is double spaces in the names. I removed them in an editor.
-In the next version of WoS2Pajek this cleanup will be included in the name routines.
- 
-The current SN9 is smaller than SN8:
-<code> 
-147377 : ODEN_S(1977)48:495  -  2011-12-04 14:06:22.571705 
->>> End of processing of WoS file 
-number of works =  2357436 
-number of authors =  527079 
-number of journals =  88518 
-number of keywords =  64993 
-number of records =  147377 
-number of duplicates =  35574 
- 
-*** FILES: 
-year of publication partition: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\Year.clu 
-described / cited only partition: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\DC.clu 
-number of pages vector: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\NP.vec 
-citation network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\Cite.net 
-works X journals network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\WJ.net 
-works X keywords network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\WK.net 
-works X authors  network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\WA.net 
-finished: Sun Dec  4 14:10:46 2011 
-time used:  2 days, 3:30:49.275828 
-</code> 
- 
-===== SN9 second run ===== 
- 
-<code> 
-Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32 
-Type "copyright", "credits" or "license()" for more information. 
->>> import sys; wdir = r'C:\Users\Batagelj\work\Python\WoS'; sys.path.append(wdir) 
->>> MLdir = r'c:\Python25\Lib\site-packages\MontyLingua-2.1\Python' 
->>> import os; os.chdir(wdir); sys.path.append(MLdir) 
->>> import WoS2Pajek 
-Module Wos2Pajek imported. 
- 
-*** WoS2Pajek 1.1 by V. Batagelj, May 12, 2013 / March 23, 2007 
- 
-WoS2Pajek parameters 
-WoS  dir:  C:\Users\Batagelj\work\Python\WoS 
-ML   dir:  c:\Python25\Lib\site-packages\MontyLingua-2.1\Python 
-Proj dir:  C:/Users/Batagelj/work/Python/WoS/SN9 
-WoS file:  C:/Users/Batagelj/work/Python/WoS/SN9/SN9clean.WoS 
-MaxNum  :  2500000 
-step    :  1000 
-ISI name:  0 
-clean   :  True 
-keywords:  [True, True, True, True] 
- 
-****** MontyLingua v.2.1 ****** 
-***** by hugo@media.mit.edu ***** 
- 
-*** WoS2Pajek 1.1  
-by V. Batagelj, May 12, 2013 / March 23, 2007 
- 
-started: Sun May 12 03:48:56 2013 
- 
-64067 : ODEN_S(1977)48:495  -  2013-05-12 07:27:43.063000 
->>> End of processing of WoS file 
-number of works      =  1625722 
-number of authors    =  416231 
-number of journals   =  63591 
-number of keywords   =  124057 
-number of records    =  64067 
-number of duplicates =  0 
-clean WoS data: clean.WoS 
- 
-*** FILES: 
-year of publication partition: C:/Users/Batagelj/work/Python/WoS/SN9\Year.clu 
-described / cited only partition: C:/Users/Batagelj/work/Python/WoS/SN9\DC.clu 
-number of pages vector: C:/Users/Batagelj/work/Python/WoS/SN9\NP.vec 
-citation network: C:/Users/Batagelj/work/Python/WoS/SN9\Cite.net 
-works X journals network: C:/Users/Batagelj/work/Python/WoS/SN9\WJ.net 
-works X keywords network: C:/Users/Batagelj/work/Python/WoS/SN9\WK.net 
-works X authors  network: C:/Users/Batagelj/work/Python/WoS/SN9\WA.net 
-finished: Sun May 12 07:34:09 2013 
-time used:  3:45:13.804000 
->>> 
-</code> 
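
The counts reported by the two SN9 runs can be cross-checked with a few lines of arithmetic. The sketch below only copies numbers from the two logs above. The second run reads SN9clean.WoS and reports no duplicates, which suggests that its input is the de-duplicated output of the first run; this is an inference from the logs, not something the notes state. The dictionary names are made up for the illustration.

```python
# Counts copied from the "End of processing" summaries of the two SN9 runs.
first_run = {"works": 1625956, "authors": 416266, "journals": 63611,
             "keywords": 46659, "records": 69852, "duplicates": 5785}
second_run = {"works": 1625722, "authors": 416231, "journals": 63591,
              "keywords": 124057, "records": 64067, "duplicates": 0}

# Records left in the first run once its duplicates are dropped.
unique_first = first_run["records"] - first_run["duplicates"]
print("unique records in first run:", unique_first)
print("matches second run:", unique_first == second_run["records"])

# How much the other counts changed between the runs.
for key in ("works", "authors", "journals", "keywords"):
    print(key, second_run[key] - first_run[key])
```

The record count of the second run equals the first run's records minus its duplicates. The keyword count grows considerably in the second run, where the fourth keywords switch is True.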
- 
-===== Replacing multiple whitespace characters with a single space =====
-
-While processing SN9.WoS, some errors appeared in the function nameG. They are caused by multiple consecutive spaces in some ISI names.
-<code> 
->>> a = "de  SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42  :  DE" 
->>> b = "FURSTENBERG  Jr Franck F, 1976, UNPLANNED PARENTHOOD  :  FURSTENB" 
->>> c = "TRIBUNE         0502, P9  :  TRIBUNE" 
->>> import re 
->>> p = re.compile(r'\s+') 
->>> p.sub(' ',a) 
-'de SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42 : DE' 
->>> p.sub(' ',b) 
-'FURSTENBERG Jr Franck F, 1976, UNPLANNED PARENTHOOD : FURSTENB' 
->>> p.sub(' ',c) 
-'TRIBUNE 0502, P9 : TRIBUNE' 
-</code>  
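
The regular expression from the session above can be wrapped in a small function so that it can be called from the name routines. This is a minimal sketch in Python 3; the function name collapse_spaces is hypothetical and is not part of WoS2Pajek.

```python
import re

# Same pattern as in the interactive session: one or more whitespace characters.
_WS_RUN = re.compile(r'\s+')

def collapse_spaces(text):
    """Replace every run of whitespace in text with a single space."""
    return _WS_RUN.sub(' ', text)

print(collapse_spaces("FURSTENBERG  Jr Franck F, 1976, UNPLANNED PARENTHOOD  :  FURSTENB"))
```

Because \s also matches tabs and line breaks, this should be applied to a single name or reference string, not to a whole WoS record.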
-In nameG I replaced the part
-<code> 
-  else: 
-    q = s[0].split(" ") 
-    n = q[0][:8] 
-    try: 
-      if len(q) > 1 : n = n+'_'+q[1][0] 
-    except: 
-      print "*** Error ", name, " : ", n 
-</code> 
-with 
-<code> 
-  else: 
-    q = s[0].replace('  ','',1).split(" "); lq = len(q) 
-    n = q[0][:8] 
-    if lq > 1 : 
-      if len(q[1])>0: n = n+'_'+q[1][0] 
-      else: n = n+'_'+q[lq-1] 
-</code> 
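
To see what the replacement branch does on its own, it can be lifted into a stand-alone function. The sketch below is Python 3 and assumes that s[0] is the first comma-separated field of the reference; the function name short_name is hypothetical. The real nameG also adds year, volume and page, and its output is upper case, which this sketch leaves out.

```python
def short_name(first_field):
    """Author part of a short name, following the replacement branch of nameG."""
    # Remove the first double space, so 'de  SOUZA' becomes 'deSOUZA'.
    q = first_field.replace('  ', '', 1).split(' ')
    n = q[0][:8]                 # surname truncated to 8 characters
    if len(q) > 1:
        if q[1]:                 # usual case: append the first initial
            n += '_' + q[1][0]
        else:                    # spaces remained after the surname: append the last token
            n += '_' + q[-1]
    return n

for field in ("de  SOUZA Francoise Jean Oliveira",
              "FURSTENBERG  Jr Franck F",
              "TRIBUNE" + " " * 9 + "0502"):
    print(repr(field), "->", short_name(field))
```

Note that this branch joins the two parts around the first double space instead of collapsing it to a single space, so a prefix such as 'de' becomes part of the surname.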
- 
-<code> 
->>> a = "de  SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42" 
->>> b = "FURSTENBERG  Jr Franck F, 1976, UNPLANNED PARENTHOOD" 
->>> c = "TRIBUNE         0502, P9" 
->>> wdir = r'C:\Users\Batagelj\work\Python\WoS' 
->>> import os; os.chdir(wdir); import sys; sys.path = [wdir]+sys.path; from names import * 
->>> nameG(a) 
-'DESOUZA_F(2010):42' 
->>> nameG(b) 
-'FURSTENB_F(1976):' 
->>> nameG(c) 
-'TRIBUNE_0502(0):P9'  
-</code> 
  
notes/net/wos2paj.txt · Last modified: 2018/07/10 14:10 by vlado
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported