This shows you the differences between two versions of the page.
notes:net:wos2paj [2018/07/10 10:54] vlado |
notes:net:wos2paj [2018/07/10 14:10] (current) vlado |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== WoS2Pajek ====== | ====== WoS2Pajek ====== | ||
- | [[notes:net:wos:sn9]] | + | * [[notes:net:wos:sn9]] |
+ | * {{pajek:data:zip:bm2-citeb.zip|CiteB for BM2}} | ||
+ | * {{pub:pdf:bm2-chapter2.pdf|Chapter 2 from BM2}} | ||
- | ===== Running ===== | ||
- | |||
- | <code> | ||
- | >>> import sys; wdir = r'd:\data\WoS'; sys.path.append(wdir) | ||
- | >>> MLdir = r'c:\Python25\Lib\site-packages\MontyLingua-2.1\Python' | ||
- | >>> import os; os.chdir(wdir); sys.path.append(MLdir) | ||
- | >>> import WoS2Pajek | ||
- | </code> | ||
- | |||
- | In the Tk window enter the specifications. | ||
- | |||
- | ===== SN9 first run ===== | ||
- | |||
- | <code> | ||
- | Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32 | ||
- | Type "copyright", "credits" or "license()" for more information. | ||
- | >>> import sys; wdir = r'C:\Users\Batagelj\work\Python\WoS'; sys.path.append(wdir) | ||
- | >>> MLdir = r'c:\Python25\Lib\site-packages\MontyLingua-2.1\Python' | ||
- | >>> sys.path.append(MLdir) | ||
- | >>> import WoS2Pajek | ||
- | Module Wos2Pajek imported. | ||
- | |||
- | *** WoS2Pajek 1.0 by V. Batagelj, October 24, 2011 / March 23, 2007 | ||
- | |||
- | WoS2Pajek parameters | ||
- | WoS dir: C:\Users\Batagelj\work\Python\WoS | ||
- | ML dir: c:\Python25\Lib\site-packages\MontyLingua-2.1\Python | ||
- | Proj dir: C:/Users/Batagelj/work/Python/WoS/SN9 | ||
- | WoS file: C:/Users/Batagelj/work/Python/WoS/SN9/SN9.wos | ||
- | MaxNum : 3800000 | ||
- | step : 1000 | ||
- | ISI name: 0 | ||
- | clean : True | ||
- | keywords: [True, True, True, False] | ||
- | |||
- | *** WoS2Pajek 1.0 | ||
- | by V. Batagelj, October 24, 2011 / March 23, 2007 | ||
- | |||
- | started: Sat May 11 23:48:39 2013 | ||
- | |||
- | >>> -------------------------------------------------------- | ||
- | >>> Data on social networks | ||
- | >>> TS="social network*" OR SO=(Social Networks) | ||
- | >>> May 9th 2013 | ||
- | >>> by Monika Cerinsek and Vladimir Batagelj | ||
- | >>> -------------------------------------------------------- | ||
- | |||
- | |||
- | 69852 : ODEN_S(1977)48:495 - 2013-05-12 00:53:39.851000 | ||
- | |||
- | >>> End of processing of WoS file | ||
- | number of works = 1625956 | ||
- | number of authors = 416266 | ||
- | number of journals = 63611 | ||
- | number of keywords = 46659 | ||
- | number of records = 69852 | ||
- | number of duplicates = 5785 | ||
- | clean WoS data: clean.WoS | ||
- | |||
- | *** FILES: | ||
- | year of publication partition: C:/Users/Batagelj/work/Python/WoS/SN9\Year.clu | ||
- | described / cited only partition: C:/Users/Batagelj/work/Python/WoS/SN9\DC.clu | ||
- | number of pages vector: C:/Users/Batagelj/work/Python/WoS/SN9\NP.vec | ||
- | citation network: C:/Users/Batagelj/work/Python/WoS/SN9\Cite.net | ||
- | works X journals network: C:/Users/Batagelj/work/Python/WoS/SN9\WJ.net | ||
- | works X keywords network: C:/Users/Batagelj/work/Python/WoS/SN9\WK.net | ||
- | works X authors network: C:/Users/Batagelj/work/Python/WoS/SN9\WA.net | ||
- | finished: Sun May 12 00:57:54 2013 | ||
- | time used: 1:09:15.692000 | ||
- | </code> | ||
- | |||
- | The trace contained some errors: | ||
- | <code> | ||
- | *** Error de SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42 : DE | ||
- | *** Error de Silva K. M., 2005, HIST SRI LANKA : DE | ||
- | *** Error Van Dijck J, 2007, MEDIATED MEMORIES DI : VAN | ||
- | *** Error Stewart TA, 2000, FORTUNE, V142, P390 : STEWART | ||
- | *** Error FURSTENBERG Jr Franck F, 1976, UNPLANNED PARENTHOOD : FURSTENB | ||
- | ... | ||
- | </code> | ||
- | Similar errors had already appeared in the trace for SN8: | ||
- | <code> | ||
- | *** Error REUTERS 0531 : REUTERS | ||
- | *** Error TRIBUNE 0502, P9 : TRIBUNE | ||
- | *** Error VORBOTE 0317, P8 : VORBOTE | ||
- | *** Error RAZON 0518, A8 : RAZON | ||
- | *** Error ECONOMIST 1022, P65 : ECONOMIS | ||
- | </code> | ||
- | The common cause is double spaces in the names. I removed them in an editor. | ||
- | In the next version of WoS2Pajek this clean-up will be built into the name routines. | ||
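The manual clean-up in an editor could also be scripted. The following is a minimal sketch, not part of WoS2Pajek: `collapse_inner_spaces` and `find_suspect_lines` are hypothetical helper names, and the sketch assumes that leading indentation in the WoS file is significant and must be kept, so only runs of spaces between non-space characters are touched.

```python
import re

# Two or more spaces that sit between two non-space characters.
# Leading indentation and trailing spaces are deliberately left alone.
_INNER_RUN = re.compile(r'(?<=\S) {2,}(?=\S)')

def collapse_inner_spaces(line):
    """Replace each interior run of spaces with a single space."""
    return _INNER_RUN.sub(' ', line)

def find_suspect_lines(lines):
    """Return (line_number, line) pairs that contain an interior run of spaces."""
    return [(i, ln) for i, ln in enumerate(lines, 1) if _INNER_RUN.search(ln)]
```

Running `find_suspect_lines` first gives a list of affected records to inspect before anything is rewritten.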
- | |||
- | The current SN9 is smaller than SN8. For comparison, the SN8 run reported: | ||
- | <code> | ||
- | 147377 : ODEN_S(1977)48:495 - 2011-12-04 14:06:22.571705 | ||
- | >>> End of processing of WoS file | ||
- | number of works = 2357436 | ||
- | number of authors = 527079 | ||
- | number of journals = 88518 | ||
- | number of keywords = 64993 | ||
- | number of records = 147377 | ||
- | number of duplicates = 35574 | ||
- | |||
- | *** FILES: | ||
- | year of publication partition: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\Year.clu | ||
- | described / cited only partition: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\DC.clu | ||
- | number of pages vector: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\NP.vec | ||
- | citation network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\Cite.net | ||
- | works X journals network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\WJ.net | ||
- | works X keywords network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\WK.net | ||
- | works X authors network: /Users/monikacerinsek/PhD/ARTICLE/SocialNetworks/RawData_new\WA.net | ||
- | finished: Sun Dec 4 14:10:46 2011 | ||
- | time used: 2 days, 3:30:49.275828 | ||
- | </code> | ||
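The size difference can be quantified from the two "End of processing" summaries. This is a small sketch with the counts copied from the logs above; it assumes the 2011 log is the SN8 run, and `ratios` is a hypothetical helper introduced only for this comparison.

```python
# Counts copied from the two "End of processing" summaries.
sn9 = {"works": 1625956, "authors": 416266, "journals": 63611,
       "keywords": 46659, "records": 69852, "duplicates": 5785}
sn8 = {"works": 2357436, "authors": 527079, "journals": 88518,
       "keywords": 64993, "records": 147377, "duplicates": 35574}

def ratios(new, old):
    """Return new/old for every count present in both summaries."""
    return {key: new[key] / float(old[key]) for key in new if key in old}

# Print one ratio per count, e.g. how many records SN9 has relative to SN8.
for key, value in sorted(ratios(sn9, sn8).items()):
    print("%-10s %.3f" % (key, value))
```

Every ratio is below 1, and the record count shrinks the most among the entity counts: SN9 has a little under half as many records as SN8.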
- | |||
- | ===== SN9 second run ===== | ||
- | |||
- | <code> | ||
- | Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32 | ||
- | Type "copyright", "credits" or "license()" for more information. | ||
- | >>> import sys; wdir = r'C:\Users\Batagelj\work\Python\WoS'; sys.path.append(wdir) | ||
- | >>> MLdir = r'c:\Python25\Lib\site-packages\MontyLingua-2.1\Python' | ||
- | >>> import os; os.chdir(wdir); sys.path.append(MLdir) | ||
- | >>> import WoS2Pajek | ||
- | Module Wos2Pajek imported. | ||
- | |||
- | *** WoS2Pajek 1.1 by V. Batagelj, May 12, 2013 / March 23, 2007 | ||
- | |||
- | WoS2Pajek parameters | ||
- | WoS dir: C:\Users\Batagelj\work\Python\WoS | ||
- | ML dir: c:\Python25\Lib\site-packages\MontyLingua-2.1\Python | ||
- | Proj dir: C:/Users/Batagelj/work/Python/WoS/SN9 | ||
- | WoS file: C:/Users/Batagelj/work/Python/WoS/SN9/SN9clean.WoS | ||
- | MaxNum : 2500000 | ||
- | step : 1000 | ||
- | ISI name: 0 | ||
- | clean : True | ||
- | keywords: [True, True, True, True] | ||
- | |||
- | ****** MontyLingua v.2.1 ****** | ||
- | ***** by hugo@media.mit.edu ***** | ||
- | |||
- | *** WoS2Pajek 1.1 | ||
- | by V. Batagelj, May 12, 2013 / March 23, 2007 | ||
- | |||
- | started: Sun May 12 03:48:56 2013 | ||
- | |||
- | 64067 : ODEN_S(1977)48:495 - 2013-05-12 07:27:43.063000 | ||
- | >>> End of processing of WoS file | ||
- | number of works = 1625722 | ||
- | number of authors = 416231 | ||
- | number of journals = 63591 | ||
- | number of keywords = 124057 | ||
- | number of records = 64067 | ||
- | number of duplicates = 0 | ||
- | clean WoS data: clean.WoS | ||
- | |||
- | *** FILES: | ||
- | year of publication partition: C:/Users/Batagelj/work/Python/WoS/SN9\Year.clu | ||
- | described / cited only partition: C:/Users/Batagelj/work/Python/WoS/SN9\DC.clu | ||
- | number of pages vector: C:/Users/Batagelj/work/Python/WoS/SN9\NP.vec | ||
- | citation network: C:/Users/Batagelj/work/Python/WoS/SN9\Cite.net | ||
- | works X journals network: C:/Users/Batagelj/work/Python/WoS/SN9\WJ.net | ||
- | works X keywords network: C:/Users/Batagelj/work/Python/WoS/SN9\WK.net | ||
- | works X authors network: C:/Users/Batagelj/work/Python/WoS/SN9\WA.net | ||
- | finished: Sun May 12 07:34:09 2013 | ||
- | time used: 3:45:13.804000 | ||
- | >>> | ||
- | </code> | ||
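The record count of the second run is consistent with the first one: the second run read the cleaned file, and its number of records equals the first run's records minus its duplicates. A short check, with the numbers copied from the two logs:

```python
# Counts copied from the first-run and second-run summaries.
first_run = {"records": 69852, "duplicates": 5785}
second_run = {"records": 64067, "duplicates": 0}

# Records left after dropping the duplicates found in the first run.
unique_in_first = first_run["records"] - first_run["duplicates"]
print(unique_in_first == second_run["records"])  # prints True
```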
- | |||
- | ===== Replacing multiple whitespaces with a single space ===== | ||
- | |||
- | While processing SN9.WoS, some errors appeared in the function nameG. They are caused by multiple consecutive spaces in some ISI names. | ||
- | <code> | ||
- | >>> a = "de SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42 : DE" | ||
- | >>> b = "FURSTENBERG Jr Franck F, 1976, UNPLANNED PARENTHOOD : FURSTENB" | ||
- | >>> c = "TRIBUNE 0502, P9 : TRIBUNE" | ||
- | >>> import re | ||
- | >>> p = re.compile(r'\s+') | ||
- | >>> p.sub(' ',a) | ||
- | 'de SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42 : DE' | ||
- | >>> p.sub(' ',b) | ||
- | 'FURSTENBERG Jr Franck F, 1976, UNPLANNED PARENTHOOD : FURSTENB' | ||
- | >>> p.sub(' ',c) | ||
- | 'TRIBUNE 0502, P9 : TRIBUNE' | ||
- | </code> | ||
- | In nameG I replaced the part | ||
- | <code> | ||
- | else: | ||
- |     q = s[0].split(" ") | ||
- |     n = q[0][:8] | ||
- |     try: | ||
- |         if len(q) > 1 : n = n+'_'+q[1][0] | ||
- |     except: | ||
- |         print "*** Error ", name, " : ", n | ||
- | </code> | ||
- | with | ||
- | <code> | ||
- | else: | ||
- |     q = s[0].replace(' ','',1).split(" "); lq = len(q) | ||
- |     n = q[0][:8] | ||
- |     if lq > 1 : | ||
- |         if len(q[1])>0: n = n+'_'+q[1][0] | ||
- |         else: n = n+'_'+q[lq-1] | ||
- | </code> | ||
- | |||
- | <code> | ||
- | >>> a = "de SOUZA Francoise Jean Oliveira, 2010, THESIS UERJ RIO DE J, P42" | ||
- | >>> b = "FURSTENBERG Jr Franck F, 1976, UNPLANNED PARENTHOOD" | ||
- | >>> c = "TRIBUNE 0502, P9" | ||
- | >>> wdir = r'C:\Users\Batagelj\work\Python\WoS' | ||
- | >>> import os; os.chdir(wdir); import sys; sys.path = [wdir]+sys.path; from names import * | ||
- | >>> nameG(a) | ||
- | 'DESOUZA_F(2010):42' | ||
- | >>> nameG(b) | ||
- | 'FURSTENB_F(1976):' | ||
- | >>> nameG(c) | ||
- | 'TRIBUNE_0502(0):P9' | ||
- | </code> | ||
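An alternative to patching the split logic is to apply the `\s+` substitution from this section before splitting, so that empty fields can never occur. The sketch below is not the code of nameG and does not reproduce its output (it handles only the author part and ignores year, volume and page); `short_author` and its `width` parameter are hypothetical names used for illustration.

```python
import re

_WS = re.compile(r'\s+')

def short_author(name, width=8):
    """Build SURNAME[:width] + '_' + first initial, upper-cased.

    Whitespace is normalized first, so splitting on a single space
    never yields an empty field.
    """
    parts = _WS.sub(' ', name).strip().split(' ')
    label = parts[0][:width]
    if len(parts) > 1:
        label += '_' + parts[1][0]
    return label.upper()
```

For example, `short_author("Stewart  TA")` (with a double space) returns `'STEWART_T'`. Calling `str.split()` without an argument would also discard repeated whitespace, but the explicit substitution keeps the normalization visible and reusable for the rest of the description.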