Daria collected the data for 2007-2018 from WoS and cleaned it.
I merged the SN5 data with the new data and also added terminals (manual descriptions of important works without the CR field) from blockmodeling BM.WoS. The merged data set is stored in SN17.zip.
Then I processed it with WoS2Pajek.
*** WoS2Pajek 1.5 by V. Batagelj, February 23, 2017 / March 23, 2007
WoS2Pajek parameters
WoS dir: C:\Users\batagelj\work\Python\WoS
ML dir: c:\Python27\Lib\site-packages\MontyLingua-2.1\Python
Proj dir: C:/Users/batagelj/work/Python/WoS/SocNet/2018
WoS file: C:/Users/batagelj/work/Python/WoS/SocNet/2018/WoS/SN17.WoS
MaxNum : 1500000
step : 1000
ISI name: 0
clean : True
keywords: [True, True, True, False]
titles : True
index : True
started: Sat Jun 2 15:50:39 2018
>>> Terminals
>>> End of processing of WoS file
number of works = 1291899
number of authors = 394995
number of journals = 70250
number of keywords = 32303
number of records = 71476
number of duplicates = 1382
finished: Sat Jun 2 16:10:01 2018
time used: 0:19:22.009000
Computing the indegree vector in the citation network Cite
and extracting the nodes with DC=0 (cited only), we get a list of frequently cited works with 'missing' descriptions.
read Cite.net
Network/Create vector/centrality/degree/input
read DC.clu
Operations/Vector+Partition/Extract Subvector [0]
Operations/Network+Partition/Extract/Subnetwork induced by union [0]
Info button-Vector [+1000][10]
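The same selection can be sketched outside Pajek. The following is only an illustration, not part of WoS2Pajek: it assumes the arcs of Cite.net are already available as (citing, cited) pairs and the DC.clu partition as a dictionary, and it reads `+1000` as "show the 1000 largest values". The function name `top_cited_only` and its parameters are hypothetical.

```python
from collections import Counter


def top_cited_only(arcs, dc, top=1000):
    """Return (work, indegree) pairs for cited-only works, most cited first.

    arcs: iterable of (citing, cited) pairs (assumed form of Cite.net arcs)
    dc:   dict work -> 1 if described, 0 if cited only (as in DC.clu)
    top:  how many of the largest values to keep (assumed meaning of +1000)
    """
    # Indegree = number of arcs pointing to a work.
    indegree = Counter(cited for _citing, cited in arcs)
    # Keep only works in cluster 0 of the DC partition (cited only).
    cited_only = [(w, d) for w, d in indegree.items() if dc.get(w) == 0]
    # Largest indegree first; ties are broken by label to keep the order stable.
    cited_only.sort(key=lambda pair: (-pair[1], pair[0]))
    return cited_only[:top]
```

Calling `top_cited_only(arcs, dc, top=10)` would give the ten most cited works that still lack a description.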
Daria collected additional papers from WoS:
Daria's data were included in the WoS file, and a new version of the networks was produced.
A new list of frequently cited works with 'missing' descriptions was created.
Дарья Мальцева wrote:
Thank you very much for the information. I have some questions about the process, though.
The link that you sent provides information on the BibTeX format. But when we download
data from WoS, we use the “other file format” option. Do you mean that for those articles that do not
have descriptions:
1) we should first collect the data in BibTeX format, and then transform the whole data set into the “other” format,
OR
2) we can use the BibTeX format as a source of data, and manually fill in the fields that appear after
“**Terminals” using these data?
I collected the BibTeX sources for other purposes: getting BibTeX descriptions for references in my documents prepared in LaTeX. But they can also provide missing data for some entries from our SNA list.
There are different options:
I added some additional items to bib cite resources.
In general, is this the correct structure of the fields?
Books:
SK IP - (What is that?)
PT - Publication Type (J=Journal; B=Book; S=Series; P=Patent) B
AU - Authors
AF - Author Full Name
PY - Year Published
TI - Document Title
PI - Publisher City
PU - Publisher
ER - End of Record

Journals:
SK IP - (What is that?)
PT - Publication Type (J=Journal; B=Book; S=Series; P=Patent) J
AU - Authors
AF - Author Full Name
PY - Year Published
TI - Document Title
SO - Publication Name (= name of the Journal)
VL - Volume
IS - Issue
BP - Beginning Page
EP - Ending Page
ER - End of Record
In principle it is OK. SK IP is a line introduced by WoS2Pajek as a replacement for an empty line; you can use an empty line instead. To improve the quality of the data, I would also add keywords when the title is not specific enough, in the format
DE kw1; kw2; kw3; kw4; kw5
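A terminal description with these fields could be written by a small helper like the one below. This is a sketch under assumptions: the function `wos_record` is hypothetical, the layout (two-letter tag, a space, the value, continuation lines indented by three spaces, an empty line after ER) is assumed rather than taken from the WoS2Pajek source, and `kw1; kw2` are placeholder keywords.

```python
def wos_record(fields):
    """Format one tagged record from a list of (tag, value) pairs.

    A value is a string or a list of strings; list items after the first
    go on continuation lines indented by three spaces (assumed layout).
    """
    lines = []
    for tag, value in fields:
        items = value if isinstance(value, list) else [value]
        lines.append(f"{tag} {items[0]}")
        lines.extend(f"   {item}" for item in items[1:])
    lines.append("ER")
    # The trailing empty line separates this record from the next one.
    return "\n".join(lines) + "\n\n"


# Example: a book description with placeholder keywords in the DE field.
book = wos_record([
    ("PT", "B"),
    ("AU", "Bott, Elizabeth"),
    ("PY", "1957"),
    ("TI", "Family and Social Network"),
    ("PI", "London"),
    ("PU", "Tavistock Publications"),
    ("DE", "; ".join(["kw1", "kw2"])),
])
```

Passing the fields as an ordered list rather than a dictionary keeps the tag order under the caller's control.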
For converting a BibTeX bibliography into WoS format, see bib to WoS.
We've finished collecting the additional article descriptions for our SNA data set :) Once again, all the works which are supposed to be included are listed here: Google docs. There are 169 WoS descriptions and 171 RIS descriptions, 340 publications in total.
I've combined all the separate files into 2 files, one with the WoS and one with the RIS data set. They are in the attachment. Then I also converted the RIS file into WoS (one more file), checked the data and manually made some changes (one more file). So, 4 files in total.
The changes I've made concern some mistakes in the RIS descriptions. Quite often a BOOK was registered there as a JOURnal (as in the example below).
ER -
TY - JOUR
T1 - Distinction: A social critique of taste
A1 - Bourdieu, Pierre
JO - Trans. Richard Nice. Cambridge: Harvard UP
Y1 - 1984
In such cases I changed the TY of the description, and put PU and PI instead of JO.
There were also cases where the SO field contained information on the publisher (PU and PI), as below:
PT J
TI Family and Social Network: Roles
AU Bott, Elizabeth
SO Norms, and External Relationships in Ordinary Urban Families, London: Tavistock Publications
PY 1957
ER
Such cases were changed into:
PT B
TI Family and Social Network: Roles
AU Bott, Elizabeth
SO Norms, and External Relationships in Ordinary Urban Families
PU Tavistock Publications
PI London
PY 1957
ER
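The manual correction shown above could be partly automated. The sketch below is only a heuristic, assuming each record has already been parsed into a dictionary from tag to value and that the publisher information always has the form ", City: Publisher" at the end of SO. The function `split_publisher` is hypothetical, and its results would still need checking by hand.

```python
import re

# SO text, then ", City: Publisher" at the very end (assumed pattern).
PLACE_PUBLISHER = re.compile(r"^(?P<so>.*),\s*(?P<pi>[^,:]+):\s*(?P<pu>.+)$")


def split_publisher(record):
    """Return a copy of `record` with publisher data moved out of SO.

    record: dict tag -> value for one description (assumed representation).
    Records whose SO field does not match the pattern are returned unchanged.
    """
    fixed = dict(record)
    match = PLACE_PUBLISHER.match(record.get("SO", ""))
    if match:
        fixed["PT"] = "B"  # treat the work as a book
        fixed["SO"] = match.group("so").strip()
        fixed["PU"] = match.group("pu").strip()
        fixed["PI"] = match.group("pi").strip()
    return fixed
```

The pattern requires both a comma and a colon, so an SO value that is only a journal name, such as "Social Networks", is left as it is.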
Attached: Complete_RIS_final.txt, Complete_RIS.WoS, CompleteRISDataBase.txt, CompleteWoSDataBase.txt
Subject Re[8]: SNA project
From Дарья Мальцева
Thank you very much for your comments! We are having a summer school on Python this week, each day from 10 to 6, which is why I could not respond to your letter quickly. But I have finally looked through the bugs you identified and made some changes in the data file. I also looked through the file myself, found some other problems and fixed them, so now it should be better.
The file is in the attachment. There is also an Excel file with my comments on the bugs that you found. We still need to understand what to do with D. Boyd's work.
About the different editions of the same work: I thought that we could merge the nodes in the network file; however, we could then have some problems with the main path analysis. Perhaps we could merge the nodes only to count indegree, and then treat them as different nodes in further analysis?
Attached: Vlado`s comments.xlsx, Complete_RIS_final(2).txt
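The idea raised in this letter, merging editions only for counting indegree while keeping them as separate nodes in the network, might look like the sketch below. It is an illustration under assumptions: `merged_indegree` is a hypothetical helper, the arcs are assumed to be (citing, cited) pairs, and the mapping from an edition to its representative work would have to be prepared by hand.

```python
def merged_indegree(arcs, canonical):
    """Count distinct citing works per work, with editions merged.

    arcs:      iterable of (citing, cited) pairs (assumed form of the arcs)
    canonical: dict edition label -> label of the representative work;
               works that are not in the dict represent themselves
    """
    citing_sets = {}
    for citing, cited in arcs:
        work = canonical.get(cited, cited)
        # A set ensures that a work citing two editions is counted once.
        citing_sets.setdefault(work, set()).add(citing)
    return {work: len(citers) for work, citers in citing_sets.items()}
```

Because the function only reads the arcs, the citation network itself stays unchanged, with all editions as separate nodes, for the main path analysis.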
*** WoS2Pajek 1.5 by V. Batagelj, February 23, 2017 / March 23, 2007
WoS2Pajek parameters
WoS dir: C:\Users\batagelj\work\Python\WoS
ML dir: c:\Python27\Lib\site-packages\MontyLingua-2.1\Python
Proj dir: C:\Users\batagelj\work\Python\WoS
WoS file: C:/Users/batagelj/work/Python/WoS/SocNet/2018/sept/clean.WoS
MaxNum : 1500000
step : 1000
ISI name: 0
clean : True
keywords: [True, True, True, False]
titles : True
index : True
****** MontyLingua v.2.1 ******
***** by hugo@media.mit.edu *****
Lemmatiser OK!
Custom Lexicon Found! Now Loading!
Fast Lexicon Found! Now Loading!
Lexicon OK!
LexicalRuleParser OK!
ContextualRuleParser OK!
Commonsense OK!
Semantic Interpreter OK!
Loading Morph Dictionary!
*********************************
started: Thu Sep 13 11:47:40 2018
>>> Hits for the query
>>> TS="social network*" OR SO=(Social Networks)
>>> collected from Wos by Vladimir Batagelj, January 5, 2008
>>> and Daria Maltseva, June 1 2018
>>> On September 13 2018 info about terminals with indeg >= 150 added
>>> -----------------------------------------------------------
Common sense violated! Correcting...
...
Common sense violated! Correcting...
>>> End of hits
>>> Terminals from WoS
Common sense violated! Correcting...
Common sense violated! Correcting...
Common sense violated! Correcting...
>>> Terminals from RIS
70810 : XIANG_R(2010):981 - 2018-09-13 12:06:58.930000
>>> End of processing of WoS file
number of works = 1297260
number of authors = 395973
number of journals = 70425
number of keywords = 32409
number of records = 70810
number of duplicates = 15
clean WoS data : clean.WoS
works + titles : titles.csv
works index file: vtxIndex.txt
*** FILES:
year of publication partition: C:\Users\batagelj\work\Python\WoS\Year.clu
described / cited only partition: C:\Users\batagelj\work\Python\WoS\DC.clu
number of pages vector: C:\Users\batagelj\work\Python\WoS\NP.vec
citation network: C:\Users\batagelj\work\Python\WoS\Cite.net
works X journals network: C:\Users\batagelj\work\Python\WoS\WJ.net
works X keywords network: C:\Users\batagelj\work\Python\WoS\WK.net
works X authors network: C:\Users\batagelj\work\Python\WoS\WA.net
finished: Thu Sep 13 12:08:04 2018
time used: 0:20:24.819000
***
Tasks:
Daša's stay in Ljubljana December 1-28, 2019 - Plan
Finish papers on SNA project:
Other papers:
Conferences:
WoS2Pajek: new version
Russian data from WoS: