Social networks from WoS

June 1, 2018

Daria collected the data for 2007-2018 from WoS and cleaned it.

June 2, 2018

I merged the SN5 data with the new data and also added terminals (manual descriptions of important works without the CR field) from blockmodeling BM.WoS. The result is stored in SN17.zip.

Then I processed it with WoS2Pajek.

*** WoS2Pajek 1.5 
by V. Batagelj, February 23, 2017 / March 23, 2007

WoS2Pajek parameters
WoS  dir:  C:\Users\batagelj\work\Python\WoS
ML   dir:  c:\Python27\Lib\site-packages\MontyLingua-2.1\Python
Proj dir:  C:/Users/batagelj/work/Python/WoS/SocNet/2018
WoS file:  C:/Users/batagelj/work/Python/WoS/SocNet/2018/WoS/SN17.WoS
MaxNum  :  1500000
step    :  1000
ISI name:  0
clean   :  True
keywords:  [True, True, True, False]
titles  :  True
index   :  True

started: Sat Jun  2 15:50:39 2018

>>>  Terminals
>>> End of processing of WoS file
number of works      =  1291899
number of authors    =  394995
number of journals   =  70250
number of keywords   =  32303
number of records    =  71476
number of duplicates =  1382

finished: Sat Jun  2 16:10:01 2018
time used:  0:19:22.009000

Computing the indegree vector of the citation network Cite and extracting the nodes with DC=0 (cited only) gives a list of frequently cited works with 'missing' descriptions.

read Cite.net
Network/Create vector/centrality/degree/input
read DC.clu
Operations/Vector+Partition/Extract Subvector [0]
Operations/Network+Partition/Extract/Subnetwork induced by union [0]
Info button-Vector [+1000][10]
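
The same selection can be sketched outside Pajek. This is only a minimal illustration, assuming that an arc (u, v) in Cite means "u cites v" and that the DC partition is available as a dictionary with value 0 for cited-only works; the function name and the toy labels are hypothetical and are not taken from SN17.

```python
from collections import Counter

def cited_only_by_indegree(arcs, dc, top=10):
    """Return the `top` cited-only works (DC == 0) ranked by indegree."""
    # indegree = number of arcs pointing to a work (citations received)
    indegree = Counter(target for _source, target in arcs)
    cited_only = [(work, indegree[work]) for work in dc if dc[work] == 0]
    # highest indegree first; ties broken by label for a stable order
    cited_only.sort(key=lambda pair: (-pair[1], pair[0]))
    return cited_only[:top]

# Toy data (hypothetical labels)
arcs = [("A", "X"), ("B", "X"), ("C", "X"), ("A", "Y"), ("B", "C")]
dc = {"A": 1, "B": 1, "C": 1, "X": 0, "Y": 0}
result = cited_only_by_indegree(arcs, dc)
print(result)
```

The works at the top of this list are the candidates for manual descriptions (terminals).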

June 5, 2018

June 6, 2018

Daria collected additional papers from WoS:

  • “social network*” - c 2007 + Journals “Social Networks” - c 2007 = 576
  • all papers from the journals: Network Science, Cambridge UP; Computational Social Networks, Springer; Applied Network Science, Springer; Social Network Analysis and Mining, Springer; Online Social Networks and Media, Elsevier; Journal of Complex Networks, Oxford UP; Journal of Social Structure, CMU; Connections, INSNA. = 431

June 12, 2018

Daria's data were included in the WoS file, and a new version of the networks was produced.



A new list of frequently cited works with 'missing' descriptions was created.

July 27, 2018

Daria Maltseva wrote:

Thank you very much for the information. I have some questions about the process, though.

The link that you sent provides information on the BibTeX format. But when we download
data from WoS, we use the “other file format” option. Do you mean that for those articles that do not
have descriptions:
1) we should first collect the data in BibTeX format, and then transform the whole data set into the “other” format,
OR
2) we can use the BibTeX format as a source of data, and manually fill in the fields that come after
“**Terminals” using these data?

I collected the BibTeX sources for other purposes - getting BibTeX descriptions for references in my documents prepared in LaTeX. But they can also provide missing data for some entries from our SNA list.

There are different options:

  • some sources provide export of reference data in different formats - the closest for our SNA needs is RIS
  • BibTeX data can be transformed into RIS using different programs. I tested JabRef - it produces some additional empty lines. RIS format can be converted to WoS using simple text replacement in a text editor - I can also write a few lines of R to do this job.
  • BibTeX data can be transformed into WoS manually - this takes a lot of time.
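
The text replacement mentioned in the second option can be sketched in a few lines. The sketch is in Python rather than R, and the tag mapping is an assumption that covers only the RIS tags appearing in the examples of this log (TY, T1, A1, JO, Y1, ER); it is not the converter actually used in the project. The sample record is the Bourdieu entry quoted further down in this log.

```python
import re

# Assumed mapping; real RIS exports use more tags than these.
RIS_TO_WOS = {"T1": "TI", "A1": "AU", "JO": "SO", "Y1": "PY"}
RIS_TYPE_TO_PT = {"JOUR": "J", "BOOK": "B"}
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s?(.*)$")

def ris_to_wos(ris_text):
    """Convert RIS tagged lines to WoS-style tagged lines (single-author sketch)."""
    out = []
    for line in ris_text.splitlines():
        m = RIS_LINE.match(line.rstrip())
        if not m:
            continue  # drops empty lines, e.g. the extra ones produced by JabRef
        tag, value = m.group(1), m.group(2).strip()
        if tag == "TY":
            out.append("PT " + RIS_TYPE_TO_PT.get(value, value))
        elif tag == "ER":
            out.extend(["ER", ""])  # empty line separates records
        elif tag in RIS_TO_WOS:
            out.append(RIS_TO_WOS[tag] + " " + value)
        # tags outside the assumed mapping are skipped in this sketch
    return "\n".join(out)

ris = """TY - JOUR
T1 - Distinction: A social critique of taste
A1 - Bourdieu, Pierre

Y1 - 1984
ER - 
"""
wos = ris_to_wos(ris)
print(wos)
```

The converted records still need a manual check, because errors in the RIS source (such as a wrong publication type) are carried over unchanged.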

I added some additional items to bib cite resources.

In general, is it the correct structure of the fields?

Books:
  SK IP - (What is that?)
  PT - Publication Type (J=Journal; B=Book; S=Series; P=Patent) B
  AU - Authors
  AF - Author Full Name
  PY - Year Published
  TI - Document Title
  PI - Publisher City
  PU - Publisher
  ER - End of Record

Journals:

  SK IP - (What is that?)
  PT - Publication Type (J=Journal; B=Book; S=Series; P=Patent) J
  AU - Authors
  AF - Author Full Name
  PY - Year Published
  TI - Document Title
  SO - Publication Name (= name of the Journal)
  VL - Volume
  IS - Issue
  BP - Beginning Page
  EP - Ending Page
  ER - End of Record

In principle it is OK.
SK IP is a line introduced by WoS2Pajek as a replacement for an empty line. You can use an empty line instead. To improve the quality of the data when the title is not specific enough, I would also add keywords in the format

DE kw1; kw2; kw3; kw4; kw5
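
A manual description with the fields listed above, including the DE line, can be produced with a small helper. This is a sketch only: the function name and the field order are assumptions, the record is the Bott entry from this log, and kw1, kw2, kw3 are placeholders rather than real keywords.

```python
# Assumed output order of the tags discussed above
FIELD_ORDER = ["PT", "AU", "AF", "PY", "TI", "SO",
               "VL", "IS", "BP", "EP", "PI", "PU", "DE"]

def wos_record(fields):
    """Format one terminal description as WoS-style tagged lines."""
    lines = []
    for tag in FIELD_ORDER:
        value = fields.get(tag)
        if value is None:
            continue
        if tag == "DE" and not isinstance(value, str):
            value = "; ".join(value)  # keywords are separated by "; "
        lines.append(tag + " " + str(value))
    lines.append("ER")
    return "\n".join(lines)

record = wos_record({
    "PT": "B",
    "AU": "Bott, Elizabeth",
    "PY": 1957,
    "TI": "Family and Social Network: Roles",
    "PU": "Tavistock Publications",
    "PI": "London",
    "DE": ["kw1", "kw2", "kw3"],  # placeholders
})
print(record)
```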

July 28, 2018

For converting BibTeX bibliography into WoS format see bib to WoS.

August 3, 2018

We've finished collecting the descriptions of additional articles for our SNA data set :) Once again, all the works that are supposed to be included are listed here: Google docs. There are 169 WoS descriptions and 171 RIS descriptions, 340 publications in total.

I've put all the separate files into 2 files, one with the WoS and one with the RIS data set. They are in the attachment. I also converted the RIS file into WoS (one more file), then checked the data and manually made some changes (one more file). So, 4 files in total.

The changes I've made concern mistakes in the RIS descriptions. Quite often a BOOK was registered there as a JOURnal (as in the example below).

ER - 
TY - JOUR
T1 - Distinction: A social critique of taste
A1 - Bourdieu, Pierre
JO - Trans. Richard Nice. Cambridge: Harvard UP
Y1 - 1984

In such cases I changed the TY of the description, and put PU and PI instead of JO.

There were also cases where the SO field contained information on the publisher (PU and PI), as below:

PT J
TI Family and Social Network: Roles
AU Bott, Elizabeth
SO Norms, and External Relationships in Ordinary Urban Families, London: Tavistock Publications
PY 1957
ER

Such cases were changed into:

PT B
TI Family and Social Network: Roles
AU Bott, Elizabeth
SO Norms, and External Relationships in Ordinary Urban Families
PU Tavistock Publications
PI London
PY 1957
ER

Attached: Complete_RIS_final.txt, Complete_RIS.WoS, CompleteRISDataBase.txt, CompleteWoSDataBase.txt
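
The SO correction shown above was made manually. Candidates for it could be flagged with a simple pattern; the sketch below assumes that the publisher information has the form ", City: Publisher" at the end of SO. The pattern and the function name are hypothetical, and every hit still needs a manual check, since other SO values may match by accident.

```python
import re

# Hypothetical heuristic: "<rest of title>, <City>: <Publisher>" at the end of SO
SO_WITH_PUBLISHER = re.compile(
    r"^(?P<so>.*),\s*(?P<pi>[^,:]+):\s*(?P<pu>[^:]+)$")

def split_publisher(record):
    """Return a copy of a record (dict of WoS tags) with PI/PU split out of SO."""
    fixed = dict(record)
    m = SO_WITH_PUBLISHER.match(record.get("SO", ""))
    if m:
        fixed["PT"] = "B"  # such records were books registered as journals
        fixed["SO"] = m.group("so").strip()
        fixed["PI"] = m.group("pi").strip()
        fixed["PU"] = m.group("pu").strip()
    return fixed

bott = {
    "PT": "J",
    "TI": "Family and Social Network: Roles",
    "AU": "Bott, Elizabeth",
    "SO": "Norms, and External Relationships in Ordinary Urban Families, "
          "London: Tavistock Publications",
    "PY": "1957",
}
fixed = split_publisher(bott)
print(fixed)
```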

August 9, 2018

Subject Re[8]: SNA project
From Daria Maltseva

Thank you very much for your comments! We are having a summer school on Python this week - each day from 10 to 6, that's why I could not respond to your letter quickly. But I finally looked through the bugs you identified and made some changes in the data file. I also looked through the file myself, found some other problems and fixed them, so now it should be better.

The file is in the attachment. There is also an Excel file with my comments on the bugs that you found. We still need to understand what to do with the D. Boyd work.

About the different editions of the same work - I thought that we could merge the nodes in the network file; however, then we could have some problems with the main path analysis. Perhaps we could merge the nodes only to count indegree, and then treat them as different nodes in further analysis?

Attached: Vlado`s comments.xlsx, Complete_RIS_final(2).txt
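
The proposal above, to merge editions only for counting indegree while keeping them as separate nodes elsewhere, can be sketched as follows. The function name, the node labels and the `canonical` mapping are hypothetical, and an arc (u, v) is assumed to mean "u cites v".

```python
from collections import Counter

def merged_indegree(arcs, canonical):
    """Indegree per work after mapping each edition to its canonical node."""
    merged = set()  # a set, so a work citing two editions is counted once
    for src, dst in arcs:
        src = canonical.get(src, src)
        dst = canonical.get(dst, dst)
        if src != dst:  # skip loops created when one edition cites another
            merged.add((src, dst))
    return Counter(dst for _src, dst in merged)

# Toy data: W_1986 is a later edition of W_1984 (hypothetical labels)
arcs = [("P1", "W_1984"), ("P2", "W_1986"), ("P3", "W_1984"),
        ("P3", "W_1986"), ("P1", "Q")]
canonical = {"W_1986": "W_1984"}
indeg = merged_indegree(arcs, canonical)
print(indeg)
```

The original list of arcs is left untouched, so main path analysis can still be run on the network with the editions as separate nodes.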

September 13, 2018

*** WoS2Pajek 1.5 
by V. Batagelj, February 23, 2017 / March 23, 2007

WoS2Pajek parameters
WoS  dir:  C:\Users\batagelj\work\Python\WoS
ML   dir:  c:\Python27\Lib\site-packages\MontyLingua-2.1\Python
Proj dir:  C:\Users\batagelj\work\Python\WoS
WoS file:  C:/Users/batagelj/work/Python/WoS/SocNet/2018/sept/clean.WoS
MaxNum  :  1500000
step    :  1000
ISI name:  0
clean   :  True
keywords:  [True, True, True, False]
titles  :  True
index   :  True

****** MontyLingua v.2.1 ******
***** by hugo@media.mit.edu *****
Lemmatiser OK!
Custom Lexicon Found! Now Loading!
Fast Lexicon Found! Now Loading!
Lexicon OK!
LexicalRuleParser OK!
ContextualRuleParser OK!
Commonsense OK!
Semantic Interpreter OK!
Loading Morph Dictionary!
*********************************

started: Thu Sep 13 11:47:40 2018

>>>  Hits for the query 
>>>  TS="social network*" OR SO=(Social Networks)
>>>  collected from Wos by Vladimir Batagelj, January 5, 2008
>>>  and Daria Maltseva, June 1 2018
>>>  On September 13 2018 info about terminals with indeg >= 150 added
>>>  -----------------------------------------------------------
Common sense violated! Correcting...
...
Common sense violated! Correcting...
>>>  End of hits
>>>  Terminals from WoS
Common sense violated! Correcting...
Common sense violated! Correcting...
Common sense violated! Correcting...
>>>  Terminals from RIS
70810 : XIANG_R(2010):981  -  2018-09-13 12:06:58.930000
>>> End of processing of WoS file
number of works      =  1297260
number of authors    =  395973
number of journals   =  70425
number of keywords   =  32409
number of records    =  70810
number of duplicates =  15
clean WoS data  : clean.WoS
works + titles  : titles.csv
works index file: vtxIndex.txt

*** FILES:
year of publication partition: C:\Users\batagelj\work\Python\WoS\Year.clu
described / cited only partition: C:\Users\batagelj\work\Python\WoS\DC.clu
number of pages vector: C:\Users\batagelj\work\Python\WoS\NP.vec
citation network: C:\Users\batagelj\work\Python\WoS\Cite.net
works X journals network: C:\Users\batagelj\work\Python\WoS\WJ.net
works X keywords network: C:\Users\batagelj\work\Python\WoS\WK.net
works X authors  network: C:\Users\batagelj\work\Python\WoS\WA.net
finished: Thu Sep 13 12:08:04 2018
time used:  0:20:24.819000
***

September 14, 2018

October 24, 2018

December 5, 2018

January 21, 2019

Tasks:

  1. corrections of the first paper
  2. the second and the third paper based on authors and journals
  3. submit abstracts to Sunbelt 2019 - DONE
  4. Sunbelt workshop will be based on St Petersburg WS, but updated (6 hours)
  5. IFCS, Thessaloniki, (26-29 August) - topic clustering - abstract
  6. other conferences:
    1. EuSN, Zurich (9-12 September) - abstract: Citation analysis
    2. ARS, Salerno/Amalfi (21-25 October)
    3. AS, Ribno (22-25 September)
  7. theoretical paper on bibliographic temporal networks

December 2019

Daša's stay in Ljubljana December 1-28, 2019 - Plan

Finish papers on SNA project:

  • Journals - DONE, sent to Social Networks
  • Keywords
  • Temporal
  • Fractional
  • Authors: Collaboration
  • Authors: Citation and Bibliographic coupling

Other papers:

  • X-metrics -> Proofreading
  • Mixed Methods

Conferences:

  • NetGlow St.Petersburg
  • SUNBELT Paris
  • Sredin Seminar

WoS2Pajek: new version

  • information on country, institution, keywords as phrases
  • Scopus to WoS ?
  • eLibrary to WoS ?

Russian data from WoS:

  • Test downloading
  • Ask Grisha about the code for data collection

To do

  1. data preprocessing: in selected fields, replace substrings according to dictionary patterns (regex?); for example, convert phrases to “words” by replacing spaces with non-breaking spaces.
  2. convert RIS ↔ Scopus ↔ WoS
  3. entity resolution: authors, journals
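
Item 1 of the list above could look roughly like this. It is a sketch under assumptions: the phrase dictionary is a plain list, matching is case-insensitive, and the function name is hypothetical; it is not part of WoS2Pajek.

```python
import re

NBSP = "\u00a0"  # non-breaking space

def join_phrases(text, phrases):
    """Replace spaces inside each listed phrase with non-breaking spaces."""
    if not phrases:
        return text
    # longest phrases first, so that a longer phrase wins over its prefix
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b",
        re.IGNORECASE)
    return pattern.sub(lambda m: m.group(0).replace(" ", NBSP), text)

title = "Social network analysis of social capital"
joined = join_phrases(title, ["social network analysis", "social capital"])
# Split on the ordinary space only: str.split() without an argument
# would also split on the non-breaking space.
words = joined.split(" ")
print(words)
```

After this step each phrase survives tokenization as a single “word”, provided that the later processing splits on the ordinary space only.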

Files

Data: WoS / SN17

URLs

ru/dm/sn.txt · Last modified: 2020/07/04 21:11 by vlado
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported