====== Social networks from WoS ======
===== June 1, 2018 =====
Daria collected the data for 2007-2018 from WoS and cleaned it.
===== June 2, 2018 =====
I merged the SN5 data with the new data and also added terminals (manual descriptions of important works without the CR field) from blockmodeling BM.WoS. The result is stored in [[http://vlado.fmf.uni-lj.si/dl/WoS/SN17.zip|SN17.zip]].
Then I processed it with WoS2Pajek.
*** WoS2Pajek 1.5
by V. Batagelj, February 23, 2017 / March 23, 2007
WoS2Pajek parameters
WoS dir: C:\Users\batagelj\work\Python\WoS
ML dir: c:\Python27\Lib\site-packages\MontyLingua-2.1\Python
Proj dir: C:/Users/batagelj/work/Python/WoS/SocNet/2018
WoS file: C:/Users/batagelj/work/Python/WoS/SocNet/2018/WoS/SN17.WoS
MaxNum : 1500000
step : 1000
ISI name: 0
clean : True
keywords: [True, True, True, False]
titles : True
index : True
started: Sat Jun 2 15:50:39 2018
>>> Terminals
>>> End of processing of WoS file
number of works = 1291899
number of authors = 394995
number of journals = 70250
number of keywords = 32303
number of records = 71476
number of duplicates = 1382
finished: Sat Jun 2 16:10:01 2018
time used: 0:19:22.009000
Computing the indegree vector in the citation network ''cite'' and extracting the nodes with DC=0 (only cited), we get a [[ru:dm:sn:clean:f1|list]] of frequently cited works with 'missing' descriptions.
read Cite.net
Network/Create vector/centrality/degree/input
read DC.clu
Operations/Vector+Partition/Extract Subvector [0]
Operations/Network+Partition/Extract/Subnetwork induced by union [0]
Info button-Vector [+1000][10]
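The Pajek steps above can also be approximated outside Pajek. The following is a minimal Python sketch, assuming a ''.net'' file with a ''*Vertices'' and an ''*Arcs'' section (arcs pointing from the citing to the cited work) and a ''.clu'' file with one class per vertex, where 0 marks cited-only works. The function names and the tiny in-memory example are illustrative only; they are not part of WoS2Pajek or Pajek.

```python
from collections import Counter

def read_net(lines):
    """Parse Pajek .net lines into (labels, arcs).
    labels: vertex number -> label; arcs: list of (source, target)."""
    labels, arcs, section = {}, [], None
    for line in lines:
        line = line.strip()
        if not line or line.startswith('%'):
            continue
        if line.startswith('*'):
            section = line.split()[0].lower()
            continue
        if section == '*vertices':
            parts = line.split(None, 1)
            rest = parts[1].strip() if len(parts) > 1 else parts[0]
            # use the first double-quoted string as the label, if quoted
            labels[int(parts[0])] = rest.split('"')[1] if rest.startswith('"') else rest
        elif section == '*arcs':
            u, v = line.split()[:2]
            arcs.append((int(u), int(v)))
    return labels, arcs

def read_clu(lines):
    """Parse a Pajek .clu partition: header line, then one class per vertex."""
    values = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith(('*', '%')):
            values.append(int(line.split()[0]))
    return {i: c for i, c in enumerate(values, start=1)}

def cited_only_ranking(labels, arcs, dc, top=10):
    """Rank works with DC=0 (cited only) by indegree, highest first."""
    indeg = Counter(v for _, v in arcs)
    ranked = [(indeg[v], labels[v]) for v in labels if dc.get(v) == 0]
    ranked.sort(key=lambda t: (-t[0], t[1]))
    return ranked[:top]

# Tiny hypothetical example: works 1 and 2 are described, 3 and 4 are cited only.
NET_TEXT = '''*Vertices 4
1 "A(2010)"
2 "B(2005)"
3 "C(1990)"
4 "D(1985)"
*Arcs
1 3
2 3
1 4
1 2
'''
CLU_TEXT = '''*Vertices 4
1
1
0
0
'''
labels, arcs, dc = read_net(NET_TEXT.splitlines())[0], read_net(NET_TEXT.splitlines())[1], read_clu(CLU_TEXT.splitlines())
print(cited_only_ranking(labels, arcs, dc))
```

For the real data one would pass open file objects for ''Cite.net'' and ''DC.clu'' instead of the in-memory strings; the file encoding has to be checked first.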
===== June 5, 2018 =====
* add papers from SNA-related [[pajek:info:journals|journals]] to the data set.
* in the abstract for [[http://conferences.nib.si/AS2018/|Applied statistics]], include some [[..:dm:sn:th|theoretical background]].
===== June 6, 2018 =====
Daria collected additional papers from WoS:
* "social network*" - c 2007 + Journals "Social Networks" - c 2007 = **576**
* all papers from the journals: Network Science, Cambridge UP; Computational Social Networks, Springer; Applied Network Science, Springer; Social Network Analysis and Mining, Springer; Online Social Networks and Media, Elsevier; Journal of Complex Networks, Oxford UP; Journal of Social Structure, CMU; Connections, INSNA. = **431**
===== June 12, 2018 =====
Daria's data were included in the [[http://vlado.fmf.uni-lj.si/dl/WoS/clean_jun12.zip|WoS file]]. A new version of the [[http://vlado.fmf.uni-lj.si/dl/WoS/SN17_jun12.zip|networks]] was produced.
A new [[ru:dm:sn:clean:f1|list]] of frequently cited works with 'missing' descriptions was created.
===== July 27, 2018 =====
Daria Maltseva wrote:
> Thank you very much for the information. I have some questions about the process, though.
>
> By the link that you have sent the information on BibTeX format is provided. But when we download
> data from WoS, we use "other file format option". Do you mean that for those articles that do not
> have descriptions:
> 1) we should at first collect the data in BibTeX format, and then transform the whole data set into "other" format,
> OR
> 2) we can use BibTeX format as a source for data, and fill in the fields that are presented after the
> "\*\*Terminals" manually using these data?
I collected the BibTeX sources for other purposes: getting BibTeX descriptions for references in my documents prepared in LaTeX. However, they can also provide missing data for some entries from our SNA list.
There are different options:
* some sources provide export of reference data in different formats - the closest for our SNA needs is RIS
* BibTeX data can be transformed into RIS using different programs. I tested JabRef - it produces some additional empty lines. RIS format can be converted into WoS format using simple text replacement in a text editor - I can also write a few lines in R to do this job.
* BibTeX data can be transformed into WoS manually - this takes a lot of time.
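The "simple text replacement" from RIS to WoS tags mentioned in the second option can be sketched in a few lines of Python (used here instead of R). This is only a sketch under assumptions: the tag mapping in ''TAGS'' and ''TYPES'' and the order of the output fields are my guesses at a reasonable correspondence, not the mapping used by any existing converter, and the sample record is the RIS fragment quoted further down this page.

```python
import re

# A RIS line looks like "T1  - Some title"; "ER  -" closes a record.
RIS_LINE = re.compile(r'^([A-Z][A-Z0-9])\s+-\s?(.*)$')

# ASSUMED mapping of RIS tags to WoS field tags (illustrative, incomplete).
TAGS = {'A1': 'AU', 'AU': 'AU', 'T1': 'TI', 'TI': 'TI', 'JO': 'SO', 'JF': 'SO',
        'Y1': 'PY', 'PY': 'PY', 'VL': 'VL', 'IS': 'IS', 'SP': 'BP', 'EP': 'EP',
        'PB': 'PU', 'CY': 'PI'}
TYPES = {'JOUR': 'J', 'BOOK': 'B'}
ORDER = ['PT', 'AU', 'TI', 'SO', 'VL', 'IS', 'BP', 'EP', 'PU', 'PI', 'PY', 'DE']

def format_record(rec):
    """Turn a dict tag -> list of values into WoS-style lines."""
    lines = []
    for tag in ORDER:
        values = rec.get(tag)
        if not values:
            continue
        if tag == 'AU':      # further authors go on indented continuation lines
            lines.append('AU ' + values[0])
            lines.extend('   ' + v for v in values[1:])
        elif tag == 'DE':    # keywords are joined into one line
            lines.append('DE ' + '; '.join(values))
        else:
            lines.append(tag + ' ' + values[0])
    lines.extend(['ER', ''])
    return lines

def ris_to_wos(ris_text):
    out, rec = [], {}
    for line in ris_text.splitlines():
        m = RIS_LINE.match(line.strip())
        if not m:            # skips empty lines, e.g. those added by JabRef
            continue
        tag, value = m.group(1), m.group(2).strip()
        if tag == 'TY':
            rec = {'PT': [TYPES.get(value, value)]}
        elif tag == 'ER':
            if rec:
                out.extend(format_record(rec))
            rec = {}
        elif tag == 'KW':
            rec.setdefault('DE', []).append(value)
        elif tag in TAGS:
            if TAGS[tag] == 'PY':
                value = value.split('/')[0]   # "1984///" -> "1984"
            rec.setdefault(TAGS[tag], []).append(value)
    return '\n'.join(out)

SAMPLE_RIS = """TY  - JOUR
T1  - Distinction: A social critique of taste
A1  - Bourdieu, Pierre

JO  - Trans. Richard Nice. Cambridge: Harvard UP
Y1  - 1984
ER  -
"""
wos_text = ris_to_wos(SAMPLE_RIS)
print(wos_text)
```

The conversion is purely mechanical: errors in the RIS source (such as a book registered as a journal) are carried over and still have to be corrected by hand.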
I added some additional items to [[http://vladowiki.fmf.uni-lj.si/doku.php?id=pro:bib:citr|bib cite resources]].
> In general, is it the correct structure of the fields?
>
> Books:
SK IP - (What is that?)
PT - Publication Type (J=Journal; B=Book; S=Series; P=Patent) B
AU - Authors
AF - Author Full Name
PY - Year Published
TI - Document Title
PI - Publisher City
PU - Publisher
ER - End of Record
> Journals:
SK IP - (What is that?)
PT - Publication Type (J=Journal; B=Book; S=Series; P=Patent) J
AU - Authors
AF - Author Full Name
PY - Year Published
TI - Document Title
SO - Publication Name (= name of the Journal)
VL - Volume
IS - Issue
BP - Beginning Page
EP - Ending Page
ER - End of Record
In principle it is OK.\\
''SK IP'' is a line introduced by WoS2Pajek as a replacement for an empty line; you can use an empty line instead. To improve the quality of the data, I would also add keywords when the title is not specific enough, in the format
DE kw1; kw2; kw3; kw4; kw5
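The field structure above can be checked mechanically before the terminals are processed. The following minimal Python sketch assumes that every field starts with a two-letter tag followed by a space, that continuation lines are indented, and that ''ER'' closes a record. The lists in ''REQUIRED'' simply repeat the fields from the two lists above; the function names are hypothetical.

```python
# Fields expected for each publication type, taken from the lists above.
REQUIRED = {
    'B': ['AU', 'AF', 'PY', 'TI', 'PI', 'PU'],
    'J': ['AU', 'AF', 'PY', 'TI', 'SO', 'VL', 'IS', 'BP', 'EP'],
}

def parse_wos(text):
    """Split WoS-tagged text into records (dict: tag -> list of values)."""
    records, rec, tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[:2] == 'ER':               # end of record
            records.append(rec)
            rec, tag = {}, None
        elif line[:2].strip():             # a new two-letter field tag
            tag = line[:2]
            rec.setdefault(tag, []).append(line[3:].strip())
        elif tag:                          # indented continuation line
            rec[tag].append(line.strip())
    return records

def missing_fields(rec):
    """Return the expected fields that a record does not contain."""
    pub_type = rec.get('PT', ['?'])[0]
    return [t for t in REQUIRED.get(pub_type, []) if t not in rec]

SAMPLE = """PT B
TI Family and Social Network: Roles
AU Bott, Elizabeth
SO Norms, and External Relationships in Ordinary Urban Families
PU Tavistock Publications
PI London
PY 1957
ER
"""
records = parse_wos(SAMPLE)
for rec in records:
    print(rec['TI'][0], '-> missing:', missing_fields(rec))
```

A record with an unknown or absent ''PT'' value is reported as complete here; a stricter check may be preferable for real data.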
===== July 28, 2018 =====
For converting BibTeX bibliography into WoS format see [[https://github.com/bavla/biblio/blob/master/BibTeX2Pajek/|bib to WoS]].
===== August 3, 2018 =====
We've finished collecting the additional article descriptions for our SNA data set :)
Once again, all the works which are supposed to be included are listed here:
[[https://docs.google.com/spreadsheets/d/14IEzkR0STg_R1_Gu3SVk8nXmNgR1iz21zc740J1YJ2o/edit?usp=sharing|Google docs]].
There are 169 WoS descriptions and 171 RIS descriptions, 340 publications in total.
I've combined the separate files into 2 files, one with the WoS and one with the RIS data set. They are in the attachment. I then converted the RIS file into WoS format (one more file), checked the data, and manually made some changes (one more file). So there are 4 files in total.
The changes I've made concern some mistakes in the RIS descriptions. Quite often a BOOK was registered there as a JOURnal (as in the example below).
ER -
TY - JOUR
T1 - Distinction: A social critique of taste
A1 - Bourdieu, Pierre
JO - Trans. Richard Nice. Cambridge: Harvard UP
Y1 - 1984
In such cases I changed the TY of the description, and put PU and PI instead of JO.
There were also cases where the SO field contained information on the publisher (PU and PI), as below:
PT J
TI Family and Social Network: Roles
AU Bott, Elizabeth
SO Norms, and External Relationships in Ordinary Urban Families, London: Tavistock Publications
PY 1957
ER
Such cases were changed into:
PT B
TI Family and Social Network: Roles
AU Bott, Elizabeth
SO Norms, and External Relationships in Ordinary Urban Families
PU Tavistock Publications
PI London
PY 1957
ER
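The second kind of correction above (publisher data at the end of the SO field) can be partly automated. The sketch below is a heuristic only, assuming the pattern "..., City: Publisher" at the end of SO. The regular expression and the function name are illustrative, and every proposed change should still be checked by hand, since a journal name containing a colon could match as well.

```python
import re

# ASSUMED pattern: "<rest of SO>, <City>: <Publisher>" at the end of the SO value.
PUBLISHER_TAIL = re.compile(
    r'^(?P<so>.*),\s*(?P<city>[^,:]+):\s*(?P<publisher>[^,:]+)$')

def fix_so(record):
    """record: dict tag -> value. Returns a corrected copy;
    the copy is unchanged if SO does not match the pattern."""
    fixed = dict(record)
    m = PUBLISHER_TAIL.match(record.get('SO', ''))
    if m:
        fixed.update(PT='B',
                     SO=m.group('so').strip(),
                     PU=m.group('publisher').strip(),
                     PI=m.group('city').strip())
    return fixed

bott = {
    'PT': 'J',
    'TI': 'Family and Social Network: Roles',
    'AU': 'Bott, Elizabeth',
    'SO': 'Norms, and External Relationships in Ordinary Urban Families, '
          'London: Tavistock Publications',
    'PY': '1957',
}
bott_fixed = fix_so(bott)
for tag in ('PT', 'TI', 'AU', 'SO', 'PU', 'PI', 'PY'):
    print(tag, bott_fixed[tag])
```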
Attached: ''Complete_RIS_final.txt'', ''Complete_RIS.WoS'', ''CompleteRISDataBase.txt'', ''CompleteWoSDataBase.txt''
===== August 9, 2018 =====
Subject Re[8]: SNA project\\
From Daria Maltseva
Thank you very much for your comments! We are having a summer school on Python this week, each day from 10 to 6, which is why I could not respond to your letter quickly. I have now looked through the bugs you identified and made some changes in the data file. I also looked through the file myself, found some other problems and fixed them, so it should be better now.
The file is in the attachment. There is also an Excel file with my comments on the bugs that you found. We still need to decide what to do with D. Boyd's work.
About the different editions of the same work: I thought that we could merge the nodes in the network file; however, this could cause problems for the main path analysis. Perhaps we could merge the nodes only to count the indegree, and treat them as different nodes in further analysis?
Attached: ''Vlado`s comments.xlsx'', ''Complete_RIS_final(2).txt''
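The idea from the message above (merging editions only for counting the indegree, while keeping them as separate nodes in the network) can be sketched as follows. This is a minimal Python illustration with hypothetical work labels; ''same_as'' stands for a manually prepared table that maps each edition to one chosen representative.

```python
from collections import Counter

def merged_indegree(arcs, same_as):
    """arcs: iterable of (citing, cited) pairs;
    same_as: dict mapping an edition to its representative work.
    A citing work is counted at most once per representative,
    even if it cites several editions of the same work."""
    pairs = {(u, same_as.get(v, v)) for u, v in arcs}
    return Counter(v for _, v in pairs)

# Hypothetical example: W(1994) and W(1997) are two editions of one book.
arcs = [('P1', 'W(1994)'), ('P2', 'W(1997)'),
        ('P3', 'W(1994)'), ('P3', 'W(1997)'),
        ('P1', 'X(2000)')]
same_as = {'W(1997)': 'W(1994)'}

plain = Counter(v for _, v in arcs)        # indegree in the unchanged network
merged = merged_indegree(arcs, same_as)    # indegree with editions combined
print(plain)
print(merged)
```

The list ''arcs'' itself is not modified, so the same network can still be used with the editions as different nodes in the main path analysis.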
===== September 13, 2018 =====
*** WoS2Pajek 1.5
by V. Batagelj, February 23, 2017 / March 23, 2007
WoS2Pajek parameters
WoS dir: C:\Users\batagelj\work\Python\WoS
ML dir: c:\Python27\Lib\site-packages\MontyLingua-2.1\Python
Proj dir: C:\Users\batagelj\work\Python\WoS
WoS file: C:/Users/batagelj/work/Python/WoS/SocNet/2018/sept/clean.WoS
MaxNum : 1500000
step : 1000
ISI name: 0
clean : True
keywords: [True, True, True, False]
titles : True
index : True
****** MontyLingua v.2.1 ******
***** by hugo@media.mit.edu *****
Lemmatiser OK!
Custom Lexicon Found! Now Loading!
Fast Lexicon Found! Now Loading!
Lexicon OK!
LexicalRuleParser OK!
ContextualRuleParser OK!
Commonsense OK!
Semantic Interpreter OK!
Loading Morph Dictionary!
*********************************
started: Thu Sep 13 11:47:40 2018
>>> Hits for the query
>>> TS="social network*" OR SO=(Social Networks)
>>> collected from Wos by Vladimir Batagelj, January 5, 2008
>>> and Daria Maltseva, June 1 2018
>>> On September 13 2018 info about terminals with indeg >= 150 added
>>> -----------------------------------------------------------
Common sense violated! Correcting...
...
Common sense violated! Correcting...
>>> End of hits
>>> Terminals from WoS
Common sense violated! Correcting...
Common sense violated! Correcting...
Common sense violated! Correcting...
>>> Terminals from RIS
70810 : XIANG_R(2010):981 - 2018-09-13 12:06:58.930000
>>> End of processing of WoS file
number of works = 1297260
number of authors = 395973
number of journals = 70425
number of keywords = 32409
number of records = 70810
number of duplicates = 15
clean WoS data : clean.WoS
works + titles : titles.csv
works index file: vtxIndex.txt
*** FILES:
year of publication partition: C:\Users\batagelj\work\Python\WoS\Year.clu
described / cited only partition: C:\Users\batagelj\work\Python\WoS\DC.clu
number of pages vector: C:\Users\batagelj\work\Python\WoS\NP.vec
citation network: C:\Users\batagelj\work\Python\WoS\Cite.net
works X journals network: C:\Users\batagelj\work\Python\WoS\WJ.net
works X keywords network: C:\Users\batagelj\work\Python\WoS\WK.net
works X authors network: C:\Users\batagelj\work\Python\WoS\WA.net
finished: Thu Sep 13 12:08:04 2018
time used: 0:20:24.819000
***
===== September 14, 2018 =====
http://vlado.fmf.uni-lj.si/dl/WoS/SN17_sep.zip http://vlado.fmf.uni-lj.si/dl/WoS/SN17new.zip
===== October 24, 2018 =====
[[.:sn:jeq|Equivalent journal names]]
[[.:sn:ner|Entity resolution]]
===== December 5, 2018 =====
[[http://vladowiki.fmf.uni-lj.si/lib/exe/fetch.php?media=pub:pdf:son_2018_500_original_v0.pdf|son_2018_500]]
===== January 21, 2019 =====
Tasks:
- corrections of the first paper
- the second and the third paper based on authors and journals
- submit abstracts to [[https://www.insna.org/|Sunbelt 2019]] - DONE
- Sunbelt workshop will be based on St Petersburg WS, but updated (6 hours)
- [[http://www.ifcs2019.gr/|IFCS]], Thessaloniki, (26-29 August) - topic clustering - abstract
- other conferences:
- [[https://www.eusn2019.ethz.ch/|EuSN]], Zurich (9-12 September) - abstract: Citation analysis
- [[|ARS]], Salerno/Amalfi (21-25 October)
- [[http://conferences.nib.si/AS2019/default.htm|AS]], Ribno (22-25 September)
- theoretical paper on bibliographic temporal networks
- [[twitter]]
===== December 2019 =====
Daša's stay in Ljubljana December 1-28, 2019 - Plan
Finish papers on SNA project:
* Journals - DONE, sent to Social Networks
* Keywords
* Temporal
* Fractional
* Authors: Collaboration
* Authors: Citation and Bibliographic coupling
Other papers:
* X-metrics --> Proofreading
* Mixed Methods
Conferences:
* NetGlow St.Petersburg
* SUNBELT Paris
* Sredin Seminar
WoS2Pajek: new version
* information on country, institution, keywords as phrases
* Scopus to WoS ?
* eLibrary to WoS ?
Russian data from WoS:
* Test downloading
* Ask Grisha about the code for data collection
===== To do =====
- data preprocessing: in selected fields replace substrings according to the dictionary patterns (regex ?); for example: convert phrases to "words" by replacing spaces with "nonbreakable" spaces.
- convert RIS <-> Scopus <-> WoS
- entity resolution: authors, journals
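The first item (turning phrases into "words") could look roughly like the following Python sketch. It assumes the dictionary is a plain list of phrases and that matching should ignore case and tolerate repeated spaces; the function names and the sample phrases are illustrative only.

```python
import re

NBSP = '\u00a0'   # "nonbreakable" space

def compile_phrases(phrases):
    """Build one regex for all phrases; longer phrases are tried first,
    so "social network analysis" wins over "social network"."""
    ordered = sorted(phrases, key=len, reverse=True)
    alternatives = [r'\s+'.join(re.escape(word) for word in p.split())
                    for p in ordered]
    return re.compile(r'\b(?:' + '|'.join(alternatives) + r')\b', re.IGNORECASE)

def join_phrases(text, regex):
    """Replace the whitespace inside every matched phrase with a single NBSP."""
    return regex.sub(lambda m: re.sub(r'\s+', NBSP, m.group(0)), text)

phrase_regex = compile_phrases(['social network', 'social network analysis'])
joined = join_phrases('Social network analysis of social  network data', phrase_regex)
print(joined.split(' '))   # the phrases now behave as single "words"
```

To restrict the replacement to selected fields, the function would be applied only to the values of those fields (for example TI or DE) after the records have been parsed.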
===== Files =====
Data: [[dl:WoS:SN17|WoS / SN17]]
* [[http://vlado.fmf.uni-lj.si/dl/WoS/SN17-sep.zip|SN17-sep]] 14. Sep 2018 (362M)
{{dl:pics:SN17-sep.png?700}}
* [[http://vlado.fmf.uni-lj.si/dl/WoS/SN17new.zip|SN17new]] 14. Sep 2018 (99M)
{{dl:pics:SN17new.png?700}}
===== URLs =====
* [[https://www.researchgate.net/publication/277434768_A_Bibliometric_Analysis_of_30_Years_of_Research_and_Theory_on_Corporate_Social_Responsibility_and_Corporate_Social_Performance|A_Bibliometric_Analysis_of_30_Years...]]
* [[https://www.nap.edu/download/25119|Understanding Narratives for National Security]], Proceedings of a Workshop
* [[http://revistacomsoc.pt/index.php/comsoc/article/view/2914/2819|Potentialities and limitations of network analysis methodologies]]
* [[https://www.researchgate.net/publication/327412613_Evolution_of_the_Scientific_Literature_on_Input-Output_Analysis_A_Bibliometric_Analysis_of_1990-2017/references|Evolution of the Scientific Literature on Input-Output Analysis]]
* [[https://www.researchgate.net/publication/220066075_A_bibliometric_investigation_of_research_performance_in_emerging_nanobiopharmaceuticals|A bibliometric investigation of research performance in emerging nanobiopharmaceuticals]]
* [[https://link.springer.com/article/10.1007/s11192-018-2917-1|An integrated approach to path analysis for weighted citation networks]]
* Olesia Iefremova, Kamil Wais, Marcin Kozak: Biographical articles in scientific literature: analysis of articles indexed in Web of Science. [[https://link.springer.com/article/10.1007/s11192-018-2923-3|Scientometrics]], pp 1–25. [[https://genderize.io/|genderize.io]], [[https://cran.r-project.org/web/packages/genderizeR/vignettes/tutorial.html|genderizeR]]
* Matthew Hutchinson, Katy Borner: [[https://www.researchgate.net/publication/322700802_Web_of_Science_as_a_Research_Dataset|Web of Science as a Research Dataset]], ISSI 2017
* Xiaohui Yao, Jingwen Yan, Michael Ginda, Li Shen: [[https://www.researchgate.net/publication/320820413_Mapping_longitudinal_scientific_progress_collaboration_and_impact_of_the_Alzheimer's_disease_neuroimaging_initiative|Mapping longitudinal scientific progress, collaboration and impact of the Alzheimer’s disease neuroimaging initiative]], November 2017, PLoS ONE 12(11):e0186095
* Xiaohui Yao, Jingwen Yan, Michael Ginda, Li Shen: [[https://www.researchgate.net/publication/321177407_S1_Dataset|S1 Dataset]], November 2017
* http://www.cwts.nl/pdf/CWTS_bibliometrics.pdf
* [[https://www.nature.com/polopoly_fs/1.17351!/menu/main/topColumns/topLeftColumn/pdf/520429a.pdf|The Leiden Manifesto]]
* [[books]]
* Sebastian Böll: A Scientometric Method to Analyze Scientific Journals as Exemplified by the Area of Information Science. Master thesis, Saarland University (Germany) [[http://eprints.rclis.org/3949/1/Boell%2C_Sebastian_K-2007-Master_Thesis-body.pdf|PDF]]