====== Social networks from WoS ======

===== June 1, 2018 =====

Daria collected the data for 2007-2018 from WoS and cleaned it.

===== June 2, 2018 =====

I merged the SN5 data with the new data and also added terminals (manual descriptions of important works without the CR field) from blockmodeling BM.WoS. The result is stored in [[http://vlado.fmf.uni-lj.si/dl/WoS/SN17.zip|SN17.zip]]. Then I processed it with WoS2Pajek.

<code>
*** WoS2Pajek 1.5 by V. Batagelj, February 23, 2017 / March 23, 2007
WoS2Pajek parameters
WoS dir: C:\Users\batagelj\work\Python\WoS
ML dir: c:\Python27\Lib\site-packages\MontyLingua-2.1\Python
Proj dir: C:/Users/batagelj/work/Python/WoS/SocNet/2018
WoS file: C:/Users/batagelj/work/Python/WoS/SocNet/2018/WoS/SN17.WoS
MaxNum : 1500000
step : 1000
ISI name: 0
clean : True
keywords: [True, True, True, False]
titles : True
index : True
started: Sat Jun 2 15:50:39 2018
>>> Terminals
>>> End of processing of WoS file
number of works = 1291899
number of authors = 394995
number of journals = 70250
number of keywords = 32303
number of records = 71476
number of duplicates = 1382
finished: Sat Jun 2 16:10:01 2018
time used: 0:19:22.009000
</code>

Computing the indegree vector in the citation network ''cite'' and extracting the nodes with DC=0 (only cited), we get a [[ru:dm:sn:clean:f1|list]] of frequently cited works with 'missing' descriptions.

<code>
read Cite.net
Network/Create vector/centrality/degree/input
read DC.clu
Operations/Vector+Partition/Extract Subvector [0]
Operations/Network+Partition/Extract/Subnetwork induced by union [0]
Info button-Vector [+1000][10]
</code>

===== June 5, 2018 =====

  * add to the data set papers from SNA related [[pajek:info:journals|journals]].
  * in the abstract for [[http://conferences.nib.si/AS2018/|Applied statistics]] include some [[..:dm:sn:th|theoretical background]].
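
The June 2 step (indegree in ''cite'', restricted to the nodes with DC=0) can also be sketched outside Pajek. This is only a minimal Python sketch, not the procedure actually used: it assumes that the citation network lists its links in a plain ''*Arcs'' section where an arc ''u v'' means "u cites v", and that the partition file has one value per vertex (0 = cited only). The function names and the toy data are hypothetical.

```python
from collections import Counter

def parse_net(text):
    """Read vertex labels and arcs from a minimal Pajek .net text (assumed layout)."""
    labels, arcs, section = {}, [], None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue
        if line.startswith("*"):
            section = line.split()[0].lower()   # e.g. "*vertices", "*arcs"
            continue
        if section == "*vertices":
            num, rest = line.split(None, 1)
            # the label is the text between the first pair of double quotes
            labels[int(num)] = rest.split('"')[1] if '"' in rest else rest.split()[0]
        elif section == "*arcs":
            parts = line.split()
            arcs.append((int(parts[0]), int(parts[1])))
    return labels, arcs

def parse_clu(text):
    """Read a Pajek .clu partition: one integer per vertex after the header line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return [int(ln) for ln in lines if not ln.startswith("*")]

def cited_only(labels, arcs, dc, min_indeg=1):
    """Cited-only works (DC=0) with indegree >= min_indeg, most cited first."""
    indeg = Counter(v for _, v in arcs)
    rows = [(labels[v], d) for v, d in indeg.items()
            if dc[v - 1] == 0 and d >= min_indeg]
    return sorted(rows, key=lambda r: (-r[1], r[0]))

# Toy data (hypothetical): vertex 3 is cited three times but has no description.
NET = """*Vertices 4
1 "A(2010)"
2 "B(2012)"
3 "BOTT_E(1957)"
4 "C(2015)"
*Arcs
1 3 1
2 3 1
4 3 1
2 1 1
4 2 1
"""
CLU = """*Vertices 4
1
1
0
1
"""

labels, arcs = parse_net(NET)
print(cited_only(labels, arcs, parse_clu(CLU), min_indeg=2))
```

On the real files the threshold would play the role of the cut-off used for the list of 'missing' descriptions; a file that stores links in another section type would need an extra branch in ''parse_net''.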
===== June 6, 2018 =====

Daria collected additional papers from WoS:

  * "social network*" - since 2007 + Journals "Social Networks" - since 2007 = **576**
  * all papers from the journals: Network Science, Cambridge UP; Computational Social Networks, Springer; Applied Network Science, Springer; Social Network Analysis and Mining, Springer; Online Social Networks and Media, Elsevier; Journal of Complex Networks, Oxford UP; Journal of Social Structure, CMU; Connections, INSNA. = **431**

===== June 12, 2018 =====

Daria's data were included in the [[http://vlado.fmf.uni-lj.si/dl/WoS/clean_jun12.zip|WoS file]]. A new version of the [[http://vlado.fmf.uni-lj.si/dl/WoS/SN17_jun12.zip|networks]] was produced. A new [[ru:dm:sn:clean:f1|list]] of frequently cited works with 'missing' descriptions was created.

===== July 27, 2018 =====

Дарья Мальцева wrote:

> Thank you very much for the information. I have some questions about the process, though.
>
> By the link that you have sent the information on BibTeX format is provided. But when we download
> data from WoS, we use "other file format option". Do you mean that for those articles that do not
> have descriptions:
> 1) we should at first collect the data in BibTeX format, and then transform all data set into "other" format,
> OR
> 2) we can use BibTeX format as a source for data, and fill in the fields that are presented after the
> "\*\*Terminals" manually using these data?

I collected the BibTeX sources for other purposes - getting BibTeX descriptions for references in my documents prepared in LaTeX. But they can also provide missing data for some entries from our SNA list. There are different options:

  * some sources provide export of reference data in different formats - the closest for our SNA needs is RIS
  * BibTeX data can be transformed into RIS using different programs. I tested JabRef - it produces some additional empty lines. RIS format can be converted into WoS using simple text replacement in some text editor - I can also write some lines in R to do this job.
  * BibTeX data can be transformed into WoS manually - takes a lot of time.

I added some additional items to [[http://vladowiki.fmf.uni-lj.si/doku.php?id=pro:bib:citr|bib cite resources]].

> In general, is it the correct structure of the fields?
>
> Books:

<code>
SK IP - (What is that?)
PT - Publication Type (J=Journal; B=Book; S=Series; P=Patent) B
AU - Authors
AF - Author Full Name
PY - Year Published
TI - Document Title
PI - Publisher City
PU - Publisher
ER - End of Record
</code>

> Journals:

<code>
SK IP - (What is that?)
PT - Publication Type (J=Journal; B=Book; S=Series; P=Patent) J
AU - Authors
AF - Author Full Name
PY - Year Published
TI - Document Title
SO - Publication Name (= name of the Journal)
VL - Volume
IS - Issue
BP - Beginning Page
EP - Ending Page
ER - End of Record
</code>

In principle it is OK.\\
''SK IP'' is a line introduced by WoS2Pajek - a replacement for an empty line. You can use an empty line instead. To improve the quality of the data, when the title is not specific enough, I would also add keywords in the format

<code>
DE kw1; kw2; kw3; kw4; kw5
</code>

===== July 28, 2018 =====

For converting a BibTeX bibliography into WoS format see [[https://github.com/bavla/biblio/blob/master/BibTeX2Pajek/|bib to WoS]].

===== August 3, 2018 =====

We've finished the collection of additional article descriptions for our SNA data set :) Once again, all the works which are supposed to be included are listed here: [[https://docs.google.com/spreadsheets/d/14IEzkR0STg_R1_Gu3SVk8nXmNgR1iz21zc740J1YJ2o/edit?usp=sharing|Google docs]]. There are 169 WoS descriptions and 171 RIS descriptions, in total = 340 publications. I've put all the separate files into 2 files with WoS and RIS data sets. They are in the attachment. Then I also converted the RIS file into WoS (one more file) and then I checked the data and manually made some changes (one more file). So, 4 files in total.
The changes I've made are connected to some mistakes in the RIS descriptions. Quite often a BOOK was registered as a JOURnal there (as in the example below).

<code>
ER -
TY - JOUR
T1 - Distinction: A social critique of taste
A1 - Bourdieu, Pierre
JO - Trans. Richard Nice. Cambridge: Harvard UP
Y1 - 1984
</code>

In such cases I changed the TY of the description, and put PU and PI instead of JO. There were also cases when the SO field contained information on the Publisher (PU and PI), like below:

<code>
PT J
TI Family and Social Network: Roles
AU Bott, Elizabeth
SO Norms, and External Relationships in Ordinary Urban Families, London: Tavistock Publications
PY 1957
ER
</code>

Such cases were changed into:

<code>
PT B
TI Family and Social Network: Roles
AU Bott, Elizabeth
SO Norms, and External Relationships in Ordinary Urban Families
PU Tavistock Publications
PI London
PY 1957
ER
</code>

Attached: ''Complete_RIS_final.txt'', ''Complete_RIS.WoS'', ''CompleteRISDataBase.txt'', ''CompleteWoSDataBase.txt''

===== August 9, 2018 =====

Subject Re[8]: SNA project\\
From Дарья Мальцева

Thank you very much for your comments! We are having a summer school on Python this week - each day from 10 to 6, that's why I could not respond to your letter very fast. But finally I looked through the bugs identified by you and made some changes in the data file. I also looked through the file by myself, found some other problems and fixed them, so now it should be better. The file is in the attachment. There is also an Excel file with my comments on the bugs that you found.

We still need to understand what to do with the D.Boyd work. About the different editions of the same work - I thought that we could merge the nodes in the network file; however, then we can have some problems making the Main path analysis. Probably, we could merge the nodes only to count indegree, and then regard them as different nodes in further analysis?

Attached: ''Vlado`s comments.xlsx'', ''Complete_RIS_final(2).txt''

===== September 13, 2018 =====

<code>
*** WoS2Pajek 1.5 by V. Batagelj, February 23, 2017 / March 23, 2007
WoS2Pajek parameters
WoS dir: C:\Users\batagelj\work\Python\WoS
ML dir: c:\Python27\Lib\site-packages\MontyLingua-2.1\Python
Proj dir: C:\Users\batagelj\work\Python\WoS
WoS file: C:/Users/batagelj/work/Python/WoS/SocNet/2018/sept/clean.WoS
MaxNum : 1500000
step : 1000
ISI name: 0
clean : True
keywords: [True, True, True, False]
titles : True
index : True
****** MontyLingua v.2.1 ******
***** by hugo@media.mit.edu *****
Lemmatiser OK!
Custom Lexicon Found! Now Loading!
Fast Lexicon Found! Now Loading!
Lexicon OK!
LexicalRuleParser OK!
ContextualRuleParser OK!
Commonsense OK!
Semantic Interpreter OK!
Loading Morph Dictionary!
*********************************
started: Thu Sep 13 11:47:40 2018
>>> Hits for the query
>>> TS="social network*" OR SO=(Social Networks)
>>> collected from Wos by Vladimir Batagelj, January 5, 2008
>>> and Daria Maltseva, June 1 2018
>>> On September 13 2018 info about terminals with indeg >= 150 added
>>> -----------------------------------------------------------
Common sense violated! Correcting...
...
Common sense violated! Correcting...
>>> End of hits
>>> Terminals from WoS
Common sense violated! Correcting...
Common sense violated! Correcting...
Common sense violated! Correcting...
>>> Terminals from RIS
70810 : XIANG_R(2010):981 - 2018-09-13 12:06:58.930000
>>> End of processing of WoS file
number of works = 1297260
number of authors = 395973
number of journals = 70425
number of keywords = 32409
number of records = 70810
number of duplicates = 15
clean WoS data : clean.WoS
works + titles : titles.csv
works index file: vtxIndex.txt
*** FILES:
year of publication partition: C:\Users\batagelj\work\Python\WoS\Year.clu
described / cited only partition: C:\Users\batagelj\work\Python\WoS\DC.clu
number of pages vector: C:\Users\batagelj\work\Python\WoS\NP.vec
citation network: C:\Users\batagelj\work\Python\WoS\Cite.net
works X journals network: C:\Users\batagelj\work\Python\WoS\WJ.net
works X keywords network: C:\Users\batagelj\work\Python\WoS\WK.net
works X authors network: C:\Users\batagelj\work\Python\WoS\WA.net
finished: Thu Sep 13 12:08:04 2018
time used: 0:20:24.819000
***
</code>

===== September 14, 2018 =====

  * http://vlado.fmf.uni-lj.si/dl/WoS/SN17_sep.zip
  * http://vlado.fmf.uni-lj.si/dl/WoS/SN17new.zip

===== October 24, 2018 =====

  * [[.:sn:jeq|Equivalent journal names]]
  * [[.:sn:ner|Entity resolution]]

===== December 5, 2018 =====

[[http://vladowiki.fmf.uni-lj.si/lib/exe/fetch.php?media=pub:pdf:son_2018_500_original_v0.pdf|son_2018_500]]

===== January 21, 2019 =====

Tasks:

  - corrections of the first paper
  - the second and the third paper based on authors and journals
  - submit abstracts to [[https://www.insna.org/|Sunbelt 2019]] - DONE
  - Sunbelt workshop will be based on St Petersburg WS, but updated (6 hours)
  - [[http://www.ifcs2019.gr/|IFCS]], Thessaloniki, (26-29 August) - topic clustering - abstract
  - other conferences:
    - [[https://www.eusn2019.ethz.ch/|EuSN]], Zurich (9-12 September) - abstract: Citation analysis
    - [[|ARS]], Salerno/Amalfi (21-25 October)
    - [[http://conferences.nib.si/AS2019/default.htm|AS]], Ribno (22-25 September) - theoretical paper on bibliographic temporal networks
  - [[twitter]]

===== December 2019 =====

Daša's stay in Ljubljana
December 1-28, 2019 - Plan

Finish papers on SNA project:

  * Journals - DONE, sent to Social Networks
  * Keywords
  * Temporal
  * Fractional
  * Authors: Collaboration
  * Authors: Citation and Bibliographic coupling

Other papers:

  * X-metrics --> Proofreading
  * Mixed Methods

Conferences:

  * NetGlow St.Petersburg
  * SUNBELT Paris
  * Sredin Seminar

WoS2Pajek: new version

  * information on country, institution, keywords as phrases
  * Scopus to WoS ?
  * eLibrary to WoS ?

Russian data from WoS:

  * Test downloading
  * Ask Grisha about the code for data collection

===== To do =====

  - data preprocessing: in selected fields replace substrings according to the dictionary patterns (regex ?); for example: convert phrases to "words" by replacing spaces with "nonbreakable" spaces.
  - convert RIS <-> Scopus <-> WoS
  - entity resolution: authors, journals

===== Files =====

Data: [[dl:WoS:SN17|WoS / SN17]]

  * [[http://vlado.fmf.uni-lj.si/dl/WoS/SN17-sep.zip|SN17-sep]] 14. Sep 2018 (362M) {{dl:pics:SN17-sep.png?700}}
  * [[http://vlado.fmf.uni-lj.si/dl/WoS/SN17new.zip|SN17new]] 14. Sep 2018 (99M) {{dl:pics:SN17new.png?700}}

===== URLs =====

  * [[https://www.researchgate.net/publication/277434768_A_Bibliometric_Analysis_of_30_Years_of_Research_and_Theory_on_Corporate_Social_Responsibility_and_Corporate_Social_Performance|A_Bibliometric_Analysis_of_30_Years...]]
  * [[https://www.nap.edu/download/25119|Understanding Narratives for National Security]], Proceedings of a Workshop
  * [[http://revistacomsoc.pt/index.php/comsoc/article/view/2914/2819|Potentialities and limitations of network analysis methodologies]]
  * [[https://www.researchgate.net/publication/327412613_Evolution_of_the_Scientific_Literature_on_Input-Output_Analysis_A_Bibliometric_Analysis_of_1990-2017/references|Evolution of the Scientific Literature on Input-Output Analysis]]
  * [[https://www.researchgate.net/publication/220066075_A_bibliometric_investigation_of_research_performance_in_emerging_nanobiopharmaceuticals|A bibliometric investigation of research performance in emerging nanobiopharmaceuticals]]
  * [[https://link.springer.com/article/10.1007/s11192-018-2917-1|An integrated approach to path analysis for weighted citation networks]]
  * Olesia Iefremova, Kamil Wais, Marcin Kozak: Biographical articles in scientific literature: analysis of articles indexed in Web of Science. [[https://link.springer.com/article/10.1007/s11192-018-2923-3|Scientometrics]], pp 1–25. [[https://genderize.io/|genderize.io]], [[https://cran.r-project.org/web/packages/genderizeR/vignettes/tutorial.html|genderizeR]]
  * Matthew Hutchinson, Katy Börner: [[https://www.researchgate.net/publication/322700802_Web_of_Science_as_a_Research_Dataset|Web of Science as a Research Dataset]], ISSI 2017
  * Xiaohui Yao, Jingwen Yan, Michael Ginda, Li Shen: [[https://www.researchgate.net/publication/320820413_Mapping_longitudinal_scientific_progress_collaboration_and_impact_of_the_Alzheimer's_disease_neuroimaging_initiative|Mapping longitudinal scientific progress, collaboration and impact of the Alzheimer’s disease neuroimaging initiative]], November 2017, PLoS ONE 12(11):e0186095
  * Xiaohui Yao, Jingwen Yan, Michael Ginda, Li Shen: [[https://www.researchgate.net/publication/321177407_S1_Dataset|S1 Dataset]], November 2017
  * http://www.cwts.nl/pdf/CWTS_bibliometrics.pdf
  * [[https://www.nature.com/polopoly_fs/1.17351!/menu/main/topColumns/topLeftColumn/pdf/520429a.pdf|The Leiden Manifesto]]
  * [[books]]
  * Sebastian Böll: A Scientometric Method to Analyze Scientific Journals as Exemplified by the Area of Information Science. Master thesis, Saarland University (Germany) [[http://eprints.rclis.org/3949/1/Boell%2C_Sebastian_K-2007-Master_Thesis-body.pdf|PDF]]
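
Returning to the To do item ''convert RIS <-> Scopus <-> WoS'' and to the July 27 remark that RIS can be turned into WoS by simple text replacement: the idea could look roughly like the Python sketch below. It is only an illustration under stated assumptions, not the converter linked under July 28. The tag table is hypothetical and covers just the RIS tags quoted on this page (TY, T1, A1, JO, Y1, ER); other RIS tags are silently dropped, and several authors are written as separate AU lines rather than WoS continuation lines.

```python
# Hypothetical mapping, limited to the RIS and WoS tags quoted on this page.
RIS_TO_WOS = {"T1": "TI", "A1": "AU", "JO": "SO", "Y1": "PY"}
RIS_TYPES = {"JOUR": "J", "BOOK": "B"}

def ris_to_wos(ris_text):
    """Rewrite 'TAG - value' RIS lines as 'TAG value' WoS lines (sketch only)."""
    out = []
    for line in ris_text.splitlines():
        tag, sep, value = line.partition("-")   # split at the first hyphen
        tag, value = tag.strip(), value.strip()
        if not sep or not tag:
            continue                            # blank or malformed line
        if tag == "TY":
            out.append("PT " + RIS_TYPES.get(value, value))
        elif tag == "ER":
            out.append("ER")
            out.append("")                      # empty line between records
        elif tag in RIS_TO_WOS:
            out.append(RIS_TO_WOS[tag] + " " + value)
    return "\n".join(out)

# The Bourdieu record from the August 3 notes, with TY already corrected to BOOK.
RIS = """TY - BOOK
T1 - Distinction: A social critique of taste
A1 - Bourdieu, Pierre
Y1 - 1984
ER -
"""

print(ris_to_wos(RIS))
```

The manual fixes described on August 3 (a JO field that really holds publisher and city) are not automated here; such records would still need to be checked by hand.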