====== Important bibliographic subnetworks ======

March 2022

Construction of important bibliographic subnetworks from a collection of bibliographic networks obtained from WoS using [[https://github.com/bavla/biblio/blob/master/WoS2Pajek/WoS2Pajek14.pdf|WoS2Pajek]].

===== SocNet 2018 networks =====

For illustration, we use the 2018 collection on network analysis: [[https://link.springer.com/article/10.1007/s11192-019-03193-x|paper]], [[https://github.com/bavla/SocNet/tree/master/nets|files]].

^ Network ^ # Nodes (sum) ^ # Mode 1 ^ # Mode 2 ^ # Arcs ^
| CiteN | 1,297,133 | | | 2,753,633 |
| CiteR | 70,792 | | | 398,199 |
| WAn | 1,693,104 | 1,297,133 | 395,971 | 1,442,240 |
| WAr | 163,803 | 70,792 | 93,011 | 215,901 |
| WKn | 1,329,542 | 1,297,133 | 32,409 | 1,167,666 |
| WKr | 103,201 | 70,792 | 32,409 | 1,167,666 |
| WJn | 1,366,279 | 1,297,133 | 69,146 | 720,044 |
| WJr | 79,735 | 70,792 | 8,943 | 61,741 |

===== Co - First Co-authorship network =====

**Co** = **WAr**<sup>T</sup> * **WAr**

Co[a,b] = # works that authors a and b co-authored

Co[a,a] = # works of author a

Problem: **Co** is a sum of complete subgraphs spanned on the co-authors of each work. Works with a large number of co-authors blur the picture.

  read WAr
  info
  2-Mode Network: Rows=70792, Cols=93011
  Network/Create vector/centrality/degree/input
  Info vector [+200]    % chinese
  Network/2-mode/transpose
  select WAr as second
  Networks/multiply [yes]
  Network/create new/transform/remove/loops [yes]
  Network/create new/transform/arcs->edges/bidirected/min [no]
  File/Network/change label [Co]
  Network/Info/Line values

            Line Values          Frequency    Freq%    CumFreq   CumFreq%
  ----------------------------------------------------------------------------
  (         ...     1.0000]         251440  86.1772     251440    86.1772
  (  1.0000 ...     2.0000]          27072   9.2785     278512    95.4557
  (  2.0000 ...     3.0000]           7252   2.4855     285764    97.9412
  (  3.0000 ...     4.0000]           2895   0.9922     288659    98.9334
  (  4.0000 ...     5.0000]           1350   0.4627     290009    99.3961
  (  5.0000 ...     6.0000]            720   0.2468     290729    99.6429
  (  6.0000 ...     7.0000]            329   0.1128     291058    99.7556
  (  7.0000 ...     8.0000]            225   0.0771     291283    99.8327
  (  8.0000 ...     9.0000]            143   0.0490     291426    99.8818
  (  9.0000 ...    10.0000]             86   0.0295     291512    99.9112
  ( 10.0000 ...    11.0000]             58   0.0199     291570    99.9311
  ( 11.0000 ...    12.0000]             57   0.0195     291627    99.9506
  ( 12.0000 ...    13.0000]             32   0.0110     291659    99.9616
  ( 13.0000 ...    14.0000]             23   0.0079     291682    99.9695
  ( 14.0000 ...    15.0000]             17   0.0058     291699    99.9753
  ( 15.0000 ...    16.0000]              9   0.0031     291708    99.9784
  ( 16.0000 ...    17.0000]              9   0.0031     291717    99.9815
  ( 17.0000 ...    18.0000]              5   0.0017     291722    99.9832
  ( 18.0000 ...    19.0000]              8   0.0027     291730    99.9859
  ( 19.0000 ...    20.0000]              4   0.0014     291734    99.9873
  ( 20.0000 ...    21.0000]              1   0.0003     291735    99.9877
  ( 21.0000 ...    22.0000]             13   0.0045     291748    99.9921
  ( 22.0000 ...    23.0000]              4   0.0014     291752    99.9935
  ( 23.0000 ...    24.0000]              2   0.0007     291754    99.9942
  ( 24.0000 ...    25.0000]              3   0.0010     291757    99.9952
  ( 25.0000 ...    26.0000]              1   0.0003     291758    99.9955
  ( 26.0000 ...    27.0000]              3   0.0010     291761    99.9966
  ( 27.0000 ...    28.0000]              1   0.0003     291762    99.9969
  ( 28.0000 ...    29.0000]              1   0.0003     291763    99.9973
  ( 29.0000 ...    30.0000]              1   0.0003     291764    99.9976
  ( 30.0000 ...    31.0000]              3   0.0010     291767    99.9986
  ( 31.0000 ...    32.0000]              1   0.0003     291768    99.9990
  ( 32.0000 ...    33.0000]              1   0.0003     291769    99.9993
  ( 33.0000 ...    34.0000]              0   0.0000     291769    99.9993
  ( 34.0000 ...    35.0000]              0   0.0000     291769    99.9993
  ( 35.0000 ...    36.0000]              0   0.0000     291769    99.9993
  ( 36.0000 ...    37.0000]              0   0.0000     291769    99.9993
  ( 37.0000 ...    38.0000]              1   0.0003     291770    99.9997
  ( 38.0000 ...    39.0000]              0   0.0000     291770    99.9997
  ( 39.0000 ...    40.0000]              0   0.0000     291770    99.9997
  ( 40.0000 ...    41.0000]              0   0.0000     291770    99.9997
  ( 41.0000 ...    42.0000]              0   0.0000     291770    99.9997
  ( 42.0000 ...    43.0000]              1   0.0003     291771   100.0000
  ----------------------------------------------------------------------------

The most active co-authors.
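
As a rough illustration of how **Co** is built and how a line-value cut picks out the most active co-authors, here is a minimal NumPy sketch on a toy matrix. It is not the Pajek procedure itself: the matrix ''WA'', the helper ''edge_cut'' and the threshold 2 are hypothetical stand-ins for **WAr** and the cut values used on this page.

```python
import numpy as np

# Toy binary works x authors matrix (hypothetical data, not taken from WAr):
# work 0 by authors 0,1; work 1 by authors 0,1,2; work 2 by author 3 alone.
WA = np.array([[1, 1, 0, 0],
               [1, 1, 1, 0],
               [0, 0, 0, 1]])

# Co = WA^T * WA: Co[a,b] counts the works co-authored by a and b,
# and the diagonal Co[a,a] counts the works of author a.
Co = WA.T @ WA
works_per_author = np.diag(Co).copy()

def edge_cut(W, threshold):
    """Remove loops and all lines with value lower than threshold."""
    W = W.copy()
    np.fill_diagonal(W, 0)
    W[W < threshold] = 0
    return W

# Keep only pairs with at least 2 joint works, then keep the authors
# that still have at least one line (degree >= 1 after the cut).
cut = edge_cut(Co, 2)
active = np.flatnonzero(cut.sum(axis=0) > 0)

print(Co)
print(active)
```

On the real data a sparse matrix would be needed; the dense array is used only to keep the sketch short.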
  % edge cut at 10
  Network/create new/transform/remove/lines with value/lower than [10][yes]
  Network/create partition/degree/all
  File/partition/change label [cut 10]
  Operations/network+partition/extract/subnetwork [1-*]
  Draw/network+first partition

The result consists mostly of small components. To get more structure we must lower the threshold.

  Select Co
  Network/create new/transform/remove/lines with value/lower than [5][yes]
  Network/create partition/components/weak [5]
  Partition/canonical/decreasing
  Info partition
  Operations/network+partition/extract/subnetwork [1-*]
  Draw/network+first partition

The largest component (661 nodes) consists mainly of Chinese authors. We exclude them.

  Select canonical partition
  Partition/binarize [2-*]
  Select partition cut 10 as second
  Partitions max
  Select Co
  Operations/network+partition/extract/subnetwork [1-*]
  Network/create partition/components/weak [1]
  Draw/network+first partition

The result is a large component with the main authors.

Another option is to make a list (partition) of "interesting" authors (for example, SNA) and extract the corresponding subnetwork from **Co**.

===== Ct' - Newman's strict co-authorship =====

Works with many co-authors are overrepresented in the network **Co** (they contribute more to the total weight). To make the contributions of all works equal, we use a [[https://link.springer.com/article/10.1007/s11192-020-03383-y|fractional approach]]: normalization.
Normalization of a (2-mode) binary network **N**, n[w,a] ∈ {0,1} (Pajek [[https://github.com/bavla/biblio/tree/master/Pajek/macro|macros]]):

n(**N**)[w,a] = n[w,a]/max(1,outdeg(w))

n'(**N**)[w,a] = n[w,a]/max(1,outdeg(w)-1)

Ct' = n(**WAr**)<sup>T</sup> * n'(**WAr**)

  Select WAr
  Macro/play/ norm2p [70792] -> n'(WAr)
  Select WAr
  Run macro norm2 [70792]
  Network/2-mode/transpose
  Select n'(WAr) as second
  Networks/Multiply [yes]
  Network/create new/transform/remove/loops [yes]
  Network/create new/transform/arcs->edges/bidirected/sum [no]
  File/Network/change label [Ct']
  Network/Info/Line values [#25]

            Line Values          Frequency    Freq%    CumFreq   CumFreq%
  ---------------------------------------------------------------------------
  (         ...     0.0001]           7861   2.6942       7861     2.6942
  (  0.0001 ...     1.0348]         278858  95.5743     286719    98.2685
  (  1.0348 ...     2.0696]           3943   1.3514     290662    99.6199
  (  2.0696 ...     3.1043]            751   0.2574     291413    99.8773
  (  3.1043 ...     4.1390]            216   0.0740     291629    99.9513
  (  4.1390 ...     5.1737]             58   0.0199     291687    99.9712
  (  5.1737 ...     6.2084]             44   0.0151     291731    99.9863
  (  6.2084 ...     7.2432]             23   0.0079     291754    99.9942
  (  7.2432 ...     8.2779]              7   0.0024     291761    99.9966
  (  8.2779 ...     9.3126]              3   0.0010     291764    99.9976
  (  9.3126 ...    10.3473]              3   0.0010     291767    99.9986
  ( 10.3473 ...    11.3820]              1   0.0003     291768    99.9990
  ( 11.3820 ...    12.4167]              0   0.0000     291768    99.9990
  ( 12.4167 ...    13.4515]              0   0.0000     291768    99.9990
  ( 13.4515 ...    14.4862]              0   0.0000     291768    99.9990
  ( 14.4862 ...    15.5209]              1   0.0003     291769    99.9993
  ( 15.5209 ...    16.5556]              0   0.0000     291769    99.9993
  ( 16.5556 ...    17.5903]              1   0.0003     291770    99.9997
  ( 17.5903 ...    18.6250]              0   0.0000     291770    99.9997
  ( 18.6250 ...    19.6597]              0   0.0000     291770    99.9997
  ( 19.6597 ...    20.6945]              0   0.0000     291770    99.9997
  ( 20.6945 ...    21.7292]              0   0.0000     291770    99.9997
  ( 21.7292 ...    22.7639]              0   0.0000     291770    99.9997
  ( 22.7639 ...    23.7986]              0   0.0000     291770    99.9997
  ( 23.7986 ...    24.8333]              1   0.0003     291771   100.0000
  ---------------------------------------------------------------------------

Because the weights are fractional, the threshold for edge cuts is now much lower.
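
The normalization and the product Ct' = n(**WAr**)<sup>T</sup> * n'(**WAr**) defined in this section can be sketched in NumPy as follows. This is only a toy illustration under stated assumptions, not the Pajek macros ''norm2'' and ''norm2p'': the function name ''newman_coauthorship'' and the small matrix ''WA'' are hypothetical, and a dense array replaces the sparse network.

```python
import numpy as np

def newman_coauthorship(WA):
    """Edge weights of Ct' from a binary works x authors matrix WA.

    Follows the definitions on this page:
      n[w,a]  = WA[w,a] / max(1, outdeg(w))
      n'[w,a] = WA[w,a] / max(1, outdeg(w) - 1)
      Ct'     = n^T * n', loops removed, arcs -> edges by sum.
    """
    WA = np.asarray(WA, dtype=float)
    outdeg = WA.sum(axis=1, keepdims=True)      # number of authors per work
    n = WA / np.maximum(1, outdeg)
    n_prime = WA / np.maximum(1, outdeg - 1)
    Ct = n.T @ n_prime
    np.fill_diagonal(Ct, 0.0)                   # remove loops
    return Ct + Ct.T                            # bidirected arcs -> edge, sum

# Toy data (hypothetical): work 0 by authors 0,1; work 1 by authors 0,1,2;
# work 2 by author 3 alone.
WA = np.array([[1, 1, 0, 0],
               [1, 1, 1, 0],
               [0, 0, 0, 1]])
Ct = newman_coauthorship(WA)

# Each edge appears twice in the symmetric matrix, so sum the upper triangle.
total = np.triu(Ct).sum()
print(Ct.round(4))
print(total)
```

In this sketch every work with at least two authors contributes a total weight of 1 to the edges, however many co-authors it has, which is the equalizing effect the normalization is meant to have.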
  Ct' edge cut 4
  Ct' edge cut 2
  cut 2 components >= 5
  union
  Ct' extract union
  Partition weak
  Draw

Partition for extraction of selected components.

  select union
  Partition binarize [1-*]
  Operations/network+partition/transform/remove lines/between clusters
  Operations/network+partition/transform/remove lines/between two clusters [0][0]
  Network/create partition/components/weak [2]
  Partition/canonical/decreasing
  !!! for extracting selected components
  Info partition
  Select Ct'
  Operations/network+partition/extract/subnetwork [1]
  Draw

Partition for extraction of the main component.

  Select canonical
  Partition binarize [1]
  Info
  save partition to Main.clu

===== Jaccard weights using Pajek =====

April 6, 2022

The steps below compute Jaccard[e,f] = Co[e,f] / (Co[e,e] + Co[f,f] - Co[e,f]) from the network **Co** with loops.

  select Co (with loops)
  Network/Create vector/Get loops
  Operations/Network+vector/Transform/Vector -> Line value/initial
  File/Network/Change label [Co[e,e]]
  select Co (with loops)
  Operations/Network+vector/Transform/Vector -> Line value/terminal
  File/Network/Change label [Co[f,f]]
  Select Co[e,e] as Second
  Networks/Cross intersection/add
  select Co (with loops) as Second
  Networks/Cross intersection/subtract
  Select Last as Second
  select Co (with loops) as First
  Networks/Cross intersection/divide
  Network/Create new network/Transform/Remove/loops
  Network/Create new network/Transform/Arcs -> edges/bidirected/min
  File/Network/Change label [Jaccard]

===== Temporal co-authorship networks =====

[[https://www.sciencedirect.com/science/article/pii/S1751157719301439|paper]], [[https://github.com/bavla/SocNet/wiki/SetUp|SetUp]]

  # local directories: Nets library, working dir, network data, charts
  gdir = 'C:/Users/vlado/work/Python/graph/Nets'
  wdir = 'C:/Users/vlado/work/Python/WoS/SocNet/2022'
  ndir = 'C:/Users/vlado/work/Python/WoS/SocNet/2022'
  cdir = 'C:/Users/vlado/work/Python/graph/Nets/chart'
  import sys, os, datetime, json
  sys.path = [gdir]+sys.path; os.chdir(wdir)
  from TQ import *
  from Nets import Network as N
  net = ndir+"/WAr.net"
  clu = ndir+"/YearR.clu"
  t1 = datetime.datetime.now(); print("started: ",t1.ctime(),"\n")
  # cumulative temporal network
  WAc = N.twoMode2netsJSON(clu,net,"WAcum.json",instant=False)
  t2 = datetime.datetime.now()
  print("\nconverted to cumulative TN: ",t2.ctime(),"\ntime used: ", t2-t1)
  # instantaneous temporal network
  WAi = N.twoMode2netsJSON(clu,net,"WAins.json",instant=True)
  t3 = datetime.datetime.now()
  print("\nconverted to instantaneous TN: ",t3.ctime(),"\ntime used: ", t3-t2)
  ia = WAi.Index()
  # temporal co-authorship network from the instantaneous network
  Co = WAi.TQtwo2oneCols()
  Co.saveNetsJSON("CoIns.json",indent=2)
  Co.delLoops()
  C = Co.TQtopLinks(thresh=15)
  len(C)
  C[0]
  C[1]
  C[2]
  # charts of the temporal quantities of two selected links
  tit = C[0][2]+" - "+C[0][3]; bd = C[0][5]
  TQmax = 15; Tmin = 2000; Tmax = 2017; w = 600; h = 150
  N.TQshow(bd,cdir,TQmax,Tmin,Tmax,w,h,tit,fill="red")
  tit = C[2][2]+" - "+C[2][3]; ra = C[2][5]
  TQmax = 10; Tmin = 1996; Tmax = 2017; w = 600; h = 150
  N.TQshow(ra,cdir,TQmax,Tmin,Tmax,w,h,tit,fill="blue")
  TQ.total(bd), TQ.total(ra)

We can also compute fractional temporal co-authorship networks; see [[https://github.com/bavla/SocNet/wiki/WAt#using-python-for-graphs|bavla]].

===== To do =====

**AKr** = **WAr**<sup>T</sup> * **WKr**

[[https://en.wikipedia.org/wiki/Tf%E2%80%93idf|tf-idf]] weights for Main : All.

Select the 200 most important keywords as Keys. Extract from **AKr** the subnetwork on Main × Keys.

**AJr** = **WAr**<sup>T</sup> * **WJr**

Instead of tf-idf weights we can use the fractional approach.

Add to Nets a procedure that converts a temporal network into a sequence of temporal slices (in Pajek format).