====== Olaf ====== March 14, 2015 - the Pi day 3/14/15 ===== Problem ===== Subject [Pajek] 2-mode networks From Olaf Rank Sender pajek-bounces@list.fmf.uni-lj.si To pajek@list.fmf.uni-lj.siAdd contact Date Fri 08:27 Dear all, I am new to this list and my question may be rather generic in nature. I have a large 2-mode dataset consisting of approximately 90.000 customers and 1.200 products. The ties are defined as valued purchasing relationships (in Euros). My first idea was to identify the structural patterns of the network by grouping the network in two ways, and my idea was to use blockmodeling: (1) Define blocks of products (based on the purchasing ties) and investigate whether customers purchasing one or a combination of blocks of products can be characterized by specific attributes. (2) Define blocks of customers (based on their purchasing acts) and investigate whether customers belonging to a block are homogeneous with respect to their characteristics. As far as I can see, Pajek is only able to handle blockmodels with up to 1.000 vertices, and I could not find the respective option in Pajek XXL. My question would be if anyone is able to recommend an alternative analytic approach? Any suggestions are highly appreciated! Best, Olaf ===== Answer ===== The current versions of blockmodeling procedures are too time consuming for large networks. There are alternative methods to identify important subnetworks in two-mode networks: * direct methods for analysis of two-mode networks: * ''Network/2-Mode Network/Core'' * ''Network/Create New Network/with Ring Counts stored as Line Values'' * ''Network/2-Mode Network/Important Vertices'' * indirect methods. We transform a two-mode network into a corresponding one-mode network. The obtained network is then analyzed using standard methods. ==== An approach ==== Here I will describe an approach based on ideas from our [[http://link.springer.com/article/10.1007%2Fs11192-012-0940-1|paper]] (see also the [[http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470714522.html|book]]). Let C be the set of customers, A the set of articles (products), and the network data represented as the corresponding matrix **B** = [bca], c ∈ C, a ∈ A. We define a new network (matrix) **P** = **B**T * bin(**B**) or paa' = ∑ { bcaT . bin(bca') : c ∈ C } where bin(b) = 0 if b = 0, otherwise bin(b) = 1. The meaning of paa' is: it is equal to the value spent on article a by customers that bought also article a'; and the value of paa is equal to the total value spent on article a. In general paa' ≠ pa'a. To get a symmetric measure and to make articles comparable we normalize these weights and take a geometric average of symmetric values saa' = √ ( (paa'/paa).(pa'a/pa'a') ) The measure saa' is a similarity measure. To analyze it we could apply for example the cuts or the islands method. Another option is to use the hierarchical clustering method. In this case we have to convert it first to a dissimilarity **D** = [ daa' ] daa' = 1 - saa' Essentially the same approach could be applied also for partitioning the customers. The problem is that the size of the set C is probably already too large. For very large data sets a special version of leaders method could be the right solution. Another option would be to define **P** = **B**T * frac(**B**) or paa' = ∑ { bcaT . frac(bca') : c ∈ C } where frac(bca) = bca / ∑ { bca : c ∈ C } ==== Example ==== To illustrate the procedure I created a toy example: 11 12 13 14 15 16 a b c d e f ----------------------------- 1 A. 20 1 50 0 12 4 2 B. 50 5 80 7 0 5 3 C. 0 0 20 0 17 0 4 D. 0 0 30 9 0 3 5 E. 0 0 25 5 6 0 6 F. 30 4 10 0 0 0 7 G. 0 0 85 8 0 7 8 H. 0 2 25 0 13 0 9 I. 25 3 90 6 0 6 10 J. 15 0 35 0 5 4 [[http://zvonka.fmf.uni-lj.si/netbook/lib/exe/fetch.php?media=notes:data:buy.net|buy.net]] or {{notes:data:buy.net|buy.net}} N1 = File/Network/Read buy.net N2 = Network/Create New Network/Transform/Line Values/Set all Values to 1 N3 = Network/2-Mode Network/Transpose 2-Mode select N2 as Second network N4 = Networks/Multiply Networks we get the matrix **P** 1 2 3 4 5 6 Label ----------------------------------------- 1. 140 125 140 75 35 110 a 2. 13 15 15 8 3 9 b 3. 265 255 450 310 155 370 c 4. 13 13 35 35 5 30 d 5. 17 25 53 6 53 17 e 6. 19 15 29 21 8 29 f we normalize it N5 = Network/2-Mode Network/2-Mode to 1-Mode/Normalize 1-Mode/Output 1 2 3 4 5 6 Label ----------------------------------------------- 1. 1.00 0.89 1.00 0.54 0.25 0.79 a 2. 0.87 1.00 1.00 0.53 0.20 0.60 b 3. 0.59 0.57 1.00 0.69 0.34 0.82 c 4. 0.37 0.37 1.00 1.00 0.14 0.86 d 5. 0.32 0.47 1.00 0.11 1.00 0.32 e 6. 0.66 0.52 1.00 0.72 0.28 1.00 f we multiply element-wise the symmetric elements N6 = Network/Create New Network/Transform/Transpose 1-Mode select N5 as Second network N7 = Networks/Cross-Intersection/Multiply N8 = Network/Create New Network/Transform/Line Values/Absolute+Sqrt We could apply cuts or islands on the similarity network N8. 1 2 3 4 5 6 Label ----------------------------------------------- 1. 1.00 0.88 0.77 0.45 0.28 0.72 a 2. 0.88 1.00 0.75 0.45 0.31 0.56 b 3. 0.77 0.75 1.00 0.83 0.59 0.91 c 4. 0.45 0.45 0.83 1.00 0.13 0.79 d 5. 0.28 0.31 0.59 0.13 1.00 0.30 e 6. 0.72 0.56 0.91 0.79 0.30 1.00 f Before doing this we should remove the loops (diagonal). We continue by computing N9 = Network/Create New Network/Transform/Line Values/Multiply [-1] N10 = Network/Create New Network/Transform/Line Values/Add Constant [1] the dissimilarity **D** 1 2 3 4 5 6 Label ----------------------------------------------- 1. 0.00 0.12 0.23 0.55 0.72 0.28 a 2. 0.12 0.00 0.25 0.55 0.69 0.44 b 3. 0.23 0.25 0.00 0.17 0.41 0.09 c 4. 0.55 0.55 0.17 0.00 0.87 0.21 d 5. 0.72 0.69 0.41 0.87 0.00 0.70 e 6. 0.28 0.44 0.09 0.21 0.70 0.00 f and apply the Ward's hierarchical method on it Cluster/Create Complete Cluster [OK] Network/Create Hierarchy/Clustering [Ward][RUN][dendro.eps] we obtain a clustering represented by a dendrogram {{notes:pics:dendro.svg?350x470}} About using the clustering in Pajek see [[http://www.cambridge.org/de/academic/subjects/sociology/research-methods-sociology-and-criminology/exploratory-social-network-analysis-pajek-2nd-edition|ESNA2]].