Differences

This shows you the differences between two versions of the page.

@@ Line 1: / Line 1: @@
+====== Olaf ======
+March 14, 2015 - the Pi day  3/14/15
+===== Problem =====
+<code>
+Subject	[Pajek] 2-mode networks
+From	Olaf Rank
+Sender	pajek-bounces@list.fmf.uni-lj.si
+To	pajek@list.fmf.uni-lj.siAdd contact
+Date	Fri 08:27
+Dear all,
+I am new to this list and my question may be rather generic in nature.
+I have a large 2-mode dataset consisting of approximately 90.000 customers and 1.200 products.
+The ties are defined as valued purchasing relationships (in Euros).
+My first idea was to identify the structural patterns of the network by grouping the network
+in two ways, and my idea was to use blockmodeling:
+(1) Define blocks of products (based on the purchasing ties) and investigate whether customers
+    purchasing one or a combination of blocks of products can be characterized by specific attributes.
+(2) Define blocks of customers (based on their purchasing acts) and investigate whether customers
+    belonging to a block are homogeneous with respect to their characteristics.
+As far as I can see, Pajek is only able to handle blockmodels with up to 1.000 vertices, and I
+could not find the respective option in Pajek XXL.
+My question would be if anyone is able to recommend an alternative analytic approach?
+Any suggestions are highly appreciated!
+Best,
+Olaf
+</code>
+===== Answer =====
+The current versions of blockmodeling procedures are too time consuming for large networks.
+There are alternative methods to identify important subnetworks in two-mode networks:
+  * direct methods for analysis of two-mode networks:
+    * ''Network/2-Mode Network/Core''
+    * ''Network/Create New Network/with Ring Counts stored as Line Values''
+    * ''Network/2-Mode Network/Important Vertices''
+  * indirect methods. We transform a two-mode network into a corresponding one-mode network. The obtained network is then analyzed using standard methods.
+==== An approach ====
+Here I will describe an approach based on ideas from our [[http://link.springer.com/article/10.1007%2Fs11192-012-0940-1|paper]] (see also the [[http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470714522.html|book]]).
+Let C be the set of customers, A the set of articles (products), and the network data represented as the corresponding matrix **B** = [b<sub>ca</sub>], c ∈ C, a ∈ A. We define a new network (matrix)  **P** = **B**<sup>T</sup> * bin(**B**) or
+p<sub>aa'</sub> = ∑ { b<sub>ca</sub><sup>T</sup> . bin(b<sub>ca'</sub>) : c ∈ C }
+where bin(b) = 0 if b = 0, otherwise bin(b) = 1.
+The meaning of p<sub>aa'</sub> is: it is equal to the value spent on article a by customers that bought also article a'; and the value of p<sub>aa</sub> is equal to the total value spent on article a.
+In general  p<sub>aa'</sub> ≠ p<sub>a'a</sub>. To get a symmetric measure and to make articles comparable we normalize these weights and take a geometric average of symmetric values
+s<sub>aa'</sub> = √ ( (p<sub>aa'</sub>/p<sub>aa</sub>).(p<sub>a'a</sub>/p<sub>a'a'</sub>) )
+The measure s<sub>aa'</sub> is a similarity measure. To analyze it we could apply for example the cuts or the islands method.
+Another option is to use the hierarchical clustering method. In this case we have to convert it first to a dissimilarity **D** = [ d<sub>aa'</sub> ]
+d<sub>aa'</sub> = 1 - s<sub>aa'</sub>
+Essentially the same approach could be applied also for partitioning the customers. The problem is that the size of the set C is probably already too large. For very large data sets a special version of leaders method could be the right solution.
+Another option would be to define **P** = **B**<sup>T</sup> * frac(**B**) or
+p<sub>aa'</sub> = ∑ { b<sub>ca</sub><sup>T</sup> . frac(b<sub>ca'</sub>) : c ∈ C }
+where
+frac(b<sub>ca</sub>) = b<sub>ca</sub> / ∑ { b<sub>ca</sub> : c ∈ C }
+==== Example ====
+To illustrate the procedure I created a toy example:
+<code>
+12 13 14 15 16
+            a  b  c  d	e  f
+-----------------------------
+  A.   20  1 50  0 12  4
+  B.   50  5 80  7  0  5
+  C.    0  0 20  0 17  0
+  D.    0  0 30  9  0  3
+  E.    0  0 25  5  6  0
+  F.   30  4 10  0  0  0
+  G.    0  0 85  8  0  7
+  H.    0  2 25  0 13  0
+  I.   25  3 90  6  0  6
+  J.   15  0 35  0  5  4
+</code>
+[[http://zvonka.fmf.uni-lj.si/netbook/lib/exe/fetch.php?media=notes:data:buy.net|buy.net]] or {{notes:data:buy.net|buy.net}}
+<code>
+N1 = File/Network/Read buy.net
+N2 = Network/Create New Network/Transform/Line Values/Set all Values to 1
+N3 = Network/2-Mode Network/Transpose 2-Mode
+select N2 as Second network
+N4 = Networks/Multiply Networks
+</code>
+we get the matrix **P**
+<code>
+   2   3   4   5   6  Label
+-----------------------------------------
+.   140 125 140  75  35 110  a
+.    13  15  15   8   3   9  b
+.   265 255 450 310 155 370  c
+.    13  13  35  35   5  30  d
+.    17  25  53   6  53  17  e
+.    19  15  29  21   8  29  f
+</code>
+we normalize it
+<code>
+N5 = Network/2-Mode Network/2-Mode to 1-Mode/Normalize 1-Mode/Output
+</code>
+<code>
+    2    3    4    5    6  Label
+-----------------------------------------------
+.   1.00 0.89 1.00 0.54 0.25 0.79  a
+.   0.87 1.00 1.00 0.53 0.20 0.60  b
+.   0.59 0.57 1.00 0.69 0.34 0.82  c
+.   0.37 0.37 1.00 1.00 0.14 0.86  d
+.   0.32 0.47 1.00 0.11 1.00 0.32  e
+.   0.66 0.52 1.00 0.72 0.28 1.00  f
+</code>
+we multiply element-wise the symmetric elements
+<code>
+N6 = Network/Create New Network/Transform/Transpose 1-Mode
+select N5 as Second network
+N7 = Networks/Cross-Intersection/Multiply
+N8 = Network/Create New Network/Transform/Line Values/Absolute+Sqrt
+</code>
+We could apply cuts or islands on the similarity network N8.
+<code>
+    2    3    4    5    6  Label
+-----------------------------------------------
+.   1.00 0.88 0.77 0.45 0.28 0.72  a
+.   0.88 1.00 0.75 0.45 0.31 0.56  b
+.   0.77 0.75 1.00 0.83 0.59 0.91  c
+.   0.45 0.45 0.83 1.00 0.13 0.79  d
+.   0.28 0.31 0.59 0.13 1.00 0.30  e
+.   0.72 0.56 0.91 0.79 0.30 1.00  f
+</code>
+Before doing this we should remove the loops (diagonal).
+We continue by computing
+<code>
+N9 = Network/Create New Network/Transform/Line Values/Multiply [-1]
+N10 = Network/Create New Network/Transform/Line Values/Add Constant [1]
+</code>
+the dissimilarity **D**
+<code>
+    2    3    4    5    6  Label
+-----------------------------------------------
+.   0.00 0.12 0.23 0.55 0.72 0.28  a
+.   0.12 0.00 0.25 0.55 0.69 0.44  b
+.   0.23 0.25 0.00 0.17 0.41 0.09  c
+.   0.55 0.55 0.17 0.00 0.87 0.21  d
+.   0.72 0.69 0.41 0.87 0.00 0.70  e
+.   0.28 0.44 0.09 0.21 0.70 0.00  f
+</code>
+and apply the Ward's hierarchical method on it
+<code>
+Cluster/Create Complete Cluster [OK]
+Network/Create Hierarchy/Clustering [Ward][RUN][dendro.eps]
+</code>
+we obtain a clustering represented by a dendrogram
+{{notes:pics:dendro.svg?350x470}}
+About using the clustering in Pajek see [[http://www.cambridge.org/de/academic/subjects/sociology/research-methods-sociology-and-criminology/exploratory-social-network-analysis-pajek-2nd-edition|ESNA2]].