Differences

This shows you the differences between two versions of the page.

Link to this comparison view

notes:net:olaf [2015/07/16 21:29] (current)
vlado created
Line 1: Line 1:
 +====== Olaf ======
 +
 +March 14, 2015 - the Pi day  3/14/15  
 +
 +===== Problem =====
 +
 +<code>
 +Subject [Pajek] 2-mode networks
 +From Olaf Rank
 +Sender pajek-bounces@list.fmf.uni-lj.si
 +To pajek@list.fmf.uni-lj.siAdd contact
 +Date Fri 08:27
 +Dear all,
 +
 +I am new to this list and my question may be rather generic in nature.
 +
 +I have a large 2-mode dataset consisting of approximately 90.000 customers and 1.200 products. 
 +The ties are defined as valued purchasing relationships (in Euros).
 +
 +My first idea was to identify the structural patterns of the network by grouping the network 
 +in two ways, and my idea was to use blockmodeling:
 +
 +(1) Define blocks of products (based on the purchasing ties) and investigate whether customers
 +    purchasing one or a combination of blocks of products can be characterized by specific attributes.
 +(2) Define blocks of customers (based on their purchasing acts) and investigate whether customers
 +    belonging to a block are homogeneous with respect to their characteristics.
 +
 +As far as I can see, Pajek is only able to handle blockmodels with up to 1.000 vertices, and I
 +could not find the respective option in Pajek XXL.
 +
 +My question would be if anyone is able to recommend an alternative analytic approach?
 +
 +Any suggestions are highly appreciated!
 +
 +Best,
 +Olaf
 +</code>
 +
 +===== Answer =====
 +
 +The current versions of blockmodeling procedures are too time consuming for large networks.
 +
 +There are alternative methods to identify important subnetworks in two-mode networks:
 +
 +  * direct methods for analysis of two-mode networks:
 +    * ''Network/2-Mode Network/Core''
 +    * ''Network/Create New Network/with Ring Counts stored as Line Values''
 +    * ''Network/2-Mode Network/Important Vertices''
 +  * indirect methods. We transform a two-mode network into a corresponding one-mode network. The obtained network is then analyzed using standard methods.
 +
 +==== An approach ====
 +
 +
 +Here I will describe an approach based on ideas from our [[http://link.springer.com/article/10.1007%2Fs11192-012-0940-1|paper]] (see also the [[http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470714522.html|book]]).
 +
 +Let C be the set of customers, A the set of articles (products), and the network data represented as the corresponding matrix **B** = [b<sub>ca</sub>], c ∈ C, a ∈ A. We define a new network (matrix)  **P** = **B**<sup>T</sup> * bin(**B**) or
 +
 +p<sub>aa'</sub> = ∑ { b<sub>ca</sub><sup>T</sup> . bin(b<sub>ca'</sub>) : c ∈ C } 
 +
 +where bin(b) = 0 if b = 0, otherwise bin(b) = 1.
 +
 +The meaning of p<sub>aa'</sub> is: it is equal to the value spent on article a by customers that bought also article a'; and the value of p<sub>aa</sub> is equal to the total value spent on article a.
 +
 +In general  p<sub>aa'</sub> ≠ p<sub>a'a</sub>. To get a symmetric measure and to make articles comparable we normalize these weights and take a geometric average of symmetric values
 +
 +s<sub>aa'</sub> = √ ( (p<sub>aa'</sub>/p<sub>aa</sub>).(p<sub>a'a</sub>/p<sub>a'a'</sub>) )
 +
 +The measure s<sub>aa'</sub> is a similarity measure. To analyze it we could apply for example the cuts or the islands method.
 +
 +Another option is to use the hierarchical clustering method. In this case we have to convert it first to a dissimilarity **D** = [ d<sub>aa'</sub> ]
 +
 +d<sub>aa'</sub> = 1 - s<sub>aa'</sub>
 +
 +Essentially the same approach could be applied also for partitioning the customers. The problem is that the size of the set C is probably already too large. For very large data sets a special version of leaders method could be the right solution.
 +
 +Another option would be to define **P** = **B**<sup>T</sup> * frac(**B**) or
 +
 +p<sub>aa'</sub> = ∑ { b<sub>ca</sub><sup>T</sup> . frac(b<sub>ca'</sub>) : c ∈ C } 
 +
 +where
 +
 +frac(b<sub>ca</sub>) = b<sub>ca</sub> / ∑ { b<sub>ca</sub> : c ∈ C }
 +
 +
 +==== Example ====
 +
 +
 +To illustrate the procedure I created a toy example:
 +
 +<code>
 +           11 12 13 14 15 16
 +            a  b  c  d e  f
 +-----------------------------
 +   1  A.   20  1 50  0 12  4
 +   2  B.   50  5 80  7  0  5
 +   3  C.    0  0 20  0 17  0
 +   4  D.    0  0 30  9  0  3
 +   5  E.    0  0 25  5  6  0
 +   6  F.   30  4 10  0  0  0
 +   7  G.    0  0 85  8  0  7
 +   8  H.    0  2 25  0 13  0
 +   9  I.   25  3 90  6  0  6
 +  10  J.   15  0 35  0  5  4
 +</code>
 +[[http://zvonka.fmf.uni-lj.si/netbook/lib/exe/fetch.php?media=notes:data:buy.net|buy.net]] or {{notes:data:buy.net|buy.net}}
 +
 +<code>
 +N1 = File/Network/Read buy.net
 +N2 = Network/Create New Network/Transform/Line Values/Set all Values to 1
 +N3 = Network/2-Mode Network/Transpose 2-Mode
 +select N2 as Second network
 +N4 = Networks/Multiply Networks
 +</code>
 +we get the matrix **P**
 +<code>
 +          1   2   3   4   5   6  Label
 +-----------------------------------------
 +   1.   140 125 140  75  35 110  a
 +   2.    13  15  15   8   3   9  b
 +   3.   265 255 450 310 155 370  c
 +   4.    13  13  35  35   5  30  d
 +   5.    17  25  53   6  53  17  e
 +   6.    19  15  29  21   8  29  f
 +</code>
 +we normalize it
 +<code>
 +N5 = Network/2-Mode Network/2-Mode to 1-Mode/Normalize 1-Mode/Output
 +</code>
 +<code>
 +           1    2    3    4    5    6  Label
 +-----------------------------------------------
 +   1.   1.00 0.89 1.00 0.54 0.25 0.79  a
 +   2.   0.87 1.00 1.00 0.53 0.20 0.60  b
 +   3.   0.59 0.57 1.00 0.69 0.34 0.82  c
 +   4.   0.37 0.37 1.00 1.00 0.14 0.86  d
 +   5.   0.32 0.47 1.00 0.11 1.00 0.32  e
 +   6.   0.66 0.52 1.00 0.72 0.28 1.00  f
 +
 +</code>
 +we multiply element-wise the symmetric elements
 +<code>
 +N6 = Network/Create New Network/Transform/Transpose 1-Mode
 +select N5 as Second network
 +N7 = Networks/Cross-Intersection/Multiply
 +N8 = Network/Create New Network/Transform/Line Values/Absolute+Sqrt
 +</code>
 +We could apply cuts or islands on the similarity network N8. 
 +<code>
 +           1    2    3    4    5    6  Label
 +-----------------------------------------------
 +   1.   1.00 0.88 0.77 0.45 0.28 0.72  a
 +   2.   0.88 1.00 0.75 0.45 0.31 0.56  b
 +   3.   0.77 0.75 1.00 0.83 0.59 0.91  c
 +   4.   0.45 0.45 0.83 1.00 0.13 0.79  d
 +   5.   0.28 0.31 0.59 0.13 1.00 0.30  e
 +   6.   0.72 0.56 0.91 0.79 0.30 1.00  f
 +</code>
 +Before doing this we should remove the loops (diagonal). 
 +
 +We continue by computing 
 +<code>
 +N9 = Network/Create New Network/Transform/Line Values/Multiply [-1]
 +N10 = Network/Create New Network/Transform/Line Values/Add Constant [1]
 +</code>
 +the dissimilarity **D** 
 +<code>
 +           1    2    3    4    5    6  Label
 +-----------------------------------------------
 +   1.   0.00 0.12 0.23 0.55 0.72 0.28  a
 +   2.   0.12 0.00 0.25 0.55 0.69 0.44  b
 +   3.   0.23 0.25 0.00 0.17 0.41 0.09  c
 +   4.   0.55 0.55 0.17 0.00 0.87 0.21  d
 +   5.   0.72 0.69 0.41 0.87 0.00 0.70  e
 +   6.   0.28 0.44 0.09 0.21 0.70 0.00  f
 +</code>
 +and apply the Ward's hierarchical method on it
 +<code>
 +Cluster/Create Complete Cluster [OK]
 +Network/Create Hierarchy/Clustering [Ward][RUN][dendro.eps]
 +</code>
 +we obtain a clustering represented by a dendrogram
 +
 +{{notes:pics:dendro.svg?350x470}}
 +
 +About using the clustering in Pajek see [[http://www.cambridge.org/de/academic/subjects/sociology/research-methods-sociology-and-criminology/exploratory-social-network-analysis-pajek-2nd-edition|ESNA2]].
  
notes/net/olaf.txt · Last modified: 2015/07/16 21:29 by vlado
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported
Recent changes RSS feed Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki