====== Olaf ======

March 14, 2015 - the Pi day  3/14/15  

===== Problem =====

<code>
Subject	[Pajek] 2-mode networks
From	Olaf Rank
Sender	pajek-bounces@list.fmf.uni-lj.si
To	pajek@list.fmf.uni-lj.siAdd contact
Date	Fri 08:27
Dear all,

I am new to this list and my question may be rather generic in nature.

I have a large 2-mode dataset consisting of approximately 90.000 customers and 1.200 products. 
The ties are defined as valued purchasing relationships (in Euros).

My first idea was to identify the structural patterns of the network by grouping the network 
in two ways, and my idea was to use blockmodeling:

(1) Define blocks of products (based on the purchasing ties) and investigate whether customers
    purchasing one or a combination of blocks of products can be characterized by specific attributes.
(2) Define blocks of customers (based on their purchasing acts) and investigate whether customers
    belonging to a block are homogeneous with respect to their characteristics.

As far as I can see, Pajek is only able to handle blockmodels with up to 1.000 vertices, and I
could not find the respective option in Pajek XXL.

My question would be if anyone is able to recommend an alternative analytic approach?

Any suggestions are highly appreciated!

Best,
Olaf
</code>

===== Answer =====

The current versions of blockmodeling procedures are too time consuming for large networks.

There are alternative methods to identify important subnetworks in two-mode networks:

  * direct methods for analysis of two-mode networks:
    * ''Network/2-Mode Network/Core''
    * ''Network/Create New Network/with Ring Counts stored as Line Values''
    * ''Network/2-Mode Network/Important Vertices''
  * indirect methods. We transform a two-mode network into a corresponding one-mode network. The obtained network is then analyzed using standard methods.

==== An approach ====


Here I will describe an approach based on ideas from our [[http://link.springer.com/article/10.1007%2Fs11192-012-0940-1|paper]] (see also the [[http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470714522.html|book]]).

Let C be the set of customers, A the set of articles (products), and the network data represented as the corresponding matrix **B** = [b<sub>ca</sub>], c ∈ C, a ∈ A. We define a new network (matrix)  **P** = **B**<sup>T</sup> * bin(**B**) or

p<sub>aa'</sub> = ∑ { b<sub>ca</sub><sup>T</sup> . bin(b<sub>ca'</sub>) : c ∈ C } 

where bin(b) = 0 if b = 0, otherwise bin(b) = 1.

The meaning of p<sub>aa'</sub> is: it is equal to the value spent on article a by customers that bought also article a'; and the value of p<sub>aa</sub> is equal to the total value spent on article a.

In general  p<sub>aa'</sub> ≠ p<sub>a'a</sub>. To get a symmetric measure and to make articles comparable we normalize these weights and take a geometric average of symmetric values

s<sub>aa'</sub> = √ ( (p<sub>aa'</sub>/p<sub>aa</sub>).(p<sub>a'a</sub>/p<sub>a'a'</sub>) )

The measure s<sub>aa'</sub> is a similarity measure. To analyze it we could apply for example the cuts or the islands method.

Another option is to use the hierarchical clustering method. In this case we have to convert it first to a dissimilarity **D** = [ d<sub>aa'</sub> ]

d<sub>aa'</sub> = 1 - s<sub>aa'</sub>

Essentially the same approach could be applied also for partitioning the customers. The problem is that the size of the set C is probably already too large. For very large data sets a special version of leaders method could be the right solution.

Another option would be to define **P** = **B**<sup>T</sup> * frac(**B**) or

p<sub>aa'</sub> = ∑ { b<sub>ca</sub><sup>T</sup> . frac(b<sub>ca'</sub>) : c ∈ C } 

where

frac(b<sub>ca</sub>) = b<sub>ca</sub> / ∑ { b<sub>ca</sub> : c ∈ C }


==== Example ====


To illustrate the procedure I created a toy example:

<code>
           11 12 13 14 15 16
            a  b  c  d	e  f
-----------------------------
   1  A.   20  1 50  0 12  4
   2  B.   50  5 80  7  0  5
   3  C.    0  0 20  0 17  0
   4  D.    0  0 30  9  0  3
   5  E.    0  0 25  5  6  0
   6  F.   30  4 10  0  0  0
   7  G.    0  0 85  8  0  7
   8  H.    0  2 25  0 13  0
   9  I.   25  3 90  6  0  6
  10  J.   15  0 35  0  5  4
</code>
[[http://zvonka.fmf.uni-lj.si/netbook/lib/exe/fetch.php?media=notes:data:buy.net|buy.net]] or {{notes:data:buy.net|buy.net}}

<code>
N1 = File/Network/Read buy.net
N2 = Network/Create New Network/Transform/Line Values/Set all Values to 1
N3 = Network/2-Mode Network/Transpose 2-Mode
select N2 as Second network
N4 = Networks/Multiply Networks
</code>
we get the matrix **P**
<code>
          1   2   3   4   5   6  Label
-----------------------------------------
   1.   140 125 140  75  35 110  a
   2.    13  15  15   8   3   9  b
   3.   265 255 450 310 155 370  c
   4.    13  13  35  35   5  30  d
   5.    17  25  53   6  53  17  e
   6.    19  15  29  21   8  29  f
</code>
we normalize it
<code>
N5 = Network/2-Mode Network/2-Mode to 1-Mode/Normalize 1-Mode/Output
</code>
<code>
           1    2    3    4    5    6  Label
-----------------------------------------------
   1.   1.00 0.89 1.00 0.54 0.25 0.79  a
   2.   0.87 1.00 1.00 0.53 0.20 0.60  b
   3.   0.59 0.57 1.00 0.69 0.34 0.82  c
   4.   0.37 0.37 1.00 1.00 0.14 0.86  d
   5.   0.32 0.47 1.00 0.11 1.00 0.32  e
   6.   0.66 0.52 1.00 0.72 0.28 1.00  f

</code>
we multiply element-wise the symmetric elements
<code>
N6 = Network/Create New Network/Transform/Transpose 1-Mode
select N5 as Second network
N7 = Networks/Cross-Intersection/Multiply
N8 = Network/Create New Network/Transform/Line Values/Absolute+Sqrt
</code>
We could apply cuts or islands on the similarity network N8. 
<code>
           1    2    3    4    5    6  Label
-----------------------------------------------
   1.   1.00 0.88 0.77 0.45 0.28 0.72  a
   2.   0.88 1.00 0.75 0.45 0.31 0.56  b
   3.   0.77 0.75 1.00 0.83 0.59 0.91  c
   4.   0.45 0.45 0.83 1.00 0.13 0.79  d
   5.   0.28 0.31 0.59 0.13 1.00 0.30  e
   6.   0.72 0.56 0.91 0.79 0.30 1.00  f
</code>
Before doing this we should remove the loops (diagonal). 

We continue by computing 
<code>
N9 = Network/Create New Network/Transform/Line Values/Multiply [-1]
N10 = Network/Create New Network/Transform/Line Values/Add Constant [1]
</code>
the dissimilarity **D** 
<code>
           1    2    3    4    5    6  Label
-----------------------------------------------
   1.   0.00 0.12 0.23 0.55 0.72 0.28  a
   2.   0.12 0.00 0.25 0.55 0.69 0.44  b
   3.   0.23 0.25 0.00 0.17 0.41 0.09  c
   4.   0.55 0.55 0.17 0.00 0.87 0.21  d
   5.   0.72 0.69 0.41 0.87 0.00 0.70  e
   6.   0.28 0.44 0.09 0.21 0.70 0.00  f
</code>
and apply the Ward's hierarchical method on it
<code>
Cluster/Create Complete Cluster [OK]
Network/Create Hierarchy/Clustering [Ward][RUN][dendro.eps]
</code>
we obtain a clustering represented by a dendrogram

{{notes:pics:dendro.svg?350x470}}

About using the clustering in Pajek see [[http://www.cambridge.org/de/academic/subjects/sociology/research-methods-sociology-and-criminology/exploratory-social-network-analysis-pajek-2nd-edition|ESNA2]].