Table of Contents

Olaf

March 14, 2015 - the Pi day 3/14/15

Problem

Subject	[Pajek] 2-mode networks
From	Olaf Rank
Sender	pajek-bounces@list.fmf.uni-lj.si
To	pajek@list.fmf.uni-lj.siAdd contact
Date	Fri 08:27
Dear all,

I am new to this list and my question may be rather generic in nature.

I have a large 2-mode dataset consisting of approximately 90.000 customers and 1.200 products. 
The ties are defined as valued purchasing relationships (in Euros).

My first idea was to identify the structural patterns of the network by grouping the network 
in two ways, and my idea was to use blockmodeling:

(1) Define blocks of products (based on the purchasing ties) and investigate whether customers
    purchasing one or a combination of blocks of products can be characterized by specific attributes.
(2) Define blocks of customers (based on their purchasing acts) and investigate whether customers
    belonging to a block are homogeneous with respect to their characteristics.

As far as I can see, Pajek is only able to handle blockmodels with up to 1.000 vertices, and I
could not find the respective option in Pajek XXL.

My question would be if anyone is able to recommend an alternative analytic approach?

Any suggestions are highly appreciated!

Best,
Olaf

Answer

The current versions of blockmodeling procedures are too time consuming for large networks.

There are alternative methods to identify important subnetworks in two-mode networks:

  • direct methods for analysis of two-mode networks:
    • Network/2-Mode Network/Core
    • Network/Create New Network/with Ring Counts stored as Line Values
    • Network/2-Mode Network/Important Vertices
  • indirect methods. We transform a two-mode network into a corresponding one-mode network. The obtained network is then analyzed using standard methods.

An approach

Here I will describe an approach based on ideas from our paper (see also the book).

Let C be the set of customers, A the set of articles (products), and the network data represented as the corresponding matrix B = [bca], c ∈ C, a ∈ A. We define a new network (matrix) P = BT * bin(B) or

paa' = ∑ { bcaT . bin(bca') : c ∈ C }

where bin(b) = 0 if b = 0, otherwise bin(b) = 1.

The meaning of paa' is: it is equal to the value spent on article a by customers that bought also article a'; and the value of paa is equal to the total value spent on article a.

In general paa' ≠ pa'a. To get a symmetric measure and to make articles comparable we normalize these weights and take a geometric average of symmetric values

saa' = √ ( (paa'/paa).(pa'a/pa'a') )

The measure saa' is a similarity measure. To analyze it we could apply for example the cuts or the islands method.

Another option is to use the hierarchical clustering method. In this case we have to convert it first to a dissimilarity D = [ daa' ]

daa' = 1 - saa'

Essentially the same approach could be applied also for partitioning the customers. The problem is that the size of the set C is probably already too large. For very large data sets a special version of leaders method could be the right solution.

Another option would be to define P = BT * frac(B) or

paa' = ∑ { bcaT . frac(bca') : c ∈ C }

where

frac(bca) = bca / ∑ { bca : c ∈ C }

Example

To illustrate the procedure I created a toy example:

           11 12 13 14 15 16
            a  b  c  d	e  f
-----------------------------
   1  A.   20  1 50  0 12  4
   2  B.   50  5 80  7  0  5
   3  C.    0  0 20  0 17  0
   4  D.    0  0 30  9  0  3
   5  E.    0  0 25  5  6  0
   6  F.   30  4 10  0  0  0
   7  G.    0  0 85  8  0  7
   8  H.    0  2 25  0 13  0
   9  I.   25  3 90  6  0  6
  10  J.   15  0 35  0  5  4

buy.net or buy.net

N1 = File/Network/Read buy.net
N2 = Network/Create New Network/Transform/Line Values/Set all Values to 1
N3 = Network/2-Mode Network/Transpose 2-Mode
select N2 as Second network
N4 = Networks/Multiply Networks

we get the matrix P

          1   2   3   4   5   6  Label
-----------------------------------------
   1.   140 125 140  75  35 110  a
   2.    13  15  15   8   3   9  b
   3.   265 255 450 310 155 370  c
   4.    13  13  35  35   5  30  d
   5.    17  25  53   6  53  17  e
   6.    19  15  29  21   8  29  f

we normalize it

N5 = Network/2-Mode Network/2-Mode to 1-Mode/Normalize 1-Mode/Output
           1    2    3    4    5    6  Label
-----------------------------------------------
   1.   1.00 0.89 1.00 0.54 0.25 0.79  a
   2.   0.87 1.00 1.00 0.53 0.20 0.60  b
   3.   0.59 0.57 1.00 0.69 0.34 0.82  c
   4.   0.37 0.37 1.00 1.00 0.14 0.86  d
   5.   0.32 0.47 1.00 0.11 1.00 0.32  e
   6.   0.66 0.52 1.00 0.72 0.28 1.00  f

we multiply element-wise the symmetric elements

N6 = Network/Create New Network/Transform/Transpose 1-Mode
select N5 as Second network
N7 = Networks/Cross-Intersection/Multiply
N8 = Network/Create New Network/Transform/Line Values/Absolute+Sqrt

We could apply cuts or islands on the similarity network N8.

           1    2    3    4    5    6  Label
-----------------------------------------------
   1.   1.00 0.88 0.77 0.45 0.28 0.72  a
   2.   0.88 1.00 0.75 0.45 0.31 0.56  b
   3.   0.77 0.75 1.00 0.83 0.59 0.91  c
   4.   0.45 0.45 0.83 1.00 0.13 0.79  d
   5.   0.28 0.31 0.59 0.13 1.00 0.30  e
   6.   0.72 0.56 0.91 0.79 0.30 1.00  f

Before doing this we should remove the loops (diagonal).

We continue by computing

N9 = Network/Create New Network/Transform/Line Values/Multiply [-1]
N10 = Network/Create New Network/Transform/Line Values/Add Constant [1]

the dissimilarity D

           1    2    3    4    5    6  Label
-----------------------------------------------
   1.   0.00 0.12 0.23 0.55 0.72 0.28  a
   2.   0.12 0.00 0.25 0.55 0.69 0.44  b
   3.   0.23 0.25 0.00 0.17 0.41 0.09  c
   4.   0.55 0.55 0.17 0.00 0.87 0.21  d
   5.   0.72 0.69 0.41 0.87 0.00 0.70  e
   6.   0.28 0.44 0.09 0.21 0.70 0.00  f

and apply the Ward's hierarchical method on it

Cluster/Create Complete Cluster [OK]
Network/Create Hierarchy/Clustering [Ward][RUN][dendro.eps]

we obtain a clustering represented by a dendrogram

dendro.svg

About using the clustering in Pajek see ESNA2.

notes/net/olaf.txt · Last modified: 2015/07/16 21:29 by vlado
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported
Recent changes RSS feed Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki