This shows you the differences between two versions of the page.
— |
notes:net:olaf [2015/07/16 21:29] (current) vlado created |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== Olaf ====== | ||
+ | |||
+ | March 14, 2015 - the Pi day 3/14/15 | ||
+ | |||
+ | ===== Problem ===== | ||
+ | |||
+ | <code> | ||
+ | Subject [Pajek] 2-mode networks | ||
+ | From Olaf Rank | ||
+ | Sender pajek-bounces@list.fmf.uni-lj.si | ||
+ | To pajek@list.fmf.uni-lj.siAdd contact | ||
+ | Date Fri 08:27 | ||
+ | Dear all, | ||
+ | |||
+ | I am new to this list and my question may be rather generic in nature. | ||
+ | |||
+ | I have a large 2-mode dataset consisting of approximately 90.000 customers and 1.200 products. | ||
+ | The ties are defined as valued purchasing relationships (in Euros). | ||
+ | |||
+ | My first idea was to identify the structural patterns of the network by grouping the network | ||
+ | in two ways, and my idea was to use blockmodeling: | ||
+ | |||
+ | (1) Define blocks of products (based on the purchasing ties) and investigate whether customers | ||
+ | purchasing one or a combination of blocks of products can be characterized by specific attributes. | ||
+ | (2) Define blocks of customers (based on their purchasing acts) and investigate whether customers | ||
+ | belonging to a block are homogeneous with respect to their characteristics. | ||
+ | |||
+ | As far as I can see, Pajek is only able to handle blockmodels with up to 1.000 vertices, and I | ||
+ | could not find the respective option in Pajek XXL. | ||
+ | |||
+ | My question would be if anyone is able to recommend an alternative analytic approach? | ||
+ | |||
+ | Any suggestions are highly appreciated! | ||
+ | |||
+ | Best, | ||
+ | Olaf | ||
+ | </code> | ||
+ | |||
+ | ===== Answer ===== | ||
+ | |||
+ | The current versions of blockmodeling procedures are too time consuming for large networks. | ||
+ | |||
+ | There are alternative methods to identify important subnetworks in two-mode networks: | ||
+ | |||
+ | * direct methods for analysis of two-mode networks: | ||
+ | * ''Network/2-Mode Network/Core'' | ||
+ | * ''Network/Create New Network/with Ring Counts stored as Line Values'' | ||
+ | * ''Network/2-Mode Network/Important Vertices'' | ||
+ | * indirect methods. We transform a two-mode network into a corresponding one-mode network. The obtained network is then analyzed using standard methods. | ||
+ | |||
+ | ==== An approach ==== | ||
+ | |||
+ | |||
+ | Here I will describe an approach based on ideas from our [[http://link.springer.com/article/10.1007%2Fs11192-012-0940-1|paper]] (see also the [[http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470714522.html|book]]). | ||
+ | |||
+ | Let C be the set of customers, A the set of articles (products), and the network data represented as the corresponding matrix **B** = [b<sub>ca</sub>], c ∈ C, a ∈ A. We define a new network (matrix) **P** = **B**<sup>T</sup> * bin(**B**) or | ||
+ | |||
+ | p<sub>aa'</sub> = ∑ { b<sub>ca</sub><sup>T</sup> . bin(b<sub>ca'</sub>) : c ∈ C } | ||
+ | |||
+ | where bin(b) = 0 if b = 0, otherwise bin(b) = 1. | ||
+ | |||
+ | The meaning of p<sub>aa'</sub> is: it is equal to the value spent on article a by customers that bought also article a'; and the value of p<sub>aa</sub> is equal to the total value spent on article a. | ||
+ | |||
+ | In general p<sub>aa'</sub> ≠ p<sub>a'a</sub>. To get a symmetric measure and to make articles comparable we normalize these weights and take a geometric average of symmetric values | ||
+ | |||
+ | s<sub>aa'</sub> = √ ( (p<sub>aa'</sub>/p<sub>aa</sub>).(p<sub>a'a</sub>/p<sub>a'a'</sub>) ) | ||
+ | |||
+ | The measure s<sub>aa'</sub> is a similarity measure. To analyze it we could apply for example the cuts or the islands method. | ||
+ | |||
+ | Another option is to use the hierarchical clustering method. In this case we have to convert it first to a dissimilarity **D** = [ d<sub>aa'</sub> ] | ||
+ | |||
+ | d<sub>aa'</sub> = 1 - s<sub>aa'</sub> | ||
+ | |||
+ | Essentially the same approach could be applied also for partitioning the customers. The problem is that the size of the set C is probably already too large. For very large data sets a special version of leaders method could be the right solution. | ||
+ | |||
+ | Another option would be to define **P** = **B**<sup>T</sup> * frac(**B**) or | ||
+ | |||
+ | p<sub>aa'</sub> = ∑ { b<sub>ca</sub><sup>T</sup> . frac(b<sub>ca'</sub>) : c ∈ C } | ||
+ | |||
+ | where | ||
+ | |||
+ | frac(b<sub>ca</sub>) = b<sub>ca</sub> / ∑ { b<sub>ca</sub> : c ∈ C } | ||
+ | |||
+ | |||
+ | ==== Example ==== | ||
+ | |||
+ | |||
+ | To illustrate the procedure I created a toy example: | ||
+ | |||
+ | <code> | ||
+ | 11 12 13 14 15 16 | ||
+ | a b c d e f | ||
+ | ----------------------------- | ||
+ | 1 A. 20 1 50 0 12 4 | ||
+ | 2 B. 50 5 80 7 0 5 | ||
+ | 3 C. 0 0 20 0 17 0 | ||
+ | 4 D. 0 0 30 9 0 3 | ||
+ | 5 E. 0 0 25 5 6 0 | ||
+ | 6 F. 30 4 10 0 0 0 | ||
+ | 7 G. 0 0 85 8 0 7 | ||
+ | 8 H. 0 2 25 0 13 0 | ||
+ | 9 I. 25 3 90 6 0 6 | ||
+ | 10 J. 15 0 35 0 5 4 | ||
+ | </code> | ||
+ | [[http://zvonka.fmf.uni-lj.si/netbook/lib/exe/fetch.php?media=notes:data:buy.net|buy.net]] or {{notes:data:buy.net|buy.net}} | ||
+ | |||
+ | <code> | ||
+ | N1 = File/Network/Read buy.net | ||
+ | N2 = Network/Create New Network/Transform/Line Values/Set all Values to 1 | ||
+ | N3 = Network/2-Mode Network/Transpose 2-Mode | ||
+ | select N2 as Second network | ||
+ | N4 = Networks/Multiply Networks | ||
+ | </code> | ||
+ | we get the matrix **P** | ||
+ | <code> | ||
+ | 1 2 3 4 5 6 Label | ||
+ | ----------------------------------------- | ||
+ | 1. 140 125 140 75 35 110 a | ||
+ | 2. 13 15 15 8 3 9 b | ||
+ | 3. 265 255 450 310 155 370 c | ||
+ | 4. 13 13 35 35 5 30 d | ||
+ | 5. 17 25 53 6 53 17 e | ||
+ | 6. 19 15 29 21 8 29 f | ||
+ | </code> | ||
+ | we normalize it | ||
+ | <code> | ||
+ | N5 = Network/2-Mode Network/2-Mode to 1-Mode/Normalize 1-Mode/Output | ||
+ | </code> | ||
+ | <code> | ||
+ | 1 2 3 4 5 6 Label | ||
+ | ----------------------------------------------- | ||
+ | 1. 1.00 0.89 1.00 0.54 0.25 0.79 a | ||
+ | 2. 0.87 1.00 1.00 0.53 0.20 0.60 b | ||
+ | 3. 0.59 0.57 1.00 0.69 0.34 0.82 c | ||
+ | 4. 0.37 0.37 1.00 1.00 0.14 0.86 d | ||
+ | 5. 0.32 0.47 1.00 0.11 1.00 0.32 e | ||
+ | 6. 0.66 0.52 1.00 0.72 0.28 1.00 f | ||
+ | |||
+ | </code> | ||
+ | we multiply element-wise the symmetric elements | ||
+ | <code> | ||
+ | N6 = Network/Create New Network/Transform/Transpose 1-Mode | ||
+ | select N5 as Second network | ||
+ | N7 = Networks/Cross-Intersection/Multiply | ||
+ | N8 = Network/Create New Network/Transform/Line Values/Absolute+Sqrt | ||
+ | </code> | ||
+ | We could apply cuts or islands on the similarity network N8. | ||
+ | <code> | ||
+ | 1 2 3 4 5 6 Label | ||
+ | ----------------------------------------------- | ||
+ | 1. 1.00 0.88 0.77 0.45 0.28 0.72 a | ||
+ | 2. 0.88 1.00 0.75 0.45 0.31 0.56 b | ||
+ | 3. 0.77 0.75 1.00 0.83 0.59 0.91 c | ||
+ | 4. 0.45 0.45 0.83 1.00 0.13 0.79 d | ||
+ | 5. 0.28 0.31 0.59 0.13 1.00 0.30 e | ||
+ | 6. 0.72 0.56 0.91 0.79 0.30 1.00 f | ||
+ | </code> | ||
+ | Before doing this we should remove the loops (diagonal). | ||
+ | |||
+ | We continue by computing | ||
+ | <code> | ||
+ | N9 = Network/Create New Network/Transform/Line Values/Multiply [-1] | ||
+ | N10 = Network/Create New Network/Transform/Line Values/Add Constant [1] | ||
+ | </code> | ||
+ | the dissimilarity **D** | ||
+ | <code> | ||
+ | 1 2 3 4 5 6 Label | ||
+ | ----------------------------------------------- | ||
+ | 1. 0.00 0.12 0.23 0.55 0.72 0.28 a | ||
+ | 2. 0.12 0.00 0.25 0.55 0.69 0.44 b | ||
+ | 3. 0.23 0.25 0.00 0.17 0.41 0.09 c | ||
+ | 4. 0.55 0.55 0.17 0.00 0.87 0.21 d | ||
+ | 5. 0.72 0.69 0.41 0.87 0.00 0.70 e | ||
+ | 6. 0.28 0.44 0.09 0.21 0.70 0.00 f | ||
+ | </code> | ||
+ | and apply the Ward's hierarchical method on it | ||
+ | <code> | ||
+ | Cluster/Create Complete Cluster [OK] | ||
+ | Network/Create Hierarchy/Clustering [Ward][RUN][dendro.eps] | ||
+ | </code> | ||
+ | we obtain a clustering represented by a dendrogram | ||
+ | |||
+ | {{notes:pics:dendro.svg?350x470}} | ||
+ | |||
+ | About using the clustering in Pajek see [[http://www.cambridge.org/de/academic/subjects/sociology/research-methods-sociology-and-criminology/exploratory-social-network-analysis-pajek-2nd-edition|ESNA2]]. | ||