Nashville Meetup Network

17 May 2023

Stephen Bailey: Nashville Meetup Network/Teaching Dataset for NashNetX Presentation (PyTN)

Meetup.com is a website for people organizing and attending regular or semi-regular events (“meet-ups”). The relationships among users (who goes to which meetups) form a social network, ideal for graph-based analysis.

This dataset was generated for a talk titled “Principles of Network Analysis with NetworkX”. It forms the basis for a series of tutorials I presented at PyNash and PyTennessee. In them, we work through the basics of graph theory and how to use NetworkX, a popular open-source Python package. We then apply this knowledge to extract insights about the social fabric of Tennessee Meetup groups.

Graph data

  • member-to-group-edges.csv: Edge list for constructing a member-to-group bipartite graph. Weights represent the number of events attended in each group.
  • group-edges.csv: Edge list for constructing a group-to-group graph. Weights represent the number of members shared between groups.
  • member-edges.csv: Edge list for constructing a member-to-member graph. Weights represent shared group membership.
  • rsvps.csv: Raw member-to-event attendance data, which was aggregated to form member-to-group-edges.csv.
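The aggregation step itself is not shown in the dataset description. A minimal R sketch, assuming the weight is simply the number of RSVP rows per (member, group) pair and using a small made-up data frame with the same column names as rsvps.csv:

```r
# Toy stand-in for rsvps.csv (same column names; the values are made up)
rsvps <- data.frame(
  event_id  = c(1, 2, 2, 3, 3, 4),
  member_id = c(10, 10, 11, 10, 12, 11),
  group_id  = c(100, 100, 100, 200, 200, 200)
)

# Assumption: weight = number of RSVP rows (events attended) per (member, group) pair
rsvps$weight <- 1
mg <- aggregate(weight ~ member_id + group_id, data = rsvps, FUN = sum)
mg <- mg[order(mg$member_id, mg$group_id), ]
mg
```

Member 10 attended events 1 and 2 of group 100, so that pair gets weight 2; every other pair gets weight 1.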

Metadata

  • meta-groups.csv: Information for each group, including name and category. group_id can serve as an index.
  • meta-members.csv: Information for each member, including name and location. member_id can serve as an index.
  • meta-events.csv: Information for each event, including name and time. event_id can serve as an index.
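For instance, with group_id used as row names a group can be looked up directly, and match() does the same for a whole vector of IDs. The two rows below are copied from the first two rows of meta-groups.csv (as printed by head(G,2) later on), with the other columns omitted; this is only an illustrative sketch.

```r
# Two rows standing in for meta-groups.csv (other columns omitted)
G <- data.frame(
  group_id   = c(339011, 19728145),
  group_name = c("Nashville Hiking Meetup", "Stepping Out Social Dance Meetup"),
  stringsAsFactors = FALSE
)

# Use group_id as an index: row names are character, so convert the IDs
rownames(G) <- as.character(G$group_id)
G["19728145", "group_name"]

# Equivalent lookup with match(), which also works for vectors of IDs
ids <- c(339011, 339011, 19728145)
G$group_name[match(ids, G$group_id)]
```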

Acknowledgments

I'd like to acknowledge the folks at Meetup.com, who have made their database publicly available via a convenient REST API. Even newbies like me can access and enjoy it!

Comments

The basic network is the 3-way network rsvps.csv with 126813 links (event, member, group).

Additional data about the nodes of each way can be obtained from the metadata files:

  • events (19307): event_id, group_id, name, time
  • members (24591): member_id, name, hometown, city, state, lat, lon
  • groups (602): group_id, group_name, num_members, category_id, category_name, organizer_id, group_urlname
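The node set of each way consists of the distinct IDs that occur in the corresponding column of rsvps.csv, which need not coincide with the metadata counts above; the str(MT) output below reports 19031 event nodes and 24631 member nodes. A minimal sketch of counting the nodes per way, on made-up data with the rsvps.csv column names:

```r
# Toy stand-in for rsvps.csv: one row per RSVP (event, member, group)
rsvps <- data.frame(
  event_id  = c(1, 2, 2, 3, 3, 4),
  member_id = c(10, 10, 11, 10, 12, 11),
  group_id  = c(100, 100, 100, 200, 200, 200)
)

# Each way's node set is the set of distinct IDs occurring in its column
ways <- c(event = "event_id", member = "member_id", group = "group_id")
n_nodes <- sapply(ways, function(col) length(unique(rsvps[[col]])))
n_nodes
# The number of links is the number of rows
nrow(rsvps)
```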

The first three networks (the member-to-group, group-to-group, and member-to-member edge lists) are derived from this 3-way network.
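How exactly the one-mode networks were derived is not stated. One common way is projection of the member-to-group incidence matrix A: t(A) A counts shared members for each pair of groups, and A t(A) counts shared groups for each pair of members. The sketch below assumes binary membership (a member belongs to a group if they attended at least one of its events) and uses made-up data; it is not necessarily the dataset author's procedure.

```r
# Toy member-to-group edge list (values made up; weights are ignored below)
mg <- data.frame(
  member_id = c(10, 10, 11, 11, 12),
  group_id  = c(100, 200, 100, 200, 200),
  weight    = c(2, 1, 1, 1, 1)
)

members <- sort(unique(mg$member_id))
groups  <- sort(unique(mg$group_id))

# Binary incidence matrix A (members x groups): 1 if the member attended the group
A <- matrix(0, nrow = length(members), ncol = length(groups),
            dimnames = list(as.character(members), as.character(groups)))
A[cbind(match(mg$member_id, members), match(mg$group_id, groups))] <- 1

GG <- crossprod(A)    # t(A) %*% A: shared members for each pair of groups
MM <- tcrossprod(A)   # A %*% t(A): shared groups for each pair of members
diag(GG) <- 0         # drop self-loops
diag(MM) <- 0
GG
MM
```

Filling A with the weight column instead of 1 would give weighted variants of both projections.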

Conversion to a multiway network

First we read and inspect the data:

> wdir <- "D:/vlado/DL/data/Nashville"
> setwd(wdir)
> library(jsonlite)
> source("https://raw.githubusercontent.com/bavla/Rnet/master/R/Pajek.R")
> source("https://raw.githubusercontent.com/bavla/ibm3m/master/multiway/MWnets.R")

> M <- read.csv("meta-members.csv",sep=",",header=TRUE)
> E <- read.csv("meta-events.csv",sep=",",header=TRUE)
> G <- read.csv("meta-groups.csv",sep=",",header=TRUE)
> L <- read.csv("rsvps.csv",sep=",",header=TRUE)

> cbind(M=dim(M),E=dim(E),G=dim(G),L=dim(L))
         M     E   G      L
[1,] 24591 19307 602 126813
[2,]     7     4   7      4
> 
> head(M,2)
  member_id                name  hometown      city state   lat    lon
1      2069 Wesley Duffee-Braun Brentwood Brentwood    TN 36.00 -86.79
2      8386                 Tim Nashville Nashville    TN 36.07 -86.78
> head(E,2)
   event_id group_id                                                  name                time
1 243930425 26140018 2017 Nashville Walk to End Alzheimer's - October 14th 2017-10-14 12:00:00
2 244208851 25604533                             Steak Dinner on the Patio 2017-10-15 00:15:00
> head(G,2)
  group_id                       group_name num_members category_id        category_name organizer_id
1   339011          Nashville Hiking Meetup       15838          23 Outdoors & Adventure      4353803
2 19728145 Stepping Out Social Dance Meetup        1778           5              Dancing    118484462
           group_urlname
1       nashville-hiking
2 steppingoutsocialdance
> head(L,2)
  X  event_id member_id group_id
1 0 243930425   6770985 26140018
2 1 244208851 234724627 25604533
 

> G$organizer_id[1]
[1] 4353803
> which(M$member_id==4353803)
[1] 433
> M[433,]
    member_id          name  hometown      city state   lat    lon
433   4353803 Kelly Stewart Nashville Nashville    TN 36.07 -86.73
> table(M$state)

         17    AA    AB    AK    AL    AR    AZ    B7    BC    C3    CA    CE    CO    CT    DC    DE 
   94     8     1     5     4   100    16    37     1     3     2   180     1    48     7    26     4 
   F7    F8    FL    GA    HI    I9    IA    ID    IL    IN    KS    KY    LA    MA    MD    ME    MI 
    1     2   109   162     5     1     3     5   194    39    10   213    10    35    27     3    39 
   MN    MO    MS    N7    NC    ND    NE    NH    NJ    NM    NV    NY    OH    OK    ON    OR    PA 
   12    21     7     1    72     1     2     2    20     7     8   137    67     6     5    14    26 
   QC    RI    RM    SC    SD    T5    TN    TX    UT    VA    VT    WA    WI    WV 
    4     3     1    31     3     1 22560    91     8    44     4    30     7     1  

We transform the data frame L into the corresponding multiway network:

> MT <- DF2MWN(L,c("event","member","group"),
+    network="Nashville",title="Nashville Meetup Network")
> str(MT)
List of 6
 $ format: chr "MWnets"
 $ info  :List of 4
  ..$ network: chr "Nashville"
  ..$ title  : chr "Nashville Meetup Network"
  ..$ by     : chr "DF2MWN"
  ..$ date   : chr "Wed May 17 22:02:10 2023"
 $ ways  :List of 3
  ..$ event : chr "event"
  ..$ member: chr "member"
  ..$ group : chr "group"
 $ nodes :List of 3
  ..$ event :'data.frame':      19031 obs. of  1 variable:
  .. ..$ ID: chr [1:19031] "107248742" "117878862" "133313452" "145868842" ...
  ..$ member:'data.frame':      24631 obs. of  1 variable:
  .. ..$ ID: chr [1:24631] "2069" "8386" "9205" "17903" ...
  ..$ group :'data.frame':      602 obs. of  1 variable:
  .. ..$ ID: chr [1:602] "47094" "168014" "191516" "204767" ...
 $ links :'data.frame': 126813 obs. of  4 variables:
  ..$ one   : num [1:126813] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ event : int [1:126813] 9945 10071 15394 15394 10143 10086 9973 9813 9732 9732 ...
  ..$ member: int [1:126813] 766 23679 13780 16347 124 124 124 124 124 12065 ...
  ..$ group : int [1:126813] 594 554 580 580 229 229 229 229 229 229 ...
 $ data  : list()
> MT$info$URL <- "https://www.kaggle.com/datasets/stkbailey/nashville-meetup"
> MT$info$by <- "Stephen Bailey"
> MT$info$creator <- "Vladimir Batagelj"
> names(MT$links)[1] <- "w"
> write(toJSON(MT),"Nashville0.json")
> #  member_id,  name,  hometown,  city,  state,  lat,  lon
> # n(M) = 24591  n(L) = 24631 
> n <- length(MT$nodes$member$ID)
> T <- paste("#",1:n,sep=""); names(T) <- MT$nodes$member$ID
> for(i in 1:nrow(M)) T[as.character(M$member_id[i])] <- M$name[i]
> MT$nodes$member$name <- T
> T[1:n] <- NA
> for(i in 1:nrow(M)) T[as.character(M$member_id[i])] <- M$hometown[i]
> MT$nodes$member$hometown <- T
> T[1:n] <- NA
> for(i in 1:nrow(M)) T[as.character(M$member_id[i])] <- M$city[i]
> MT$nodes$member$city <- T
> T[1:n] <- NA
> for(i in 1:nrow(M)) T[as.character(M$member_id[i])] <- M$state[i]
> MT$nodes$member$state <- T
> T[1:n] <- NA
> for(i in 1:nrow(M)) T[as.character(M$member_id[i])] <- M$lat[i]
> MT$nodes$member$lat <- as.numeric(T)
> T[1:n] <- NA
> for(i in 1:nrow(M)) T[as.character(M$member_id[i])] <- M$lon[i]
> MT$nodes$member$lon <- as.numeric(T)
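The loops above fill each member attribute by name, one metadata row at a time. The same result can be obtained with match(), which finds the row of M for every node ID in one call. A minimal sketch on made-up data (nodes and M only mimic the structures above; ID 4401891 stands for a member without metadata), keeping the placeholder names "#i" for such members as the loop version does:

```r
# Toy node table (IDs as character, as in MT$nodes$member) and toy metadata M
nodes <- data.frame(ID = c("2069", "8386", "4401891"), stringsAsFactors = FALSE)
M <- data.frame(
  member_id = c(8386, 2069),
  name      = c("Tim", "Wesley Duffee-Braun"),
  state     = c("TN", "TN"),
  lat       = c(36.07, 36.00),
  stringsAsFactors = FALSE
)

# Row of M for each node; NA where the member has no metadata
k <- match(nodes$ID, as.character(M$member_id))

nodes$name  <- ifelse(is.na(k), paste0("#", seq_len(nrow(nodes))), M$name[k])
nodes$state <- M$state[k]   # NA for members without metadata
nodes$lat   <- M$lat[k]
nodes
```

Unlike `T[name] <- value`, which silently appends a new element when the name is not already present, this version only reads rows of M for IDs that are nodes, so the attribute vectors always have one entry per node.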

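> # event_id,  group_id,  name,  time
> # n(E) = 19307  n(L) = 19031
> # note: in the loops below i runs over 1:n (n = number of event nodes), not over all nrow(E) rows of E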
> n <- length(MT$nodes$event$ID)
> T <- paste("#",1:n,sep=""); names(T) <- MT$nodes$event$ID
> for(i in 1:n){j <- which(E$event_id[i] == MT$nodes$event$ID);
+   if(length(j)>0) T[j] <- E$name[i] }
> MT$nodes$event$name <- T
> T[1:n] <- NA
> for(i in 1:n){j <- which(E$event_id[i] == MT$nodes$event$ID);
+   if(length(j)>0) T[j] <- E$group_id[i] }
> MT$nodes$event$groupID <- T
> T[1:n] <- NA
> for(i in 1:n){j <- which(E$event_id[i] == MT$nodes$event$ID);
+   if(length(j)>0) {k <- which(E$group_id[i] == MT$nodes$group$ID);
+     if(length(k)>0) T[j] <- k }}
> MT$nodes$event$group <- as.integer(T)
> T[1:n] <- NA
> for(i in 1:n){j <- which(E$event_id[i] == MT$nodes$event$ID);
+   if(length(j)>0) T[j] <- E$time[i] }
> MT$nodes$event$time <- T
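A match()-based variant of the event loops searches all rows of E for every event node, so its result does not depend on how the number of rows of E compares with the number of event nodes; it can be used to recheck the number of events without metadata reported below. A minimal sketch on made-up data (nodes, groups and E only mimic the structures above; the metadata row with event_id 999 stands for an event that never occurs in the links):

```r
nodes  <- data.frame(ID = c("107248742", "243930425", "244208851"),
                     stringsAsFactors = FALSE)
groups <- data.frame(ID = c("25604533", "26140018"), stringsAsFactors = FALSE)
E <- data.frame(
  event_id = c(243930425, 244208851, 999),
  group_id = c(26140018, 25604533, 25604533),
  time     = c("2017-10-14 12:00:00", "2017-10-15 00:15:00", "2017-01-01 00:00:00"),
  stringsAsFactors = FALSE
)

# Row of E for every event node (all rows of E are searched)
k <- match(nodes$ID, as.character(E$event_id))
nodes$groupID <- as.character(E$group_id[k])
nodes$time    <- E$time[k]
# Integer index of the event's group in the group node table (NA if unknown)
nodes$group   <- match(nodes$groupID, groups$ID)
nodes
```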

> # group_id,  group_name,  num_members,  category_id,  category_name,  organizer_id
> # n(G) = n(L) = 602 
> n <- length(MT$nodes$group$ID)
> T <- paste("#",1:n,sep=""); names(T) <- MT$nodes$group$ID
> for(i in 1:nrow(G)) T[as.character(G$group_id[i])] <- G$group_name[i]
> MT$nodes$group$name <- T
> T[1:n] <- NA
> for(i in 1:nrow(G)) T[as.character(G$group_id[i])] <- G$num_members[i]
> MT$nodes$group$num <- as.integer(T)
> T[1:n] <- NA
> for(i in 1:nrow(G)) T[as.character(G$group_id[i])] <- G$category_id[i]
> MT$nodes$group$catID <- as.integer(T)
> T[1:n] <- NA
> for(i in 1:nrow(G)) T[as.character(G$group_id[i])] <- G$category_name[i]
> MT$nodes$group$catName <- T
> T[1:n] <- NA
> for(i in 1:nrow(G)) { j <- which(G$organizer_id[i] == MT$nodes$member$ID);
+   if(length(j)>0) T[as.character(G$group_id[i])] <- j }
> MT$nodes$group$org <- as.integer(T)
> T[1:n] <- NA
> for(i in 1:nrow(G)) T[as.character(G$group_id[i])] <- G$organizer_id[i]
> MT$nodes$group$orgID <- T
> str(MT)
List of 6
 $ format: chr "MWnets"
 $ info  :List of 6
  ..$ network: chr "Nashville"
  ..$ title  : chr "Nashville Meetup Network"
  ..$ by     : chr "Stephen Bailey"
  ..$ date   : chr "Wed May 17 22:02:10 2023"
  ..$ URL    : chr "https://www.kaggle.com/datasets/stkbailey/nashville-meetup"
  ..$ creator: chr "Vladimir Batagelj"
 $ ways  :List of 3
  ..$ event : chr "event"
  ..$ member: chr "member"
  ..$ group : chr "group"
 $ nodes :List of 3
  ..$ event :'data.frame':      19031 obs. of  5 variables:
  .. ..$ ID     : chr [1:19031] "107248742" "117878862" "133313452" "145868842" ...
  .. ..$ name   : chr [1:19031] "Real Secrets to Money Meeting" "Google AdWords: Pay Per Click Advertising" "Gilda with  ...
  .. ..$ group  : int [1:19031] 98 74 63 40 NA NA 87 125 123 70 ...
  .. ..$ time   : chr [1:19031] "2016-07-30 15:00:00" "2016-02-19 00:30:00" "2017-08-12 23:00:00" "2016-04-30 00:00:00" ...
  .. ..$ groupID: chr [1:19031] "4515232" "1776274" "1642477" "1396244" ...
  ..$ member:'data.frame':      24631 obs. of  7 variables:
  .. ..$ ID      : chr [1:24631] "2069" "8386" "9205" "17903" ...
  .. ..$ name    : chr [1:24631] "Wesley Duffee-Braun" "Tim" "Brenda" "Steve" ...
  .. ..$ hometown: chr [1:24631] "Brentwood" "Nashville" "Brentwood" "" ...
  .. ..$ city    : chr [1:24631] "Brentwood" "Nashville" "Brentwood" "Nashville" ...
  .. ..$ state   : chr [1:24631] "TN" "TN" "TN" "TN" ...
  .. ..$ lat     : num [1:24631] 36 36.1 36 36.1 36.2 ...
  .. ..$ lon     : num [1:24631] -86.8 -86.8 -86.8 -86.8 -86.7 ...
  ..$ group :'data.frame':      602 obs. of  7 variables:
  .. ..$ ID     : chr [1:602] "47094" "168014" "191516" "204767" ...
  .. ..$ name   : chr [1:602] "The Greater Nashville RPG and Board Gamers Group" "The Nashville Writers Meetup" "Vegan Food & Friends" "Nashville Podcasters" ...
  .. ..$ num    : int [1:602] 2375 3286 850 292 1570 1417 1035 1910 15838 4017 ...
  .. ..$ catID  : int [1:602] 11 36 10 34 2 16 2 2 23 27 ...
  .. ..$ catName: chr [1:602] "Games" "Writing" "Food & Drink" "Tech" ...
  .. ..$ org    : int [1:602] 9920 101 2823 6774 534 197 2179 9380 433 NA ...
  .. ..$ orgID  : chr [1:602] "185299351" "1281684" "13911819" "119792682" ...
 $ links :'data.frame': 126813 obs. of  4 variables:
  ..$ w     : num [1:126813] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ event : int [1:126813] 9945 10071 15394 15394 10143 10086 9973 9813 9732 9732 ...
  ..$ member: int [1:126813] 766 23679 13780 16347 124 124 124 124 124 12065 ...
  ..$ group : int [1:126813] 594 554 580 580 229 229 229 229 229 229 ...
 $ data  : list()
> write(toJSON(MT),"Nashville.json")
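In MT$links each way column stores integer positions into the node table of that way, whose ID column holds the original IDs in sorted order (as the str(MT) output suggests). The following sketch illustrates that encoding with factor(); it is a hypothetical illustration of the structure, not the actual code of DF2MWN from MWnets.R, and the function name encode_ways is made up.

```r
# Hypothetical illustration of the encoding idea (not MWnets' DF2MWN)
encode_ways <- function(df, cols, ways = cols) {
  nodes <- list()
  links <- data.frame(w = rep(1, nrow(df)))
  for (i in seq_along(cols)) {
    f <- factor(df[[cols[i]]])            # levels sorted by value
    nodes[[ways[i]]] <- data.frame(ID = levels(f), stringsAsFactors = FALSE)
    links[[ways[i]]] <- as.integer(f)     # position of each ID among the levels
  }
  list(nodes = nodes, links = links)
}

rsvps <- data.frame(
  event_id  = c(4, 2, 2, 3),
  member_id = c(8386, 2069, 17903, 2069),
  group_id  = c(200, 100, 100, 200)
)
mw <- encode_ways(rsvps, c("event_id", "member_id", "group_id"),
                  c("event", "member", "group"))
mw$links
# Decode: map the integer codes back to the original IDs
mw$nodes$member$ID[mw$links$member]
```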

The metadata are incomplete.

> nm <- which(is.na(MT$nodes$member$state))
> length(nm)
[1] 40
> MT$nodes$member$ID[nm]
 [1] "4401891"   "11768745"  "14174163"  "144948972" "186309120" "187247160" "195809605" "198239229"
 [9] "204144233" "213066630" "218376647" "218996189" "231160817" "239124265" "239332166" "239603664"
[17] "239605811" "239640173" "239641566" "239642192" "239643852" "239673334" "239678461" "239680203"
[25] "239706951" "239715174" "239716809" "239727417" "239747946" "239748033" "239751555" "239754094"
[33] "239782806" "239798750" "239801836" "239813038" "239817741" "239824858" "239909013" "240003395"
> ne <- which(is.na(MT$nodes$event$time))
> length(ne)
[1] 275
> gr <- MT$nodes$event$group[!is.na(MT$nodes$event$groupID)]
> length(gr)
[1] 18756
> neg <- which(is.na(gr))
> length(neg)
[1] 0
> ng <- which(is.na(MT$nodes$group$catName))
> length(ng)
[1] 0
> ngo <- which(is.na(MT$nodes$group$org))
> length(ngo)
[1] 35
> ngi <- which(is.na(MT$nodes$group$orgID))
> length(ngi)
[1] 0
> MT$nodes$group$orgID[is.na(MT$nodes$group$org)]
 [1] "147707862" "67608552"  "7476365"   "1099042"   "198986314" "198986314" "213302747" "194666305"
 [9] "7476365"   "204135137" "215201845" "86371752"  "187840592" "153961372" "59559622"  "198986314"
[17] "198069024" "96591722"  "67608552"  "193181718" "214700450" "201177047" "217832051" "10169306" 
[25] "98668942"  "221183678" "221183678" "221183678" "223501599" "114766932" "209818889" "228033479"
[33] "87574402"  "231551616" "12143808" 
> 
  • There are 40 members used in links but not described in the metadata.
  • There are 275 events used in links but not described in the metadata.
  • group ↔ event: each event belongs to exactly one group, so the group is determined by the event and the network is not a genuine multiway network!
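The first two counts can also be checked directly as set differences between the IDs used in rsvps.csv and those listed in the metadata files, independently of how the node attributes were filled in. A sketch on made-up IDs:

```r
rsvps <- data.frame(event_id = c(1, 2, 2, 5), member_id = c(10, 10, 11, 13))
M <- data.frame(member_id = c(10, 11, 12))   # toy meta-members.csv
E <- data.frame(event_id = c(1, 2, 3, 4))    # toy meta-events.csv

# IDs that occur in links but are absent from the metadata
missing_members <- setdiff(unique(rsvps$member_id), M$member_id)
missing_events  <- setdiff(unique(rsvps$event_id), E$event_id)
# IDs described in the metadata but never used in links
unused_events   <- setdiff(E$event_id, rsvps$event_id)
c(members = length(missing_members), events = length(missing_events),
  unused_events = length(unused_events))
```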

So we have two 2-mode networks: members × events and events × groups.
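Both observations can be verified from the links alone: if every event has exactly one distinct group in rsvps.csv, the event-to-group pairs form a function, and the 3-way links split into a members × events network and an events × groups network. A sketch on made-up data with the rsvps.csv column names:

```r
rsvps <- data.frame(
  event_id  = c(1, 1, 2, 3, 3, 3),
  member_id = c(10, 11, 10, 11, 12, 10),
  group_id  = c(100, 100, 100, 200, 200, 200)
)

# Number of distinct groups per event; all 1 means the group is a function of the event
groups_per_event <- tapply(rsvps$group_id, rsvps$event_id,
                           function(g) length(unique(g)))
stopifnot(all(groups_per_event == 1))

# events x groups: one row per event
event_group <- unique(rsvps[, c("event_id", "group_id")])
# members x events: attendance pairs (duplicates dropped)
member_event <- unique(rsvps[, c("member_id", "event_id")])
event_group
nrow(member_event)
```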

Last modified: 2023/05/18 04:40 by vlado
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported