17 May 2023
Stephen Bailey: Nashville Meetup Network/Teaching Dataset for NashNetX Presentation (PyTN)
meetup.com is a website for people organizing and attending regular or semi-regular events (“meet-ups”). The relationships amongst users—who goes to what meetups—are a social network, ideal for graph-based analysis.
This dataset was generated for a talk titled Principles of Network Analysis with NetworkX. It forms the basis for a series of tutorials I presented at PyNash and PyTennessee. In them, we work through the basics of graph theory and how to use NetworkX, a popular open-source Python package. We then apply this knowledge to extract insights about the social fabric of Tennessee MeetUp groups.
Graph data
member-to-group-edges.csv
: Edge list for constructing a member-to-group bipartite graph. Weights represent number of events attended in each group.
group-edges.csv
: Edge list for constructing a group-to-group graph. Weights represent shared members between groups.
member-edges.csv
: Edge list for constructing a member-to-member graph. Weights represent shared group membership.
rsvps.csv
: Raw member-to-event attendance data, which was aggregated to form member-to-group-edges.csv.
Metadata
meta-groups.csv
: Information for each group, including name and category. group_id can serve as index.
meta-members.csv
: Information for each member, including name and location. member_id can serve as index.
meta-events.csv
: Information for each event, including name and time. event_id can serve as index.
Acknowledgments
I'd like to acknowledge the folks at MeetUp.com, who have made their database publicly available via a convenient REST API. Even newbies like myself can access and enjoy!
The basic network is the 3-way network rsvps.csv with 126813 links (event, member, group).
Additional data about the nodes of each way can be obtained from the metadata files.
The first three networks (member-to-group-edges.csv, group-edges.csv and member-edges.csv) are derived from this 3-way network.
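As a rough illustration of how such derived edge lists relate to the 3-way data, the following sketch aggregates a small made-up RSVP table into member-to-group weights (number of events attended) and group-to-group weights (number of shared members). The toy table and all variable names are assumptions for illustration; this is not the code that produced the Kaggle files.

```r
# Toy RSVP table with the same ID columns as rsvps.csv (values are made up)
rsvps <- data.frame(
  event_id  = c(1, 1, 2, 2, 3, 3),
  member_id = c(10, 11, 10, 12, 10, 11),
  group_id  = c(100, 100, 100, 100, 200, 200)
)

# Member-to-group weights: number of RSVP rows (events attended) per pair
mg <- aggregate(w ~ member_id + group_id,
                data = cbind(rsvps, w = 1), FUN = sum)

# Member x group incidence matrix (1 if the member attended any event of the group)
B <- (unclass(table(rsvps$member_id, rsvps$group_id)) > 0) * 1

# Group-to-group weights: number of shared members; the diagonal holds group sizes
gg <- crossprod(B)
```

The same cross-product applied to the transposed incidence matrix, tcrossprod(B), would give member-to-member weights (number of shared groups).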
First we read and inspect the data:
> wdir <- "D:/vlado/DL/data/Nashville"
> setwd(wdir)
> library(jsonlite)
> source("https://raw.githubusercontent.com/bavla/Rnet/master/R/Pajek.R")
> source("https://raw.githubusercontent.com/bavla/ibm3m/master/multiway/MWnets.R")
> M <- read.csv("meta-members.csv",sep=",",head=TRUE)
> E <- read.csv("meta-events.csv",sep=",",head=TRUE)
> G <- read.csv("meta-groups.csv",sep=",",head=TRUE)
> L <- read.csv("rsvps.csv",sep=",",head=TRUE)
> cbind(M=dim(M),E=dim(E),G=dim(G),L=dim(L))
         M     E   G      L
[1,] 24591 19307 602 126813
[2,]     7     4   7      4
>
> head(M,2)
  member_id                name  hometown      city state   lat    lon
1      2069 Wesley Duffee-Braun Brentwood Brentwood    TN 36.00 -86.79
2      8386                 Tim Nashville Nashville    TN 36.07 -86.78
> head(E,2)
   event_id group_id                                                  name                time
1 243930425 26140018 2017 Nashville Walk to End Alzheimer’s - October 14th 2017-10-14 12:00:00
2 244208851 25604533                             Steak Dinner on the Patio 2017-10-15 00:15:00
> head(G,2)
  group_id                       group_name num_members category_id        category_name organizer_id
1   339011          Nashville Hiking Meetup       15838          23 Outdoors & Adventure      4353803
2 19728145 Stepping Out Social Dance Meetup        1778           5              Dancing    118484462
           group_urlname
1       nashville-hiking
2 steppingoutsocialdance
> head(L,2)
  X  event_id member_id group_id
1 0 243930425   6770985 26140018
2 1 244208851 234724627 25604533
> G$organizer_id[1]
[1] 4353803
> which(M$member_id==4353803)
[1] 433
> M[433,]
    member_id          name  hometown      city state   lat    lon
433   4353803 Kelly Stewart Nashville Nashville    TN 36.07 -86.73
> table(M$state)

         17    AA    AB    AK    AL    AR    AZ    B7    BC    C3    CA    CE    CO    CT    DC    DE
   94     8     1     5     4   100    16    37     1     3     2   180     1    48     7    26     4
   F7    F8    FL    GA    HI    I9    IA    ID    IL    IN    KS    KY    LA    MA    MD    ME    MI
    1     2   109   162     5     1     3     5   194    39    10   213    10    35    27     3    39
   MN    MO    MS    N7    NC    ND    NE    NH    NJ    NM    NV    NY    OH    OK    ON    OR    PA
   12    21     7     1    72     1     2     2    20     7     8   137    67     6     5    14    26
   QC    RI    RM    SC    SD    T5    TN    TX    UT    VA    VT    WA    WI    WV
    4     3     1    31     3     1 22560    91     8    44     4    30     7     1
We transform the data frame L into the corresponding multiway network:
> MT <- DF2MWN(L,c("event","member","group"),
+   network="Nashville",title="Nashville Meetup Network")
> str(MT)
List of 6
 $ format: chr "MWnets"
 $ info  :List of 4
  ..$ network: chr "Nashville"
  ..$ title  : chr "Nashville Meetup Network"
  ..$ by     : chr "DF2MWN"
  ..$ date   : chr "Wed May 17 22:02:10 2023"
 $ ways  :List of 3
  ..$ event : chr "event"
  ..$ member: chr "member"
  ..$ group : chr "group"
 $ nodes :List of 3
  ..$ event :'data.frame': 19031 obs. of 1 variable:
  .. ..$ ID: chr [1:19031] "107248742" "117878862" "133313452" "145868842" ...
  ..$ member:'data.frame': 24631 obs. of 1 variable:
  .. ..$ ID: chr [1:24631] "2069" "8386" "9205" "17903" ...
  ..$ group :'data.frame': 602 obs. of 1 variable:
  .. ..$ ID: chr [1:602] "47094" "168014" "191516" "204767" ...
 $ links :'data.frame': 126813 obs. of 4 variables:
  ..$ one   : num [1:126813] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ event : int [1:126813] 9945 10071 15394 15394 10143 10086 9973 9813 9732 9732 ...
  ..$ member: int [1:126813] 766 23679 13780 16347 124 124 124 124 124 12065 ...
  ..$ group : int [1:126813] 594 554 580 580 229 229 229 229 229 229 ...
 $ data  : list()
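The str output above suggests the general shape of the multiway structure: one node table per way, and a links table whose way columns are indices into those node tables. The sketch below rebuilds that shape on a made-up link table. The helper toy_mwn is hypothetical and is not the DF2MWN implementation from MWnets.R, whose internals are not shown here; the numeric ordering of IDs is also only an assumption read off the output.

```r
# Toy link table (made-up IDs); its columns play the role of the three ways
L0 <- data.frame(
  event_id  = c(501, 502, 502, 503),
  member_id = c(30, 10, 20, 10),
  group_id  = c(7, 9, 9, 7)
)

# Hypothetical helper, NOT DF2MWN: one node table per way, links stored as
# 1-based indices into those tables, plus a unit weight column w
toy_mwn <- function(df, ways) {
  nodes <- list()
  links <- data.frame(w = rep(1, nrow(df)))
  for (k in seq_along(ways)) {
    ids <- sort(unique(df[[k]]))                 # distinct IDs of way k
    nodes[[ways[k]]] <- data.frame(ID = as.character(ids),
                                   stringsAsFactors = FALSE)
    links[[ways[k]]] <- match(df[[k]], ids)      # ID -> node index
  }
  list(ways = as.list(setNames(ways, ways)), nodes = nodes, links = links)
}

MT0 <- toy_mwn(L0, c("event", "member", "group"))
```

Storing indices instead of raw IDs keeps the links table compact and lets node attributes be added later as extra columns of the node tables.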
> MT$info$URL <- "https://www.kaggle.com/datasets/stkbailey/nashville-meetup"
> MT$info$by <- "Stephen Bailey"
> MT$info$creator <- "Vladimir Batagelj"
> names(MT$links)[1] <- "w"
> write(toJSON(MT),"Nashville0.json")
> # member_id, name, hometown, city, state, lat, lon
> # n(M) = 24591   n(L) = 24631
> n <- length(MT$nodes$member$ID)
> T <- paste("#",1:n,sep=""); names(T) <- MT$nodes$member$ID
> for(i in 1:nrow(M)) T[as.character(M$member_id[i])] <- M$name[i]
> MT$nodes$member$name <- T
> T[1:n] <- NA
> for(i in 1:nrow(M)) T[as.character(M$member_id[i])] <- M$hometown[i]
> MT$nodes$member$hometown <- T
> T[1:n] <- NA
> for(i in 1:nrow(M)) T[as.character(M$member_id[i])] <- M$city[i]
> MT$nodes$member$city <- T
> T[1:n] <- NA
> for(i in 1:nrow(M)) T[as.character(M$member_id[i])] <- M$state[i]
> MT$nodes$member$state <- T
> T[1:n] <- NA
> for(i in 1:nrow(M)) T[as.character(M$member_id[i])] <- M$lat[i]
> MT$nodes$member$lat <- as.numeric(T)
> T[1:n] <- NA
> for(i in 1:nrow(M)) T[as.character(M$member_id[i])] <- M$lon[i]
> MT$nodes$member$lon <- as.numeric(T)
> # event_id, group_id, name, time
> # n(E) = 19307   n(L) = 19031
> n <- length(MT$nodes$event$ID)
> T <- paste("#",1:n,sep=""); names(T) <- MT$nodes$event$ID
> for(i in 1:n){j <- which(E$event_id[i] == MT$nodes$event$ID);
+   if(length(j)>0) T[j] <- E$name[i] }
> MT$nodes$event$name <- T
> T[1:n] <- NA
> for(i in 1:n){j <- which(E$event_id[i] == MT$nodes$event$ID);
+   if(length(j)>0) T[j] <- E$group_id[i] }
> MT$nodes$event$groupID <- T
> T[1:n] <- NA
> for(i in 1:n){j <- which(E$event_id[i] == MT$nodes$event$ID);
+   if(length(j)>0) {k <- which(E$group_id[i] == MT$nodes$group$ID);
+     if(length(k)>0) T[j] <- k }}
> MT$nodes$event$group <- as.integer(T)
> T[1:n] <- NA
> for(i in 1:n){j <- which(E$event_id[i] == MT$nodes$event$ID);
+   if(length(j)>0) T[j] <- E$time[i] }
> MT$nodes$event$time <- T
> # group_id, group_name, num_members, category_id, category_name, organizer_id
> # n(G) = n(L) = 602
> n <- length(MT$nodes$group$ID)
> T <- paste("#",1:n,sep=""); names(T) <- MT$nodes$group$ID
> for(i in 1:nrow(G)) T[as.character(G$group_id[i])] <- G$group_name[i]
> MT$nodes$group$name <- T
> T[1:n] <- NA
> for(i in 1:nrow(G)) T[as.character(G$group_id[i])] <- G$num_members[i]
> MT$nodes$group$num <- as.integer(T)
> T[1:n] <- NA
> for(i in 1:nrow(G)) T[as.character(G$group_id[i])] <- G$category_id[i]
> MT$nodes$group$catID <- as.integer(T)
> T[1:n] <- NA
> for(i in 1:nrow(G)) T[as.character(G$group_id[i])] <- G$category_name[i]
> MT$nodes$group$catName <- T
> T[1:n] <- NA
> for(i in 1:nrow(G)) { j <- which(G$organizer_id[i] == MT$nodes$member$ID);
+   if(length(j)>0) T[as.character(G$group_id[i])] <- j }
> MT$nodes$group$org <- as.integer(T)
> T[1:n] <- NA
> for(i in 1:nrow(G)) T[as.character(G$group_id[i])] <- G$organizer_id[i]
> MT$nodes$group$orgID <- T
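The loops above copy one attribute at a time from a metadata table onto the node table. The same lookup can be written more compactly with match(), which returns NA for node IDs that have no metadata row. The sketch below uses made-up toy tables and is an alternative formulation, not the code that produced the results on this page. One difference worth noting: match() looks at every metadata row, while the event loops above let i run over 1:n with n = 19031 event nodes although E has 19307 rows, so the last rows of E are never visited there.

```r
# Toy node table and metadata table (made-up values)
nodes <- data.frame(ID = c("2069", "8386", "9999"), stringsAsFactors = FALSE)
meta  <- data.frame(member_id = c(8386, 2069),
                    name      = c("Tim", "Wesley"),
                    state     = c("TN", "TN"),
                    stringsAsFactors = FALSE)

# Row of meta for each node ID; NA where the node has no metadata
idx <- match(nodes$ID, as.character(meta$member_id))

# Copy the attributes in node order; missing rows become NA
nodes$name  <- meta$name[idx]
nodes$state <- meta$state[idx]

# Same placeholder convention as above: "#<node index>" for unnamed nodes
miss <- is.na(nodes$name)
nodes$name[miss] <- paste("#", which(miss), sep = "")
```

Because the lookup is a single vectorized call, it avoids the repeated which() scans inside the loops and makes the handling of unmatched IDs explicit.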
> str(MT)
List of 6
 $ format: chr "MWnets"
 $ info  :List of 6
  ..$ network: chr "Nashville"
  ..$ title  : chr "Nashville Meetup Network"
  ..$ by     : chr "Stephen Bailey"
  ..$ date   : chr "Wed May 17 22:02:10 2023"
  ..$ URL    : chr "https://www.kaggle.com/datasets/stkbailey/nashville-meetup"
  ..$ creator: chr "Vladimir Batagelj"
 $ ways  :List of 3
  ..$ event : chr "event"
  ..$ member: chr "member"
  ..$ group : chr "group"
 $ nodes :List of 3
  ..$ event :'data.frame': 19031 obs. of 5 variables:
  .. ..$ ID     : chr [1:19031] "107248742" "117878862" "133313452" "145868842" ...
  .. ..$ name   : chr [1:19031] "Real Secrets to Money Meeting" "Google AdWords: Pay Per Click Advertising" "Gilda with ...
  .. ..$ group  : int [1:19031] 98 74 63 40 NA NA 87 125 123 70 ...
  .. ..$ time   : chr [1:19031] "2016-07-30 15:00:00" "2016-02-19 00:30:00" "2017-08-12 23:00:00" "2016-04-30 00:00:00" ...
  .. ..$ groupID: chr [1:19031] "4515232" "1776274" "1642477" "1396244" ...
  ..$ member:'data.frame': 24631 obs. of 7 variables:
  .. ..$ ID      : chr [1:24631] "2069" "8386" "9205" "17903" ...
  .. ..$ name    : chr [1:24631] "Wesley Duffee-Braun" "Tim" "Brenda" "Steve" ...
  .. ..$ hometown: chr [1:24631] "Brentwood" "Nashville" "Brentwood" "" ...
  .. ..$ city    : chr [1:24631] "Brentwood" "Nashville" "Brentwood" "Nashville" ...
  .. ..$ state   : chr [1:24631] "TN" "TN" "TN" "TN" ...
  .. ..$ lat     : num [1:24631] 36 36.1 36 36.1 36.2 ...
  .. ..$ lon     : num [1:24631] -86.8 -86.8 -86.8 -86.8 -86.7 ...
  ..$ group :'data.frame': 602 obs. of 7 variables:
  .. ..$ ID     : chr [1:602] "47094" "168014" "191516" "204767" ...
  .. ..$ name   : chr [1:602] "The Greater Nashville RPG and Board Gamers Group" "The Nashville Writers Meetup" "Vegan Food & Friends" "Nashville Podcasters" ...
  .. ..$ num    : int [1:602] 2375 3286 850 292 1570 1417 1035 1910 15838 4017 ...
  .. ..$ catID  : int [1:602] 11 36 10 34 2 16 2 2 23 27 ...
  .. ..$ catName: chr [1:602] "Games" "Writing" "Food & Drink" "Tech" ...
  .. ..$ org    : int [1:602] 9920 101 2823 6774 534 197 2179 9380 433 NA ...
  .. ..$ orgID  : chr [1:602] "185299351" "1281684" "13911819" "119792682" ...
 $ links :'data.frame': 126813 obs. of 4 variables:
  ..$ w     : num [1:126813] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ event : int [1:126813] 9945 10071 15394 15394 10143 10086 9973 9813 9732 9732 ...
  ..$ member: int [1:126813] 766 23679 13780 16347 124 124 124 124 124 12065 ...
  ..$ group : int [1:126813] 594 554 580 580 229 229 229 229 229 229 ...
 $ data  : list()
> write(toJSON(MT),"Nashville.json")
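The metadata are incomplete: 40 member nodes have no state, 275 event nodes have no time, and the organizers of 35 groups are not among the member nodes.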
> nm <- which(is.na(MT$nodes$member$state))
> length(nm)
[1] 40
> MT$nodes$member$ID[nm]
 [1] "4401891"   "11768745"  "14174163"  "144948972" "186309120" "187247160" "195809605" "198239229"
 [9] "204144233" "213066630" "218376647" "218996189" "231160817" "239124265" "239332166" "239603664"
[17] "239605811" "239640173" "239641566" "239642192" "239643852" "239673334" "239678461" "239680203"
[25] "239706951" "239715174" "239716809" "239727417" "239747946" "239748033" "239751555" "239754094"
[33] "239782806" "239798750" "239801836" "239813038" "239817741" "239824858" "239909013" "240003395"
> ne <- which(is.na(MT$nodes$event$time))
> length(ne)
[1] 275
> gr <- MT$nodes$event$group[!is.na(MT$nodes$event$groupID)]
> length(gr)
[1] 18756
> neg <- which(is.na(gr))
> length(neg)
[1] 0
> ng <- which(is.na(MT$nodes$group$catName))
> length(ng)
[1] 0
> ngo <- which(is.na(MT$nodes$group$org))
> length(ngo)
[1] 35
> ngi <- which(is.na(MT$nodes$group$orgID))
> length(ngi)
[1] 0
> MT$nodes$group$orgID[is.na(MT$nodes$group$org)]
 [1] "147707862" "67608552"  "7476365"   "1099042"   "198986314" "198986314" "213302747" "194666305"
 [9] "7476365"   "204135137" "215201845" "86371752"  "187840592" "153961372" "59559622"  "198986314"
[17] "198069024" "96591722"  "67608552"  "193181718" "214700450" "201177047" "217832051" "10169306"
[25] "98668942"  "221183678" "221183678" "221183678" "223501599" "114766932" "209818889" "228033479"
[33] "87574402"  "231551616" "12143808"
>
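Counts of this kind can also be cross-checked directly on the raw ID columns, before any attributes are attached. The sketch below does this with setdiff() and %in% on short made-up ID vectors; with the real data one would presumably pass columns such as L$member_id and M$member_id instead. The vector names are illustrative only.

```r
# Made-up ID vectors standing in for a link column and a metadata key column
link_ids <- c(2069, 8386, 4401891, 8386, 11768745)
meta_ids <- c(2069, 8386, 17903)

# IDs that occur in the links but have no metadata row
no_meta <- setdiff(link_ids, meta_ids)

# IDs that have metadata but never occur in the links
unused <- setdiff(meta_ids, link_ids)

# Share of links whose ID is described in the metadata
covered <- mean(link_ids %in% meta_ids)
```

Comparing both directions is useful because a metadata table can be larger than the node set and still miss some of its nodes.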
We have two 2-mode networks: members X events and events X groups.
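A minimal sketch of this decomposition, on a made-up links table with the same column names as MT$links, is given below. It assumes that every event belongs to exactly one group, which is what the group_id column of meta-events.csv suggests but which is checked explicitly here rather than taken for granted. The names ME, EG and MG are illustrative.

```r
# Toy 3-way links (indices into the event, member and group node tables)
links <- data.frame(
  w      = c(1, 1, 1, 1),
  event  = c(1, 1, 2, 3),
  member = c(1, 2, 1, 1),
  group  = c(1, 1, 1, 2)
)

# members X events: one link per distinct (member, event) pair
ME <- unique(links[, c("member", "event")])

# events X groups: one link per distinct (event, group) pair
EG <- unique(links[, c("event", "group")])

# Assumption check: no event is attached to more than one group
stopifnot(!any(duplicated(EG$event)))

# Composing the two networks over events gives member X group weights,
# i.e. the number of events of a group that a member attended
MG <- aggregate(n ~ member + group,
                data = cbind(merge(ME, EG), n = 1), FUN = sum)
```

If the check passes, the two 2-mode networks together carry the same information as the 3-way link list.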