The original data are in several Excel files and contain 6849 variables (see the file Mastdata variables pselect1.xls).
From these data we selected 1125 variables and stored them in the file(s)
counties.csv, counties*.xls
- 1127 = 2 (STCOU, Areaname) + 1125 source variables; 3201 units (counties + states + USA)
selection.csv
- 116 = 2 (STCOU, Areaname) + 114 selected source variables; 3201 units (counties + states + USA)
vars.csv
- subset of 91 = 2 + 27 selected source variables + 62 computed variables
usc.paj
- neighborhood relation of 3111 mainland US counties by Luc Anselin, http://sal.uiuc.edu/weights/index.html, plus additional data from ftp://spo.nos.noaa.gov/datasets/CADS/Data/references/reference_cnty.zip, transformed into Pajek format by Vladimir Batagelj, June 11, 2006.
usc3110.paj
- obtained from usc.paj by removing the vertex (2916; South Boston) for which no data are available in vars.csv.
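The unit and variable counts given for the CSV files can be checked mechanically. The following is a minimal sketch, assuming each file is comma-delimited with a single header row; the helper names `csv_shape` and `check_shape` are hypothetical and the sample data are made up for illustration.

```python
import csv
import io

def csv_shape(handle):
    """Return (number of data rows, number of columns) of a CSV with a header row."""
    reader = csv.reader(handle)
    header = next(reader)
    n_rows = sum(1 for row in reader if row)  # skip blank lines
    return n_rows, len(header)

def check_shape(handle, n_units, n_id, n_vars):
    """Compare a CSV's shape with the expected unit and variable counts."""
    rows, cols = csv_shape(handle)
    return rows == n_units and cols == n_id + n_vars

# Tiny in-memory stand-in for counties.csv: 2 id columns + 3 variables, 2 units.
sample = io.StringIO(
    "STCOU,Areaname,V1,V2,V3\n"
    "00000,UNITED STATES,1,2,3\n"
    '01001,"Autauga, AL",4,5,6\n'
)
ok = check_shape(sample, n_units=2, n_id=2, n_vars=3)
```

For the real file one would open it with `open("counties.csv", newline="")` and call `check_shape(f, n_units=3201, n_id=2, n_vars=1125)`.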
The list of vertex labels is in the file usc3110lab.csv. We reorder the data units to match the order of the vertex labels, extract the selected variables, and standardize them. The final data to be used in Pajek are saved in usc3110S.csv.
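The reordering and standardization step might look roughly like the sketch below. It rests on assumptions not stated in the text: that the vertex labels match the Areaname values exactly, and that "standardize" means z-scores computed with the population standard deviation. The function names and the sample rows are hypothetical.

```python
import statistics

def reorder(rows, labels, key="Areaname"):
    """Return the rows in the order given by labels; a label without data raises KeyError."""
    by_label = {row[key]: row for row in rows}
    return [by_label[label] for label in labels]

def standardize(rows, variables):
    """Return copies of the rows with each variable replaced by its z-score."""
    result = [dict(row) for row in rows]
    for var in variables:
        values = [float(row[var]) for row in rows]
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)  # assumption: population standard deviation
        for out, value in zip(result, values):
            out[var] = (value - mean) / sd if sd > 0 else 0.0
    return result

# Made-up data units in a different order than the vertex labels.
rows = [
    {"Areaname": "B", "x": "2"},
    {"Areaname": "A", "x": "4"},
    {"Areaname": "C", "x": "6"},
]
labels = ["A", "B", "C"]  # stand-in for the contents of usc3110lab.csv
ordered = reorder(rows, labels)
scaled = standardize(ordered, ["x"])
```

Raising `KeyError` for a label without data is deliberate here: it surfaces alignment problems such as the South Boston vertex instead of silently dropping units.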
To be able to check the resulting partitions visually, we would like to display them on maps. Again we face the problem of aligning the units / vertices with the names of the corresponding shapes in the map description.
The correspondence can be established between the Areaname variable of the data units and the pairs (Name_1, Name_2) of the county shapes.
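One way to build such a correspondence is sketched below. It assumes, without confirmation from the data, that Areaname has the form "County, ST" with a two-letter state abbreviation, that Name_1 holds the full state name, and that Name_2 holds the county name. The lookup table `STATE_NAMES` is hypothetical and deliberately incomplete.

```python
# Hypothetical, incomplete lookup: extend to all states before real use.
STATE_NAMES = {"AL": "Alabama", "VA": "Virginia"}

def normalize(text):
    """Lower-case and collapse whitespace so that spelling variants compare equal."""
    return " ".join(text.lower().split())

def areaname_key(areaname):
    """Turn 'County, ST' into a normalized (state, county) pair, or None if it does not parse."""
    county, sep, abbrev = areaname.rpartition(",")
    state = STATE_NAMES.get(abbrev.strip().upper())
    if not sep or state is None:
        return None
    return normalize(state), normalize(county)

def match_shapes(areanames, shapes):
    """Map each Areaname to the index of its shape; unmatched names are returned separately."""
    index = {(normalize(n1), normalize(n2)): i for i, (n1, n2) in enumerate(shapes)}
    matched, unmatched = {}, []
    for name in areanames:
        key = areaname_key(name)
        if key in index:
            matched[name] = index[key]
        else:
            unmatched.append(name)
    return matched, unmatched

# Made-up (Name_1, Name_2) pairs standing in for the county shapes.
shapes = [("Virginia", "Halifax"), ("Alabama", "Autauga")]
matched, unmatched = match_shapes(
    ["Autauga, AL", "Halifax, VA", "UNITED STATES"], shapes
)
```

Collecting the unmatched names separately makes it easy to inspect by hand the units, such as states and the USA total, that have no county shape.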