Discriminant analysis - interpretation

DISCRIMINANT ANALYSIS: 8 NATIONS AND 42 VARIABLES

Discriminant analysis; Maps

The goal of the discriminant analysis is to find the variables that best distinguish (discriminate between) the nations described by Joel Garreau (1981).

We first classified the counties into eight large territorial units, the nations within the USA (one of Garreau's nations lies in Canada). We also selected variables that are consistent with the monograph Nine Nations and for which data are available. Some variables were excluded because of multicollinearity. In the end we retained 42 variables:

 [1] "median age"                         "civilian labor force unemployment rate"
 [3] "% bachelor's degree or hi. 25 +"    "% change housing units 90-00" 
 [5] "median household income"            "% people in poverty"
 [7] "per capita personal income"         "% change population 90-00"
 [9] "population per square mile"         "% female population"
[11] "% Black or African American"        "% Am. Indian or Alaska Native"
[13] "% Asian"                            "% Hispanic or Latino"      
[15] "birth rate"                         "death rate"
[17] "infant mortality"                   "water use per capita"
[19] "% pop.under18"                      "% pop.over85"
[21] "% land.farms"                       "% emply.ind.CONSTRUCTION"
[23] "% emply.ind.MANUFACTORING"          "% emply.ind.TRANSPORT.WAREHOUSING"
[25] "% emply.ind.FINANC.INSUR"           "% emply.ind.PROFscientTECH" 
[27] "% emply.ind.EDUC.HEALTH"            "% 25overLESS9thGRADE"
[29] "% employ.FARMING"                   "% employ.GOV.stateLoc"
[31] "% OWNERoccupiedHousingUnits"        "% occupiedHousingUnitsLackingPlumb"
[33] "% RURALpopul"                       "% CHANGEurban90to00"
[35] "CHANGEperCapitaIncome89to99"        "GroundWaterUsePerCapita"
[37] "% NET.DOMESTIC.MIGRATIONS"          "% NativePopulationBornInStateOfRes"
[39] "R.LABOR.FORCEmaleFemale"            "% VOTING.DEMOCRATESoverREPUBLICANS"
[41] "% PUBLIC.SCHOOL.ENROLNEMT"          "% CHANGEpverty95to00"  

The selected variables discriminate the nations quite well. The variables that discriminate most strongly are the race variables, the shares of young and old population, poverty, income, education, land in farms, native population born in the state of residence, and the unemployment rate.

The first four discriminant functions discriminate the nations quite well; the last three discriminant functions do so much less. The eigenvalues of the discriminant functions are:

(1) 35.33   (2) 24.44   (3) 19.04   (4) 17.47   (5) 8.66   (6) 6.58   (7) 4.48
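
To make the relative weight of the functions easier to see, the values above can be turned into shares of their total. The following is only a minimal sketch: it assumes that the listed numbers really are eigenvalues (as stated above) and that dividing each by their sum is a reasonable summary; the variable names are ours.

```python
# Values listed above, taken at face value as eigenvalues (an assumption).
eigenvalues = [35.33, 24.44, 19.04, 17.47, 8.66, 6.58, 4.48]

total = sum(eigenvalues)
shares = [ev / total for ev in eigenvalues]

# Running sum of the shares, one entry per discriminant function.
cumulative = []
running = 0.0
for share in shares:
    running += share
    cumulative.append(running)

for k, (share, cum) in enumerate(zip(shares, cumulative), start=1):
    print(f"function {k}: share = {share:.3f}, cumulative = {cum:.3f}")
```

Under these assumptions the first four functions together account for about 83% of the total, which agrees with the remark that the last three discriminate much less.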

Nevertheless, all seven discriminant functions are interpreted. For the interpretation we consider the loadings of the discriminant functions (the coefficients of the linear combination that defines a discriminant function) and the centroids (the means of the discriminant functions for each nation).

The first discriminant function is mostly defined by the following variables (loadings with absolute value higher than 0.50):
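
The two notions can be illustrated with a small sketch. It is not the code used for the analysis: the loadings and county values below are made up, the variables are assumed to be standardized, and any constant term of the discriminant function is ignored. The function names are ours.

```python
from collections import defaultdict

def discriminant_score(loadings, values):
    """Linear combination of the variable values, weighted by the loadings."""
    return sum(w * x for w, x in zip(loadings, values))

def centroids(loadings, rows):
    """Mean discriminant score per group; rows are (group, values) pairs."""
    scores = defaultdict(list)
    for group, values in rows:
        scores[group].append(discriminant_score(loadings, values))
    return {group: sum(s) / len(s) for group, s in scores.items()}

# Hypothetical loadings for two variables and made-up standardized county values.
toy_loadings = [-0.8, 0.6]
toy_counties = [
    ("A", [1.0, 0.0]),
    ("A", [2.0, 1.0]),
    ("B", [-1.0, 1.0]),
    ("B", [0.0, 2.0]),
]
toy_centroids = centroids(toy_loadings, toy_counties)
print(toy_centroids)
```

In the analysis itself the groups are the eight nations, the rows are counties, and there is one set of loadings for each of the seven discriminant functions.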

POP405200D - % Hispanic or Latino population (-0.850)       
P.pop.under18 - % population under 18 years (-0.775) 
POP255200D - % Black or African American population (0.675)          
P.pop.over85 - % population over 85 years (-0.637) 
P.25overLESS9thGRADE - % population 25 years or over with less than 9th grade (0.553)
IPE120200D - % population in poverty (0.531)

The first discriminant function is interpreted as follows: its values are higher where there is a lower percentage of Hispanic or Latino population, a lower percentage of young population (under 18 years), a higher percentage of Black or African American population, a lower percentage of old population (over 85 years), a higher percentage of poorly educated population, and a higher percentage of population in poverty.

The means of the first discriminant function (centroids), ordered from the lowest to the highest, are:
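
The way the signs of the loadings are read can be written down as a short sketch. The loadings are the six values listed above for the first function; the shortened labels, the function name, and the 0.50 cut-off applied to absolute values are our choices for illustration.

```python
# Loadings of the first discriminant function, copied from the list above.
first_function = {
    "% Hispanic or Latino": -0.850,
    "% population under 18": -0.775,
    "% Black or African American": 0.675,
    "% population over 85": -0.637,
    "% 25 or over with less than 9th grade": 0.553,
    "% population in poverty": 0.531,
}

def describe_high_scores(loadings, threshold=0.50):
    """For each salient variable, say whether high scores go with higher or lower values."""
    salient = [(name, w) for name, w in loadings.items() if abs(w) > threshold]
    salient.sort(key=lambda item: abs(item[1]), reverse=True)
    return [(name, "higher" if w > 0 else "lower") for name, w in salient]

reading = describe_high_scores(first_function)
for name, direction in reading:
    print(f"high score <-> {direction} {name}")
```

A nation with a low centroid on the function is read the other way round: "lower" becomes "higher" and vice versa.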

Mexamerica    -2.945949 
Bread basket  -1.547266          
Empty Quarter -1.377872 
Ecotopia      -1.207992 
Islands       -0.521521 
New England   -0.487371  
Foundry        0.000119  
Dixie          1.930004 

This means that Mexamerica and Dixie lie at the extremes and are the most different. Mexamerica typically has the highest percentage of Hispanic or Latino population but the lowest percentage of Black or African American population, and the highest percentages of the youngest and the oldest population; read through the signs of the loadings, its low score also corresponds to a lower percentage of poorly educated population and a lower rate of poverty. Bread basket, Empty Quarter, and Ecotopia have similar properties. The opposite is true for Dixie.

The second discriminant function is defined mostly by a single variable: the percentage of Hispanic or Latino population (POP405200D) (-1.22). High values of the second discriminant function correspond to low percentages of Hispanic or Latino population. The ordered centroids are:

Mexamerica     -4.268  
Islands        -2.252 
Ecotopia       -1.032 
Empty Quarter  -0.459 
Dixie          -0.327 
Foundry         0.750 
Bread basket    0.927  
New England     1.040 

Here the most different nations are Mexamerica, with the highest percentage of Hispanic or Latino population, and New England, with the lowest.

The centroids of the nations in the two-dimensional space defined by these two discriminant functions are presented in the picture:

9natld12.pdf
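
For readers without access to the picture, the same plane can be explored numerically. The sketch below copies the centroids of the first two functions from the lists above and uses the plain Euclidean distance between them as a rough measure of how far apart two nations are; that choice of distance, and the names used, are ours.

```python
import math
from itertools import combinations

# (first function, second function) centroids, copied from the lists above.
centroids_12 = {
    "Mexamerica": (-2.945949, -4.268),
    "Bread basket": (-1.547266, 0.927),
    "Empty Quarter": (-1.377872, -0.459),
    "Ecotopia": (-1.207992, -1.032),
    "Islands": (-0.521521, -2.252),
    "New England": (-0.487371, 1.040),
    "Foundry": (0.000119, 0.750),
    "Dixie": (1.930004, -0.327),
}

def distance(a, b):
    """Euclidean distance between two points in the plane."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# All pairs of nations, ordered from the most distant to the closest.
pairs = sorted(
    combinations(centroids_12, 2),
    key=lambda pair: distance(centroids_12[pair[0]], centroids_12[pair[1]]),
    reverse=True,
)
farthest = pairs[0]
farthest_distance = distance(centroids_12[farthest[0]], centroids_12[farthest[1]])
print(farthest, round(farthest_distance, 3))
```

In this plane the most distant pair is Mexamerica and Dixie, the two nations at the extremes of the first function.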

High values of the third discriminant function are mostly associated with a high percentage of land in farms and a high percentage of Black or African American population. The ordered centroids are:

New England    -2.109 
Ecotopia       -2.048 
Empty Quarter  -1.949  
Foundry        -0.937 
Islands        -0.645  
Dixie           0.239  
Bread basket    0.753  
Mexamerica      0.771 

Here the nations most typical of the lowest percentage of farm land and the lowest percentage of Black or African American population are New England, Ecotopia, and Empty Quarter. The opposite holds, though less markedly, for Mexamerica and Bread basket.

The fourth discriminant function is mostly defined by:

median household income (-1.042)
% Hispanic or Latino population (-0.711)
% native population born in the state of residence (-0.615)
% people in poverty (-0.583)
% employed in manufacturing (-0.553)
% population 25 years old and over with bachelor's degree or higher (0.553)

Income and poverty are very highly and negatively correlated. The loadings here are regression-type coefficients, each measuring the contribution of a variable with the other variables held fixed, which is why both variables can have negative loadings. High values of the fourth discriminant function are mostly associated with low household income, and also with a low percentage of native population born in the state of residence, a low percentage of Hispanic or Latino population, a low percentage employed in manufacturing, a low percentage of population in poverty, and a highly educated population.
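
That two strongly negatively correlated variables can both receive negative coefficients is easy to reproduce on a toy case. The sketch below is not the analysis reported here: it uses the two-group Fisher direction w = S_w^{-1}(m_a - m_b) for just two standardized variables, with a made-up within-group correlation of -0.9 and made-up mean differences.

```python
def fisher_direction_2d(within_cov, mean_diff):
    """Two-group Fisher direction w = S_w^{-1} (m_a - m_b) for two variables."""
    (a, b), (c, d) = within_cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return [
        inv[0][0] * mean_diff[0] + inv[0][1] * mean_diff[1],
        inv[1][0] * mean_diff[0] + inv[1][1] * mean_diff[1],
    ]

# Hypothetical within-group covariance: "income" and "poverty" correlate at -0.9.
toy_cov = [[1.0, -0.9], [-0.9, 1.0]]
# Hypothetical mean differences (group a minus group b): income lower, poverty higher.
toy_diff = [-1.0, 0.5]

w_income, w_poverty = fisher_direction_2d(toy_cov, toy_diff)
print(round(w_income, 3), round(w_poverty, 3))
```

Although the raw mean differences have opposite signs, both coefficients come out negative, because each coefficient is adjusted for the other variable. The real analysis involves eight groups and 42 variables, so this only illustrates the mechanism.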

The centroids are:

New England    -1.559  
Foundry        -1.516 
Mexamerica     -1.004  
Ecotopia       -0.134  
Dixie           0.177  
Bread basket    0.210  
Islands         0.626  
Empty Quarter   1.817 

Empty Quarter has low household income and also a low percentage of Hispanic or Latino population, a low percentage of native population born in the state of residence, a low percentage employed in manufacturing, and high education. The opposite is true for New England and Foundry.

As mentioned before, the remaining discriminant functions discriminate the nations less strongly. The fifth discriminant function is defined by the following variables:

% population 25 years or over with less than 9th grade (0.709)
% population 25 years old and over with bachelor's degree or higher (0.672)
rate of voting Democrats over Republicans (0.533)
civilian labor force unemployment rate (-0.519)
% native population born in the state of residence (-0.515)

The ordered centroids of nations are:

Foundry         -0.5228  
Empty Quarter   -0.2202  
Mexamerica       0.0208  
Dixie            0.0373 
Bread basket     0.0435 
Ecotopia         0.1092 
Islands          1.9537 
New England      2.4103  

Typically New England and also Islands (South Florida) have higher percentages of both very poorly and very highly educated population, a higher rate of Democratic voters, a low unemployment rate, and a low percentage of native population born in the state of residence. The opposite, but to a lesser extent, is true for Foundry.

The sixth discriminant function is mostly defined by the percentage of Asian population (-0.756) and the unemployment rate (-0.508). The centroids are:

Ecotopia       -1.98706 
Islands        -1.38047 
Bread basket   -0.04461 
Dixie          -0.00244 
Foundry         0.09901 
Mexamerica      0.20593 
Empty Quarter   0.30408 
New England     0.45916 

The results show that Ecotopia in particular has the highest percentage of Asian population and the highest unemployment rate.

The last discriminant function is defined by:

% population 25 years old and over with bachelor's degree or higher (-0.858)
% Asian population (-0.682)
% population under 18 years (-0.645)
% employed in professional, scientific, and technical services (0.537)

Centroids are:

Ecotopia       -0.36186
New England    -0.22428
Mexamerica     -0.04802
Dixie          -0.01740
Bread basket   -0.00498
Empty Quarter   0.01749
Foundry         0.08922
Islands         4.23150

South Florida (Islands) is an extreme case: it typically has a low percentage of highly educated population, a lower percentage of Asian population, a lower percentage of young population, and a high percentage employed in professional, scientific, and technical services.

The results obtained by the discriminant analysis confirm most of Joel Garreau's descriptions, even though he described the situation of the late seventies and early eighties while the data are mostly from 2000.

notes/clu/counties/ldai.txt · Last modified: 2017/04/12 18:48 by vlado