NUTS

The current NUTS 2013 classification is valid from 1 January 2015 and lists 98 regions at NUTS 1, 276 regions at NUTS 2 and 1342 regions at NUTS 3 level.

17. april 2017 ob 13:00

21. april 2017 ob 12:05

Mislim, da lahko prijavimo tudi ta prispevek. Vidim dve poti. V obeh primerih je potrebno:

  1. pretvoriti podatke v simbolno obliko (za US counties že imamo, za Evropo še ne)
  2. ustvariti relacijo sosednosti (za US counties jo že imam, za Evropo jo dobimo iz datoteke shape (opis zemljevida) net from shape

Prva pot:

  1. relacijo sosednosti predelamo v uteženo Pajkovo omrežje, kjer so uteži simbolne različnosti med krajišči povezav. Napisati je potrebno ustrezno rutino v R ali Pythonu.
  2. nad omrežjem uporabimo razvrščanje z omejitvami iz Pajka - opisano v naši knjigi Understanding … Če je čas, lahko stvar sprogramiramo tudi v R-ju (po mojem ne bi smelo biti več kot 50 vrstic programa).
  3. predstavnike lahko naračunamo naknadno iz dobljene (hierarhične) razvrstitve - dodatna funkcija v R-ju.

Druga pot (če bo več časa, ali za kasneje):

  1. razvijemo postopek, ki uporablja našo kriterijsko funkcijo tudi za razvrščanje in sproti določa predstavnike.

Morda bi za srečanje obljubili le prvo pot. Če bomo s časom zelo na tesnem, lahko vse skupaj naredimo le za US counties. Glede na to, da s Simono prijavljava druga prispevka, bi pri tem prispevku bila prvi avtor Nataša - v navadi je, da vsak udeleženec srečanja nastopi le enkrat kot prvi avtor.

26-28. april 2017

Transforming EuroStat NUTS 3

Transforming EuroStat NUTS 3 population pyramids data into data frames.

In the file demo_r_pjangrp3.tsv I first replaced all ”,” and “\t” with ”;”. I also corrected the header and saved the content to the file NUTSpopyr.csv. Using a short program in R

> setwd("C:/Users/batagelj/data/popPyr")
> T <- read.csv2("NUTSpopyr.csv",colClasses="character")
> age <- c("Y_LT5", "Y5-9", "Y10-14", "Y15-19", "Y20-24", "Y25-29", "Y30-34", "Y35-39",
+          "Y40-44", "Y45-49", "Y50-54", "Y55-59", "Y60-64", "Y65-69", "Y70-74", "Y75-79",
+          "Y80-84", "Y85-89", "Y_GE85", "Y_GE90", "TOTAL", "UNK")
> a <- factor(T$age,levels=age,ordered=TRUE)
> g <- factor(T$geo); s <- factor(T$sex)
> m <- length(levels(g)); k <- length(levels(a)); n <- nrow(T)
> F <- matrix(0,nrow=m,ncol=k); M <- matrix(0,nrow=m,ncol=k)
> for(i in 1:n){
+    if(s[i]=="T") next
+    freq <- as.numeric(unlist(strsplit(T$y2016[i]," "))[1])
+    j <- as.integer(g[i]); b <- as.integer(a[i])
+    if(s[i]=="F") F[j,b] <- freq else M[j,b] <- freq 
+ }
> na <- levels(a); ng <- levels(g)
> rownames(F) <- ng; colnames(F) <- na
> rownames(M) <- ng; colnames(M) <- na
> s <- nchar(ng)==5
> F3 <- F[s,]; M3 <- M[s,]
> save(F3,M3,file="NUTSpopyr.Rdata")
> F3df <- as.data.frame(F3); M3df <- as.data.frame(M3)
> write.csv2(F3df,file="F3.csv"); write.csv2(M3df,file="M3.csv")

I produced two matrices F3 (female) and M3 (male) with pyramids. Rows represent NUTS3-units and columns correspond to age classes. They are saved on the file NUTSpopyr.Rdata. Afterward I transformed both matrices into corresponding data frames F3df and M3df and saved them on files F3.csv and M3.csv.

The data are relatively complete. Only data for Albania (Excel lines 2-13) and MKXXX (Macedonia, Excel line 999) are missing. In the Excel lines 852-859 (IE - Ireland ?) the data in the column T (Y_GE85) are missing. It seems that this is not a problem because Y_GE85 = Y85-90 + Y_GE90 . Therefore the columns Y_GE85, TOTAL and UNK are needless.

nutspopyr.zip

To do: Transform the data into Clamix format.

Drawing maps and creating neigbours relation

NUTS_2013_60M_SH

> library(maptools)
> setwd("C:/Users/batagelj/data/popPyr/NUTS_2013_60M_SH/data")
> NUTS <- readShapeSpatial("NUTS_RG_60M_2013.shp")
> plot(NUTS) 
> names(NUTS)
[1] "NUTS_ID"    "STAT_LEVL_" "SHAPE_AREA" "SHAPE_LEN" 
> head(NUTS$NUTS_ID)
[1] AT    AT1   AT11  AT111 AT112 AT113
1951 Levels: AT AT1 AT11 AT111 AT112 AT113 AT12 AT121 AT122 AT123 AT124 ... UKN05
> pdf("NUTS3pop.pdf",width=11.7,height=8.3,paper="a4r")
> plot(NUTS,xlim=c(-24,44),ylim=c(28,70),lwd=0.05,bg="skyblue",col="wheat",main="NUTS 3")
> dev.off()
 
> SEPA <- readShapeSpatial("NUTS_SEPA_LI_2013.shp")
> plot(SEPA)
 
> LB <- readShapeSpatial("NUTS_LB_2013.shp")
> plot(LB)
> plot(LB,main="Labels",lwd=0.05)
 
> BN <- readShapeSpatial("NUTS_BN_60M_2013.shp")
> pdf("BNpop.pdf",width=11.7,height=8.3,paper="a4r")
> plot(BN,xlim=c(-24,44),ylim=c(28,70),lwd=0.05,bg="skyblue",main="NUTS 3 / BN")
> dev.off()

nuts3pop.pdf, labels.pdf, bnpop.pdf

Changing projection
> proj4string(NUTS)
[1] NA
> proj4string(NUTS) <- "+proj=longlat +ellps=GRS80 +no_defs"
> EU <- spTransform(NUTS, 
+ CRS("+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 
+ +units=m +nadgrids=@null +no_defs"))
> plot(EU)

> bbox(EU)
       min      max
x -6880205  6214499
y -2433677 11445837
> bbox(NUTS)
        min     max
x -61.80593 55.8258
y -21.35013 71.1270
> proj4string(NUTS)
[1] "+proj=longlat +ellps=GRS80 +no_defs"
> proj4string(EU)
[1] "+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 
     +units=m +nadgrids=@null +no_defs"

> plot(NUTS,xlim=c(-24,44),ylim=c(28,70),lwd=0.05,bg="skyblue",col="white",main="NUTS3")
> plot(EU,xlim=c(-2100000,4600000),ylim=c(3200000,11200000),lwd=0.05,
+ bg="skyblue",col="white",main="NUTS3 - Google")
Selecting levels
> CountryBorder <- NUTS[NUTS@data$STAT_LEVL_ == 0, ]
> pdf("EUcountries.pdf",width=11.7,height=8.3,paper="a4r")
> plot(CountryBorder,xlim=c(-24,44),ylim=c(28,70),lwd=0.05,
+ bg="skyblue",col="white",main="NUTS0 - countries")
> dev.off()

eucountries.pdf

Creating neighbors network
> sp2Pajek <- function(sp,file="neighbors.net",name=0,queen=TRUE){
+   library(spdep)
+   nbs <- poly2nb(sp,queen=queen)
+   n <- length(nbs); L <- card(nbs)
+   xy <- coordinates(sp)
+   IDs <- as.character(if(name>0) sp[[name]] else 1:n)
+   net <- file(file,"w")
+   cat("% sp2Pajek:",date(),"\n*vertices",n,"\n",file=net)
+   for(i in 1:n) cat(i,' "',IDs[i],'" ',xy[i,1],' ',xy[i,2],' 0.5\n',sep='',file=net)
+   cat("*edgeslist\n",file=net)
+   for(i in 1:n) if(L[i]>0) cat(i,nbs[[i]],"\n",file=net)
+   close(net)
+ }
 
> NUTS3 <- NUTS[NUTS@data$STAT_LEVL_ == 3, ]
> plot(NUTS3,xlim=c(-24,44),ylim=c(28,70),lwd=0.05,bg="skyblue",col="white",main="NUTS3")
> cat(date(),"\n"); sp2Pajek(NUTS3,name="NUTS_ID",file="NUTS.net"); cat(date(),"\n")
Fri Apr 28 09:12:46 2017 
Fri Apr 28 09:12:47 2017  

We obtain o neighbors relation on the file NUTS.net.

nutsnet.zip

In its current version the neighbors network is not connected - there exist several connected components and isolated nodes.

NUTS_2013_01M_SH

These are more detailed shapes data. I repeated the procedures on them.

> setwd("C:/Users/batagelj/data/popPyr/NUTS_2013_01M_SH/data")
> NUTS <- readShapeSpatial("NUTS_RG_01M_2013.shp")
> plot(NUTS)
> pdf("NUTS3popD.pdf",width=11.7,height=8.3,paper="a4r")
> plot(NUTS,xlim=c(-24,44),ylim=c(28,70),lwd=0.05,bg="skyblue",col="white",main="NUTS3 - detailed")
> dev.off()

nuts3popd.pdf

Creating neighbors network from detailed map and drawing map with labels.

> setwd("C:/Users/batagelj/data/popPyr/NUTS_2013_01M_SH/data")
> NUTSd <- readShapeSpatial("NUTS_RG_01M_2013.shp")
> NUTSd3 <- NUTSd[NUTSd@data$STAT_LEVL_ == 3, ]
> summary(NUTSd3)
Object of class SpatialPolygonsDataFrame
Coordinates:
        min      max
x -63.15346 55.83663
y -21.38731 71.18532
Is projected: NA 
proj4string : [NA]
Data attributes:
    NUTS_ID       STAT_LEVL_   SHAPE_AREA          SHAPE_LEN       
 AT111  :   1   Min.   :3    Min.   : 0.001292   Min.   :  0.1503  
 AT112  :   1   1st Qu.:3    1st Qu.: 0.090017   1st Qu.:  2.0010  
 AT113  :   1   Median :3    Median : 0.222539   Median :  3.3631  
 AT121  :   1   Mean   :3    Mean   : 0.507958   Mean   :  4.7413  
 AT122  :   1   3rd Qu.:3    3rd Qu.: 0.623426   3rd Qu.:  5.5373  
 AT123  :   1   Max.   :3    Max.   :21.726085   Max.   :155.9323  
 (Other):1474                                                      
> plot(NUTSd3,xlim=c(-24,44),ylim=c(28,70),lwd=0.05,bg="skyblue",col="white",main="NUTS3 detailed")
> cat(date(),"\n"); sp2Pajek(NUTSd3,name="NUTS_ID",file="NUTSd.net"); cat(date(),"\n")
 
> P <- read.csv("weak.clu",head=FALSE,skip=1)$V1
> length(P)
[1] 1480
> head(P)
[1] 1 1 1 1 1 1
> table(P)
P
   0    1    2    3    4    5    6    7    8    9   10 
  40 1177    4    5   57    2    8   13    2    9  163 
> 
> library(RColorBrewer)
> pal <- brewer.pal(n = 12, name = "Set3")
> pdf("NUTSd3lab.pdf",width=11.7,height=8.3,paper="a4r")
> plot(NUTSd3,xlim=c(-24,44),ylim=c(28,70),lwd=0.05,bg="skyblue",col=pal[P+1])
> title("NUTS3 - detailed")
> text(coordinates(NUTSd3),labels=as.character(NUTSd3$NUTS_ID),cex=0.1)
> dev.off()

In the network NUTSd.net every edge is listed twice. We remove duplicated edges using Pajek and save the network to the file NUTSdSimple.net. On the file weak.clu the partition of nontrivial weak components is saved. All isolated nodes are in the class 0. I manually constructed additional edges collected in a relation 3:“sea links” (see file sea.txt) that make the network connected. The expanded network has name NUTSdSea.net.

nutsd3lab.pdf, nutsdsimple.zip, nutsdsea.pdf

regular neighbors - blue; sea links - red.

Coloring countries

> NUTSd0 <- NUTSd[NUTSd@data$STAT_LEVL_ == 0, ]
> labs <- as.character(NUTSd0$NUTS_ID)
> labs
 [1] "AT" "BE" "BG" "CY" "CZ" "CH" "DE" "DK" "ES" "EL" "EE" "FR" "FI" "IS" "IT"
[16] "HU" "IE" "HR" "NO" "ME" "MT" "NL" "MK" "LU" "LI" "LT" "LV" "PL" "SE" "RO"
[31] "PT" "TR" "SK" "SI" "UK"
> colors <- c("green","blue","blue","red","yellow","blue","red","green", 
+ "blue","green","green","green","blue","green","yellow","blue","red",
+ "green","red","blue","red","yellow","red","yellow","yellow","green",
+ "red","blue","yellow","yellow","red","yellow","red","red","yellow")
> pdf("NUTS0.pdf",width=11.7,height=8.3,paper="a4r")
> plot(NUTSd0,xlim=c(-24,44),ylim=c(28,70),lwd=0.1,bg="skyblue",col=colors)
> title("NUTS0 / Europe - detailed")
> text(coordinates(NUTSd0),labels=labs,cex=0.3)
> dev.off()

URLs

notes/da/data/nuts.txt · Last modified: 2017/04/30 21:07 by vlado
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported
Recent changes RSS feed Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki