Displaying incomplete data on the map

On November 15, 2011 I downloaded from the SURS the data obisk2010.dat in csv format on the number of visitors of cultural events in 2010 in Slovenian communes. The file was edited (line 4 moved to line 2) to fit the format required by read.table.

From GU I downloaded the shape files of Slovenian communes. I saved the unzipped files into subdirectory OB11.

There are 211 communes in Slovenia (on the map), but some cultural events are registered in the period 2004-2010 only in 81 communes (the first entry is the Slovenia total). In 2010 some communes have the value - or 0.

We first read the data and draw the map.

> setwd("C:/Users/Batagelj/test/R/maps")
> library(maptools)
> gpclibPermit()
> SI <- readShapeSpatial("./ob11/OB.shp")
> plot(SI)
> t <- read.table("./ob11/obisk2010.dat",skip=2,header=TRUE,sep=";",nrows=82,
+ colClasses=c("NULL","character","integer"),na.strings=c("-","NA","0"))
> names(t)
[1] "občina" "X2010" 
> names(SI)
 [1] "OB_MID"  "OB_ID"   "OB_IME"  "OB_UIME" "OB_DJ"   "OB_TIP"  "D_OD_G"  "OB_POV"  "Y_C"     "X_C" 
> nrow(t)
[1] 82

We have to connect the map data with the visits data. The link is the name of the commune. The problem is that they are ordered differently and the visits data are incomplete (less than 81). We have also to remove the NA (not available) data. Then we can try to match the names.

> SInams <- SI$OB_UIME
> tnams <- t$občina[2:82]; visits <- t$X2010[2:82] 
> tnams <- tnams[!is.na(visits)]
> visits <- visits[!is.na(visits)]
> (p <- match(tnams,SInams))
 [1]  62 196  78 150  59  82 151 156 191  52  NA  17 187  60  69  76  NA 197  81  54  67  73  57  NA
[25]  77   6  36  38  27  NA  41  48  46  28 179 199  49  29  31  25  26  23  NA  93 148 114 202 100
[49] 119 177 116 123 170 122 110  98 193 209 206 171  96 111 198  89 105 152 154 139 158 155 
> tnams[is.na(p)]
[1] "Gorenja vas - Poljane" "Izola/Isola"           "Koper/Capodistria"     "Lendava/Lendva"       
[5] "Piran/Pirano"         

The p[i] == j tells us that the tnams[i] is at index j in SInams; and p[i] == NA means that tnams[i] is not in SInams. Inspecting the SInams we identify the differences in names and correct them.

> SInams[grep("vas",SInams)]
[1] Gorenja vas-Poljane Trnovska vas 
...
> (i <- which(is.na(p)))
[1] 11 17 24 30 43
> tnams[i] <- c("Gorenja vas-Poljane", "Izola", "Koper", "Lendava", "Piran")
> p <- match(tnams,SInams)
> (i <- which(is.na(p)))
integer(0)

Using the quantile function we encode the values of visits into 5 classes of equal size in represent them as levels of green. We also label the “active” communes.

> (brks <- quantile(visits, seq(0,1,1/4)))
       0%       25%       50%       75%      100% 
    250.0    1542.5    9477.5   21794.5 1201162.0 
> summary(visits)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    250    1542    9478   41180   21790 1201000 
> (brks <- quantile(visits, seq(0,1,1/5)))
       0%       20%       40%       60%       80%      100% 
    250.0     636.4    5034.2   15123.6   29239.8 1201162.0 
> i <- findInterval(visits,brks,all.inside=TRUE)
> ci <- rep(6,211); ci[p] <- i
> library(RColorBrewer)
> pal <- c(brewer.pal(5,'Greens'),'pink')
> plot(SI,col=pal[ci])
> lab <- rep(NA,211); lab[p] <- tnams
> text(coordinates(SI),lab,cex=0.5) 

Picture in PDF / A4.

> pdf("culture.pdf",width=11.7,height=8.3,paper="a4r")
> plot(SI,col=pal[ci],bg="lightyellow")
> title("Slovenia 2010 - Visits of cultural events")
> text(coordinates(SI),lab,cex=0.5)
> dev.off()

alt : culture.pdf

notes/match.txt · Last modified: 2017/11/17 00:45 by vlado
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported
Recent changes RSS feed Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki