====== Displaying incomplete data on the map ====== On November 15, 2011 I downloaded from the [[http://pxweb.stat.si/pxweb/Database/Obcine/obcine.asp|SURS]] the data {{:notes:zip:obisk2010.zip|obisk2010.dat}} in ''csv'' format on the number of visitors of cultural events in 2010 in Slovenian communes. The file was edited (line 4 moved to line 2) to fit the format required by ''read.table''. From [[|GU]] I downloaded the {{:notes:zip:ob11.zip|shape files}} of Slovenian communes. I saved the unzipped files into subdirectory ''OB11''. There are 211 communes in Slovenia (on the map), but some cultural events are registered in the period 2004-2010 only in 81 communes (the first entry is the Slovenia total). In 2010 some communes have the value ''-'' or ''0''. We first read the data and draw the map. > setwd("C:/Users/Batagelj/test/R/maps") > library(maptools) > gpclibPermit() > SI <- readShapeSpatial("./ob11/OB.shp") > plot(SI) > t <- read.table("./ob11/obisk2010.dat",skip=2,header=TRUE,sep=";",nrows=82, + colClasses=c("NULL","character","integer"),na.strings=c("-","NA","0")) > names(t) [1] "občina" "X2010" > names(SI) [1] "OB_MID" "OB_ID" "OB_IME" "OB_UIME" "OB_DJ" "OB_TIP" "D_OD_G" "OB_POV" "Y_C" "X_C" > nrow(t) [1] 82 {{:notes:pics:SI.png}} We have to connect the map data with the visits data. The link is the name of the commune. The problem is that they are ordered differently and the visits data are incomplete (less than 81). We have also to remove the ''NA'' (not available) data. Then we can try to match the names. > SInams <- SI$OB_UIME > tnams <- t$občina[2:82]; visits <- t$X2010[2:82] > tnams <- tnams[!is.na(visits)] > visits <- visits[!is.na(visits)] > (p <- match(tnams,SInams)) [1] 62 196 78 150 59 82 151 156 191 52 NA 17 187 60 69 76 NA 197 81 54 67 73 57 NA [25] 77 6 36 38 27 NA 41 48 46 28 179 199 49 29 31 25 26 23 NA 93 148 114 202 100 [49] 119 177 116 123 170 122 110 98 193 209 206 171 96 111 198 89 105 152 154 139 158 155 > tnams[is.na(p)] [1] "Gorenja vas - Poljane" "Izola/Isola" "Koper/Capodistria" "Lendava/Lendva" [5] "Piran/Pirano" The ''p[i] == j'' tells us that the ''tnams[i]'' is at index ''j'' in ''SInams''; and ''p[i] == NA'' means that ''tnams[i]'' is not in ''SInams''. Inspecting the ''SInams'' we identify the differences in names and correct them. > SInams[grep("vas",SInams)] [1] Gorenja vas-Poljane Trnovska vas ... > (i <- which(is.na(p))) [1] 11 17 24 30 43 > tnams[i] <- c("Gorenja vas-Poljane", "Izola", "Koper", "Lendava", "Piran") > p <- match(tnams,SInams) > (i <- which(is.na(p))) integer(0) Using the ''quantile'' function we encode the values of ''visits'' into 5 classes of equal size in represent them as levels of green. We also label the "active" communes. > (brks <- quantile(visits, seq(0,1,1/4))) 0% 25% 50% 75% 100% 250.0 1542.5 9477.5 21794.5 1201162.0 > summary(visits) Min. 1st Qu. Median Mean 3rd Qu. Max. 250 1542 9478 41180 21790 1201000 > (brks <- quantile(visits, seq(0,1,1/5))) 0% 20% 40% 60% 80% 100% 250.0 636.4 5034.2 15123.6 29239.8 1201162.0 > i <- findInterval(visits,brks,all.inside=TRUE) > ci <- rep(6,211); ci[p] <- i > library(RColorBrewer) > pal <- c(brewer.pal(5,'Greens'),'pink') > plot(SI,col=pal[ci]) > lab <- rep(NA,211); lab[p] <- tnams > text(coordinates(SI),lab,cex=0.5) Picture in PDF / A4. > pdf("culture.pdf",width=11.7,height=8.3,paper="a4r") > plot(SI,col=pal[ci],bg="lightyellow") > title("Slovenia 2010 - Visits of cultural events") > text(coordinates(SI),lab,cex=0.5) > dev.off() alt : culture.pdf