Project 1

Make a data frame

Using The World Factbook, construct a data frame whose units (rows) are the world's countries, named as in the “book”, and whose variables (columns) are:

  1. the first variable, V1, contains the country's two-letter ISO code (used for labels in visualizations);
  2. the second, third, fourth, and fifth variables (V2, V3, V4, and V5) are as given in the table below: either select a row from the table, or choose your own variables V2, V3, V4, and V5 that you expect to be related; in both cases, send me an e-mail for confirmation. Try to avoid variables with many missing values;
  3. if you find it useful for your exploration, you can add further variables from the “book”.
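To follow the advice about missing values, one can compute, for each candidate variable, the share of countries without a value before committing to it. A minimal sketch, assuming the candidates are already collected in a data frame; the data frame `cand`, its column names, the placeholder codes, and the numbers are all hypothetical, not Factbook data.

```r
# Hypothetical candidate variables (placeholder rows and values, not Factbook data)
cand <- data.frame(
  V1 = c("AA", "BB", "CC", "DD"),
  physicians = c(1.9, 4.1, NA, 2.4),
  beds = c(2.9, NA, NA, 13.0),
  fertility = c(1.5, 1.5, 1.6, 1.4),
  row.names = c("country1", "country2", "country3", "country4")
)

# Share of missing values per variable (the ISO code column V1 is excluded)
na_share <- colMeans(is.na(cand[, -1]))
print(round(na_share, 2))

# Keep variables with at most 30% missing values (the threshold is an arbitrary choice)
keep <- names(na_share)[na_share <= 0.3]
print(keep)
```

On the real data, the same two lines applied to the full country table show at a glance which variables would leave large holes in the analysis.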

To make variables comparable across countries of very different sizes, you can also select derived variables such as (Labor force / Population), (Annual passenger traffic on registered air carriers / Population), or (Total airports / Land area).
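Derived variables are plain element-wise ratios of columns, and a missing value in either column propagates as NA. A minimal sketch; the column names `labor_force`, `population`, `airports`, `land_area` and all numbers are hypothetical placeholders.

```r
# Hypothetical raw variables (placeholder numbers, not Factbook values)
df <- data.frame(
  V1 = c("AA", "BB", "CC"),
  labor_force = c(1.2e6, 25e6, NA),
  population = c(2.8e6, 60e6, 5e6),
  airports = c(4, 120, 10),
  land_area = c(30000, 300000, 400000)  # sq km
)

# Size-independent derived variables
df$labor_share <- df$labor_force / df$population               # Labor force / Population
df$airports_per_1000km2 <- 1000 * df$airports / df$land_area   # Total airports / Land area

print(df[, c("V1", "labor_share", "airports_per_1000km2")])
```

Scaling a density (here per 1000 sq km) does not change correlations, but it makes the values easier to read in tables and plots.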

The list of available variables can be seen in the description of any country, for example Italy. Here is also a world map.

For visual inspection, an additional variable, region (North America, Central America, South America, Europe, Africa, Middle East, Central Asia, South Asia, East & Southeast Asia, Australia & Oceania, Antarctica), would be very useful. This partition can be constructed from the Factbook's lists of countries in each region.
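Once the regional lists are typed into a lookup table, the region of each country can be attached with `match()`. A minimal sketch; the lookup table `regions` is hypothetical and holds only a few rows, and "Atlantis"/"XX" is a made-up entry showing how unmatched names surface.

```r
# Hypothetical lookup table built by hand from the Factbook's regional lists
# (only a few countries shown)
regions <- data.frame(
  country = c("Albania", "Italy", "Japan", "Peru"),
  region  = c("Europe", "Europe", "East & Southeast Asia", "South America")
)

# Hypothetical project data frame; "Atlantis" is deliberately not in the lookup
df <- data.frame(
  country = c("Italy", "Peru", "Japan", "Atlantis"),
  V1 = c("IT", "PE", "JP", "XX")
)

# match() gives the row of each country in the lookup table (NA if absent)
df$region <- regions$region[match(df$country, regions$country)]

# Store region as a factor whose levels are the eleven regions listed above
region_levels <- c("North America", "Central America", "South America",
                   "Europe", "Africa", "Middle East", "Central Asia",
                   "South Asia", "East & Southeast Asia",
                   "Australia & Oceania", "Antarctica")
df$region <- factor(df$region, levels = region_levels)

# Countries not found in any regional list show up as NA and need fixing by hand
print(df[is.na(df$region), "country"])
```

Listing the unmatched names is useful because country spellings in the regional lists and in the JSON keys may not agree exactly.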

| n  | student           | V1  | V2                 | V3                       | V4                             | V5                                        |
| 1  | Kayode Ahmed      | ISO | Physicians density | Hospital bed density     | Total fertility rate           | Total population life expectancy at birth |
| 2  | Ilia Kazakov      | ISO | Birth rate         | Death rate               | Net migration rate             | Urban population %                        |
| 3  | Александр Матвеев | ISO | Urban population   | Carbon dioxide emissions | Energy consumption per capita  | Real GDP                                  |
| 4  | Enrique Nuñez     | ISO | Real Growth Rate   | Inflation Rate           | Exchange Rate                  | Purchasing Power Parity                   |
| 5  | Артём Кузнецов    | ISO | Labor force        | Population               | Urban population               | Land area + Real GDP                      |
| 6  |                   | ISO |                    |                          |                                |                                           |
| 7  |                   | ISO |                    |                          |                                |                                           |
| 8  |                   | ISO |                    |                          |                                |                                           |
| 9  |                   | ISO |                    |                          |                                |                                           |
| 10 |                   | ISO |                    |                          |                                |                                           |
| 11 |                   | ISO |                    |                          |                                |                                           |
| 12 |                   | ISO |                    |                          |                                |                                           |
| 13 |                   | ISO |                    |                          |                                |                                           |
| 14 |                   | ISO |                    |                          |                                |                                           |
| 15 |                   | ISO |                    |                          |                                |                                           |

Save the created data frame as a CSV file.
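In base R this is one call to `write.csv()`; reading the file back is a cheap check that nothing was lost. A minimal sketch; the data frame contents and the file name `project1.csv` are hypothetical.

```r
# Hypothetical final data frame with a country name and the required columns V1..V5
df <- data.frame(
  country = c("Country A", "Country B"),
  V1 = c("AA", "BB"),
  V2 = c(1.5, 2.5), V3 = c(10, NA), V4 = c(0.3, 0.7), V5 = c(100, 200)
)

# row.names = FALSE avoids an extra unnamed index column;
# fileEncoding = "UTF-8" keeps non-ASCII country names intact
write.csv(df, "project1.csv", row.names = FALSE, fileEncoding = "UTF-8")

# Read the file back; write.csv writes missing values as "NA", which read.csv
# turns back into NA by default
back <- read.csv("project1.csv", fileEncoding = "UTF-8")
stopifnot(identical(dim(back), dim(df)))
```

Note that whole-number columns may come back as integers rather than doubles; this does not matter for the analysis.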

Explore the collected data. For visualization on a map, see Maps or rworldmap.
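A first exploration usually combines summaries, pairwise correlations, and scatterplots labelled with the ISO codes from V1. A minimal sketch on simulated placeholder data (the variables and their relations are invented for illustration); Spearman rank correlation is one reasonable choice for skewed country-level variables, not a requirement of the assignment.

```r
# Simulated placeholder data: 26 "countries" with codes AA, BB, ..., ZZ
set.seed(1)
n <- 26
df <- data.frame(V1 = paste0(LETTERS, LETTERS))
df$V2 <- rnorm(n)
df$V3 <- df$V2 + rnorm(n, sd = 0.5)   # positively related to V2
df$V4 <- rexp(n)                      # skewed, unrelated
df$V5 <- -df$V3 + rnorm(n)            # negatively related to V3
df$V4[c(3, 7)] <- NA                  # some countries lack a value

vars <- c("V2", "V3", "V4", "V5")
print(summary(df[, vars]))

# Rank correlations; pairwise deletion uses all available values for each pair
R <- cor(df[, vars], method = "spearman", use = "pairwise.complete.obs")
print(round(R, 2))

# Scatterplot matrix, then one scatterplot labelled with the ISO codes
pairs(df[, vars])
plot(df$V2, df$V3, type = "n", xlab = "V2", ylab = "V3")
text(df$V2, df$V3, labels = df$V1, cex = 0.7)
```

For the map itself, rworldmap's `joinCountryData2Map()` accepts two-letter codes (`joinCode = "ISO2"`, with `nameJoinColumn = "V1"`), and the result can be passed to `mapCountryData()`; see the package documentation for the exact options.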

Write a report and save it as a PDF file. Put the report and CSV file into a ZIP file and send it to me.

Hint: the Factbook data (for the year 2020) are available as a JSON file on GitHub / Download.

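> library(jsonlite)
> # read the local file as UTF-8 so accented country names stay intact, then parse it
> J <- fromJSON(readLines("factbook.json", encoding = "UTF-8"))
> str(J, max.level = 2)                 # top-level structure of the parsed list
> J$countries[[4]]$data$name            # a country can be accessed by position...
[1] "Albania"
> J$countries$albania$data$name         # ...or by its key
[1] "Albania"
> names(J$countries)                    # keys of all countries
> names(J$countries$albania$data)       # sections available for one country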


Example extracting a selected variable from the Factbook.
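One way to pull a single variable for every country is to walk each country's nested `data` list along a path of names and return NA wherever a step is missing. A minimal sketch; the helpers `get_path()` and `extract_var()` are hypothetical, the miniature `J` only mimics the `J$countries$<key>$data` layout from the hint, and the path `c("people", "population", "total")` is an assumed location that should be checked with `names()` on the real file.

```r
# Hypothetical helper: follow a vector of names into a nested list;
# return NA if a step is missing or the final value is not a single number
get_path <- function(x, path) {
  for (p in path) {
    if (!is.list(x) || is.null(x[[p]])) return(NA)
    x <- x[[p]]
  }
  if (length(x) == 1 && is.numeric(x)) x else NA
}

# Hypothetical helper: one numeric value per country in J$countries
extract_var <- function(J, path) {
  vapply(J$countries, function(cn) as.numeric(get_path(cn$data, path)),
         numeric(1))
}

# Hypothetical miniature of the parsed JSON (placeholder value, not real data)
J <- list(countries = list(
  albania = list(data = list(name = "Albania",
                             people = list(population = list(total = 100)))),
  bouvet  = list(data = list(name = "Bouvet Island"))
))

# Assumed path; verify with e.g. names(J$countries$albania$data$people)
pop <- extract_var(J, c("people", "population", "total"))
country <- vapply(J$countries, function(cn) cn$data$name, character(1))
df <- data.frame(country = country, population = pop, row.names = NULL)
print(df)
```

Returning NA instead of stopping keeps countries without the field (such as uninhabited territories) in the table, where they can be counted or dropped later.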




ru/hse/eda24/stu/p1.txt · Last modified: 2024/04/09 04:33 by vlado
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported