Project 3

Explore a large data set

Many data sets are available: planes (1, 2), bikes, taxi, Kaggle, Data world, European Social Survey, V-Dem, Food data, …, or your own source. Explore the selected data set: select variables and explore them (distributions, extreme values, …), explore relations among variables (pairs, clustering, regression, derived quantities, interesting observations), and propose ideas for more detailed analyses.
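A minimal sketch of the first exploration steps named above (distribution, extreme values, relation between a pair of variables), using only the Python standard library. The variables `distance` and `fare` are hypothetical and the data are synthetic; the 1.5 × IQR fence for extreme values is one common convention, not a course requirement.

```python
import random
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def extreme_values(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Synthetic stand-in for a real data set (hypothetical variables).
rng = random.Random(0)
distance = [rng.uniform(1, 20) for _ in range(10_000)]
fare = [2.5 + 1.8 * d + rng.gauss(0, 2) for d in distance]
fare[0] = 500.0  # planted extreme value, to show that the fence catches it

print("summary of fare:", five_number_summary(fare))
print("extreme values: ", extreme_values(fare))
print("corr(distance, fare):", round(pearson(distance, fare), 3))
```

With a real data set, the two lists would be replaced by columns read from the file; the same three functions can then be applied to each selected variable or pair of variables.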

The selected data set must contain at least 10000 units; for a temporal data set, the product (number of units) × (number of time points) must be at least 10000.
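The size rule can be checked with a one-line computation. The sketch below is only an illustration; the function name `meets_size_requirement` and its parameters are hypothetical.

```python
def meets_size_requirement(n_units, n_time_points=1, minimum=10_000):
    """Check the size rule: units x time points must reach the minimum.

    For a non-temporal data set, leave n_time_points at 1.
    """
    return n_units * n_time_points >= minimum


# A cross-sectional data set with 12,500 rows qualifies on its own.
print(meets_size_requirement(12_500))
# A panel of 200 units observed at 60 time points gives 12,000 observations.
print(meets_size_requirement(200, 60))
# A panel of 150 units observed at 40 time points gives only 6,000.
print(meets_size_requirement(150, 40))
```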

Before starting the analysis, send me a note about your selection for confirmation.

n  student                    dataset
1  Alexandra Eremenko         Airbnb in Barcelona
2  Софья Кошовец              Airline data (January 2019 to October 2020)
3  Sofia Tkachenko            Kaggle: Acea Smart Water Analytics
4  Tor Anders Høksås          Kaggle: Shipping Analytics, World Merchant Fleet
5  Atakan Çavuşlu             Kaggle: UK used car dataset / Volkswagen
6  Rakib Hassan Pran          Kaggle: Credit Card Fraud Detection
7  Динара Хайруллина          Kaggle: House Sales in King County, USA
8  Решетеева Регина Игоревна  Kaggle: Spotify
9



ru/hse/eda20/stu/p3.txt · Last modified: 2021/01/18 00:37 by vlado
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported