Data formats: fixed-width, delimited (CSV), structured (RIS, GED, XML (SGML), JSON, HTML, SVG); download files; collect data from the WWW
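As a rough illustration of how the same records look in three of these formats, here is a minimal standard-library sketch that parses a fixed-width, a CSV and a JSON version of two hypothetical rows into identical Python dicts. The sample data, the `LAYOUT` column offsets and the helper names (`parse_fixed`, `parse_csv`, `parse_json`) are all assumptions made for illustration; a real fixed-width file comes with its own layout specification.

```python
import csv
import io
import json

# Hypothetical sample: the same two records in three formats.
FIXED = "Ada       36London    \nAlan      41Wilmslow  \n"
CSV_TEXT = "name,age,city\nAda,36,London\nAlan,41,Wilmslow\n"
JSON_TEXT = ('[{"name": "Ada", "age": 36, "city": "London"}, '
             '{"name": "Alan", "age": 41, "city": "Wilmslow"}]')

# Assumed fixed-width layout: (field, start offset, end offset).
LAYOUT = [("name", 0, 10), ("age", 10, 12), ("city", 12, 22)]

def parse_fixed(text):
    """Slice each line at fixed character offsets, then strip the padding."""
    rows = []
    for line in text.splitlines():
        row = {field: line[start:end].strip() for field, start, end in LAYOUT}
        row["age"] = int(row["age"])  # every field arrives as text
        rows.append(row)
    return rows

def parse_csv(text):
    """Split on the delimiter; the header line supplies the field names."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["age"] = int(row["age"])  # CSV is untyped, so convert explicitly
        rows.append(row)
    return rows

def parse_json(text):
    """JSON is self-describing and already carries numeric types."""
    return json.loads(text)
```

The design point is where the structure lives: fixed-width files keep it in an external column spec, CSV keeps it in a header row and delimiters, and structured formats such as JSON keep names and types inside the data itself.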
Data frame; R
Data collection, Archiving, Privacy, Quality; metadata, missing data, not available (NA)
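Missing values rarely arrive as a single clean marker, so a first step is usually to map every "not available" convention to one representation before computing anything. The sketch below assumes a hypothetical set of NA tokens (`NA_TOKENS`) and hypothetical helpers (`load_with_missing`, `missing_counts`, `mean_ignoring_missing`); it maps those tokens to `None`, counts missing values per column and computes a mean over the available values only, loosely analogous to R's `mean(x, na.rm = TRUE)`.

```python
import csv
import io

# Assumed NA conventions; every data source has its own.
NA_TOKENS = {"", "NA", "N/A", "null"}

# Hypothetical raw data with missing values written three different ways.
RAW = "id,height,weight\n1,170,65\n2,NA,80\n3,182,\n4,165,N/A\n"

def load_with_missing(text):
    """Parse CSV, mapping NA tokens to None and everything else to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({k: None if v.strip() in NA_TOKENS else float(v)
                     for k, v in row.items()})
    return rows

def missing_counts(rows):
    """Count None values per column (a basic data-quality report)."""
    counts = {}
    for row in rows:
        for k, v in row.items():
            counts[k] = counts.get(k, 0) + (v is None)
    return counts

def mean_ignoring_missing(rows, column):
    """Average the non-missing values; return None if all are missing."""
    values = [r[column] for r in rows if r[column] is not None]
    return sum(values) / len(values) if values else None
```

Dropping missing values, as `mean_ignoring_missing` does, is only one option; imputation or an explicit "missing" indicator may suit other analyses better.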
The task has its own name, Feature Engineering, and it is a hellishly laborious, manual and painful process. Feature Engineering is far more impactful on predictive accuracy than anything you can do in the Modelling phase, which makes Data Preparation the much more important process.
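To make "Feature Engineering" concrete, here is a small sketch that turns one hypothetical raw order record into model-ready features: calendar features from a date, a ratio, a log transform and a one-hot encoding of a category. The record fields, the `CHANNELS` vocabulary and the `engineer_features` helper are invented for illustration; which features actually help depends entirely on the problem.

```python
import math
from datetime import date

# Hypothetical raw records: one row per customer order.
ORDERS = [
    {"order_date": "2024-03-01", "amount": 120.0, "items": 4, "channel": "web"},
    {"order_date": "2024-03-02", "amount": 15.0, "items": 1, "channel": "store"},
]
CHANNELS = ["store", "web"]  # assumed fixed category vocabulary

def engineer_features(order):
    """Derive numeric features a model can use from one raw record."""
    d = date.fromisoformat(order["order_date"])
    features = {
        "weekday": d.weekday(),                          # 0 = Monday ... 6 = Sunday
        "is_weekend": int(d.weekday() >= 5),             # Saturday or Sunday
        "amount_per_item": order["amount"] / order["items"],
        "log_amount": math.log1p(order["amount"]),       # compresses large amounts
    }
    # One-hot encoding: one 0/1 column per known channel.
    for ch in CHANNELS:
        features[f"channel_{ch}"] = int(order["channel"] == ch)
    return features
```

None of these features exists in the raw data; each encodes a piece of domain knowledge (weekends behave differently, basket size matters), which is why this step is manual and why it tends to matter more than the choice of model.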
Predictive Modelling; Anomaly Detection techniques; Transformations
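A minimal sketch of one transformation and one anomaly-detection technique, using only the standard `statistics` module: `standardize` applies the z-score transformation, and `mad_outliers` flags points by the modified z-score (median and median absolute deviation, with the conventional 0.6745 constant and 3.5 cutoff). Both function names and the sample readings are assumptions for illustration, not a method prescribed by these notes.

```python
import statistics

def standardize(values):
    """Z-score transformation: subtract the mean, divide by the (population) std dev."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold.

    Uses the median and the median absolute deviation (MAD), which a
    single extreme value cannot drag around the way it drags the mean.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median; the score is undefined
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]

# Hypothetical sensor readings with one obvious glitch at index 5.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 55.0, 10.2]
print(mad_outliers(readings))  # prints [5]
```

The median-based score is used here instead of a plain z-score because, in a small sample, one large outlier inflates both the mean and the standard deviation enough to hide itself.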