Stats
An anonymized dump of all user-contributed content on Stats Stack Exchange, a site of the Stack Exchange network.
Original source: archive.org
Versions
Stats (by Jan Motl)
Stats_CEB (by Jan Motl)
- A simplified version that eliminates all attributes of string type.
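This page does not say how Stats_CEB was derived. As a hedged illustration only, a string-free copy could be built by keeping just the non-string columns reported by MariaDB's `information_schema.COLUMNS`. The helper names and the exact set of types treated as "string" are assumptions made for this sketch.

```python
# Types that MariaDB reports in information_schema.COLUMNS.DATA_TYPE and that
# we (by assumption) treat as "string" attributes.
STRING_TYPES = {"char", "varchar", "tinytext", "text", "mediumtext",
                "longtext", "enum", "set"}


def non_string_columns(columns):
    """Group (table, column, data_type) rows by table, dropping string columns.

    Tables whose columns are all strings map to an empty list.
    """
    kept = {}
    for table, column, data_type in columns:
        kept.setdefault(table, [])
        if data_type.lower() not in STRING_TYPES:
            kept[table].append(column)
    return kept


def select_statements(kept):
    """Build one SELECT per table that still has at least one column."""
    return {
        table: "SELECT " + ", ".join(f"`{c}`" for c in cols) + f" FROM `{table}`"
        for table, cols in kept.items()
        if cols
    }
```

The metadata rows could come from `SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = 'stats'`.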
Dataset details
- Associated task: Regression
- Domain: Education
- Data types:
- Size: 658.4 MB
- Count of tables: 8
- Count of rows: 1,027,838
- Count of columns: 71
- Missing values: Yes
- Compound keys: No
- Loops: Yes
- Type: Real
- Instance count: 41,793
- Target table: users
- Target column: Reputation
- Target ID: Id
- Target timestamp: LastAccessDate
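The target definition can be read as a point-in-time setup: each row of `users` is one example identified by `Id`, `Reputation` is the value to predict, and `LastAccessDate` acts as a cutoff so that only related rows dated at or before it feed the features. That reading of "target timestamp" is our assumption, not something this page spells out. The sketch builds one such feature (a count of related events per user) from plain tuples; the function names and the `(user_id, timestamp)` event format are hypothetical.

```python
from datetime import datetime


def count_events_before(events, user_id, cutoff):
    """Count events of `user_id` dated at or before `cutoff`.

    `events` holds hypothetical (user_id, timestamp) pairs taken from some
    related table; rows after the cutoff are ignored to avoid leakage.
    """
    return sum(1 for uid, ts in events if uid == user_id and ts <= cutoff)


def build_examples(users, events):
    """Turn (Id, LastAccessDate, Reputation) rows into (Id, feature, target)."""
    examples = []
    for user_id, last_access, reputation in users:
        n_events = count_events_before(events, user_id, last_access)
        examples.append((user_id, n_events, reputation))
    return examples


if __name__ == "__main__":
    # Toy rows, not real data from the dataset.
    users = [(1, datetime(2014, 1, 1), 101), (2, datetime(2013, 6, 1), 5)]
    events = [(1, datetime(2013, 1, 1)), (1, datetime(2014, 2, 1)),
              (2, datetime(2013, 5, 1)), (2, datetime(2013, 6, 1))]
    print(build_examples(users, events))  # [(1, 1, 101), (2, 2, 5)]
```

In practice the events would come from one of the other tables, joined on whichever column holds the user ID there; those column names are not listed on this page.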
Algorithms
Dataset version | Target | Algorithm | Reference | Measure | Value |
---|---|---|---|---|---|
stats | Reputation | FastProp | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | R2 | 0.9777 |
stats | Reputation | Relboost | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | R2 | 0.9809 |
stats | Reputation | Deep Feature Synthesis | featuretools | R2 | 0.9624 |
stats | Reputation | FastProp | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | RMSE | 0.6533 |
stats | Reputation | Relboost | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | RMSE | 0.6076 |
stats | Reputation | Deep Feature Synthesis | featuretools | RMSE | 0.8499 |
stats | Reputation | FastProp | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | MAE | 0.3361 |
stats | Reputation | Relboost | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | MAE | 0.3114 |
stats | Reputation | Deep Feature Synthesis | featuretools | MAE | 0.3487 |
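The three measures in the table follow their usual definitions: MAE is the mean absolute error, RMSE is the square root of the mean squared error, and R2 is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. A dependency-free sketch of those definitions follows. The page does not state on which scale of Reputation (raw or transformed) the reported scores were computed, so the RMSE and MAE values should not be assumed to be in reputation points.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination.

    Raises ZeroDivisionError when all true values are equal (R2 is undefined).
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

For example, `mae([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])` returns `0.5`. Higher is better for R2, lower is better for RMSE and MAE, which is consistent with Relboost ranking first on all three rows above.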
How to download the dataset
The datasets are publicly available directly from a MariaDB database.
- Open your favourite MariaDB client (MySQL Workbench works, but see FAQ)
- Use the following credentials:
  - hostname: db.relational-data.org
  - port: 3306
  - username: guest
  - password: relational
- Export the "stats" database (or another version of the dataset, if available) in your favourite format (e.g. CSV or an SQL dump).
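The download steps can also be scripted instead of using a GUI client. A minimal sketch, assuming the third-party PyMySQL driver (`pip install pymysql`), that exports every table of the `stats` database to one CSV file per table. `export_table`, `export_database` and the output directory name are illustrative, not part of any official tooling.

```python
import csv
from pathlib import Path

# Connection settings from the credentials listed above (public guest account).
DB_CONFIG = {
    "host": "db.relational-data.org",
    "port": 3306,
    "user": "guest",
    "password": "relational",
    "database": "stats",
}


def export_table(cursor, table, out_dir):
    """Write one table to <out_dir>/<table>.csv using a DB-API cursor."""
    cursor.execute(f"SELECT * FROM `{table}`")
    header = [col[0] for col in cursor.description]
    path = Path(out_dir) / f"{table}.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in cursor:  # NULLs are written as empty fields
            writer.writerow(row)
    return path


def export_database(out_dir="stats_csv"):
    """Export every table of the configured database; returns the CSV paths."""
    import pymysql.cursors  # assumed third-party driver

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # SSCursor streams rows from the server instead of buffering whole tables.
    conn = pymysql.connect(cursorclass=pymysql.cursors.SSCursor, **DB_CONFIG)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW TABLES")
            tables = [row[0] for row in cur.fetchall()]
            return [export_table(cur, t, out_dir) for t in tables]
    finally:
        conn.close()
```

Calling `export_database("stats_csv")` needs network access to db.relational-data.org. Streaming with `SSCursor` keeps memory use low for a dump of roughly 658 MB; the trade-off is that each result set must be read completely before the next query is sent on the same connection, which `export_table` does.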