Stats

An anonymized dump of all user-contributed content on the Stats Stack Exchange network.

Original source: archive.org

Versions

  • Stats (by Jan Motl)

  • Stats_CEB (by Jan Motl)

    • A simplified version that eliminates all string-typed attributes.

Dataset details

Associated task: Regression
Domain: Education
Data types:
Size: 658.4 MB
Count of tables: 8
Count of rows: 1,027,838
Count of columns: 71
Missing values: Yes
Compound keys: No
Loops: Yes
Type: Real
Instance count: 41,793
Target table: users
Target column: Reputation
Target ID: Id
Target timestamp: LastAccessDate

Algorithms

| Dataset version | Target | Algorithm | Author text | Measure | Value |
|---|---|---|---|---|---|
| stats | Reputation | FastProp | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | R2 | 0.9777 |
| stats | Reputation | Relboost | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | R2 | 0.9809 |
| stats | Reputation | Deep Feature Synthesis | featuretools | R2 | 0.9624 |
| stats | Reputation | FastProp | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | RMSE | 0.6533 |
| stats | Reputation | Relboost | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | RMSE | 0.6076 |
| stats | Reputation | Deep Feature Synthesis | featuretools | RMSE | 0.8499 |
| stats | Reputation | FastProp | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | MAE | 0.3361 |
| stats | Reputation | Relboost | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | MAE | 0.3114 |
| stats | Reputation | Deep Feature Synthesis | featuretools | MAE | 0.3487 |

How to download the dataset

The dataset is publicly available directly from a MariaDB database.

  1. Open your favourite MariaDB client (MySQL Workbench works, but see the FAQ).
  2. Use the following credentials:
    • hostname: db.relational-data.org
    • port: 3306
    • username: guest
    • password: relational
  3. Export the "stats" database (or another version of the dataset, if available) in your preferred format (e.g. CSV or SQL dump).
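
For a scripted export instead of a GUI client, the steps above can be sketched in Python. This is a minimal sketch rather than an official procedure: it assumes the third-party PyMySQL package is installed and that the server accepts the guest credentials listed above. The names export_table, export_database and the stats_csv output directory are hypothetical, chosen only for illustration.

```python
import csv
from pathlib import Path

# Connection details copied from the steps above.
CONNECTION = {
    "host": "db.relational-data.org",
    "port": 3306,
    "user": "guest",
    "password": "relational",
    "database": "stats",
}


def export_table(conn, table, out_dir, batch_size=10_000):
    """Dump one table to <out_dir>/<table>.csv via any DB-API 2.0 connection."""
    # Table names cannot be bound as query parameters, so validate before quoting.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    out_path = Path(out_dir) / f"{table}.csv"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    cur = conn.cursor()
    try:
        cur.execute(f"SELECT * FROM `{table}`")
        with open(out_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            # Header row: column names reported by the cursor.
            writer.writerow([col[0] for col in cur.description])
            # Write in batches; NULL values end up as empty fields.
            while True:
                rows = cur.fetchmany(batch_size)
                if not rows:
                    break
                writer.writerows(rows)
    finally:
        cur.close()
    return out_path


def export_database(out_dir="stats_csv"):
    """Export every table of the database named in CONNECTION to CSV files."""
    import pymysql  # assumption: third-party PyMySQL is installed

    conn = pymysql.connect(**CONNECTION)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW TABLES")
            tables = [row[0] for row in cur.fetchall()]
        return [export_table(conn, name, out_dir) for name in tables]
    finally:
        conn.close()
```

Calling export_database() should write one CSV file per table into stats_csv. To fetch a different version of the dataset, change the "database" entry in CONNECTION. Given the 658.4 MB size, the full export may take a while over a public connection.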