Lahman

Lahman’s baseball database contains complete batting and pitching statistics from 1871 to 2014, plus fielding statistics, standings, team stats, managerial records, post-season data, and more.

Original source: www.seanlahman.com

Versions

  • Lahman_2014 (by Jan Motl)

    • Added foreign key constraints by removing the rows that violated them

Dataset details

Associated task:
Regression
Domain:
Sport
Data types:
Size:
74.1 MB
Count of tables:
25
Count of rows:
470,225
Count of columns:
353
Missing values:
Yes
Compound keys:
No
Loops:
Yes
Type:
Real
Instance count:
23,111
Target table:
salaries
Target column:
salary
Target ID:
teamID, playerID, lgID
Target timestamp:
yearID

Algorithms

Dataset version | Target | Algorithm | Author text | Measure | Value
lahman_2014 | salary | FastProp | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | R2 | 0.788
lahman_2014 | salary | Relboost | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | R2 | 0.8395
lahman_2014 | salary | Deep Feature Synthesis | featuretools | R2 | 0.7797
lahman_2014 | salary | FastProp | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | RMSE | 1402960
lahman_2014 | salary | Relboost | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | RMSE | 1220382
lahman_2014 | salary | Deep Feature Synthesis | featuretools | RMSE | 1431516
lahman_2014 | salary | FastProp | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | MAE | 765292
lahman_2014 | salary | Relboost | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | MAE | 666548
lahman_2014 | salary | Deep Feature Synthesis | featuretools | MAE | 769939
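
For readers comparing their own models against the values listed for these algorithms, the three measures can be computed as in the sketch below. This is an illustration only: the page does not state which definition of R2 the authors used, so the coefficient of determination is assumed here, and the function name `regression_measures` is hypothetical.

```python
import math


def regression_measures(actual, predicted):
    """Return R2, RMSE and MAE for two equally long sequences of numbers.

    Assumption: R2 is the coefficient of determination (1 - SS_res / SS_tot).
    Some tools report the squared correlation instead, which can differ.
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
    }
```

RMSE and MAE are expressed in the units of the target column (salary), which is why their values are large compared with R2.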

How to download the dataset

The dataset is publicly available directly from a MariaDB database.

  1. Open your favourite MariaDB client (MySQL Workbench works, but see FAQ)
  2. Use the following credentials:
    • hostname: db.relational-data.org
    • port: 3306
    • username: guest
    • password: relational
  3. Export the "lahman_2014" database (or another version of the dataset, if available) in your favourite format (e.g. CSV or SQL dump).
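
The steps above can also be scripted instead of using a graphical client. The following is a minimal sketch, assuming Python with the third-party PyMySQL driver installed; the helper names `export_table` and `export_database` and the output directory `lahman_2014_csv` are hypothetical choices for illustration, and only the credentials come from this page.

```python
import csv
from pathlib import Path

# Connection details as listed in the steps above.
CONNECTION = {
    "host": "db.relational-data.org",
    "port": 3306,
    "user": "guest",
    "password": "relational",
    "database": "lahman_2014",
}


def export_table(connection, table, out_dir):
    """Write every row of `table` to <out_dir>/<table>.csv; return the row count."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cursor = connection.cursor()
    try:
        # Table names are read from the database itself, not from user input.
        cursor.execute(f"SELECT * FROM `{table}`")
        header = [column[0] for column in cursor.description]
        count = 0
        path = out_dir / f"{table}.csv"
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            for row in cursor:
                writer.writerow(row)  # NULL values are written as empty fields
                count += 1
    finally:
        cursor.close()
    return count


def export_database(out_dir="lahman_2014_csv"):
    """Connect with the guest credentials and export all tables (needs network)."""
    import pymysql  # third-party driver, assumed installed: pip install pymysql

    connection = pymysql.connect(**CONNECTION)
    try:
        with connection.cursor() as cursor:
            cursor.execute("SHOW TABLES")
            tables = [row[0] for row in cursor.fetchall()]
        return {t: export_table(connection, t, out_dir) for t in tables}
    finally:
        connection.close()
```

Calling `export_database()` should produce one CSV file per table and return a mapping from table name to exported row count. `export_table` relies only on the standard Python DB-API, so another MariaDB or MySQL driver could be substituted for PyMySQL.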