TPCH

TPC-H is a decision support benchmark published by the Transaction Processing Performance Council (TPC).

Original source: www.tpc.org

Versions

  • Tpch (by Jan Motl)

Dataset details

Associated task: Regression
Domain: Retail
Data types:
Size: 2 GB
Count of tables: 8
Count of rows: 6,885,051
Count of columns: 61
Missing values: No
Compound keys: Yes
Loops: Yes
Type: Synthetic
Instance count: 148,534
Target table: customer
Target column: c_acctbal
Target ID: c_custkey
Target timestamp: ?

How to download the dataset

The dataset is publicly available directly from a MariaDB database.

  1. Open your favourite MariaDB client (MySQL Workbench works, but see the FAQ).
  2. Use the following credentials:
    • hostname: db.relational-data.org
    • port: 3306
    • username: guest
    • password: relational
  3. Export the "tpch" database (or another version of the dataset, if available) in your favourite format (e.g. CSV or an SQL dump).
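
If you prefer a script to a graphical client, the steps above can be sketched in Python. This is only an illustration, not an official export tool: it assumes the third-party PyMySQL package as the driver (any DB-API 2.0 driver for MySQL/MariaDB should behave similarly), and the helper names connect_guest and export_table_to_csv are made up for this example. The only table name used is "customer", the target table listed in the dataset details.

```python
import csv

# Connection settings copied from the credentials listed above.
DB_SETTINGS = {
    "host": "db.relational-data.org",
    "port": 3306,
    "user": "guest",
    "password": "relational",
    "database": "tpch",
}


def connect_guest():
    """Open a connection with the public guest credentials.

    Assumption: PyMySQL is installed (pip install pymysql). It is imported
    lazily so that the export helper below has no extra dependency.
    """
    import pymysql

    return pymysql.connect(**DB_SETTINGS)


def export_table_to_csv(connection, table, out):
    """Write all rows of `table` to the text stream `out` as CSV.

    Works with any DB-API 2.0 connection. Returns the number of data rows.
    """
    # Table names cannot be passed as query parameters, so accept only
    # plain identifiers before interpolating the name into the SQL text.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")

    cursor = connection.cursor()
    try:
        cursor.execute(f"SELECT * FROM {table}")
        writer = csv.writer(out)
        # The first element of each description entry is the column name.
        writer.writerow([column[0] for column in cursor.description])
        count = 0
        while True:
            rows = cursor.fetchmany(1000)  # fetch in batches to limit memory use
            if not rows:
                break
            writer.writerows(rows)
            count += len(rows)
        return count
    finally:
        cursor.close()


# Example usage (requires network access and PyMySQL):
#
#     connection = connect_guest()
#     with open("customer.csv", "w", newline="") as f:
#         export_table_to_csv(connection, "customer", f)
#     connection.close()
```

The export helper takes the connection as an argument so that it can be tried against any DB-API database before pointing it at the public server. Exporting the whole database this way means repeating the call for each of the 8 tables.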