PTE

A database from the Predictive Toxicology Evaluation Challenge (1997). The task is to predict whether a compound is carcinogenic.

Original source: link.springer.com (BibTeX)

Versions

  • PTE (by Alev Mutlu)

    • Fixed a typo in pte_drug: one of the two "d8" entries was renamed to the missing "d82". The duplicate "d110" and the missing "d23", "d85" and "d208" were not fixed. The database also contains duplicate tuples. Beware of set-valued attributes (e.g. in pte_six_ring) that are typed as varchar when they should be typed as set.
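Two of the caveats above, duplicate tuples and set-valued attributes stored as varchar, can be handled after export. The following is a minimal sketch, not part of the dataset distribution. It assumes the tables have been exported to CSV, and it assumes that set-valued columns such as those in pte_six_ring hold their members as comma-separated text (the textual form MariaDB/MySQL uses for SET values). Check the actual column contents before relying on that. The function names are hypothetical.

```python
import csv


def dedupe_rows(rows):
    """Drop exact duplicate tuples, keeping the first occurrence and the original order.

    Note: this only removes rows that are identical in every column; it does not
    resolve a repeated ID (such as "d110") whose other values differ.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def parse_set_field(value, sep=","):
    """Convert a varchar that encodes a set into a frozenset of its members.

    ASSUMPTION: members are separated by `sep` (comma by default).
    Empty or missing values become an empty set.
    """
    if value is None or value.strip() == "":
        return frozenset()
    return frozenset(part.strip() for part in value.split(sep) if part.strip())


def read_unique_rows(path):
    """Read an exported CSV table (header row first) and drop exact duplicate rows."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        return header, dedupe_rows(reader)
```

For example, `parse_set_field("a1, a2,a1")` yields `frozenset({"a1", "a2"})`. Using a frozenset makes the parsed value hashable, so rows remain usable as dictionary keys or set members after parsing.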

Dataset details

Associated task:
Classification
Domain:
Medicine
Data types:
Size:
4.4 MB
Count of tables:
38
Count of rows:
29,762
Count of columns:
76
Missing values:
No
Compound keys:
No
Loops:
Yes
Type:
Real
Instance count:
299
Target table:
pte_active
Target column:
is_active
Target ID:
drug_id
Target timestamp:
?

How to download the dataset

The datasets are publicly available directly from a MariaDB database server.

  1. Open your favourite MariaDB client (MySQL Workbench works, but see the FAQ).
  2. Use the following credentials:
    • hostname: db.relational-data.org
    • port: 3306
    • username: guest
    • password: relational
  3. Export the "PTE" database (or another version of the dataset, if available) in your favourite format (e.g. CSV or SQL dump).
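The steps above can also be scripted. The following is a minimal sketch, assuming the third-party PyMySQL package (`pip install pymysql`) as the client. PyMySQL speaks the MySQL protocol, which MariaDB accepts. The script uses the credentials listed above and writes every table of the "PTE" database to its own CSV file. The function names and the `pte_csv` output directory are hypothetical choices made for this illustration.

```python
import csv
import os

# Public guest credentials as listed on this page.
CONNECTION = {
    "host": "db.relational-data.org",
    "port": 3306,
    "user": "guest",
    "password": "relational",
    "database": "PTE",
}


def write_table_csv(path, columns, rows):
    """Write one table to a CSV file: a header row with column names, then the data rows."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(rows)


def export_database(out_dir="pte_csv", params=CONNECTION):
    """Dump every table of the database to <out_dir>/<table>.csv and return the table names.

    Requires the third-party PyMySQL package and network access to the server.
    """
    import pymysql  # imported lazily so the rest of the module works without PyMySQL

    os.makedirs(out_dir, exist_ok=True)
    conn = pymysql.connect(**params)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW TABLES")
            tables = [row[0] for row in cur.fetchall()]
            for table in tables:
                # Backticks quote the identifier; table names come from SHOW TABLES.
                cur.execute(f"SELECT * FROM `{table}`")
                columns = [d[0] for d in cur.description]
                write_table_csv(os.path.join(out_dir, f"{table}.csv"), columns, cur.fetchall())
    finally:
        conn.close()
    return tables
```

Calling `export_database()` should produce one CSV file per table. With the figures given above, that would be 38 files, including `pte_active.csv` for the target table. Each table is fetched in a single query for simplicity. That is acceptable for a 4.4 MB database, but a larger dataset would call for streaming the rows in batches instead.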