PTE
A database from the Predictive Toxicology Evaluation Challenge (1997). The task is to predict whether a compound is carcinogenic.
Original source: link.springer.com (BibTeX)
Versions
PTE (by Alev Mutlu)
- Fixed typo in pte_drug: renamed one of the two "d8" entries to the missing "d82". The duplicate "d110" and the missing "d23", "d85" and "d208" were not fixed. Contains duplicate tuples. Beware of "Set" attributes (e.g. in pte_six_ring), which are typed as varchar although they should be typed as set.
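The duplicate tuples mentioned above can be dropped after the rows have been fetched. The following is a minimal sketch, assuming the rows are already available as a list of tuples; the function name `drop_duplicate_rows` is hypothetical and not part of the dataset or of any library.

```python
def drop_duplicate_rows(rows):
    """Return the rows with exact duplicates removed, keeping first occurrences in order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # tuples are hashable, so they can be tracked in a set
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

This only removes rows that are identical in every column. It does not resolve the duplicated "d110" identifier or restore the missing identifiers, which need a manual decision.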
Dataset details
- Associated task:
- Classification
- Domain:
- Medicine
- Data types:
- Size:
- 4.4 MB
- Count of tables:
- 38
- Count of rows:
- 29,762
- Count of columns:
- 76
- Missing values:
- No
- Compound keys:
- No
- Loops:
- Yes
- Type:
- Real
- Instance count:
- 299
- Target table:
- pte_active
- Target column:
- is_active
- Target ID:
- drug_id
- Target timestamp:
- ?
How to download the dataset
The datasets are publicly available directly from a MariaDB database.
- Open your favourite MariaDB client (MySQL Workbench works, but see FAQ)
- Use the following credentials:
- hostname: db.relational-data.org
- port: 3306
- username: guest
- password: relational
- Export the "PTE" database (or another version of the dataset, if available) in your favourite format (e.g. CSV or SQL dump).
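The steps above can also be scripted instead of using a graphical client. The following is a minimal sketch, assuming the third-party PyMySQL driver is installed (`pip install pymysql`); the helper names `connect_guest`, `export_table` and `export_database` are hypothetical, and the export writes one CSV file per table.

```python
import csv
import os


def connect_guest(database="PTE"):
    """Open a guest connection using the credentials listed above."""
    import pymysql  # assumption: third-party driver, not part of the standard library

    return pymysql.connect(
        host="db.relational-data.org",
        port=3306,
        user="guest",
        password="relational",
        database=database,
    )


def export_table(cursor, table, out_dir):
    """Write every row of `table` to <out_dir>/<table>.csv, with a header row."""
    # Table names are expected to come from SHOW TABLES, not from user input.
    cursor.execute(f"SELECT * FROM `{table}`")
    header = [column[0] for column in cursor.description]
    path = os.path.join(out_dir, f"{table}.csv")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(cursor.fetchall())
    return path


def export_database(connection, out_dir):
    """Export all tables of the connected database and return the written paths."""
    os.makedirs(out_dir, exist_ok=True)
    cursor = connection.cursor()
    try:
        cursor.execute("SHOW TABLES")
        tables = [row[0] for row in cursor.fetchall()]
        return [export_table(cursor, table, out_dir) for table in tables]
    finally:
        cursor.close()
```

With the driver installed and the server reachable, calling `export_database(connect_guest(), "pte_csv")` would write the CSV files into a `pte_csv` directory. The cursor is passed in explicitly so that `export_table` works with any DB-API cursor.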