PTC

The Predictive Toxicology Challenge (2000) dataset consists of more than three hundred organic molecules labeled according to their carcinogenicity in male and female mice and rats.

Original source: www.predictive-toxicology.org (BibTeX)

Versions

  • Toxicology (by Jan Motl)

    • Unresolved issue for molecule TR499: the Prolog file and the SMILES file have different content. The molecule table contains only binarized labels for male rats (positive if MR is one of {P, CE, SE}, negative if MR is one of {NE, N}). There is a single missing value, which may be an error.
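
The binarization rule above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the code used to build the table: the function name binarize_mr is hypothetical, the use of True/False for the two classes is an assumption (the actual encoding of the label column is not stated here), and mapping any other or missing code to None is likewise an assumption.

```python
# Outcome codes for male rats (MR) as listed in the note above.
POSITIVE_CODES = {"P", "CE", "SE"}
NEGATIVE_CODES = {"NE", "N"}


def binarize_mr(code):
    """Map an MR outcome code to True (positive) or False (negative).

    Returns None for a missing code or a code outside the two sets;
    treating those as "no label" is an assumption of this sketch.
    """
    if code is None:
        return None
    normalized = code.strip().upper()
    if normalized in POSITIVE_CODES:
        return True
    if normalized in NEGATIVE_CODES:
        return False
    return None
```

Returning None rather than raising keeps a single missing value, like the one noted above, from aborting a pass over the whole table.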

Dataset details

Associated task:
Classification
Domain:
Medicine
Data types:
Size:
8.1 MB
Count of tables:
4
Count of rows:
49,239
Count of columns:
11
Missing values:
Yes
Compound keys:
No
Loops:
No
Type:
Real
Instance count:
343
Target table:
molecule
Target column:
label
Target ID:
molecule_id
Target timestamp:
?

Algorithms

Dataset version | Target | Algorithm | Author text | Measure | Value
Toxicology | label | Predictor Factory | Predictor Factory | Accuracy | 0.5951

How to download the dataset

The datasets are publicly available directly from a MariaDB database.

  1. Open your favourite MariaDB client (MySQL Workbench works, but see FAQ)
  2. Use the following credentials:
    • hostname: db.relational-data.org
    • port: 3306
    • username: guest
    • password: relational
  3. Export the "Toxicology" database (or another version of the dataset, if available) in your favourite format (e.g. CSV or SQL dump).
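
For a scripted alternative to a graphical client, the steps above can be sketched in Python. This is a minimal sketch, not an official export tool: it assumes the third-party PyMySQL package is installed (pip install pymysql), the helper names export_table and export_database are hypothetical, and the table names are discovered with SHOW TABLES rather than assumed. The connection details are the ones listed above.

```python
import csv

# Connection details quoted from the steps above.
DB_CONFIG = {
    "host": "db.relational-data.org",
    "port": 3306,
    "user": "guest",
    "password": "relational",
    "database": "Toxicology",
}


def export_table(conn, table, path):
    """Write every row of one table to a CSV file with a header row.

    Works with any DB-API 2.0 connection. Returns the number of data rows.
    """
    # Table names cannot be passed as query parameters, so accept only
    # plain identifiers before building the statement.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    cur = conn.cursor()
    try:
        cur.execute(f"SELECT * FROM {table}")
        header = [col[0] for col in cur.description]
        rows = cur.fetchall()
    finally:
        cur.close()
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


def export_database(config=DB_CONFIG):
    """Export each table of the configured database to <table>.csv."""
    import pymysql  # third-party dependency, imported only when needed

    conn = pymysql.connect(**config)
    try:
        cur = conn.cursor()
        cur.execute("SHOW TABLES")
        tables = [row[0] for row in cur.fetchall()]
        cur.close()
        for table in tables:
            count = export_table(conn, table, f"{table}.csv")
            print(f"{table}: {count} rows")
    finally:
        conn.close()
```

Calling export_database() writes one CSV file per table into the current directory. Because export_table reads each table fully into memory, it suits a dataset of this size (8.1 MB) but would need batching for much larger ones.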