Carcinogenesis

Alternative names: PTE

The task is to predict whether a given molecule is carcinogenic. The dataset contains 182 positive and 148 negative carcinogenicity tests.

Original source: kt.ijs.si (BibTeX)

Versions

  • Carcinogenesis (by Janez Kranjc)

    • Foreign key constraints are violated. Specifically, table "atom" contains a drug "d115" that is missing from the "canc" table. The dataset contains only 329 instances; the expected number is 330.
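A violation of this kind can be found with an anti-join from "atom" to "canc". The following is a minimal sketch using Python's built-in sqlite3 module on toy rows. The column name drug_id in "atom", the atom_id column, and every sample value other than "d115" are assumptions made for illustration, not the real schema.

```python
import sqlite3


def find_orphan_drugs(conn):
    """Return drug ids referenced in "atom" that have no row in "canc"."""
    query = """
        SELECT DISTINCT a.drug_id
        FROM atom AS a
        LEFT JOIN canc AS c ON c.drug_id = a.drug_id
        WHERE c.drug_id IS NULL
        ORDER BY a.drug_id
    """
    return [row[0] for row in conn.execute(query)]


# Toy data standing in for the real tables (hypothetical rows and columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE canc (drug_id TEXT PRIMARY KEY, class INTEGER);
    CREATE TABLE atom (atom_id TEXT PRIMARY KEY, drug_id TEXT);
    INSERT INTO canc VALUES ('d1', 1), ('d2', 0);
    INSERT INTO atom VALUES ('d1_1', 'd1'), ('d2_1', 'd2'), ('d115_1', 'd115');
""")

orphans = find_orphan_drugs(conn)
print(orphans)
```

The same SELECT statement should also work against the MariaDB copy of the dataset, provided the column names match.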

Dataset details

Associated task:
Classification
Domain:
Medicine
Data types:
Size:
21 MB
Count of tables:
6
Count of rows:
27,570
Count of columns:
23
Missing values:
No
Compound keys:
No
Loops:
Yes
Type:
Real
Instance count:
329
Target table:
canc
Target column:
class
Target ID:
drug_id
Target timestamp:
?

Algorithms

Dataset version | Target | Algorithm | Author text | Measure | Value
Carcinogenesis | class | Aleph | Wordification: Propositionalization by unfolding relational data into bags of words | Accuracy | 0.5532
Carcinogenesis | class | Predictor Factory | Predictor Factory | Accuracy | 0.6689
Carcinogenesis | class | RelF | Wordification: Propositionalization by unfolding relational data into bags of words | Accuracy | 0.6018
Carcinogenesis | class | RSD | ClowdFlows | Accuracy | 0.55
Carcinogenesis | class | RSD | Wordification: Propositionalization by unfolding relational data into bags of words | Accuracy | 0.6049
Carcinogenesis | class | Wordification | ClowdFlows | Accuracy | 0.8
Carcinogenesis | class | Wordification | Wordification: Propositionalization by unfolding relational data into bags of words | Accuracy | 0.6231
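To put these accuracies in context, it can help to compare them with the accuracy of always predicting the larger class. The sketch below derives that baseline from the class counts given in the description (182 positive, 148 negative). Since the database itself holds 329 instances rather than 330, the baseline is approximate. The short source labels in the dictionary are shorthand introduced here, not names from the page.

```python
# Class counts from the dataset description.
positive, negative = 182, 148

# Accuracy obtained by always predicting the larger class.
majority_baseline = max(positive, negative) / (positive + negative)

# Reported accuracies, keyed by (algorithm, source label).
reported = {
    ("Aleph", "Wordification paper"): 0.5532,
    ("Predictor Factory", "Predictor Factory"): 0.6689,
    ("RelF", "Wordification paper"): 0.6018,
    ("RSD", "ClowdFlows"): 0.55,
    ("RSD", "Wordification paper"): 0.6049,
    ("Wordification", "ClowdFlows"): 0.8,
    ("Wordification", "Wordification paper"): 0.6231,
}

# Difference between each reported accuracy and the baseline.
gains = {key: accuracy - majority_baseline for key, accuracy in reported.items()}

print(f"majority baseline: {majority_baseline:.4f}")
for (algorithm, source), gain in gains.items():
    print(f"{algorithm:18s} {source:20s} {gain:+.4f}")
```

Under these counts the baseline is about 0.5515, so an entry near 0.55 is roughly on par with always guessing the majority class.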

How to download the dataset

The datasets are publicly available directly from a MariaDB database.

  1. Open your favourite MariaDB client (MySQL Workbench works, but see the FAQ).
  2. Use the following credentials:
    • hostname: db.relational-data.org
    • port: 3306
    • username: guest
    • password: relational
  3. Export the "Carcinogenesis" database (or another version of the dataset, if available) in your favourite format (e.g. CSV or an SQL dump).
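The steps above can also be scripted. The following is a minimal sketch in Python, assuming the third-party PyMySQL package is installed and the server is reachable. The helper names write_csv and export_table and the output file name are hypothetical. The table name "canc" is the target table listed in the dataset details.

```python
import csv

# Connection settings copied from the steps above.
CONNECTION = {
    "host": "db.relational-data.org",
    "port": 3306,
    "user": "guest",
    "password": "relational",
    "database": "Carcinogenesis",
}


def write_csv(path, header, rows):
    """Write a header line and data rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def export_table(table, path):
    """Dump one table to CSV. Needs network access and PyMySQL."""
    import pymysql  # third-party, assumed installed: pip install pymysql

    conn = pymysql.connect(**CONNECTION)
    try:
        with conn.cursor() as cursor:
            # Table names cannot be bound as query parameters,
            # so only pass trusted, hard-coded names here.
            cursor.execute(f"SELECT * FROM `{table}`")
            header = [column[0] for column in cursor.description]
            write_csv(path, header, cursor.fetchall())
    finally:
        conn.close()


# Example call (not run automatically, since it contacts the server):
# export_table("canc", "canc.csv")
```

Calling export_table once per table produces one CSV file per table. A full SQL dump is usually easier to obtain from a dedicated client or dump tool.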