Mutagenesis

The dataset comprises 230 molecules tested for mutagenicity on Salmonella typhimurium. A subset of 188 molecules is learnable using linear regression. This subset was later termed the "regression friendly" dataset. The remaining subset of 42 molecules is named the "regression unfriendly" dataset. Note that authors use this dataset with varying amounts of background knowledge (the number of features in the "molecule" table), so the reported accuracies are not necessarily directly comparable.

Original source: www.cs.ox.ac.uk (BibTeX)

Versions

  • Mutagenesis (by Jan Motl)

  • Mutagenesis_188 (by Janez Kranjc)

  • Mutagenesis_42 (by Janez Kranjc)

Dataset details

Associated task: Classification
Domain: Medicine
Data types:
Size: 900 KB
Count of tables: 3
Count of rows: 10,324
Count of columns: 14
Missing values: No
Compound keys: No
Loops: Yes
Type: Real
Instance count: 188
Target table: molecule
Target column: mutagenic
Target ID: molecule_id
Target timestamp: ?

Algorithms

| Dataset version | Target | Algorithm | Author text | Measure | Value |
|---|---|---|---|---|---|
| mutagenesis | mutagenic | Aleph | Fast Relational Learning using Bottom Clause Propositionalization with Artificial Neural Networks | Accuracy | 0.8085 |
| mutagenesis | mutagenic | Aleph | kFOIL: Learning Simple Relational Kernels | Accuracy | 0.734 |
| mutagenesis | mutagenic | Aleph | Wordification: Propositionalization by unfolding relational data into bags of words | Accuracy | 0.6011 |
| mutagenesis | mutagenic | CILP++ | Fast Relational Learning using Bottom Clause Propositionalization with Artificial Neural Networks | Accuracy | 0.892 |
| mutagenesis | mutagenic | CoTReC | Transductive Relational Classification in the Co-training Paradigm | Accuracy | 0.8579 |
| mutagenesis | mutagenic | CrossMine | ADMA 2010 | Accuracy | 0.819 |
| mutagenesis | mutagenic | CrossMine | Efficient Classification across Multiple Database Relations: A CrossMine Approach | Accuracy | 0.893 |
| mutagenesis | mutagenic | CrossMine | Learning from Skewed Class Multi-relational Databases | Accuracy | 0.912 |
| mutagenesis | mutagenic | CrossMine | Relational Classification using Multiple View Approach with Voting | Accuracy | 0.857 |
| mutagenesis | mutagenic | FOIL | Efficient Classification across Multiple Database Relations: A CrossMine Approach | Accuracy | 0.797 |

How to download the dataset

The datasets are publicly available directly from a MariaDB database.

  1. Open your favourite MariaDB client (MySQL Workbench works, but see the FAQ)
  2. Use the following credentials:
    • hostname: db.relational-data.org
    • port: 3306
    • username: guest
    • password: relational
  3. Export the "mutagenesis" database (or another version of the dataset, if available) in your favourite format (e.g. CSV or SQL dump).
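
For a scripted alternative to a graphical client, the steps above could be sketched in Python roughly as follows. This is a minimal sketch, not an official download tool: it assumes the third-party PyMySQL driver is installed (`pip install pymysql`), and the function names `export_table_to_csv` and `export_mutagenesis` as well as the output directory `mutagenesis_csv` are hypothetical names chosen for illustration. The table names are discovered with `SHOW TABLES` rather than hard-coded.

```python
import csv
import os


def export_table_to_csv(conn, table, out_dir):
    """Dump one table to <out_dir>/<table>.csv using any DB-API 2.0 connection.

    Returns the path of the written file and the number of data rows.
    """
    cur = conn.cursor()
    try:
        # Table names cannot be bound as query parameters,
        # so only pass names that come from a trusted source.
        cur.execute(f"SELECT * FROM `{table}`")
        header = [col[0] for col in cur.description]
        rows = cur.fetchall()
    finally:
        cur.close()

    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{table}.csv")
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    return path, len(rows)


def export_mutagenesis(out_dir="mutagenesis_csv"):
    """Connect with the public guest credentials and export every table."""
    import pymysql  # assumption: third-party driver, not part of the stdlib

    conn = pymysql.connect(
        host="db.relational-data.org",
        port=3306,
        user="guest",
        password="relational",
        database="mutagenesis",
    )
    try:
        cur = conn.cursor()
        cur.execute("SHOW TABLES")  # list the tables in the selected database
        tables = [row[0] for row in cur.fetchall()]
        cur.close()
        for table in tables:
            path, n_rows = export_table_to_csv(conn, table, out_dir)
            print(f"{table}: {n_rows} rows -> {path}")
    finally:
        conn.close()
```

Calling `export_mutagenesis()` writes one CSV file per table. To export another version of the dataset, change the `database` argument to that version's database name. The export helper is kept separate from the connection code so that it can be reused with any other DB-API connection.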