Musk

The Musk database describes molecules occurring in different conformations. Each molecule is either musk or non-musk, and one of its conformations determines this property. Such a problem is known as a multiple-instance problem and is modeled by two tables, molecule and conformation, joined by a one-to-many association. The conformation table contains a molecule identifier plus 166 continuous features; the molecule table contains only the identifier and the class. There are two versions of the dataset: MuskSmall, containing 92 molecules and 476 conformations, and MuskLarge, containing 102 molecules and 6598 conformations.
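The one-to-many structure can be illustrated with a minimal Python sketch that groups conformations into per-molecule "bags", each carrying a single molecule-level label. The rows below are invented toy data, the feature names `f1` and `f2` are hypothetical (the real table has 166 features), and the class values `"musk"` and `"non-musk"` are an assumption about how the label is encoded.

```python
from collections import defaultdict

# Toy rows invented for illustration only; not taken from the real database.
# Feature names f1, f2 are hypothetical; the real table has 166 features.
molecule = [
    {"molecule_name": "m1", "class": "musk"},
    {"molecule_name": "m2", "class": "non-musk"},
]
conformation = [
    {"molecule_name": "m1", "f1": 0.1, "f2": 0.2},
    {"molecule_name": "m1", "f1": 0.3, "f2": 0.1},
    {"molecule_name": "m2", "f1": 0.9, "f2": 0.8},
]


def build_bags(molecule_rows, conformation_rows):
    """Group conformations by molecule and attach the molecule-level label."""
    instances = defaultdict(list)
    for row in conformation_rows:
        # Every column except the foreign key is treated as a feature.
        features = [v for k, v in row.items() if k != "molecule_name"]
        instances[row["molecule_name"]].append(features)
    # One bag per molecule: (label, list of instance feature vectors).
    return {
        m["molecule_name"]: (m["class"], instances[m["molecule_name"]])
        for m in molecule_rows
    }


bags = build_bags(molecule, conformation)
```

Each bag holds one label and a variable number of feature vectors, which is what distinguishes the multiple-instance setting from ordinary classification, where every feature vector has its own label.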

Original source: sourceforge.net

Versions

  • MuskSmall (by Arnaud Barragao)

  • MuskLarge (by Arnaud Barragao)

Dataset details

Associated task: Classification
Domain: Medicine
Data types:
Size: 400 KB
Count of tables: 2
Count of rows: 554
Count of columns: 170
Missing values: No
Compound keys: No
Loops: No
Type: Real
Instance count: 92
Target table: molecule
Target column: class
Target ID: molecule_name
Target timestamp: ?

How to download the dataset

The datasets are publicly available directly from a MariaDB database.

  1. Open your favourite MariaDB client (MySQL Workbench works, but see FAQ)
  2. Use the following credentials:
    • hostname: db.relational-data.org
    • port: 3306
    • username: guest
    • password: relational
  3. Export the "MuskSmall" database (or another version of the dataset, if available) in your favourite format (e.g. CSV or SQL dump).
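As an alternative to a graphical client, the export can be scripted. The following is a minimal Python sketch, assuming the third-party `pymysql` package is installed and that the tables are named `molecule` and `conformation` as in the description above. The helper names `write_csv` and `dump_table` are hypothetical, and the script has not been checked against the live server.

```python
import csv

# Connection settings copied from the credentials listed above.
CONNECTION = {
    "host": "db.relational-data.org",
    "port": 3306,
    "user": "guest",
    "password": "relational",
    "database": "MuskSmall",
}


def write_csv(path, header, rows):
    """Write a header line followed by the data rows to a CSV file."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)


def dump_table(table, path, settings=CONNECTION):
    """Fetch every row of one table and save it as CSV (hypothetical helper)."""
    import pymysql  # third-party; assumed installed via `pip install pymysql`

    conn = pymysql.connect(**settings)
    try:
        with conn.cursor() as cur:
            # The table name is interpolated directly, so pass trusted names only.
            cur.execute(f"SELECT * FROM `{table}`")
            header = [col[0] for col in cur.description]
            write_csv(path, header, cur.fetchall())
    finally:
        conn.close()
```

Calling `dump_table("molecule", "molecule.csv")` and `dump_table("conformation", "conformation.csv")` would then produce one CSV file per table, provided the server is reachable and the table names match.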