About

Mission

To support the growth of relational machine learning.

How to cite

FAQ

Why are the datasets not stored in CSV files?

Because CSV files do not store information about data types, primary keys (PKs), foreign keys (FKs) and other constraints.
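To illustrate what a database connection offers beyond CSV, the following minimal sketch lists the primary and foreign keys of one database through standard JDBC metadata calls. It assumes that a MySQL-compatible JDBC driver (for example MySQL Connector/J) is on the classpath and that the guest connection parameters quoted elsewhere in this FAQ are still valid. The class name KeyMetadata and the helper jdbcUrl are made up for this example.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class KeyMetadata {
    /** Builds a JDBC URL of the form used in this FAQ (hypothetical helper). */
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws SQLException {
        // Assumption: host, user and password as quoted in this FAQ.
        String database = "mutagenesis"; // database names are case sensitive
        String url = jdbcUrl("db.relational-data.org", 3306, database);

        try (Connection conn = DriverManager.getConnection(url, "guest", "relational")) {
            DatabaseMetaData meta = conn.getMetaData();

            // Collect the table names first; MySQL drivers expose the database as the JDBC catalog.
            List<String> tables = new ArrayList<>();
            try (ResultSet rs = meta.getTables(database, null, "%", new String[] {"TABLE"})) {
                while (rs.next()) {
                    tables.add(rs.getString("TABLE_NAME"));
                }
            }

            for (String table : tables) {
                // Primary key columns of the table.
                try (ResultSet pk = meta.getPrimaryKeys(database, null, table)) {
                    while (pk.next()) {
                        System.out.println(table + " PK: " + pk.getString("COLUMN_NAME"));
                    }
                }
                // Foreign keys of the table and the columns they reference.
                try (ResultSet fk = meta.getImportedKeys(database, null, table)) {
                    while (fk.next()) {
                        System.out.println(table + "." + fk.getString("FKCOLUMN_NAME")
                                + " -> " + fk.getString("PKTABLE_NAME")
                                + "." + fk.getString("PKCOLUMN_NAME"));
                    }
                }
            }
        }
    }
}
```

None of this information would survive an export to plain CSV files, which is the reason the datasets are served from a database.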

Why am I not able to connect to the database?

If you are connecting to the database over a corporate network, the corporate firewall could be the culprit (it may block port 3306).
Try to access the database through a different internet provider (e.g. your cellular provider).
Also, keep in mind that database names are case sensitive. Database "mutagenesis" is not the same database as "Mutagenesis".
If the problems persist, contact us and provide us with the following information:

Your database client and its version (e.g. MySQL Workbench 6.3.10).
The database name you tried to connect to (e.g. mutagenesis).
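A quick way to tell a blocked port from other problems is to check whether a TCP connection to port 3306 can be opened at all. The following is a minimal sketch using only the Java standard library. The host name is the one quoted elsewhere in this FAQ, and the class name PortCheck and the 5-second timeout are arbitrary choices for this example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unresolved host: treated as unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the database host as quoted in this FAQ; pass another host as the first argument.
        String host = args.length > 0 ? args[0] : "db.relational-data.org";
        boolean ok = isReachable(host, 3306, 5000);
        System.out.println(host + ":3306 "
                + (ok ? "is reachable" : "is NOT reachable (possibly blocked by a firewall)"));
    }
}
```

If the port is reachable but the client still fails to connect, the cause is more likely the database name (including its case) or the credentials than the network.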

Why can't mysqldump find COLUMN_STATISTICS in information_schema?

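Recent versions of mysqldump query information_schema.COLUMN_STATISTICS, but MariaDB keeps these statistics in the table mysql.column_stats instead. Use one of the workarounds, for example running mysqldump with the option --column-statistics=0.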

What if I need the data in an ILP format?

See a collection of datasets at ILPnet2.
Alternatively, use a conversion tool. You have to change the connection parameters in src/Read.java from:
read.setConnection("jdbc:mysql://mantong01.dyndns.org:3306/mln","temp","Passw0rd");
to:
read.setConnection("jdbc:mysql://db.relational-data.org:3306/mutagenesis","guest","relational");

Why do the datasets contain missing values/composite keys/strange data types/any other ugly thing you may think of?

Because they are also present in the real datasets.

What is the point of including artificial datasets?

While datasets like Adventure Works may not contain any pattern that could be found during modeling, they still increase the diversity of the repository. For example, Adventure Works has the highest table count in the whole repository.
If your algorithm can process all the tables present in Adventure Works, it may be able to process real-world datasets.

Tools that use our repository

dm: Relational Data Models, a package for working with relational data in R.
Data Xtractor, a visual SQL query builder for Windows.
getML, a propositionalization library in Python.