AdventureWorks
Adventure Works 2014 (OLTP version) is a sample database for Microsoft SQL Server that replaced the Northwind and Pubs sample databases shipped earlier. The database describes a fictitious multinational bicycle manufacturer called Adventure Works Cycles.
BasketballMen
The task is to predict the rank of teams.
Biodegradability
This is an older data set of chemical structures containing 328 compounds labeled by their half-life for aerobic aqueous biodegradation (a regression task).
CCS
Transactional data from a Czech debit card company specialising in payments at petrol pumps.
CDESchools
A database containing geospatial information, as well as SAT average scores and Free-or-Reduced-Price Meal eligibility data, for California schools.
ClassicModels
The schema is for Classic Models, a retailer of scale models of classic cars. The database contains typical business data such as customers, orders, order line items, products and so on.
Countries
The task is to predict "Forest area (% of land area)" for 247 countries in 2012 based on the previous values.
Employee
The employees test database: a small, fake database of employees.
FNHK
Anonymised data from a hospital in Hradec Kralove, Czech Republic, about treatment and medication.
GOSales
GO Sales dataset from IBM contains information about daily sales, methods, retailers, and products of a fictitious outdoor equipment retail chain “Great Outdoors” (GO). The task is to predict sale quantity.
Grants
This dataset includes funding grants from the National Science Foundation. The task is to predict the award amount.
Lahman
Lahman’s baseball database contains complete batting and pitching statistics from 1871 to 2014, plus fielding statistics, standings, team stats, managerial records, post-season data, and more.
Mesh
This domain is about finite element methods in engineering. The task is to predict how many elements should be used to model each edge of a structure. The target predicate is mesh(Edge,Number) where the Number of elements in the Mesh model can vary between 1 and 17.
Northwind
The Northwind database contains the sales data for a fictitious company called Northwind Traders, which imports and exports specialty foods from around the world.
Pubs
The pubs sample database is modeled after a book publishing company.
Pyrimidine
A pyrimidine QSAR dataset. The goal is to predict the inhibition of dihydrofolate reductase by pyrimidines.
Restbase
A database of restaurants in San Francisco. The goal is to predict the customer's satisfaction.
Sakila
The venerable sakila test database: a small, fake database of movies.
SalesDB
A simple artificial database in star schema.
Seznam
Seznam.cz is a web portal and search engine in the Czech Republic. The data represent online advertisement expenditures from Seznam's "wallet". Table description: client: location and domain field of the client (anonymized) dobito: prepaid into a wallet in Czech cur…
SFScores
The San Francisco Dept. of Public Health’s database of eateries, inspections of those eateries, and violations found during the inspections. The task is to predict the unscheduled inspection scores from 2013 to 2016. The scores range from 1 to 100, where 100 means that…
Stats
An anonymized dump of all user-contributed content on the Stats Stack Exchange network.
TPCH
TPC-H is a decision support benchmark published by the Transaction Processing Performance Council (TPC).
Triazine
A triazine QSAR dataset. The goal is to predict the inhibition of dihydrofolate reductase by triazines.
Walmart
Walmart challenges participants to accurately predict the sales of 111 potentially weather-sensitive products (like umbrellas, bread, and milk) around the time of major weather events at 45 of their retail locations.