Shakespeare

Alternative names: OSS

The Open Source Shakespeare is a collection of Shakespeare's complete works. This is a much more interesting dataset than some boring imaginary online retailer. In this dataset, people die! The task is to predict the character who speaks the lines.

Original source: www.opensourceshakespeare.org

Versions

  • Shakespeare (by Jan Motl)

    • We use a normalized database schema from https://github.com/mozz100/bardofavon.

Dataset details

Associated task: Classification
Domain: Entertainment
Data types:
Size: 8.8 MB
Count of tables: 4
Count of rows: 35,234
Count of columns: 19
Missing values: No
Compound keys: No
Loops: No
Type: Real
Instance count: 32,980
Target table: paragraphs
Target column: character_id
Target ID: id
Target timestamp: ?

How to download the dataset

The datasets are publicly available directly from a MariaDB database.

  1. Open your favourite MariaDB client (MySQL Workbench works, but see the FAQ)
  2. Use the following credentials:
    • hostname: db.relational-data.org
    • port: 3306
    • username: guest
    • password: relational
  3. Export the "Shakespeare" database (or another version of the dataset, if available) in your favourite format (e.g. CSV or SQL dump).
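
If you would rather script the export than use a graphical client, the steps above can be sketched in Python. This is a minimal sketch, not an official tool: the PyMySQL driver, the helper names export_table and export_paragraphs, and the output file name paragraphs.csv are assumptions made for illustration. Only the credentials, the database name and the paragraphs table come from this page.

```python
import csv

# Connection details as listed in the steps above.
CONNECTION = {
    "host": "db.relational-data.org",
    "port": 3306,
    "user": "guest",
    "password": "relational",
    "database": "Shakespeare",
}


def export_table(connection, table, out):
    """Write every row of `table` to the text stream `out` as CSV.

    `connection` is any DB-API connection. Returns the number of data rows written.
    (Hypothetical helper, not part of any library.)
    """
    # Table names cannot be bound as query parameters, so accept plain identifiers only.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    cursor = connection.cursor()
    cursor.execute(f"SELECT * FROM {table}")
    writer = csv.writer(out)
    # The first element of each description entry is the column name.
    writer.writerow([column[0] for column in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    cursor.close()
    return count


def export_paragraphs(path="paragraphs.csv"):
    """Export the target table to a CSV file (hypothetical file name)."""
    # Assumption: PyMySQL is installed; other MariaDB/MySQL DB-API drivers
    # should work in much the same way.
    import pymysql

    connection = pymysql.connect(**CONNECTION)
    try:
        with open(path, "w", newline="", encoding="utf-8") as out:
            return export_table(connection, "paragraphs", out)
    finally:
        connection.close()
```

Calling export_paragraphs() would write the target table to a local CSV file. The other tables can be exported the same way by passing their names to export_table.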