Financial

Alternative names: loan application

The PKDD'99 Financial dataset contains 606 successful and 76 unsuccessful loans, along with their associated information and transactions. The standard task is to predict the outcome of finished loans (A vs B in loan.status) at the time the loan starts (defined by loan.date).

Note: Two factors strongly affect the model accuracy reported in the references: 1) Was the temporal constraint respected, so that only data available at loan.date was used? 2) Was the problem formulated as (A vs B) or as (A vs B vs C vs D)?

If the temporal constraint is ignored, good loans (A, C) can be perfectly separated from bad loans (B, D) with: if min(trans.balance) >= 0 then good else bad. Finished loans (A, B) can be perfectly separated from unfinished loans (C, D) with: if loan.date + loan.duration >= 1999-01-01 then unfinished else finished.
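The two leakage rules and the temporal constraint described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: loan.duration is assumed to be expressed in months, "before the loan start" is assumed to mean strictly earlier than loan.date, and the helper names (add_months, is_unfinished, looks_good_with_leak, transactions_before_loan) as well as the dict layout of a transaction are hypothetical, not part of the dataset.

```python
import calendar
from datetime import date

# Cut-off date taken from the rule quoted in the description.
CUTOFF = date(1999, 1, 1)


def add_months(start, months):
    """Shift a date forward by whole months, clamping the day to the month's length."""
    index = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_unfinished(loan_date, duration_months):
    """Rule: loan.date + loan.duration >= 1999-01-01 means unfinished (C, D).

    Assumption: loan.duration is given in months.
    """
    return add_months(loan_date, duration_months) >= CUTOFF


def looks_good_with_leak(balances):
    """Leaky rule: min(trans.balance) >= 0 means good (A, C).

    It relies on ALL balances of the account, including those recorded
    after the loan started, so it violates the temporal constraint.
    """
    return min(balances) >= 0


def transactions_before_loan(transactions, loan_date):
    """Keep only transactions dated strictly before the loan start.

    Assumption: each transaction is a dict with a 'date' key holding a
    datetime.date; strict '<' is one possible reading of the constraint.
    """
    return [t for t in transactions if t["date"] < loan_date]
```

A model that respects the temporal constraint would build its features only from the output of transactions_before_loan, which is why it cannot reproduce the perfect separation achieved by the leaky rule.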

Original source: web.archive.org (BibTeX)

Versions

  • Financial (by Jan Motl)

    • Added foreign key constraints. Separated "rodne cislo" (the Czech birth number) into date of birth and gender.
  • Financial_ijs (by Janez Kranjc)

  • Financial_std (by Oliver Schulte)

Dataset details

Associated task:
Classification
Domain:
Financial
Data types:
Size:
78.8 MB
Count of tables:
8
Count of rows:
1,090,086
Count of columns:
55
Missing values:
Yes
Compound keys:
No
Loops:
Yes
Type:
Real
Instance count:
682
Target table:
loan
Target column:
status
Target ID:
account_id
Target timestamp:
date

How to download the dataset

The datasets are publicly available directly from a MariaDB database.

  1. Open your favourite MariaDB client (MySQL Workbench works, but see FAQ)
  2. Use the following credentials:
    • hostname: db.relational-data.org
    • port: 3306
    • username: guest
    • password: relational
  3. Export the "financial" database (or another version of the dataset, if available) in your favourite format (e.g. CSV or SQL dump).
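As an alternative to a graphical client, the steps above can be scripted. The following is a minimal sketch, not an official procedure: it assumes the third-party PyMySQL package is installed and that the server accepts the guest credentials listed above. The function names connect and export_table are hypothetical. export_table works with any DB-API 2.0 connection, so it can be tried locally without network access.

```python
import csv

# Connection details copied from the credentials listed above.
HOST = "db.relational-data.org"
PORT = 3306
USER = "guest"
PASSWORD = "relational"
DATABASE = "financial"


def connect():
    """Open a connection to the public server.

    Assumption: PyMySQL (pip install pymysql) is available; it is imported
    lazily so the rest of this module works without it.
    """
    import pymysql

    return pymysql.connect(
        host=HOST, port=PORT, user=USER, password=PASSWORD, database=DATABASE
    )


def export_table(conn, table, out):
    """Write every row of `table` to the file-like object `out` as CSV.

    Returns the number of data rows written (the header is not counted).
    """
    # Table names cannot be bound as query parameters, so validate before
    # interpolating. Backticks guard against names that are reserved words.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    cur = conn.cursor()
    try:
        cur.execute(f"SELECT * FROM `{table}`")
        writer = csv.writer(out)
        # cursor.description holds one 7-item sequence per column; item 0 is the name.
        writer.writerow([col[0] for col in cur.description])
        count = 0
        for row in iter(cur.fetchone, None):
            writer.writerow(row)
            count += 1
        return count
    finally:
        cur.close()


# Example usage (requires network access and PyMySQL):
#
#     conn = connect()
#     with open("loan.csv", "w", newline="") as f:
#         export_table(conn, "loan", f)
#     conn.close()
```

Exporting one table per file keeps the CSV files aligned with the relational schema. A full SQL dump, as mentioned in step 3, would still need a tool such as a database client.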