Financial
Alternative names: loan application
The PKDD'99 Financial dataset contains 606 successful and 76 unsuccessful loans, along with the related account information and transactions. The standard task is to predict the outcome of finished loans (A vs B in loan.status) at the time the loan starts (given by loan.date).
Note: two factors strongly affect the model accuracy reported in the references: 1) whether the temporal constraint was respected, and 2) whether the problem was formulated as A vs B or as A vs B vs C vs D. If the temporal constraint is ignored, good loans (A, C) can be perfectly separated from bad loans (B, D) with: if min(trans.balance) >= 0 then good else bad. Finished loans (A, B) can be perfectly separated from unfinished loans (C, D) with: if loan.date + loan.duration >= 1999-01-01 then unfinished else finished.
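The two shortcut rules above, and the temporal filtering that rules out the first one, can be sketched in Python. This is a minimal sketch on hypothetical in-memory rows: the `Loan`/`Trans` classes and all helper functions are illustrative and not part of the dataset. It assumes that `loan.duration` is measured in months and that "respecting the temporal constraint" means using only transactions dated strictly before `loan.date`.

```python
import calendar
import datetime as dt
from dataclasses import dataclass

SNAPSHOT = dt.date(1999, 1, 1)  # cut-off date used by the finished/unfinished rule

# Hypothetical, simplified rows; field names mirror columns of the loan and trans tables.
@dataclass
class Loan:
    account_id: int
    date: dt.date   # loan start: the target timestamp
    duration: int   # assumed to be in months
    status: str     # "A", "B", "C" or "D"

@dataclass
class Trans:
    account_id: int
    date: dt.date
    balance: float

def add_months(d: dt.date, months: int) -> dt.date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    return dt.date(year, month, min(d.day, calendar.monthrange(year, month)[1]))

def finished_shortcut(loan: Loan) -> bool:
    """Leaky rule: finished (A, B) iff loan.date + loan.duration < 1999-01-01."""
    return add_months(loan.date, loan.duration) < SNAPSHOT

def good_shortcut(loan: Loan, trans: list[Trans]) -> bool:
    """Leaky rule: good (A, C) iff the balance never drops below zero.
    It scans ALL transactions of the account, including those after loan.date."""
    balances = (t.balance for t in trans if t.account_id == loan.account_id)
    return min(balances, default=0.0) >= 0

def history_before(loan: Loan, trans: list[Trans]) -> list[Trans]:
    """Temporally valid input: only transactions dated strictly before the loan start."""
    return [t for t in trans if t.account_id == loan.account_id and t.date < loan.date]

def standard_task(loans: list[Loan]) -> list[tuple[Loan, bool]]:
    """Finished loans only, labelled True for A (good) and False for B (bad)."""
    return [(ln, ln.status == "A") for ln in loans if ln.status in ("A", "B")]

# Toy example: a bad loan whose balance turns negative only after the loan starts.
loan = Loan(1, dt.date(1994, 1, 5), 12, "B")
trans = [Trans(1, dt.date(1993, 12, 1), 1000.0), Trans(1, dt.date(1994, 6, 1), -50.0)]
print(finished_shortcut(loan))            # True: the loan ends on 1995-01-05
print(good_shortcut(loan, trans))         # False: uses the later negative balance
print(len(history_before(loan, trans)))   # 1: only the 1993 transaction is visible
```

In the toy example, the negative balance that gives the loan away lies after `loan.date`, so a model restricted to `history_before(...)` cannot see it, which is what the temporal constraint is meant to enforce.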
Original source: web.archive.org (BibTeX)
Versions
Financial (by Jan Motl)
- Added foreign key constraints. Separated "rodne cislo" (the Czech birth number) into date of birth and gender.
Financial_ijs (by Janez Kranjc)
Financial_std (by Oliver Schulte)
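In the original PKDD'99 data, the client's birth number is stored as YYMMDD for men and as YYMM+50DD for women (the month is increased by 50). A minimal Python sketch of such a split, assuming every client was born in the 1900s and using hypothetical "M"/"F" gender codes (the actual column names and codes in the Financial version may differ):

```python
import datetime as dt

def split_birth_number(birth_number) -> tuple[dt.date, str]:
    """Split a PKDD'99-style birth number into (date of birth, gender)."""
    s = f"{int(birth_number):06d}"          # restore leading zeros, e.g. 51012 -> "051012"
    yy, mm, dd = int(s[0:2]), int(s[2:4]), int(s[4:6])
    if mm > 50:                             # women: month stored with +50
        gender, mm = "F", mm - 50
    else:
        gender = "M"
    # Assumption: all birth years fall in the 20th century.
    return dt.date(1900 + yy, mm, dd), gender

print(split_birth_number(706213))    # (datetime.date(1970, 12, 13), 'F')
print(split_birth_number("450204"))  # (datetime.date(1945, 2, 4), 'M')
```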
Dataset details
- Associated task: Classification
- Domain: Financial
- Data types:
- Size: 78.8 MB
- Count of tables: 8
- Count of rows: 1,090,086
- Count of columns: 55
- Missing values: Yes
- Compound keys: No
- Loops: Yes
- Type: Real
- Instance count: 682
- Target table: loan
- Target column: status
- Target ID: account_id
- Target timestamp: date
References
Algorithms
How to download the dataset
The datasets are publicly available directly from a MariaDB database server.
- Open your favourite MariaDB client (MySQL Workbench works, but see the FAQ).
- Use the following credentials:
- hostname: db.relational-data.org
- port: 3306
- username: guest
- password: relational
- Export the "financial" database (or another version of the dataset, if available) in your favourite format (e.g. CSV or an SQL dump).
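The CSV export can also be scripted. The sketch below is written against the generic Python DB-API 2.0 interface and connects with the third-party PyMySQL package using the credentials listed above. The `TABLES` list, the function names and the output layout are assumptions for illustration; check the actual table names with `SHOW TABLES` before relying on them.

```python
import csv
import os

# Assumed table names of the "financial" database (8 tables, per the dataset details).
TABLES = ["account", "card", "client", "disp", "district", "loan", "order", "trans"]

def export_table(conn, table: str, path: str) -> None:
    """Write one table to CSV with a header row; works with any DB-API 2.0 connection."""
    cur = conn.cursor()
    # Backticks quote the identifier, which is required for the reserved word `order`.
    cur.execute(f"SELECT * FROM `{table}`")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # column names
        for row in cur:                                        # stream rows to disk
            writer.writerow(row)
    cur.close()

def export_all(conn, out_dir: str, tables=TABLES) -> None:
    """Export each table to <out_dir>/<table>.csv."""
    os.makedirs(out_dir, exist_ok=True)
    for table in tables:
        export_table(conn, table, os.path.join(out_dir, f"{table}.csv"))

def connect_financial():
    """Connect to the public server (requires `pip install pymysql` and network access)."""
    import pymysql
    import pymysql.cursors
    return pymysql.connect(
        host="db.relational-data.org", port=3306,
        user="guest", password="relational", database="financial",
        cursorclass=pymysql.cursors.SSCursor,  # unbuffered: rows are fetched as iterated
    )

# Usage (needs network access):
#   conn = connect_financial()
#   try:
#       export_all(conn, "financial_csv")
#   finally:
#       conn.close()
```

The unbuffered `SSCursor` avoids holding the large trans table in memory at once; the default buffered cursor would also work, at the cost of more RAM.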