WebKP
The WebKB dataset consists of 877 scientific publications classified into one of five classes. The citation network consists of 1608 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1703 unique words.
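The 0/1-valued word vector described above can be illustrated with a minimal sketch. The five-word dictionary and the helper name `presence_vector` are hypothetical and chosen only for illustration; the real dictionary has 1703 words, and how the vectors are laid out in the database tables is not shown here.

```python
def presence_vector(words_in_document, dictionary):
    """Return a 0/1 list with 1 where the dictionary word occurs in the document."""
    present = set(words_in_document)
    return [1 if word in present else 0 for word in dictionary]


# Hypothetical dictionary, for illustration only.
dictionary = ["course", "faculty", "project", "student", "research"]

# Repeated words do not change the result: only presence is recorded.
vector = presence_vector(["student", "course", "student"], dictionary)
print(vector)  # [1, 0, 0, 1, 0]
```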
Original source: linqs.cs.umd.edu
Versions
WebKP (by Arnaud Barragao)
Dataset details
- Associated task: Classification
- Domain: Education
- Data types:
- Size: 12.8 MB
- Count of tables: 3
- Count of rows: 80,592
- Count of columns: 6
- Missing values: No
- Compound keys: No
- Loops: Yes
- Type: Real
- Instance count: 877
- Target table: webpage
- Target column: class_label
- Target ID: webpage_id
- Target timestamp: ?
How to download the dataset
The datasets are publicly available directly from a MariaDB database.
- Open your favourite MariaDB client (MySQL Workbench works, but see FAQ)
- Use the following credentials:
- hostname: db.relational-data.org
- port: 3306
- username: guest
- password: relational
- Export the "WebKP" database (or another version of the dataset, if available) in your favourite format (e.g. CSV or SQL dump).
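As an alternative to a graphical client, the steps above could be scripted. The following is a rough sketch, not an official export tool: it assumes the third-party PyMySQL package is installed (`pip install pymysql`) and that the server accepts the guest credentials listed above. The helper names `write_csv` and `export_table` are hypothetical; the table name `webpage` is the target table listed in the dataset details.

```python
import csv

# Connection settings copied from the credentials listed above.
CONNECTION = {
    "host": "db.relational-data.org",
    "port": 3306,
    "user": "guest",
    "password": "relational",
    "database": "WebKP",
}


def write_csv(path, header, rows):
    """Write a header line followed by the data rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def export_table(table, path):
    """Dump one table to CSV. Only pass trusted table names here."""
    import pymysql  # assumption: PyMySQL is installed; imported lazily

    connection = pymysql.connect(**CONNECTION)
    try:
        with connection.cursor() as cursor:
            cursor.execute(f"SELECT * FROM `{table}`")
            header = [column[0] for column in cursor.description]
            write_csv(path, header, cursor.fetchall())
    finally:
        connection.close()


# Example call (requires network access to the public server):
# export_table("webpage", "webpage.csv")
```

Calling `export_table` once per table yields one CSV file per table; the names of the other tables would need to be looked up first, for example with `SHOW TABLES`.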