WebKP

The WebKB dataset consists of 877 scientific publications classified into one of five classes. The citation network consists of 1608 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1703 unique words.
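To illustrate the 0/1-valued word vector described above, here is a minimal sketch. The function name `to_binary_vector` and the four-word toy dictionary are hypothetical and chosen only for illustration; the real dictionary has 1703 words, and the database may store the same information in a different layout.

```python
def to_binary_vector(words_present, dictionary):
    """Return a 0/1 list with one entry per dictionary word (1 = word present)."""
    present = set(words_present)
    return [1 if word in present else 0 for word in dictionary]


# Toy dictionary (an assumption for illustration; the real one has 1703 words).
dictionary = ["course", "faculty", "project", "student"]

vector = to_binary_vector({"student", "course"}, dictionary)
print(vector)  # [1, 0, 0, 1]
```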

Original source: linqs.cs.umd.edu

Versions

  • WebKP (by Arnaud Barragao)

Dataset details

Associated task:
Classification
Domain:
Education
Data types:
Size:
12.8 MB
Count of tables:
3
Count of rows:
80,592
Count of columns:
6
Missing values:
No
Compound keys:
No
Loops:
Yes
Type:
Real
Instance count:
877
Target table:
webpage
Target column:
class_label
Target ID:
webpage_id
Target timestamp:
?

How to download the dataset

The dataset is publicly available directly from a MariaDB database.

  1. Open your favourite MariaDB client (MySQL Workbench works, but see FAQ)
  2. Use the following credentials:
    • hostname: db.relational-data.org
    • port: 3306
    • username: guest
    • password: relational
  3. Export the "WebKP" database (or another version of the dataset, if available) in your favourite format (e.g. CSV or SQL dump).
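
As an alternative to a graphical client, the steps above can be scripted. The following is a minimal sketch, not an official export tool: it assumes the third-party `pymysql` package is installed, and the names `CONN`, `export_tables`, and `download` are hypothetical helpers introduced here. Only the hostname, port, username, password, and database name come from the instructions above.

```python
import csv
from pathlib import Path

# Credentials exactly as listed in the steps above.
CONN = {
    "host": "db.relational-data.org",
    "port": 3306,
    "user": "guest",
    "password": "relational",
    "database": "WebKP",
}


def export_tables(conn, out_dir):
    """Write every table visible through a DB-API connection to <out_dir>/<table>.csv."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cur = conn.cursor()
    cur.execute("SHOW TABLES")
    tables = [row[0] for row in cur.fetchall()]
    for table in tables:
        # Table names come from the server itself, so backtick quoting is sufficient here.
        cur.execute(f"SELECT * FROM `{table}`")
        with open(out / f"{table}.csv", "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow([col[0] for col in cur.description])  # header row
            writer.writerows(cur.fetchall())
    return tables


def download(out_dir="webkp_csv"):
    """Connect with the guest credentials and export all tables (needs network access)."""
    import pymysql  # assumption: installed separately, e.g. `pip install pymysql`

    conn = pymysql.connect(**CONN)
    try:
        return export_tables(conn, out_dir)
    finally:
        conn.close()
```

Calling `download()` should leave one CSV file per table in the `webkp_csv` directory. The export logic is kept separate from the connection so that any DB-API compatible MariaDB/MySQL driver can be swapped in.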