PubMed_Diabetes
The PubMed Diabetes dataset consists of 19,717 scientific publications on diabetes from the PubMed database, each classified into one of three classes. The citation network consists of 44,338 links. Each publication is described by a TF-IDF weighted word vector over a dictionary of 500 unique words.
Original source: linqs.soe.ucsc.edu
Versions
PubMed_Diabetes (by Jan Motl)
Dataset details
- Associated task: Classification
- Domain: Education
- Data types:
- Size: 44.1 MB
- Count of tables: 3
- Count of rows: 1,051,972
- Count of columns: 7
- Missing values: No
- Compound keys: No
- Loops: Yes
- Type: Real
- Instance count: 20,055
- Target table: paper
- Target column: class_label
- Target ID: paper_id
- Target timestamp: ?
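The target table and column listed above are enough to check the class balance with one aggregate query. The following is a minimal sketch, assuming `conn` is an already-open Python DB-API connection to the database; the function name `class_distribution` is illustrative and not part of the dataset.

```python
def class_distribution(conn):
    """Return {class_label: number of papers} from the target table.

    Assumes `conn` is an open DB-API connection (hypothetical setup) to a
    database containing the `paper` table with a `class_label` column.
    """
    cur = conn.cursor()
    try:
        # Count the papers in each class of the target column.
        cur.execute(
            "SELECT class_label, COUNT(*) FROM paper "
            "GROUP BY class_label ORDER BY class_label"
        )
        return {label: count for label, count in cur.fetchall()}
    finally:
        cur.close()
```

A strongly uneven result would suggest using stratified splits or a metric other than plain accuracy.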
How to download the dataset
The dataset is publicly available directly from a MariaDB database.
- Open your favourite MariaDB client (MySQL Workbench works, but see the FAQ)
- Use the following credentials:
- hostname: db.relational-data.org
- port: 3306
- username: guest
- password: relational
- Export the "PubMed_Diabetes" database (or another version of the dataset, if available) in your preferred format (e.g. CSV or an SQL dump).
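The steps above can also be scripted instead of using a graphical client. The following is a rough sketch, not an official export tool: it assumes the third-party PyMySQL driver is installed (`pip install pymysql`), and the function names `export_table` and `export_database` and the output directory name are illustrative. Table names are discovered with `SHOW TABLES` rather than hard-coded.

```python
import csv
from pathlib import Path


def export_table(conn, table, out_dir):
    """Write all rows of `table` to <out_dir>/<table>.csv; return the row count."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cur = conn.cursor()
    try:
        # Backticks quote the identifier; names come from the server itself.
        cur.execute(f"SELECT * FROM `{table}`")
        header = [col[0] for col in cur.description]
        n_rows = 0
        with open(out_dir / f"{table}.csv", "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            for row in cur:
                writer.writerow(row)
                n_rows += 1
    finally:
        cur.close()
    return n_rows


def export_database(out_dir="PubMed_Diabetes_csv"):
    """Connect with the guest credentials listed above and dump every table."""
    import pymysql  # assumption: PyMySQL is installed

    conn = pymysql.connect(
        host="db.relational-data.org",
        port=3306,
        user="guest",
        password="relational",
        database="PubMed_Diabetes",
    )
    try:
        cur = conn.cursor()
        cur.execute("SHOW TABLES")
        tables = [row[0] for row in cur.fetchall()]
        cur.close()
        # Map each table name to the number of rows written.
        return {t: export_table(conn, t, out_dir) for t in tables}
    finally:
        conn.close()
```

Calling `export_database()` should write one CSV file per table into the output directory. `export_table` only relies on the generic DB-API cursor interface, so it should work with other MySQL/MariaDB drivers as well.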