CiteSeer
The CiteSeer dataset consists of 3312 scientific publications classified into one of six classes. The citation network consists of 4732 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding …
CORA
The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding wo…
DCG
The set of positive examples consists of all sentences of up to seven words that can be generated by the DCG in Bratko's book (565 positive examples). The set of negative examples was generated by randomly selecting one word in each positive example and replacing it by …
Grants
This dataset includes funding grants from the National Science Foundation. The task is to predict the award amount.
PubMed_Diabetes
The PubMed Diabetes dataset consists of 19717 scientific publications from the PubMed database pertaining to diabetes, classified into one of three classes. The citation network consists of 44338 links. Each publication in the dataset is described by a TF/IDF weighted word …
Stats
An anonymized dump of all user-contributed content on the Stats Stack Exchange network.
StudentLoan
Student Loan contains data about students' enrollment and employment status; the aim is to find rules that define a student's obligation to repay their loan.
Trains
The East-West challenge (1980) database describes east-bound and west-bound trains.
University
An artificial database from Simon Fraser University describing students, professors and courses.
UW-CSE
This dataset lists facts about the Department of Computer Science and Engineering at the University of Washington (UW-CSE), such as entities (e.g., Student, Professor) and their relationships (e.g., AdvisedBy, Publication).
VisualGenome
Visual Genome is a dataset, a knowledge base, an ongoing effort to connect structured image concepts to language.
WebKP
The WebKB dataset consists of 877 scientific publications classified into one of five classes. The citation network consists of 1608 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding wor…