Hierarchical Truth Discovery Data Sets
We build two datasets to evaluate the performance of truth discovery algorithms that use hierarchies. Each dataset contains claims from web pages, a hierarchy of the claimed values, and ground truths.
More details can be found in the following paper.
Source (Citation)
- Woohwan Jung, Younghoon Kim and Kyuseok Shim. Crowdsourced Truth Discovery in the Presence of Hierarchies for Knowledge Fusion, EDBT 19 [Paper] [Code]
Heritages
This dataset contains the locations of World Heritage Sites, places of special cultural or physical significance. The list of sites is provided by the UNESCO World Heritage Centre, available at http://whc.unesco.org.
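Statistics
| Statistic | Count |
|---|---|
| Number of objects | 785 |
| Number of sources | 1,577 |
| Number of claims | 4,424 |
| Number of nodes in the hierarchy | 1,027 |
a. Claims [Download]
| obj | src | value |
|---|---|---|
b. Hierarchy
| ID | name | parentID |
|---|---|---|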
c. Ground truths
| obj | val |
|---|---|
You can download the official World Heritage List from the UNESCO World Heritage Centre website:
Official World Heritage List
d. Crowdsourced answers [Download]
This file contains answers collected from 20 workers on Amazon Mechanical Turk, a commercial crowdsourcing platform.
| obj | workerId | value |
|---|---|---|
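The following is a minimal sketch of how the claims (obj, src, value) and the hierarchy (ID, name, parentID) might be combined. It rests on several assumptions that this page does not confirm: that each claimed value is a hierarchy node ID, that the root has an empty parentID, and that the toy rows below resemble the real files. The support count at the end is only an illustration of why the hierarchy matters; it is not the algorithm from the paper.

```python
from collections import defaultdict

# Hypothetical toy rows; the real files and their delimiter are not described here.
hierarchy_rows = [
    # (ID, name, parentID); parentID None marks the root (an assumption)
    (1, "World", None),
    (2, "Italy", 1),
    (3, "Rome", 2),
    (4, "France", 1),
]
claim_rows = [
    # (obj, src, value); value is assumed to be a hierarchy node ID
    ("site_a", "src1", 3),
    ("site_a", "src2", 2),
    ("site_a", "src3", 4),
]

parent = {node_id: parent_id for node_id, _name, parent_id in hierarchy_rows}
name = {node_id: node_name for node_id, node_name, _parent in hierarchy_rows}


def ancestors(node_id):
    """Return the ancestor IDs of node_id, from its parent up to the root."""
    chain = []
    current = parent.get(node_id)
    while current is not None:
        chain.append(current)
        current = parent.get(current)
    return chain


def is_generalization(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    return general == specific or general in ancestors(specific)


def support(candidate, claimed_values):
    """Count the claims that equal the candidate or generalize it."""
    return sum(is_generalization(value, candidate) for value in claimed_values)


# Group the claimed values by object.
values_by_obj = defaultdict(list)
for obj, _src, value in claim_rows:
    values_by_obj[obj].append(value)

# Pick, per object, the claimed value with the highest support.
best = {
    obj: max(set(values), key=lambda v: support(v, values))
    for obj, values in values_by_obj.items()
}
print({obj: name[node_id] for obj, node_id in best.items()})
```

In this toy example the claim "Italy" is treated as consistent with "Rome" because Italy is an ancestor of Rome, while "France" is not. A flat comparison of the three values would see three mutually conflicting claims.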
BirthPlaces
This dataset contains birthplaces of 6,005 celebrities.
Statistics
| Statistic | Count |
|---|---|
| Number of objects | 6,005 |
| Number of sources | 7 |
| Number of claims | 13,510 |
| Number of nodes in the hierarchy | 4,999 |
a. Claims [Download]
| obj | src | value |
|---|---|---|
b. Hierarchy [Download]
| ID | name | parentID |
|---|---|---|
c. Ground truths
We used IMDb data as the ground truths in the paper.
Unfortunately, IMDb no longer provides the birthplaces of directors, actresses, and actors (as of January 16, 2019).
In addition, we cannot redistribute the older version of the data because IMDb's policy prohibits redistribution.
We therefore provide alternative links to a dataset and an API that contain people's birthplaces.
We think the following resources can be used to evaluate the performance of truth discovery algorithms.
UDBMS Group - Film dataset
THE MOVIE DB API
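Once ground truths are obtained from one of these sources, an evaluation could be sketched as below. This is only one plausible setup, not necessarily the measure used in the paper. It assumes that predictions and truths have already been mapped to hierarchy node IDs, and it reports both exact-match accuracy and a looser score that also accepts an ancestor of the true value. All function names and data here are hypothetical.

```python
# Hypothetical toy data: node IDs and object names are made up for illustration.
parent = {1: None, 2: 1, 3: 2, 4: 1}  # node ID -> parentID (None for the root)


def path_to_root(node_id, parent):
    """Return [node_id, its parent, ..., root]."""
    path = []
    current = node_id
    while current is not None:
        path.append(current)
        current = parent.get(current)
    return path


def exact_accuracy(predicted, truth):
    """Fraction of scored objects whose prediction equals the truth."""
    scored = [obj for obj in truth if obj in predicted]
    if not scored:
        return 0.0
    hits = sum(predicted[obj] == truth[obj] for obj in scored)
    return hits / len(scored)


def generalized_accuracy(predicted, truth, parent):
    """Fraction of scored objects whose prediction is the truth or an ancestor of it."""
    scored = [obj for obj in truth if obj in predicted]
    if not scored:
        return 0.0
    hits = sum(
        predicted[obj] in path_to_root(truth[obj], parent) for obj in scored
    )
    return hits / len(scored)


truth = {"p1": 3, "p2": 4, "p3": 3}
predicted = {"p1": 3, "p2": 2, "p3": 2}

print(exact_accuracy(predicted, truth))
print(generalized_accuracy(predicted, truth, parent))
```

Only objects that appear in both mappings are scored, so people missing from the alternative ground-truth source are skipped rather than counted as errors.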