Wednesday Seminar No. 1213, 13 June 2012

Linked data: Predicting missing properties

The volume of available structured data is increasing, particularly in the form of Linked Data, where relationships between individual pieces of data are encoded in a graph-like structure. This structure is a multigraph in which concepts are the nodes, and properties and relationships are the edges. Despite the growing scale of the data, the use and applicability of these resources are currently limited by mistakes and omissions in the data.

In this talk, we look at the missing property recommendation problem: given a specific query node in our multigraph dataset, can we correctly rank the properties it may be missing by likelihood? We propose a general method that leverages the properties of similar nodes in the dataset. To estimate how likely each property is to be missing, we use weighted averages of property counts over those similar nodes. To determine which nodes are similar, we use a variety of local, global and external measures of similarity. Finally, we present a comprehensive evaluation of our approach for a number of different averaging schemes and similarity measures on three very large-scale datasets, two based on DBpedia and one based on Freebase.
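
To make the idea concrete, here is a minimal sketch of similarity-weighted property recommendation. It is an illustration under stated assumptions, not the method presented in the talk: the function names, the toy data, and the choice of Jaccard overlap as a simple local similarity measure are all hypothetical, and properties are reduced to presence/absence rather than counts.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard overlap of two property sets (assumed local similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_missing_properties(query, node_properties, similarity=jaccard):
    """Rank properties absent from `query` by a similarity-weighted average.

    node_properties maps each node name to the set of properties it has.
    Returns a list of (property, score) pairs, highest score first.
    """
    query_props = node_properties[query]
    scores = defaultdict(float)
    total_weight = 0.0

    for node, props in node_properties.items():
        if node == query:
            continue
        weight = similarity(query_props, props)
        if weight <= 0:
            continue  # dissimilar nodes contribute nothing
        total_weight += weight
        # Each property the neighbour has and the query lacks gets a vote
        # proportional to how similar the neighbour is to the query.
        for prop in props - query_props:
            scores[prop] += weight

    if total_weight == 0:
        return []

    ranked = [(prop, score / total_weight) for prop, score in scores.items()]
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Hypothetical toy data: nodes and the properties attached to them.
toy = {
    "Ljubljana": {"country", "population", "mayor"},
    "Maribor": {"country", "population", "mayor", "area"},
    "Graz": {"country", "population", "area", "elevation"},
    "Einstein": {"birthDate", "field"},
    "Koper": {"country", "population"},
}

ranking = rank_missing_properties("Koper", toy)
for prop, score in ranking:
    print(f"{prop}: {score:.2f}")
```

In this toy example the query node shares properties only with the other cities, so the unrelated node is ignored and the candidates come out in the order mayor, area, elevation. Swapping in a different `similarity` function is where global or external similarity measures would plug in.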
