Find a copy online
Links to this item
theses.fr: Access to full text
Details
Genre/Form: | Theses and academic writings |
---|---|
Material Type: | Document, Thesis/dissertation, Internet resource |
Document Type: | Internet Resource, Computer File |
All Authors / Contributors: | Justine Reynaud; Amedeo Napoli; Yannick Toussaint; Claire Gardent; Catherine Faron-Zucker, (19..-.... ; researcher).; Fatiha Saïs; Luis Galarraga Del Prado; Université de Lorraine.; École doctorale IAEM Lorraine - Informatique, Automatique, Électronique - Électrotechnique, Mathématiques de Lorraine.; Laboratoire lorrain de recherche en informatique et ses applications. |
OCLC Number: | 1159114758 |
Notes: | Title from title screen. |
Description: | 1 online resource |
Responsibility: | Justine Reynaud; under the supervision of Amedeo Napoli and Yannick Toussaint. |
Abstract:
This thesis concerns the web of data and the knowledge units that can be discovered within it. The web of data can be viewed as a very large graph of interconnected RDF triple databases. An RDF triple, written (subject, predicate, object), represents a relation (the predicate) between two resources (the subject and the object). Resources can belong to one or more classes, where a class groups resources that share common characteristics. These RDF triple databases can thus be seen as interconnected knowledge bases. Most of these knowledge bases are built collaboratively by human users. This is notably the case of DBpedia, a central knowledge base within the web of data, which encodes Wikipedia content in RDF. DBpedia is built from two types of Wikipedia data: (semi-)structured data such as infoboxes, and categories, which are manually created thematic clusters of pages. However, the semantics of categories in DBpedia, that is, the reason a human agent grouped the resources together, is rarely made explicit. Given a class, a software agent has access to the resources that are grouped together, i.e. the class extension, but generally not to the "reasons" underlying the grouping, i.e. the class intension. Treating a category as a class of resources, we aim to discover an intensional description of the category. More precisely, given a class extension, we search for the corresponding intension. The resulting pair (extension, intension) provides the final definition of the class and enables classification-based reasoning by software agents. Such a definition can be expressed in terms of necessary and sufficient conditions: if x belongs to the class C, then x has the property P (necessary condition), and if x has the property P, then x belongs to the class C (sufficient condition).
Two complementary data mining methods make the discovery of definitions concrete: association rule mining and redescription mining. In this thesis, we first review the state of the art on association rules and redescriptions. We then adapt each method to the task of definition discovery. Next, we describe a set of experiments on DBpedia and compare the two approaches qualitatively and quantitatively. Finally, we discuss how the discovered definitions can be added to DBpedia to improve its consistency and completeness.
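The necessary and sufficient conditions described in the abstract can be illustrated with a minimal Python sketch. It is not the method of the thesis: the triples, the resource names, the predicates, and the helper functions `extension`, `confidence`, and `is_definition` are all hypothetical and invented for illustration. The sketch treats a property P as a definition of a class C when both rules C → P and P → C hold with confidence 1 on the toy data.

```python
# Toy triples, invented for illustration (not actual DBpedia data).
TRIPLES = [
    ("anna", "category", "French_painters"),
    ("anna", "nationality", "France"),
    ("anna", "occupation", "Painter"),
    ("boris", "category", "French_painters"),
    ("boris", "nationality", "France"),
    ("boris", "occupation", "Painter"),
    ("chloe", "nationality", "France"),
    ("chloe", "occupation", "Sculptor"),
    ("david", "nationality", "United_Kingdom"),
    ("david", "occupation", "Painter"),
]


def extension(triples, predicate, obj):
    """Return the set of subjects s such that (s, predicate, obj) is a triple."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def confidence(premise, conclusion):
    """Confidence of the rule premise -> conclusion over sets of resources."""
    if not premise:
        return 0.0
    return len(premise & conclusion) / len(premise)


def is_definition(class_ext, property_ext):
    """True when the property is both necessary and sufficient for the class."""
    necessary = confidence(class_ext, property_ext) == 1.0   # C -> P
    sufficient = confidence(property_ext, class_ext) == 1.0  # P -> C
    return necessary and sufficient


# Class extension: the resources a human placed in the category.
category = extension(TRIPLES, "category", "French_painters")

# Candidate intensions built from property values.
french = extension(TRIPLES, "nationality", "France")
painters = extension(TRIPLES, "occupation", "Painter")
french_painters = french & painters

# "French" alone is necessary but not sufficient (chloe is a sculptor).
french_alone_defines = is_definition(category, french)
# "French and painter" is necessary and sufficient on this toy data.
conjunction_defines = is_definition(category, french_painters)
```

On real data, rules rarely hold with confidence exactly 1, so a practical variant would presumably compare the confidence against a threshold rather than test for equality; that threshold is likewise an assumption of this sketch.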
Similar Items
Related Subjects: (11)
- Semantic Web.
- Knowledge representation.
- Formal concept analysis.
- Data mining.
- Automatic classification.
- Knowledge discovery
- Analysis of formal concepts
- Redescription mining
- Rule mining
- Definition construction
- Classification in the web of data