|Material Type:||Thesis/dissertation, Manuscript, Internet resource|
|Document Type:||Book, Archival Material, Internet Resource|
|All Authors / Contributors:||
|Notes:||Adviser: Jeff Heflin.|
We propose a framework for document-centric query answering for the Semantic Web. The idea is novel in that it extends traditional queries on logical knowledge bases by integrating the notion of documents. We consider two broad types of queries: document entailment queries, which ask what assertions are entailed by a specific subset of the documents in the knowledge base, and document provenance queries, which ask for the minimal consistent subsets of documents that entail specific assertions.
We develop algorithms to support the above queries in a knowledge base system. We adopt a preprocessing strategy that reasons with documents and caches selected results. This allows us to reuse expensive OWL reasoning at query time and reduce query answering to a simple semantic network-like inference procedure. In addition, we use an assumption-based truth maintenance system to represent the contexts of an assertion, i.e., the minimal consistent subsets of documents that entail the assertion, and to record information about inconsistent document subsets.
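The idea of an assertion's contexts can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`minimize`, `remove_nogoods`, `combine`, `entailed_by`) and the document identifiers are hypothetical, and the representation (sets of document IDs, with known-inconsistent subsets kept as "nogoods") is an assumption based on how assumption-based truth maintenance systems are commonly described.

```python
from itertools import product


def minimize(envs):
    """Keep only the environments that have no proper subset in the collection."""
    unique = set(envs)
    return {e for e in unique if not any(other < e for other in unique)}


def remove_nogoods(envs, nogoods):
    """Drop every environment that contains a known-inconsistent document subset."""
    return {e for e in envs if not any(n <= e for n in nogoods)}


def combine(label_a, label_b, nogoods):
    """Contexts of a conclusion that needs both antecedents A and B.

    Each label is a set of frozensets of document IDs (hypothetical encoding).
    """
    merged = {a | b for a, b in product(label_a, label_b)}
    return minimize(remove_nogoods(merged, nogoods))


def entailed_by(label, documents):
    """Document entailment check: does some context fit inside the given documents?

    Assumes the given document subset is itself consistent.
    """
    docs = frozenset(documents)
    return any(env <= docs for env in label)
```

With a label of `{d1}` or `{d2, d3}` for one antecedent, `{d2}` for the other, and `{d1, d2}` recorded as inconsistent, `combine` leaves `{d2, d3}` as the only context of the conclusion. A provenance query then reads the label directly, and an entailment query reduces to the subset test in `entailed_by`.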
Another issue we address is scalability, which is crucial for the Semantic Web to succeed. Reasoning with OWL is highly expensive, and the scale of the Semantic Web compounds the challenge. We explore ways to improve the scalability of Semantic Web knowledge base systems that must handle large volumes of data. First, we examine the logical relationships, i.e., logical dependence and logical independence, between OWL knowledge bases. For logical independence, we introduce a set of theorems that state conditions under which it is guaranteed, and we present an algorithm that detects it in the general case. For logical dependence, we set up a theoretical framework for identifying it in terms of how collections of knowledge bases may together lead to new inferences.
Second, we describe a scalable and practical approach for partitioning large OWL ABoxes so that specific kinds of reasoning can be performed separately on each partition and the results can then be combined to answer conjunctive queries. The main features of our approach include: a reasonable tradeoff between the complexity of determining partitions and the granularity of partitioning; worst-case polynomial time complexity of partitioning; and the ability to handle problems that are too large for main memory. We also report promising experimental results on both the LUBM (Lehigh University Benchmark) data and the FOAF (Friend of a Friend) data collected from the Web.
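As a rough illustration of what partitioning an ABox means, the Python sketch below groups assertions by the connected components of the graph formed by role assertions. This is a deliberately coarse stand-in and not the approach described in the abstract, which trades partition granularity against the cost of determining partitions; the tuple encoding of assertions and the function name `partition_abox` are assumptions made for the example.

```python
from collections import defaultdict


def partition_abox(assertions):
    """Group ABox assertions so that linked individuals share a partition.

    Hypothetical encoding: ("Concept", individual) for concept assertions and
    ("role", subject, object) for role assertions.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Individuals mentioned in the same assertion must end up together.
    for assertion in assertions:
        individuals = assertion[1:]
        find(individuals[0])
        for other in individuals[1:]:
            union(individuals[0], other)

    # Collect assertions under the representative of their first individual.
    groups = defaultdict(list)
    for assertion in assertions:
        groups[find(assertion[1])].append(assertion)
    return list(groups.values())
```

For example, the assertions `("Student", "alice")`, `("advisor", "alice", "bob")`, and `("Professor", "bob")` fall into one partition, while an unrelated `("Student", "carol")` forms its own. Each partition could then be handed to a reasoner separately.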
To further improve the query answering system, we apply and extend the above partitioning approach to reduce the number of document sets that must be reasoned with during preprocessing. We also give algorithms for answering document-centric queries that extend conjunctive ABox queries. Finally, experiments on both the LUBM data and the FOAF data demonstrate satisfactory results, in particular good scalability of the system.