42. ALGEBRAIC FOUNDATIONS FOR OPTIMIZING SEMISTRUCTURED QUERIES WITH FULL TEXT PREDICATES AND USER DEFINED SCORING
Name: Nathan C Bales
Grad Year: 2012
We study necessary foundations for building a practical implementation of keyword search with full-text predicates and scoring in XQuery. Much recent work has been done integrating XML Query facilities with IR like keyword search to support digital library systems. In particular, XQuery full text is a new W3C recommendation for augmenting the XQuery language with Full-Text expressions for keyword search. Implantations of XQuery Full-Text must support answer scoring. Exiting systems are designed around a single or a narrow class of scoring algorithms. Choice of scoring algorithm is subjective and application specific, so there is value in building a system that can support arbitrary scoring algorithms. Furthermore, many of these existing systems forgo either query optimization or consistency of scores. Expressivity is sacrificed in these systems to make execution practical in the absence of optimization. We present a framework by which users can define their own scoring algorithm and show that it can be used to express many of the scoring algorithms already used for keyword search in XML. We give semantics to the framework such that scores are well defined with respect to a document and all logically equivalent queries. We describe a system that can correctly execute and score keyword queries using time exponential in the size of the query. We propose both traditional and novel optimization techniques for application in full-text queries.