• Information Retrieval & Transaction Processing (Wednesday 1:30-3:00PM)
    Room A102/104/106
    Chair: James Hoe, Carnegie Mellon University

    • Title: Efficient Execution of Multiple Query Workloads in Data Analysis Applications
    • Authors:
      Henrique Andrade (University of Maryland, College Park)
      Tahsin Kurc (The Ohio State University)
      Alan Sussman (University of Maryland, College Park)
      Joel Saltz (The Ohio State University)
    • Abstract:
      Applications that analyze, mine, and visualize large datasets form an important class of applications in many areas of science, engineering, and business. Queries commonly executed in data analysis applications often involve user-defined processing of data and application-specific data structures. If data analysis is employed in a collaborative environment, the data server should execute multiple such queries simultaneously to minimize the response time to clients. In this paper, we present the design of a runtime system for executing multiple query workloads on a shared-memory machine. We describe experimental results using an application for browsing digitized microscopy images.

    • Title: Dynamic Page Placement to Improve Locality in CC-NUMA Multiprocessors for TPC-C
    • Authors:
      Kenneth M. Wilson (Apple Computer Inc.)
      Bob B. Aglietti (Advanced Micro Devices)
    • Abstract:
      The use of CC-NUMA multiprocessors complicates the placement of physical memory pages. Memory closest to a processor provides the best access time, but optimal memory page placement is difficult in the presence of process movement, multiple processes requiring access to the same physical memory page, and application behavior that changes over execution time. We use dynamic page placement to move memory pages where needed for the database benchmark TPC-C executing on a four-node CC-NUMA multiprocessor. With dynamic page placement, up to 73% of memory accesses are local, compared with 34% for static first-touch placement and 25% for static round-robin placement. This can result in a 17% improvement in performance.
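
      The three placement policies named in this abstract can be compared on a toy access trace. The sketch below is only an illustration under stated assumptions: the trace format, the function names, and the counter-based migration rule with its `threshold` parameter are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def first_touch(trace):
    """Static policy: a page lives on the node that touches it first."""
    home = {}
    local = 0
    for node, page in trace:
        home.setdefault(page, node)
        local += home[page] == node
    return local / len(trace)


def round_robin(trace, num_nodes):
    """Static policy: pages are assigned to nodes in rotation, in order of first access."""
    home = {}
    local = 0
    for node, page in trace:
        if page not in home:
            home[page] = len(home) % num_nodes
        local += home[page] == node
    return local / len(trace)


def dynamic(trace, threshold=2):
    """Hypothetical dynamic policy: start from first touch, then migrate a page
    to a node once that node has made `threshold` more accesses to the page
    than the page's current home node has."""
    home = {}
    counts = defaultdict(Counter)  # page -> per-node access counts since last move
    local = 0
    for node, page in trace:
        home.setdefault(page, node)
        local += home[page] == node
        counts[page][node] += 1
        if counts[page][node] - counts[page][home[page]] >= threshold:
            home[page] = node      # migrate the page
            counts[page].clear()   # restart counting after the move
    return local / len(trace)


if __name__ == "__main__":
    # Assumed trace of (node, page) pairs: node 0 touches page 0 once,
    # then node 1 uses it repeatedly.
    trace = [(0, 0)] + [(1, 0)] * 9
    print("first touch:", first_touch(trace))
    print("round robin:", round_robin(trace, num_nodes=4))
    print("dynamic    :", dynamic(trace))
```

      On a trace like this one, a static policy leaves the page on the node that stopped using it, while the migrating policy recovers locality after a few remote accesses. Real systems also have to weigh the cost of each migration, which this sketch ignores.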

    • Title: Compressing Inverted Files in Scalable Information Systems by Binary Decision Diagram Encoding
    • Authors:
      Chung-Hung Lai (National Chung Cheng University)
      Tien-Fu Chen (National Chung Cheng University)
      Best Student Paper Finalist
    • Abstract:
      One of the key challenges of managing very large volumes of data in scalable information retrieval systems is providing fast access through keyword searches. The major data structure in an information retrieval system is the inverted file, which records the positions of each term in the documents. When the information set grows substantially, the numbers of terms and documents increase significantly, as does the size of the inverted files.

      Approaches that reduce the size of the inverted file without sacrificing query efficiency are important to the success of scalable information systems. In this paper, we propose a compression approach using Binary Decision Diagram (BDD) encoding, so that ordering correlations among a large number of documents are extracted to minimize the posting representation. Another advantage of using BDDs is that BDD expressions can efficiently perform Boolean queries, which are very common in retrieval systems. Experimental results show that the compression ratios of the inverted files are improved significantly by the BDD scheme.
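
      As a point of reference for the data structure this abstract describes, here is a minimal sketch of an uncompressed inverted file with Boolean queries. It uses plain Python dictionaries and sets, not the BDD encoding the paper proposes, and every function name and sample document in it is an assumption made for illustration.

```python
from collections import defaultdict


def build_inverted_file(docs):
    """Map each term to {doc_id: [positions of the term in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def postings(index, term):
    """Return the set of document ids that contain the term."""
    return set(index.get(term, {}))


def query_and(index, a, b):
    """Documents containing both terms."""
    return postings(index, a) & postings(index, b)


def query_or(index, a, b):
    """Documents containing at least one of the terms."""
    return postings(index, a) | postings(index, b)


def query_not(index, term, num_docs):
    """Documents that do not contain the term."""
    return set(range(num_docs)) - postings(index, term)


if __name__ == "__main__":
    # Hypothetical three-document collection.
    docs = [
        "parallel query processing",
        "query compression with decision diagrams",
        "parallel page placement",
    ]
    index = build_inverted_file(docs)
    print(query_and(index, "parallel", "query"))
    print(query_or(index, "parallel", "query"))
    print(query_not(index, "query", len(docs)))
```

      The posting sets here grow with both the vocabulary and the collection, which is the storage cost that compression schemes such as the one in this paper aim to reduce.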