• Computational Grid I/O (Tuesday 3:30-5:00PM)
    Room A102/104/106
    Chair: Rich Wolski, University of Tennessee/Knoxville

    • Title: LegionFS: A Secure and Scalable File System Supporting Cross-Domain High-Performance Applications
    • Authors:
      Brian S. White (University of Virginia)
      Michael Walker (University of Virginia)
      Marty Humphrey (University of Virginia)
      Andrew S. Grimshaw (University of Virginia)
      Best Student Paper Finalist
    • Abstract:
      Realizing that current file systems cannot cope with the diverse requirements of wide-area collaborations, researchers have developed data access facilities to meet their needs. Recent work has focused on comprehensive data access architectures. In order to fulfill the evolving requirements in this environment, we suggest a more fully-integrated architecture built upon the fundamental tenets of naming, security, scalability, extensibility, and adaptability. These form the underpinning of the Legion File System (LegionFS). This paper motivates the need for these requirements and presents benchmarks that highlight the scalability of LegionFS. The aggregate throughput of LegionFS follows the linear growth of the network, yielding an aggregate read bandwidth of 193.8 MB/s on a 100 Mbps Ethernet backplane with 50 simultaneous readers. The serverless architecture of LegionFS is shown to benefit important scientific applications, such as those accessing the Protein Data Bank, within both local- and wide-area environments.
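A quick back-of-the-envelope check makes the linear-scaling claim concrete (the rates and reader count are from the abstract; interpreting the Ethernet backplane as switched, with one 100 Mbps link per reader, is our assumption):

```python
# Sanity-check the LegionFS scaling figures from the abstract.
readers = 50
aggregate_mb_s = 193.8        # reported aggregate read bandwidth, MB/s
link_mb_s = 100 / 8           # one 100 Mbps Ethernet link, in MB/s (12.5)

per_reader = aggregate_mb_s / readers
print(f"{per_reader:.2f} MB/s per reader")  # ~3.88 MB/s

# Each reader uses well under a single link's capacity, which is
# consistent with aggregate throughput growing linearly as readers
# (and their links) are added, rather than saturating one server link.
assert per_reader < link_mb_s
```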

    • Title: High-Performance Remote Access to Climate Simulation Data: A Challenge Problem for Data Grid Technologies
    • Authors:
      Bill Allcock (Argonne National Laboratory)
      Ian Foster (Argonne National Laboratory)
      Veronika Nefedova (Argonne National Laboratory)
      Ann Chervenak (Information Sciences Institute, University of Southern California)
      Ewa Deelman (Information Sciences Institute, University of Southern California)
      Carl Kesselman (Information Sciences Institute, University of Southern California)
      Jason Lee (Lawrence Berkeley National Laboratory)
      Alex Sim (Lawrence Berkeley National Laboratory)
      Arie Shoshani (Lawrence Berkeley National Laboratory)
      Bob Drach (Lawrence Livermore National Laboratory)
      Dean Williams (Lawrence Livermore National Laboratory)
    • Abstract:
      In numerous scientific disciplines, terabyte- and soon petabyte-scale data collections are emerging as critical community resources. A new class of Data Grid infrastructure is required to support management, transport, distributed access to, and analysis of these datasets by potentially thousands of users. Researchers who face this challenge include the climate modeling community, which performs long-duration computations accompanied by frequent output of very large files that must be further analyzed. We describe the Earth System Grid prototype, which brings together advanced analysis, replica management, data transfer, request management, and other technologies to support high-performance, interactive analysis of replicated data. We present performance results that demonstrate our ability to manage the location and movement of large datasets from the user's desktop. We report on experiments conducted over SCinet at SC2000, where we achieved peak performance of 1.55 Gb/s and sustained performance of 512.9 Mb/s for data transfers between Texas and California.
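To put the reported rates against the "terabyte-scale" datasets the abstract mentions, a rough scale check (the rates are from the abstract; the 1 TB example size is our assumption):

```python
# How long would a 1 TB transfer take at the reported SC2000 rates?
sustained_mb_s = 512.9        # sustained rate, megabits per second
peak_gb_s = 1.55              # peak rate, gigabits per second

terabyte_bits = 8 * 10**12    # 1 TB expressed in bits

hours_sustained = terabyte_bits / (sustained_mb_s * 10**6) / 3600
hours_peak = terabyte_bits / (peak_gb_s * 10**9) / 3600
print(f"~{hours_sustained:.1f} h sustained, ~{hours_peak:.1f} h at peak")
# Roughly 4.3 h at the sustained rate, 1.4 h at peak -- which is why
# replica management matters: moving a whole collection repeatedly
# is far costlier than locating an existing nearby replica.
```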

    • Title: Gathering at the Well: Creating Communities for Grid I/O
    • Authors:
      Douglas Thain (University of Wisconsin-Madison)
      John Bent (University of Wisconsin-Madison)
      Andrea Arpaci-Dusseau (University of Wisconsin-Madison)
      Remzi Arpaci-Dusseau (University of Wisconsin-Madison)
      Miron Livny (University of Wisconsin-Madison)
    • Abstract:
      Grid applications have demanding I/O needs. Schedulers must bring jobs and data into close proximity in order to satisfy throughput, scalability, and policy requirements. Most systems accomplish this by making either jobs or data mobile. We propose a system that allows jobs and data to meet by binding execution and storage sites together into I/O communities, which then participate in the wide-area system. The relationships between participants in a community may be expressed by the ClassAd framework. Extensions to the framework allow community members to express indirect relations. We demonstrate our implementation of I/O communities by improving the performance of a key high-energy physics simulation on an international distributed system.
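The matchmaking idea behind ClassAds can be sketched in a few lines. This is an illustrative toy, not the paper's implementation or the real ClassAd language: ads are modeled as dictionaries, each side's Requirements must accept the other, and all attribute names (Community, HasStorage, ...) are hypothetical.

```python
# Toy matchmaker in the spirit of the ClassAd framework: a job ad and
# a storage-site ad match when each ad's Requirements accept the other.
job_ad = {
    "Type": "Job",
    "Community": "hep-sim",   # hypothetical I/O community name
    "Requirements": lambda other: (
        other.get("Community") == "hep-sim"
        and other.get("HasStorage", False)
    ),
}

storage_ad = {
    "Type": "Machine",
    "Community": "hep-sim",
    "HasStorage": True,
    "Requirements": lambda other: other.get("Type") == "Job",
}

def match(a, b):
    """Two ads match when each side's Requirements accept the other."""
    return a["Requirements"](b) and b["Requirements"](a)

print(match(job_ad, storage_ad))  # → True
```

Binding execution and storage sites into one community then amounts to jobs requiring (directly or, with the paper's extensions, indirectly through another ad) a storage site from the same community.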