TUESDAY, NOVEMBER 13

  • EMERGING LIFE SCIENCES I
    Chair: Barbara Horner-Miller, Arctic Region Supercomputing Center
    Time: 10:30-Noon
    Room A207/209

    Applications in Computational Biology and Computational Chemistry: Similarities and Differences

    Eamonn O'Toole, Compaq Computers

    Computation is rapidly assuming a central role in biology. Such recent feats as the completion of the draft human genome sequences by Celera Genomics and the International Human Genome Sequencing Consortium, and their annotation, would not have been possible without significant computational resources. Some of the largest computers in private and public hands are now devoted to biological problems. Chemists have made extensive use of computation for considerably longer than biologists, and computational chemists are responsible for some of the largest and most important scientific applications in current use.

    Chemists and biologists tackle many problems that are closely related. In addition, some large computer installations are shared between chemists and biologists. We will outline some of our experiences benchmarking and analyzing performance of a number of applications in computational chemistry and biology. We are particularly interested in similarities and differences between these applications, their behavior and the stresses that they place on systems. This work is part of an on-going program to better understand these applications.

    Computing Requirements for the Bioinformatics Revolution: Genomics, Proteomics and Cellular Machinery

    Bill Camp, Sandia National Laboratories

    Biology and medicine are entering a new age dominated by high-throughput experimentation and computation. How quickly the door to that future is opening is signalled by the staggering progress in developing a map of the human genome as well as those of several other species. Those efforts have required unprecedented increases in experimental output, made possible in part by the development of shotgun sequencing and the subsequent automation of DNA sequencing. The key to using these high-throughput methods has been the ability to reconstruct the entire sequence from a redundant set of small fragment sequences. This assembly is in turn enabled by the introduction of extreme computing into biology by Celera Genomics and its attendant use by both Celera and the Public Genome Project. This first stage of genomic biology utilized terascale computing for the first time in biology. It was dominated by informatics—the searching, comparison and manipulation of large alphanumeric datasets. Little or no floating-point computation was required, and the work was essentially embarrassingly parallel and dominated by unusually intense data I/O. On-going refinements of the genomics computing methodologies include reducing I/O requirements by replacing I/O with parallel computing techniques.
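
    For readers unfamiliar with the informatics flavor of this first stage, the toy sketch below reassembles a sequence from redundant, overlapping fragments by greedy suffix/prefix matching: pure string manipulation with no floating point. It is an illustrative simplification only, not the Celera or public-project assembler; the fragments and function names are invented.

        # Toy illustration of shotgun-style assembly: repeatedly merge the pair of
        # fragments with the longest suffix/prefix overlap. Pure string handling,
        # no floating-point work -- the "informatics" flavor described above.

        def overlap(a, b, min_len=3):
            """Length of the longest suffix of a that matches a prefix of b."""
            for n in range(min(len(a), len(b)), min_len - 1, -1):
                if a.endswith(b[:n]):
                    return n
            return 0

        def greedy_assemble(fragments):
            frags = list(fragments)
            while len(frags) > 1:
                best = (0, 0, 1)                       # (overlap length, i, j)
                for i in range(len(frags)):
                    for j in range(len(frags)):
                        if i != j:
                            n = overlap(frags[i], frags[j])
                            if n > best[0]:
                                best = (n, i, j)
                n, i, j = best
                if n == 0:                             # no overlaps left; concatenate
                    merged = frags[0] + frags[1]
                    frags = [merged] + frags[2:]
                else:
                    merged = frags[i] + frags[j][n:]
                    frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
            return frags[0]

        reads = ["ACGTTGCA", "TGCATTAC", "TTACGGA"]    # redundant, overlapping fragments
        print(greedy_assemble(reads))                  # -> ACGTTGCATTACGGA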

    In the next stage, early stage proteomics, the calculations will be even larger than those required for genomics. However, they still will have a strong informatics flavor. Since the calculations will be larger, there will be even more pressure on I/O systems which will in turn drive additional emphasis on true parallel computing methods.

    Later stages in proteomics—which include elucidation of protein structure and function and protein-protein interactions—will see orders-of-magnitude increases in computing requirements, with increased emphasis on floating-point operations. The long-term goals for computation in biology and medicine include gene-based species modification (e.g., for agriculture or environmental remediation) and intervention strategies for medicine and counter-terrorism. In all cases, being able to understand protein structure and function will be a critical step. Further refinement will involve understanding signalling and metabolic pathways, enzymatic catalysis, and protein target identification, as well as the design of interventions. These longer-term goals pose computing challenges which are floating-point intense and, by current estimates, will involve sustained, affordable petascale processing.

    Finally, possibly the most exotic challenge posed by biology is that of simulating the biological cell at a system level. This involves far more than adapting the simulation methods of physics, chemistry and engineering to biological systems, although that will be a critical component of the solution. The cell is so much more complex than anything we have attempted to simulate in the past, and so many of the underlying processes will remain under-characterized, that radically new simulation methodologies will be required: for example, our inability to characterize underlying details (e.g., a myriad of reaction rates) may require inherently non-deterministic simulation methods. Of course, a detailed atomistic simulation of even the simplest cell is not only computationally infeasible for the foreseeable future but would also be so complex as to challenge interpretability.
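
    As one concrete example of what an inherently non-deterministic simulation method can look like, the sketch below runs a minimal stochastic (Gillespie-style) simulation of a single reaction whose rate constant is only roughly known, so repeated runs yield a spread of outcomes rather than one trajectory. This is generic textbook background, not a method proposed in the talk; the reaction, rate and molecule counts are invented.

        # Minimal stochastic simulation (Gillespie algorithm) of one reaction, A -> B,
        # whose rate constant k is poorly characterized. Each run draws random event
        # times, so repeated runs give a distribution of outcomes rather than a single
        # deterministic trajectory. All numbers here are illustrative.
        import math
        import random

        def gillespie_decay(n_a, k, t_end):
            """Simulate A -> B with rate constant k until t_end; return remaining A."""
            t = 0.0
            while n_a > 0:
                propensity = k * n_a                  # total rate of the next event
                dt = -math.log(random.random()) / propensity
                if t + dt > t_end:
                    break
                t += dt
                n_a -= 1                              # one A molecule converts to B
            return n_a

        random.seed(0)
        k_guess = 0.05                                # rough, under-characterized rate
        runs = [gillespie_decay(n_a=100, k=k_guess, t_end=10.0) for _ in range(5)]
        print(runs)                                   # run-to-run scatter around ~60 remaining A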

    I will discuss architectural strategies for meeting the computing challenges of the revolution in biological sciences, including scaling needs for computation, I/O and networking.


  • EMERGING LIFE SCIENCES II
    Chair: Ty Rabe, Compaq Computer Corporation
    Time: 1:30-3:00 PM
    Room A207/209

    The Computational Impact of Genomics on Biotechnology R&D
    Scooter Morris, Genentech

    The excitement surrounding the completion of the first draft sequence of the Human Genome has created significant interest in genomics, the study of entire genomes, and has greatly influenced how the pharmaceutical industry discovers and characterizes new medicinal compounds. Two major areas of impact are the way in which we approach the search for new proteins of interest and the computational tools that we use to perform that search. Increasingly, the search for new proteins or lead compounds has moved from the laboratory to the computer. This migration has not been sudden, and is not a direct result of the completion of the Human Genome sequencing effort, but rather is a result of the increasing capacity and performance of modern computing systems, and the incredible increase in the amount of available sequence data. This talk will discuss both of these evolutions, and conclude with a survey of the current computing systems and architectures that are used in the biotech industry for primary sequence discovery and other research activities.
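
    As background on what moving the search to the computer can mean at its simplest, the sketch below ranks candidate protein sequences against a query by counting shared k-mers, a crude stand-in for the BLAST-style similarity searches used in primary sequence discovery. It is not Genentech's pipeline; the sequences, names and parameters are invented.

        # A deliberately simple stand-in for sequence-similarity search: rank database
        # sequences by how many length-k substrings (k-mers) they share with a query.
        # Real tools (e.g. BLAST) are far more sophisticated; this only illustrates the
        # general idea of searching for related proteins in the computer.

        def kmers(seq, k=3):
            return {seq[i:i + k] for i in range(len(seq) - k + 1)}

        def rank_by_shared_kmers(query, database, k=3):
            q = kmers(query, k)
            scored = [(len(q & kmers(s, k)), name) for name, s in database.items()]
            return sorted(scored, reverse=True)

        # Toy "database" of protein fragments (single-letter amino acid codes).
        db = {
            "candidate_1": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
            "candidate_2": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERM",
        }
        query = "MKTAYIAKQR"
        print(rank_by_shared_kmers(query, db))   # candidate_1 shares the most 3-mers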

    New Wine in Old Bottles: The Use of Vector Processors and Fine-Grained Parallelism in Genomic Analysis
    Stanley K. Burt, Advanced Biomedical Computing Center, National Cancer Institute
    [Authors: Jack Collins, Robert Stephens, and Stanley K. Burt]

    The common assumption is that biological data analysis problems are suitable for parallel computing, particularly cluster computing. However, certain problems involving very large amounts of data can be approached with other computing techniques. We demonstrate a new methodology in which techniques used in cryptography prove useful for pattern recognition in biological research, such as finding tandem repeats in DNA sequences. The new method takes advantage of special hardware capabilities of the Cray computer architecture (vector registers, large shared memory, and fine-grained parallelism) and also leverages additional speedup from sequence compression.

    The identification of simple tandem repeats within DNA sequences is an extremely powerful tool for exploring genomes. These specific repeat elements (or microsatellites) are frequently polymorphic and thus can be used for many purposes, ranging from diagnostic primers that increase the marker density surrounding regions of interest, to mapping new genes, to forensic science. We report here the development of a new, extremely rapid tandem repeat finder that exhaustively determines all possible repetitive elements up to 16 bases in length.
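
    For context on what such a search involves, the sketch below is a naive tandem-repeat finder for motifs up to 16 bases, together with the 2-bit base packing that underlies sequence compression and word-wide (vector-friendly) comparisons. It is a conceptual illustration only, not the authors' Cray-vectorized method; the example sequence and thresholds are invented.

        # Naive tandem-repeat scan: at each position, try motif lengths 1..16 and count
        # how many times the motif repeats back-to-back. Also shown: 2-bit packing of
        # bases, the kind of compression that lets word-wide (vector) operations compare
        # many bases at once. Conceptual only -- far slower than a vectorized method.

        CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

        def pack_2bit(seq):
            """Pack a DNA string into an integer, two bits per base."""
            word = 0
            for base in seq:
                word = (word << 2) | CODE[base]
            return word

        def tandem_repeats(seq, max_motif=16, min_copies=3):
            hits = []
            for i in range(len(seq)):
                for m in range(1, max_motif + 1):
                    motif = seq[i:i + m]
                    if len(motif) < m:
                        break
                    copies = 1
                    while seq[i + copies * m: i + (copies + 1) * m] == motif:
                        copies += 1
                    if copies >= min_copies:
                        hits.append((i, motif, copies))
            return hits

        seq = "GGACACACACTTTCAGCAGCAGCAGAA"
        print(tandem_repeats(seq))        # reports the AC and CAG microsatellites (plus overlapping phases)
        print(bin(pack_2bit("ACGT")))     # -> 0b11011 (A=00, C=01, G=10, T=11)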

    We describe and demonstrate the utility of the method in the identification of simple tandem repeats within the entire human genome. By focusing on known coding regions, we find many repeats, possibly linked to diseases, that have not previously been described. The data have been assembled into a web-accessible relational database that allows searching for elements by genomic region or gene. Beyond this particular application, the methodology will allow analyses that were previously beyond computational reach.


  • EMERGING LIFE SCIENCES III
    Chair: Ellen Roder, Cray, Inc.
    Time: 3:30-5:00 PM
    Room A207/209

    Computing Challenges for Structure-based Drug Design on a Genomic Scale
    Tod M. Klingler, Structural GenomiX

    At Structural GenomiX we are integrating experimental approaches for protein structure determination with computational modeling methods, including comparative modeling, ab initio prediction and molecular dynamics, to produce the most comprehensive and accurate view of protein structure space. Using this view of protein structure space as a starting point, large-scale structure-based drug design will be used to greatly improve the drug development process. Computational techniques for docking chemical structures to protein structures are the core of this platform. The required algorithms for protein modeling and chemical docking are compute-intensive and often require specific tuning. In this talk I will describe several of these computational approaches, their integration, and some of the automation and high-throughput computing challenges we are facing in developing this new platform for drug discovery.

    National Digital Mammography Archive
    Robert Hollebeek, University of Pennsylvania

    The National Digital Mammography Archive is funded by the National Library of Medicine to design and implement a secure digital archive for mammography and associated reports using Next Generation Internet technologies, including high-bandwidth optical networks, quality of service, scalable systems, and scalable applications. Images and reports will be rapidly available wherever needed for medical or educational purposes, thus improving screening, diagnosis and, ultimately, patient care. Researchers from the Universities of Pennsylvania, Chicago, North Carolina and Toronto team with advanced computing groups from the University of Pennsylvania (NSCP) and BWXY (Oak Ridge ACT) to develop integrated systems for high-speed networking, distributed archiving, and secure applications. The talk will demonstrate how images and patient data can be securely moved between hospitals and an archive, and how the applications, including computer-assisted diagnosis (CAD), data mining, and teacher training collections, could be used for clinical and research purposes.


WEDNESDAY, NOVEMBER 14

  • HPC COMPUTING INFRASTRUCTURE I
    Chair: Will Murray, CISCO
    Time: 10:30-Noon
    Room A207/209

    StarLight: Optical Switching for the Global Grid
    Tom DeFanti, University of Illinois

    STAR TAP, a persistent infrastructure to facilitate the long-term interconnection and interoperability of advanced international networking, has demonstrated the importance of providing advanced digital communication services to a worldwide scientific research community. However, there are clear indications that 21st-century grid-intensive "e-Science" applications will require a networked "cyber-infrastructure" and set of services that are more sophisticated, with much higher capacity potential and substantially higher performance. The University of Illinois at Chicago, in collaboration with Northwestern University and Argonne National Laboratory, and in partnership with CANARIE (Canada) and SURFnet (Holland), is now creating the Optical STAR TAP, or StarLight (www.startap.net/starlight).

    StarLight is an advanced optical infrastructure and proving ground for network services optimized for high-performance applications. The StarLight facility, operational in the summer of 2001, is located on Northwestern University's downtown campus at 710 N. Lake Shore Drive in Chicago. StarLight provides the applications-centric network research community with a Chicago-based co-location facility with enough space, power, air conditioning and fiber to engage in next-generation optical network and application research and development activities. StarLight's architecture is designed to be distributable among opportune carrier points of presence, university campuses, and carrier meet points. And, because optical networks allow for a far greater degree of network configuration flexibility than existing networks, StarLight will provide the required tools and techniques for (university and government laboratory) customer-controlled 10 Gigabit network flows to be switched and routed to research networks and commercial networks, empowering applications to dynamically adjust and optimize network resources. StarLight welcomes the academic and commercial communities to work with us to create a global proving ground in support of grid-intensive e-Science applications, network performance measurement and analysis, and computing and networking technology evaluations.

    Evolution of Supercomputing Networks- from Kilobits to Terabits
    Charlie Catlett, Argonne National Laboratory

    Just over 15 years ago the National Science Foundation created NSFNET, a 56 Kb/s backbone network that connected a half dozen supercomputer centers. Ten years ago, the US Gigabit Testbeds Initiative was unveiling prototype networks running at between 600 Mb/s and 1.2 Gb/s, along with some of the era's most interesting applications. Within five years, supercomputer centers were connected at 155 Mb/s with networks such as vBNS and ESnet. Today there are backbone networks running at 2.5 Gb/s, with many talking of upgrading to 10 Gb/s. Catlett will talk about two projects attempting to push beyond the 10 Gb/s barrier. First is NSF's [proposed] Distributed Terascale Facility (DTF) interconnect, which involves a partnership between Qwest Communications, Argonne National Laboratory, NCSA, Caltech, SDSC, and the Internet2 project. The [proposed] DTF interconnect will couple the four high performance computing centers at 40 Gb/s in early 2002. Second is the State of Illinois "Illinois Wired/Wireless Infrastructure for Research and Education," or I-WIRE. I-WIRE is an optical network that provides both dark fiber and lambda services between six institutions in Illinois (including Argonne and NCSA), providing optical connectivity for the StarLight project as well as connectivity to multiple carrier exchange points in Chicago.
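
    A rough, illustrative calculation (not taken from the talk) shows what this progression means in practice: the time needed to move a single terabyte at each generation's backbone rate, ignoring protocol overhead.

        # Illustrative only: time to move 1 TB end-to-end at the backbone rates named
        # above, assuming the full link is available and ignoring protocol overhead.
        TB_BITS = 8 * 10**12                      # 1 terabyte expressed in bits

        rates = {                                 # link speeds in bits per second
            "NSFNET backbone (56 Kb/s)": 56e3,
            "vBNS/ESnet era (155 Mb/s)": 155e6,
            "current backbones (2.5 Gb/s)": 2.5e9,
            "proposed DTF interconnect (40 Gb/s)": 40e9,
        }
        for name, bps in rates.items():
            seconds = TB_BITS / bps
            print(f"{name}: {seconds / 3600:,.1f} hours")
        # -> from roughly 40,000 hours at 56 Kb/s down to a few minutes at 40 Gb/s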


  • HPC COMPUTING INFRASTRUCTURE II
    Chair: David Culler, University of California, Berkeley
    Time: 1:30-3:00 PM
    Room A207/209

    Bringing I/O Scalability and Availability to Linux and AIX Clusters
    Lyle Gayne, IBM

    As parallel and large-scale computing has moved from specialized "Supercomputers" (with their traditionally exotic technology and commensurate price) to more flexible, cost-effective cluster environments, the scaling of practical compute capability has created demands for comparable scalability of I/O performance and capacity. The aggregation of large numbers of not-completely-reliable modular components (be they processors, network or disk) has simultaneously forced software failure survivability into the same domain. A reasonable degree of success in these two domains in turn uncovers further issues and forces them to the fore. This technical presentation will discuss IBM's efforts to meet this evolving set of Cluster I/O challenges in both Linux and AIX cluster environments, highlighting the technical issues, progress to date, and the challenges still ahead.

    Bringing Linux Clusters into the Enterprise
    Jamshed Mirza, IBM

    Linux is rapidly making inroads today in its traditional areas of strength: Appliances, Web Serving, and High Performance Computing. But Linux also has the potential to be a key technology for the next generation of e-business - a potential that will only be reached if real and perceived limitations, technical and otherwise, in Linux and Linux Clusters today are removed. To that end, IBM and others are working with the Linux community to make Linux and Linux clusters more enterprise-capable, and are working with customers and ISVs to encourage its wider use within the enterprise. This talk will discuss the potentially strategic importance of Linux, position its capability today relative to other mainstream Unix solutions for HPC, and investigate possible scenarios of its evolution over time that will determine its ability to attain its potential as a strategic e-business technology.


  • TIME MIGRATION IN THE OIL INDUSTRY
    Chair: Ray Paden, IBM
    Time: 3:30-5:00 PM

    Scalability Analysis of Distributed 3D Prestack Time Migration
    Kevin Hellman, Aliant Geophysical

    3D prestack time migration is a seismic imaging application which is well suited for parallel computation in distributed-memory clustered computer environments. The basic algorithm involves data aggregation at a volume of output locations, using (potentially) all of the input data at each of these output locations. Parallelization may be designed in either the output or the input domain. Since the majority of the processing time is spent in the summation kernel, time migration is often thought of as "embarrassingly" parallel, and not much importance is attached to the parallelization scheme. For seismic surveys of actual exploration size, however, the details of parallelization can have a dramatic impact on the scalability, and hence the runtime, of prestack time migration. Simple timing models for three common approaches to parallelization will be introduced which characterize the total throughput time and parallel efficiency of the process with respect to machine size, CPU performance, and speed of data movement. The turnaround time of production-sized jobs turns out to be highly dependent on the choice of parallel algorithm, and the choice itself will change with the parameters of the project.
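
    As a hedged sketch of what such a timing model can look like (not the three models presented in the talk, whose details are not given in this abstract), the snippet below estimates throughput time and parallel efficiency from machine size, per-CPU kernel speed and data-movement speed, assuming perfectly divisible kernel work and a serialized data-movement term; all parameter names and values are invented.

        # A generic timing model for a distributed migration job: total time is the
        # summation-kernel work divided across P nodes, plus the time to move the
        # input data. Illustrative stand-in only, with invented parameter values.

        def migration_time(P, kernel_ops, ops_per_sec, input_bytes, bytes_per_sec):
            compute = kernel_ops / (P * ops_per_sec)        # kernel work, perfectly divided
            movement = input_bytes / bytes_per_sec          # serialized data-movement cost
            return compute + movement

        def efficiency(P, **kw):
            t1 = migration_time(1, **kw)
            tp = migration_time(P, **kw)
            return t1 / (P * tp)

        params = dict(kernel_ops=5e15, ops_per_sec=2e8,     # illustrative survey size and CPU speed
                      input_bytes=2e12, bytes_per_sec=5e7)
        for P in (16, 64, 256):
            print(P, f"{migration_time(P, **params)/3600:.1f} h",
                  f"eff={efficiency(P, **params):.2f}")
        # Efficiency drops as P grows because the data-movement term does not shrink.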

    Computational Elements, Requirements and Tradeoffs for Imaging Normal-Incidence Seismic Data
    Jim McClean, PGS Research
    [Authors: Jim McClean and Steve Kelly, PGS Research]

    Exploration seismic recordings are often processed to simulate an experiment in which the source and receiver are coincident at the same surface location. We outline an algorithm for imaging preprocessed recordings of this type using an approximate form of the scalar wave equation. This outline will consist of a description of the various approximations used to reduce the computational cost while retaining acceptable accuracy.
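
    For reference, the starting point for this family of methods is the scalar (acoustic) wave equation shown below; the specific approximations the authors apply to reduce its cost are described in the talk rather than in this abstract, so the equation is background only.

        % Constant-density scalar wave equation for the pressure field p(x, y, z, t)
        % with propagation velocity c. Zero-offset (normal-incidence) imaging schemes
        % typically apply further approximations to this equation (for example,
        % one-way propagation operators); the talk's particular choices are not
        % detailed in the abstract above.
        \[
            \frac{\partial^2 p}{\partial t^2}
              = c^2 \left( \frac{\partial^2 p}{\partial x^2}
                         + \frac{\partial^2 p}{\partial y^2}
                         + \frac{\partial^2 p}{\partial z^2} \right)
        \]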

    In general, huge datasets are handled with this method. Additional constraints include available disk and memory capacities, I/O speed and the underlying computational requirements of the algorithm. We discuss the impact of these constraints upon our processing methodology.

    We also comment on the style of parallelization that is most effective for the algorithm, as well as its scalability.


THURSDAY, NOVEMBER 15

  • HPC IN ENTERTAINMENT
    Chair: Steve Briggs, Compaq Computer Corporation
    Time: 10:30-12:00
    Room A207/209

    Computational Challenges in Computer Animation at Blue Sky Studios
    John Turner, Blue Sky Studios

    Since its inception in 1987, Blue Sky Studios has used ray tracing for virtually all the images it has produced, and it remains the only computer animation studio to use this computationally intensive technique so extensively in production. When one considers the hardware available to a small studio in 1987, it's understandable that many in the industry questioned Blue Sky's approach. However, the principal architects of the original system, Carl Ludwig and Eugene Troubetskoy, believed from the outset that ray tracing produces the best images and that "compute-intensive" is better than "human-intensive." Indeed, the drive toward more complexity and photorealism has made it increasingly difficult to obtain the desired results using non-physically-based techniques without inordinate human effort. An example is soft shadows, which are achieved naturally with ray tracing but require special techniques with scanline, first-surface approaches.
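
    To make the soft-shadow example concrete, the sketch below shows the generic ray-tracing approach of sampling an area light with many shadow rays and averaging visibility, so a penumbra emerges from the geometry itself. It is an illustration of the general technique, not Blue Sky's renderer; the scene setup and occlusion test are placeholders.

        # Soft shadows the ray-tracing way: shoot several shadow rays from the shaded
        # point toward random positions on an area light and average the visibility.
        # Points near the edge of an occluder see the light partially blocked, giving
        # a penumbra "for free". Generic illustration; is_occluded() is a placeholder
        # for a real ray/scene intersection test.
        import random

        def sample_area_light(center, half_size):
            cx, cy, cz = center
            return (cx + random.uniform(-half_size, half_size),
                    cy + random.uniform(-half_size, half_size),
                    cz)

        def is_occluded(point, light_sample):
            # Placeholder: a real renderer traces the segment from `point` to the
            # light sample against scene geometry. Here we pretend an occluder
            # blocks every ray headed toward the light's x < 0 half.
            return light_sample[0] < 0.0

        def soft_shadow(point, light_center, light_half_size, n_rays=64):
            visible = sum(
                0 if is_occluded(point, sample_area_light(light_center, light_half_size))
                else 1
                for _ in range(n_rays)
            )
            return visible / n_rays               # fraction of the light that is visible

        random.seed(1)
        print(soft_shadow(point=(0.0, 0.0, 0.0), light_center=(0.0, 0.0, 5.0),
                          light_half_size=1.0))   # ~0.5: the point sits in the penumbra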

    Carl has been known to say that "at Blue Sky we write software for the computers of tomorrow." While that was certainly true in 1987, advances in computer hardware have brought tomorrow closer than ever.

    In addition to our renderer, other computationally intensive aspects of computer animation, such as fluid dynamics and cloth simulation, will also be discussed.

    Computational Challenges in Creating Volume Rendered Galactic Animations
    Jon Genetti, University of Alaska Fairbanks/Arctic Region Supercomputing Center

    The San Diego Supercomputer Center collaborated with the American Museum of Natural History to produce a visualization of the Orion Nebula for the new Hayden Planetarium. During the Space Show, viewers are transported 1500 light years to the heart of the nebula on a 67-foot digital dome consisting of 9 million pixels. This 2 1/2-minute animation required over 31,000 1280x1024 images and was rendered on SDSC's IBM SP using over 900 processors during a single 12-hour period. Under a less demanding time schedule, a second version was produced for high-resolution flat displays and was shown in the Electronic Theater at Siggraph 2000. This sequence consisted of 4500 6400x3072 images and was rendered on SDSC's Sun E10000s using backfill CPU cycles over a 4-month period.
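
    A back-of-the-envelope check (illustrative arithmetic, not figures quoted by the speaker) shows how those image counts fit together, assuming roughly 30 dome frames per second:

        # Rough consistency check of the rendering numbers above (assumes ~30 fps).
        frames = 2.5 * 60 * 30                    # 2 1/2 minutes at 30 frames/second
        tiles_per_frame = 9e6 / (1280 * 1024)     # 9-Mpixel dome covered by 1280x1024 tiles
        print(frames, tiles_per_frame, frames * tiles_per_frame)
        # -> 4500 frames, ~6.9 tiles/frame (7 in practice), roughly 31,000 images:
        #    consistent with the "over 31,000" figure and the 4500-image flat version.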

    In this presentation, I will give an overview of the new Hayden Planetarium, the Orion fly-thru development process, the importance of using HPC resources, the tradeoffs/compromises made during development and the rationale for the final modeling/rendering decisions. I will also give a preview of the next collaboration that plans to generate and render a time-varying volume dataset that will be several terabytes and require state-of-the-art data handling and computation resources.


  • SCXY AS A MASTERWORK
    Chair: Barbara Kucera, National Center for Supercomputing Applications
    Time: 1:30-3:00 PM
    Room A207/209

    SC Global: Celebrating Global Science
    Ian Foster, Argonne National Laboratory & The University of Chicago

    Science and engineering are evolving into increasingly collaborative, distributed, multi-institutional, and often international activities. The technologies that we use to practice science and engineering, to communicate research advances, and to discuss future directions must evolve also. The SC Global event at SC2001 celebrates and showcases this parallel evolution of work and technology. On the one hand, it represents a technical tour de force, with advanced collaboration and networking technologies used to link hundreds of people at tens of sites on six continents; on the other, it incorporates technical sessions that communicate recent progress on some of the most interesting collaborative and international science projects currently in progress. In this talk, I will both explain the technologies that underlie the SC Global event and review the technical goals and current status of some of the projects presented as part of SC Global, including GriPhyN and NEESgrid.

    SCinet: The Annual Convergence of Advanced Networking and High Performance Computing
    Steve Corbato, Backbone Network Infrastructure, Internet2

    For a period of approximately six months each year, a dedicated team of network architects, engineers, and fiber experts—drawn from the leading national research centers and the national research and education networks—reconvenes to design, build, and operate SCinet. This state-of-the-art network supports both the advanced demonstration and varied general connectivity needs at each SCxy. While this network quickly springs into existence and then carries hundreds of Terabytes of data over its short lifetime of less than a week, it has come to symbolize the increasing convergence of advanced networking and high performance computing—as evidenced, for one, by the recent TeraGrid design.

    Over the years, SCinet has grown to include significant efforts in establishing high-performance wide area connectivity, deploying both fiber-based and wireless networks throughout the venue, probing and characterizing network performance, enabling the Bandwidth Challenge for innovative applications, and demonstrating bleeding-edge network technologies through Xnet. In this presentation, I will provide a glimpse into the truly collaborative and often Herculean process that creates this network and will highlight several implications for the evolving field of distributed high-performance computing.


FRIDAY, NOVEMBER 16

  • VIRTUAL PRODUCT DEVELOPMENT WITH CAE I
    Chair: Ed Turkel, Compaq Computer Corporation
    Time: 8:30-10:00 AM
    Room A102/104/106

    Modeling Approaches in FLUENT for the Solution of Industrial CFD Applications on High-Performance Computing Systems

    Tom Tysinger, Fluent

    FLUENT is a widely used commercial software package for modeling fluid flow and heat transfer in complex geometries. It is capable of solving flows in both the incompressible and compressible regimes. FLUENT is used by engineering analysts and designers to reduce design time, improve product quality and optimize performance. The solution of real-world fluid flow problems requires both large memory and lengthy computation times. This has driven the implementation of FLUENT on parallel computers, reducing the turnaround time from days to hours, or from hours to minutes, and allowing larger and more complex problems to be modeled with greater fidelity. This presentation will describe some of the challenges involved in making a variety of diverse physical models and flow solvers perform efficiently on contemporary HPC architectures.

    High Performance Simulation and Visualization in Engineering Systems
    Kamal Jaffrey, Delta Search Labs
    [Authors: Ahmed Ghoniem and Kamal Jaffrey, MIT and Delta Search Labs]

    The increasing complexity of products and the systems they comprise makes traditional design, development and testing difficult. High performance simulation of engineering designs, in which complex physics, chemistry and dynamics interact over a wide range of length and time scales and contribute equally to the performance of the system, is becoming possible thanks to recent advances in massively parallel and cluster computing, immersive visualization, and numerical algorithms. Of the many "Grand Challenge" applications of HPS (the so-called multi-scale, multi-physics phenomena), combustion has received much attention due to its critical role in many applications including power generation, transportation and propulsion. Combustion simulation, where computational fluid dynamics methods must be further refined to capture the fine scales of multi-species transport and extended to incorporate chemical reactions, is extremely demanding; it has been estimated that accurately simulating the operation of an IC engine requires many days on a 50+ teraflop machine! The conflicting demands on a combustion system, including high efficiency, safety and stability, high power density and extremely low emissions, make simulation-based optimization over a range of conditions necessary and very attractive to designers. Progress in simulation approaches, including grid-free and Lagrangian methods; adaptive, moving and multigrid methods; hybrid Eulerian-Lagrangian methods; and fast chemistry-reduction algorithms, is bringing this goal closer. The talk will review progress in the field and summarize many of the remaining challenges.


  • VIRTUAL PRODUCT DEVELOPMENT WITH CAE II
    Chair: Mary Kay Bunde, Etnus Corporation
    Time: 10:30-Noon
    Room A102/104/106

    Accuracy and Precision of Distributed Memory Crash Simulation

    Clemens-August Thole, GMD National Research Center for Information Technology
    [Authors: Clemens-August Thole, Juergen Bendisch, Otto Kolp, Mei Liquan, Hartmuth von Trotha, Fraunhofer Institute for Algorithms and Scientific Computing]

    Numerical crash simulation on parallel machines shows non-deterministic results for certain car models. For a specific BMW car model, the node positions of the crashed model may differ by up to 14 cm between several executions on a parallel machine for the same input deck.

    Detailed investigations have shown that these effects are not the result of the parallel execution or of an incorrect implementation. The non-determinism of the parallel execution is instead a direct result of numerical bifurcations, which are caused either by the numerical algorithms or by features of the car design.

    In the case of the BMW car model, the scatter of the simulation results is a direct consequence of buckling of the motor carrier. A slight redesign of this motor carrier resulted in stable simulation results for parallel machines.
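
    A toy illustration of how a numerical bifurcation can turn differences on the order of floating-point round-off into macroscopic scatter near a buckling-type instability (a generic demonstration, not the crash solver used in the study; the iteration and numbers are invented):

        # Toy demonstration of a numerical bifurcation: an unstable equilibrium (think
        # of a structural member poised to buckle one way or the other) amplifies
        # differences on the order of floating-point round-off into macroscopic
        # scatter. The iteration and numbers are invented for illustration.

        def buckle(perturbation, steps=60, growth=1.8, limit=0.14):
            """Iterate an unstable mode; growth saturates at +/- `limit` metres."""
            x = perturbation
            for _ in range(steps):
                x = max(-limit, min(limit, growth * x))
            return x

        # Two runs that differ only by a round-off-sized amount in the initial state,
        # e.g. because a parallel sum was evaluated in a different order.
        print(buckle(+1e-16))   # ->  0.14  (buckles one way)
        print(buckle(-1e-16))   # -> -0.14  (buckles the other way: ~14 cm apart)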

    Systems and Software Technology for Automotive Crash Simulation
    Ed Turkel, Compaq Computers
    Jean-Pierre Bobineau, Radioss Consulting Corporation
    [Authors: Francis Arnaudeau, Eric Lequiniou, MECALOG SARL; Ed Turkel, Compaq Computer Corporation; Martin Walker, Compaq Computer EMEA; Jean-Pierre Bobineau, Radioss Consulting Corporation]

    As a result of increasing consumer and regulatory demands, the automotive industry is investing heavily in simulation technology to improve the crash-worthiness of their vehicles. Recent studies have shown that crash simulation is the single largest consumer of system resources in the engineering computing centers of the major automotive manufacturers worldwide.

    The trends in the use of crash simulation include:

    • Increasing the accuracy of simulation by adding detail to vehicle models and increasing their resolution, resulting in much larger models.

    • Increasing demand for more use of crash simulation to facilitate vehicle design decisions, including the use of statistical techniques to optimize vehicle designs.

    • Increasing pressure to reduce the cost of vehicle development, resulting in greater use of simulation in virtual prototyping, while also raising the visibility of IT costs, putting pressure to lower the cost of computing.


    The net result of these trends is the increasing use and size of simulation, coupled with pressure to reduce its cost. MECALOG and COMPAQ are collaborating to develop crash simulation solutions that utilize parallel processing on distributed-memory systems to provide highly scalable and accurate simulations while driving the cost of simulation down. The authors will discuss the parallel processing technology used in MECALOG's RADIOSS-Crash, combined with the systems technology in COMPAQ's distributed-memory/clustered Tru64 UNIX and Linux-based Alpha systems, which together enable significant improvements in simulation performance and accuracy while driving down the cost of simulation.