Chair: Barbara Horner-Miller, Arctic Region Supercomputing Center
    Time: 10:30-Noon
    Room A207/209

    Applications in Computational Biology and Computational Chemistry: Similarities and Differences

    Eamonn O'Toole, Compaq Computers

    Computation is rapidly assuming a central role in biology. Such recent feats as the completion of the draft human genome sequences by Celera Genomics and the International Human Genome Sequencing Consortium, and their annotation, would not have been possible without significant computational resources. Some of the largest computers in private and public hands are now devoted to biological problems. Chemists have made extensive use of computation for considerably longer than biologists, and computational chemists are responsible for some of the largest and most important scientific applications in current use.

    Chemists and biologists tackle many problems that are closely related. In addition, some large computer installations are shared between chemists and biologists. We will outline some of our experiences benchmarking and analyzing performance of a number of applications in computational chemistry and biology. We are particularly interested in similarities and differences between these applications, their behavior and the stresses that they place on systems. This work is part of an on-going program to better understand these applications.

    Computing Requirements for the Bioinformatics Revolution: Genomics, Proteomics and Cellular Machinery

    Bill Camp, Sandia National Laboratories

    Biology and medicine are entering a new age dominated by high-throughput experimentation and computation. How quickly the door to that future is opening is signalled by the staggering progress in developing a map of the human genome, as well as the genomes of several other species. Those efforts required unprecedented increases in experimental output, made possible in part by the development of shotgun sequencing and the subsequent automation of DNA sequencing. The key to using these high-throughput methods has been the ability to reconstruct an entire sequence from a redundant set of small fragment sequences. This assembly was in turn enabled by the introduction of extreme computing into biology by Celera Genomics and its subsequent use by both Celera and the Public Genome Project. This first stage of genomic biology utilized terascale computing for the first time in biology. It was dominated by informatics: the searching, comparison and manipulation of large alphanumeric datasets. Little or no floating-point computation was required; the work was essentially embarrassingly parallel and dominated by unusually intense data I/O. On-going refinements of the genomics computing methodologies include reducing I/O requirements by replacing I/O with parallel computing techniques.
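
    The fragment-assembly problem at the heart of shotgun sequencing can be illustrated with a toy greedy overlap merger. This is a drastically simplified sketch for illustration only; production assemblers such as Celera's use far more sophisticated overlap-layout-consensus algorithms and must cope with sequencing errors, repeats, and billions of fragments.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def greedy_assemble(fragments):
    """Greedily merge the pair of fragments with the largest overlap
    until no overlaps remain, approximating the original sequence."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, 0, 1)  # (overlap length, index i, index j)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no remaining overlaps between any fragments
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return "".join(frags)
```

    With redundant overlapping reads such as "ACGTACG", "TACGTGA" and "GTGACG", the merger recovers the longer source sequence "ACGTACGTGACG"; the redundancy of the fragment set is exactly what makes the reconstruction possible.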

    In the next stage, early-stage proteomics, the calculations will be even larger than those required for genomics. However, they will still have a strong informatics flavor. Since the calculations will be larger, there will be even more pressure on I/O systems, which will in turn drive additional emphasis on true parallel computing methods.

    Later stages in proteomics, which include elucidation of protein structure and function and of protein-protein interactions, will see orders-of-magnitude increases in computing requirements, with increased emphasis on floating-point operations. The long-term goals for computation in biology and medicine include gene-based species modification (e.g. for agriculture or environmental remediation) and intervention strategies for medicine and counter-terrorism. In all cases, the ability to understand protein structure and function will be a critical step. Further refinement will involve understanding signalling and metabolic pathways, enzymatic catalysis, and protein target identification, as well as the design of interventions. These longer-term goals pose computing challenges which are floating-point intensive and which, by current estimates, will require sustained, affordable petascale processing.

    Finally, possibly the most exotic challenge posed by biology is that of simulating the biological cell at a system level. This involves far more than adapting the simulation methods of physics, chemistry and engineering to biological systems, although that will be a critical component of the solution. The cell is so much more complex than anything we have attempted to simulate in the past, and so many of the underlying processes will remain under-characterized, that radically new simulation methodologies will be required: for example, our inability to characterize underlying details (e.g. a myriad of reaction rates) may require inherently non-deterministic simulation methods. Of course, a detailed atomistic simulation of even the simplest cell is not only computationally infeasible for the foreseeable future but would also be so complex as to challenge interpretability.
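
    One established example of an inherently non-deterministic simulation method is Gillespie's stochastic simulation algorithm, which samples exact reaction trajectories from reaction propensities rather than integrating deterministic rate equations. The sketch below is illustrative only, for a single first-order decay reaction A -> 0 with assumed rate constant k; it is not a description of the methods the speaker proposes.

```python
import random

def gillespie_decay(n0, k, t_max, rng=None):
    """Gillespie-style stochastic simulation of the decay reaction A -> 0.
    Returns the trajectory as a list of (time, molecule count) pairs."""
    if rng is None:
        rng = random.Random()
    t, n = 0.0, n0
    trajectory = [(t, n)]
    while n > 0 and t < t_max:
        propensity = k * n
        # Waiting time to the next reaction event is exponentially
        # distributed with rate equal to the total propensity.
        t += rng.expovariate(propensity)
        n -= 1  # one decay event fires
        trajectory.append((t, n))
    return trajectory
```

    Each run produces a different trajectory; repeated runs sample the distribution of outcomes, which is the point when individual rate parameters are too poorly characterized to support a single deterministic prediction.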

    I will discuss architectural strategies for meeting the computing challenges of the revolution in biological sciences, including scaling needs for computation, I/O and networking.