TUESDAY, NOVEMBER 13

  • EMERGING LIFE SCIENCES II
    Chair: Ty Rabe, Compaq Computer Corporation
    Time: 1:30-3:00 PM
    Room A207/209

    The Computational Impact of Genomics on Biotechnology R&D
    Scooter Morris, Genentech

    The excitement surrounding the completion of the first draft sequence of the human genome has created significant interest in genomics, the study of entire genomes, and has greatly influenced how the pharmaceutical industry discovers and characterizes new medicinal compounds. Two major areas of impact are the way we approach the search for new proteins of interest and the computational tools we use to perform that search. Increasingly, the search for new proteins or lead compounds has moved from the laboratory to the computer. This migration has not been sudden, nor is it a direct result of the completion of the Human Genome sequencing effort; rather, it reflects the growing capacity and performance of modern computing systems and the dramatic increase in the amount of available sequence data. This talk will discuss both of these developments and conclude with a survey of the computing systems and architectures currently used in the biotech industry for primary sequence discovery and other research activities.

    New Wine in Old Bottles: The Use of Vector Processors and Fine-Grained Parallelism in Genomic Analysis
    Stanley K. Burt, Advanced Biomedical Computing Center, National Cancer Institute
    [Authors: Jack Collins, Robert Stephens, and Stanley K. Burt]

    The common assumption is that biological data analysis problems are well suited to parallel computing, particularly cluster computing. However, certain problems that involve very large amounts of data can be approached with other computational techniques. We present a new methodology demonstrating that techniques used in cryptography can be useful for pattern recognition in biological research, such as finding tandem repeats in DNA sequences. This method takes advantage of special hardware capabilities of the Cray architecture (vector registers, large shared memory, and fine-grained parallelism) and gains additional speedup from sequence compression.
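    The abstract does not say which compression scheme is used. A common choice for DNA, assumed here purely for illustration, is 2-bit encoding, which packs 32 bases into one 64-bit word. The sketch below uses hypothetical helpers `pack` and `unpack` and handles only uppercase A, C, G, T; it is not presented as the authors' implementation.

```python
# Hypothetical 2-bit DNA packing (assumed scheme; the abstract does not
# specify how sequences are compressed). Only uppercase A, C, G, T are handled.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"
BASES_PER_WORD = 32  # 32 bases x 2 bits = 64 bits


def pack(seq: str) -> list[int]:
    """Pack a DNA string into 64-bit integers, 32 bases per word.

    The first base of each block ends up in the most significant bits used.
    """
    words = []
    for start in range(0, len(seq), BASES_PER_WORD):
        word = 0
        for base in seq[start:start + BASES_PER_WORD]:
            word = (word << 2) | CODE[base]
        words.append(word)
    return words


def unpack(words: list[int], length: int) -> str:
    """Recover the original string; `length` is needed because the last word may be partial."""
    out = []
    for i, word in enumerate(words):
        n = min(BASES_PER_WORD, length - BASES_PER_WORD * i)
        for j in range(n - 1, -1, -1):
            out.append(BASES[(word >> (2 * j)) & 3])
    return "".join(out)


if __name__ == "__main__":
    s = "ACGT" * 20  # 80 bases
    packed = pack(s)
    print(len(packed))  # 3 words instead of 80 one-byte characters
    assert unpack(packed, len(s)) == s
```

    Once packed, two 32-base blocks can be compared with a single integer equality test instead of 32 character comparisons, the kind of word-level operation that vector hardware handles well.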

    The identification of simple tandem repeats within DNA sequences is an extremely powerful tool for exploring genomes. These repeat elements (also called microsatellites) are frequently polymorphic and can therefore be used for many purposes, ranging from diagnostic primers that increase marker density around regions of interest to mapping new genes and forensic science. We report here the development of a new, extremely rapid tandem repeat finder that exhaustively identifies all possible repetitive elements up to 16 bases in length.
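    For readers unfamiliar with tandem repeat detection, the following is a deliberately simple, scalar sketch of an exhaustive search over repeat periods 1 to 16. It is not the cryptography-derived, vectorized method described in the talk; the function names and the `min_copies=3` threshold are illustrative assumptions. For each period p, it finds maximal runs where `seq[k] == seq[k + p]`, so the covered stretch is p-periodic, and it skips units that are themselves repeats of a shorter unit (for example `CACA`).

```python
def is_primitive(unit: str) -> bool:
    """True if `unit` is not a repetition of a shorter unit ("CACA" is not primitive)."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_tandem_repeats(seq: str, max_period: int = 16,
                        min_copies: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, unit) for each maximal tandem run; `end` is exclusive.

    Naive exhaustive scan: one pass over the sequence per period p = 1..max_period.
    """
    hits = []
    n = len(seq)
    for p in range(1, max_period + 1):
        i = 0
        while i + p < n:
            if seq[i] != seq[i + p]:
                i += 1
                continue
            # Extend the run of positions k where seq[k] == seq[k + p].
            j = i
            while j + p < n and seq[j] == seq[j + p]:
                j += 1
            # seq[i : j + p] is p-periodic.
            length = j - i + p
            unit = seq[i:i + p]
            if length // p >= min_copies and is_primitive(unit):
                hits.append((i, i + length, unit))
            i = j  # seq[j] != seq[j + p] here (or the scan reached the end)
    return hits


if __name__ == "__main__":
    print(find_tandem_repeats("GGCACACACATT"))  # [(2, 10, 'CA')]
```

    The cost is roughly 16 passes over the sequence. On a whole genome, this simple loop is the kind of workload that packed, word-level processing could accelerate.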

    We describe and demonstrate the utility of the method in identifying simple tandem repeats across the entire human genome. Focusing on known coding regions, we find many previously undescribed repeats that may be linked to disease. The data have been assembled into a web-accessible relational database that allows searching for elements by genomic region or gene. Beyond this particular application, the methodology will enable analyses that were previously beyond available computational capabilities.
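    The abstract does not describe the database schema. As a hypothetical sketch only, the code below uses Python's built-in `sqlite3` module with an assumed `repeats` table (columns `chrom`, `start_pos`, `end_pos`, `unit`, `gene`). It shows the two lookups the abstract mentions: by genomic region and by gene. Treating coordinates as half-open intervals is also an assumption.

```python
import sqlite3

# Hypothetical schema: the abstract does not describe the real database.
SCHEMA = """
CREATE TABLE repeats (
    chrom     TEXT    NOT NULL,
    start_pos INTEGER NOT NULL,  -- 0-based, inclusive
    end_pos   INTEGER NOT NULL,  -- exclusive
    unit      TEXT    NOT NULL,  -- repeat unit, e.g. 'CA'
    gene      TEXT               -- NULL when outside annotated genes
);
CREATE INDEX idx_region ON repeats (chrom, start_pos);
CREATE INDEX idx_gene ON repeats (gene);
"""

COLUMNS = "chrom, start_pos, end_pos, unit, gene"


def make_db(rows, path=":memory:"):
    """Create the table and load (chrom, start_pos, end_pos, unit, gene) rows."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO repeats VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def by_region(conn, chrom, start, end):
    """Repeats overlapping the half-open interval [start, end) on `chrom`."""
    return conn.execute(
        f"SELECT {COLUMNS} FROM repeats "
        "WHERE chrom = ? AND start_pos < ? AND end_pos > ? ORDER BY start_pos",
        (chrom, end, start),
    ).fetchall()


def by_gene(conn, gene):
    """All repeats annotated with `gene`."""
    return conn.execute(
        f"SELECT {COLUMNS} FROM repeats WHERE gene = ? ORDER BY chrom, start_pos",
        (gene,),
    ).fetchall()
```

    A region query returns every repeat that overlaps the requested interval. The composite index on `(chrom, start_pos)` narrows the rows examined to the requested chromosome, and the index on `gene` serves the gene lookup.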