Exhibitor Forum

THURSDAY November 15th
High-Performance BioInformatics
Gerald Lipchus, Director of Sales, Scientific Computing Associates
High-performance computing on cost-effective clusters of commodity processors is of growing importance in the life sciences. Scientific Computing Associates, Inc. and TurboGenomics, Inc. are partnering to help the life science community leverage this critical technology. This presentation describes several recent initiatives that make the power of clusters more accessible to the broad user community, including a discussion of several TurboGenomics products such as TurboBLAST, a high-performance, parallel version of NCBI BLAST.
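The core idea behind a cluster BLAST such as the one described above is embarrassingly parallel: partition the query sequences across workers and merge the results. A minimal sketch of that pattern (my own illustration, not TurboBLAST's actual code; the `score` function is a hypothetical stand-in for a real database search):

```python
from multiprocessing.pool import ThreadPool

def score(query):
    # Toy stand-in for searching one query sequence against a database
    # (the real, expensive work that a cluster BLAST distributes).
    return (query, sum(1 for base in query if base in "GC"))

def parallel_search(queries, workers=4):
    # Split the query set across a worker pool, as a cluster BLAST
    # splits it across commodity nodes (threads here only for brevity).
    with ThreadPool(workers) as pool:
        return dict(pool.map(score, queries))

print(parallel_search(["ACGT", "GGCC", "ATAT"]))
# {'ACGT': 2, 'GGCC': 4, 'ATAT': 0}
```

Because each query is independent, throughput scales with the number of nodes until the shared database becomes the bottleneck.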
Cracking MPI/OpenMP Performance Problems
Karl Solchenbach, Managing Director, Pallas
Cluster computing has emerged as a de facto standard in parallel computing over the last decade. Now, researchers have begun to use clustered, shared-memory multiprocessors (SMPs) to attack some of the largest and most complex scientific calculations in the world today. To program these clustered systems, people use MPI, OpenMP, or a combination of MPI and OpenMP. However, analyzing the performance of MPI/OpenMP programs is difficult. While several existing tools can efficiently analyze the performance of either MPI or OpenMP programs, work to combine these into tightly-integrated tools is only just under way.
Pallas GmbH and KAI Software have partnered with the Department of Energy through an ASCI Pathforward contract to develop a tool called Vampir/GuideView, or VGV. This tool combines the richness of the existing tools, Vampir for MPI, and GuideView for OpenMP, into a single, tightly-integrated performance analysis tool. From the outset, its design targets performance analysis on systems with thousands of processors.
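At its core, a timeline tool like Vampir aggregates per-rank event traces into time-per-state summaries. A minimal sketch of that aggregation (my own illustration of the general technique, not VGV's implementation; the trace format is invented):

```python
def time_per_state(trace):
    """trace: time-ordered (timestamp, state) entries for one rank,
    e.g. state "MPI" (inside a communication call) or "user" code.
    Returns total seconds spent in each state."""
    totals = {}
    for (t0, state), (t1, _) in zip(trace, trace[1:]):
        totals[state] = totals.get(state, 0.0) + (t1 - t0)
    return totals

# One rank: user code 0-1s, MPI 1-1.5s, user 1.5-3s, MPI 3-3.2s.
rank0 = [(0.0, "user"), (1.0, "MPI"), (1.5, "user"), (3.0, "MPI"), (3.2, "end")]
totals = time_per_state(rank0)
print(totals)
```

Scaling this view to traces from thousands of processors, and correlating MPI states with OpenMP regions inside each process, is precisely the hard part VGV targets.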
ChaMPIon/Pro: A Next-generation, Terascale MPI-2
Anthony Skjellum, President, MPI Software Technology, Inc.
Current MPI implementations often date back to the Argonne/Mississippi State model implementation, MPICH. This early model implementation drove the widespread adoption of MPI-1.2 throughout the world. Later, ROMIO was added as a way to provide public-domain support for most of MPI I/O, a key part of MPI-2. ChaMPIon/Pro provides a commercial, highly scalable implementation of the MPI-2 standard, based on a software design that is itself two generations newer than MPICH. It also offers a next-generation alternative to our current commercial-grade MPI-1.2 system, MPI/Pro. MPI/Pro provides a thread-safe, high-performance MPI-1.2 implementation, and the lessons learned from building it have been brought to bear in an entirely new design. This new design addresses substantial performance and interoperability requirements posed by huge-scale systems of up to 15,000 processors or more.

ChaMPIon/Pro targets the largest DOE supercomputers and superclusters, including ASCI White, Sandia Cplant, ASCI Purple, and ASCI "Q". This talk covers the effort to create this MPI implementation, and its planned future commercial uses outside of the DOE PathForward program that initially funded its creation. In particular, we compare it to our current MPI/Pro product, and explain the advantages of the next-generation MPI, particularly for systems of huge scale.
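A key feature of the MPI I/O portion of MPI-2 mentioned above is that each rank can write its block of a shared file at an independent offset (as with `MPI_File_write_at`), with no rank-to-rank coordination. A sketch of that access pattern, emulated here with ordinary file seeks rather than a real MPI library (the sequential loop stands in for ranks that would run concurrently):

```python
import os
import tempfile

def write_block(path, rank, block):
    # Each rank writes at a fixed offset derived from its rank,
    # mirroring the MPI_File_write_at access pattern.
    with open(path, "r+b") as f:
        f.seek(rank * len(block))
        f.write(block)

fd, path = tempfile.mkstemp()
os.close(fd)
nranks, blocksize = 4, 8
with open(path, "wb") as f:
    f.truncate(nranks * blocksize)      # pre-size the shared file
for rank in range(nranks):              # in MPI these run concurrently
    write_block(path, rank, bytes([rank]) * blocksize)
with open(path, "rb") as f:
    data = f.read()
os.remove(path)
```

Because the offsets never overlap, the writes are conflict-free, which is what lets a parallel file system service them concurrently at scale.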
PGI Linux Compilers and Tools for High Performance Computing
Vincent Schuster, Director AST Portland Lab, STMicroelectronics
In combination with Linux, the PGI® CDK™ Cluster Development Kit™ enables turnkey use of networked clusters of x86-compatible systems for HPC users. The PGI Fortran, C and C++ compilers include full native support for OpenMP shared-memory parallel programming, SSE and prefetch optimizations for IA32 processors, and optimizations for Athlon processors. The PGI CDK also includes tools for debugging and profiling of shared-memory OpenMP and distributed-memory MPI applications. A custom installer builds and installs preconfigured versions of the most common open source cluster utilities: MPI, the PBS batch queuing system, the ScaLAPACK parallel math library and PVM. The PGI CDK also includes a large number of training, example and tutorial programs. This presentation provides an overview of the PGI CDK, new tools, and applications and benchmark results using the PGI CDK.
12:00PM - 12:30PM
ConsoleWorks: the Web-Based Console Management Solution for High Performance Technical Computing (HPTC)
William D. Johnson, CEO, TECSys Development, Inc.
This session will cover system architecture, configuration, integration, features, benefits, and upgrade paths, plus migration from PCM to ConsoleWorks. Coverage will include how and why to upgrade to or implement a Web-based console management solution, and the benefits of doing so. If you have not implemented a console management solution, this session will teach you what you need to know. Specifically, this is not marketing hype; it's console management at its core. This will be a technical discussion. Attendees should have working knowledge of terminal servers, networks, the Web, firewalls, telnet, JavaScript, Java and Web browsers.
Evolution of I/O Connection Technologies and Their Effect on Storage Architecture
Rip Wilson, Product Marketing Manager, LSI Logic Storage Systems, Inc.
One of the keys to high performance computing is the speed at which data is transferred from the storage system(s) to the server(s). In the last few years, we've seen this pipe evolve from 40 MB/s SCSI to 200 MB/s Fibre Channel. And with 4-Gbit and 10-Gbit Fibre Channel and InfiniBand on the horizon, it's only going to get faster.

As storage vendors strive to get the latest and fastest host I/O connections on their arrays, storage architects are working to optimize the controller design to take full advantage of the new speeds. They must consider internal bandwidth, single versus multiple buses, cache design and size, and back-end I/O. All components of the storage controller must come together to effectively and efficiently take advantage of emerging technology and ensure the storage array can deliver the performance enabled by its host I/O connectivity.

This presentation looks at the evolution of I/O connections and their effect on storage system design.
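The balance argument above can be made concrete with back-of-the-envelope arithmetic: a controller delivers no more than the minimum of its aggregate host-port, internal, and back-end bandwidth, so a fast new host link is wasted unless the rest of the design keeps up. A small sketch (illustrative numbers, not any vendor's specification):

```python
def deliverable_MBps(host_ports_MBps, internal_MBps, backend_MBps):
    # The array can sustain only the narrowest stage of the data path:
    # host ports in aggregate, the internal bus, or the back-end disks.
    return min(sum(host_ports_MBps), internal_MBps, backend_MBps)

# Two 200 MB/s Fibre Channel host ports behind a 320 MB/s internal bus:
print(deliverable_MBps([200, 200], 320, 400))  # 320 -> the bus, not the
                                               # 400 MB/s of host ports,
                                               # limits throughput
```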
1PM - 1:30PM
IBM Unparalleled Performance and Petaflop Research
Surjit Chana, VP HPC Marketing and Dave Jensen, Sr. Mgr, IBM Research
Surjit Chana will briefly review IBM's recent POWER4 and Cluster announcements. Dave Jensen will then discuss the expansion of IBM's Blue Gene research project. On Friday, November 9th, IBM announced a partnership with the Department of Energy's National Nuclear Security Administration to expand IBM's Blue Gene research project. IBM and NNSA's Lawrence Livermore National Laboratory will jointly design a new supercomputer based on the Blue Gene architecture. Called Blue Gene/L, the machine will be 15 times faster, consume 15 times less power per computation and be 50 to 100 times smaller than today's fastest supercomputers. Blue Gene/L is a new member of the IBM Blue Gene family, marking a major expansion of the Blue Gene project. Blue Gene/L is expected to operate at about 200 teraflops (200 trillion operations per second), which is larger than the total computing power of the top 500 supercomputers in the world today. IBM will also continue building the petaflop-scale (one quadrillion operations per second) machine for a range of projects in the life sciences that was originally announced in December 1999.
Meta-computing as the Key to Distributed, Multi-disciplinary Simulation
John A. Benek, Principal Scientist, Raytheon
The Department of Defense (DoD) Simulation Based Acquisition (SBA) strategy requires increasing levels of simulation and simulation fidelity. Successful competitors must demonstrate the superiority of their designs largely through simulations; therefore, selections will be made primarily on the quality of the simulations. Since major weapon systems are typically designed by teams of contractors or by single contractors with widely dispersed design groups, a means must be found that will allow the best modeling capability of the team members to be used, while protecting their proprietary data and models. This methodology must also provide a simple way of connecting models at distributed sites. Building blocks for this capability include the DoD HLA initiative, the Defense Research and Engineering Network (DREN), and several modeling environments that are currently under development. However, the key to combining these elements into the comprehensive structure required to support SBA is a meta-computing architecture that can connect heterogeneous computing resources (computers, operating systems, networks, and models) seamlessly and transparently to the user. This presentation will describe Raytheon's vision of this architecture and progress toward its implementation.
New Architecture and Challenges in Creating Networks for the Teragrid
Wesley K. Kaplow, CTO, Government Systems Division, Qwest Communications
The combination of teraflop supercomputing clusters and multi-gigabit wide area networks has enabled the long-awaited era of the TeraGrid to begin. However, issues such as the optimum architecture to ensure scalability still remain. Long-haul optical transmission systems can now provide around 120 ten-Gbps channels. These have been used in nationwide deployments, enabling point-to-point communication in the tens of Gbps. However, the key is creating a network that can scale to dozens of endpoints with predictable performance. There are a variety of network standards that can be used to create TeraGrid networks. Two are 10 Gigabit Ethernet and OC-192 IP Packet-over-SONET. Both of these can create fully meshed networks providing an initial implementation. However, the cost-effective scalability of these approaches alone is uncertain, and the next generation of integration of the optical transport layer with switching and routing may be needed. The evolution of optical transport gear with the ability to provision via a mechanism such as G-MPLS may be necessary to allow the sharing of the optical transport layer.

The current state of optical transport infrastructure, current vendor switch and router hardware, and cost-effectively scalable TeraGrid architectures will be discussed.
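The scalability concern about fully meshed networks raised above is quantitative: a mesh of n endpoints needs n(n-1)/2 point-to-point links, so link count (and cost) grows quadratically while endpoints grow only linearly. A quick sketch:

```python
def mesh_links(n):
    # Each of n endpoints connects to the other n-1; divide by 2
    # because each link is shared by its two endpoints.
    return n * (n - 1) // 2

for n in (4, 12, 40):
    print(n, "endpoints ->", mesh_links(n), "links")
```

Going from 4 to 40 endpoints multiplies the endpoint count by 10 but the link count by 130, which is why integrating switching and routing with the optical transport layer becomes attractive as grids grow.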
Open Inventor/CAVELib Integration
Mike Heck, VP of R&D, TGS, Inc.
TGS and VRCO are working together to improve the immersive environment. The resulting new Open Inventor(TM)/CAVELib(TM) configuration will include a new Open Inventor/CAVELib layer that integrates CAVELib and Open Inventor making the two libraries significantly easier to use. VRCO will license Open Inventor from TGS and use it in the development of new products, including a new version of VRScape(R).

This combination of CAVELib and Open Inventor will overcome the majority of past problems faced when trying to create robust, flexible and multi-platform applications that will support advanced displays, collaboration, and interaction technologies. This solution will be enabled by new multi-threaded versions of Open Inventor and CAVELib. This breakthrough technology allows multiple rendering threads to share a single copy of the scene graph, saving memory and simplifying management of the scene graph. This capability will present to the market a cross-platform solution never before available.

Using an object-oriented scene graph API offers many advantages over programming directly to low-level graphics APIs. For example, Open Inventor provides a higher-level abstraction and built-in optimizations. Compared to platform-specific APIs, Open Inventor protects your software investment by allowing migration to many platforms. In this way it perfectly complements the portability of CAVELib. Compared to open source scene graph APIs, Open Inventor from TGS provides far more features, a history of successful use in major applications, and dedicated and highly responsive product support. Using Open Inventor also enables the use of powerful extension classes available from TGS for 3D data visualization and volume rendering.
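The memory-saving claim above rests on the shared-scene-graph design: multiple rendering threads traverse one copy of the graph instead of each holding a private duplicate. A toy illustration of the idea (plain Python, not the Open Inventor or CAVELib API; node names are invented):

```python
import threading

# One shared scene graph: (node name, list of children).
scene = ("root", [("cube", []), ("group", [("sphere", []), ("cone", [])])])

def traverse(node, out):
    # Depth-first traversal, the basic operation a render thread
    # performs over the (single, shared) scene graph each frame.
    name, children = node
    out.append(name)
    for child in children:
        traverse(child, out)

results = [[] for _ in range(3)]        # three "render pipes", e.g. CAVE walls
threads = [threading.Thread(target=traverse, args=(scene, out))
           for out in results]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results[0])  # ['root', 'cube', 'group', 'sphere', 'cone']
```

In a real multi-threaded scene graph the hard part, of course, is making concurrent traversal safe while the application also edits the graph, which is what the multi-threaded Open Inventor/CAVELib work addresses.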
A New Class of Challenges in Commercial HPC
Andrew Grimshaw, CTO, Avaki Corporation
Varied sectors of the business world are experiencing a new class of challenges in the realm of high performance computing. Some examples:
(1) Biotech: Against the backdrop of advanced biotechnologies such as genomics and proteomics exists a complex web of relationships between companies, institutions, and individuals that demand the secure sharing and management of applications and extremely large, proprietary data sets across organizational boundaries.

(2) Financial Services: F.S. organizations rely on mission-critical, deadline-contingent simulations that demand large amounts of processing power. Requirements include the ability to manage and recover from failures in real-time and federate existing enterprise computing resources.

(3) Engineering-Intensive Manufacturing: Collaboration across organizational boundaries, in the forms of data-sharing and complex application to application (A2A) interactions, is an emerging requirement for EIM enterprises, driven by the need to radically collapse product development lifecycles and increase product quality.

Dr. Andrew Grimshaw, chief architect of one of the world's leading Grid computing projects (Legion) and CTO of Avaki Corporation, will discuss how companies are approaching these challenges today and will present his vision of how a distributed, pervasive, peer-oriented architecture can elegantly address such challenges in the future.
Smarter AND Faster: Supercomputing with FPGAs
Richard Loosemore, Director of Research, Star Bridge Systems, Inc.
What is the point of fast computers if they are so viciously hard to program? The question is relevant for "ordinary" computers, but it seems to be even more devastating when applied to FPGA computers. In this presentation we argue three things: (a) the payoff is so huge, it is worth trying to build and program FPGA computers; (b) the problems involved in programming FPGA computers (assuming we want to squeeze the maximum performance out of them) are so huge that we are forced to rebuild all our ideas from scratch; (c) surprisingly, once the old ideas about programming massively parallel machines are torn down and rebuilt, there is a new approach that can make FPGA computers much easier to program than "ordinary" machines.

The new approach involves hyperspecificity (choosing circuitry on the fly to custom fit both the task and the required data rate), massive application of weak constraints (smart parallelism in the compiler and elsewhere), and a liberal dose of psychology in both the circuit architecture and the interface seen by the developer. With these ingredients, a thousand-fold increase in compute density is not just possible, it might actually be usable.
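A loose software analogy for "hyperspecificity" (my own, not Star Bridge's toolchain): instead of one general-purpose routine, generate a routine specialized to the task's fixed parameters, the way an FPGA compiler bakes the computation's constants directly into circuitry.

```python
def make_multiplier(constant):
    # "Configuration time": specialize multiply-by-constant once,
    # as an FPGA would synthesize a circuit for this exact constant.
    def multiply(x):
        # "Run time": the general multiply-two-numbers machinery is gone;
        # only the specialized operation remains.
        return x * constant
    return multiply

times7 = make_multiplier(7)
print(times7(6))  # 42
```

The FPGA version of this trade is far more extreme, since the specialized "routine" is physical circuitry sized to both the task and the required data rate.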

©2001 SC2001
SC2001 is sponsored by the IEEE Computer Society and ACM