BLAST on the Open Science Grid

PTI staff from Indiana University’s High Throughput Computing group and the National Center for Genome Analysis Support have extended the Galaxy web portal to run large jobs on the NSF-funded Open Science Grid.

Research and discovery May 14, 2020

The popular web portal Galaxy, which life scientists use to analyze a wide array of sequence data, has been extended to allow users to submit BLAST jobs that run on the Open Science Grid (OSG).

A seemingly small change to a web page represents a major step forward in relieving a computational bottleneck that biologists and medical researchers encounter. The largest computational challenge facing life scientists is comparing new DNA, RNA, and protein sequences with known sequences to gain insight into the function of the new sequence. The most commonly used tool for this is a program known as BLAST. Genomic researchers often wait three weeks or more for BLAST to analyze a set of new sequences. This is slow, and it also ties up local computing resources.

In addition, scientists frequently use the popular Galaxy web portal to run smaller BLAST jobs alongside hundreds of other analytic tools, but have been forced to switch to different, more complex tools to run larger jobs. The OSG provides a place to run very large computational jobs in parallel, but life scientists found it unapproachable. Life scientists can now run BLAST on systems across the nation, thanks to a system by which the Galaxy web portal automatically breaks apart large BLAST jobs and submits them to the OSG. This relieves stress on local computational systems and has the added benefit of allowing jobs to complete more quickly.
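The article does not describe how Galaxy divides a large BLAST job, so the following is only a minimal Python sketch of one plausible approach: splitting the query FASTA file into fixed-size groups of sequences, each of which could be submitted as an independent job. The function names (parse_fasta, split_into_chunks, to_fasta) and the chunk size are hypothetical and are not taken from Galaxy or the OSG software.

```python
from typing import Iterable, Iterator, List, Tuple

# A record is (header without the leading '>', sequence with line breaks removed).
Record = Tuple[str, str]


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    seq_parts: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(seq_parts)
            header = line[1:]
            seq_parts = []
        elif header is not None:
            seq_parts.append(line)
    if header is not None:
        yield header, "".join(seq_parts)


def split_into_chunks(records: Iterable[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most chunk_size sequences (hypothetical policy)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk: List[Record] = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # The final chunk may hold fewer than chunk_size sequences.
        yield chunk


def to_fasta(chunk: Iterable[Record]) -> str:
    """Serialize one chunk back to FASTA text, one sequence per line."""
    return "".join(f">{header}\n{seq}\n" for header, seq in chunk)
```

Because each query sequence is searched against the database independently, every chunk produced this way could in principle run on a different machine, with the per-chunk results concatenated afterward.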

Figure 1. This image shows the web page researchers see when they use the NCGAS instance of the Galaxy web portal to analyze their next-generation DNA or RNA sequence data. The option to run BLAST on the Open Science Grid provides a major enhancement in functionality.

Project Leads: Robert Quick, Richard LeDuc, and Bill Barnett

High Throughput Computing & National Center for Genome Analysis Support, Science Community Tools Group, UITS Research Technologies 

NSF GSS Codes:
Primary Field: Genetics (610) - Genome Sciences/Genomics 
Secondary Field: Computer Science (401) Computer Systems Analysis
