Collaboration Between Biology and Research Technologies Develops Open-Source Genomics Tool

Biology and Computer Science Professor Matthew Hahn has developed a customizable tool to trace the evolution of gene expression

News and events Aug 29, 2023
Tomato varieties are one type of organism that can be classified phylogenetically, or by family type, by tracing mutations using a new software called CAGEE, developed by IU professor Matthew Hahn.

The study of an organism’s family tree, known as phylogenetics, can reveal biological connections spanning millions of years. In contemporary bioinformatics research, computers analyze genomes, genetic codes with millions of data points, in search of patterns and changes over thousands to millions of generations. Using a range of high-performance computing (HPC) infrastructure at Indiana University, including Slate-Scratch and the Quartz and Big Red 200 supercomputers, Matthew Hahn, Distinguished Professor of Biology and Computer Science at IU, has developed a tool to study changes in a gene’s level of expression.

Matthew Hahn, Distinguished Professor of Biology and Computer Science at Indiana University Bloomington

The software, called CAGEE (Computational Analysis of Gene Expression Evolution), was first used by IU Professor of Biology Leonie Moyle, whose lab studies wild tomatoes, to analyze gene expression in tomatoes and their ancestors. Hahn says that the software can handle genetic data from any species and could be used in a variety of genetic modeling research applications.

“The software is scalable, so it’s possible to study 10 species or 100, depending on your dataset,” said Hahn.

Leonie Moyle, professor of Biology at Indiana University

CAGEE solves a research problem in the field of genomics: combining multiple species into a single analysis. Because species are related in complex ways, each species cannot just be analyzed on its own, explained Hahn. CAGEE can handle the complexity of finding patterns of gene expression across small and large sets of species, making it a flexible option for biologists. “Researchers are trained to ask the right questions,” said Hahn. “CAGEE can give them answers faster than if they had to build their own tools from scratch.”

Hahn has been working on the interdisciplinary software needs of bioinformatics since he was a doctoral student at Duke. “Parts of the code for CAGEE were a part of my thesis in a program called CAFE,” he said. Several revisions later, Hahn entrusted Research Technologies (RT) developer Ben Fulton to rewrite CAFE (Computational Analysis of (gene) Family Evolution) from C into C++ to make it compatible with today’s research. CAGEE was developed from the codebase Fulton wrote for CAFE. “Software consulting for Dr. Hahn and his lab not only supports IU’s research footprint, but how research is conducted across biology—an exciting, but challenging project,” said Fulton.

Hahn says that over more than 20 years, his academic career has grown along with the computational infrastructure at IU. “I’ve used every system,” he said, crediting part of his work’s success to the expertise of engineers like Fulton and the Research Technologies team. “As a biologist, it is a great opportunity to collaborate with the skilled people supporting IU’s HPC resources,” said Hahn.

Learn more about Research Technologies software consulting services.

Check out the CAGEE software on GitHub.