A critical component of a successful genome sequencing project is discovering the genes contained within the genome. This step, called gene annotation, is particularly difficult. One approach to gene annotation is to sequence the RNA molecules found in the organism, assemble the reads into transcripts, and map these assembled transcripts back onto the newly assembled genome. This is what was done, with help from NCGAS, for the loblolly pine, which is at the center of a major multi-site sequencing effort. A paper detailing this work, “Unique Features of the Loblolly Pine (Pinus taeda L.) Megagenome Revealed Through Sequence Annotation,” was recently published in Genetics: http://www.genetics.org/content/196/3/891.
The loblolly pine is the most economically important tree in the United States and the source of most of the wood pulp used to produce paper products. Plant breeders will use a complete, annotated genome to develop strains of the tree optimized for different growing conditions or resistant to environmental or biological threats such as drought or disease. The project is particularly difficult because the loblolly pine genome is the largest plant genome yet sequenced, seven times larger than the human genome.
NCGAS bioinformatician Le-Shin Wu, working in close partnership with Indiana University faculty member Keithanne Mockaitis, provided bioinformatic assistance in running de novo RNA-sequence assemblies and technical support for the Mason cluster. NCGAS also provided computational resources specifically designed to support these sorts of compute jobs.
Figure 1. The loblolly pine is the most commercially important tree species in the US and the source of much of the paper manufactured here. It has the largest genome yet sequenced. Image source: Woodlot from Wikipedia.