What kinds of projects does the HAGSC do?


Genome improvement and finishing: We take draft shotgun genomes and fix them up into complete and accurate genome sequences. We work on algal and fungal genomes to bring them to completion within the limits of current technology and also work on large plant genomes to increase contiguity and finish the captured genomic sequence. These improved or finished genomes are referred to as reference genomes.
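Contiguity gains from genome improvement are often summarized with the N50 statistic. The sketch below is only an illustration of that common metric; the text above does not say which measure the HAGSC uses, and the contig lengths are made-up numbers.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L
    together contain at least half of the assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")

# Hypothetical contig lengths (in kb) before and after improvement.
draft = [100, 80, 50, 30, 20]
improved = [180, 50, 30, 20]  # the two largest draft contigs joined

draft_n50 = n50(draft)
improved_n50 = n50(improved)
```

Joining two contigs leaves the total assembled length unchanged but raises the N50, which is why the statistic is a convenient shorthand for contiguity.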

Genomic region-based sequencing: Once a QTL or genomic region responsible for a phenotype has been identified, the surrounding region must be sequenced so that the genomic sequence responsible for the trait can be identified. We sequence small regions, as in the case of stickleback or butterflies, as well as large regions, such as 5% of the soybean genome.

Whole genome shotgun assemblies (WGSA): We build genomic assemblies from shotgun sequenced reads and produce genomic sequence releases that have been screened for contamination and checked for quality and completeness. We build a variety of eukaryotic genome assemblies, but specialize in plants and difficult-to-assemble WGSAs.

BAC End Sequencing (BES): To increase the long range contiguity of large genome sequences, we sequence bacterial artificial chromosome (BAC) paired end sequences. These BES are difficult to sequence but they add vital 100-180 kilobase links to a WGSA. BES is also a useful sampling technique for comparing near-relative genomes to identify rearrangements and large variations.
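As a rough illustration of how paired BAC ends add long-range links, the sketch below counts how many BAC clones connect each pair of contigs. The clone names, contig names, and the `end_hits` layout are hypothetical; a real pipeline would also check read orientation and that the implied span fits the expected 100-180 kilobase insert size.

```python
from collections import Counter

def bac_links(end_hits):
    """Count BAC clones linking each pair of distinct contigs.

    end_hits maps a clone name to the pair of contigs hit by
    its two end reads (hypothetical input format).
    """
    links = Counter()
    for clone, (left_contig, right_contig) in end_hits.items():
        if left_contig != right_contig:
            # Sort so that (A, B) and (B, A) count as the same link.
            links[tuple(sorted((left_contig, right_contig)))] += 1
    return links

# Made-up alignments of BAC end pairs to contigs.
example_hits = {
    "bac001": ("contig1", "contig2"),
    "bac002": ("contig2", "contig1"),
    "bac003": ("contig3", "contig3"),  # both ends in one contig: no new link
    "bac004": ("contig2", "contig4"),
}
links = bac_links(example_hits)
```

Contig pairs supported by several independent clones are stronger candidates for joining than pairs supported by a single clone.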

Quality control sequences for WGSAs: We sequence BAC- and fosmid-based clones and then finish them to above the Bermuda standard so that they can be used to assess the accuracy and quality of the genomic sequences being produced by the DOE.

Expressed Sequence Tags (ESTs): EST sequencing allows one to survey the transcribed sequences (the gene sequences) of an organism. We sequence EST libraries so that we can identify commonly expressed genes and compare different life stages or different growth conditions of the same organism.

Full-length complementary DNA (FLcDNA) sequencing: In order to annotate a genome accurately (that is, to identify all of its genes), we need reference gene transcripts to train the gene identification algorithms. FLcDNA sequencing allows us to fully cover the coding region of a gene and to annotate the 5’ and 3’ untranslated regions of genes. Once the gene callers have been well trained using these reference transcripts, they can more accurately predict the structure of unknown transcripts.

SNP discovery and genetic mapping: We sequence targeted regions of genomes to identify variation relative to the reference strain, so-called SNPs (single nucleotide polymorphisms), which can cause functional changes. We also use PCR to amplify and sequence regions of mapping populations in order to build recombination-based genetic maps. These maps are crucial in ordering assembled genomic sections into full chromosomes, as they provide order and orientation information that is sequence independent and spans very large genomic distances.
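To make the ordering and orientation idea concrete, here is a minimal sketch that places scaffolds along one linkage group using mapped markers. The marker tuples, scaffold names, and the mean-position heuristic are assumptions for illustration, not a description of the HAGSC pipeline.

```python
from collections import defaultdict

def order_scaffolds(markers):
    """Order and orient scaffolds along a single linkage group.

    markers: list of (scaffold, position_bp, map_position_cM).
    Returns a list of (scaffold, orientation), where orientation is
    "+", "-", or "?" when the markers cannot decide it.
    """
    by_scaffold = defaultdict(list)
    for scaffold, pos_bp, cm in markers:
        by_scaffold[scaffold].append((pos_bp, cm))

    placed = []
    for scaffold, hits in by_scaffold.items():
        hits.sort()  # sort markers by physical position on the scaffold
        mean_cm = sum(cm for _, cm in hits) / len(hits)
        first_cm, last_cm = hits[0][1], hits[-1][1]
        if first_cm == last_cm:
            orientation = "?"  # single marker or no recombination signal
        elif last_cm > first_cm:
            orientation = "+"  # map position rises with scaffold position
        else:
            orientation = "-"  # scaffold runs against the map
        placed.append((mean_cm, scaffold, orientation))

    placed.sort()  # order scaffolds by mean genetic map position
    return [(scaffold, orientation) for _, scaffold, orientation in placed]

# Made-up markers on one linkage group.
markers = [
    ("scafB", 1000, 12.0), ("scafB", 90000, 10.5),
    ("scafA", 500, 1.0), ("scafA", 70000, 3.2),
    ("scafC", 2000, 20.0),
]
layout = order_scaffolds(markers)
```

A scaffold with only one mapped marker can be placed but not oriented, which is one reason dense maps matter for building full chromosomes.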