Dfam: sustainable growth, curation support, and improved quality for mobile element annotation

  • Hubley, Robert (PI)
  • Smit, Arian A.F. (CoPI)
  • Wheeler, Travis J. (CoPI)

Project: Research

Grant Details

Description

Project Summary / Abstract

Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes. Thorough and accurate annotation of repetitive content in genomes depends on a comprehensive database of known TEs, along with robust statistical and procedural methods for recognizing decayed instances of elements and disentangling their complex relationships. Annotation of TE instances is usually performed using our RepeatMasker software, which compares a genome to a database containing representations of known repeat families. These have historically been consensus sequences, which generally approximate the sequences of the original TEs. Our Dfam database is an open access collection of repetitive DNA families, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (HMM). We have demonstrated that profile HMMs support improved annotation sensitivity, and Dfam provides numerous aids to both curators of TE families and those who make use of the resulting annotations. During the life of this grant, the database has grown to include families belonging to more than 1000 species (from a baseline of 5). This growth has introduced a number of scale-based pressures, which in some cases have forced us to reduce Dfam functionality in response, and in other cases highlighted ways that the resource can better meet the needs of the community. Our proposed efforts largely target these matters while continuing to expand and diversify the resource.
Status: Active
Effective start/end date: 08/15/18 to 06/30/24

Funding

  • National Human Genome Research Institute: $665,174.00
  • National Human Genome Research Institute: $540,645.00
  • National Human Genome Research Institute: $600,717.00
  • National Human Genome Research Institute: $607,378.00
