Capacity: BBSRC-NSF/BIO: Globally harmonized re-analysis of Data Independent Acquisition (DIA) proteomics datasets enables the creation of new resources (DIA-eXchange)

Project: Research

Grant Details

Description

The research community has generated millions of datasets that were used to answer scientific questions of interest, and deposited those datasets into public data repositories as part of the mandated data sharing policies of funding agencies worldwide. Especially for datasets generated to measure the protein content of biological samples, substantial additional information can be extracted from these public datasets using newer techniques, newer versions of software, and newer reference information during re-analysis. This project will develop a software infrastructure for globally harmonized re-analysis of an emerging type of proteomics dataset to extract new information from older data. The infrastructure will improve our ability to turn proteomics data into knowledge about global relative protein abundances and about interactions between proteins inferred from how protein abundances are correlated. The project will also provide opportunities for students to learn skills in scientific data analysis. This project is an international collaboration with the Proteomics Team at EMBL-EBI (Hinxton, UK) and the Institute of Systems, Molecular and Integrative Biology (ISMIB) at the University of Liverpool (UoL; UK).

This project will make the datasets generated by the method known as data-independent acquisition (DIA) mass spectrometry proteomics more findable, accessible, interoperable, and reusable (FAIR). This will be accomplished by developing an indexing system for the libraries of mass spectra that are used to analyze the data from DIA experiments. The project will further develop benchmarks and guidelines to advance the field, and then develop data analysis pipelines to process thousands of public experiments at scale and make the results available to the research community. The resulting protein abundance maps across thousands of experiments will enable the development of protein interaction maps via inference from protein co-expression patterns extracted from these datasets, something that can only be accomplished by analyzing thousands of datasets in unison. Students will be given opportunities to participate in the work in order to build their skills, and all software and data products will be made publicly available to further advance the field.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Status: Active
Effective start/end date: 09/1/23 to 08/31/26

Funding

  • National Science Foundation: $1,200,698.00
