Genomic Data Commons provides one of the largest resources in cancer genomics

The National Cancer Institute’s Genomic Data Commons (GDC), launched in 2016 by then-Vice President Joseph Biden and hosted at the University of Chicago, has become one of the largest and most widely used resources in cancer genomics, with more than 3.3 petabytes of data from more than 65 projects and more than 84,000 anonymized patient cases, serving more than 50,000 unique users each month.

In new papers published on 22 February in Nature Communications and Nature Genetics, the UChicago-based research team shares new details about the GDC, which is funded by the National Cancer Institute (NCI) through a subcontract with the Frederick National Laboratory for Cancer Research, currently operated by Leidos Biomedical Research, Inc.

One of the papers describes the design and operation of the GDC. The other describes the pipelines the GDC uses to harmonize submitted data and to generate the data used by its research community.

The goal of the GDC is to provide the cancer research community with a repository of uniformly processed genomic and associated clinical data that enables data sharing and collaborative analysis in support of precision medicine.

Data production for the GDC began in June 2015 using a private cloud. After just one year, the GDC had analyzed more than 50,000 raw sequencing data submissions. The GDC includes genomic, transcriptomic, epigenomic, proteomic, clinical, and imaging data. The processing pipelines described in the Nature Communications paper have generated more than 1,660 TB of data on more than two dozen primary cancer types. These data are stored in the GDC Data Portal, where they are available for viewing and downloading.

Along with the data portal, the GDC offers additional user resources, including the GDC Data Analysis, Visualization, and Exploration (DAVE) Tools for interactive analysis of harmonized genomic data or specific mutations; the GDC Data Submission Portal for submitting data; the GDC Data Transfer Tool (DTT) for downloading large genomic data; and the GDC data harmonization system, which runs data submitted to the GDC through the harmonization pipelines.

“These data play a crucial role. As data accumulate, it will become easier to identify new markers as important targets for understanding cancer biology. In addition, the data sharing infrastructure can inform research studies, giving us new insights into genetic differences between individuals and how they may affect cancer patient outcomes.”

Robert Grossman, PhD, Principal Investigator, Genomic Data Commons, Director, Center for Translational Data Science, University of Chicago

Source:

University of Chicago Medical Center

Journal reference:

Zhang, Z., et al. (2021) Uniform genomic data analysis in the NCI Genomic Data Commons. Nature Communications. doi.org/10.1038/s41467-021-21254-9.
