Advancing Science through Data Sharing

Left: A black and white line drawing of Gutenberg’s printing press, showing a man pulling a lever on the press. Right: A photograph of a cloud server, showing an aisle of computer servers behind glass doors, illuminated by fluorescent lights. Photo credit: iStockphoto

Around 1440, Gutenberg invented the printing press, and since then science has been communicated in “papers.” However, current and future technologies for data storage and sharing offer new opportunities to enhance the value of meticulously collected data. For instance, such data might be useful in exploring analyses other than those reported in the Results section of a “paper.” They might be useful in combination with a different dataset collected by a post-doctoral researcher on the other side of the country, or even the world. A new discovery or hypothesis that comes years after a dataset was collected might motivate another look at the data, but only if they are findable and interpretable. There are also valuable datasets from rigorously designed experiments that are never published in a “paper” but could inform the community if they were available. Most importantly, the sheer size and complexity of some neuroscience datasets require that they be available to bioinformaticians or mathematicians, who can extract the patterns that reflect fundamental rules of biology. In each of these cases (and there are surely many more), this extended value will be realized only if the data and metadata are presented and explained well enough to be properly interpreted. Supporting this new data-rich era of science is of critical importance to both NIH and NINDS.

After soliciting broad public input, NIH has issued a final data management and sharing policy (DMS Policy), effective January 25, 2023, which will require all applicants for NIH research funding to describe, and comply with, a suitable data management and sharing plan. Some funding organizations already require data to be posted in accredited databases. The new NIH Policy includes the “expectation that researchers will maximize appropriate data sharing” and includes the “necessary flexibility for researchers to accommodate the substantial variety in research fields, projects and data types that this expectation will encompass.” It addresses timing as well: “scientific data should be made accessible as soon as possible and no later than the time of an associated publication or the end of the award/support period, whichever comes first.” It also calls for sharing of scientific data generated with NIH funding regardless of whether they are the basis of a publication. NIH funds for “personnel costs required to perform the data management and sharing activities” may be included in the grant’s budget.
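
The timing language quoted above amounts to taking the earlier of two dates. As a minimal sketch in Python, that outer limit could be computed as follows. The function name and the example dates are hypothetical illustrations, not part of the DMS Policy, and the result is only the latest acceptable date, since the policy asks for sharing "as soon as possible."

```python
from datetime import date
from typing import Optional


def latest_sharing_date(award_end: date, publication: Optional[date] = None) -> date:
    """Return the latest date by which data should be accessible.

    Hypothetical helper: encodes "no later than the time of an associated
    publication or the end of the award/support period, whichever comes first."
    """
    if publication is None:
        # No associated publication: the end of the award period is the limit.
        return award_end
    # Otherwise the earlier of the two dates is the limit.
    return min(publication, award_end)


# Illustrative dates only (assumed for the example).
award_end = date(2026, 6, 30)
print(latest_sharing_date(award_end, date(2025, 3, 1)))
print(latest_sharing_date(award_end))
```

Handling the no-publication case explicitly reflects the policy's point that data should be shared whether or not they underlie a publication.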

Looking ahead to 2023, NINDS is examining how to maximize the value and reuse of data generated by our programs, as well as training the workforce to use those data and ensuring data are managed with protections for security and privacy. Data sharing is just one of the cross-cutting strategies included in the recently released NINDS Strategic Plan, and we are working to develop the necessary infrastructure and resources to take advantage of data science and foster the sharing of high-value data among the research community. Data science methods are essential across the full spectrum of basic, translational, and clinical studies. As technology continues to develop rapidly and analysis tools become increasingly powerful, the effort involved in creating, curating, harmonizing, storing, accessing, and reusing neuroscience data will continue to grow. Accordingly, NINDS will continue to work proactively to realize efficiencies and streamline adoption of these practices across the scientific workforce.

These data science efforts align with broader data activities at NIH, which include the work of the Office of Data Science Strategy (ODSS). ODSS leads implementation of the NIH Strategic Plan for Data Science by working closely with NIH institutes, centers, and offices to foster scientific, technical, and operational collaboration. At the most recent NINDS Advisory Council meeting, ODSS Director Dr. Susan Gregurick spoke about creating an integrated cloud-based ecosystem through NIH Cloud Platform Interoperability and private partnerships; collaborating with the National Science Foundation to accelerate innovations in computer science, information science, and engineering; and activities to address workforce needs, ethics, bias, and transparency for artificial intelligence in biomedicine. Collectively, NIH is considering the best ways to share the vast amount of data meaningfully and productively across the biomedical research enterprise, because wise stewardship and implementation approaches are needed to strengthen and advance NIH-funded science.

Toward this end, NINDS, led by Dr. Lyn Jakeman, Director of our Division of Neuroscience, is developing a Data Science Plan to guide the Institute in establishing data sharing principles, policies, infrastructure, and resources to maximize the opportunities, value, and cost-effectiveness of its research investments and of people’s participation in clinical research. Subcommittees of NINDS staff are carefully considering several broad areas of data science, as framed by the NIH Strategic Plan for Data Science: building infrastructure where needed to support FAIR data (i.e., findable, accessible, interoperable, and reusable), uphold the TRUST principles, and facilitate user access; supporting development of tools and resources for curation, harmonization, and novel analyses; identifying and developing training opportunities in data science for researchers, clinicians, and NINDS staff; and managing ethical and sustainable stewardship of NINDS-supported datasets and resources. After an initial phase of defining goals and identifying opportunities for NINDS data science, NINDS anticipates seeking input from the community on these goals and opportunities and then refining the plan before its implementation in 2023. Alongside this planning effort, various activities are underway to build programs and resources that support the collection and curation of high-quality data; engage with communities to develop and use data standards; facilitate data sharing and enable secondary analyses; and build and sustain partnerships that expand the reach of these data science efforts. Currently, there are over ten active NINDS-supported or co-sponsored funding opportunities to enhance data science and its ecosystem, demonstrating its importance and need within the scientific enterprise.

A critical component of a robust data science ecosystem, especially for current and future human research studies, will be providing opportunities to collect and link research and clinical datasets and high-quality biospecimens. Ethical and responsible research in this area requires informed consent practices that provide research participants with accessible and accurate information about the collection and future use of their data and biospecimens, clearly defining the constraints on access and use by others. In addition to federal, state, and local regulations on informed consent, NIH has developed sample language that may be used in informed consent when data and biospecimen sharing may occur, as well as “points to consider” for investigators and Institutional Review Boards. Through a Request for Information (RFI), NIH encourages the public’s input to help refine this draft sample language. For more information and to see the sample language and “points to consider,” please view the full RFI. Please submit comments and responses electronically by September 29, 2021, using this form.

The neuroscience landscape is rapidly evolving, research costs are increasing, and our data science activities and infrastructure must adapt and keep pace for maximal impact. NIH and NINDS will continue to invite public input as we advance data sharing and data science more broadly.