If you build it, the data will not only come; they will be standardized, accessible, and reusable by researchers around the world. With this in mind, the National Institutes of Health (NIH) is creating its Data Commons initiative and recently invited University of Miami investigators to contribute their expertise.
“There is a big effort at the NIH now to bring biomedical research into the data science era. Data science including machine learning is all around us—like recommendation engines for shopping or movies, image and voice recognition, language translation, even the news you see on social media, and your personalized internet search results,” said Stephan Schürer, PhD, program director, Drug Discovery, University of Miami Center for Computational Science, and associate professor of molecular and cellular pharmacology at the Miller School of Medicine.
As in all these areas, more and more data are being generated in biomedical science. The NIH and other organizations already maintain large data repositories, but the challenge is not only the sheer volume of data. There is a growing need for an open data-sharing platform that makes all the data truly usable for the community and that facilitates and accelerates important discoveries in medicine.
Schürer and colleagues will contribute to building a foundation for this new data ecosystem. The NIH wants to ensure it is F.A.I.R., meaning the data are Findable, Accessible, Interoperable and Reusable. These principles are already widely endorsed by research organizations, funders, and publishers.
“We need a way to assess how open and standardized the data are. Our group is building tools to formalize the process of evaluating the ‘FAIRness,’ and potentially, at a later stage, to improve it,” Schürer said. “We are thinking strategically. How can we build a FAIR data submission system? Which tools are good starting points?”
Ensuring that research teams receive credit for their data is another important goal of the initiative, Schürer said. Traditionally, many researchers protect their findings prior to publication. “This is a key concern and a very important reason for people not sharing data now,” he added. However, the new system will be designed so people will cite the sources of data they use. “As that gets accepted in the scientific community, data themselves can become more like a publication, and there will be more incentive to share them. It’s important. When people generate data, often with significant effort and resources, they should get credit for that. When data get widely shared, it also has another great benefit of likely improving data quality.”
It is important for the University of Miami to be at the forefront of national projects such as the NIH Data Commons, Schürer said. In joining the consortium, UM investigators will be working alongside researchers from Harvard Medical School, University of Chicago, University of California at Davis, and other leading institutions.
“For us, it’s about advancing data science research. It’s a great project to be a part of,” Schürer said.
The project also moves beyond the ‘N’ in NIH. Its scope is international, as the consortium includes the University of Oxford in the U.K. and Maastricht University in the Netherlands. There are parallel efforts in Europe to build data FAIRness and standardization infrastructure. Adding leading European experts to the consortium will help connect to their systems as well and eventually achieve global data FAIRness.