10/9/2014
The University of Illinois at Urbana-Champaign will be the home of a National Institutes of Health (NIH) Center of Excellence for Big Data Computing, part of a wide-ranging effort to develop new strategies to analyze and leverage the explosion of increasingly complex biomedical data sets, often referred to as Big Data. These NIH multi-institute awards constitute an initial investment of nearly $32 million in fiscal year 2014 by NIH’s Big Data to Knowledge (BD2K) initiative to establish 12 centers that will each tackle specific data science challenges.
“Data creation in today’s research is exponentially more rapid than anything we anticipated even a decade ago,” said NIH Director Francis S. Collins. “Mammoth data sets are emerging at an accelerated pace in today’s biomedical research and these funds will help us overcome the obstacles to maximizing their utility. The potential of these data, when used effectively, is quite astounding.” NIH’s Big Data to Knowledge (BD2K) initiative is projected to have a total investment of nearly $656 million through 2020, pending available funds.
With the advent of transformative technologies for biomedical research, such as DNA sequencing and imaging, biomedical data generation is exceeding researchers’ ability to capitalize on the data. The new Center is a collaboration between the University of Illinois, a recognized world leader in computational science and engineering, and the Mayo Clinic, one of the world’s leading clinical care and research organizations. It will be based at the UIUC Institute for Genomic Biology, which has state-of-the-art facilities and a nationally recognized program of multidisciplinary team-based genomic research.
“Physicians and biologists are now routinely producing very large, genome-wide datasets,” explained Jiawei Han, a professor of computer science (CS) who will lead the research effort at Illinois. “The Center will leverage the latest computational techniques used to mine corporate or Internet data to enable the intuitive analysis and exploration of biomedical Big Data.” Saurabh Sinha (CS), Jun Song (bioengineering and physics), and Richard Weinshilboum of the Mayo Clinic are co-PIs for the project. Victor Jongeneel, director of Bioinformatics and of HPCBio (High-Performance Biological Computing) at the Institute for Genomic Biology (IGB) and a senior research scientist at the National Center for Supercomputing Applications, will serve as the Center’s executive director, and IGB Director Gene Robinson will play a key role in one of the subprojects.
Challenges in making the best use of such biomedical information are many. They include problems of locating data and the appropriate software tools to access and analyze them, lack of data standards for many types of data, and the low adoption of data standards across the research community. There is also a need for new policies to facilitate data sharing while protecting privacy.
The BD2K awards will support the development of new approaches, software, tools, and training programs to improve access to these data and the ability to make new discoveries using them. Investigators hope to explore novel analytics to mine large amounts of data, while protecting privacy, for eventual application to improving human health. Examples include an improved ability to predict who is at increased risk for breast cancer, heart attack, and other diseases and conditions, and better ways to treat and prevent them.
“The future of biomedical research is about assimilating data across biological scales from molecules to populations,” said Philip E. Bourne, NIH associate director for data science. “As such, the health of each one of us is a big data problem. Ensuring that we are getting the most out of the research data that we fund is a high priority for NIH.” In calling for the establishment of a “digital ecosystem” for biomedical research, Bourne said that the new BD2K programs are at the forefront of NIH’s efforts to increase the efficiency and cost effectiveness of scientific discovery.
The four main components of the new BD2K awards are:
- Centers of Excellence for Big Data Computing. These 11 centers will develop innovative approaches, methods, software, tools and other resources. While the development efforts will focus on specific research questions, their output is expected to be more generally relevant to various aspects of big data science, such as data integration and use, analysis of genomic data and managing data from electronic health records.
- BD2K-LINCS Perturbation Data Coordination and Integration Center. This center will be a data coordination center for the NIH Common Fund’s Library of Integrated Network-based Cellular Signatures (LINCS) program, which aims to characterize how a variety of types of cells, tissues and networks respond to disruption by drugs and other factors. The center will support data science research focusing on interpreting and integrating LINCS-generated data from different data types and databases in the LINCS-funded projects. This center is co-funded by BD2K and the NIH Common Fund.
- BD2K Data Discovery Index Coordination Consortium (DDICC). This program will create a consortium to begin a community-based development of a biomedical data discovery index that will enable discovery, access and citation of biomedical research data sets.
- Training and Workforce Development. These awards support the education and training of current and future generations of researchers who will specialize in data science fields, as well as those whose work may require certain expertise in the use of or generation of large amounts of data and data resources.
The BD2K initiative, launched in December 2013, is a trans-NIH program with funding from all 27 institutes and centers, as well as the NIH Common Fund. NIH’s effort is being developed in the context of a number of related projects elsewhere in the world, including those under development in the United Kingdom and Australia, and by the European Union. There is great interest in communication and collaboration among those involved in this international effort to enable scientists around the world to contribute to advances in understanding health and disease, and ultimately to improve diagnoses, treatment and prevention.