Students, faculty and researchers across the Midwest and beyond will gain critical access to new research data through a cloud-based platform whose construction has been made possible under a large-scale partnership led by the IU Libraries and IU Network Science Institute.
A $2 million project to create a secure online database for academic resources, the Shared BigData Gateway for Research Libraries has been awarded nearly $850,000 from the Institute of Museum and Library Services, the primary federal funding agency supporting the nation's libraries and museums. Additional support comes from eight other universities in the Big Ten; the Big Ten Academic Alliance; the National Science Foundation's Big Data Regional Innovation Hubs program; and two private companies: Clarivate Analytics and Microsoft Research.
"This project exemplifies the role of libraries in the information age," said Jamie Wittenberg, research data management librarian and head of scholarly communication at IU Libraries, who will direct the project. "Our mission is to efficiently and effectively connect researchers with the materials they need to advance innovation and discovery. The Shared BigData Gateway for Research Libraries will open up the power of data mining to everyone, not only people with specialized expertise."
"The combination of technical expertise and investments represented under this partnership will support a cyberinfrastructure that advances research across the Midwest and beyond," added Patricia Mabry, a senior research scientist at the IU Network Science Institute and a co-director on the project. "We're also taking steps to support the effort through workshops that cultivate a community of researchers and librarians who will cooperatively play a role in the project's future development and growth."
The university partners are Michigan State University, Purdue University, University of Iowa, University of Michigan, University of Minnesota, The Ohio State University, Pennsylvania State University and Rutgers University. Additional project co-directors are Valentin Pentchev, director of information technology at the IU Network Science Institute, and Xiaoran Yan, an assistant research scientist at the institute.
IU currently offers access to some of the resources that will open up to new partners through a system developed by the IT team at the IU Network Science Institute. Led by Pentchev, the Secure Enclave for Critical Data is the nation's first universitywide implementation of the entire Clarivate Analytics Web of Science, a private database with over 68 million records spanning more than 100 years.
The strength of the university's secure system, a groundbreaking work of software engineering, was a key factor in garnering grant support from IMLS. The award will make possible the use of cloud technology to scale out the enclave with additional open data sets, extending access to every research library in the country.
The first new materials to be added to the Shared BigData Gateway are a copy of the records of the U.S. Patent and Trademark Office, which contain data on publicly available patents and intellectual property, and the Microsoft Academic Graph, a public database of 160 million scientific records.
Access to these resources will be based on a federated security system that will enable users from multiple organizations to access the system with their institutional usernames and passwords. Members of the Big Ten Academic Alliance will use the gateway to access and mine shared Clarivate XML citation data, purchased cooperatively in 2017. Some data sets will be accessible to anyone with a .edu email address.
The ability to deeply analyze connections between these texts will support bibliometric research, a growing field that plumbs the world's increasingly large and complex databases to reveal the underlying structural forces that affect the production of scientific knowledge. This work -- often called the "science of science" -- has shed light on a wide range of subjects. For example, bibliometric analysis has helped reveal the depth of women's historical contributions to science and the influence of large-scale historical events on research activity.
In addition to data access, the Shared BigData Gateway will provide a user-friendly "front door" through which the partner institution members can request bibliometric analysis of data in the system through an online form. The project will automate many complex and time-consuming tasks that were previously required to conduct this research.
Another important feature of the system is the power to share data. Individuals who use the platform will be able to share not only the results of their analyses but also the software code, algorithms, workflows, methods, and the specific software versions and configurations used to run their analyses. This is critical for making the work reproducible -- and for helping the original researchers refine their methods for other projects.
Also contributing expertise to the project will be IT experts at Microsoft, Clarivate and several units at IU, including the Research Data Services group; Science Gateways Research Center; Pervasive Technology Institute; and University Information Technology Services, or UITS. UITS will also contribute to the Shared BigData Gateway through access to the university's supercomputing resources and cloud-computing platform, Jetstream.
"As centers of learning and catalysts of community change, libraries and museums connect people with programs, services, collections, information and new ideas in the arts, sciences and humanities," said Kathryn K. Matthew, director of the IMLS. "IMLS is proud to support their work through our grant-making as they inform and inspire all in their communities."
"Co-investment in infrastructure accelerates our ability to recognize and support new forms of inquiry in scholarship," said Kimberly Armstrong, director of the libraries initiative at the Big Ten Academic Alliance. "Given faculty interest in text mining of the cooperatively purchased citation data, we are glad to support delivery of a tested access solution across the Alliance."