INDIANAPOLIS — Indiana University School of Medicine researchers are leading a multi-site study to use a privacy-preserving artificial intelligence approach, called federated learning, to improve breast cancer risk prediction and reduce health inequities in cancer prevention care. A new five-year, $3.7 million grant from the National Institutes of Health’s National Cancer Institute is funding the research.
“We will have an AI methodology that can contribute to the future of women’s health,” said Spyridon Bakas, director of the Division of Computational Pathology at the IU School of Medicine and principal investigator of the project. “Federated learning is a novel paradigm for multi-site collaborations like this, because it allows access to ample and importantly diverse data that are essential to developing robust models, without sharing patient data across sites. This grant is allowing us to leverage the technology of federated learning and develop an improved breast cancer risk assessment model that aims at breast cancer prediction and will translate across multiple patient populations.”
Federated learning is a mechanism to collaboratively train complex AI models using data that remains decentralized, meaning it never leaves the corresponding institution, thereby increasing data privacy. Bakas said this creates more trust and mitigates patient privacy concerns.
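Federated averaging (FedAvg) is one widely used way to carry out the kind of decentralized training described above. The release does not say which aggregation method this project uses, so the following is only a minimal sketch under stated assumptions: the sites, their records and the toy one-variable linear model (standing in for a real risk-prediction model) are all hypothetical. It shows the core idea: each site trains on its own data and returns only model parameters, which a coordinator averages, weighting each site by its number of samples.

```python
"""Minimal federated averaging (FedAvg) sketch.

Hypothetical illustration only: the sites, data and model are made up and
this is not the project's actual training pipeline.
"""
from dataclasses import dataclass


@dataclass
class Site:
    """A participating institution; its raw records never leave this object."""
    name: str
    xs: list[float]
    ys: list[float]

    def local_update(self, w: float, b: float,
                     lr: float = 0.05, epochs: int = 20) -> tuple[float, float, int]:
        """Run gradient descent on local data, starting from the global model.

        Only the updated parameters and the local sample count are returned.
        """
        n = len(self.xs)
        for _ in range(epochs):
            # Gradients of mean squared error for the toy model y = w * x + b.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(self.xs, self.ys)) / n
            grad_b = sum(2 * (w * x + b - y) for x, y in zip(self.xs, self.ys)) / n
            w -= lr * grad_w
            b -= lr * grad_b
        return w, b, n


def federated_averaging(sites: list[Site], rounds: int = 50) -> tuple[float, float]:
    """Coordinator loop: send out the global model, collect each site's
    locally trained parameters and average them, weighted by sample count."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        updates = [site.local_update(w, b) for site in sites]
        total = sum(n for _, _, n in updates)
        w = sum(uw * n for uw, _, n in updates) / total
        b = sum(ub * n for _, ub, n in updates) / total
    return w, b


if __name__ == "__main__":
    # Three hypothetical sites whose private points all lie on y = 2x + 1.
    sites = [
        Site("site_a", [0.0, 1.0, 2.0], [1.0, 3.0, 5.0]),
        Site("site_b", [3.0, 4.0], [7.0, 9.0]),
        Site("site_c", [0.5, 1.5, 2.5, 3.5], [2.0, 4.0, 6.0, 8.0]),
    ]
    w, b = federated_averaging(sites)
    print(f"w={w:.2f}, b={b:.2f}")  # prints: w=2.00, b=1.00
```

The coordinator only ever sees `(w, b, n)` tuples, never the sites' `xs` and `ys`. Real deployments add safeguards on top of this, since model updates alone can still leak some information about the underlying data.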
The other institutions participating in this collaborative study are the Mayo Clinic, Washington University in St. Louis, the University of Pennsylvania and Columbia University. Each site will provide de-identified data from patients who undergo 3D digital breast tomosynthesis, a method of breast cancer screening that is becoming more widely used than the traditional 2D digital mammogram. Researchers will use federated learning to analyze and learn from the data acquired at all participating sites, then create an open-source AI model with the goal of predicting breast cancer risk while gaining knowledge from diverse patient populations.
Breast cancer is the second-leading cause of cancer death in women. The data used in the project will come from patients who undergo breast cancer screening, some of whom will develop cancer over time and some of whom will not.
Across five years, the goals of the project include:
- Developing breast cancer risk assessment models that leverage multi-site, ethnically diverse data of women undergoing breast screening.
- Improving these initial models by including additional geographically diverse sites.
- Generating realistic synthetic imaging data matching each site’s local patient population characteristics and using them for data augmentation and privacy preservation.
- Creating an automated mechanism for quantitative and interpretable determination of optimal privacy preservation in health care AI models.
“The goal of our models will be to predict when and if a woman will develop breast cancer much earlier and assess their risk of developing breast cancer in the future,” Bakas said. “We’re focused more on the prediction rather than diagnosis, and being proactive rather than reactive.”
The researchers are also focusing on creating AI models that account for health disparities and health inequities, as many patients don’t have access to a comprehensive health system. “These models usually cannot be trained in a community hospital setting because they don’t have the resources,” Bakas said. “With federated learning, we gain this knowledge from diverse populations and then we distribute the AI model around to other community settings for application.
“Our overarching goal for this study is an easy-to-use, translatable, trustworthy federated learning framework, lowering the barrier for underserved populations to participate in large-scale federated learning studies and benefit from such technological advancements, thus paving the way toward addressing health disparities.”
Other study co-investigators include Despina Kontos of Columbia University, Celine Vachon of the Mayo Clinic, Aimilia Gastounioti of Washington University, Anne Marie McCarthy of the University of Pennsylvania and Prashant Shah of Intel.
What they’re saying:
“The era of digital transformation in health care has been fueled by open-source software tools developed by the scientific community. These tools have not only democratized AI by making it accessible but also encouraged health care researchers to explore how reproducibility and robustness can positively impact patient outcomes. By adopting open-source tools and making our trained AI models publicly available, we are fostering a culture of collaboration and transparency essential for innovation, while building a better future together.” — Sarthak Pati, software architect at Indiana University
“Federated learning enables models to learn from restricted data from numerous collaborators while overcoming data ownership, privacy and regulatory concerns. It allows the collection of meaningful amounts of data for rare diseases that can help create robust machine-learning models that work on diverse populations, reducing health disparities and inequalities.” — Prashant Shah, head of artificial intelligence at Intel Health and Life Sciences