By Dave DeFusco
Hieu (Henry) Ngo, a postdoctoral researcher at the Katz School of Science and Health, took center stage at the Carnegie Mellon x NVIDIA Federated Learning Hackathon for Biomedical Applications, where he helped design a dashboard that lets researchers explore and harmonize sensitive biomedical data across multiple biobanks without moving private patient records.
Over three intense days in January, Ngo led his team in turning complex, distributed data into clear, actionable insights, contributing to one of 10 projects that showcased the power of privacy-preserving artificial intelligence in advancing biomedical science.
The hackathon brought together more than 100 researchers, postdoctoral fellows, and industry professionals for a collaborative push to solve one of biomedical research’s biggest challenges: how to learn from sensitive data without compromising privacy. At the heart of this effort was federated learning, a method that trains AI models across multiple locations without sharing the raw data itself.
“Data in healthcare, like hospital records or clinical trial results, can’t always be shared because of privacy rules,” said Ngo. “Federated learning allows you to train a model locally, then only share the results, keeping patient information safe while still using large amounts of data.”
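To make the idea concrete, here is a minimal sketch of federated averaging, one common way to train locally and share only results. It is an illustration, not the code used at the hackathon: the three sites, their synthetic data points, the simple linear model and the function names are all assumptions made for this example.

```python
# Illustrative sketch of federated averaging (FedAvg) in plain Python.
# All sites, data points and function names are hypothetical.

def local_update(weights, data, lr=0.1, epochs=20):
    """Fit y ~ w*x + b by gradient descent on one site's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates, sizes):
    """Combine site models, weighting each by its number of records."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return (w, b)


def train_federated(sites, rounds=50):
    """Each round, sites train locally; only (w, b) pairs reach the server."""
    weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in sites]
        weights = federated_average(updates, [len(d) for d in sites])
    return weights


# Synthetic records that follow y = 2x + 1; they stay inside each "site".
site_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
site_b = [(0.5, 2.0), (1.5, 4.0)]
site_c = [(0.2, 1.4), (1.8, 4.6), (1.2, 3.4)]

final_w, final_b = train_federated([site_a, site_b, site_c])
print(final_w, final_b)
```

In this toy setup the shared model should settle close to a slope of 2 and an intercept of 1, even though the coordinating step only ever sees each site's two model parameters and never the underlying records.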
Ngo joined the visualization and harmonization track, where his team focused on making distributed biobank data easier to understand and use. They created a dashboard that allowed researchers to explore what data was available across multiple biobanks, identify overlapping variables, harmonize inconsistent naming conventions and evaluate whether datasets were “federated-ready” for secure modeling.
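A rough sketch of that kind of harmonization check might look like the following. It is not the team's dashboard code: the biobank names, variable lists and synonym table are invented for illustration, and the rule used here for "federated-ready" (a site holds every required variable) is an assumption, since the article does not define the term.

```python
# Illustrative sketch only: catalogs, synonyms and the readiness rule
# below are hypothetical, not taken from the hackathon project.

# Hypothetical table mapping site-specific names to one shared name.
SYNONYMS = {
    "gender": "sex",
    "age_years": "age",
    "age_at_enrollment": "age",
    "body_mass_index": "bmi",
}


def normalize(name):
    """Lowercase, unify separators, then apply the synonym table."""
    key = name.strip().lower().replace(" ", "_").replace("-", "_")
    return SYNONYMS.get(key, key)


def harmonize(catalog):
    """Map each biobank's raw variable names to harmonized names."""
    return {site: {normalize(v) for v in names}
            for site, names in catalog.items()}


def shared_variables(harmonized):
    """Variables available at every biobank after harmonization."""
    sets = list(harmonized.values())
    return set.intersection(*sets) if sets else set()


def federated_ready(harmonized, required):
    """Assumed rule: a site is ready if it has all required variables."""
    return {site: required <= names for site, names in harmonized.items()}


catalog = {
    "biobank_a": ["Age", "Sex", "BMI", "smoking_status"],
    "biobank_b": ["age_years", "gender", "Body Mass Index"],
    "biobank_c": ["Age at enrollment", "sex", "hba1c"],
}

harmonized = harmonize(catalog)
overlap = shared_variables(harmonized)
ready = federated_ready(harmonized, {"age", "sex", "bmi"})
print(sorted(overlap))
print(ready)
```

Note that only variable names are compared in this sketch, so a check like this could run on metadata alone without touching patient-level records.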
“We wanted to make the data interpretable,” said Ngo. “Before training a model, researchers can explore the data. After training, the dashboard lets them compare model performance and see which teams are working on which parts of the data. It helps everyone collaborate more effectively.”
The hackathon posed both technical and collaboration challenges. Actual biobank data could not be shared due to privacy concerns, so teams worked with publicly available and synthetic datasets to simulate federated learning.
“One of the goals was to show how federated learning can move the needle, and hopefully encourage biobanks to release access to their data in a privacy-preserving way,” said Ngo.
Working alongside experts from NVIDIA, other universities and medical research organizations exposed Ngo to new tools, techniques and perspectives. “I learned about specialized software from NVIDIA and also got insight into working with genomic data, which is a bit different from my focus on clinical trials,” he said. “Even though the applications differ, the methods, like clustering and disease subtyping, are very similar. It was a great way to see how others solve problems in the field.”
The hackathon culminated in 10 completed projects. All team manuscripts were compiled into a shared preprint, and the codebases were made publicly available on GitHub. Ngo emphasized the broader impact, saying the resources will support future collaboration and allow researchers to explore large biomedical datasets safely.
“The preprint will be presented to biobanks, showing researchers how federated learning can protect privacy while enabling research,” he said.
Reflecting on the experience, Ngo highlighted both scientific and personal growth. “I learned how federated learning is developing for biomedical research and what problems people are solving,” he said. “Personally, I gained hands-on experience with new tools and methods, and it opened my eyes to what’s possible in this field.”