MLCommons announces MedPerf, a new benchmark for validating medical AI models

The Medical Working Group of the open machine learning consortium MLCommons today announced the availability of a new and open benchmarking platform called MedPerf.

MedPerf matters, the group said, because it enables medical AI models to be validated on diverse, real-world healthcare data without exposing that data. The group hopes MedPerf will help to “catalyze wider adoption of medical AI,” resulting in more efficient and cost-effective clinical practices.

MLCommons is a collaborative engineering organization focused on developing the AI ecosystem through benchmarks, public datasets and research. It’s best known for its MLPerf benchmarks, which are widely used across the industry to measure the training and inference performance of machine learning systems.

In an upcoming article in Nature Machine Intelligence, the Medical Working Group of MLCommons explains that medical AI has tremendous potential to advance healthcare by supporting and contributing to the evidence-based practice of medicine, personalizing patient treatment, reducing costs and improving both healthcare provider and patient experiences. However, one of the biggest challenges in unlocking this potential is the need for a systematic, quantitative method for evaluating the performance of AI models on large-scale, heterogeneous datasets that can capture a diverse range of patient populations.

MedPerf has been built to address this challenge, and the group claims it offers several benefits to the medical community. First, it provides a consistent, rigorous methodology for quantitatively evaluating how medical AI models perform in real-world applications.

MedPerf also provides researchers with a technical approach to quantify model generalizability across institutions, with full data privacy and protection of each model’s intellectual property, by ensuring that any data used never leaves the healthcare provider’s systems. In addition, its collaborative design method supports a neutral and scientific approach to clinical validation of AI, while simultaneously illuminating use cases where superior AI models can improve clinical efficiency.
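The idea of evaluating a model where the data lives can be illustrated with a minimal Python sketch. This is not MedPerf's actual API: the `Site` class, the `federated_evaluation` and `accuracy_gap` functions, and the use of plain accuracy as the metric are all hypothetical names and choices made for illustration. The sketch assumes each institution runs the model locally and shares only aggregate metrics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types for illustration; none of these names come from MedPerf.
Model = Callable[[Sequence[float]], int]  # maps a feature vector to a predicted label


@dataclass
class Site:
    """A healthcare provider whose records stay inside this object."""
    name: str
    records: List[Tuple[Sequence[float], int]]  # (features, true label) pairs

    def evaluate(self, model: Model) -> Dict[str, float]:
        """Run the model locally and return only aggregate metrics."""
        if not self.records:
            raise ValueError(f"site {self.name!r} has no records to evaluate on")
        correct = sum(1 for features, label in self.records if model(features) == label)
        return {"n": len(self.records), "accuracy": correct / len(self.records)}


def federated_evaluation(model: Model, sites: List[Site]) -> Dict[str, Dict[str, float]]:
    """Send the model to every site and collect per-site metrics.

    Only the metric dictionaries are returned; raw records are never copied out.
    """
    return {site.name: site.evaluate(model) for site in sites}


def accuracy_gap(results: Dict[str, Dict[str, float]]) -> float:
    """A crude generalizability signal: best minus worst per-site accuracy."""
    accuracies = [metrics["accuracy"] for metrics in results.values()]
    return max(accuracies) - min(accuracies)


def threshold_model(features: Sequence[float]) -> int:
    """Toy stand-in for a medical AI model: predicts 1 if the first feature exceeds 0.5."""
    return 1 if features[0] > 0.5 else 0
```

Calling `federated_evaluation(threshold_model, sites)` on a list of `Site` objects yields one metrics dictionary per institution, and a large `accuracy_gap` would suggest the model does not generalize evenly across them. A real deployment would also need to protect the model itself and authenticate each site, which this sketch does not attempt.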

MLCommons said its existing benchmarks have had a positive impact on AI development in multiple industries, and it believes a comparable benchmark for medical AI will accelerate development and adoption in healthcare, in part by giving developers a way to better serve underrepresented patient populations.

“MedPerf aims to advance research related to data utility, model utility, robustness to noisy annotations, and understanding of model failures,” MLCommons explained. “If a critical mass of AI researchers adopts these benchmarking standards, healthcare decision makers will see substantial benefits from aligning with this effort to increase benefits for their patient populations.”

MedPerf has already been validated in multiple settings, including a key use case in the Federated Tumor Segmentation Challenge and four academic pilot studies.

Image: Freepik
