Mar 28, 2026

MUHAMMAD GHIFARY, ABIL SUDARMAN

This article was originally published at https://ai4medresearch.github.io/blog/2026/medgemma1-5/

In the rapidly evolving landscape of healthcare AI, the transition from massive, cloud-dependent models to specialized, on-device intelligence is not just a trend—it's a clinical necessity. Medical data is inherently sensitive, and the requirements for privacy (HIPAA compliance), zero-latency reasoning, and offline accessibility in remote or high-security environments are paramount.

In this post, we dive into how to bring state-of-the-art medical multimodal intelligence directly to the edge. By converting MedGemma 1.5 4B to the specialized LiteRT-LM (.litertlm) format, we unlock the ability to perform complex clinical analysis, including MRI interpretation and EHR question answering, entirely within a local web browser or on a mobile device.

1. The Rise of On-Device Medical Intelligence

Traditional medical AI often relies on sending high-resolution scans and patient records to powerful GPU clusters in the cloud. While effective, this approach introduces significant bottlenecks:

- Privacy and compliance: sensitive patient data must leave the device, raising HIPAA and data-residency concerns.
- Latency: every query pays a network round trip, which gets in the way of real-time clinical reasoning.
- Availability: remote clinics and high-security environments often have limited or no connectivity.

On-device intelligence solves these problems by performing inference where the data is born. With the release of Google's Gemma 3 architecture and its medical sibling, MedGemma 1.5, the "edge" is now powerful enough to handle 4-billion-parameter multimodal models.
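To see why a 4-billion-parameter model has become edge-feasible, a quick back-of-the-envelope calculation of the weight-only memory footprint at common precisions is instructive. This is illustrative arithmetic, not MedGemma's actual on-disk size, which also depends on the tokenizer, metadata, and runtime buffers such as the KV cache:

```python
# Rough weight-only memory footprint of a 4B-parameter model.
# Illustrative arithmetic only; real deployments add tokenizer assets,
# metadata, and runtime buffers (e.g. the KV cache) on top of this.
PARAMS = 4_000_000_000

def weight_gib(params: int, bits_per_param: int) -> float:
    """Raw weight storage at a given precision, in GiB."""
    return params * bits_per_param / 8 / 2**30

for name, bits in [("float32", 32), ("bfloat16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name:>8}: ~{weight_gib(PARAMS, bits):.1f} GiB")
```

At full float32 precision the weights alone need roughly 15 GiB, which is out of reach for phones and browsers; at int4 quantization they shrink to under 2 GiB, which is exactly the regime where on-device deployment becomes practical.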

2. MedGemma 1.5 4B: A Multimodal Leap

MedGemma 1.5 4B represents a significant architectural shift over its predecessors. While MedGemma 1.0 was a pioneer in clinical text understanding, the 1.5 iteration—built on the Gemma 3 foundation—is a true multimodal powerhouse.

Some key advancements of MedGemma 1.5 are as follows:

- A true multimodal architecture: built on the Gemma 3 foundation, it accepts both medical images and clinical text.
- Clinical imaging capabilities, including MRI interpretation.
- EHR understanding and question answering.
- A compact 4-billion-parameter footprint, small enough for on-device deployment.

3. Deep Dive into the .litertlm Format

To run MedGemma efficiently on the edge, we leverage the .litertlm format. This isn't just another file extension; it is the specialized Generative AI bundle format of LiteRT (formerly TensorFlow Lite).
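To make the "bundle" idea concrete, here is a small conceptual sketch. The class name and fields are hypothetical, chosen only to illustrate the kind of assets a self-contained GenAI bundle must carry; the actual .litertlm binary layout is defined by LiteRT-LM, not by this snippet:

```python
from dataclasses import dataclass, field

@dataclass
class GenAiBundle:
    """Hypothetical illustration of a single-file GenAI bundle.

    A format like .litertlm exists so a runtime can download, cache, and
    load ONE artifact, instead of juggling separate weight, tokenizer,
    and configuration files that can drift out of sync.
    """
    model_graph: bytes                            # compiled/quantized model weights
    tokenizer: bytes                              # serialized tokenizer assets
    metadata: dict = field(default_factory=dict)  # e.g. stop tokens, context length

    def total_size(self) -> int:
        return len(self.model_graph) + len(self.tokenizer)

# Toy instance standing in for a real bundle:
bundle = GenAiBundle(model_graph=b"\x00" * 1024, tokenizer=b"\x01" * 64,
                     metadata={"context_length": 4096})
print(bundle.total_size())  # 1088
```

The single-artifact design matters most on the web, where a browser can fetch and cache one file rather than coordinating several requests for model pieces that must all match.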

The .litertlm format offers several benefits.