Neev - Model Post-Training for Medical Domain

A state-of-the-art model for medical transcription and clinical note generation, built for India's multilingual healthcare settings.

Neev needed a model that could handle how Indian clinical conversations actually sound: mixed languages, noisy rooms, interrupted speech, attendants speaking for patients, and medication names that generic speech models routinely mangle. I advised on AI strategy and worked on post-training the model for medical transcription and clinical note generation in this environment.

Quick Demo

This sample is intentionally messy: Hindi, Telugu, Kannada, and Tamil switch within one short conversation. The comparison shows where Whisper v3 Large breaks down and how Neev's model preserves the actual spoken meaning.

Hindi, Telugu, Kannada, Tamil mix

Whisper v3 Large

ಇವು ವಿವೆಕ್ ಡೋರ್ ಪೆ ಫೂಡಾಗಿಯ ಸೂಟ್ರ ಆಡೆ ಉಂದಿ ಬೇಲ್ಲಕೊಚಿನೆಡ ಸೂಟ್ರ ಅವರತ್ತರ ಕಿಯು ತೋಮಮ್ಬಟು ಕೆಲ್ಗಡೆನ್ನು ತೋಮಮ್ಬಾಂತೆವಿದು ಅವನುಗೆ ಕೆಲ್ಗಡನೆ ರೆಡಿಯಾಗೆ ಅವಂ ಸೋಲ್ಲಾ ಅಂಗೆ ಇಲ್ಲಾ ಇನ್ನು ಸೋಲ್ಲಾ ಚೆಕ್ಕುಂಟಿ

Neev's Model

विवेक, डोर पे फूड आ गया है। చూడు రాడే ఉంది, బెల్ కొట్టినాడు చూడు। ಆ ಮತ್ತೆ ಕೀ ತಗೊಂಡು ಬಿಟ್ಟು ಕೆಳಗಡೆಯಿಂದ ತಗೊಂಡು ಬಾ ಅಂತ ಹೇಳು ಅವನಿಗೆ. ಫುಡ್ ಕೆಳಗಡೆನೇ ಇದೆ ರೆಡಿ ಆಗಿ ಇದೆ। அவன் சொல்லுப்பா, அங்கே இல்லைன்னு சொல்லு. செக் பண்ணு।

The Problem

Clinical transcription in India is not just speech-to-text. A single consultation can move across multiple languages, dialects, speakers, and acoustic conditions in under a minute. The model has to understand code switching, preserve clinical meaning, and avoid turning noisy or accented speech into plausible nonsense.

After benchmarking existing systems, the biggest gaps were clear:

Multilingual conversations - doctors, patients, and attendants often speak different languages
Code switching - Hindi, English, and regional languages can appear in the same sentence
Medication accuracy - names like Trastuzumab, Adalimumab, and Bevacizumab need to survive transcription
Clinical noise - coughs, sneezes, weak speech, background chatter, and overlapping voices are normal
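One way to make the medication accuracy gap measurable is to score how many drug names in a reference transcript survive in the model output. The sketch below is illustrative only and is not Neev's evaluation code: the function name `medication_recall`, the lexicon, and the example sentences are all invented for this example, and it assumes drug names are written in Latin script and match as whole words.

```python
import re


def medication_recall(reference: str, hypothesis: str, lexicon: set[str]) -> float:
    """Fraction of lexicon drug names in the reference that also appear in the hypothesis."""

    def tokens(text: str) -> set[str]:
        # Lowercased Latin-script words only (assumption: drug names are written this way).
        return set(re.findall(r"[a-z]+", text.lower()))

    expected = tokens(reference) & {name.lower() for name in lexicon}
    if not expected:
        # No drug names in the reference, so nothing could be lost.
        return 1.0
    found = expected & tokens(hypothesis)
    return len(found) / len(expected)


# Hypothetical lexicon built from the drug names mentioned above.
LEXICON = {"Trastuzumab", "Adalimumab", "Bevacizumab"}

# Invented sentences, not taken from any real consultation.
reference = "switch from Trastuzumab to Bevacizumab"
hypothesis = "switch from trust to map to bevacizumab"

score = medication_recall(reference, hypothesis, LEXICON)
print(score)  # 0.5: Bevacizumab survived, Trastuzumab did not
```

An exact-match check like this is deliberately strict: a near miss on a drug name counts as a miss, because a slightly wrong medication name in a clinical note is still wrong.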

What I Worked On

I helped build a post-training strategy for a model tuned to clinical transcription and downstream note generation. The focus was not just word error rate. The model needed to preserve intent, separate speakers where possible, handle regional language mixtures, and produce output that could be reliably used inside hospital workflows and EMR systems.
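To show why word error rate alone is not enough, here is a minimal sketch of the standard WER calculation (word-level edit distance divided by the number of reference words). The function and the example sentences are invented for illustration and are not the project's actual evaluation pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one wrong word out of eight.
reference = "patient has been on adalimumab for two years"
hypothesis = "patient has been on adalimab for two years"

print(word_error_rate(reference, hypothesis))  # 0.125
```

A WER of 0.125 looks respectable, yet the only error is the drug name, which is the one word a clinician cannot afford to get wrong. That is the kind of failure an aggregate metric hides.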

The final model supported major Indian regional and foreign languages with a strong latency profile, processing 60-minute consultations in under 40 seconds while maintaining high transcription quality.
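The throughput figure can be restated as a real-time factor (processing time divided by audio duration). The short calculation below uses only the numbers quoted above, treating 40 seconds as the upper bound; the function name is illustrative.

```python
def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Processing time per second of audio; below 1.0 means faster than real time."""
    return processing_seconds / audio_seconds


audio_seconds = 60 * 60   # a 60-minute consultation
processing_seconds = 40   # upper bound quoted above ("under 40 seconds")

rtf = real_time_factor(audio_seconds, processing_seconds)
speedup = audio_seconds / processing_seconds

print(f"real-time factor <= {rtf:.4f}")       # about 0.0111
print(f"speedup >= {speedup:.0f}x real time")  # 90x
```

In other words, the quoted figure corresponds to processing audio at least 90 times faster than it was spoken.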