Neev - Model Post-Training for Medical Domain
State-of-the-art model for medical transcription and clinical note generation in India's dynamic healthcare ecosystem.
Date
2024–25
Neev needed a model that could handle how Indian clinical conversations actually sound: mixed languages, noisy rooms, interrupted speech, attendants speaking for patients, and medication names that generic speech models routinely mangle. I advised on AI strategy and worked on model post-training for medical transcription and clinical note generation across this environment.
Quick Demo
This sample is intentionally messy: Hindi, Telugu, Kannada, and Tamil switch within one short conversation. The comparison shows where Whisper v3 Large breaks down and how Neev's model preserves the actual spoken meaning.
Hindi, Telugu, Kannada, Tamil mix
Whisper v3 Large
ಇವು ವಿವೆಕ್ ಡೋರ್ ಪೆ ಫೂಡಾಗಿಯ ಸೂಟ್ರ ಆಡೆ ಉಂದಿ ಬೇಲ್ಲಕೊಚಿನೆಡ ಸೂಟ್ರ ಅವರತ್ತರ ಕಿಯು ತೋಮಮ್ಬಟು ಕೆಲ್ಗಡೆನ್ನು ತೋಮಮ್ಬಾಂತೆವಿದು ಅವನುಗೆ ಕೆಲ್ಗಡನೆ ರೆಡಿಯಾಗೆ ಅವಂ ಸೋಲ್ಲಾ ಅಂಗೆ ಇಲ್ಲಾ ಇನ್ನು ಸೋಲ್ಲಾ ಚೆಕ್ಕುಂಟಿ
Neev's Model
विवेक, डोर पे फूड आ गया है। చూడు రాడే ఉంది, బెల్ కొట్టినాడు చూడు। ಆ ಮತ್ತೆ ಕೀ ತಗೊಂಡು ಬಿಟ್ಟು ಕೆಳಗಡೆಯಿಂದ ತಗೊಂಡು ಬಾ ಅಂತ ಹೇಳು ಅವನಿಗೆ. ಫುಡ್ ಕೆಳಗಡೆನೇ ಇದೆ ರೆಡಿ ಆಗಿ ಇದೆ। அவன் சொல்லுப்பா, அங்கே இல்லைன்னு சொல்லு. செக் பண்ணு।
The Problem
Clinical transcription in India is not just speech-to-text. A single consultation can move across multiple languages, dialects, speakers, and acoustic conditions in under a minute. The model has to handle code-switching, preserve clinical meaning, and avoid turning noisy or accented speech into plausible nonsense.
After benchmarking existing systems, the biggest gaps were clear:
- Multilingual conversations - doctors, patients, and attendants often speak different languages
- Code-switching - Hindi, English, and regional languages can appear in the same sentence
- Medication accuracy - names like Trastuzumab, Adalimumab, and Bevacizumab need to survive transcription
- Clinical noise - coughs, sneezes, weak speech, background chatter, and overlapping voices are normal
What I Worked On
I helped build a post-training strategy for a model tuned to clinical transcription and downstream note generation. The focus was not just word error rate. The model needed to preserve intent, separate speakers where possible, handle regional language mixtures, and produce output that could be reliably used inside hospital workflows and EMR systems.
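To illustrate why word error rate alone can mislead, here is a minimal Python sketch that scores a transcript two ways: standard word-level WER, and recall over a short list of critical terms such as medication names. It is not Neev's evaluation code. The function names, the whitespace tokenization, and the sample sentences are all assumptions made for this example, and whitespace splitting in particular would need replacing for scripts and languages where it does not mark word boundaries.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def term_recall(reference: str, hypothesis: str, terms: set[str]) -> float:
    """Share of critical terms present in the reference that survive in the hypothesis."""
    ref_words = {w.strip(".,").lower() for w in reference.split()}
    hyp_words = {w.strip(".,").lower() for w in hypothesis.split()}
    expected = {t.lower() for t in terms} & ref_words
    if not expected:
        return 1.0  # nothing critical to preserve in this utterance
    return len(expected & hyp_words) / len(expected)


# Hypothetical example: one wrong word out of seven, but it is the drug name.
reference = "continue trastuzumab and review in three weeks"
hypothesis = "continue trastuzumap and review in three weeks"
critical_terms = {"Trastuzumab", "Adalimumab", "Bevacizumab"}

wer = word_error_rate(reference, hypothesis)
recall = term_recall(reference, hypothesis, critical_terms)
print(f"WER: {wer:.3f}, medication recall: {recall:.1f}")
```

In this made-up case only one of the seven words is wrong, yet the one medication name in the sentence is lost, which is the kind of failure a WER-only target would underweight.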
The final model supported major Indian regional languages as well as foreign languages, and it was fast: it processed a 60-minute consultation in under 40 seconds, roughly 90 times faster than real time, while maintaining high transcription quality.