When Working Isn't Enough: The Story Behind Building a Self-Healing Mutual Fund Engine
Something working today doesn't guarantee it'll work tomorrow. The story of refactoring a mutual fund sync engine from 'it works' into 'it doesn't break.'
There’s a belief in engineering that once something “works,” you can move on.
I learned the opposite.
Something working today doesn’t guarantee it will work tomorrow. And in my world, systems behave like the sea — when the water is wavy, everything is normal; when it’s too calm, something big is definitely coming.
This story begins with Morningstar — the external service that powers everything in our mutual fund product.
They provide monthly mutual fund data. Not small data — 300+ fields per fund. And we have 12,212 funds.
If we miss syncing even a single month for any fund, that data is gone forever. No retries. No archives. No dev environment. Just one chance.
Now add on top of that: daily NAV. New funds launching every month. Old funds closing but still sending past data. And hundreds of fields missing randomly for newly launched funds.
All this feeds our internal engines — allocation, scoring, benchmarking, sub-category selection, and scheme recommendation.
In demo environments, everything worked. Smooth. Predictable. That was the calm sea.
Production was the storm.
The sync nightmare
Every month, a cron job fetched fresh Morningstar data.
In theory, it sounded simple:
- For each ISIN → fetch monthly data
- For each ISIN → fetch daily NAV
- Parse → Normalize → Store → Done
But then reality happened.
Parsing failed for random funds. New funds had incomplete fields. Some funds returned NAV values dated years in the past. Some didn’t return NAV at all. Some didn’t exist in our local list yet. Some existed but weren’t present in Morningstar’s monthly dataset.
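One way to keep these failure modes from disappearing is to make every fund produce an explicit outcome instead of an exception or a silent skip. The sketch below is only an illustration under stated assumptions: the `fetch` callable, the `Outcome` labels, the three `REQUIRED_FIELDS`, and the use of `ValueError` for parse failures are all hypothetical stand-ins, not Morningstar's API or the actual service code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Optional


class Outcome(Enum):
    OK = "ok"
    MISSING_FIELDS = "missing_fields"
    PARSE_FAILED = "parse_failed"
    NOT_IN_FEED = "not_in_feed"


@dataclass
class SyncResult:
    isin: str
    outcome: Outcome
    detail: str = ""


# Hypothetical subset of required fields; the real feed carries 300+ per fund.
REQUIRED_FIELDS = ("fund_name", "category", "expense_ratio")


def sync_one(isin: str, fetch: Callable[[str], Optional[dict]]) -> SyncResult:
    """Sync one fund and always return a result instead of raising."""
    try:
        record = fetch(isin)
    except ValueError as exc:  # assumption: the parser raises ValueError
        return SyncResult(isin, Outcome.PARSE_FAILED, str(exc))
    if record is None:  # assumption: None means the fund is absent from the feed
        return SyncResult(isin, Outcome.NOT_IN_FEED)
    missing = [name for name in REQUIRED_FIELDS if record.get(name) is None]
    if missing:
        return SyncResult(isin, Outcome.MISSING_FIELDS, ",".join(missing))
    return SyncResult(isin, Outcome.OK)


def sync_all(isins: Iterable[str], fetch: Callable[[str], Optional[dict]]) -> list:
    """One bad fund never aborts the run; every ISIN gets exactly one result."""
    return [sync_one(isin, fetch) for isin in isins]
```

Because each ISIN yields exactly one `SyncResult`, a later step can count, retry, or display the failures rather than guess at them.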
I had one big question running on loop in my mind:
“How do I know everything synced correctly?”
If even one fund slipped, the entire chain — scoring, benchmark calculation, sub-category signals — could silently produce wrong outputs.
And the worst part? I wouldn’t even know.
That fear started eating me.
The illusion of success
We delivered the feature. It produced perfect allocations:
- ₹20,000 split across asset classes
- then into subcategories
- then into top-ranked schemes
The logic was correct. The system worked. The demo was great.
But the system wasn’t stable. It worked — but only in the calm sea. I could feel the coming wave.
The data integrity war
The biggest problem wasn’t the logic.
It was trust.
How do I trust that:
- all 12,212 funds synced?
- a new fund didn’t get missed?
- a fund with missing fields didn’t corrupt a monthly ranking?
- NAV data didn’t silently fail for a closed fund?
- a failed parse didn’t leave invisible gaps?
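The first two questions reduce to comparing two sets: the ISINs we expected and the ISINs that actually landed. A minimal sketch of that reconciliation, assuming both sides can be listed as plain ISIN strings (the `Reconciliation` type and `reconcile` function are hypothetical names for illustration):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reconciliation:
    missing: frozenset      # expected locally, but never synced
    unexpected: frozenset   # synced, but absent from the local fund list

    @property
    def complete(self) -> bool:
        # Complete means nothing we expected is missing.
        return not self.missing


def reconcile(expected_isins, synced_isins) -> Reconciliation:
    """Compare what should have synced against what actually did."""
    expected, synced = set(expected_isins), set(synced_isins)
    return Reconciliation(
        missing=frozenset(expected - synced),
        unexpected=frozenset(synced - expected),
    )
```

For example, `reconcile(["A", "B", "C"], ["B", "C", "D"])` reports `A` as missing and `D` as unexpected, which covers both a fund that slipped and a new fund not yet in the local list.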
I needed a way to observe the system like a living organism. Not just run it. Understand it. Watch it breathe. And know exactly when it coughs.
The turning point
So I made one big decision:
Split the mutual fund data layer into its own microservice.
A separate brain. Separate DB. Separate queues. Separate dashboards. Separate logs. Separate responsibilities.
Everything that touches Morningstar — syncing, storing, normalizing, retrying — moved into one dedicated service called:
mutual-fund-store
And this changed everything.
Idempotency keys that saved my sanity
To avoid duplicates, overwrites, or corrupted histories:
- Monthly MF data → month-year-isin
- Daily NAV → isin-navdate-legal-mf
Now I could re-run a sync 100 times and end up with the same stored data every time.
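A minimal sketch of how keys like these can make a re-run harmless, using SQLite purely for illustration. The exact key layout (zero-padded month, ISO date, `legal-mf` treated as a fixed suffix), the table schema, the made-up ISIN, and the function names are all assumptions; the actual service's storage is not described here.

```python
import sqlite3
from datetime import date


def monthly_key(isin: str, period: date) -> str:
    # Assumed layout: month-year-isin
    return f"{period.month:02d}-{period.year}-{isin}"


def nav_key(isin: str, nav_date: date) -> str:
    # Assumed layout: isin-navdate-legal-mf, with "legal-mf" as a fixed suffix
    return f"{isin}-{nav_date.isoformat()}-legal-mf"


def open_store() -> sqlite3.Connection:
    """In-memory database with the idempotency key as the primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nav ("
        "idempotency_key TEXT PRIMARY KEY, isin TEXT, nav_date TEXT, nav REAL)"
    )
    return conn


def store_nav(conn: sqlite3.Connection, isin: str, nav_date: date, nav: float) -> None:
    # INSERT OR IGNORE keeps the first row written for a key, so a repeated
    # sync neither duplicates nor overwrites what is already stored.
    conn.execute(
        "INSERT OR IGNORE INTO nav (idempotency_key, isin, nav_date, nav) "
        "VALUES (?, ?, ?, ?)",
        (nav_key(isin, nav_date), isin, nav_date.isoformat(), nav),
    )
    conn.commit()


if __name__ == "__main__":
    conn = open_store()
    for _ in range(100):  # re-running the same sync is a no-op after the first
        store_nav(conn, "INF000000001", date(2025, 11, 28), 101.25)
    print(conn.execute("SELECT COUNT(*) FROM nav").fetchone()[0])
```

The design choice worth noting is first-write-wins. If corrected values from the provider should replace earlier ones, an upsert would be the alternative, at the cost of allowing overwrites.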
A tool I didn’t know I needed
I built a small internal dashboard.
Simple, but life-saving:
- Total funds expected
- Total funds synced
- Funds with missing fields
- Funds with parsing failures
- Funds with NAV mismatch
- Retry buttons
- Error logs
- Sync timestamps
- API latency metrics
- Percentage completion bars
It wasn’t fancy. But suddenly, the system became transparent.
The sea wasn’t calm anymore — I could see the waves, measure them, and prepare for the storm.
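The headline numbers on a dashboard like this can come from a very small aggregation. The sketch below assumes each fund's latest sync status is available as a string label; the labels, the `sync_summary` name, and the choice to count only `ok` funds as synced are hypothetical, not the dashboard's actual logic.

```python
from collections import Counter


def sync_summary(expected_total: int, statuses: dict) -> dict:
    """Aggregate per-fund statuses (ISIN -> label) into dashboard counters."""
    counts = Counter(statuses.values())
    synced = counts["ok"]  # assumption: only fully clean funds count as synced
    percent = round(100 * synced / expected_total, 2) if expected_total else 0.0
    return {
        "expected": expected_total,
        "synced": synced,
        "missing_fields": counts["missing_fields"],
        "parse_failed": counts["parse_failed"],
        "nav_mismatch": counts["nav_mismatch"],
        # Funds with no status at all are the invisible gaps worth surfacing.
        "not_attempted": expected_total - len(statuses),
        "percent_complete": percent,
    }
```

The `not_attempted` counter is the one that matters most here: it is the difference between "nothing failed" and "nothing was even tried."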
The hardest part nobody sees
After the demo, I spent almost two weeks, two hours every night, refactoring, separating, stabilizing, and building guardrails.
This was the real work. The work that doesn’t get noticed. The work that makes a system reliable. The work that ensures tomorrow doesn’t break.
Because “working” is not the goal.
“Not breaking” is.
What I learned
- If you can’t see failures, you can’t trust successes.
- If something works once, that means nothing.
- Data systems need observability more than code perfection.
- A dashboard can be more powerful than an algorithm.
- Calm seas are the most dangerous.
- Reliability is built in silence, after hours, alone, with no praise.
- Delivering fast is good. Making it unbreakable is better.
This wasn’t just a technical battle. It was a mindset shift. A reminder that in engineering, “done” is an illusion — because the sea is always changing.
And the system must be ready for the next wave.
First published on Medium on November 30, 2025.