When Working Isn't Enough: The Story Behind Building a Self-Healing Mutual Fund Engine

There’s a belief in engineering that once something “works,” you can move on.

I learned the opposite.

Something working today doesn’t guarantee it will work tomorrow. And in my world, systems behave like the sea — when the water is wavy, everything is normal; when it’s too calm, something big is definitely coming.

This story begins with Morningstar — the external service that powers everything in our mutual fund product.

They provide monthly mutual fund data. Not small data — 300+ fields per fund. And we have 12,212 funds.

If we miss syncing even a single month for any fund, that data is gone forever. No retries. No archives. No dev environment. Just one chance.

Now add on top of that: daily NAV. New funds launching every month. Old funds closing but still sending past data. And hundreds of fields randomly missing for newly launched funds.

All this feeds our internal engines — allocation, scoring, benchmarking, sub-category selection, and scheme recommendation.

In demo environments, everything worked. Smooth. Predictable. That was the calm sea.

Production was the storm.

The sync nightmare

Every month, a cron job fetched fresh Morningstar data.

In theory, it sounded simple:

For each ISIN → fetch monthly data
For each ISIN → fetch daily NAV
Parse → Normalize → Store → Done
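As a rough illustration, the happy path above might look like the Python sketch below. Everything in it is an assumption made for the example: `fetch_monthly` and `fetch_daily_nav` stand in for whatever client calls Morningstar, parsing is folded into those callables, and the `store` is a plain dictionary rather than a real database.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical record type: one flat mapping of field name to value.
Record = Dict[str, object]


def normalize(raw: Record) -> Record:
    """Lower-case field names and trim whitespace from string values."""
    return {
        key.strip().lower(): value.strip() if isinstance(value, str) else value
        for key, value in raw.items()
    }


def run_monthly_sync(
    isins: Iterable[str],
    fetch_monthly: Callable[[str], Record],          # hypothetical client call
    fetch_daily_nav: Callable[[str], List[Record]],  # hypothetical client call
    store: Dict[str, Record],
) -> None:
    """Naive loop: fetch, normalize, store. No failure handling at all."""
    for isin in isins:
        monthly = normalize(fetch_monthly(isin))
        nav_rows = [normalize(row) for row in fetch_daily_nav(isin)]
        store[isin] = {"monthly": monthly, "nav": nav_rows}
```

Note what is missing: a single exception from either fetcher stops the whole run, and nothing records which funds were actually processed.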

But then reality happened.

Parsing failed for random funds. New funds had incomplete fields. Some funds returned NAV for years in the past. Some didn’t return NAV at all. Some didn’t exist in our local list yet. Some existed but weren’t present in Morningstar’s monthly dataset.

I had one big question running on loop in my mind:

“How do I know everything synced correctly?”

If even one fund slipped through, the entire chain (scoring, benchmark calculation, sub-category signals) could silently produce wrong outputs.

And the worst part? I wouldn’t even know.

That fear started eating me.

The illusion of success

We delivered the feature. It produced perfect allocations:

₹20,000 split across asset classes
then into subcategories
then into top-ranked schemes

The logic was correct. The system worked. The demo was great.

But the system wasn’t stable. It worked — but only in the calm sea. I could feel the coming wave.

The data integrity war

The biggest problem wasn’t the logic.

It was trust.

How do I trust that:

all 12,212 funds synced?
a new fund didn’t get missed?
a fund with missing fields didn’t corrupt a monthly ranking?
NAV data didn’t silently fail for a closed fund?
a parsing failure didn’t leave invisible gaps?
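One way to turn those questions into something checkable is a reconciliation pass after every sync: compare the funds you expected against the funds you actually stored, and flag records with empty required fields. The Python sketch below is only an illustration under assumed names (`ReconciliationReport`, `reconcile`, and the shape of the `synced` mapping are all hypothetical), not the service's actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Mapping, Optional, Set


@dataclass
class ReconciliationReport:
    missing: Set[str] = field(default_factory=set)      # expected, never synced
    unexpected: Set[str] = field(default_factory=set)   # synced, not in our list
    incomplete: Dict[str, List[str]] = field(default_factory=dict)  # isin -> empty fields

    @property
    def ok(self) -> bool:
        """True only when nothing is missing, unexpected, or incomplete."""
        return not (self.missing or self.unexpected or self.incomplete)


def reconcile(
    expected_isins: Iterable[str],
    synced: Mapping[str, Mapping[str, Optional[object]]],
    required_fields: Iterable[str],
) -> ReconciliationReport:
    """Compare the expected fund list against what was stored in this run."""
    expected = set(expected_isins)
    stored = set(synced)
    required = list(required_fields)

    report = ReconciliationReport(
        missing=expected - stored,
        unexpected=stored - expected,
    )
    for isin, record in synced.items():
        # A field counts as a gap if it is absent or explicitly None.
        gaps = [name for name in required if record.get(name) is None]
        if gaps:
            report.incomplete[isin] = gaps
    return report
```

If `report.ok` is false, the run should be treated as failed even though no exception was raised, which is exactly the silent case described above.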

I needed a way to observe the system like a living organism. Not just run it. Understand it. Watch it breathe. And know exactly when it coughs.

The turning point

So I made one big decision:

Split the mutual fund data layer into its own microservice.

A separate brain. Separate DB. Separate queues. Separate dashboards. Separate logs. Separate responsibilities.

Everything that touches Morningstar — syncing, storing, normalizing, retrying — moved into one dedicated service called:

mutual-fund-store

And this changed everything.

Idempotency keys that saved my sanity

To avoid duplicates, overwrites, or corrupted histories:

Monthly MF data → month-year-isin
Daily NAV → isin-navdate-legal-mf

Now I could re-run a sync 100 times and the stored data would end up the same every time.
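To make the idea concrete, here is a minimal sketch of a keyed upsert using Python's built-in `sqlite3` module. The table name, the column names, and the zero-padded month in the key are assumptions for the example; the post only specifies the `month-year-isin` shape. The daily NAV key would follow the same pattern with its own components.

```python
import json
import sqlite3


def monthly_key(month: int, year: int, isin: str) -> str:
    """Build a month-year-isin key, e.g. 11-2025-INF000000001 (padding is assumed)."""
    return f"{month:02d}-{year}-{isin}"


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store; the primary key enforces one row per idempotency key."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS monthly_data ("
        "idempotency_key TEXT PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return conn


def upsert_monthly(
    conn: sqlite3.Connection, month: int, year: int, isin: str, payload: dict
) -> None:
    """Insert the row, or replace its payload if the key already exists."""
    conn.execute(
        "INSERT INTO monthly_data (idempotency_key, payload) VALUES (?, ?) "
        "ON CONFLICT(idempotency_key) DO UPDATE SET payload = excluded.payload",
        (monthly_key(month, year, isin), json.dumps(payload, sort_keys=True)),
    )
    conn.commit()
```

This version lets the latest write win. If the first successful write should be kept instead, `INSERT OR IGNORE` gives that behavior with the same key. The `ON CONFLICT` clause needs SQLite 3.24 or newer.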

A tool I didn’t know I needed

I built a small internal dashboard.

Simple, but life-saving:

Total funds expected
Total funds synced
Funds with missing fields
Funds with parsing failures
Funds with NAV mismatch
Retry buttons
Error logs
Sync timestamps
API latency metrics
Percentage completion bars
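Most of those numbers fall out of one per-fund status recorded during the sync. The Python sketch below shows the idea under assumptions of my own: the `SyncStatus` values, the `SyncSummary` shape, and the rule that only fully synced funds count toward completion are illustrative, not a description of the real dashboard.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class SyncStatus(Enum):
    # Hypothetical status values, one per dashboard counter.
    SYNCED = "synced"
    MISSING_FIELDS = "missing_fields"
    PARSE_FAILED = "parse_failed"
    NAV_MISMATCH = "nav_mismatch"


@dataclass
class SyncSummary:
    expected: int
    counts: Dict[SyncStatus, int]

    @property
    def completion_pct(self) -> float:
        """Share of expected funds that synced cleanly, as a percentage."""
        if self.expected == 0:
            return 0.0
        synced = self.counts.get(SyncStatus.SYNCED, 0)
        return round(100.0 * synced / self.expected, 2)


def summarize(expected_total: int, statuses: Dict[str, SyncStatus]) -> SyncSummary:
    """Aggregate per-ISIN statuses into the counters a dashboard would show."""
    return SyncSummary(
        expected=expected_total,
        counts=dict(Counter(statuses.values())),
    )
```

Funds that never reported any status show up as the gap between `expected` and the sum of the counts, which is often the most important number on the page.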

It wasn’t fancy. But suddenly, the system became transparent.

The sea wasn’t calm anymore — I could see the waves, measure them, and prepare for the storm.

The hardest part nobody sees

After the demo, I spent almost two weeks — 2 hours every night — refactoring, separating, stabilizing, and building guardrails.

This was the real work. The work that doesn’t get noticed. The work that makes a system reliable. The work that ensures tomorrow doesn’t break.

Because “working” is not the goal.

“Not breaking” is.

What I learned

If you can’t see failures, you can’t trust successes.
If something works once, that means nothing.
Data systems need observability more than code perfection.
A dashboard can be more powerful than an algorithm.
Calm seas are the most dangerous.
Reliability is built in silence, after hours, alone, with no praise.
Delivering fast is good. Making it unbreakable is better.

This wasn’t just a technical battle. It was a mindset shift. A reminder that in engineering, “done” is an illusion — because the sea is always changing.

And the system must be ready for the next wave.

First published on Medium on November 30, 2025.