
Before Building a Pricing Platform, We Had to Stop Replacing and Start Growing

Why our first move toward a new pricing platform was understanding the present, not replacing it.

The system was working.

That was the problem.

Pricing was producing revenue every day. Customers were booking. Markups, fees, and add-ons were flowing through services that nobody had touched in years. And buried inside that working system was the assumption that any of it could be safely replaced.

It couldn’t. Not in one move. Not without breaking trust we hadn’t earned the right to spend.

I joined a new team within the company I was already working for. The mission sounded straightforward on paper: evolve our pricing capabilities. The reality was considerably more complex. Pricing was not a single system. It was a collection of decisions spread across services, repositories, integrations, operational processes, and years of accumulated business knowledge. Fees were calculated in one place. Markups in another. Add-ons followed different paths. Some pricing decisions happened close to customer-facing applications, while others were buried deep inside backend services.

The challenge wasn’t building something new.

The challenge was building something new without breaking what already worked.

The temptation of the big rewrite

When teams discover architectural complexity, their first instinct is often to start over.

The logic seems reasonable. We know the problems. We know the limitations. We have newer technologies available. Why not simply replace the old solution with a modern platform?

In practice, those conversations rarely survive contact with reality.

A pricing system is not just software. It is encoded business knowledge. Every rule represents a decision someone made years ago. Every exception exists because a customer, partner, operator, regulator, or market demanded it. Many of those decisions are not documented. Some are not even remembered. The system becomes a living museum of business evolution.

Assumption | Reality
We know exactly what the current system does | Much of the behavior exists only in code
Rebuilding is mostly a technical effort | Rebuilding is primarily a business discovery effort
New technology reduces risk | Rewrites often introduce new risks
Legacy behavior is fully understood | Hidden dependencies emerge continuously

This was one of my first observations after joining the team. Before discussing future capabilities, we first needed to understand the present. Not just the code. The business.

Understanding the future before changing the present

One of the most valuable lessons I learned during that period was that architecture conversations should not start with architecture. They should start with outcomes.

Before creating a new pricing platform, we spent time understanding where the company wanted to be years into the future. Questions started appearing everywhere:

  • What pricing capabilities would we need in three years?
  • How quickly should new pricing experiments be launched?
  • Who should own pricing decisions?
  • How configurable should rules become?
  • What should require engineering involvement?
  • What should become self-service?
  • How would we support new products and monetization models?

Only after discussing these questions could we evaluate the systems that already existed. This wasn’t about technology. It was about creating a shared picture of the destination.

The real challenge wasn’t replacing software. It was creating enough understanding of the future that we could recognize which parts of the present should survive.

Discovering pricing capabilities hidden across the organization

As we mapped the existing landscape, we discovered pricing logic spread across multiple places. Some capabilities were obvious. Others were hidden. A simple markup could involve several services. A fee could depend on data generated elsewhere. An add-on might have its own operational process entirely separate from the rest of the pricing flow.

The work started to resemble archaeology. Every repository revealed another layer of business decisions. Every service exposed assumptions nobody had documented. Every conversation uncovered another dependency.

A simplified view looked something like this:

Pricing logic scattered across existing systems

The goal wasn’t to centralize everything immediately. The goal was first to understand what actually existed.

Why the Strangler Fig pattern made sense

Around that time, one architectural approach consistently surfaced in discussions: the Strangler Fig pattern.

The idea comes from nature. A strangler fig grows around an existing tree. It does not replace the tree overnight. Instead, it gradually surrounds it, absorbs its responsibilities, and eventually becomes the primary structure. Martin Fowler popularized the metaphor for software modernization.

The pattern is deceptively simple:

  1. Leave the existing system running.
  2. Build new capabilities around it.
  3. Redirect behavior gradually.
  4. Retire old components one piece at a time.
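The four steps can be sketched as a routing facade that sits in front of both implementations and is switched one capability at a time. This is a minimal illustration under assumed names: `StranglerFacade`, the capability keys (`fees`, `markups`), and the handler signature are all hypothetical, and amounts are plain integers in cents for simplicity.

```python
from typing import Callable, Dict, Set

# Hypothetical handler type: takes a request and returns an amount in cents.
Handler = Callable[[dict], int]


class StranglerFacade:
    """Routes each pricing capability to its legacy or new implementation."""

    def __init__(self, legacy: Dict[str, Handler]) -> None:
        # Step 1: the existing system keeps serving every capability.
        self._legacy: Dict[str, Handler] = dict(legacy)
        self._new: Dict[str, Handler] = {}
        self._redirected: Set[str] = set()

    def register_new(self, capability: str, handler: Handler) -> None:
        # Step 2: build the new implementation alongside the old one.
        self._new[capability] = handler

    def redirect(self, capability: str) -> None:
        # Step 3: move one capability's traffic to the new implementation.
        if capability not in self._new:
            raise ValueError(f"no new implementation for {capability!r}")
        self._redirected.add(capability)

    def retire_legacy(self, capability: str) -> None:
        # Step 4: delete the legacy path, but only once it no longer serves traffic.
        if capability not in self._redirected:
            raise ValueError(f"{capability!r} is still served by the legacy path")
        self._legacy.pop(capability, None)

    def handle(self, capability: str, request: dict) -> int:
        if capability in self._redirected:
            return self._new[capability](request)
        return self._legacy[capability](request)


facade = StranglerFacade({"fees": lambda r: 500, "markups": lambda r: 1200})
facade.register_new("fees", lambda r: 500)
facade.redirect("fees")
facade.retire_legacy("fees")
print(facade.handle("fees", {}))     # 500, from the new implementation
print(facade.handle("markups", {}))  # 1200, still legacy
```

Keeping `retire_legacy` separate from `redirect` mirrors the pattern: a capability can run on the new path for a long time before anyone deletes the old one, and the facade refuses to delete a path that is still serving traffic.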

What made the idea attractive was not elegance. It was risk reduction.

Big Rewrite | Strangler Fig
Large delivery event | Continuous delivery
Long time before value | Early incremental value
High uncertainty | Continuous learning
Difficult rollback | Easier rollback paths
Requires complete understanding upfront | Allows understanding to emerge

This isn’t about avoiding change. It’s about making change survivable.

Anatomy of a Strangler Fig migration

The pattern is easy to draw on a whiteboard and hard to apply in production. In our case, the most useful way to understand it was to walk through a single capability and trace what actually had to happen, step by step.

A fee, for example.

In the original system, fees were calculated inside the search service. The same monolith that returned travel results also decided how much to charge for them. The logic had been added incrementally over the years, sometimes for a market, sometimes for a partner, sometimes for a campaign nobody could find documentation for anymore. It was small enough that nobody had ever budgeted time to extract it. And critical enough that nobody wanted to be the person who broke it.

That’s the typical Strangler Fig starting point. It isn’t about systems that are obviously rotten. It’s about systems that are quietly central.

The migration unfolded in stages, and each stage existed to answer a different question.

Stage 1. Introduce a seam.

The first move had nothing to do with the new service. It was inside the search monolith. We extracted the inline fee calculation behind an internal interface: a single function call that the rest of the search code had to go through. Nothing about behavior changed. Nothing about ownership changed. We were not solving the problem yet. We were creating a place where the problem could be solved later.

A seam is the smallest unit of optionality. Without it, no migration is possible. With it, every later step becomes a choice.
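In code, a seam can be as small as an interface that the search path depends on, with the existing inline logic moved behind it unchanged. A minimal sketch, assuming Python's `typing.Protocol`; `FeeRequest`, `FeeCalculator`, `InlineFeeCalculator`, and the 5%-with-a-minimum fee rule are hypothetical placeholders, not the real pricing logic.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import Protocol


@dataclass(frozen=True)
class FeeRequest:
    market: str
    product: str
    base_price: Decimal  # in the booking currency


class FeeCalculator(Protocol):
    """The seam: the only way search code is allowed to obtain a fee."""

    def fee_for(self, request: FeeRequest) -> Decimal: ...


class InlineFeeCalculator:
    """The old inline logic, moved behind the seam without behavior changes.

    Hypothetical placeholder rule: 5% of the base price, minimum 2.00,
    rounded half-up to cents.
    """

    def fee_for(self, request: FeeRequest) -> Decimal:
        fee = max(request.base_price * Decimal("0.05"), Decimal("2.00"))
        return fee.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def price_search_result(request: FeeRequest, fees: FeeCalculator) -> Decimal:
    # Search code depends on the interface, not on any particular implementation.
    return request.base_price + fees.fee_for(request)


request = FeeRequest("ES", "flight", Decimal("100.00"))
print(price_search_result(request, InlineFeeCalculator()))  # 105.00
```

Once `price_search_result` only knows about `FeeCalculator`, any later implementation (a remote client, a shadowing wrapper, a flagged router) can be swapped in without touching the search code again.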

Stage 2. Stand up the new service.

Then we built a separate fee service. Same inputs. Same expected outputs. No new features. No improvements. No “while we’re at it.” This is the step engineers most often want to skip, because it feels like duplication. It is duplication. That’s the point.

The new service was not allowed to be smarter than the old one yet. Its only job was to produce the same number.
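One way to hold the new service to "the same number" before it sees production traffic is a characterization test: record what the legacy code returns for a set of inputs, then check the new implementation against those recorded outputs. A hedged sketch; `record_golden_cases`, `contract_failures`, and the `(market, product, base_price)` signature are assumptions for illustration.

```python
from decimal import Decimal
from typing import Callable, Iterable, List, Tuple

# Hypothetical fee signature: (market, product, base_price) -> fee.
FeeArgs = Tuple[str, str, Decimal]
FeeFn = Callable[[str, str, Decimal], Decimal]
Golden = List[Tuple[FeeArgs, Decimal]]


def record_golden_cases(legacy: FeeFn, inputs: Iterable[FeeArgs]) -> Golden:
    """Capture what the legacy code returns today; those outputs become the expectations."""
    return [(args, legacy(*args)) for args in inputs]


def contract_failures(candidate: FeeFn, golden: Golden) -> List[str]:
    """Describe every recorded case where the candidate does not produce the same number."""
    failures = []
    for args, expected in golden:
        actual = candidate(*args)
        if actual != expected:
            failures.append(f"{args}: expected {expected}, got {actual}")
    return failures
```

A check like this only covers the inputs someone thought to write down, which is why it complements the shadow phase rather than replacing it.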

Stage 3. Run it in shadow.

This is where the pattern starts paying off. Every search request still went through the seam, still called the old inline fee logic, and still returned that result to the customer. But in parallel, asynchronously and off the critical path, the seam also called the new fee service with the same inputs and recorded both outputs and timings.

The customer saw nothing different. The data team saw everything.
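A minimal sketch of a shadowing seam, assuming a thread pool is acceptable for the mirrored call: the legacy result is returned immediately, and the new implementation runs on `concurrent.futures` workers, with its output, timing, and any error handed to a sink. `ShadowingFeeCalculator`, `ShadowRecord`, and the sink callback are hypothetical names.

```python
import time
from concurrent.futures import Executor, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional

FeeFn = Callable[[dict], int]  # hypothetical: request -> fee in cents


@dataclass
class ShadowRecord:
    request: dict
    old_fee: int
    old_ms: float
    new_fee: Optional[int]
    new_ms: float
    error: Optional[str]


class ShadowingFeeCalculator:
    """Serves the legacy fee and mirrors each call to the new implementation."""

    def __init__(self, legacy: FeeFn, candidate: FeeFn, executor: Executor,
                 sink: Callable[[ShadowRecord], None]) -> None:
        self._legacy = legacy
        self._candidate = candidate
        self._executor = executor
        self._sink = sink

    def fee_for(self, request: dict) -> int:
        start = time.perf_counter()
        old_fee = self._legacy(request)
        old_ms = (time.perf_counter() - start) * 1000
        # Fire and forget: the customer response never waits for the shadow call.
        self._executor.submit(self._shadow, request, old_fee, old_ms)
        return old_fee

    def _shadow(self, request: dict, old_fee: int, old_ms: float) -> None:
        start = time.perf_counter()
        new_fee: Optional[int] = None
        error: Optional[str] = None
        try:
            new_fee = self._candidate(request)
        except Exception as exc:  # a broken new service is data, not an outage
            error = repr(exc)
        new_ms = (time.perf_counter() - start) * 1000
        self._sink(ShadowRecord(request, old_fee, old_ms, new_fee, new_ms, error))


with ThreadPoolExecutor(max_workers=4) as pool:
    records = []
    fees = ShadowingFeeCalculator(lambda r: 500, lambda r: 501, pool, records.append)
    print(fees.fee_for({"market": "ES", "product": "flight"}))  # 500
print(records[0].old_fee, records[0].new_fee)  # 500 501
```

The broad `except Exception` is deliberate: a failing new service should show up in the comparison data, never in the customer response. A production version would also need a bounded queue or sampling so shadow work cannot pile up under peak traffic; `ThreadPoolExecutor` queues submitted work without limit.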

Shadow execution: customer sees the old path, the new path is observed

What the shadow phase let us answer was simple and brutal:

  • For the same input, does the new service return the same fee as the old code? Always? In which markets? For which products? In which edge cases?
  • And does it do so fast enough that customers would not notice the change in tail latency once we cut over?

The first question protects revenue. The second protects experience. We were not allowed to ramp traffic until both were green.
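The per-market and per-product questions map naturally onto a grouped parity report. A sketch, assuming each shadow comparison is logged as a dict with `market`, `product`, `old_fee`, and `new_fee` in integer cents; the field names, the `tolerance` parameter, and `worst_segments` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Segment = Tuple[str, str]  # (market, product)


def parity_by_segment(records: Iterable[dict], tolerance: int = 0) -> Dict[Segment, float]:
    """Share of shadow comparisons per (market, product) where old and new fees agree.

    A record whose new_fee is None (the shadow call failed) counts as a mismatch.
    """
    totals: Dict[Segment, int] = defaultdict(int)
    matches: Dict[Segment, int] = defaultdict(int)
    for r in records:
        segment = (r["market"], r["product"])
        totals[segment] += 1
        if r["new_fee"] is not None and abs(r["new_fee"] - r["old_fee"]) <= tolerance:
            matches[segment] += 1
    return {segment: matches[segment] / totals[segment] for segment in totals}


def worst_segments(parity: Dict[Segment, float], threshold: float) -> List[Tuple[Segment, float]]:
    """Segments below the parity threshold, worst first."""
    below = [(segment, rate) for segment, rate in parity.items() if rate < threshold]
    return sorted(below, key=lambda item: item[1])
```

An overall parity number can look healthy while one small market is completely wrong; grouping first is what makes "in which markets?" answerable.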

In practice, the first weeks of shadow execution are humbling. The “same logic” you wrote in the new service diverges from the monolith in tiny, embarrassing ways. Rounding happens one step earlier in the calculation. A currency conversion reads from a slightly different rate cache. A market-specific override that lived in a config file nobody mentioned is silently absent. Every divergence is a piece of business knowledge that nobody remembered to write down.

That’s not a problem to be hidden. That’s the whole reason the shadow exists.
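To make a rounding divergence concrete, here is a hypothetical fee where the only difference between two implementations is when rounding to cents happens: one rounds the fee before currency conversion, the other after. The numbers (a 5% fee on 20.10, converted at 0.85) are invented for illustration; both functions use Python's `decimal` module with half-up rounding.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_cents(amount: Decimal) -> Decimal:
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def round_then_convert(base: Decimal, fee_rate: Decimal, fx_rate: Decimal) -> Decimal:
    # Rounds the fee in the source currency first, then converts and rounds again.
    return to_cents(to_cents(base * fee_rate) * fx_rate)


def convert_then_round(base: Decimal, fee_rate: Decimal, fx_rate: Decimal) -> Decimal:
    # Converts the unrounded fee and rounds once, at the very end.
    return to_cents(base * fee_rate * fx_rate)


base, fee_rate, fx_rate = Decimal("20.10"), Decimal("0.05"), Decimal("0.85")
print(round_then_convert(base, fee_rate, fx_rate))  # 0.86  (1.005 -> 1.01 -> 0.8585 -> 0.86)
print(convert_then_round(base, fee_rate, fx_rate))  # 0.85  (1.005 -> 0.85425 -> 0.85)
```

Neither answer is wrong in isolation, and for most inputs the two agree. Which one the business actually wants is exactly the kind of decision a shadow comparison forces someone to make explicitly.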

Stage 4. Tune until parity.

Each divergence got triaged. Some were real bugs in the new service. Some were undocumented behavior in the old one that turned out to be intentional and had to be ported. Some were undocumented behavior that turned out to be a forgotten accident, and the business decided not to carry it forward.
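The three outcomes can be recorded so that the comparison stops flagging differences the business has already decided to accept. A minimal sketch, assuming divergences are grouped under a stable signature string; `Triage`, `Divergence`, and `open_divergences` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class Triage(Enum):
    NEW_SERVICE_BUG = "fix the new service"
    INTENTIONAL_LEGACY_RULE = "port the rule to the new service"
    ACCIDENTAL_LEGACY_BEHAVIOR = "accept the difference; do not carry it forward"


@dataclass(frozen=True)
class Divergence:
    signature: str  # hypothetical grouping key, e.g. "JP/hotel/min-fee"
    old_fee: int
    new_fee: int


def open_divergences(found: Iterable[Divergence],
                     decisions: Dict[str, Triage]) -> List[Divergence]:
    """Divergences that still block parity.

    Untriaged signatures block. Bugs and intentional legacy rules keep blocking
    until the new service matches (at which point they stop appearing in `found`).
    Accidental legacy behavior the business chose to drop no longer counts.
    """
    return [
        d for d in found
        if decisions.get(d.signature) is not Triage.ACCIDENTAL_LEGACY_BEHAVIOR
    ]
```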

Performance got the same treatment. Shadow traffic exposed where the new service was slower than the inline path: the cold start of a separate process, an unnecessary network hop, a serialization choice that mattered at p99 but not at p50. None of this is visible in a load test against a happy path. It is only visible when production traffic, in production shape, runs through both implementations at the same time.

We did not move on until the comparison report showed two things on the same dashboard: output parity within an accepted tolerance, and latency within an accepted budget.
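The exit criterion can be written as a single gate that stays red unless both checks pass. A hedged sketch: the nearest-rank percentile, the 99.9% parity target, and the 5 ms budget for added p99 latency are placeholder values chosen for illustration, not the thresholds the team used.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile; pct is in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def ready_to_ramp(matches: int, total: int,
                  old_ms: Sequence[float], new_ms: Sequence[float],
                  parity_target: float = 0.999,
                  p99_budget_ms: float = 5.0) -> bool:
    """Green only when output parity and added p99 latency are both within limits."""
    if total == 0 or not old_ms or not new_ms:
        return False  # no evidence is not a green light
    parity_ok = matches / total >= parity_target
    added_p99 = percentile(new_ms, 99) - percentile(old_ms, 99)
    return parity_ok and added_p99 <= p99_budget_ms
```

Comparing tail latency against the old path, rather than against an absolute number, keeps the question the same as in the shadow phase: would customers notice the cutover?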

Phase | Traffic on new service | What we watched | Exit criterion
Shadow | 0% (mirrored async) | Output parity, p50/p99 latency vs. old path | Parity rate above target across all markets
Canary | 1% | Conversion, revenue per session, error rate | No statistically significant regression
Ramp | 10% → 25% → 50% | Same metrics, on a larger sample | Stable across two business cycles
Cutover | 100% | Same metrics, plus removal of dead code | Old fee path deleted from the monolith

Stage 5. Ramp behind a flag.

Only after parity and performance held did we start routing real traffic to the new service, and even then, behind a feature flag we could turn off in seconds. Small percentage first. Wait. Watch the business metrics, not just the technical ones. Increase. Wait. Increase.
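A percentage rollout with a kill switch can be sketched with a stable hash, so a given session stays on the same path from request to request and raising the percentage only ever adds sessions to the new path. This is a hypothetical `RolloutFlag` for illustration, not the API of any particular feature-flag product; bucketing by session id is an assumption.

```python
import hashlib


class RolloutFlag:
    """Hypothetical percentage rollout with a kill switch."""

    def __init__(self, name: str, percent: float = 0.0) -> None:
        self.name = name
        self.killed = False
        self.percent = 0.0
        self.set_percent(percent)

    def set_percent(self, percent: float) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.percent = percent

    def kill(self) -> None:
        # Back to the legacy path for everyone, immediately, without a deploy.
        self.killed = True

    def _bucket(self, unit_id: str) -> int:
        # Stable bucket in [0, 10000): same flag and same session give the same bucket.
        digest = hashlib.sha256(f"{self.name}:{unit_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % 10_000

    def use_new_path(self, unit_id: str) -> bool:
        return not self.killed and self._bucket(unit_id) < self.percent * 100


flag = RolloutFlag("new-fee-service", percent=1)
route = "new fee service" if flag.use_new_path("session-123") else "legacy inline fee"
```

Salting the hash with the flag name keeps different flags from bucketing the same sessions together, and `kill()` overrides any percentage, which is the "put yesterday's behavior back" contract expressed in code.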

The flag is not just a safety net. It is a contract with the rest of the company. It says: if anything looks wrong, we can put yesterday’s behavior back, in production, while we figure out why.

Stage 6. Remove the old path.

This is the step that closes the loop. Once the new fee service had been serving 100% of traffic for long enough to cover seasonality and partner cycles, we deleted the inline fee logic from the search monolith. The seam stayed. The legacy code did not.

If you skip this step, you have not done a Strangler Fig migration. You have built a second system and kept the first one. That isn’t modernization. It’s a tax.

The seam, the shadow, the comparison log, the feature flag, the ramp, the deletion. Each one is doing a different job. Each one earns its place by reducing a specific category of risk.

Keeping the experience identical while changing everything underneath

One of the most important constraints we had was preserving customer experience. Customers did not care about our architecture. They cared about seeing the right prices. They cared about booking successfully. They cared about trust.

That meant our first objective was not innovation. It was compatibility.

For a period of time, the new platform needed to produce the same outputs as the existing systems. Only after earning confidence could we begin introducing new capabilities.

In practice, the journey looked closer to this:

Growing a new pricing layer around the existing system

The new layer initially looked redundant. Some engineers naturally questioned its value. Why build something that behaves exactly the same?

Because identical behavior today creates freedom tomorrow.

Transitional architecture is not waste

One lesson that stayed with me from that period is that engineers often underestimate the value of temporary architecture.

We like permanent solutions. We like clean systems. We like building things that last.

But modernization rarely happens that way. Sometimes the most valuable component is the one designed to disappear.

Routing layers. Compatibility adapters. Migration services. Shadow execution paths. Comparison dashboards. Feature flags whose only job is to be turned off one day. These components may eventually be deleted, but they make progress possible.

Transitional architecture is not waste. It is scaffolding. And scaffolding is often what allows us to safely construct something larger.

The real cost of modernization is rarely the temporary code. The real cost comes from trying to avoid temporary code and forcing a risky migration instead.

What I learned

Looking back, joining the pricing team taught me something that continues to influence how I approach large-scale change.

We often imagine transformation as replacement. Old becomes new. Legacy becomes modern. Monolith becomes platform.

Reality is usually less dramatic. The most successful transformations I’ve seen look more like gardening than demolition. You create space. You understand the ecosystem. You identify what should grow. You remove constraints carefully. And over time, something new emerges around the existing structure.

The Strangler Fig pattern isn’t really about software architecture. It’s about respecting complexity. It’s about acknowledging that businesses cannot stop while engineers redesign systems. And it’s about recognizing that the safest path to the future is often one small, deliberate step at a time.

If you’re staring at a system that still works but that you can no longer evolve, the question worth asking isn’t “how do we replace this?” It’s “where can we put the first seam, and what would we need to see before we trusted the new path with one percent of traffic?”

Answer that, and the migration writes itself.