Every data org eventually invents the same fantasy: one canonical pipeline that handles everything.
One ingestion path. One canonical data model. One blessed transformation layer. One place where quality, governance, lineage, and delivery all happen neatly.
It sounds responsible. It often becomes a bottleneck.
The tracker that wanted to become a platform
At Credit Karma, I worked on a mortgage application tracker. The problem was simple enough: once we handed a lead to a partner, we lost visibility into what happened next.
Did the user finish the application? Was it approved? Did the loan close? Where did the process stall?
The first version solved a mortgage problem. The more interesting realization was that mortgage was not special. Auto loans, personal loans, mortgage loans: they all had some version of the same lifecycle visibility problem.
What looked like a one-off tracker was really a platform trying to exist.
This is where centralized pipeline thinking starts to creak. A single shared pipeline wants every domain to agree on shape, timing, ownership, and semantics before anyone ships. But loan products do not all move the same way. Their partners are different. Their events arrive differently. Their definitions of "application submitted" or "approved" can have product-specific edges.
The useful abstraction was not one pipeline; it was a shared pattern for domain-owned events.
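To make "a shared pattern for domain-owned events" a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the `LifecycleEvent` envelope, the status enums, and the `latest_status` helper are assumptions, not the actual Credit Karma design. The idea it tries to show is that the envelope is shared while each domain keeps its own lifecycle vocabulary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any


class MortgageStatus(Enum):
    # Hypothetical lifecycle owned by the mortgage domain.
    SUBMITTED = "submitted"
    APPROVED = "approved"
    CLOSED = "closed"


class AutoLoanStatus(Enum):
    # Hypothetical lifecycle owned by the auto loan domain.
    # It does not have to match the mortgage lifecycle.
    SUBMITTED = "submitted"
    APPROVED = "approved"
    FUNDED = "funded"


@dataclass(frozen=True)
class LifecycleEvent:
    """Shared envelope: the same outer shape for every domain."""
    domain: str
    application_id: str
    status: Enum                      # domain-specific vocabulary
    occurred_at: datetime
    payload: dict[str, Any] = field(default_factory=dict)  # domain-specific details


def latest_status(events: list[LifecycleEvent]) -> dict[tuple[str, str], str]:
    """Return the most recent status per (domain, application_id)."""
    latest: dict[tuple[str, str], LifecycleEvent] = {}
    for event in events:
        key = (event.domain, event.application_id)
        if key not in latest or event.occurred_at > latest[key].occurred_at:
            latest[key] = event
    return {key: event.status.value for key, event in latest.items()}


# Usage: a cross-domain view built only from the shared envelope.
day1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
day2 = datetime(2024, 1, 2, tzinfo=timezone.utc)
events = [
    LifecycleEvent("mortgage", "m-1", MortgageStatus.SUBMITTED, day1),
    LifecycleEvent("mortgage", "m-1", MortgageStatus.APPROVED, day2),
    LifecycleEvent("auto", "a-1", AutoLoanStatus.FUNDED, day2, {"partner": "example"}),
]
print(latest_status(events))
```

The cross-domain helper reads only the envelope fields, so the auto loan domain can add a status or change its payload without asking the mortgage domain for permission.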
Shared pipelines create fake simplicity
The strongest argument for a central pipeline is consistency.
The argument is not wrong. Data quality matters, lineage matters, and governance matters. Five teams producing five definitions of the same metric is a real problem.
But a central pipeline often creates a fake kind of consistency. The schema is shared, but the meaning is negotiated in meetings. The transformation is centralized, but ownership is vague. Everyone depends on the pipeline, so nobody can change it quickly.
Then the system gets political.
Need a schema change? Get in line. Need a product-specific exception? Explain why your domain is special. Need to backfill a new field? Coordinate with every consumer who might be assuming the old shape.
This is how a pipeline becomes an org chart with retry logic.
Domains are a better unit of ownership
I prefer starting from domains: a domain owns the source data, the processing logic, and the contract it publishes. Other teams consume the contract, not the internal machinery. The shared platform provides the common standards and tooling: event transport, schema registry, observability, lineage, access controls, and backfill tooling.
That split matters.
The platform should make the good path easy. It should not require every product domain to pretend it has the same lifecycle, latency needs, or data model.
In practice, this often means publishing well-defined events through something like Kafka and treating schemas as real contracts, not informal agreements in proto form. Consumers can build their own projections; domains can evolve their internals; and the platform can enforce compatibility, metadata, and operational guarantees.
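As a rough illustration of what "the platform can enforce compatibility" might look like, here is a toy check written in Python. It is a sketch under stated assumptions: the schema representation (a dict of field name to `type` and `required`) and the `breaking_changes` function are hypothetical, and real schema registries apply richer rules than these three.

```python
def breaking_changes(old: dict[str, dict], new: dict[str, dict]) -> list[str]:
    """List changes in `new` that could break users of the `old` contract.

    Each schema maps field name -> {"type": str, "required": bool}.
    This is a simplified, hypothetical rule set, not a real registry's logic.
    """
    problems = []
    for name, spec in old.items():
        if name not in new:
            # Consumers reading this field would stop finding it.
            problems.append(f"removed field: {name}")
        elif new[name]["type"] != spec["type"]:
            # Consumers parsing the old type would misread the value.
            problems.append(
                f"type changed: {name} ({spec['type']} -> {new[name]['type']})"
            )
    for name, spec in new.items():
        if name not in old and spec.get("required", False):
            # Events already published cannot satisfy a new required field.
            problems.append(f"new required field: {name}")
    return problems


# Usage: an additive optional field passes; a retyped or removed field does not.
v1 = {
    "application_id": {"type": "string", "required": True},
    "status": {"type": "string", "required": True},
}
v2_additive = {**v1, "partner_id": {"type": "string", "required": False}}
v2_breaking = {
    "application_id": {"type": "int", "required": True},
    "closed_at": {"type": "string", "required": True},
}
print(breaking_changes(v1, v2_additive))
print(breaking_changes(v1, v2_breaking))
```

A check like this runs at the boundary, when a domain proposes a new version of its published contract. What the domain does behind that contract stays its own business.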
You still need governance, but you move it to the boundaries where it belongs.
The tradeoff is real
Domain-driven pipelines are messier than a single central system.
There are more streams, more contracts, more local decisions, and more chances for teams to model the same thing differently if the platform does not provide enough guidance.
That mess is the cost of keeping ownership close to the people who understand the data.
The alternative is often cleaner on a diagram and worse in production. One central pipeline can look elegant right up until every team is waiting on it, afraid to change it, or quietly building shadow pipelines because the official one cannot move.
The point is not decentralization for its own sake; the point is clear ownership with shared infrastructure underneath it.
One perfect pipeline is a comforting idea, but most real domains eventually outgrow it.