Data Architecture for Fintech: The Decisions That Actually Matter
Most advice on data architecture for fintech reads like a parts catalog: use Kafka, use Postgres, add microservices, sprinkle in a data lake. All true, all useless. None of it tells you the things that will actually get you fined, sued, or paged at 3 a.m. when a reconciliation job finds $40,000 that doesn't exist.
The hard part of data architecture in fintech is not throughput. It is that money has properties most data doesn't: it must balance, it must be auditable years after the fact, it cannot be silently lost or duplicated, and a regulator can ask you to reconstruct the exact state of an account on a Tuesday eighteen months ago. If your architecture treats a transaction like any other CRUD row, you have already lost. This post is about the decisions a senior architect actually agonizes over — the ones that are expensive to get wrong and nearly impossible to retrofit.
The ledger is your system of record — everything else is a cache
The single most important decision in fintech data architecture is naming your system of record (SoR) for money, and refusing to let anything else hold that authority. In practice that SoR is a ledger, and the ledger should be append-only and double-entry.
Double-entry isn't accounting nostalgia. It is a structural invariant: every movement of value writes a debit and an equal credit, so the entries of each transaction, and therefore of the whole ledger, always sum to zero. That invariant is checkable. You can run a continuous job asserting SUM(amount) = 0 across the ledger and across each transaction's entries (a single account's entries sum to its balance, not to zero), and if it ever drifts, you have a bug you can catch in minutes instead of discovering it during a quarterly audit.
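A minimal sketch of that zero-sum check, using Python's built-in sqlite3 as a stand-in for the real ledger database. The table layout, the signed-integer minor-unit amounts, and the function names are assumptions made for illustration, not a prescribed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE entries (
        id      INTEGER PRIMARY KEY,
        txn_id  TEXT    NOT NULL,
        account TEXT    NOT NULL,
        amount  INTEGER NOT NULL  -- signed minor units (assumption)
    )
""")

def post(txn_id, legs):
    """Write all legs of one transaction together; reject unbalanced ones."""
    if sum(amount for _, amount in legs) != 0:
        raise ValueError(f"transaction {txn_id} does not balance")
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT INTO entries (txn_id, account, amount) VALUES (?, ?, ?)",
            [(txn_id, account, amount) for account, amount in legs],
        )

def ledger_drift():
    """Sum of every entry in the ledger; anything but 0 indicates a bug."""
    return conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM entries"
    ).fetchone()[0]

def unbalanced_transactions():
    """Transactions whose own legs do not sum to zero."""
    return conn.execute(
        "SELECT txn_id, SUM(amount) FROM entries "
        "GROUP BY txn_id HAVING SUM(amount) != 0"
    ).fetchall()

post("t1", [("alice", -5000), ("bob", 5000)])
post("t2", [("bob", -1200), ("fees", 200), ("carol", 1000)])
```

Running `ledger_drift()` and `unbalanced_transactions()` on a schedule is one way to turn the invariant into an alert. The per-transaction query is the more useful of the two, because it names the offending transaction rather than just reporting a nonzero total.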
The append-only part matters just as much. You never update or delete a ledger entry. A reversal is a new pair of entries, not a mutation of the old one. This gives you three things you cannot retrofit later:
- A complete, immutable audit trail by construction.
- The ability to reconstruct any account balance at any point in time by replaying entries up to a timestamp.
- A natural fit for dispute resolution and regulatory reconstruction requests.
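To make the reversal and point-in-time properties concrete, here is a small in-memory sketch. The `Entry` fields, the `reverses` back-reference, and the helper names are hypothetical; a real ledger would keep this in a database and enforce append-only behavior with permissions rather than convention.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)  # frozen: an entry cannot be mutated once created
class Entry:
    txn_id: str
    account: str
    amount: int                     # signed minor units (assumption)
    posted_at: datetime
    reverses: Optional[str] = None  # txn_id being reversed, if any

ledger = []  # append-only by convention in this sketch

def post(txn_id, legs, at, reverses=None):
    if not legs or sum(amount for _, amount in legs) != 0:
        raise ValueError(f"transaction {txn_id} is empty or does not balance")
    ledger.extend(Entry(txn_id, acct, amt, at, reverses) for acct, amt in legs)

def reverse(txn_id, new_txn_id, at):
    """Post the mirror image of txn_id; the original entries stay untouched."""
    legs = [(e.account, -e.amount) for e in ledger if e.txn_id == txn_id]
    post(new_txn_id, legs, at, reverses=txn_id)

def balance_at(account, as_of):
    """Replay every entry for the account posted at or before as_of."""
    return sum(e.amount for e in ledger
               if e.account == account and e.posted_at <= as_of)

def day(d):
    return datetime(2024, 1, d, tzinfo=timezone.utc)

post("t1", [("alice", -5000), ("bob", 5000)], day(1))
reverse("t1", "t1-rev", day(3))
```

After the reversal the ledger holds four entries, not two edited ones: `balance_at("bob", day(2))` still reports the balance as it stood before the correction, which is the answer a dispute or regulator request needs.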
A common and costly mistake is storing a single mutable balance column on an account and treating it as truth. The moment two transactions race, or a retried webhook double-fires, your balance and your transaction history disagree — and in fintech, when the numbers disagree, the transaction log wins every time. The balance should be a derived, cached projection of the ledger, recomputable from entries, never the authority.
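One way to keep the cached balance honest is a reconciliation pass that recomputes it from entries and reports any disagreement. The sketch below assumes entries are simple `(account, amount)` pairs and that the cache is a plain dictionary; both are illustrative simplifications.

```python
from collections import defaultdict

def project_balances(entries):
    """Derive balances from ledger entries; the ledger is the authority."""
    balances = defaultdict(int)
    for account, amount in entries:
        balances[account] += amount
    return dict(balances)

def find_drift(cached, entries):
    """Return {account: (cached_value, ledger_value)} wherever they differ."""
    truth = project_balances(entries)
    drift = {}
    for account in set(truth) | set(cached):
        cached_value = cached.get(account, 0)
        ledger_value = truth.get(account, 0)
        if cached_value != ledger_value:
            drift[account] = (cached_value, ledger_value)
    return drift

entries = [("alice", -5000), ("bob", 5000), ("bob", -1200), ("carol", 1200)]
cached = {"alice": -5000, "bob": 5000, "carol": 1200}  # bob's cache missed a debit
```

Here `find_drift(cached, entries)` flags only `bob`, and the fix is to overwrite the cache with the ledger-derived value, never the other way around.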
| Concern | Wrong default | Fintech-correct pattern |
|---|---|---|
| Balance | Mutable balance column as truth | Projection derived from append-only entries |
| Corrections | UPDATE the bad row | Reversing entry; original preserved |
| History | Soft-delete with is_deleted | Immutable log; status changes are new events |
| Idempotency | "It probably won't double-fire" | Idempotency key enforced at the write boundary |
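The idempotency row in the table can be sketched as a unique constraint checked in the same transaction as the ledger write. This uses sqlite3 as a stand-in; the table names and the `post_once` helper are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        txn_id          INTEGER PRIMARY KEY,
        idempotency_key TEXT NOT NULL UNIQUE
    );
    CREATE TABLE entries (
        txn_id  INTEGER NOT NULL REFERENCES transactions(txn_id),
        account TEXT    NOT NULL,
        amount  INTEGER NOT NULL
    );
""")

def post_once(idempotency_key, legs):
    """Return True if this call wrote entries, False if it was a replay."""
    try:
        with conn:  # key insert and entries commit or roll back together
            cur = conn.execute(
                "INSERT INTO transactions (idempotency_key) VALUES (?)",
                (idempotency_key,),
            )
            conn.executemany(
                "INSERT INTO entries VALUES (?, ?, ?)",
                [(cur.lastrowid, account, amount) for account, amount in legs],
            )
    except sqlite3.IntegrityError:
        return False  # key already used: the retry is dropped, not re-applied
    return True

legs = [("alice", -5000), ("bob", 5000)]
first = post_once("webhook-evt-1", legs)
replay = post_once("webhook-evt-1", legs)  # e.g. a retried webhook
```

The important design choice is that the key and the entries share one transaction, so there is no window in which the key is recorded but the money is not. A fuller version would probably also return the original result on replay and reject a reused key that arrives with a different payload.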
Strong consistency where it counts, eventual consistency everywhere else
The microservices crowd will tell you eventual consistency scales better. They're right, and they're dangerous when applied without discrimination. The skill in fintech data architecture is drawing the line precisely.
Strong, transactional consistency is non-negotiable inside the money boundary. When you move funds between two accounts, the debit and credit must commit atomically. There is no acceptable window in which the money has left one account but not arrived in the other and a query could observe the gap. This is exactly what relational ACID transactions were built for, which is why the ledger almost always lives in PostgreSQL or a distributed-SQL engine (CockroachDB, Spanner, YugabyteDB) that can preserve serializable semantics across nodes. Distributed SQL is increasingly the answer when a single-node Postgres can no longer hold the write volume but you refuse to give up transactional guarantees, which, for a ledger, you should refuse to do.
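A small illustration of the atomic debit-and-credit requirement, again with sqlite3 standing in for the production database. The schema and the `transfer` helper are assumptions; the point is only that a failure on the second leg must undo the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE accounts (id TEXT PRIMARY KEY);
    CREATE TABLE entries (
        txn_id  TEXT    NOT NULL,
        account TEXT    NOT NULL REFERENCES accounts(id),
        amount  INTEGER NOT NULL
    );
    INSERT INTO accounts VALUES ('alice'), ('bob');
""")

def transfer(txn_id, src, dst, amount):
    """Debit and credit in one transaction: both legs land or neither does."""
    with conn:  # rolls back every statement inside if an exception escapes
        conn.execute("INSERT INTO entries VALUES (?, ?, ?)", (txn_id, src, -amount))
        conn.execute("INSERT INTO entries VALUES (?, ?, ?)", (txn_id, dst, amount))

def entry_count():
    return conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]

transfer("t1", "alice", "bob", 5000)
```

If the credit leg fails, for instance because the destination account does not exist, the debit leg written a moment earlier is rolled back with it, so no reader can ever observe money that has left one account without arriving in another.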
Eventual consistency is fine — and preferable — for everything reading off that boundary: dashboards, analytics, notifications, search, fraud-scoring features, monthly statements. These are read models. They can lag the ledger by seconds and no one is harmed.
This naturally produces a CQRS shape: the command side (write a transaction) is small, strongly consistent, and optimized for correctness; the query side (show me my history, my balance, my spending trends) is denormalized, eventually consistent, and optimized for read fan-out. Event sourcing pairs well here because the ledger already is an event log — your read models are just projections subscribed to it.
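As a rough sketch of the query side, the read model below consumes an ordered entry log from a stored offset and maintains a denormalized view. The `AccountView` class and the dictionary-shaped entries are hypothetical; in practice the log would be a Kafka topic or a change stream rather than a Python list.

```python
from collections import defaultdict

class AccountView:
    """Denormalized read model built purely from the ledger's entry log."""

    def __init__(self):
        self.offset = 0                   # next log position to consume
        self.balance = defaultdict(int)
        self.history = defaultdict(list)

    def catch_up(self, log):
        """Apply every entry not yet seen, in log order."""
        for entry in log[self.offset:]:
            self.balance[entry["account"]] += entry["amount"]
            self.history[entry["account"]].append(entry["txn_id"])
        self.offset = len(log)

log = [
    {"txn_id": "t1", "account": "alice", "amount": -5000},
    {"txn_id": "t1", "account": "bob", "amount": 5000},
]
view = AccountView()
view.catch_up(log)

# The command side keeps writing; the view lags until it catches up again.
log.append({"txn_id": "t2", "account": "bob", "amount": -1200})
log.append({"txn_id": "t2", "account": "carol", "amount": 1200})
stale_bob = view.balance["bob"]  # still the pre-t2 figure
view.catch_up(log)
```

Because the view holds nothing that is not in the log, it can be thrown away and rebuilt from offset zero at any time, which is what makes it safe to treat as a cache.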
The pitfall is letting CQRS leak into the write path "for performance" and reading a stale projection to make a debit decision. If you authorize a withdrawal against an eventually-consistent balance, you will eventually authorize an overdraft. Authorization decisions read the SoR, or a synchronously-updated reservation, never a lagging projection.
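A hedged sketch of an authorization that reads the system of record inside the same transaction that writes the debit. It uses SQLite's `BEGIN IMMEDIATE` to take the write lock before reading; on PostgreSQL the rough analogue would be serializable isolation or a `SELECT ... FOR UPDATE` row lock. The `withdraw` helper and the `payouts` and `funding` accounts are made up for the example.

```python
import sqlite3

# isolation_level=None: the sqlite3 module stops opening transactions
# implicitly, so BEGIN / COMMIT / ROLLBACK below are fully under our control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE entries (txn_id TEXT, account TEXT, amount INTEGER)")
conn.execute("INSERT INTO entries VALUES ('seed', 'funding', -10000)")
conn.execute("INSERT INTO entries VALUES ('seed', 'alice', 10000)")

def ledger_balance(account):
    return conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM entries WHERE account = ?",
        (account,),
    ).fetchone()[0]

def withdraw(txn_id, account, amount):
    """Check funds and write the debit under one lock; True if authorized."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        if ledger_balance(account) < amount:
            conn.execute("ROLLBACK")
            return False
        conn.execute("INSERT INTO entries VALUES (?, ?, ?)", (txn_id, account, -amount))
        conn.execute("INSERT INTO entries VALUES (?, ?, ?)", (txn_id, "payouts", amount))
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Two back-to-back withdrawals of 8000 against a balance of 10000 cannot both succeed here, because the second check sees the first debit. Had the check read a cached projection that still showed 10000, both would have been approved.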
Design for the auditor and the regulator from day one
In most domains, compliance is a layer you add. In fintech, it is a constraint on the data model itself, and bolting it on later means a rewrite. A few principles that should shape the architecture before you write a migration:
- Immutability and provenance. Auditors don't just want the current value; they want to know what it was, when it changed, and who changed it. Append-only ledgers give you this for money. For everything else sensitive (KYC status, limits, account flags), prefer event logs or temporal tables over in-place updates.
- Data residency and segmentation. GDPR's cross-border transfer restrictions, regional banking rules, and data-localization laws may require that data for customers in a given jurisdiction physically stays there. This is a partitioning decision in your architecture, not a checkbox. If you discover this after you've commingled all customers in one cluster, you are looking at a painful re-shard.
- PII minimization and tokenization. The cardholder data environment (CDE) for PCI DSS should be isolated so that the scope of audit is small. The classic pattern is "compliance by isolation": put card data behind its own service with its own datastore, tokenize at the edge, and keep raw PANs out of every other system. Most of your microservices should never touch a real card number.
- Right to erasure vs. the immutable ledger. GDPR's right-to-be-forgotten collides head-on with an append-only financial record you are legally required to retain. The reconciliation is crypto-shredding: store erasable PII encrypted with a per-subject key, and "delete" by destroying the key while the financial entries (now referencing an anonymized subject) remain intact and balanced. Decide this early; it dictates how PII is keyed throughout.
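The crypto-shredding idea in the last bullet can be sketched as follows. Everything here is illustrative: `PiiVault` is a hypothetical class, the key store would be a KMS or HSM in practice, and the XOR keystream is only a placeholder so the example runs on the standard library. It is not a secure cipher, and a real system would use an authenticated cipher such as AES-GCM from a vetted library.

```python
import hashlib
import secrets

def _toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # PLACEHOLDER ONLY: XOR with a SHAKE-256 keystream so the sketch needs
    # no third-party packages. Do not use this construction in production.
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

class PiiVault:
    def __init__(self):
        self._keys = {}     # subject_id -> per-subject key (a KMS in practice)
        self._records = {}  # subject_id -> (nonce, ciphertext)

    def store(self, subject_id, pii):
        key = self._keys.setdefault(subject_id, secrets.token_bytes(32))
        nonce = secrets.token_bytes(16)
        self._records[subject_id] = (nonce, _toy_cipher(key, nonce, pii.encode()))

    def read(self, subject_id):
        key = self._keys.get(subject_id)
        if key is None or subject_id not in self._records:
            return None  # key destroyed (or never stored): PII unrecoverable
        nonce, ciphertext = self._records[subject_id]
        return _toy_cipher(key, nonce, ciphertext).decode()

    def shred(self, subject_id):
        """'Delete' by destroying the key; the ciphertext may stay on disk."""
        self._keys.pop(subject_id, None)

vault = PiiVault()
vault.store("subj-42", "Ada Lovelace, 12 Example Street")

# Ledger entries reference only the opaque subject id, never the PII itself.
ledger = [("subj-42", -5000), ("merchant-7", 5000)]
```

After `vault.shred("subj-42")` the name and address can no longer be read, while the ledger rows are untouched and still sum to zero. That is why PII has to be keyed per subject from the start: the ledger must never have held the plaintext.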
A useful test: pick any account and any historical date, and ask whether your architecture can answer "what was true here, and why, and who is responsible?" If the honest answer is "we'd have to reconstruct it from logs and hope," you have a data architecture gap, not an ops gap.
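For non-money attributes, that test can be approximated with an append-only change log that records actor and reason alongside each value. The `Change` record and `as_of` helper below are assumptions made for the sketch; a temporal table in the database would serve the same purpose.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Change:
    account: str
    field: str
    value: str
    at: datetime
    actor: str   # who is responsible
    reason: str  # why it changed

def as_of(log, account, field, when):
    """Latest change to `field` at or before `when`, or None if none exists."""
    candidates = [c for c in log
                  if c.account == account and c.field == field and c.at <= when]
    return max(candidates, key=lambda c: c.at, default=None)

def day(d):
    return datetime(2024, 3, d, tzinfo=timezone.utc)

log = [
    Change("acct-1", "kyc_status", "pending", day(1), "system", "account opened"),
    Change("acct-1", "kyc_status", "verified", day(5), "analyst:rlee", "documents approved"),
    Change("acct-1", "kyc_status", "suspended", day(20), "analyst:mkhan", "sanctions hit"),
]
```

A call such as `as_of(log, "acct-1", "kyc_status", day(10))` returns the whole `Change` record, so the answer carries the value, the person responsible, and the reason in one lookup rather than in three separate log searches.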
Pick storage by access pattern, not by hype
Once the SoR and consistency boundaries are settled, polyglot persistence is genuinely the right answer — but driven by access patterns, not résumé-building. A defensible default split:
| Workload | Characteristics | Sensible choice |
|---|---|---|
| Ledger / SoR | Strong consistency, transactional, audited | PostgreSQL or distributed SQL |
| Event backbone | High-write, ordered, replayable | Kafka / Pulsar with a schema registry |
| Hot reads (balances, dashboards) | Low-latency, denormalized | Redis / read-replica projections |
| Search & history | Full-text, faceted queries | Elasticsearch / OpenSearch |
| Analytics & ML features | Columnar, large scans | Warehouse / lakehouse (Snowflake, BigQuery, Iceberg) |
Two non-obvious rules govern this. First, the schema registry on the event backbone is not optional. The moment more than one team consumes events, an unversioned schema change becomes a production incident; enforced, backward-compatible contracts are what let you evolve the ledger without breaking every downstream projection. Second, the analytics store is downstream of the SoR, never a peer. Analysts will want to "just fix" a number in the warehouse; the warehouse is a derived view, and corrections flow from the ledger out, never the reverse. The instant your warehouse and your ledger can disagree about a customer's balance with no clear winner, you've lost the property that makes the whole system trustworthy.
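To give a feel for what an enforced contract checks, here is a deliberately simplified compatibility test. It assumes "backward compatible" means a consumer on the new schema can still read events written with the old one, and it models schemas as plain dictionaries. It is not the API of any real schema registry, and real Avro or Protobuf rules cover more cases, such as type promotions.

```python
def backward_compat_problems(old_fields, new_fields):
    """List reasons the new schema could not read events written with the old one."""
    problems = []
    for name, spec in new_fields.items():
        if name not in old_fields:
            # Old events lack this field, so the reader needs a default value.
            if "default" not in spec:
                problems.append(f"new field '{name}' has no default")
        elif spec["type"] != old_fields[name]["type"]:
            problems.append(
                f"field '{name}' changed type from "
                f"{old_fields[name]['type']} to {spec['type']}"
            )
    # Fields dropped in the new schema are fine: the reader just ignores them.
    return problems

v1 = {"txn_id": {"type": "string"}, "amount": {"type": "long"}}
v2_ok = {**v1, "currency": {"type": "string", "default": "USD"}}
v2_no_default = {**v1, "memo": {"type": "string"}}
v2_retyped = {"txn_id": {"type": "string"}, "amount": {"type": "string"}}
```

Wiring a check like this into CI, so that a schema change with a non-empty problem list cannot merge, is one plausible way to make the contract enforced rather than aspirational.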
A pragmatic checklist before you build
If you're standing up or auditing data architecture for a fintech product, these are the questions worth answering on a whiteboard before any code:
- What is the single system of record for money, and is it append-only double-entry?
- Where exactly is the strong-consistency boundary, and what reads are allowed to be eventual?
- How do you enforce idempotency at every write that touches money? (Payment processors retry; webhooks duplicate. Plan for it.)
- Can you reconstruct any account's state at any past timestamp?
- Is the CDE / PII isolated enough to keep your PCI and privacy audit scope small?
- How do you reconcile GDPR erasure with mandatory financial retention? (Crypto-shredding or equivalent.)
- Is every analytical and read store provably downstream of the ledger?
Get these seven right and the parts-catalog questions — which message broker, which cache — become reversible implementation details. Get them wrong and no amount of Kafka will save you.
These same constraints are exactly the ones that make fintech systems painful to scaffold by hand: the ledger invariants, the consistency boundaries, the audit and residency requirements all have to be wired correctly the first time, across every service. That's the gap Archiet is built to close — it turns a product spec into a formal architecture model (ArchiMate, DMN, BPMN) and then deterministically generates conforming, gate-checked scaffolding with these patterns baked in, so the system of record, the read projections, and the compliance boundaries are consistent by construction rather than by heroics. If you're past the whiteboard and tired of re-implementing the same invariants per service, it's worth a look.