Building an immutable bank
How to design and architect an immutable, event-driven bank—and what that means.
When we started building Griffin, we knew immediately that we wanted to design the internal systems around an immutable, event-driven architecture. We were coming from an immutable language background (Clojure) and had seen event-driven systems used to good effect at other financial services companies. It was an obvious choice.
This post describes the architectural design of Griffin at a fairly high level, starting from the bottom.
The ledger as an event log of debits and credits
At the heart of any bank is the ledger, which tracks the balances of both the bank's accounts and its customers'. Double-entry bookkeeping has been the apex predator of ledger technology since 1397 in Renaissance Italy, and after 600+ years it's still the obvious design choice here.[1]
The data model for a double-entry bookkeeping system is simple: you have accounts, journal entries, and line items. Accounts include both customer-facing accounts that correspond to e.g. current/checking accounts or loans, as well as bank-facing mirrors of those (deposits payable and loans receivable, in this case). Line items are debits or credits, and are attached to a journal entry. Journal entries are pairs or series of line items, for which the sum of debits must be equal to the sum of credits. Since journal entries cover all sides of a transaction, any metadata about the nature of the transaction is attached to the journal entry rather than to the line items on either side.
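As a rough illustration of that data model, here is a minimal Python sketch. The class and field names (`Side`, `LineItem`, `JournalEntry`, `description`) are assumptions made for this example rather than Griffin's actual schema; the one rule taken directly from the text is that an entry's debits must equal its credits.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Tuple
from uuid import UUID, uuid4


class Side(Enum):
    DEBIT = "debit"
    CREDIT = "credit"


@dataclass(frozen=True)
class LineItem:
    account_id: UUID
    amount: int  # minor units (e.g. pence), always positive
    side: Side


@dataclass(frozen=True)
class JournalEntry:
    line_items: Tuple[LineItem, ...]
    description: str = ""  # transaction metadata lives on the entry
    id: UUID = field(default_factory=uuid4)

    def __post_init__(self) -> None:
        # An entry is a pair or series of line items, never a single one.
        if len(self.line_items) < 2:
            raise ValueError("a journal entry needs at least two line items")
        if any(item.amount <= 0 for item in self.line_items):
            raise ValueError("line item amounts must be positive")
        debits = sum(i.amount for i in self.line_items if i.side is Side.DEBIT)
        credits = sum(i.amount for i in self.line_items if i.side is Side.CREDIT)
        if debits != credits:
            raise ValueError(f"unbalanced entry: debits={debits}, credits={credits}")


# Two hypothetical accounts and one balanced entry between them.
account_a, account_b = uuid4(), uuid4()
entry = JournalEntry(
    line_items=(
        LineItem(account_a, 5000, Side.DEBIT),
        LineItem(account_b, 5000, Side.CREDIT),
    ),
    description="illustrative transfer",
)
```

Making both classes frozen means an entry cannot be edited once constructed, which matches the immutable flavour of the rest of the design.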
The core conceit of a double-entry system is to eliminate calculation errors or "lost money": any debit or credit on one side of the book must be matched by its opposite on the other side. By way of example, when a customer deposits cash via ATM into their current account, we debit their [asset] account and credit a liability account that tracks the sum of all customer deposits ("deposits payable"). The deposits payable account reflects the fact that we (the bank) will have to pay the customer when they withdraw from that account.
A double-entry system doesn't prevent us from making other mistakes like paying someone the wrong amount of money, or paying the wrong person. But it does make sure that we never find ourselves asking "where did this money come from?", or, worse: "where did this money go!?"
There are essentially two views into a bank. One is a view of transactions or transfers between two or more accounts; this is what journal entries represent and is in some senses a global history. If you could put together such a view for every account in every ledger in the world, you would have a log of every financial transaction that had ever taken place.
The other is a view of transactions specific to a single account, and this is what line items represent. A view of a single account's line items (along with some metadata from the journal entries attached to those) is essentially what you're looking at whenever you're looking at your personal bank account or loan statement.
Each line item has a foreign key relationship to one journal entry and one account. A journal entry typically has two line items—one debit and one credit—but can have more in certain cases. In our case, accounts and journal entries do not have a direct FK relationship to each other outside of a join through the line items table—there aren't a lot of use cases where someone needs transaction metadata on an account without also knowing the sums involved.
Note that this architecture means that outside of a caching layer we don't store the balance of an account as a mutable/durable figure. Instead, the balance is derived by taking the net sum of debits and credits applied to that account; in this way we can provably calculate or re-calculate the balance of an account at any point in the account's history.
// A sample current account.
//
// Note that the ledger doesn't need to know about the fact this is a
// current account with sort code and publicly-facing account number;
// that can be handled at a higher level.
{"id": "ae5d6583-3f87-4612-9aa6-7aff0c0d2272",
 "type": "asset",
 "currency": "GBP",
 "created_at": "2019-10-10 20:38:12.584889+01",
 "deleted_at": null,
 "min_balance": 0,
 "max_balance": null}

// A line item for a paycheck being deposited
{"account_id": "ae5d6583-3f87-4612-9aa6-7aff0c0d2272",
 "journal_entry_id": "cb648791-9eb4-4686-9eb0-195f9f8c0497",
 "amount": 257275,
 "type": "debit"}

// A line item for an ATM withdrawal
{"account_id": "ae5d6583-3f87-4612-9aa6-7aff0c0d2272",
 "journal_entry_id": "c15d8eb5-e4b2-48ef-818b-c54a99eee54c",
 "amount": 20000,
 "type": "credit"}

// Ledger balance retrieval [internally facing, not external API]
{"id": "ae5d6583-3f87-4612-9aa6-7aff0c0d2272",
 "type": "asset",
 "currency": "GBP",
 "timestamp": "2019-10-18 14:28:02.434482+01",
 "balance": 255275}
You also might note that we leave in place a deleted_at field on our accounts so that we can soft-delete closed accounts, when necessary.
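To make the "balance at any point in history" idea concrete, here is a small Python sketch. It assumes each line item carries a `posted_at` timestamp (a hypothetical field; the samples above don't show one) and that the account is debit-normal, as the sample asset account appears to be. The amounts and dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class LineItem:
    amount: int          # minor units (pence)
    type: str            # "debit" or "credit"
    posted_at: datetime  # hypothetical field, assumed for this sketch


def balance_at(line_items: List[LineItem], as_of: datetime) -> int:
    """Net balance of a debit-normal account as of a point in time."""
    total = 0
    for item in line_items:
        if item.posted_at > as_of:
            continue  # ignore anything posted after the requested moment
        total += item.amount if item.type == "debit" else -item.amount
    return total


history = [
    LineItem(10000, "debit", datetime(2019, 10, 11, tzinfo=timezone.utc)),
    LineItem(2500, "credit", datetime(2019, 10, 15, tzinfo=timezone.utc)),
]

early = balance_at(history, datetime(2019, 10, 12, tzinfo=timezone.utc))  # 10000
late = balance_at(history, datetime(2019, 10, 18, tzinfo=timezone.utc))   # 7500
```

Because line items are never updated or deleted, re-running the same query with the same `as_of` gives the same answer every time, which is what makes a cached balance safe to throw away and rebuild.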
The ledger's write API is fairly simple. It can create accounts, and it can create journal entries. The creation of a journal entry necessitates the creation of n line items, for which (as mentioned earlier) the sum of debits must equal the sum of credits.
The ledger's read API is also fairly simple. It can retrieve an account's line items (from which a balance for the account can be calculated). When retrieving the line items, it can also fetch the relevant journal entries (which will have metadata like a description of the transaction attached). Finally, it can retrieve a specific journal entry.
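The shape of those two APIs can be sketched with an in-memory stand-in. Everything here (the `Ledger` class, its method names, the use of plain dicts) is an assumption for illustration, not Griffin's real interface; the point is that writes only ever append, and reads only ever join line items to their entries.

```python
from typing import Dict, List
from uuid import UUID, uuid4


class Ledger:
    """Append-only, in-memory stand-in for the ledger service."""

    def __init__(self) -> None:
        self._accounts: Dict[UUID, dict] = {}
        self._entries: Dict[UUID, dict] = {}
        self._line_items: List[dict] = []

    # --- write API ---
    def create_account(self, account_type: str, currency: str) -> UUID:
        account_id = uuid4()
        self._accounts[account_id] = {
            "id": account_id, "type": account_type, "currency": currency}
        return account_id

    def create_journal_entry(self, description: str, line_items: List[dict]) -> UUID:
        if any(li["type"] not in ("debit", "credit") for li in line_items):
            raise ValueError("line item type must be 'debit' or 'credit'")
        if any(li["account_id"] not in self._accounts for li in line_items):
            raise KeyError("line item references an unknown account")
        debits = sum(li["amount"] for li in line_items if li["type"] == "debit")
        credits = sum(li["amount"] for li in line_items if li["type"] == "credit")
        if debits != credits:
            raise ValueError("sum of debits must equal sum of credits")
        entry_id = uuid4()
        self._entries[entry_id] = {"id": entry_id, "description": description}
        for li in line_items:
            self._line_items.append({**li, "journal_entry_id": entry_id})
        return entry_id

    # --- read API ---
    def line_items(self, account_id: UUID, with_entries: bool = False) -> List[dict]:
        items = [li for li in self._line_items if li["account_id"] == account_id]
        if with_entries:
            # Attach the entry's metadata (e.g. its description) to each item.
            items = [{**li, "journal_entry": self._entries[li["journal_entry_id"]]}
                     for li in items]
        return items

    def journal_entry(self, entry_id: UUID) -> dict:
        return self._entries[entry_id]


ledger = Ledger()
first = ledger.create_account("asset", "GBP")
second = ledger.create_account("liability", "GBP")
entry_id = ledger.create_journal_entry(
    "example entry",
    [
        {"account_id": first, "amount": 5000, "type": "debit"},
        {"account_id": second, "amount": 5000, "type": "credit"},
    ],
)
```

Note there is deliberately no update or delete method: corrections would be made by writing a further entry.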
Handling real-world transactions
Actual transactions are almost never as simple as a pure pair of debits and credits, however. Just for starters, there are constraints on some accounts that don't apply to others—a ledger account corresponding to a current account shouldn't go below zero, and likewise a ledger account corresponding to a loan shouldn't go above the loan cap. Transactions are therefore both a read and write operation—we shouldn't create the line item without checking the balance against any relevant constraints. This means both need to happen atomically.
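Here is one way the "check and write atomically" requirement could look, sketched in Python with an in-process lock. This is an assumption-laden toy: a real ledger would more likely lean on database transactions, and the `GuardedAccount` class and its signed `delta` amounts are invented for this example. The `min_balance` and `max_balance` bounds mirror the fields on the sample account shown earlier.

```python
import threading
from typing import List, Optional


class ConstraintViolation(Exception):
    """Raised when a posting would push the balance outside its bounds."""


class GuardedAccount:
    def __init__(self, min_balance: Optional[int] = 0,
                 max_balance: Optional[int] = None) -> None:
        self.min_balance = min_balance
        self.max_balance = max_balance
        self._deltas: List[int] = []   # signed, append-only line item amounts
        self._lock = threading.Lock()

    def balance(self) -> int:
        with self._lock:
            return sum(self._deltas)

    def post(self, delta: int) -> int:
        """Check the constraints and append the line item as one atomic step."""
        with self._lock:
            new_balance = sum(self._deltas) + delta
            if self.min_balance is not None and new_balance < self.min_balance:
                raise ConstraintViolation("balance would fall below the minimum")
            if self.max_balance is not None and new_balance > self.max_balance:
                raise ConstraintViolation("balance would exceed the maximum")
            self._deltas.append(delta)
            return new_balance


account = GuardedAccount(min_balance=0)
account.post(10_000)       # deposit
account.post(-2_500)       # withdrawal within limits
try:
    account.post(-9_000)   # would overdraw, so nothing is written
except ConstraintViolation:
    pass
```

Without the lock, two concurrent withdrawals could each read the same balance, each pass the check, and together overdraw the account.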
Secondly, real-world transactions that deal with external counterparties are prone to complex settlement paths with multiple failure cases. What if we try to send a payment to another bank and it never responds to confirm receipt? What if the payment system itself is down? What happens if a new request comes in for an outbound payment while we haven't finished settling a prior payment—and where the sum of both payments would violate an account invariant?
Ideally, the ledger is the ultimate system of record, and—partially as a consequence of that—should only concern itself with fully settled transactions.[2] Transactions that are still in-flight or otherwise capable of failure should be tracked at a higher level (at least in terms of code—this engine still needs read access to the ledger to prevent sending more money than is in a given account). We refer to this logical layer as the transactor.
The funds affected by any given transaction with an external system can be abstracted to a fairly simple set of states. These states are instruction (we've been told about a transaction but haven't established a hold on funds yet), held (the transaction is in process and any other transactions should treat these funds as reserved), failed (the transaction has been aborted) or completed (the transaction has completed and a journal entry has been written to the ledger).
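Those four states lend themselves to a small state machine. The sketch below is in Python; the state names come from the paragraph above, but the set of allowed transitions is an assumption made for illustration (the text doesn't spell out, for instance, whether an instruction can fail before a hold is placed).

```python
from enum import Enum


class FundsState(Enum):
    INSTRUCTION = "instruction"
    HELD = "held"
    FAILED = "failed"
    COMPLETED = "completed"


# Assumed transition table: failed and completed are treated as terminal.
ALLOWED = {
    FundsState.INSTRUCTION: {FundsState.HELD, FundsState.FAILED},
    FundsState.HELD: {FundsState.COMPLETED, FundsState.FAILED},
    FundsState.FAILED: set(),
    FundsState.COMPLETED: set(),
}


def transition(current: FundsState, target: FundsState) -> FundsState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


state = FundsState.INSTRUCTION
state = transition(state, FundsState.HELD)
state = transition(state, FundsState.COMPLETED)
```

Only the move into `completed` would correspond to a journal entry being written to the ledger.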
The transactor takes the more complex states involved in the settlement processes of individual payment rails and maps those onto the simpler states described above. Using the simpler states, the transactor is then also responsible for ensuring that account invariants (such as a loan cap or an account minimum) are not violated when further inbound transaction events come in.
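A sketch of that mapping, and of how held funds feed into the invariant check, might look like the following Python. The rail status names are entirely made up for a hypothetical payment rail (real schemes each have their own vocabulary), as are the function names and the dict layout of a payment.

```python
from typing import Dict, List

# Hypothetical rail-specific statuses mapped onto the four simple states.
RAIL_STATE_TO_SIMPLE: Dict[str, str] = {
    "received": "instruction",
    "submitted": "held",
    "awaiting_confirmation": "held",
    "rejected": "failed",
    "timed_out": "failed",
    "settled": "completed",
}


def available_funds(ledger_balance: int, outbound_payments: List[dict]) -> int:
    """Ledger balance minus anything currently reserved by in-flight payments."""
    held = sum(
        p["amount"]
        for p in outbound_payments
        if RAIL_STATE_TO_SIMPLE[p["rail_state"]] == "held"
    )
    return ledger_balance - held


def can_accept(ledger_balance: int, outbound_payments: List[dict],
               new_amount: int, min_balance: int = 0) -> bool:
    """Would a new outbound payment keep the account above its minimum?"""
    return available_funds(ledger_balance, outbound_payments) - new_amount >= min_balance


in_flight = [{"amount": 7000, "rail_state": "awaiting_confirmation"}]
too_much = can_accept(10_000, in_flight, 5_000)    # False: only 3000 is free
fits = can_accept(10_000, in_flight, 3_000)        # True
```

This is the case raised earlier: the ledger still shows 10000 because the first payment hasn't settled, but the second request has to be judged against what is left after the hold.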
Finally, the transactor is responsible for digesting information about proposed transactions and emitting an event with various risk markers to our transaction monitoring infrastructure. The transaction monitoring service then evaluates those risk markers against known markers of financial crime, and can trigger other follow-on events (e.g. blocking the transaction or even freezing the relevant account) if necessary.
Workflow events
Going one step further up the stack, we have non-transaction workflow events.
This somewhat vague term covers a number of different user stories in which money isn't being moved. Some examples include:
- Customer onboarding
- Opening a current account
- Changing contact information
These events can be thought of as affecting metadata: by themselves, they don't change the financial position of our customers or the bank. In point of fact, most of these events aren't subscribed to by the ledger or the transactor at all.[3]
A large number of these workflow events will trigger various compliance checks, e.g.:
- When onboarding a customer, the information received is passed on to various services that look up identities and verify documents (per diagram, above).
- A loan application event triggers both a credit check as well as a check against the bank's available credit and current risk tolerance.
- Changing contact information triggers a user-facing verification process as well as a second set of know-your-customer checks.
All of the above event processors emit their own events once finished with their respective compliance workflow, letting the original service know that it is okay to proceed with the original workflow in question.
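The request-then-completion pattern described above can be sketched with a toy in-process event bus in Python. The `EventBus` class, the topic names and the always-passing identity check are all hypothetical stand-ins; a production system would presumably use a durable log or message broker instead.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple


class EventBus:
    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)
        self.log: List[Tuple[str, dict]] = []  # append-only history of every event

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: dict) -> None:
        self.log.append((topic, payload))
        for handler in list(self._subscribers[topic]):
            handler(payload)


bus = EventBus()
onboarded: List[str] = []


def run_identity_check(payload: dict) -> None:
    # Stand-in for a real compliance check; this one always passes.
    bus.publish("identity_check.completed", {**payload, "passed": True})


def finish_onboarding(payload: dict) -> None:
    # The original workflow proceeds only once the check reports back.
    if payload["passed"]:
        onboarded.append(payload["customer_id"])


bus.subscribe("onboarding.requested", run_identity_check)
bus.subscribe("identity_check.completed", finish_onboarding)
bus.publish("onboarding.requested", {"customer_id": "cust-1"})
```

Neither handler calls the other directly; each only reacts to events, and the log records both the request and the result.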
Changing contact information or (more generally) entity information is a particularly interesting case as we may be asked in the future to identify a customer by any attribute they have ever previously held. In this regard the history of customer metadata events must be known and searchable. Your address in 2005 is, from our perspective, not that different from a file in a git repository at a particular commit hash.
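As a sketch of what "known and searchable" could mean in practice, the Python below keeps every attribute change as an immutable event and answers two questions: what was this customer's value on a given date, and which customers have ever held a given value. The event shape, function names and sample data are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Set


@dataclass(frozen=True)
class AttributeChanged:
    customer_id: str
    attribute: str
    value: str
    effective: date


def value_as_of(events: List[AttributeChanged], customer_id: str,
                attribute: str, as_of: date) -> Optional[str]:
    """Replay the history in date order and return the value in force on as_of."""
    current = None
    for e in sorted(events, key=lambda e: e.effective):
        if (e.customer_id == customer_id and e.attribute == attribute
                and e.effective <= as_of):
            current = e.value
    return current


def customers_ever_matching(events: List[AttributeChanged],
                            attribute: str, value: str) -> Set[str]:
    """Every customer who has held this attribute value at any time."""
    return {e.customer_id for e in events
            if e.attribute == attribute and e.value == value}


events = [
    AttributeChanged("cust-1", "address", "12 Example Road", date(2003, 4, 1)),
    AttributeChanged("cust-1", "address", "34 Sample Street", date(2011, 9, 15)),
    AttributeChanged("cust-2", "address", "12 Example Road", date(2012, 1, 10)),
]

address_2005 = value_as_of(events, "cust-1", "address", date(2005, 6, 30))
past_residents = customers_ever_matching(events, "address", "12 Example Road")
```

Because a change of address appends a new event rather than overwriting the old one, the 2005 answer stays recoverable indefinitely.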
Global consistency
Much like the ledger at the heart of Griffin, you can think of the global financial system as a history of facts:
- At time 1, customer A opens account at institution B.
- At time 2, customer C executes a trade of security D with counterparty E on exchange F.
- etc.
Within that system you have (1) nodes (like banks) which are themselves complex distributed systems in their own right, and (2) the global event-driven architecture by which they communicate (which encompasses all payment methods and schemes, from ACH to Visa to SWIFT). It is a somewhat fraught multi-tiered distributed system operating at massive scale, and therefore questions of consistency naturally arise.
Nodes are for the most part internally consistent, though they may accept a degree of local inconsistency where the overall financial position of the entity is otherwise appropriately safeguarded. Even the largest of banks still operate at a scale where internal consistency can typically be guaranteed (if they know what they're doing). As we'll be starting from zero, our internal modeling suggests we won't run into consistency issues any time in the next decade.
Global consistency is largely ensured by the distributed locking mechanisms that are built into the vast majority of the financial system's payment rails. Final settlement is handled by a third party, which acts as a synchronization mechanism for the funds looking to transact. The specific state machines that govern these multi-phase commit operations are fairly idiosyncratic across the world's various payment rails, but for the most part they can be reduced to a small number of proven synchronisation algorithms.
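One of the best-known algorithms in that family is two-phase commit, sketched below in Python. This is the generic textbook version with a coordinator standing in for the settling third party; it is not a model of any particular payment rail, and the `Bank` and `settle` names are made up for the example.

```python
from typing import List, Tuple


class Bank:
    def __init__(self, name: str, balance: int) -> None:
        self.name = name
        self.balance = balance
        self.pending = 0  # signed amount reserved by an in-flight settlement

    def prepare(self, delta: int) -> bool:
        """Phase 1: vote on whether this change could be applied."""
        if self.pending != 0 or self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def commit(self) -> None:
        """Phase 2: apply the reserved change."""
        self.balance += self.pending
        self.pending = 0

    def abort(self) -> None:
        """Phase 2 (failure path): drop the reservation, balance untouched."""
        self.pending = 0


def settle(payer: Bank, payee: Bank, amount: int) -> bool:
    """Coordinator: commit only if every participant votes yes."""
    participants: List[Tuple[Bank, int]] = [(payer, -amount), (payee, amount)]
    prepared: List[Bank] = []
    for bank, delta in participants:
        if not bank.prepare(delta):
            for p in prepared:
                p.abort()
            return False
        prepared.append(bank)
    for bank in prepared:
        bank.commit()
    return True


bank_a, bank_b = Bank("A", 100), Bank("B", 0)
first_ok = settle(bank_a, bank_b, 60)   # succeeds: A has 100
second_ok = settle(bank_a, bank_b, 60)  # fails: A has only 40 left
```

The sketch ignores the hard parts (timeouts, crashed participants, a coordinator that fails mid-commit), which is where the idiosyncrasies of individual rails tend to live.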
Conclusion
The above is a fairly high-level description of the architecture of Griffin and how it fits into the larger financial system. In the future, we'll supplement this post with greater technical detail on how exactly we've built the above.
If you've got comments or questions about this post, feel free to reach out to us on Twitter @griffinbank.
If you're a software engineer in the UK and this sort of thing interests you, we're currently hiring for engineer #3! You can learn more about the position here and our corporate culture here. We'd love to hear from you :)
1. The other immutable ledger technology of note is, of course, blockchain. At least as of today, blockchain's distributed proofs require a performance and compute power cost that does not bring with it any tangible benefit for a bank such as ourselves. Accordingly, blockchain is not a good choice for us—though we remain open to the possibility that the right innovations might change that in the future.
Note that the above caveat is specific to the bank's internal ledger; external payment rails are another matter.
2. Some "fully settled" transactions will inevitably end up being written to the ledger when they shouldn't have been. Maybe a fee was taken that needs to be reversed, or maybe another bank has notified us that a settled payment needs to be recalled. In these cases we write a second journal entry acknowledging the reversal. It is far better for us to have a record both of the original event and a subsequent negating one than to try to re-write history.
3. Some events, such as opening a bank account or taking out a loan, will trigger events with the ledger. For instance, opening a current or loan account will trigger the creation of new ledger accounts (albeit ones without any corresponding line items).