Data Consistency, along with Multi-Step Process and Composite Application, is one of the most common integration scenarios. At a very high level, multiple components manipulate and store the same entity. When one component modifies the entity, the others must be notified of the change so that the state of the entity stays consistent across the whole system.
In real applications, to address the complexities of keeping the entity consistent in a distributed system, the business and system architects impose constraints that simplify the patterns of interaction and lead to a robust solution. The rest of this article describes typical approaches to Data Consistency. Depending on the type of entity there could be design variations, but they all share the same approach. To make this document more concrete, I will use the Customer entity as an example.
Version Control
- User Alice retrieves Customer A with timestamp T1 and starts making modifications: Customer A (T1)
- User Bob retrieves Customer A with timestamp T1 and starts making modifications: Customer A (T1)
- User Alice saves the changes, the timestamp is now T2. Customer A(T2)
- User Bob tries to save, but the save fails because the timestamps differ. He then updates his local copy by merging his changes with the changes made in the repository between T1 and T2: Merge(Customer A (Local), Customer A (T2), Customer A (T1))
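The steps above can be sketched as a timestamp check on save followed by a field-by-field three-way merge. This is only an illustration under simple assumptions: the `Repository` class, the `StaleVersionError` exception, and the `merge` function are hypothetical names, an integer counter stands in for the timestamps T1, T2, ..., and a field that both users changed differently is reported as a conflict rather than resolved.

```python
from dataclasses import dataclass, field


class StaleVersionError(Exception):
    """Raised when a save is based on an outdated version of the record."""


@dataclass
class Repository:
    # Hypothetical in-memory store: the current record plus every saved version.
    record: dict
    version: int = 1  # stands in for the timestamp T1
    history: dict = field(default_factory=dict)

    def __post_init__(self):
        self.history[self.version] = dict(self.record)

    def get(self):
        """Return a private copy of the record and the version it is based on."""
        return dict(self.record), self.version

    def save(self, record, base_version):
        """Accept the save only if nobody else saved since base_version."""
        if base_version != self.version:
            raise StaleVersionError(
                f"based on T{base_version}, repository is at T{self.version}")
        self.version += 1
        self.record = dict(record)
        self.history[self.version] = dict(record)
        return self.version


def merge(local, theirs, base):
    """Three-way merge; None means 'field absent' in this sketch."""
    merged, conflicts = {}, []
    for key in sorted(set(local) | set(theirs) | set(base)):
        mine, other, original = local.get(key), theirs.get(key), base.get(key)
        if mine == other or other == original:
            value = mine          # only the local side changed, or both agree
        elif mine == original:
            value = other         # only the repository side changed
        else:
            conflicts.append(key)  # both changed differently: needs a decision
            value = other          # assumption: keep the repository value
        if value is not None:
            merged[key] = value
    return merged, conflicts


repo = Repository({"name": "Customer A", "email": "a@old.example", "phone": "111"})
alice, alice_base = repo.get()      # Alice reads Customer A (T1)
bob, bob_base = repo.get()          # Bob reads Customer A (T1)

alice["email"] = "a@new.example"
repo.save(alice, alice_base)        # repository moves to T2

bob["phone"] = "222"
try:
    repo.save(bob, bob_base)        # fails: Bob's copy is based on T1
except StaleVersionError:
    theirs, current = repo.get()
    merged, conflicts = merge(bob, theirs, repo.history[bob_base])
    repo.save(merged, current)      # repository moves to T3
```

Because Alice and Bob changed different fields, the merge keeps both changes and reports no conflicts. If they had both edited the same field, that field name would appear in `conflicts` and the caller would have to decide which value wins.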
- Customer MR, Component B and Component C are in sync, all having the version Customer A (T1)
- Alice makes a change to Customer A and it is saved in Customer MR: Customer A(T2)
- Customer MR produces an event and sends the change to the Sync Engine. The change can be sent in several ways:
- Full image of Customer A (T2)
- Changes made to Customer A: Customer A(T2) – Customer A (T1)
- Full image and a summary of changes: Customer A(T2), Customer A(T2) – Customer A(T1)
- Sync Engine receives the change and, applying business rules, propagates it to Component B, C, … .
- Component B (or C, …) receives the change and applies it to its copy. Depending on how the change is passed by the Sync Engine, applying the change could mean replacing all the Customer fields with the new version or updating only the fields that changed.
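As a rough illustration of the three event forms and of the two ways a component can apply them, consider the sketch below. The function names (`diff`, `make_event`, `apply_event`), the event keys, and the `field_level` flag are assumptions made for this example, and the customer is modelled as a flat dictionary. Deleted fields are not handled here.

```python
def diff(old, new):
    """Fields of `new` that were added or modified relative to `old`."""
    return {key: value for key, value in new.items() if old.get(key) != value}


def make_event(old, new, version, form):
    """Build the event Customer MR sends to the Sync Engine in one of three forms."""
    if form == "full":
        return {"version": version, "image": dict(new)}
    if form == "changes":
        return {"version": version, "changes": diff(old, new)}
    if form == "full+changes":
        return {"version": version, "image": dict(new), "changes": diff(old, new)}
    raise ValueError(f"unknown event form: {form}")


def apply_event(copy, event, field_level=True):
    """Return the component's updated copy.

    field_level=False models a component that cannot update single fields
    and therefore always needs the full image.
    """
    if field_level and "changes" in event:
        updated = dict(copy)
        updated.update(event["changes"])   # update only the fields that changed
        return updated
    if "image" in event:
        return dict(event["image"])        # replace every field with the new version
    raise ValueError("this component needs the full image")


customer_t1 = {"name": "Customer A", "city": "Paris", "phone": "111"}
customer_t2 = {"name": "Customer A", "city": "Lyon", "phone": "111"}

changes_only = make_event(customer_t1, customer_t2, 2, "changes")
full_image = make_event(customer_t1, customer_t2, 2, "full")

component_b = apply_event(customer_t1, changes_only)                  # field update
component_c = apply_event(customer_t1, full_image, field_level=False)  # full replace
```

Both components end up with the same Customer A (T2). A component that sets `field_level=False` raises an error when it is handed a changes-only event, which is the situation the full image is meant to avoid.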
- Start with the Customer MR and all the copies being at the same version T1
- Alice makes a change in Customer MR, Customer A (T2), which is propagated to the other components. Component B is updated to the new version but Component C is not; it remains at T1
- Bob makes a change to Customer A and saves it in Customer MR (Customer A (T3)). The changes are sent to Components B and C, but B is at version T2 while C is still at version T1, so the procedure to update the Customer will be different for each.
- it is more efficient to send only the changes to the Sync Engine, but this is not always possible because:
- some of the components cannot update at the field level; they also need at least some of the Customer information that has not changed
- if we send only the changes and one of them is lost (or fails with an error), the system does not recover naturally, and there are limits on how it can get back in sync. We need a special way to send the full Customer profile and indicate that this is a ForceSync that should ignore previous changes
- if we send the full image every time,
- both the traffic and the number of updates are large unless each component implements a form of smart-update
- the dispatch engine may not be capable of routing based on change
- if we send the full image and the summary of changes,
- the traffic is large
- the dispatch engine may decide on when to send the full copy and when to send the change only
- the issue of ForceSync still remains for the components that accept only changes
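One possible way for a dispatch engine to choose between the change and the full copy is to compare the version each component holds with the version the change is based on. The sketch below replays the earlier scenario in which Component C misses the T2 update. All names (`Component`, `SyncEngine`, `on_change`, `on_force_sync`, the `base_version` key) are hypothetical, and for brevity the engine reads the component's version directly instead of tracking acknowledgements.

```python
class Component:
    """Hypothetical subscriber holding one copy of the Customer."""

    def __init__(self, name, copy, version, online=True):
        self.name = name
        self.copy = dict(copy)
        self.version = version
        self.online = online

    def on_change(self, changes, version):
        """Apply a field-level change on top of the current copy."""
        self.copy.update(changes)
        self.version = version

    def on_force_sync(self, image, version):
        """Replace the copy with the full image, ignoring previous changes."""
        self.copy = dict(image)
        self.version = version


class SyncEngine:
    def __init__(self, components):
        self.components = components

    def dispatch(self, event):
        """Send the change when it applies cleanly, otherwise the full image."""
        sent = {}
        for component in self.components:
            if not component.online:
                sent[component.name] = "undelivered"
            elif component.version == event["base_version"]:
                component.on_change(event["changes"], event["version"])
                sent[component.name] = "change"
            else:
                # The component missed an earlier change: only a full image helps.
                component.on_force_sync(event["image"], event["version"])
                sent[component.name] = "force_sync"
        return sent


customer_t1 = {"name": "Customer A", "city": "Paris", "phone": "111"}
customer_t2 = {"name": "Customer A", "city": "Lyon", "phone": "111"}
customer_t3 = {"name": "Customer A", "city": "Lyon", "phone": "222"}

b = Component("B", customer_t1, version=1)
c = Component("C", customer_t1, version=1, online=False)  # C will miss T2
engine = SyncEngine([b, c])

first = engine.dispatch({"base_version": 1, "version": 2,
                         "image": customer_t2, "changes": {"city": "Lyon"}})
c.online = True
second = engine.dispatch({"base_version": 2, "version": 3,
                          "image": customer_t3, "changes": {"phone": "222"}})
```

In the second dispatch, B receives only the change while C receives a ForceSync, and both end at Customer A (T3). This only works because the event carries the full image alongside the changes. With changes-only events the engine would have nothing to send to C.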
- all components should accept a ForceSync message, which takes the full image of the Customer and updates the component's internal copy. This is important to achieve “data resiliency”; otherwise, the components that do not accept ForceSync will never get up to date
- each component should keep the timestamp of the record it received. This information makes it easy to bring the data back into a consistent state, for example through batch jobs that compare revisions
- Customer MR must be able to send ForceSync messages to one or multiple components. This creates the infrastructure needed to reach the goal of eventual consistency
- The change details need to be available only at the level of a high-level structure. Inside the structure it is better to have the full information about the Customer (e.g. the full address)
- In a change representation, we need to distinguish between an element that remains the same (no change) and an element that is deleted. The general rule is that an element absent from the request did not change; to indicate a deletion, the element should be present with its Action set to Delete.
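The last two rules can be illustrated with a small sketch of a change request. Only the `Action` attribute and its `Delete` value come from the text above. The `Update` action, the `Value` key, the `apply_change` function, and the dictionary layout are assumptions made for this example.

```python
def apply_change(customer, change):
    """Apply a change request to a customer record and return the new record.

    Elements absent from `change` are left untouched. An element is removed
    only when it is present with Action set to Delete.
    """
    updated = dict(customer)
    for name, element in change.items():
        if element.get("Action") == "Delete":
            updated.pop(name, None)
        else:
            # The whole high-level structure is replaced, not its inner fields.
            updated[name] = element["Value"]
    return updated


customer = {
    "name": "Customer A",
    "address": {"street": "1 Old St", "city": "Paris"},
    "fax": "555-0100",
}

change = {
    # Only the street changed, but the full address is sent.
    "address": {"Action": "Update",
                "Value": {"street": "2 New St", "city": "Paris"}},
    # Present with Action=Delete: the element is removed.
    "fax": {"Action": "Delete"},
    # "name" is absent: it did not change.
}

result = apply_change(customer, change)
```

Here `name` is preserved because it is missing from the request, `fax` is removed because it is explicitly marked, and `address` is replaced as a whole, so the receiver never has to reason about which part of the address changed.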