On Data Consistency Pattern

Data Consistency, along with Multi-Step Process and Composite Application, is one of the most common integration scenarios. At a very high level, multiple components manipulate and store the same entity. When one component modifies the entity, the others must be notified of the change so that the entity stays in a consistent state across the whole system.

In real applications, to address the complexities of keeping the entity consistent in a distributed system, the business and system architects impose constraints that simplify the patterns of interaction and lead to a robust solution. The rest of this article describes typical approaches to Data Consistency. Depending on the type of entity there could be design variations, but they all share the same approach. To make this document more concrete, I will use the Customer entity as an example.

One solution is to centralize all the Customer changes in one component, the “Customer Master,” while all the other components keep read-only copies of the entity. The master has a data-store capability (the Customer Repository) and an editor of some form, hosted either on the same server or as a remote client. Multiple users can modify the entity in the editor(s), but there is only one Customer Repository instance.

Changes made to the Customer in the Master are sent asynchronously to a “Sync Engine,” which analyzes the input and updates the components that contain Customer copies. In the most general solution, on any Customer change the entire Customer record is sent to the Sync Engine, which passes it to all the other systems that contain Customer copies. In practice that is rarely the case: the Sync Engine filters and transforms the Customer record before sending it to the target components. Filtering reduces the traffic in the system, and transformation reduces the development effort for the target systems when the Customer schema changes.

Dispatch is an object inside the Sync Engine that receives the message, validates its syntax, filters messages for each target system, and logs the activities in the Dashboard. A typical Dispatch implementation uses a Business Rules Engine (e.g. Drools, iLog) running inside the Sync Engine.
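To make the Dispatch responsibilities concrete, here is a minimal Python sketch. It is not a rules-engine implementation: the `Target` class, the `accepts` predicate, the `fields` projection, the `id` and `last_modified` keys, and the list standing in for the Dashboard are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical keys that every message must carry (identity and version).
KEYS = ("id", "last_modified")


@dataclass
class Target:
    """A component that keeps a read-only copy of the Customer."""
    name: str
    fields: set                       # Customer fields this target uses
    accepts: Callable[[dict], bool]   # filtering rule for this target
    inbox: list = field(default_factory=list)


class Dispatch:
    def __init__(self, targets):
        self.targets = targets
        self.dashboard = []  # stand-in for the Dashboard activity log

    def validate(self, message):
        # Syntax check only: the message is a dict with the required keys.
        return isinstance(message, dict) and all(k in message for k in KEYS)

    def handle(self, message):
        if not self.validate(message):
            self.dashboard.append(("rejected", "-"))
            return
        for target in self.targets:
            if not target.accepts(message):
                self.dashboard.append(("filtered", target.name))
                continue
            # Transform: keep only the fields this target uses.
            view = {k: v for k, v in message.items()
                    if k in KEYS or k in target.fields}
            target.inbox.append(view)
            self.dashboard.append(("sent", target.name))
```

In a real system the `accepts` predicates and field projections would more likely live in the rules engine than in code, so they can change without redeploying the Sync Engine.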

Version Control

As multiple components (the editor and the target components) hold different copies of the Customer at different times, a versioning model for the Customer is required. For the purpose of this document we are going to use the LastModified field of the Customer to keep the version. Another option would be a revision number managed in the Customer Master Repository.

In the interactions between the Customer Editor(s) and the Customer Master Repository (Customer MR), versioning is done by a combination of optimistic locking and merging. Assume the following scenario:

User Alice retrieves Customer A with timestamp T1 and starts making modifications: Customer A (T1)
User Bob retrieves Customer A with timestamp T1 and starts making modifications: Customer A (T1)
User Alice saves her changes; the timestamp is now T2: Customer A (T2)
User Bob tries to save, but the save fails because the timestamps differ. He then updates his local copy by merging his changes with the changes made in the repository between T1 and T2: Merge(Customer A (Local), Customer A (T2), Customer A (T1))
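The Alice and Bob scenario can be sketched in Python as follows. This is an illustration under stated assumptions, not the Customer MR implementation: `CustomerRepository`, `StaleVersionError`, the integer counter used as a timestamp, and the conflict policy in `merge` (keep the repository value and report the field) are all hypothetical.

```python
class StaleVersionError(Exception):
    """Raised when a save is based on an outdated timestamp."""


class CustomerRepository:
    def __init__(self):
        self._records = {}  # customer id -> (timestamp, fields)
        self._clock = 0     # assumption: a counter stands in for LastModified

    def create(self, cid, fields):
        self._clock += 1
        self._records[cid] = (self._clock, dict(fields))
        return self._clock

    def get(self, cid):
        ts, fields = self._records[cid]
        return ts, dict(fields)

    def save(self, cid, fields, expected_ts):
        # Optimistic locking: accept the save only if nobody saved in between.
        current_ts, _ = self._records[cid]
        if current_ts != expected_ts:
            raise StaleVersionError(f"expected {expected_ts}, found {current_ts}")
        self._clock += 1
        self._records[cid] = (self._clock, dict(fields))
        return self._clock


def merge(local, theirs, base):
    """Three-way merge, field by field. A missing field is treated as None."""
    merged, conflicts = {}, []
    for key in base.keys() | local.keys() | theirs.keys():
        b, l, t = base.get(key), local.get(key), theirs.get(key)
        if l == t or t == b:
            value = l            # same edit on both sides, or only local changed
        elif l == b:
            value = t            # only the repository changed
        else:
            conflicts.append(key)
            value = t            # assumed policy: repository wins, caller is told
        if value is not None:
            merged[key] = value
    return merged, conflicts
```

After a failed save, Bob would call `get` again, merge his local copy against the fresh record and the T1 base he started from, and retry the save with the new timestamp.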

Another aspect of versioning concerns the target systems. Components B, C and others each have their own version of the Customer profile. Ideally it is the same as the one in the Customer MR, but in at least two scenarios it is different: one is when a new change is in flight, the other is when the synchronization process has failed.

For the “in flight” changes the scenario is:

Customer MR, Component B and Component C are in sync, all having the version Customer A (T1)
Alice makes a change to Customer A and it is saved in Customer MR: Customer A(T2)
Customer MR produces an event and sends the change to the Sync Engine. The change can be sent in several ways:

Full image of Customer A (T2)
Changes made to Customer A: Customer A(T2) – Customer A (T1)
Full image and a summary of changes: Customer A (T2), Customer A (T2) – Customer A (T1)

The Sync Engine receives the change and, applying its business rules, propagates it to Components B, C, and so on.
Component B (or C, …) receives the change and applies it to its copy. Depending on how the change is passed on by the Sync Engine, applying the change could mean replacing all the Customer fields with the new version or updating only the fields that changed.
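The three ways of sending a change, and the two ways of applying it, can be sketched in Python. The event layout (`image`, `changes`, `base_version`, `version`) and the function names are assumptions for illustration, and this simple `diff` only captures added or updated fields, not removed ones.

```python
def diff(old, new):
    """Fields of `new` whose value differs from `old` (additions and updates)."""
    return {k: v for k, v in new.items() if old.get(k) != v}


def make_event(old, new, mode):
    """Build a change event in one of three shapes:
    'full', 'changes', or 'full+changes'."""
    event = {
        "id": new["id"],
        "base_version": old["last_modified"],
        "version": new["last_modified"],
    }
    if mode in ("full", "full+changes"):
        event["image"] = dict(new)
    if mode in ("changes", "full+changes"):
        event["changes"] = diff(old, new)
    return event


def apply_event(copy, event):
    """Apply an event to a component's copy of the Customer."""
    if "image" in event:
        # Full image available: replace every field with the new version.
        return dict(event["image"])
    # Changes only: update just the fields that changed.
    updated = dict(copy)
    updated.update(event["changes"])
    return updated
```

A component that receives the combined form can choose either path; here it takes the full image when one is present.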

Another scenario in which the copy can be out of sync is when there were errors in propagating the change between Customer MR and the copy.

Start with the Customer MR and all the copies being at the same version T1
Alice makes a change in Customer MR, Customer A (T2), which is propagated to the other components. Component B is updated to the new version but Component C is not; it remains at T1
Bob makes a change to Customer A and saves it in Customer MR (Customer A (T3)). The changes are sent to Components B and C. Because B is at version T2 and C is at version T1, the procedure to update the Customer will be different for each.
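One hypothetical way for a component to notice that it missed a change is to have each delta carry the version it was computed against. The Python sketch below assumes that convention; the `Component` class and its method names are illustrative, not part of the design described here.

```python
class Component:
    """A target component holding a copy of one Customer record."""

    def __init__(self, name, copy):
        self.name = name
        self.copy = dict(copy)

    def receive_changes(self, base_version, new_version, changes):
        """Apply a delta only if it was computed against the version held here.

        Returns False when a gap is detected, i.e. an earlier change was missed.
        """
        if self.copy["last_modified"] != base_version:
            return False
        self.copy.update(changes)
        self.copy["last_modified"] = new_version
        return True

    def receive_full_image(self, image):
        """Replace the whole copy, regardless of the version held before."""
        self.copy = dict(image)


def propagate(master_image, base_version, changes, components):
    """Send the delta to every component, falling back to the full image."""
    resynced = []
    for component in components:
        ok = component.receive_changes(
            base_version, master_image["last_modified"], changes)
        if not ok:
            component.receive_full_image(master_image)
            resynced.append(component.name)
    return resynced
```

In the scenario above, B would accept the T2 to T3 delta, while C would reject it and need the full T3 image.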

The two scenarios described above show the compromises that need to be made in the design:

it is more efficient to send only the changes to the Sync Engine, but this is not always possible because:

some of the components cannot update at the field level; they need at least some of the Customer information that is not changing
if we only send the changes and one of them is lost (or fails with an error), the system does not recover naturally, and there are limits on how it can get back in sync. We need a special way to send the Customer profile and indicate that this is a ForceSync that should ignore previous changes

if we send the full image every time,

both the traffic and the number of updates are large, unless each component implements a form of smart-update
the dispatch engine may not be capable of routing based on change

if we send the full image and the summary of changes,

the traffic is large
the dispatch engine may decide on when to send the full copy and when to send the change only
the ForceSync issue still remains for the components that accept only changes

Based on this review:

all components should accept a ForceSync message, which takes the full image of the Customer and updates the component's internal copy. This is important to achieve “data resiliency”; otherwise the components that do not accept ForceSync will never get up to date
each component should keep the timestamp of the last record received. This information can be used to bring the data back to a consistent state, for example through batch jobs that compare revisions
Customer MR must be able to send ForceSync messages to one or multiple components. This creates the infrastructure needed to achieve the goal of eventual consistency
The change details need to be available only at the level of a high-level structure. Inside the structure it is better to have the full information about the Customer (e.g. the full address)
In a change representation, we need to make a distinction between an element that remains the same (no change) and an element that is deleted. The general rule is that if an element is not present in the request, it did not change; to indicate a deletion, the element must be present with its Action set to Delete.
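These recommendations can be sketched together in Python. Everything here is an assumption made for illustration: the change layout (`action` and `value` keys), the `Update` action name, and the dicts standing in for the Customer MR and a component's store. Only the `Delete` action and the ForceSync idea come from the text above.

```python
def apply_change(copy, change):
    """Apply a change request to a copy of the Customer.

    An element absent from `change` is left untouched; an element present
    with action 'Delete' is removed; any other element replaces the whole
    high-level structure (e.g. the full address).
    """
    updated = dict(copy)
    for name, element in change.items():
        if element["action"] == "Delete":
            updated.pop(name, None)
        else:
            updated[name] = element["value"]
    return updated


def force_sync(copies, customer_id, image):
    """ForceSync: overwrite the component's copy with the full image."""
    copies[customer_id] = dict(image)


def stale_ids(master, copies):
    """Customers whose stored timestamp differs from the master's."""
    return [cid for cid, record in master.items()
            if copies.get(cid, {}).get("last_modified") != record["last_modified"]]


def reconcile(master, copies):
    """Batch job: compare revisions and ForceSync every stale customer."""
    stale = stale_ids(master, copies)
    for cid in stale:
        force_sync(copies, cid, master[cid])
    return stale
```

Because the change carries the full address rather than individual address fields, a component never has to reason about partially updated structures.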
