
On Data Consistency Pattern

Data Consistency, along with Multi-Step Process and Composite Application, is one of the most common integration scenarios. At a very high level, multiple components manipulate and store the same entity. When one component modifies the entity, the others must be notified of the change so that the entity stays consistent across the whole system.

In real applications, to address the complexities of keeping the entity consistent in a distributed system, the business and system architects impose constraints that simplify the patterns of interaction and lead to a robust solution. The rest of this article describes a typical approach to Data Consistency. Depending on the type of entity there could be design variations, but they all share the same approach. To make this document more concrete, I will use the Customer entity as an example.

One solution is to centralize all the Customer changes in one component, the “Customer Master,” while all the other components keep read-only copies of the entity. The master has a data-store capability (Customer Repository) and an editor of some form, hosted either on the same server or as a remote client. Multiple users can modify the entity in the editor(s) but there is only one Customer Repository instance.
Changes made to the Customer in the Master are sent asynchronously to a “Sync Engine,” which analyzes the input and updates the components that contain Customer copies. In the most general solution, on any Customer change the entire Customer record is sent to the Sync Engine, which passes it to all the other systems that contain Customer copies. In practice the Sync Engine filters and transforms the Customer record before sending it to the target components. Filtering reduces the traffic in the system, and transformation reduces the development effort for the target systems when the Customer schema changes.
Dispatch is an object inside the Sync Engine that receives the message, validates its syntax, filters the messages for each target system, and logs the activities in the Dashboard. A typical Dispatch implementation uses a Business Rules Engine (e.g. Drools, iLog) running inside the Sync Engine.
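The filtering and transformation done by Dispatch can be illustrated with a small sketch. It is only an illustration under assumptions: the Target and Dispatch classes, the field names (id, last_modified, opt_in) and the in-memory log standing in for the Dashboard are all hypothetical, and a real implementation would more likely express the rules in a rules engine.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Target:
    """A component that keeps a read-only Customer copy (hypothetical model)."""
    name: str
    wanted_fields: set                    # fields this target stores
    accepts: Callable[[dict], bool] = lambda customer: True  # filter rule
    inbox: list = field(default_factory=list)


class Dispatch:
    def __init__(self, targets):
        self.targets = targets
        self.log = []                     # stands in for the Dashboard

    def handle(self, customer: dict) -> None:
        # Syntax validation, reduced here to a required-field check.
        if "id" not in customer or "last_modified" not in customer:
            self.log.append(("rejected", None, customer.get("id")))
            return
        for target in self.targets:
            # Filtering: skip targets that do not care about this customer.
            if not target.accepts(customer):
                self.log.append(("filtered", target.name, customer["id"]))
                continue
            # Transformation: project the record onto the target's fields.
            keep = target.wanted_fields | {"id", "last_modified"}
            message = {k: v for k, v in customer.items() if k in keep}
            target.inbox.append(message)
            self.log.append(("sent", target.name, customer["id"]))


billing = Target("billing", {"name", "address"})
marketing = Target("marketing", {"name", "email"},
                   accepts=lambda c: c.get("opt_in", False))
dispatch = Dispatch([billing, marketing])
dispatch.handle({"id": 1, "last_modified": 10, "name": "Acme",
                 "address": "1 Main St", "email": "info@acme.example",
                 "opt_in": False})
```

Because each target receives only the fields it declared, adding a new field to the Customer schema does not affect targets that never asked for it.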

Version Control

As multiple components (the editor and target components) hold different copies of the Customer at different times, a versioning model for the Customer is required. For the purpose of this document we are going to use the LastModified field of the Customer as the version. Another option would be a revision number managed in the Customer Master Repository (Customer MR).
In the interactions between the Customer Editor(s) and the Customer MR, versioning is done by a combination of optimistic locking and merging. Assume the following scenario:
  1. User Alice retrieves Customer A with timestamp T1 and starts making modifications: Customer A (T1)
  2. User Bob retrieves Customer A with timestamp T1 and starts making modifications: Customer A (T1)
  3. User Alice saves her changes; the timestamp is now T2: Customer A (T2)
  4. User Bob tries to save, but the save fails because the timestamps differ. He then updates his local copy by merging his changes with the changes made in the repository between T1 and T2: Merge(Customer A (Local), Customer A (T2), Customer A (T1))
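The Alice and Bob scenario above can be sketched in a few lines. This is a minimal sketch, not a definitive implementation: CustomerRepository, StaleVersion and merge are hypothetical names, a logical counter stands in for the LastModified timestamp, and the rule that the repository value wins when both users changed the same field is an assumption made only to keep the example short.

```python
_MISSING = object()   # marks a field that is absent from a copy


class StaleVersion(Exception):
    """Raised when a save is based on an outdated timestamp."""


class CustomerRepository:
    def __init__(self):
        self._records = {}   # customer id -> (timestamp, fields)
        self._clock = 0      # logical clock standing in for LastModified

    def load(self, cid):
        ts, fields = self._records[cid]
        return ts, dict(fields)

    def save(self, cid, base_ts, fields):
        current = self._records.get(cid)
        # Optimistic locking: reject the save if someone saved in between.
        if current is not None and current[0] != base_ts:
            raise StaleVersion(cid)
        self._clock += 1
        self._records[cid] = (self._clock, dict(fields))
        return self._clock


def merge(local, theirs, base):
    """Field-by-field three-way merge; returns (merged, conflicting fields)."""
    merged, conflicts = {}, []
    for key in set(local) | set(theirs) | set(base):
        l = local.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        b = base.get(key, _MISSING)
        if l == t or t == b:
            value = l                # both agree, or only the local copy changed
        elif l == b:
            value = t                # only the repository copy changed
        else:
            conflicts.append(key)    # both changed it differently
            value = t                # assumption: the repository wins
        if value is not _MISSING:
            merged[key] = value
    return merged, sorted(conflicts)


repo = CustomerRepository()
repo.save("A", None, {"name": "Acme", "phone": "111", "city": "Oslo"})  # T1
alice_ts, alice = repo.load("A")
bob_ts, bob = repo.load("A")
base = dict(bob)                      # Bob remembers what he started from

alice["phone"] = "222"
repo.save("A", alice_ts, alice)       # T2

bob["city"] = "Bergen"
try:
    repo.save("A", bob_ts, bob)       # fails: repository is at T2, Bob is at T1
    bob_rejected = False
except StaleVersion:
    bob_rejected = True

latest_ts, latest = repo.load("A")
merged, conflicts = merge(bob, latest, base)
final_ts = repo.save("A", latest_ts, merged)   # T3
```

Note that the merge needs the common ancestor, Customer A (T1), which is why the editor has to keep the version it originally loaded.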
Another aspect of versioning concerns the target systems. Components B, C and others have their own version of the Customer profile. Ideally it is the same as the one in the Customer MR, but in at least two scenarios it is different: when a new change is in flight, and when the synchronization process has failed.
For the “in flight” changes the scenario is:
  1. Customer MR, Component B and Component C are in sync, all having the version Customer A (T1)
  2. Alice makes a change to Customer A and it is saved in Customer MR: Customer A(T2)
  3. Customer MR produces an event and sends the change to the Sync Engine. The change can be sent in several ways:
    • Full image of Customer A (T2)
    • Changes made to Customer A: Customer A (T2) – Customer A (T1)
    • Full image and a summary of changes: Customer A (T2), Customer A (T2) – Customer A (T1)
  4. Sync Engine gets the change and, applying business rules, propagates the changes to Component B, C, … .
  5. Component B (or C, …) receives the change and applies it to its copy. Depending on how the change is passed on by the Sync Engine, applying the change could mean replacing all the Customer fields with the new version or updating only the fields that changed.
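The three ways of sending a change, and the two ways of applying it, can be sketched as follows. The helper names (diff, apply_full_image, apply_changes) and the dictionary representation of a Customer are assumptions for illustration; this simple diff covers only added and modified fields, not deletions.

```python
def diff(old: dict, new: dict) -> dict:
    """Customer(T2) - Customer(T1): fields of `new` that were added or modified."""
    return {k: v for k, v in new.items() if k not in old or old[k] != v}


def apply_full_image(copy: dict, image: dict) -> None:
    """Replace every Customer field in the copy with the new version."""
    copy.clear()
    copy.update(image)


def apply_changes(copy: dict, changes: dict) -> None:
    """Update only the fields that changed."""
    copy.update(changes)


customer_t1 = {"id": "A", "name": "Acme", "phone": "111", "last_modified": 1}
customer_t2 = {**customer_t1, "phone": "222", "last_modified": 2}

# The three message shapes the Customer MR could send to the Sync Engine.
full_image = dict(customer_t2)
changes_only = diff(customer_t1, customer_t2)
image_and_summary = {"image": dict(customer_t2), "changes": changes_only}

# Component B replaces its copy, Component C patches it field by field.
component_b = dict(customer_t1)
apply_full_image(component_b, full_image)
component_c = dict(customer_t1)
apply_changes(component_c, changes_only)
```

Both components end up with the same record here only because Component C started from Customer A (T1); the patch is correct only relative to the version it was computed from.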

Another scenario in which the copy can be out of sync is when errors occur while propagating the change from the Customer MR to the copy.
  1. Start with the Customer MR and all the copies at the same version, T1
  2. Alice makes a change in Customer MR, Customer A (T2), which is propagated to the other components. Component B is updated to the new version but Component C is not; it remains at T1
  3. Bob makes a change to Customer A and saves it in Customer MR: Customer A (T3). The changes are sent to Components B and C, but B is at version T2 while C is at version T1, so the procedure to update the Customer will be different for each of them.

The two scenarios described above show the compromises that need to be made in the design:
  • it is more efficient to send only the changes to the Sync Engine, but this is not always possible because:
    • some of the components cannot update at a field level; they also need at least some of the Customer information that did not change
    • if we send only the changes and one of them is lost (or fails), the system does not recover naturally from the error, and there are limits on how the system can get back in sync. We need a special way to send the Customer profile and indicate that this is a ForceSync that should ignore previous changes
  • if we send the full image every time,
    • both the traffic and the number of updates are large unless each component implements a form of smart-update
    • the dispatch engine may not be capable of routing based on what changed
  • if we send the full image and the summary of changes,
    • the traffic is large
    • the dispatch engine may decide when to send the full copy and when to send only the change
    • the issue of ForceSync still remains for the components that accept only changes
Based on this review:
  • All components should accept a ForceSync message, which takes the full image of the Customer and replaces the component's internal copy. This is important for achieving “data resiliency”; otherwise the components that do not accept ForceSync may never get up to date after a lost change
  • Each component should keep the timestamp of the record it received. This information makes it easy to bring the data back to a consistent state, for example through batch jobs that compare revisions
  • Customer MR must be able to send ForceSync messages to one or multiple components. This creates the infrastructure needed to reach the eventually-consistent goal
  • The change details need to be available only at the level of high-level structures. Inside a changed structure it is better to have the full information about the Customer (e.g. the full address)
  • In a change representation, we need to distinguish between an element that remains the same (no change) and an element that is deleted. The general rule is that an element not present in the request did not change; to indicate a deletion, the element must be present with its Action set to Delete
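A component that follows these recommendations might look like the sketch below. Everything in it is an assumption for illustration: the Component class, the shape of the change message, and in particular the base_ts field, which is one possible way for a component to notice that it missed an earlier change and needs a ForceSync. The stale_customers function is a stand-in for a batch job that compares timestamps.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    # customer id -> {"last_modified": timestamp, "elements": {...}}
    customers: dict = field(default_factory=dict)

    def force_sync(self, cid, last_modified, elements):
        """Replace the internal copy with the full image, ignoring history."""
        self.customers[cid] = {"last_modified": last_modified,
                               "elements": dict(elements)}

    def apply_change(self, cid, base_ts, new_ts, changes):
        """Apply a change set; return False if the copy is not at base_ts."""
        record = self.customers.get(cid)
        if record is None or record["last_modified"] != base_ts:
            return False                 # out of sync: a ForceSync is needed
        for name, change in changes.items():
            if change["action"] == "Delete":
                record["elements"].pop(name, None)
            else:
                # High-level structures are sent whole (e.g. the full address).
                record["elements"][name] = change["value"]
        # Elements absent from `changes` are left untouched (no change).
        record["last_modified"] = new_ts
        return True


def stale_customers(master: dict, component: Component) -> list:
    """Batch comparison: ids whose timestamp differs from the master's."""
    return sorted(
        cid for cid, rec in master.items()
        if component.customers.get(cid, {}).get("last_modified")
        != rec["last_modified"]
    )


master = {"A": {"last_modified": 3,
                "elements": {"name": "Acme",
                             "address": {"street": "2 Side St", "city": "Bergen"}}}}

component_c = Component()
component_c.force_sync("A", 1, {"name": "Acme", "fax": "999",
                                "address": {"street": "1 Main St", "city": "Oslo"}})

# Component C missed the T1 -> T2 change, so the T2 -> T3 change is refused.
applied = component_c.apply_change(
    "A", base_ts=2, new_ts=3, changes={"fax": {"action": "Delete"}})
needs_sync = stale_customers(master, component_c)
for cid in needs_sync:
    component_c.force_sync(cid, **master[cid])
```

Refusing a change whose base version does not match is a conservative choice; a component could instead accept it and rely entirely on the batch comparison to detect drift.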
