
On Data Consistency Pattern

Data Consistency, along with Multi-Step Process and Composite Application, is one of the most common integration scenarios. At a very high level, multiple components manipulate and store the same entity. When one component modifies the entity, the others must be notified of the change so that the state of the entity stays consistent across the whole system.

In real applications, business and system architects address the complexity of keeping the entity consistent in a distributed system by imposing constraints that simplify the patterns of interaction and lead to a robust solution. The rest of this article describes typical approaches to Data Consistency. Depending on the type of entity there may be design variations, but they all share the same approach. To make the discussion concrete, I will use the Customer entity as an example.

One solution is to centralize all Customer changes in one component, the “Customer Master,” while all the other components keep read-only copies of the entity. The master has a data-store capability (the Customer Repository) and an editor of some form, hosted either on the same server or as a remote client. Multiple users can modify the entity in the editor(s), but there is only one Customer Repository instance.
Changes made to the Customer in the Master are sent asynchronously to a “Sync Engine,” which analyzes the input and updates the components that hold Customer copies. In the most general solution, on any Customer change the entire Customer record is sent to the Sync Engine, which passes it to all the other systems that hold Customer copies. In practice, the Sync Engine filters and transforms the Customer record before sending it to the target components. Filtering reduces the traffic in the system, and transformation reduces the development effort for the target systems when the Customer schema changes.
Dispatch is an object inside the Sync Engine that receives the message, validates its syntax, filters the messages for each target system, and logs the activity in the Dashboard. A typical Dispatch implementation uses a Business Rules Engine (e.g. Drools, iLog) running inside the Sync Engine.
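To make the filter-and-transform step more tangible, here is a minimal sketch in Python. It is not a real Dispatch implementation: `TargetRule`, `dispatch`, the target names, and the field names are all hypothetical, and a plain list of rules stands in for the Business Rules Engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetRule:
    """Hypothetical per-target rule: which Customer fields the target keeps."""
    target: str
    fields: frozenset


def dispatch(customer: dict, changed: set, rules: list) -> dict:
    """Return {target: filtered record} for the targets affected by the change."""
    # Minimal syntax validation of the incoming message.
    if "id" not in customer or "lastModified" not in customer:
        raise ValueError("customer message needs 'id' and 'lastModified'")
    outgoing = {}
    for rule in rules:
        # Filtering: skip targets that store none of the changed fields.
        if not (changed & rule.fields):
            continue
        # Transformation: send only the fields this target knows about.
        record = {k: v for k, v in customer.items() if k in rule.fields}
        record["id"] = customer["id"]
        record["lastModified"] = customer["lastModified"]
        outgoing[rule.target] = record
    return outgoing


# Assumed example data: two targets with different views of the Customer.
rules = [
    TargetRule("billing", frozenset({"name", "address"})),
    TargetRule("marketing", frozenset({"name", "email"})),
]
customer = {"id": 42, "lastModified": 2, "name": "Acme",
            "address": "1 Main St", "email": "a@acme.test"}

# Only the address changed, so only the billing target is notified.
messages = dispatch(customer, {"address"}, rules)
```

Because each target receives only the fields it declared, adding a field to the Customer schema does not touch targets that never asked for it.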

Version Control

Because multiple components (the editor and the target components) hold different copies of the Customer at different times, a versioning model for the Customer is required. For the purpose of this document, we use the LastModified field of the Customer as the version. Another option would be a revision number managed in the Customer Master Repository (Customer MR).
In the interactions between the Customer Editor(s) and the Customer MR, versioning is done by a combination of optimistic locking and merging. Assume the following scenario:
  1. User Alice retrieves Customer A with timestamp T1 and starts making modifications: Customer A (T1)
  2. User Bob retrieves Customer A with timestamp T1 and starts making modifications: Customer A (T1)
  3. User Alice saves her changes; the timestamp is now T2: Customer A (T2)
  4. User Bob tries to save, but the save fails because the timestamps differ. He then updates his local copy by merging his changes with the changes made in the repository between T1 and T2: Merge(Customer A (Local), Customer A (T2), Customer A (T1))
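The Alice and Bob scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `CustomerRepository`, `StaleVersionError`, and `merge` are hypothetical names, an integer counter stands in for the LastModified timestamp, and the merge handles only edited fields (not fields removed locally).

```python
class StaleVersionError(Exception):
    """Raised when a save is based on an outdated version of the record."""


class CustomerRepository:
    """In-memory stand-in for the Customer MR: one record plus its history."""

    def __init__(self, record: dict):
        self.clock = 1  # integer stand-in for the LastModified timestamp
        self.current = dict(record, lastModified=self.clock)
        self.history = {self.clock: dict(self.current)}

    def load(self) -> dict:
        return dict(self.current)

    def save(self, record: dict) -> dict:
        # Optimistic locking: reject the save if someone else saved first.
        if record["lastModified"] != self.current["lastModified"]:
            raise StaleVersionError(record["lastModified"])
        self.clock += 1
        self.current = dict(record, lastModified=self.clock)
        self.history[self.clock] = dict(self.current)
        return dict(self.current)


def merge(local: dict, latest: dict, base: dict) -> dict:
    """Three-way merge: keep local edits, take everything else from latest."""
    merged = dict(latest)
    for field, value in local.items():
        if field == "lastModified":
            continue
        if value != base.get(field):  # the field was edited locally
            if latest.get(field) not in (base.get(field), value):
                raise ValueError(f"conflicting edits on {field!r}")
            merged[field] = value
    return merged


repo = CustomerRepository({"name": "Acme", "phone": "111", "email": "a@acme.test"})
alice, bob = repo.load(), repo.load()      # both hold Customer A (T1)
alice["phone"] = "222"
repo.save(alice)                           # repository is now at T2
bob["email"] = "b@acme.test"
try:
    repo.save(bob)                         # fails: Bob's copy is still at T1
except StaleVersionError:
    base = repo.history[bob["lastModified"]]
    bob = merge(bob, repo.load(), base)    # Merge(Local, T2, T1)
    saved = repo.save(bob)                 # succeeds, repository is at T3
```

In this sketch, two users editing the same field to different values raise an error instead of silently overwriting each other; how such conflicts are resolved is a separate design decision.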
Another aspect of versioning concerns the target systems. Components B, C and others each have their own version of the Customer profile. Ideally it is the same as the one in the Customer MR, but in at least two scenarios it is different: when a new change is in flight, and when the synchronization process has failed.
For “in flight” changes the scenario is:
  1. The Customer MR, Component B and Component C are in sync, all holding the version Customer A (T1)
  2. Alice makes a change to Customer A and it is saved in the Customer MR: Customer A (T2)
  3. The Customer MR produces an event and sends the change to the Sync Engine. The change can be sent in several forms:
    • Full image of Customer A (T2)
    • Changes made to Customer A: Customer A (T2) – Customer A (T1)
    • Full image and a summary of changes: Customer A (T2), Customer A (T2) – Customer A (T1)
  4. The Sync Engine receives the change and, applying business rules, propagates it to Components B, C, ….
  5. Component B (or C, …) receives the change and applies it to its copy. Depending on how the change is passed by the Sync Engine, applying it could mean replacing all the Customer fields with the new version or updating only the fields that changed.
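The two ways a component can apply a change in step 5 can be sketched as follows. The message layout (`baseVersion`, `newVersion`, `changes`) and the function names are assumptions made for illustration, integers stand in for timestamps, and the delta here covers only changed or added fields, not removed ones.

```python
def make_delta(old: dict, new: dict) -> dict:
    """Customer A (T2) - Customer A (T1): the fields whose values differ."""
    changes = {k: v for k, v in new.items()
               if k != "lastModified" and old.get(k) != v}
    return {"baseVersion": old["lastModified"],
            "newVersion": new["lastModified"],
            "changes": changes}


def apply_full_image(copy: dict, image: dict) -> dict:
    """Replace every field of the local copy with the new version."""
    return dict(image)


def apply_delta(copy: dict, delta: dict) -> dict:
    """Update only the changed fields; the delta must match the copy's version."""
    if delta["baseVersion"] != copy["lastModified"]:
        raise ValueError("delta does not apply to this version of the copy")
    updated = dict(copy)
    updated.update(delta["changes"])
    updated["lastModified"] = delta["newVersion"]
    return updated


t1 = {"id": 42, "name": "Acme", "phone": "111", "lastModified": 1}
t2 = {"id": 42, "name": "Acme", "phone": "222", "lastModified": 2}

delta = make_delta(t1, t2)
component_b = apply_full_image(t1, t2)  # B replaces its whole copy
component_c = apply_delta(t1, delta)    # C updates only the phone field
```

Both components end up with the same record; the difference is how much data travels and how much work each component does.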

Another scenario in which a copy can be out of sync is when errors occur while propagating the change from the Customer MR to the copy.
  1. Start with the Customer MR and all the copies at the same version, T1
  2. Alice makes a change in the Customer MR, Customer A (T2), which is propagated to the other components. Component B is updated to the new version but Component C is not; it remains at T1
  3. Bob makes a change to Customer A and saves it in the Customer MR: Customer A (T3). The changes are sent to Components B and C, but B is at version T2 and C is at version T1, so the procedure to update the Customer will be different for each.
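One way to picture why the update procedure differs for B and C is a sketch in which the sender knows the last version each component holds. That bookkeeping is an assumption of this example and is not stated in the scenario; `choose_message` and the message layout are hypothetical as well, and integers stand in for T1, T2, T3.

```python
def choose_message(component_version: int, old: dict, new: dict) -> dict:
    """Send a delta if the component holds the previous version, else the full image."""
    if component_version == old["lastModified"]:
        changes = {k: v for k, v in new.items()
                   if k != "lastModified" and old.get(k) != v}
        return {"kind": "delta",
                "baseVersion": old["lastModified"],
                "newVersion": new["lastModified"],
                "changes": changes}
    # The component missed an earlier change, so a delta would leave it inconsistent.
    return {"kind": "full", "image": dict(new)}


t2 = {"id": 42, "name": "Acme", "phone": "222", "lastModified": 2}
t3 = {"id": 42, "name": "Acme", "phone": "333", "lastModified": 3}

# B received Alice's change (T2); C did not and is still at T1.
versions = {"B": 2, "C": 1}
messages = {name: choose_message(v, t2, t3) for name, v in versions.items()}
```

Here B can be brought to T3 with the small delta, while C needs the whole record because the T1 to T2 change never reached it.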

The two scenarios described above show the compromises that need to be made in the design:
  • it is more efficient to send only the changes to the Sync Engine, but this is not always possible because:
    • some components cannot update at the field level; they need at least some of the Customer information that is not changing
    • if we send only the changes and one of them is lost (or fails), the system does not recover naturally from the error, and there are limits on how the system can get back in sync. We need a special way to send the Customer profile and indicate that it is a ForceSync that should ignore previous changes
  • if we send the full image every time,
    • both the traffic and the number of updates are large, unless each component implements a form of smart update
    • the dispatch engine may not be capable of routing based on the change
  • if we send the full image and the summary of changes,
    • the traffic is large
    • the dispatch engine can decide when to send the full copy and when to send only the change
    • the issue of ForceSync still remains for the components that accept only changes
Based on this review:
  • all components should accept a ForceSync message, which carries the full image of the Customer and replaces the component's internal copy. This is important to achieve “data resiliency”; otherwise, components that do not accept ForceSync will never get back up to date after a lost change
  • each component should keep the timestamp of the record it received. This information makes it easy to bring the data back to a consistent state, for example through batch jobs that compare revisions
  • the Customer MR must be able to send ForceSync messages to one or multiple components. This creates the infrastructure to achieve the eventually-consistent goal
  • the change details need to be available only at the level of a high-level structure. Inside the structure it is better to have the full information about the Customer (e.g. the full address)
  • in a change representation, we need to distinguish between an element that remains the same (no change) and an element that is deleted. The general rule is that an element not present in the request did not change; to indicate a deletion, the element must be present with its Action set to Delete
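These recommendations can be sketched together in Python. Everything here is illustrative: the message layout, the `Action` enum, and the function names are assumptions, and integers again stand in for the LastModified timestamp.

```python
from enum import Enum


class Action(Enum):
    UPDATE = "Update"
    DELETE = "Delete"


def apply_change(copy: dict, change: dict) -> dict:
    """Apply a change message; elements absent from the message stay untouched."""
    updated = dict(copy)
    for name, element in change["elements"].items():
        if element["action"] is Action.DELETE:
            updated.pop(name, None)           # explicit deletion
        else:
            updated[name] = element["value"]  # full structure, e.g. the whole address
    updated["lastModified"] = change["lastModified"]
    return updated


def force_sync(copy: dict, image: dict) -> dict:
    """ForceSync: ignore previous changes and take the master's full image."""
    return dict(image)


def stale_components(master_version: int, component_versions: dict) -> list:
    """Batch check: components whose stored timestamp differs from the master's."""
    return sorted(name for name, version in component_versions.items()
                  if version != master_version)


copy = {"name": "Acme", "fax": "555", "lastModified": 1,
        "address": {"street": "1 Main St", "city": "Springfield"}}
change = {
    "lastModified": 2,
    "elements": {
        # The change is flagged at the level of the address; the value is complete.
        "address": {"action": Action.UPDATE,
                    "value": {"street": "9 Oak Ave", "city": "Springfield"}},
        # Present with Action Delete: the fax number is removed.
        "fax": {"action": Action.DELETE},
        # "name" is absent from the message, so it did not change.
    },
}
updated = apply_change(copy, change)

# A batch job compares timestamps and force-syncs whoever fell behind.
to_repair = stale_components(2, {"B": 2, "C": 1})
repaired = force_sync({"name": "Old", "lastModified": 1}, updated)
```

Sending the full address rather than only the changed street keeps the receiving side simple: it replaces the structure as a whole and never has to reason about partially updated addresses.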
