
The Five Pillars of Preventative Data Quality

If an enterprise is serious about implementing Data Quality, there are five fundamental actions it must take to lay the foundations for it. These actions are critical: without them, all other Data Quality activities will be in vain.

The reason for this is that these actions actually define what constitutes Data Quality for the enterprise. Without this definition, Data Quality cannot be achieved no matter how much time or resource is committed to it.

If an enterprise has not carried out these actions then, whatever else it might be doing, it is not practicing true, preventative Data Quality.

Easy as 1,2,3,4,5!

So, what are these five critical actions that form the Five Pillars of Data Quality? In summary they are:

  1. Model the Business Functions
  2. Build a Logical Data Model
  3. Define Unique Identifiers
  4. Build a CRUD Matrix
  5. Define the Read and Update data criteria for Data Entities

If you are a Data Quality practitioner looking at this list and asking, “What has any of this to do with Data Quality?”, you are not alone.

Most current mainstream Data Quality practices do not list any of these actions as core to Data Quality. This is the Achilles Heel of current mainstream thinking. It arises from the genesis of Data Quality in finding and correcting errors in existing data, as opposed to preventing errors arising in the first place.

This is perhaps the major reason why effective, preventative Data Quality still eludes most enterprises, even those with large Data Quality budgets, who would see themselves as totally committed to it.

So, let’s look at these five actions and see how each is critical for achieving true Data Quality in any enterprise.

Model the Business Functions

This activity, despite being the cornerstone of Data Quality, will be totally alien to most Data Quality practitioners. What have Business Functions to do with Data Quality?

Business Functions are the core activities of every enterprise; they are the reason it exists. It is only through properly executing its Business Functions that an enterprise can meet its aims and objectives and continue in existence.

Data has no intrinsic value! Data is only of value to an enterprise if it supports the execution of its Business Functions. Data Quality is all about making all data totally fit for this purpose.

When you model the Business Functions you get to know two things: 1) what it is that the enterprise ought to be doing and 2) all of the data elements that it requires in order to do it.

It is the Business Functions that define the data requirements of every enterprise. Without knowing the Business Functions, you cannot know the enterprise data requirements, nor can you achieve Data Quality.
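The idea of deriving data requirements from a function hierarchy can be sketched in a few lines of code. This is a minimal illustration, not part of the author's method; the function and data-element names are entirely hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessFunction:
    """A node in a Business Function hierarchy."""
    name: str
    data_elements: list[str] = field(default_factory=list)  # data this function needs
    children: list["BusinessFunction"] = field(default_factory=list)


def required_data(fn: BusinessFunction) -> set[str]:
    """Collect every data element required by a function and all its sub-functions."""
    elements = set(fn.data_elements)
    for child in fn.children:
        elements |= required_data(child)
    return elements


# Hypothetical fragment of a hierarchy (names are illustrative only)
sell_products = BusinessFunction(
    "Sell Products",
    children=[
        BusinessFunction("Take Customer Order",
                         ["Customer Name", "Product Code", "Quantity"]),
        BusinessFunction("Invoice Customer",
                         ["Customer Name", "Order Total", "Invoice Date"]),
    ],
)

print(sorted(required_data(sell_products)))
```

Walking the hierarchy like this yields the complete set of data elements the modelled functions need, which is exactly the point of the paragraph above: the functions, not the databases, define the data requirements.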

Build a Logical Data Model

Although it is the Business Functions that define the enterprise data requirements, it is the Logical Data Model that allows these to be fully understood and modelled in a totally unambiguous structure.

The Logical Data Model is an essential tool for every Data Quality practitioner, as it is the ‘DNA diagram’ for the enterprise data.

The Logical Data Model defines every Data Entity that the enterprise needs, plus the attributes for each Entity and the relationships between the Entities.
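The three building blocks just described (Entities, their Attributes, and the Relationships between Entities) can be represented directly in code. The sketch below is only an illustration of the structure; the entity and attribute names are hypothetical, not taken from any real model.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """A single property of an Entity."""
    name: str
    mandatory: bool = True


@dataclass
class Entity:
    """A Data Entity: a thing of significance about which data is held."""
    name: str
    attributes: list[Attribute] = field(default_factory=list)


@dataclass
class Relationship:
    """A named association between two Entities, e.g. 'Customer places Order'."""
    from_entity: str
    name: str
    to_entity: str


# Hypothetical fragment of a Logical Data Model
customer = Entity("Customer", [Attribute("Name"),
                               Attribute("Date of Birth", mandatory=False)])
order = Entity("Order", [Attribute("Order Date"), Attribute("Total Value")])
places = Relationship("Customer", "places", "Order")
```

The value of holding the model in an unambiguous structure like this is that every element can be checked and cross-referenced, rather than living only in diagrams or prose.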

Define Unique Identifiers

Although the understanding of Unique Identifiers (UIDs) is key to achieving Data Quality, it is totally misunderstood by business managers, Data Quality Practitioners and database designers alike.

To really understand what the unique identifier for any Data Entity is, the question, “What is it, in relation to this enterprise, that makes one occurrence of <entity name> uniquely different from every other occurrence of <entity name>?” must be asked and answered.

It will come as a shock to many that the answer to this question will NEVER be a code! I explain why this is so in the post Unique Keys are the Primary Cause of Duplication in Databases.

The reason why UIDs are essential to Data Quality is that knowing them is, perhaps, the greatest single aid to avoiding duplication of data in enterprise databases. In fact, without knowing the genuine UIDs, it is impossible to avoid duplication.
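A small sketch makes the duplication point concrete: if the UID is built from real-world attributes rather than a system-generated code, two records for the same real-world thing collide on the same key and can be caught. The attribute names here are hypothetical, chosen purely for illustration.

```python
def natural_key(person: dict) -> tuple:
    """A candidate unique identifier built from real-world attributes,
    NOT from a system-generated code."""
    return (person["name"].strip().lower(),
            person["date_of_birth"],
            person["place_of_birth"])


def find_duplicates(people: list[dict]) -> list[tuple]:
    """Return the natural keys that occur more than once."""
    seen, dupes = set(), []
    for p in people:
        key = natural_key(p)
        if key in seen:
            dupes.append(key)
        seen.add(key)
    return dupes


people = [
    {"name": "Ann Lee",  "date_of_birth": "1980-02-01", "place_of_birth": "Leeds"},
    # Same real-world person; a system code would have treated her as new
    {"name": "ann lee ", "date_of_birth": "1980-02-01", "place_of_birth": "Leeds"},
]

print(find_duplicates(people))
```

A code-based key would have given each record a fresh value and let the duplicate through, which is exactly the failure mode described above.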

Build a CRUD Matrix

The CRUD Matrix is another essential item in the toolbox of every good Data Quality practitioner. It is a very powerful way of mapping which Business Functions Create, Read, Update and Delete each Data Entity.

Contrary to what many people believe, it is not the Business Functions which create data that define the quality standards for it. These standards are actually defined by the Business Functions that read and update the data values. These Read and Update Functions from the CRUD Matrix are the ‘customers’ for the data and, therefore, define its quality requirements.
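The 'customers' idea above can be sketched as a simple matrix lookup: given a CRUD Matrix, pick out the functions that Read or Update an entity. The function and entity names below are hypothetical, used only to show the shape of the technique.

```python
# Rows are Business Functions, columns are Data Entities;
# cell values are strings drawn from "C", "R", "U", "D".
crud_matrix = {
    "Take Customer Order": {"Customer": "R", "Order": "C"},
    "Amend Order":         {"Order": "RU"},
    "Invoice Customer":    {"Customer": "R", "Order": "R"},
}


def data_customers(matrix: dict, entity: str) -> list[str]:
    """The Read and Update Functions for an entity: its data 'customers',
    i.e. the functions whose needs define the quality standards."""
    return [fn for fn, row in matrix.items()
            if any(op in row.get(entity, "") for op in ("R", "U"))]


print(data_customers(crud_matrix, "Order"))  # → ['Amend Order', 'Invoice Customer']
```

Note that "Take Customer Order" creates Order occurrences but is not among its customers; as the paragraph above argues, the creating function serves the quality standards, it does not set them.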

Document the R & U Data Requirements

The CRUD Matrix enables the Data Quality practitioner to identify those Business Functions that read and update data entities. Knowing this, they can then interview the people in the enterprise who carry out these Functions and establish what the data requirements are for each entity, attribute and relationship that they read or update.

When these requirements are known and documented, they can then be built into those systems and processes that implement the Business Function that creates the occurrences of data elements in question. These will also be shown on the CRUD Matrix.

This approach ensures that occurrences of entities can be created correctly first time, every time.
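The 'right first time' idea can be illustrated by building the documented Read and Update requirements into the Create function as checks at the point of creation. This is only a sketch under assumed requirements; the attribute names and rules are hypothetical.

```python
# Hypothetical requirements gathered from the Read and Update 'customers':
# each entry names an attribute and the rule its consumers need it to satisfy.
requirements = {
    "email":      lambda v: isinstance(v, str) and "@" in v,
    "order_date": lambda v: isinstance(v, str) and len(v) == 10,  # e.g. "2013-07-17"
}


def create_customer_order(record: dict) -> dict:
    """Create function that enforces the documented R & U requirements,
    so occurrences are created correctly rather than corrected later."""
    failures = [attr for attr, rule in requirements.items()
                if not rule(record.get(attr))]
    if failures:
        raise ValueError(f"Record violates data requirements for: {failures}")
    return record


create_customer_order({"email": "ann@example.com", "order_date": "2013-07-17"})
```

Rejecting a non-conforming record at creation time is the Quality Assurance stance the article advocates, as opposed to the Quality Control stance of finding and fixing the error downstream.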

Summary

True Data Quality can only be achieved by preventing data errors being created in the first place. Data has no intrinsic value. It is only of value to an enterprise when it supports the information needs of its Business Functions (core activities).

• The Business Functions define all of the data and information needs of an enterprise. You cannot know these needs without having first modelled the Business Functions.

• The Logical Data Model is the ‘Wiring Diagram’ for enterprise data, showing all of its required elements and structure.

• Identifying Unique Identifiers (UIDs) for Data Entities is the only way in which duplication in databases can be prevented. UIDs are NEVER codes!

• The CRUD matrix is an essential tool for recognising the true ‘customers’ for data – the Read and Update Functions in the enterprise, as shown by the Matrix.

• The Create Functions from the CRUD Matrix can be built to exactly meet the data needs of the Read and Update ‘customers’.

These Five Pillars of Data Quality apply to, and will work in, enterprises of any size and in any industry sector.

Without implementing these Five Pillars, no matter how much effort or money an enterprise puts into Data Quality, the results will always fall short of what they could and ought to be.

Share the Love!

If you enjoyed reading this post, then please feel free to share it with a colleague or friend who might also enjoy it by clicking on one of the Social Media buttons below.

3 Responses to “The Five Pillars of Preventative Data Quality”

  1. Vinod July 17, 2013 1:44 am #

    Both perspectives are relevant. I would do Physical Data Model
    for the sheer fact that we can show tangible results to data owners.

    A simple dashboard with 2 objective metrics will give DQ more
    credibility and relevance. Then comes policies and rules. All these
    relate more to Tables than entities.

    Moreover such disparate operational metrics will be
    potentially contributing to enterprise DQ metrics after a weighting
    is applied.

    Interesting challenge is when you have an industry model
    as your baseline LDM.

    Go Pragmatic DQ !

    • John Owens July 17, 2013 2:49 am #

      Hi Vinod

      Thanks for the input. Sadly, I have to disagree with you.

      Your suggested approach could seem seductively ‘pragmatic’ to management. However, the reality is that doing physical data modeling before you have put in place the Five Pillars is merely a means of giving a veneer of structural respectability to data that might be entirely wrong for the enterprise. This approach always leaves Data Quality exactly where it is in most enterprises at the moment – in a reality vacuum.

      Without first producing the Business Function Model, Data Quality activities in an enterprise will be all about, first creating, and then finding data errors. Data Quality practitioners have got to abandon this completely outmoded ‘Quality Control’ approach and move to Quality Assurance, i.e. getting it right first time, every time.

      You should only ever use an industry LDM after you have first checked it against the Business Functions of the enterprise in order to ensure that it fully supports them. If it does not, then you should avoid using it as it will bring the enterprise to its knees.

      The sad truth is that ‘Pragmatic’ DQ is, all too often, simply ‘Pretend’ DQ.

      For truly Preventative DQ (and who would want anything less?) the Five Pillars are the only way to go.

      Regards
      John

  2. Richard Ordowich July 15, 2013 2:33 pm #

    As usual a great article but requiring a few additional considerations. Data quality is not easy. It is time consuming, frustrating, and invasive. Data quality exposes many other quality inadequacies such as a lack of quality in business policies, business processes, training, hiring, rewards systems and organization design.

    Before embarking on a data quality initiative it is best to define what quality needs to be improved. Start with some benchmarks and goals. Get some commitment (see RACI below).

    Modeling business “as-is” functions is a challenge when you include all the variances in the workflows that occur. There is no such thing as a standard transaction. Most transactions have some factor that requires human intervention and judgment. When these interventions and judgments are modeled, the model gets very complicated, very fast. Add the data objects to the models and they can get downright cosmic in proportions.

    Since most systems are already built I suggest going with the physical rather than logical data model. The physical model like the as-is process model is the way things are now, not the way they were intended to be. Many organizations change the physical model without ever referring to the logical model.

    Rather than just defining unique identifiers, as you are reverse engineering the physical model, extract the metadata as well. Even though the metadata such as names and definitions are probably poorly designed, they are what physically exists in the systems and can be used as a starting point to improve the metadata. Improving the quality of the metadata must be a pillar in any data quality initiative.

    CRUD is an excellent technique applied to data quality but before CRUD comes RACI. Identify those responsible, accountable, informed and consulted about the data first. CRUD is very tactical and difficult to achieve in a single step. Starting with RACI provides a foundation for CRUD.

    I think there are more than 5 pillars to data quality. Most organizations are built on a rather shaky foundation and perhaps pilings are a more appropriate form of construction for data quality. In addition, build a project plan before embarking on data quality. A project plan for data quality will contain hundreds if not thousands of tasks when all is said and done. If your plan does not reflect this magnitude then you are not doing data quality.
