More on Meta Data Muddle

In his recent entertaining blog post Metadata, So Mom Can Understand, Rob Carel used an example of a TV episode to explain the concept of metadata to the lay man or woman.

This example is very useful as it shows how Meta Data and Entity Data can be easily confused.

Imagine the table below is a listing in a TV program guide.

Information on a TV episode. Meta Data or not?

The current trend in articles and the media is to say that the above table is the Meta Data about the episode. This statement is wrong for several reasons.

The table contains several dimensions of data that we need to be able to identify and separate. So let’s look at these dimensions one by one.

The first thing to do is to identify the ‘Object of Significance’; that item about which we need to know and hold information. Once this has been identified and isolated, everything else is easy to identify and understand.

In our example the ‘Object of Significance’ is the TV episode. In Data Modeling terms an ‘Object of Significance’ is called a Data Entity.

Items of data that describe, classify, qualify or quantify a Data Entity are called Attributes of the Entity.

Items of data that describe the Attributes of Entities are called Meta Data.
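To make these three definitions concrete, here is a minimal Python sketch. The class name and the four attribute names are my own illustrative assumptions and are not taken from the TV listing; the sketch only shows where each kind of data lives.

```python
from dataclasses import dataclass, fields


@dataclass
class TvEpisode:
    """The Data Entity: the Object of Significance we hold information about."""
    series_title: str      # hypothetical attribute
    episode_title: str     # hypothetical attribute
    season_number: int     # hypothetical attribute
    duration_minutes: int  # hypothetical attribute


# One instance of the entity. The values supplied here are Attribute values (Data).
episode = TvEpisode(
    series_title="Example Series",
    episode_title="Pilot",
    season_number=1,
    duration_minutes=42,
)

# Meta Data: the name and type that describe each Attribute, not the episode itself.
meta_data = [(f.name, f.type) for f in fields(TvEpisode)]

# Data: the Attribute values held for this particular episode.
data = [getattr(episode, f.name) for f in fields(TvEpisode)]
```

Notice that `meta_data` is the same for every episode, while `data` changes from one episode to the next.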

So now let’s revisit the table describing the TV episode and see which data falls into which classification.

Table with data on TV episode listing all of the data elements - only one of which is Meta Data

The same structures can be found on computer input screens.

The item of significance about which you are entering data is the Data Entity, for example, a Person, Product, Sale, etc. The values that you type into the input fields (or choose from dropdown lists) are Attribute values of the Data Entity. The names of the input fields are the Meta Data.

Summary

The following are short and simple, yet powerful, definitions that I keep at my fingertips to help me differentiate the data dimensions.

  • Data Entities are items (physical or abstract) of significance to an enterprise about which information needs to be known and held.
  • Entity Attributes describe, classify, qualify or quantify Data Entities.
  • Meta Data describes (name, format, size, etc.) Entity Attributes.

They may be short and simple, but they are the foundation for Data Quality in enterprises of any size in any industry sector.
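As a rough illustration of how these definitions can support Data Quality, the Python sketch below holds the Meta Data (name, format, size) separately from the Attribute values and uses it to check those values. The `AttributeMetaData` class, the `check_values` function and the Person attributes are all hypothetical names chosen for this example; it is a sketch under those assumptions, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeMetaData:
    """Describes an Attribute: its name, format and size (all hypothetical)."""
    name: str
    data_format: type
    max_size: int  # maximum length of the value when written out as text


# Meta Data for two hypothetical Attributes of a Person entity.
PERSON_META_DATA = [
    AttributeMetaData(name="family_name", data_format=str, max_size=30),
    AttributeMetaData(name="birth_year", data_format=int, max_size=4),
]


def check_values(values: dict, meta_data: list) -> list:
    """Return the problems found when comparing Attribute values to their Meta Data."""
    problems = []
    for meta in meta_data:
        if meta.name not in values:
            problems.append(f"{meta.name}: missing")
            continue
        value = values[meta.name]
        if not isinstance(value, meta.data_format):
            problems.append(f"{meta.name}: expected {meta.data_format.__name__}")
        elif len(str(value)) > meta.max_size:
            problems.append(f"{meta.name}: longer than {meta.max_size}")
    return problems
```

Calling `check_values({"family_name": "Smith", "birth_year": "1980"}, PERSON_META_DATA)` reports that `birth_year` is in the wrong format, because the value was supplied as text rather than as a number.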

If you enjoyed reading this post then please feel free to share it with a colleague or friend by clicking on one of the Social Media buttons below. Thank you.

8 Responses to “More on Meta Data Muddle”

  1. John O'Gorman August 30, 2013 8:24 pm #

    So, to summarize:

    A string used as a Label on a Column is an Instance of Metadata
    A string used as a Value in Field is an Instance of an Attribute
    A string used as a Label of an Entity is an Instance
    A string used as an Instance of an Entity is an Object of Significance

    Got it!

    • John Owens August 31, 2013 12:04 am #

      Hi John

      A few slight amendments.

      A string used as a Label on a Column is an example of Metadata
      A string used as a Value in a Field represents the value of an Attribute and is called Data.
      A string used as a Label (Name) of an Entity is an example of Metadata
      A string used as the name of an Entity (which is an Object of Significance) is Metadata

      I hope that this helps. I am currently working on producing some other materials that I think will make it all much clearer for analysts working in this area.

      Thanks for your input.

      Regards
      John

      • John O'Gorman August 31, 2013 6:05 pm #

        Well, at least I’ve got you thinking about metadata (and all the other abstractions) as functions of a string. :D

        In data modelling, how we put the components *to use* is dictated by the connection we are trying to optimize between something in the real world and something in the digital domain. The problem with actually integrating all of these disparate functional collections is that there are virtually no rules – and even fewer foundational concepts – for doing so.

        It’s ok to split the digital world into four abstractions – like Master, Meta~, Transactional and Domain Data but put any two modellers in a room and the concept of ‘Customer’ can comfortably occupy a slot in any of the four.

        I respect what you are trying to do, but if the world at large wants to create a class of data called ‘metadata’ to help describe the role that one class plays in relation to another, it’s not a bad thing.

        • John Owens August 31, 2013 8:01 pm #

          Hi John

          You say, “there are virtually no rules – and even fewer foundational concepts – for doing so”. That is not so. There are already lots of clear, concise rules and foundational concepts that have been in existence for a very long time. The problem is not the lack of rules and definitions, it is that far too many current practitioners have either never learnt these or, for some unfathomable reason, choose to ignore them.

          All of the data structure rules that I write about in my books and articles are not my inventions. They have been known about and used by good data modelers for decades. These are Data Modeling 101.

          The questions that perplex are why are so few of today’s practitioners aware of them? Why do they refuse to learn the basic tools of their trade? Why do they choose to stumble about in the fog of confusion that ignoring these rules brings about?

          I will go on trying to shine a light.

          Again, thanks for the input.

          Regards
          John

          • John O'Gorman September 1, 2013 5:27 pm #

            I’d like to go back to the ‘slight adjustments’ as proof of my argument that there are no foundational rules for how data models are constructed.

            Your four sets (types / classes / relations) of data in your compass article make a distinction based on function. Since meta data is a class, strings that satisfy the membership requirements of that class are, by definition, an instance of that class. You can say ‘example’ but that does not change anything.

            Likewise ‘Attribute’ is a class of data used to help uniquely identify members of the set identified in the model’s schema. If ‘Customer Address’ is the name (meta data) of the set of allowable values (attributes) required, then ’123 Main Street’ is a member (instance) of that particular set. You can use the same approach to demonstrate that the apparent ‘separateness’ of the concepts may be useful but it’s not cast in stone.

            I don’t disagree with your assertion that Data Modelling 101 is well established as a discipline, but the lack of interoperability between individual designs is further evidence that the creative side of the craft is far more prevalent than the engineering side.

          • John Owens September 2, 2013 8:46 pm #

            Hi John

            There is a complete set of simple, powerful and unambiguous data rules that already exists and that, if used by data practitioners in their daily work, would exponentially increase the quality of their work and of the data in their enterprises. These are not mere ‘creative’ rules. They are definitely engineering rules and they have been used for decades globally to build high quality robust databases in enterprises of all sizes, in all sectors. These rules work for people who know and work them.

            Confusion does arise when the vocabulary of these rules is not strictly adhered to. For example, instance and example are not the same thing. Allowable values are not attributes. Though attribute values describe or classify an instance of an entity, they might not uniquely identify it. Entities and attributes do not occur in a schema (physical), whereas tables and columns do.

            The lack of interoperability between individual designs of systems stems, not from the absence of data engineering rules, but from the fact that far too few people know and use them at the logical level before building what are essentially structurally flawed systems.

            Regards
            John

  2. James Cotton August 5, 2013 2:03 pm #

    Richard,

    In the Master Data Management space the term ‘instance’ is used to identify a single section of data from a source system. Many instances could be at the basis of the data visualized above. Using the term instance for something else as well might only help confuse poor John’s mom!

  3. Richard Ordowich July 30, 2013 7:49 pm #

    I prefer to use the term data instance to differentiate metadata and attribute.

    Why not just simplify things and just use basic terms? Forget the term metadata since it serves no purpose except to confuse.

    The first column contains: Labels
    The second column contains: Instances

    No wonder Mom’s so confused. As technologists we obfuscate everything in order to make it abstract which leads to us being obtuse.
