What’s the Matter with ‘Meta’?

Misuse is the matter with ‘Meta’.

‘Meta Data’ is a term that is beginning to appear and be misused in blogs and articles on Data Quality.

Although Meta Data does have a role to play in Data Quality, it is not the one described in some Data Quality articles.

In this post I will define what Meta Data is, the role that it plays in Data Quality and, more importantly, what it does not do!

Definition

The simplest definition for ‘Meta Data’ is that it is ‘data about data’.

To be more precise, Meta Data describes the structure and format (but not the content) of the Data Entities of an enterprise.

Meta Data does not describe occurrences of the Data Entities themselves.  This is done by the Attributes of the Entities.

What exactly does this mean?

Examples

Two examples will help to demonstrate the essential difference between Meta Data and Entity Attributes.

Meta Data Example

If you were going to hold data about a Data Entity called ‘Product’ in a database table, what fields would you need, what type of data would each field need to hold, what size would each field be and would the field be mandatory or optional?

Meta Data is the means by which all of this can be clearly defined. The following table displays the Meta Data that describes a database table suitable to hold the data for sportswear products.

Example of a Meta Data table

The data in the above table is true Meta Data, in that it is data that describes data structure and format.

Using the above Meta Data definition, the developers of a system could now create a table in a database that will hold all of the required data on Products for this enterprise.
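As a rough sketch of that step, the snippet below holds a few Meta Data rows as plain Python data and turns them into a table using the standard sqlite3 module. The field names, types and sizes are illustrative assumptions only (they are not the contents of the table shown above), and create_table_sql is a hypothetical helper, not part of any library.

```python
import sqlite3

# Hypothetical Meta Data for a 'Product' table. The field names, types and
# sizes are illustrative assumptions, not the post's original table.
PRODUCT_META_DATA = [
    # (field name, data type, size, mandatory?)
    ("product_name", "Alphanumeric", 25, True),
    ("description", "Alphanumeric", 75, False),
    ("product_size", "Alphanumeric", 3, True),
    ("price", "Number", None, True),
]

# Assumed mapping from the Meta Data type names to SQL column types.
SQL_TYPES = {"Alphanumeric": "VARCHAR", "Number": "NUMERIC"}


def create_table_sql(table_name, meta_data):
    """Build a CREATE TABLE statement from Meta Data rows."""
    columns = []
    for name, data_type, size, mandatory in meta_data:
        sql_type = SQL_TYPES[data_type]
        if size is not None:
            sql_type += f"({size})"
        not_null = " NOT NULL" if mandatory else ""
        columns.append(f"{name} {sql_type}{not_null}")
    return f"CREATE TABLE {table_name} ({', '.join(columns)})"


connection = sqlite3.connect(":memory:")
connection.execute(create_table_sql("product", PRODUCT_META_DATA))

# Read the column names back from the database that was just built.
column_names = [row[1] for row in connection.execute("PRAGMA table_info(product)")]
print(column_names)
```

Notice that nothing in PRODUCT_META_DATA says anything about any particular product. It only describes the structure and format of the table that will hold the product data.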

Entity Attribute Example

If we now look at an example of a Product table in a database we will begin to see the essential difference between these two different data dimensions.

Example Products Table

The first thing to notice is that each row in the Meta Data table has become a column in the Product table. This is not surprising, as that is, in effect, what the Meta Data describes: the structure needed to hold the Product data.

The next thing to notice is that the Meta Data values have nothing to do with the data values that describe the Product.  The values that describe the Product are the Attribute Values. This shows that Meta Data is in an entirely different dimension to Attribute data.

Some writers get confused and mistake Attribute Values like Product Size for Meta Data.  Although these values were referred to in the Meta Data definition of the Product table, these values are not Meta Data.  They are in fact Domain Data from the Sportswear Letter Size Domain. The Meta Data tells us that these values are Alphanumeric, can be up to three characters in length and cannot be NULL (they are mandatory). The Sportswear Letter Size Domain tells us that the allowable values are S, M, L, XL, XXL, 3X and 4X.  The Attribute Value tells us what the actual value is for this product.
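The three layers can be kept apart in a short Python sketch. The Meta Data rules and the Domain values are the ones given above; the variable names and the check_product_size function are hypothetical, chosen only for illustration.

```python
# Meta Data: structure and format of the Product Size field (from the post).
PRODUCT_SIZE_META_DATA = {"type": "Alphanumeric", "max_length": 3, "mandatory": True}

# Domain Data: the allowable values in the Sportswear Letter Size Domain.
SPORTSWEAR_LETTER_SIZE_DOMAIN = {"S", "M", "L", "XL", "XXL", "3X", "4X"}


def check_product_size(value):
    """Return a list of problems with a Product Size Attribute Value."""
    problems = []
    if value is None:
        if PRODUCT_SIZE_META_DATA["mandatory"]:
            problems.append("Meta Data: value is mandatory")
        return problems
    if not value.isalnum():
        problems.append("Meta Data: value must be alphanumeric")
    if len(value) > PRODUCT_SIZE_META_DATA["max_length"]:
        problems.append("Meta Data: value is too long")
    if value not in SPORTSWEAR_LETTER_SIZE_DOMAIN:
        problems.append("Domain: value is not an allowable size")
    return problems


# Attribute Values: the actual values held for individual products.
for attribute_value in ["XL", "XS", None]:
    print(repr(attribute_value), check_product_size(attribute_value))
```

A value such as "XS" satisfies the Meta Data (it is alphanumeric and no more than three characters long) yet is still rejected by the Domain, which shows that the two are doing different jobs.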

From all of the above we can see that it is Attributes and NOT Meta Data that describe, classify, qualify or quantify Data Entities.

Conclusion

Meta Data has a role to play in Data Quality. However, this is very much a ‘behind the scenes’ role. No user of any system should ever be aware of the Meta Data relating to the system or its data.  If they ever do become aware of it, then something has gone seriously wrong and Data Quality has failed.

Data Entities (all of those items of significance to the enterprise) are described, classified, qualified and quantified by Attributes of Entities – never by Meta Data.

Meta Data would not describe a Product such as a football shirt; it would describe the structure and format of a table capable of holding the data that describes the shirt!

Further Definitions

Here are some definitions that you might find useful in your Data Quality activities.

Data Entity

A Data Entity, normally called an Entity, is anything (whether real or abstract) of significance to an enterprise about which it needs to know and hold data.

Typical entities for an enterprise might be ‘Party’, ‘Product’, ‘Sales Transaction’, etc.

All data relating to Data Entities is always created, used and transformed by the Business Functions of the Enterprise.

Entity Attribute

Attributes describe, classify, qualify or quantify Data Entities.  All attributes have names and Formats, for example:

  • Age: Number (2)
  • Description: Character (75)
  • Name: Character (25)
  • Weight: Number (4,2)

Meta Data describes the format of Attributes.
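To make the point concrete, here is a minimal Python sketch that reads format descriptions like those listed above into a structured form. The AttributeFormat type and the parse_attribute_format function are hypothetical names, and reading the second number in ‘Number (4,2)’ as a count of decimal places is my assumption about the notation.

```python
import re
from typing import NamedTuple, Optional


class AttributeFormat(NamedTuple):
    """The Meta Data for one Attribute: its name and its format."""
    name: str
    data_type: str
    length: int
    decimal_places: Optional[int]  # assumed meaning of the second number


# Matches text such as "Age: Number (2)" or "Weight: Number (4,2)".
FORMAT_PATTERN = re.compile(r"^\s*(\w+):\s*(\w+)\s*\((\d+)(?:\s*,\s*(\d+))?\)\s*$")


def parse_attribute_format(text):
    """Turn a 'Name: Type (size)' description into an AttributeFormat."""
    match = FORMAT_PATTERN.match(text)
    if match is None:
        raise ValueError(f"Unrecognised attribute format: {text!r}")
    name, data_type, length, decimals = match.groups()
    return AttributeFormat(
        name,
        data_type,
        int(length),
        int(decimals) if decimals is not None else None,
    )


for line in ["Age: Number (2)", "Description: Character (75)",
             "Name: Character (25)", "Weight: Number (4,2)"]:
    print(parse_attribute_format(line))
```

Everything this produces is Meta Data. It tells you that a Weight is a number of a certain size, but it can never tell you what any particular thing actually weighs.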

Domain Data

A Data Domain is a set of values that are used to validate or constrain the values of the Attributes of Data Entities.

For example, a Domain could be ‘Countries’.  This set of Domain Data would hold a list of all the valid countries that could be entered into the Attribute of ‘Country’ in a Data Entity of, say, Party.

Domain Data is also referred to as Reference Data in some enterprises.
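One common way to put a Domain to work is to hold it as a reference table and have the database reject any Attribute Value that is not in it. The sketch below does this with Python's standard sqlite3 module. The table names, column names and the short list of countries are illustrative assumptions.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is switched on.
connection.execute("PRAGMA foreign_keys = ON")

# The 'Countries' Domain, held as a reference table.
connection.execute("CREATE TABLE country (name TEXT PRIMARY KEY)")
connection.executemany(
    "INSERT INTO country (name) VALUES (?)",
    [("England",), ("France",)],  # illustrative values only
)

# The 'Party' Entity, whose Country Attribute is constrained by the Domain.
connection.execute(
    "CREATE TABLE party ("
    " party_name TEXT NOT NULL,"
    " country TEXT NOT NULL REFERENCES country (name))"
)

# A value from the Domain is accepted.
connection.execute("INSERT INTO party VALUES (?, ?)", ("John Smith", "England"))

# A value outside the Domain is rejected by the database.
try:
    connection.execute("INSERT INTO party VALUES (?, ?)", ("Jane Doe", "Atlantis"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("Invalid country rejected:", rejected)
```

The list of countries here is data, not Meta Data. The Meta Data is the part that says the Country Attribute of Party must take its values from the Countries Domain.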

Meta Data

Meta Data describes the structure and format (but not the content) of the Data Entities of an enterprise.

Meta Data does not describe occurrences of the Data Entities themselves.  This is done by the Attributes of the Entities.

Occurrence of an Entity

This refers to an Occurrence of a Data Entity.  “John Smith” would be an occurrence of the Data Entity ‘Employee’; “England” would be an occurrence of the Data Entity ‘Country’.

An occurrence of a Data Entity is also known as an Instance of a Data Entity.

Unique Identifier (UID)

The elements that distinguish one Occurrence of a Data Entity in an enterprise from every other Occurrence of that Entity. These elements may be one or more Attributes, one or more Relationships or a combination of Attributes and Relationships.

Contrary to what many people believe and practice, the UID of an Entity will never be a code!

13 Responses to “What’s the Matter with ‘Meta’?”

  1. Garry Ure August 16, 2013 9:16 am #

    Hi John,

    Great article as usual and I only disagree with one point!

    “No user of any system should ever be aware of the Meta Data relating to the system or its data.”

    I believe that users of a system should be aware of metadata and in fact you give the reason yourself in the example metadata you use. It includes a description which is in fact a business description of what data is to be held in that field. It is important that the user (in conceptual terms) is aware of this and has input into it in the first place. This helps build a bridge between the technical and the business view of data. I’ve often successfully combined the “Data Dictionary” with the “Business Glossary” to help facilitate discussions between IT and the business.

    • John Owens August 16, 2013 10:41 am #

      Hi Garry

      Thank you for your input and kind comments.

      You are quite right. The user of the system should be aware of the metadata.

      However, I would say that this should only ever be in an unconscious manner. The screen prompts, the field names that are displayed, should be such that they make the screen form intuitive and easy to use. There should never be any reason why the user needs to know that these constitute metadata. They ought only to be aware of the advantages that good metadata design brings to the system.

      Again, thank you for your input and feedback.

      Regards
      John

  2. Fran Alexander August 3, 2013 3:21 pm #

    Hi John
    I found your article fascinating because to me it illustrates the huge difference in perspective that exists between “computer people” and “content people”.
    Your definition of metadata is perfect for your domain, but it is only ever going to be a subset of the meaning of metadata. In publishing, if you tried to tell people that the author’s name is not metadata about a book, they would be astonished.
    From Sears to the Library of Congress, cataloguers have been compiling metadata since the 19th century, granted they didn’t call it that at the time, but it is pretty normal to consider such information as metadata now. The five types of metadata (administrative, technical, descriptive, preservation, and use) are textbook stuff to professionals in many areas of the information industries.
    Personally, I think we need more precise terms rather than trying to persuade people not to use terms they have used all their careers. I have pioneered the use of the term parametadata to describe metadata about metadata, but perhaps we need even more terms for precision. This is not a surprise, as we work more with metadata we will need to specify exactly what sort of metadata we need at that moment. We used to have lots of words for spinning wheels and farmyard tools that have fallen into obscurity because people don’t work with such equipment any more, now it is time to define which metadata we mean.

    Very thought provoking – thanks.

    Fran

    • John Owens August 4, 2013 7:06 pm #

      Hi Fran

      Thanks for this contribution.

      The fact is that publishing, and other such professions, have got it wrong. The correct term for what they are recording and storing is simply ‘data’. Why they felt compelled to prefix this with ‘meta’ is a mystery, especially in light of the fact that this term already had a very specific meaning in the world of data modeling.

      This widespread misuse presents a big problem to Data Quality practitioners who, if they accept this misuse of a term that is key to achieving Quality, have then lost it for use in its proper context.

      ‘Metadata’ is being misused as a lazy buzzword that sounds technical and suggests a precise meaning. In reality, its current use is both incorrect and quite meaningless. The types of data classifications that its users are incorrectly referring to as ‘metadata’ have already got more precise terms, such as Attribute Data, Domain Data, etc.

      The data quality world is left with the choice of speaking up or trying to invent a term to replace the hijacked ‘meta’ (such as your ‘parameta’). This widespread misuse has brought about a situation that is, sadly, all too similar to the story of the Emperor’s New Clothes. With so many people (ab)using the term Meta Data, who is going to have the courage to say, ‘Enough! The Emperor is naked and it is not a pretty sight!’?

      Regards
      John

  3. Wim Ovaa July 31, 2013 7:30 am #

    John,

    You say: Meta Data merely defines the ‘containers’ for Instance Data.

    Are simple and complex Constraints then also Meta Data in your opinion?

    • John Owens July 31, 2013 11:20 pm #

      Hi Wim

      That is a very good question that you ask.

      Both simple and complex constraints on the values of an attribute represent business rules. The question is, how do you best model these business rules? The options are:

      Plain Text: Simply defining what the constraints or allowable values are and the circumstances in which they should be applied.

      Function Logic: The constraints or allowable values are defined as part of the execution logic for those Business Functions that create and transform the data.

      Data Structure: The constraints or allowable values are modelled as data structures.

      In order to avoid hard coding, Function Logic might need to be combined with Data Structure.

      Constraints are most effectively modelled by constraining attribute values to those of Data Domains. These could be simple domains, such as ‘Yes/No’ or more extensive domains such as a list of valid countries or states.

      More complex constraints can be modeled by combining several domains, linked through a data structure.

      In all circumstances, it is Meta Data that defines which domains should be used and how. However, Meta Data would not define the values of the instances of each domain. These values have a simpler name, they are called ‘data’.

      The most common manifestation of Meta Data is the Logical Data Model.

      Thanks for the question. I hope that this helps.

      Regards
      John

  4. Rob Karel July 29, 2013 4:26 pm #

    Hi John
    While I appreciate your desire to have a single, fundamental definition for metadata, I can’t agree that your definition is the right one – at least not for everyone. Like data quality itself, definitions are contextual to the individual, role and use case. Your definition is certainly completely valid for data modelers and DBAs, for example, but absolutely does not translate into other use cases across your enterprise architecture. You point out that your definition is a fundamental one that has been around for decades – but all I can agree with is that confusion and disagreement on what metadata is has been around for decades. The “Data about data” definition has never been useful – everyone agrees with it because it really doesn’t mean anything. When I was with Forrester I recommended expanding the definition:
    “Metadata is information that describes or provides context for data, content, business processes, services, business rules, and policies that support an organization’s information systems.” It may not be the final say, but I at least proposed it held a grander vision.

    Those that support document management, content management, eDiscovery and collaboration platforms don’t care about “data entities” per se, but live and breathe by the classification and taxonomies they build to categorize, navigate and search for unstructured content. This is metadata, no way around it.

    We can debate semantics (which is also metadata) all day long, but in my opinion the bottom line is simply this: what definition of metadata is actually going to be the most useful in driving business change and prioritization and investment in data infrastructure from senior executives? It’s the more holistic, business inclusive definition that will get executive support rather than an IT-centric, philosophically pure but unfunded narrow definition.

    My 2 cents. Thanks for starting the discussion!

    • John Owens July 29, 2013 7:26 pm #

      Hi Rob

      Thanks for that great input.

      You say, “Those that support document ….. but live and breathe by the classification and taxonomies they build to categorize, navigate and search for unstructured content. This is metadata, no way around it.” I agree, this is data that describes the format and structure of data, which is Meta Data.

      What is happening is that Meta Data is now being used, primarily because it makes a good Buzz Word, to mean data describing anything, which it very certainly is not. People have taken what you hoped would be a more inclusive definition, misused it (probably because they never knew what it meant in the first place) and are now turning it into a term that will soon become meaningless.

      It does alarm me when you say, “Those that support document management, content management, eDiscovery and collaboration platforms don’t care about “data entities” per se,..” They are very definitely in danger of collecting data for the sake of data. Data has no intrinsic value. It is only of value to an enterprise when it supports the execution of the Business Functions of the enterprise, i.e. when it describes, classifies, qualifies or quantifies Data Entities. So they very definitely should care about the Data Entities, because these will be a major driver in choosing those classifications and taxonomies that will be of true value to the enterprise.

      Regards
      John

  5. Putcha V. Narasimham July 24, 2013 3:48 pm #

    John Owens:

    This is very informative. Is it your interpretation of “meta data” or is it defined this way somewhere else?

    In this context, what definitions of “data” and “information” do you use and recommend? I go by Knuth’s definitions. Many professionals and publications use much inferior misleading definitions.

    I have my notes on this. Please take a look and give your views.

    http://www.slideshare.net/putchavn/knuths-definitions-of-data-and-information-04-mar13

    http://www.slideshare.net/putchavn/knuths-definitions-of-data-and-information-proposed-definition-of-knowledge-03-mar13

    Thanks

    • John Owens July 24, 2013 11:38 pm #

      Hi Putcha

      Thank you for the comment.

      Meta Data is a term that has been about for decades and has always (until recently) been used to mean ‘data about data’. This definition can be found in several reputable sources on the internet.

      Thanks for sharing your slides.

      As per your slides Knuth sees data as representing a ‘fact’ or a ‘concept’. To me, Data may well represent a fact but it is at a lower level than a concept. Data is a value represented by symbols in a known, structured format, e.g. numbers, characters, etc, that can be read and interpreted by either a human or a computer.

      Until data is put into a context it has value but no meaning. Likewise, without context Data cannot represent a concept.

      The definition that I like for ‘Information’, because it is simple and powerful, is that it is ‘data in a context’. For example, K3P3 is an element of data, but what does it mean? Is it a cipher in a secret code, a European car registration or instructions in a knitting pattern (e.g. Knit 3, Purl 3)? Putting data into a context turns it into ‘Information’.

      To me ‘Knowledge’ is all about ‘Knowing’. When Information has been assimilated and understood it becomes ‘Knowledge’.

      Regards
      John

  6. Richard Ordowich July 24, 2013 12:05 pm #

    Once again a “provocative assertion”

    Since metadata is “data about data”, it suffers from all the ailments of data. So we can begin to examine the term metadata as we would examine any data term and identify its potential for misuse.

    1. Definition: there is no authoritative source for the definition of metadata. Everyone makes up their own definition. A common problem with data as well.
    2. The definitions do not use terms from a controlled vocabulary so the definition itself is ill conceived. How can you validate the definition?
    3. The definition is created from a single point of view, typically a technical one. As a result it does not reflect the varied interpretations that may be used when examining it from other viewpoints such as the business viewpoint. Is this the definition that the business would use? Is this the definition finance would use? Are they all identical?
    4. The business rules governing the use of the term metadata are not articulated. What role the metadata plays is lacking and once again defined by each context of use. Will the metadata be used for data model design, transactions, data warehouse, within reports and analytics?
    5. Most examples of metadata do not describe the semantics or taxonomy (such as lists). When a data field is reverse engineered using semantics, the various meanings and interpretations of the data become evident.
    6. Finally as with data, who “owns” the metadata? Who is responsible to create and maintain it? It is “data” after all.

    I agree that metadata is misused but suggest that it is critical to data and data quality. Imprecise metadata leads to imprecise data. Many data issues are traceable back to the fact that the metadata suffered from the same ailments as the data. A lack of standards (naming, definition, semantics), and a lack of discipline applying these standards.

    I think metadata is more important than instance data since it essentially governs the value domain for instance data. Metadata is not behind the scenes, it sets the scene for data instances.

    • John Owens July 25, 2013 12:02 am #

      Hi Richard

      Thank you for your comments.

      The post is definitely not a ‘provocative assertion’. It is a fundamental definition and explanation of Meta Data.

      The problems that you list are all the result of a lack of knowledge of the fundamentals of Quality by those involved in data management and data quality – even down to the lack of knowledge and misuse of basic terms such as ‘Meta Data’. Data Quality is essentially simple. Quality comes from removing complexity. An essential step in this is having a complete set of clear, concise, unambiguous definitions for all elements of data and of the Functions that create and transform it.

      Meta Data can never ‘outrank’ Instance Data as it is the requirements of Instance Data that define Meta Data. Meta Data merely defines the ‘containers’ for Instance Data.

      You can have Meta Data that is ‘incorrect’, e.g. an attribute defined as a number rather than an integer, yet it would still be able to hold the correct value of the Instance Data. However, if the value of the Instance Data is incorrect, e.g. 100 as opposed to 1000, then having the Meta Data defined correctly as an integer would in no way make the quality of the data any better for the enterprise.

      Regards
      John
