
Name That Place

The one aspect of Data Management and Data Quality that causes the greatest proliferation of duplicates across the globe is the confusion that exists regarding the Unique Identifiers (UIDs) of data entities.

I previously addressed UIDs in posts such as One Version of the Truth and The Power of QUACKs and UIDs.

However, last week while reading a post from Henrik Sorenson called Typos in the Cloud, I was prompted to write some more because the post touched on one of the major areas where Unique Identifiers are far from apparent and confound both Data Quality Practitioners and ordinary humans alike, namely place names.

Absolute Truths

As human beings, although we mostly communicate in very loose terms, we expect and believe that some things in life (and the universe) are absolutes. Two such things that we believe to be absolute are 1) Time and 2) Place Names.

Sadly, neither of them is. The reality is that time is a human invention, the vagaries of which can be experienced to the full at the International Date Line, where you can stand on one side of the Line and look into tomorrow or on the other side and look into yesterday!

Surely place names are unambiguous and constant? New York is New York, right? Yes, it’s New York, but it’s also New York, New York and NYC, NY. You can also call it The Big Apple, The City That Never Sleeps and, depending on the date in history, New Amsterdam!

And that’s just in English. In Spanish and Portuguese you would also get Nueva York, Nova Iorque and lots, lots more.

So What Do You Call New York?

The question might seem ridiculous but it highlights the essence of the problem. There is no name for New York! New York is a name! More precisely, it is a name that can be applied to something.

What Do You Call This Place?

So, the real question is, ‘What name, or names, can be applied to the metropolitan area centred at 40° 42′ 51″ N / 74° 0′ 23″ W?’

Before you can answer this question correctly you need to ask further questions like, “In what language? Do you mean a formal or informal name? What date in history?”

This makes it quite clear that the unique identifiers of places are, contrary to common perceptions, definitely NOT place names. These names are just devices for referring to the place in formal, informal or even poetic terms.
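The questions above can be made concrete with a small sketch. The Python below is an illustration only, not a structure taken from this post: the class names Place and PlaceName, the field names, the place_id value and the year boundaries are all assumptions chosen for the example. It shows a place identified by an ID and a position, with its names held as attributes that depend on language, formality and date.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class PlaceName:
    """One name that can be applied to a place (hypothetical structure)."""
    text: str
    language: str                     # e.g. "en", "es", "pt"
    kind: str                         # "formal" or "informal"
    valid_from: Optional[int] = None  # first year of use; None = not recorded
    valid_to: Optional[int] = None    # last year of use; None = still in use


@dataclass
class Place:
    """A place is identified by place_id, never by any of its names."""
    place_id: int
    latitude: float
    longitude: float
    names: List[PlaceName] = field(default_factory=list)

    def names_for(self, language: str, kind: str, year: int) -> List[str]:
        """Return the names that apply for a given language, kind and year."""
        return [
            n.text
            for n in self.names
            if n.language == language
            and n.kind == kind
            and (n.valid_from is None or n.valid_from <= year)
            and (n.valid_to is None or year <= n.valid_to)
        ]


# Sample data: the coordinates are 40° 42′ 51″ N / 74° 0′ 23″ W in decimal
# degrees (rounded). The year boundaries are illustrative assumptions.
new_york = Place(1, 40.7142, -74.0064, [
    PlaceName("New Amsterdam", "en", "formal", valid_to=1664),
    PlaceName("New York", "en", "formal", valid_from=1665),
    PlaceName("The Big Apple", "en", "informal"),
    PlaceName("Nueva York", "es", "formal"),
    PlaceName("Nova Iorque", "pt", "formal"),
])
```

With this sample data, new_york.names_for("en", "formal", 1650) returns ["New Amsterdam"], while the same call for 2011 returns ["New York"]. The place itself, place_id 1, is unchanged in both cases.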

So, if you have a whole lot of records in your database referring to the metropolitan area centred at 40° 42′ 51″ N / 74° 0′ 23″ W with different values in the Name field, do you have duplication?

Not necessarily. What you are more likely to have is a data normalisation error; a breach of First Normal Form! This is a fundamental design flaw in many databases. It most commonly occurs in databases that have been built without first building a Logical Data Model (LDM).

Logical Data Model for Place Name Structure

So, how would an LDM help database designers avoid these fundamental flaws? It enables them to see what the data elements and structure ought to be in order to support the Business Functions of the enterprise. The above diagram shows how a location can have many names or references, based on date, language, source, etc.
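To give a feel for how such a model might carry through to a physical design, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (place, source, place_name and so on) and the 'Example gazetteer' source are assumptions made for this illustration, not a transcription of the diagram. The point it shows is that names live in their own table, each row tied to one place and one source, so many names never mean many places.

```python
import sqlite3

# Hypothetical normalised schema: one row per place, one row per source,
# and one row per name that a source applies to a place.
SCHEMA = """
CREATE TABLE place (
    place_id  INTEGER PRIMARY KEY,   -- the unique identifier; carries no name
    latitude  REAL NOT NULL,
    longitude REAL NOT NULL
);
CREATE TABLE source (
    source_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL UNIQUE
);
CREATE TABLE place_name (
    place_name_id INTEGER PRIMARY KEY,
    place_id   INTEGER NOT NULL REFERENCES place(place_id),
    source_id  INTEGER NOT NULL REFERENCES source(source_id),
    name       TEXT NOT NULL,
    language   TEXT NOT NULL,
    valid_from TEXT,                 -- ISO date; NULL = not recorded
    valid_to   TEXT                  -- ISO date; NULL = still in use
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Illustrative rows only: one place, one made-up source, three names.
conn.execute("INSERT INTO place VALUES (1, 40.7142, -74.0064)")
conn.execute("INSERT INTO source VALUES (1, 'Example gazetteer')")
conn.executemany(
    "INSERT INTO place_name (place_id, source_id, name, language) "
    "VALUES (?, ?, ?, ?)",
    [(1, 1, "New York", "en"), (1, 1, "Nueva York", "es"), (1, 1, "Nova Iorque", "pt")],
)


def names_of(place_id):
    """Return every recorded name for a place, in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM place_name WHERE place_id = ? ORDER BY name",
        (place_id,),
    )
    return [row[0] for row in rows]


place_count = conn.execute("SELECT COUNT(*) FROM place").fetchone()[0]
```

In this sketch place_count stays at 1 however many names are added, and names_of(1) lists all three of them.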

The production of an LDM is an essential step in constructing any quality database, just as an architect’s plan is an essential step in constructing a quality building.

The fact is, if your databases have not been modelled on a fully normalised LDM then there is no way that you can achieve Data Quality!

Oh, incidentally, New York is also the name applied to a US warship, several hotels and, among many other things, a nineteenth-century steamship.


4 Responses to “Name That Place”

  1. Todd Everett April 15, 2011 3:41 pm #

    Great post! It is refreshing to find a practitioner blogging on logical data models and normalization. Most blogs I follow only address implementation issues for a given dbms platform. Looking forward to reading more of your posts in the future.

    • john April 16, 2011 6:13 am #

      Thanks for your kind comment, Todd. Look forward to staying in touch.

      John

  2. Daragh O Brien April 15, 2011 8:13 am #

    John

    Excellent post. An additional element to this conundrum arises in the context of Master Data for addresses and the divergences that can arise between what different potential master sources (in Ireland that would be the Post Office, the Ordnance Survey, and The Placenames Commission) might have and, very importantly, what the locals who actually live in the place might have.

    A few years ago all the road signs in Wexford (and the rest of Ireland) were replaced with updated standardised spellings of place names from the Ordnance Survey master data set. In Wexford (and elsewhere) there was much consternation as the new signs did not reflect the historic spelling of the places. For example Murrintown (Lat. 52.2871361 Long. -6.5236746) is now spelled Murntown on all the signage, much to the confusion of locals and tourists alike. Of course, a trawl through the historical records will probably show that this is not the first time the format and spelling of the placename has changed.

    Of course, if we throw in the fact that Baile Mhurain is a place that shares the same coordinates as Murrintown we have a perplexing problem of inconsistent data. At least at first glance. As Ireland is (at least nominally) a bilingual country, Baile Mhurain is the Irish language version of Murrintown (“The town of Murran”).

    Of course, there are examples where the link between the Irish language version and the English one is not as obvious and multiple variants can exist, all of them equally accurate and valid. What is the actual place name for Lat. 52.1433505 Long. -10.2686507?

    • john April 16, 2011 6:12 am #

      Thanks, Daragh

      This is one of the reasons why Source is an essential entity to have in the Logical Data Model. This enables all of the names from all of the sources to be known, e.g. the Post Office, the Ordnance Survey, The Placenames Commission, etc.

      This also enables all divergences to be identified and, where they are errors, rectified.

      Thanks for the feedback.

      John
