What Are You Doing to Protect Your Big Data from 5NF Syndrome?

Big Data x 5NF = Really Big Data Errors

It is true that Big Data may offer enterprises huge opportunities to gain hitherto unimagined insights. Alas, it is also true that it has the potential to tell enterprises really big lies!

This has nothing to do with the quality of the Big Data. These ‘lies’ can arise from 100% correct data.  They arise from anomalies in the structure of the data.

This huge risk arises because the structures of the data that you get from external sources are highly likely to break Fifth Normal Form (5NF).  In truth, it would be almost a fluke if such data did not!  I call this propensity for merged Big Data sets to lie ‘5NF Syndrome’!
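
As a small illustration of how this can happen, here is a minimal Python sketch. The supplier, part and project names are invented for the example and do not come from any real data set. Three pairwise data sets, each 100% correct, are merged back into three-way facts, and the merge reports a fact that was never true.

```python
# Hypothetical "true" facts: which supplier supplies which part to which project.
original = {
    ("Smith", "Bolt", "ProjectA"),
    ("Smith", "Nut", "ProjectB"),
    ("Jones", "Bolt", "ProjectB"),
}

# Three pairwise data sets, as they might arrive from separate sources.
# Each one is a 100% correct summary of the original facts.
supplier_part = {(s, p) for s, p, j in original}
part_project = {(p, j) for s, p, j in original}
supplier_project = {(s, j) for s, p, j in original}


def merge(sp, pj, sj):
    """Rebuild three-way rows that are consistent with all three pairwise sets."""
    return {
        (s, p, j)
        for s, p in sp
        for p2, j in pj
        if p == p2 and (s, j) in sj
    }


merged = merge(supplier_part, part_project, supplier_project)

# Rows the merge reports that were never in the original facts.
spurious = merged - original
print(spurious)
```

Every original fact survives the merge, but `spurious` also holds the row ("Smith", "Bolt", "ProjectB"): Smith does supply bolts, bolts are used on ProjectB, and Smith does supply ProjectB, yet Smith never supplied bolts to ProjectB.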

What is Fifth Normal Form?

A good, understandable definition of 5NF is hard to find. The best way to explain it is by an example from real life.

THIS ARTICLE HAS BEEN UPDATED AND MOVED TO http://jo-international.com/what-are-you-doing-to-protect-your-big-data-from-5nf-syndrome/

2 Responses to “What Are You Doing to Protect Your Big Data from 5NF Syndrome?”

  1. Greg Michelson November 8, 2013 5:50 pm #

    Hi John
    love your ideas

    There is only one way I know of to “join” the three tables that you talk about. I have done it using only three sets of key values with 5 to 10 occurrences each, so that the final result of 125 rows to 500 rows can be visually inspected. In your simple example there are three sets of two unique values each.

    step 1 – determine the sets of unique values, and store as key tables 1, 2 and 3
    step 2 – generate the cross product from the three key tables, as the potential set of valid results and mark the status of each row as “invalid”. You now have 8 “invalid” rows in your example
    step 3 – from each of the three data tables in turn (one at a time), “match” a corresponding data row with the cross-product row and change the status to “valid”. (that query is left as an exercise for the reader)
    step 4 – select the “valid” rows from the cross-product table – or delete the “invalid” ones
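
The four steps above could be sketched in Python roughly as follows. The table contents and variable names are hypothetical, chosen so that there are three sets of two unique values each. The sketch also assumes that a cross-product row only becomes “valid” when it matches a row in all three data tables, which is one reading of step 3.

```python
from itertools import product

# Hypothetical pairwise data tables (not taken from the article).
table_ab = {("Smith", "Bolt"), ("Smith", "Nut"), ("Jones", "Bolt")}
table_bc = {("Bolt", "ProjectA"), ("Nut", "ProjectB"), ("Bolt", "ProjectB")}
table_ac = {("Smith", "ProjectA"), ("Smith", "ProjectB"), ("Jones", "ProjectB")}

# Step 1: determine the sets of unique key values (key tables 1, 2 and 3).
keys_a = sorted({a for a, _ in table_ab} | {a for a, _ in table_ac})
keys_b = sorted({b for _, b in table_ab} | {b for b, _ in table_bc})
keys_c = sorted({c for _, c in table_bc} | {c for _, c in table_ac})

# Step 2: build the cross product and mark every row "invalid".
status = {row: "invalid" for row in product(keys_a, keys_b, keys_c)}

# Step 3: mark a row "valid" when each data table holds a matching pair.
# Assumption: a match is required in all three tables, not just one.
for a, b, c in list(status):
    if (a, b) in table_ab and (b, c) in table_bc and (a, c) in table_ac:
        status[(a, b, c)] = "valid"

# Step 4: select the "valid" rows.
valid_rows = sorted(row for row, flag in status.items() if flag == "valid")
for row in valid_rows:
    print(row)
```

With this sample data the cross product has 8 rows, of which 4 end up “valid”. Nothing in the procedure itself says whether each of those 4 rows reflects a real three-way fact.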

    keep up the good work

    regards

    Greg M

    • John Owens November 8, 2013 11:19 pm #

      Hi Greg

      It is true that there are many ways in which you could join the tables. However, in order to validate the results, you would need to cross-check them against the original data table.

      Of course, if you had access to this you would not need to join the three tables in the first place!

      Thanks for your feedback.

      Kind regards
      John
