Today, my 5 cents on TOGAF* Principle 11: Data is Shared, and Principle 12: Data is Accessible.
What immediately comes to my mind is how much people love to work in silos. Occasionally you’ll have folks say “This is my data and you can’t have it,” which is extreme. What happens much more often is that people build their own little data mart but supply absolutely no metadata, meaning no description of what each collected data field means. This is not quite as bad as “you can’t have my data,” but it gets pretty close and leads to misinformation.
For example, say you do business in the US and Western Europe. In one field called “revenue” you collect revenue in local currency, and in another field called “currency” you record which currency that is. If you hand this data to another department without that information, they might just pull the “revenue” field and assume it is in US dollars. This is a very simple example, but there are thousands of reasons why a data set gets misinterpreted if it is not explained. “Oh, it doesn’t include revenue from Spain.” “Oh, this was the data from the actual sales orders, not what customers really ended up buying after cancelled and returned orders.”
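To make the revenue example concrete, here is a minimal Python sketch. The field descriptions, the three orders and the exchange rates are made up for illustration (the rates are not real market data), and `undocumented_fields`, `naive_total` and `total_in_usd` are hypothetical helper names. It shows how a consumer who ignores the “currency” field gets a different total than one who reads the metadata, and how you might flag fields that ship without any description.

```python
from dataclasses import dataclass, fields

# Hypothetical field descriptions (metadata) shipped alongside the data set.
METADATA = {
    "revenue": "Order revenue in the LOCAL currency of the selling country, "
               "before cancellations and returns.",
    "currency": "ISO 4217 code of the currency the revenue field is stated in.",
}


@dataclass
class Order:
    country: str
    revenue: float
    currency: str


# Made-up sample rows for illustration.
ORDERS = [
    Order("US", 1000.0, "USD"),
    Order("DE", 1000.0, "EUR"),
    Order("FR", 500.0, "EUR"),
]

# Illustrative exchange rates only (USD per unit of currency); NOT real data.
USD_PER_UNIT = {"USD": 1.0, "EUR": 1.10}


def undocumented_fields(field_names, metadata):
    """Return the fields a consumer would have to guess about."""
    return [name for name in field_names if name not in metadata]


def naive_total(orders):
    """What a consumer without metadata might do: assume everything is USD."""
    return sum(o.revenue for o in orders)


def total_in_usd(orders, rates):
    """What the metadata says to do: convert each row using its own currency."""
    return sum(o.revenue * rates[o.currency] for o in orders)


if __name__ == "__main__":
    names = [f.name for f in fields(Order)]
    print("undocumented:", undocumented_fields(names, METADATA))
    print(f"naive total:  {naive_total(ORDERS):,.2f} (wrongly assumed USD)")
    print(f"actual total: {total_in_usd(ORDERS, USD_PER_UNIT):,.2f} USD")
```

With these made-up numbers the naive total is 2,500.00 while the converted total is 2,650.00, and “country” is flagged as the one field nobody bothered to describe.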
Additionally, the “Data is Accessible” principle states that data should be easy to get to. So if I have to combine disconnected data marts spread across the company to get to my information, then I’ve failed this principle. Downloading XLS files, combining them and doing jujitsu math and manipulation on them is not an easy way to get at data.
Following these principles, though, is daunting. You have to pull all the data from various systems into one enterprise data warehouse construct. You have to describe exactly what each data element means. You then have to put business logic on top of the data to make it comparable, or to turn it into KPIs that everyone also agrees on at the enterprise level. If you are in a large company with several business units, this is a very difficult task.
So instead of abiding by these principles blindly, you have to start distinguishing between different types of data.
Data not needing to be broadly shared:
Temporary data sets for exploration
Data only relevant to a single department
Data simple to assemble but only needed once a year
Data needing to be broadly shared:
Data for SOX compliance
Data used across the entire division or enterprise
Data exposed to customers
You decide how to break out the data, and then you apply the principles as appropriate. The latter set above should be documented in a data cataloging tool, loaded into the data lake in its raw form, and assigned a data steward and trustee, among other things. But absolutely do not require that overhead for the former set. You might have 2, 3 or 4 categories of data, each with different principles or data management requirements depending on how that data is used.
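The categorization step can be sketched as a simple lookup from data category to tier, and from tier to requirements. This is a hypothetical Python sketch: the category labels paraphrase the lists above, the requirements are the ones named for the broadly shared set, and the two-tier split plus the names `Tier`, `CATEGORY_TIER`, `REQUIREMENTS` and `requirements_for` are my own illustration, not anything TOGAF prescribes.

```python
from enum import Enum


class Tier(Enum):
    LOCAL = "not broadly shared"
    SHARED = "broadly shared"


# Hypothetical mapping of the example categories from the post to a tier.
CATEGORY_TIER = {
    "temporary exploration data set": Tier.LOCAL,
    "single-department data": Tier.LOCAL,
    "simple, once-a-year data": Tier.LOCAL,
    "SOX compliance data": Tier.SHARED,
    "division- or enterprise-wide data": Tier.SHARED,
    "customer-facing data": Tier.SHARED,
}

# Requirements for the shared tier as listed in the post; the local tier
# deliberately carries no data management overhead.
REQUIREMENTS = {
    Tier.LOCAL: [],
    Tier.SHARED: [
        "documented in a data catalog",
        "loaded raw into the data lake",
        "data steward assigned",
        "data trustee assigned",
    ],
}


def requirements_for(category: str) -> list:
    """Look up the tier for a category and return a copy of its requirements."""
    return list(REQUIREMENTS[CATEGORY_TIER[category]])


if __name__ == "__main__":
    for category, tier in CATEGORY_TIER.items():
        print(f"{category} ({tier.value}): {requirements_for(category) or 'none'}")
```

Adding a third or fourth tier with its own, lighter set of requirements would just mean another `Tier` member and another entry in `REQUIREMENTS`.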
As with “Data is an Asset”, these principles need to be applied with a level of rigor that matches the enterprise value of the data.
*The Open Group - The TOGAF® Standard, Version 9.2 > Part III: ADM Guidelines & Techniques > Architecture Principles