
More on the “Data Lake Fallacy”…

03 Sep 14

I visited a client the other day and they wanted to talk about data lakes. Someone at the client, who was not at the meeting, had been promoting the concept of a data lake as an answer to the question we explored.

Before I tell you what happened, let me update you on my “opening” position.

A few weeks ago my colleague Nick Heudecker and I published a note on data lakes (see “The Data Lake Fallacy: All Water and No Substance”).

The note called out what appeared to be missing from the vendor hype around data lakes: any sustaining practice (or technology) that lets the value gained from using the data in the lake persist and be re-used.

There IS value in mining information in a lake. But to assume that the IP and structure used to expose that insight and value persists in the data lake is wrong.

A data lake does not persist that. In the jargon, “no information governance, no sustainable or repeatable value”. It seems to be good advice.

Not everyone agrees. Another colleague of mine brought this InfoWorld “review” by a “strategic developer” to my attention – see “Gartner gets the ‘data lake’ concept all wrong”.

It seems we said that data lakes are not useful, and that a large-scale, enterprise-wide, wall-to-wall governance effort is somehow required.

Apparently we were also touting proprietary technology. Since we don’t support either position (both are devoid of context, and a data lake is not sufficient in either case), I don’t even feel the need to respond.

If there had been a response to the main fallacy we call out, I would have responded. Truth is, if you don’t maintain any structure in the data you use, how on earth can someone who follows you get a leg up and avoid repeating your effort? Either way, the hype around data lakes continues apace.

So let’s go back to the meeting this week with the client.

This client has several established data warehouses, each with some successful, if local, information governance supporting analytics.

The client has 17 or so data centers, each supporting one of these data warehouses. The business uses these 17 systems a lot and gets value from the data; they rely on what they get from them.

There was one question: can we use a data lake? However, we had to drill down to the REAL questions behind what was being asked. There were two real questions/desires:

* Can we reduce IT costs by reducing the number of data centers, and

* Can we increase synergy by supporting shared governance across the silos, as if we had a single, unified layer?

In truth, this client wants to consolidate data centers and, quite separately, adopt a focused information governance program to sustain the common data that spans and connects the local insights for additional value.

As far as I can tell, a data lake plays no role in either question. Yet it was being pushed by a vendor to one of the end users at this client.

The end user even spotted the fallacy themselves. They asked, “If we used a data lake, don’t we actually take steps backwards, in that we ‘lose’ all those currently siloed yet effective IP and governance frameworks?”

YES! A data lake by definition has a zero barrier to entry and so supports zero information governance. Any and all data is accepted because it has no need to conform or relate to the rest of the data that exists in the lake already.

If there IS a cost to enter, it is not a data lake. In contrast, a data warehouse or EDW has a higher barrier to entry. So why not go for a balance? In this case the user was right. A data lake would be a step backward.
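
To make the barrier-to-entry point concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than any vendor's product or API: the class names `DataLake` and `Warehouse`, the `ingest` method and the `CUSTOMER_SCHEMA` fields are all hypothetical. It only shows the contrast described above: one store accepts anything, the other makes a record conform before it is admitted.

```python
from dataclasses import dataclass, field

# Hypothetical governed schema: field name -> required Python type.
CUSTOMER_SCHEMA = {"customer_id": int, "region": str}


@dataclass
class DataLake:
    """Zero barrier to entry: every record is stored exactly as received."""
    records: list = field(default_factory=list)

    def ingest(self, record: dict) -> bool:
        self.records.append(record)
        return True


@dataclass
class Warehouse:
    """Higher barrier to entry: a record must conform to the agreed schema."""
    schema: dict
    records: list = field(default_factory=list)

    def ingest(self, record: dict) -> bool:
        # Field names must match the schema, and each value must have the agreed type.
        conforms = set(record) == set(self.schema) and all(
            isinstance(record[name], expected)
            for name, expected in self.schema.items()
        )
        if conforms:
            self.records.append(record)
        return conforms


lake = DataLake()
warehouse = Warehouse(CUSTOMER_SCHEMA)

good = {"customer_id": 42, "region": "EMEA"}
bad = {"cust": "forty-two"}  # does not relate to the agreed structure

lake_results = [lake.ingest(good), lake.ingest(bad)]
warehouse_results = [warehouse.ingest(good), warehouse.ingest(bad)]
```

In this sketch the lake ends up holding both records with nothing to tell the next user how they relate, while the warehouse turns the second one away. The "balance" asked about above would sit somewhere between these two `ingest` rules.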

So why was a data lake being referenced? Perhaps this vendor is selling a form of data warehouse but wants to use the new, silver-bullet-like name. My final recommendation to the client: forget the new names.

Identify the real requirement (data center consolidation, and multi-warehouse information governance) and design the target architecture. If you really want a name for it, let’s chat again. But don’t use “data lake” since it does not seem to fit.

By Andrew White - Analyst, Gartner
