
More on the “Data Lake Fallacy”…

03 Sep 2014

I visited a client the other day and they wanted to talk about data lakes. Someone at the client, not at the meeting, had been promoting the concept of a data lake as an answer to the question we explored.

Before I tell you what happened, let me update you on my “opening” position.

A few weeks ago my colleague Nick Heudecker and I published a note (See The Data Lake Fallacy: All Water and No Substance) on data lakes.

The note called out what appeared to be missing from the vendor hype around data lakes: any sustaining practice (or technology) that helps value persist through re-use of the data in the lake.

There IS value in mining information in a lake. But to assume that the IP and structure used to expose that insight and value persists in the data lake is wrong.

A data lake does not persist that. In the jargon, “no information governance, no sustainable or repeatable value”. It seems to be good advice.

Not everyone agrees. Another colleague of mine brought this InfoWorld “review” by a “strategic developer” to my attention – see “Gartner gets the ‘data lake’ concept all wrong”.

It seems we said that data lakes are not useful, and that somehow a large scale, enterprise wide, wall to wall governance effort is required.

Apparently we were also touting proprietary technology. Since we don’t support either perspective (both are devoid of context, and a data lake is not sufficient in either case) I don’t even feel the need to respond.

If there had been a response to the main fallacy we call out, I would have. Truth is, if you don’t maintain any structure in the data you use, how on earth can someone that follows you get a leg up, and avoid repeating your effort? Either way the hype around data lakes continues apace.

So let’s go back to the meeting this week with the client.

This client has several established data warehouses, each with some successful if local information governance supporting analytics.

The client had 17 or so data centers, each supporting one of these data warehouses. The business uses these 17 systems a lot and gets value from the data; they rely on what they get from them.

There was one question: can we use a data lake? However we had to drill down to the REAL questions behind what was being asked. There were two real questions/desires:

* Can we reduce IT costs by reducing the number of data centers, and

* Can we increase synergy by supporting shared governance across the silos, as if we had a single, unified layer?

In truth this client wants to consolidate data centers, and quite separately adopt a focused information governance program to sustain common data spanning and connecting the local insights for additional value.

As far as I can tell, a data lake plays no role in either question. Yet it was being pushed by a vendor to one of the end users at this client.

The end-user even spotted the fallacy themselves. They asked, “If we used a data lake, don’t we actually take steps backwards, in that we ‘lose’ all those currently siloed yet effective IP and governance frameworks?”

YES! A data lake by definition has a zero barrier to entry and so supports zero information governance. Any and all data is accepted because it has no need to conform or relate to the rest of the data that exists in the lake already.

If there IS a cost to enter, it is not a data lake. In contrast, a data warehouse or EDW has a higher barrier to entry. So why not go for a balance? In this case the user was right. A data lake would be a step backward.
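For readers who think in code, here is a minimal sketch of what “barrier to entry” could mean in practice. Everything in it is an assumption made for illustration: `LakeStore`, `GovernedStore`, and the `REQUIRED_FIELDS` rule are hypothetical names, not taken from the client’s systems or from any vendor product.

```python
from dataclasses import dataclass, field

# Hypothetical shared governance rule: the fields every record must carry.
REQUIRED_FIELDS = {"customer_id", "region"}


@dataclass
class LakeStore:
    """Zero barrier to entry: every record is accepted as-is."""
    records: list = field(default_factory=list)

    def ingest(self, record: dict) -> bool:
        self.records.append(record)
        return True


@dataclass
class GovernedStore:
    """Higher barrier to entry: a record must carry the agreed fields."""
    records: list = field(default_factory=list)

    def ingest(self, record: dict) -> bool:
        if not REQUIRED_FIELDS.issubset(record):
            return False  # rejected: does not conform to the shared structure
        self.records.append(record)
        return True


# Two made-up records: one follows the agreed structure, one does not.
good = {"customer_id": 1, "region": "EMEA", "revenue": 100}
odd = {"cust": "A-17", "rev": "one hundred"}

lake, governed = LakeStore(), GovernedStore()
lake_results = [lake.ingest(r) for r in (good, odd)]
governed_results = [governed.ingest(r) for r in (good, odd)]
```

In this toy setup the lake takes both records, while the governed store turns away the one that does not match the agreed fields. Whoever reads the lake later has to rediscover that `cust` and `customer_id` might mean the same thing, which is the repeated effort described above.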

So why was a data lake being referenced? Perhaps this vendor is selling a form of data warehouse but wants to use the new silver-bullet-like name. My final recommendation to the client: forget the new names.

Identify the real requirement (data center consolidation, and multi-warehouse information governance) and design the target architecture. If you really want a name for it, let’s chat again. But don’t use “data lake” since it does not seem to fit.

By Andrew White - Analyst, Gartner
