Have You Visited the Data Lake Yet?

Teja Manakame, Senior Director-Data Intelligence, Dell IT

In the era of ever-exploding data volumes, and with the growing realization of the value this data can bring to the business, IT organizations face a daunting challenge: capturing and storing all forms of data in order to derive insights from it. Faced with enormous volumes and heterogeneous types of data, organizations need more than a traditional data management system or data warehouse. They need something innovative that offers better agility and flexibility for managing their Big Data.

With exposure to cloud-based technologies, businesses are uniquely positioned to be more informed than ever before; they want to make better data-driven decisions and, in turn, expect the latest analytic technologies to be available at their fingertips. The super-connected network of people, processes, data, and tools is disrupting both the implementation and the consumption of traditional data management and analytics. IT now needs to re-position itself toward a more efficient, cost-effective, self-service model to meet these demands.

A data lake is a relatively new and increasingly popular way to store and analyze data that addresses many of these challenges. A data lake is a pool of unstructured and structured data coming from different sources, stored as-is, without a specific purpose in mind, that can be "built on multiple technologies such as Hadoop, NoSQL, any Simple Storage Service, a relational database, or various combinations thereof," usually stored on low-cost commodity hardware.
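The "store as-is, decide later" idea is often called schema-on-read. As a minimal sketch (all record formats and field names here are invented for illustration, not part of any product), heterogeneous records land in the lake untouched, and a schema is applied only when the data is queried:

```python
import json

# Raw events land in the "lake" exactly as produced -- no upfront schema.
raw_zone = [
    '{"user": "alice", "action": "login", "ts": 1700000000}',  # JSON log line
    'bob,purchase,1700000050',                                 # CSV from another source
]

def read_with_schema(record):
    """Schema-on-read: interpret each record only at query time."""
    if record.lstrip().startswith('{'):
        return json.loads(record)
    user, action, ts = record.split(',')
    return {"user": user, "action": action, "ts": int(ts)}

# A "query" over the lake applies the schema on the fly.
events = [read_with_schema(r) for r in raw_zone]
actions = [e["action"] for e in events]
```

The point of the sketch is that neither source had to agree on a model before landing in the lake; the interpretation lives in the reader, not the storage.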

With the growing popularity of Hadoop as the Big Data analytics platform, this solution speeds time to insight across multiple dimensions of the data and reduces the risks and costs associated with deploying new systems or extending existing ones as business needs change. One of the basic tenets of Hadoop and distributed computing is the notion of moving the compute to the data, rather than the reverse. The Hadoop-based data lake is gaining popularity because it can capture the volume of Big Data and the new sources that enterprises want to leverage via analytics, and it does so at low cost and with good interoperability with other platforms in the data warehousing world. In this sense, Hadoop and data lakes add value to the data warehouse and its environment without ripping out and replacing mature investments.
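"Moving the compute to the data" can be illustrated in miniature with a MapReduce-style word count. In this hedged sketch the two lists stand in for partitions stored on separate Hadoop data nodes; each node counts its own partition locally, and only the small summaries travel over the network to be merged:

```python
from collections import Counter
from functools import reduce

# Hypothetical partitions, as if each lived on a different data node.
partitions = [
    ["error", "warn", "error"],
    ["info", "error", "info"],
]

# Map phase: each node counts its *local* partition -- the computation
# travels to the data instead of shipping the raw data to a central server.
local_counts = [Counter(p) for p in partitions]

# Reduce phase: merge only the compact per-partition summaries.
total = reduce(lambda a, b: a + b, local_counts)
```

At cluster scale the raw partitions can be terabytes each, while the per-partition summaries are tiny, which is why this pattern scales.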

The data lake and the enterprise data warehouse must each do what they do best and work together as components of a logical data warehouse. The logical data warehouse, made up of an enterprise data warehouse, a data lake, and a discovery platform that facilitates analytics across the architecture, will determine what data and what analytics to use to answer business needs. The data lake ecosystem also includes tools like Sqoop, which make loading data from relational databases into Hive fast and easy. Upfront data modeling is not required, and the data can then be queried through Hive's SQL-like interface without deep database expertise.
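Sqoop itself needs a Hadoop cluster to run, so as a stand-in, this sketch shows the same ingestion pattern it automates — pulling rows out of a relational table and landing them as delimited lines that a schema-on-read engine like Hive could consume in place — using only Python's standard library (the `orders` table and its columns are invented for illustration):

```python
import sqlite3

# Source relational table (stands in for e.g. a production MySQL database).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.99), (2, 24.50)])
db.commit()

# "Import": dump every row as a delimited line, the way Sqoop lands
# relational data as files that Hive can then query where they sit.
lines = [
    ",".join(str(col) for col in row)
    for row in db.execute("SELECT id, amount FROM orders ORDER BY id")
]
```

Sqoop parallelizes this across mappers and writes to HDFS, but the essential move is the same: rows out of the database, flat files into the lake.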

The ability to capture and process this ever-growing business data is now possible because of the growth of inexpensive storage and virtually limitless compute, along with new technologies that enable real-time analysis and a direct connection to action through new applications and products. EMC Isilon is one such example: it offers multi-protocol, scale-out file storage for data lake applications.

The new data lake 2.0 strategy expands the data lake to extend from the data center to enterprise edge locations and to your choice of public or private cloud options. With Isilon CloudPools software, the data lake can be extended to provide virtually limitless capacity without adding any complexity to storing or managing the data.

The Hadoop data lake isn't without its challenges. Even experienced Hadoop data lake users say that a successful implementation requires a strong architecture, security gates, and disciplined data governance policies. Without those things, they warn, data lake systems can become out-of-control dumping grounds for exploding data.
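One common guard against the "dumping ground" failure mode is a governance gate that refuses data arriving without minimal metadata. A hedged sketch of the idea (the required tags and catalog structure here are hypothetical policy choices, not a specific product's API):

```python
# Illustrative governance policy: every dataset must declare these tags.
REQUIRED_TAGS = {"owner", "source", "retention_days"}

catalog = {}  # dataset name -> metadata, a minimal stand-in for a data catalog

def ingest(name, metadata):
    """Governance gate: reject datasets that arrive without required metadata."""
    missing = REQUIRED_TAGS - metadata.keys()
    if missing:
        raise ValueError(f"rejected {name}: missing tags {sorted(missing)}")
    catalog[name] = metadata

# A well-described dataset passes the gate and is findable later.
ingest("clickstream", {"owner": "web-team", "source": "nginx", "retention_days": 90})
```

Datasets that clear the gate remain discoverable and accountable; anonymous drops never make it into the lake, which is precisely the discipline the warning above calls for.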

In conclusion, in the era of data-driven innovation, the emergence of the data lake comes from the need to manage and exploit new forms of data. Many companies feel they are on the cutting edge of Big Data analytics in the enterprise by leveraging it. More importantly, it provides the foundation and tools to use data and analytics to create sustainable, long-term competitive differentiation.

The shape of your data lake is determined by what you need to do but cannot accomplish with your current data processing architecture. The right data lake can only be created through experimentation. Together, the data lake and the enterprise data warehouse provide a synergy of capabilities that delivers accelerating returns, allowing people to do more with data, faster, and driving business results. It is a game-changer not because it saves IT a whole bunch of money, but because it can help the business make huge money.