And the Big Shall Lead

Friday, November 11, 2011 by Dr. Elliot King

The ongoing growth in the amount of data organizations collect continues to defy imagination. Those of you who have been around for the entire personal computer revolution probably remember when computer manufacturers started integrating 10 megabyte hard drives with their PCs. Who would ever need that much data, many a sharp pundit asked?

Today, a 10-megabyte drive is almost unimaginably small. Cisco Systems now predicts that by 2015 we will enter the age of the zettabyte. And that is just one estimate. According to IBM, more than a zettabyte of data was produced last year. And according to IDC, if present patterns continue, the digital universe will be made up of more than 35 zettabytes by 2020.

Just as 10 megabytes of information was hard to imagine 30 years ago, a zettabyte (1,000 exabytes) is hard to imagine now. It is the equivalent of 36,000 hours of HD video, or of every person on earth tweeting continuously for 100 years. Yeah, those kinds of comparisons have no meaning for me either.
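If the comparisons don't help, plain arithmetic might. The small calculation below assumes decimal (SI) prefixes, where each step up (kilo, mega, giga, tera, peta, exa, zetta) is a factor of 1,000. It compares the 10-megabyte drive mentioned above with one zettabyte and writes out the IDC projection of 35 zettabytes in bytes. The `UNITS` table and the `to_bytes` helper are illustrative names only, not taken from any of the sources quoted here.

```python
# Decimal (SI) byte units, assumed here: each prefix is 1,000x the previous one.
UNITS = {
    "B": 1,
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}


def to_bytes(amount, unit):
    """Convert an amount in the given SI unit to a whole number of bytes."""
    return amount * UNITS[unit]


early_pc_drive = to_bytes(10, "MB")  # a 10-megabyte PC hard drive
one_zettabyte = to_bytes(1, "ZB")
idc_2020 = to_bytes(35, "ZB")        # IDC's projection for 2020

# One zettabyte expressed in exabytes.
print(one_zettabyte // UNITS["EB"])            # 1000
# Number of 10 MB drives needed to hold one zettabyte (10**14).
print(f"{one_zettabyte // early_pc_drive:,}")  # 100,000,000,000,000
# The 35 ZB projection written out in bytes.
print(f"{idc_2020:,}")                         # 35,000,000,000,000,000,000,000
```

In other words, one zettabyte would fill a hundred trillion of those early drives.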

But the key point is that the entire digital universe is built up one tweet at a time, so to speak. So while the whole pie is getting bigger, so is the slice that you will have to manage. And current conventional approaches to data management may not be enough to let companies store, retrieve and analyze their data safely and efficiently.

Moreover, the concept of Big Data doesn’t really refer to zettabytes, exabytes or petabytes. The advent of Big Data means that there is too much data to be managed using conventional tools. For many companies, that means 10 terabytes of data or more. And managing 10 terabytes is no longer uncommon.

So the strategies being developed by organizations that manage really large amounts of data may soon be useful to smaller organizations. For example, the Library of Congress, which processes 2.5 petabytes of data a year, uses an approach that reflects its roots as a library: it invests heavily in a comprehensive metadata database and stores most of the actual content offline. Amazon uses what can be described as a federated approach to managing its data, slicing its databases into manageable pieces.
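One common way to slice a data set into manageable pieces is hash partitioning, often called sharding: each record's key is hashed to pick which of several smaller databases holds it. The sketch below illustrates that general technique under stated assumptions. It is not a description of how Amazon actually partitions its systems. The shard count, the use of in-memory dictionaries in place of real database servers, and the names `ShardedStore`, `put` and `get` are all hypothetical.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads keys across N independent shards.

    Each shard is a plain dict standing in for a separate database
    (an assumption for illustration only).
    """

    def __init__(self, num_shards):
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # Use a stable hash (SHA-256 of the key) so a key always maps to the
        # same shard, even across processes. Python's built-in hash() for
        # strings is randomized per process, so it is avoided here.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key, default=None):
        return self.shards[self._shard_for(key)].get(key, default)


store = ShardedStore(num_shards=4)
for order_id in range(1000):
    store.put(f"order-{order_id}", {"id": order_id})

print(store.get("order-42"))           # {'id': 42}
print([len(s) for s in store.shards])  # four counts that together add up to 1000
```

A design note on this sketch: with simple modulo hashing, changing the number of shards moves most keys to a different shard, so systems that expect to grow often use schemes such as consistent hashing instead.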

The point is this. The era of Big Data is upon us. And in this case, large enterprises will lead the way in pioneering the techniques needed to negotiate the challenges that will emerge.

Image contributed by: renjith krishnan

Comments for And the Big Shall Lead

Wednesday, November 16, 2011 by Dennis Fletcher:
It looks like the size of database repositories isn't the only thing to worry about. The RATE at which the data arrives and the FORM it arrives in contribute to the challenge just as much.
