Friday, March 30, 2012

4 Big Data Myths - Part I



It was cloud then and it's Big Data now. Every time there's a new disruptive category it creates a lot of confusion. These categories are not well-defined. They just catch on. What hurts the most is the myths. This is the first part of my two-part series to debunk Big Data myths.

Myth # 4: Big Data is about big data

It's a clear misnomer. "Big Data" is a name that sticks, but it's not just about big data. Defining a category based only on the size of the data is primitive and rather silly, and you could argue all day about what size of data qualifies as "big." But the name sticks, and that counts. The insights could come from a very small dataset or a very large one. Big Data is, finally, a promise not to discriminate against any data, small or large.

It used to be prohibitively expensive and almost technologically impossible to analyze large volumes of data. Not any more. Today, technology (commodity hardware and sophisticated software to leverage that hardware) changes the way people think about small and large data. It's a data continuum. Big Data is not just about technology, either. Technology is just an enabler. It always has been. If you think Big Data is about adopting shiny new technology, that's very limiting. Big Data is an amalgamation of a few trends: data growth of an order of magnitude or two, external data becoming more valuable than internal data, and a shift in computing business models. Companies used to look mainly at their operational data, invest in expensive BI solutions, and treat those systems as gold. Only a few people in a company got any value out of those systems, and very little at that.

Big Data is about redefining what data actually means to you. Examine the sources that you never cared to look at before, and instrument your systems to generate the kind of data that is valuable to you and not to your software vendor. This is not about technology. This is about a completely new way of doing business where data finally gets the driver's seat. The conversations about organizations' brands and their competitors' brands are happening in social media that they neither control nor have a good grasp of. At Uber, Bradley Voytek, a neuroscientist, is looking at interesting ways to analyze real-time data to improve the way Uber does business. Recently, Target came under fire for using data to predict the future needs of a shopper. Opportunities are in abundance.

Myth # 3: Big Data is for expert users

The last mile of Big Data is the tools. As technology evolves, the tools that allow people to interact with data have significantly improved as well. Without these tools the data is worth nothing. The tools have evolved in all categories, ranging from simple presentation and charting frameworks to complex tools used for deep analysis. With the rising popularity and adoption of HTML5 and people's desire to consume data on tablets, the investment in the presentation side of the tools has gone up. Popular JavaScript frameworks such as D3 have allowed people to do interesting things such as creating a personal annual report. The availability of various datasets published by several public sector agencies in the US has also spurred some creative analysis by data geeks, such as this interactive report that tracks money as people move to different parts of the country.

The other exciting trend has been self-service reporting in the cloud and better abstraction tools on top of complex frameworks such as Hadoop. Without self-service tools most people will likely be cut off from the data chain even if they have access to the data they want to analyze. I cannot overemphasize how important the tools are in the Big Data value chain. They make it an inclusive system where more people can participate in data discovery, exploration, and analysis. Unusual insights rarely come from experts; they invariably come from people who have always been fascinated by data but for whom analyzing data was never part of their day-to-day job. Big Data is about enabling these people to participate: all information accessible to all people.

Coming soon in Part II: Myth # 2 and Myth # 1.

Wednesday, March 21, 2012

Learning From Elevators To Design Dynamic Systems


Elevators suck. They are not smart enough to know which floor you might want to go to. They aren't designed to avoid crowding in a single elevator. And they make people press buttons twice: once to call an elevator and then again to let it know which floor you want to go to. This all changed during my recent trip to Brazil, when I saw a newer kind of elevator.

These elevators have a common button panel outside, in the lobby area of a high-rise building. Everyone enters their floor number on the panel, and the machine displays the specific elevator they should get into. Once you enter an elevator you don't press any numbers. In fact, the elevators have no buttons at all. The elevator simply highlights the floor numbers that it will stop at. That's it! I love this redesigned elevator experience. It solves a number of problems. The old-style elevators could not predict demand. Now the system knows exactly how many people are waiting on which floors and where they want to go. This allows the system to optimize the elevator experience based on several variables and criteria such as speed, priority, even distribution, power conservation, etc. This also means an opportunity to write interesting algorithms for these elevators.
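To make the idea concrete, here is a minimal sketch in Python of how such a lobby panel might pick an elevator. I have no idea what algorithm the real elevators use; the `Elevator` class, the `assign` function, the capacity of 8, and the cost rule (prefer an elevator already stopping at your floor, then the one with the fewest stops, then the least crowded) are all my own assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Elevator:
    """Hypothetical elevator car; the names and defaults are illustrative."""
    name: str
    capacity: int = 8                          # assumed rider limit
    stops: set = field(default_factory=set)    # floors this car will stop at
    riders: int = 0                            # riders assigned so far


def assign(elevators, destination_floor):
    """Assign one rider to a car and return its name, or None if all are full."""
    candidates = [e for e in elevators if e.riders < e.capacity]
    if not candidates:
        return None  # the rider waits for the next round

    def cost(e):
        # Joining an existing stop is free; adding a new stop is penalized.
        new_stop = 0 if destination_floor in e.stops else 1
        # Ties are broken by fewer total stops, then by fewer riders.
        return (new_stop, len(e.stops), e.riders)

    best = min(candidates, key=cost)
    best.stops.add(destination_floor)
    best.riders += 1
    return best.name


if __name__ == "__main__":
    cars = [Elevator("A"), Elevator("B")]
    for floor in [5, 9, 5, 9, 12]:
        print(f"Floor {floor}: take elevator {assign(cars, floor)}")
```

Because the panel sees every request before anyone boards, even this naive rule groups riders going to the same floor into the same car. Swapping the `cost` function is where criteria such as speed, priority, or power conservation would come in.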

This is how I want ALL systems to be: smart, adaptive, and dynamic. Just like this elevator, I would like to see systems, especially the cloud and analytics, anticipate the needs of the end users as opposed to following their commands. Context is the key to delivering what users expect. If systems are designed to inquire about the context, directly or indirectly, just like asking people to push buttons before they get into an elevator, they will perform more intelligently. Some location-based systems have started to explore this idea, but it's just the beginning. This also has a significant impact on designing collaborative recommendation systems that could help end users find the right signal in the ever-increasing noise of social media.

The very idea of the cloud started with the mission to give users elasticity of commodity resources through a unified abstraction, without requiring them to learn a different interface. If you had two elevators in a lobby, you wouldn't use this. But for a high rise with a few elevators, the opportunities are in abundance to optimize the system to use the available resources to provide the best experience to the people, the end users.

Self-configuring and self-healing dynamic systems have been a fantasy, but as the cloud becomes more mature, dynamic capabilities to anticipate the needs of an application and its users are not far-fetched. Computing and storage are commodities in the cloud. I see them as resources, just like elevators. Instead of people pushing buttons at the eleventh hour, I would prefer that the cloud take the driver's seat and become much smarter at anticipating and managing applications, platforms, and mixed workloads. I want the cloud to take this experience to the next level by helping developers build such adaptive and dynamic applications. I almost see it as a scale issue, at the system level as well as the human level. If the cloud does promise scale, I expect it to go beyond commodity computing. This is why PaaS excites me more than anything else. That's the real deal to make a difference.