Wednesday, November 30, 2011

Coming To A Place Near You: A Private Cloud Spiked With Big Data


(Image: Netflix similarity map)
Yesterday, I moderated a couple of panels at the Big Data Cloud event. I have been a keynote speaker, panelist, moderator, and participant at many conferences over the last few years, and it has always been a pleasure to watch the cloud and big data become more and more mainstream. Here are my quick observations and insights from the event:

Private cloud gaining momentum: As a public cloud proponent, I thought I would never have to write this. But lately I have seen more and more interest in the private cloud; new start-ups, established cloud vendors, and large legacy vendors are all designing private or hybrid cloud solutions. Vendors have recognized that prospects and customers now take the cloud very seriously, but those customers still have the same concerns they had a few years back: security, moving data to a public cloud, and giving up control. I am not interested in the private/public debate (though I do love to mess with fellow clouderati on Twitter on this topic). My take on this trend is that vendors should do whatever it takes to move organizations to the cloud, private or public. Once companies dip their toes in, they will realize for themselves what's good for them.

Big Data as a serious category: A few days back, I blogged about big data going mainstream. Coming out of this event, I felt that Big Data today is where cloud was a couple of years ago. A few years back, when I asked people, "What's Hadoop?", they would reply, "Huh?" Now everyone wants to know more about Hadoop, Hive, HBase, S4, Oozie, Pig, Cassandra, and other big data frameworks. They're interested in analyzing and comparing the available solutions, and they're asking all the right questions. VC investment in this category has hit a record high. Hadoop World was a sold-out event this year, with 1,500 participants. Milind Bhandarkar, Chief Architect at Greenplum Labs, mentioned that in 2008, during the first Hadoop Summit, they had to coax people to attend; the people who came willingly worked for either Yahoo or Facebook. We have come a long way, and there's a long way to go, but this is a rock-solid category. As the first set of big data infrastructure companies settles in, we will see people building killer applications and PaaS solutions specifically designed to leverage big data. It is encouraging to see more and more companies and venture capitalists recognizing that data is worth a lot more if they have the right tools and the right people (the data scientists) to do something interesting with it. For example, Greylock Partners has hired DJ Patil as a "data scientist in residence" to help evaluate opportunities and advise its portfolio companies on big data strategies.

Rise in popularity of open source frameworks: If you follow the history of open source, you'll notice a pattern: when a proprietary way of doing things becomes popular, commercial vendors pose a lock-in threat, and things don't work as expected, developers get frustrated and start filling that gap by building open source technology. Linux started that way, and so did many other open source projects. This is why I am excited to see OpenStack gaining rapid momentum; it is slowly becoming a de facto standard for building commercial cloud solutions. I also like Cloud Foundry because many companies I know of, ISVs and large IT shops alike, won't use a public PaaS. They would prefer to launch their own PaaS solution in the cloud, and without an open source option, that becomes a big challenge.

Monday, November 7, 2011

Early Signs Of Big Data Going Mainstream


Today, Cloudera announced a new $40m funding round to scale its sales and marketing efforts, along with a partnership under which NetApp will resell Cloudera's Hadoop as part of its solution portfolio. Both announcements are critical to where the cloud and Big Data are headed.

Big Data going mainstream: Hadoop and MapReduce are not meant only for Google, Yahoo, and fancy Silicon Valley start-ups. People have recognized that there's a wider market for Hadoop in consumer as well as enterprise software applications. As I have argued before, Hadoop and the cloud are a match made in heaven. I blogged about Cloudera and the rising demand for data-centric massively parallel processing almost 2.5 years ago. Obviously, we have come a long way. The latest Hadoop conference is completely sold out. It's good to see the early signs of Hadoop going mainstream. I expect to see similar success for companies such as Datastax (previously Riptano), which is a "Cloudera for Cassandra."

Storage is a mega-growth category: We are barely scratching the surface when it comes to growth in the storage category. Big data combined with cloud growth is going to drive storage demand through the roof, and the established storage vendors are in the best shape to take advantage of this opportunity. This year I co-wrote a cloud research report and predictions with luminary analyst Ray Wang, in which I said that cloud storage would sell like hot cakes and NoSQL would skyrocket. That is true this year, and it will be even more true next year.

Making PaaS even more exciting: PaaS is the future, and Hadoop and Cassandra are not easy to deploy or program. The availability of such frameworks at the lower layers makes PaaS even more exciting. I don't expect PaaS developers to solve these problems themselves. I expect them to provide a layer that exposes the underlying functionality both declaratively and programmatically, so that application developers can pick the PaaS platform of their choice and build killer applications.

Push to the private cloud: Like it or not, the availability of Hadoop from an "enterprise" vendor is going to help the private cloud vendors. NetApp has a fairly large customer base, and its products are omnipresent in large private data centers. I know many companies that are interested in exploring Hadoop for a variety of needs but are somewhat hesitant to go to a public cloud, because doing so would require them to move large volumes of on-premises data to the cloud. They're more likely to use a solution that comes to their data than to move their data to where a solution resides.