Monday, April 30, 2012

Fixing Software Patents, One Hack at a Time



Software patents are broken, and patent trolls are seriously hurting innovation. Companies are spending more money on buying patents to launch offensive strikes against other companies instead of competing by building great products. There are numerous patent horror stories I could outline where patents are being used for every purpose except innovation. In fact, the software patent system as it stands today has nothing to do with innovation at all. This is the sad side of Silicon Valley.

While most people are whining about how software patent trolls are killing innovation, some are trying to find creative ways to fix the problems. This is why it was refreshing to see Twitter announce its policy on patents, the Innovator's Patent Agreement, informally called the IPA. Under the IPA, patents can only be used in offensive litigation if the employees who were granted them consent to it. I have no legal expertise to comment on how well the IPA itself might hold up in patent litigation, but I am thrilled to see companies like Twitter stepping up to challenge the status quo by doing something different about it. If you're an employee, you want three things: to innovate, to get credit for your innovation, and to avoid having your patents used as an offensive tool. The IPA is also likely to serve as a hiring magnet for great talent. Many other companies are likely to follow suit. I also know of a couple of VCs who are aggressively pushing their portfolio companies to adopt the IPA.

The other major challenge with software patents is the bogus patents granted for obvious ideas. I really like the approach Article One Partners has taken to deal with such bogus patents. Article One Partners crowdsources the task of digging up prior art to identify bogus patents, then pushes the US patent office to invalidate them. It turns out you don't have to be a lawyer to find prior art. Many amateurs who love to research this kind of material have jumped into this initiative and have managed to find prior art for many bogus patents. It's very hard to change the whole system, but it's not too hard to find creative ways to fix parts of it.

I would suggest going beyond crowdsourcing the search for prior art. We should build open tools to gather and catalog searchable prior art. If you have an idea, just enter it into that database and it becomes prior art. This would make it incredibly difficult for any company to patent an obvious idea, since the idea would already be prior art. We should proactively create prior art instead of reactively searching for it. Open source has taught us many things, and it's such a vibrant community; I can't imagine the state of our industry without it. Why can't we do the same for patents? I want to see a Creative Commons of patents.
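To make the idea concrete, here is a minimal sketch of what such an open prior-art catalog could look like. The schema, the `publish` and `search` names, and the naive keyword matching are all my own illustrative assumptions, not a description of any existing system:

```python
# A minimal sketch of an open, searchable prior-art catalog.
# The schema and method names are illustrative assumptions.
from datetime import datetime, timezone

class PriorArtCatalog:
    def __init__(self):
        self.entries = []

    def publish(self, author, idea):
        """Record an idea with a timestamp so it can be cited as prior art."""
        entry = {
            "author": author,
            "idea": idea,
            "published": datetime.now(timezone.utc).isoformat(),
        }
        self.entries.append(entry)
        return entry

    def search(self, *keywords):
        """Naive keyword match; a real system would use full-text indexing."""
        words = [k.lower() for k in keywords]
        return [e for e in self.entries
                if all(w in e["idea"].lower() for w in words)]

catalog = PriorArtCatalog()
catalog.publish("alice", "One-click checkout for online stores")
print(catalog.search("one-click", "checkout"))
```

The point of the timestamp is the whole game: once an idea is published and datable, an examiner (or a crowdsourced researcher) can cite it against a later filing.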

The industry should also create tools to reverse-translate patents, stripping out the legal language to make it transparent exactly what each patent has been granted for.
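As a toy illustration of what reverse translation might look like, here is a hedged sketch: a crude glossary pass that swaps common legal boilerplate for plain English. The glossary is entirely my own invention, and a real tool would need natural-language processing far beyond this:

```python
# A toy sketch of "reverse translating" patent legalese into plain English.
# The glossary is entirely illustrative; real patent language is messier.
import re

GLOSSARY = {
    r"\bsaid\b": "the",
    r"\ba plurality of\b": "several",
    r"\bwherein\b": "where",
    r"\boperatively coupled to\b": "connected to",
    r"\bmeans for\b": "a way of",
}

def plain_english(claim_text):
    # Apply each glossary substitution across the claim text.
    for pattern, plain in GLOSSARY.items():
        claim_text = re.sub(pattern, plain, claim_text, flags=re.IGNORECASE)
    return claim_text

claim = ("A system comprising a plurality of devices, wherein said "
         "devices are operatively coupled to a server.")
print(plain_english(claim))
```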

I would also want to see an open-source-like movement where a ridiculously large set of patents belongs to one group: a GitHub of patents. That group would go after anyone who attempts to impede innovation by launching an offensive strike. If you can't beat a troll, become one.

Silicon Valley is a hacker community, and hackers should do what they are good at: hacking the system, in creative ways, to fix it.

Photo: Opensource.com

Wednesday, April 18, 2012

4 Big Data Myths - Part II



This is the second and last part of this two-part series on Big Data myths. If you haven't read the first part, check it out here.

Myth #2: Big Data is old wine in a new bottle

I hear people say, "Oh, that Big Data, we used to call it BI." One of the main challenges with legacy BI has been that you pretty much have to know what you're looking for, based on a limited set of data sources available to you. The so-called "intelligence" is people going around gathering, cleaning, staging, and analyzing data to create pre-canned "reports and dashboards" that answer a few very specific, narrow questions. By the time a question is answered, its value has been diluted. These restrictions grew out of the fact that computational power was still scarce and the industry lacked sophisticated frameworks and algorithms to actually make sense of data. Traditional BI introduced redundancies at many levels, such as staging and cubes, which in turn reduced the actual data size available to analyze. On top of that, there were no self-service tools to do anything meaningful with this data. IT has always been a gatekeeper, and it has always been resource-constrained. A lot of you can relate to this: if you asked IT to analyze clickstream data, you became a laughingstock.
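To make the contrast concrete, here is a minimal sketch of the kind of ad hoc clickstream question that used to require an IT project and now takes a few lines of code. The tab-separated log format here is a hypothetical assumption, not any vendor's schema:

```python
# A minimal sketch: counting page views per URL from a clickstream log.
# Assumes a hypothetical tab-separated format: timestamp, user_id, url.
from collections import Counter

def top_pages(log_path, n=10):
    counts = Counter()
    with open(log_path) as log:
        for line in log:
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 3:
                counts[fields[2]] += 1  # tally the url column
    return counts.most_common(n)

if __name__ == "__main__":
    for url, views in top_pages("clickstream.log"):
        print(views, url)
```

No staging, no cubes, no pre-canned dashboard; you ask the question when you have it.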

What is different about Big Data is not only that there's no real need to throw away any kind of data, but also that "enterprise data", which always got VIP treatment in the old BI world while everything else waited, has lost that elite status. In the world of Big Data, you don't know which data is valuable and which is not until you actually look at it and do something with it. Every few years the industry reaches some sort of inflection point. In this case, the inflection point is the combination of cheap computing — cloud as well as on-premise appliances — and the emergence of several open, data-centric software frameworks that can leverage that cheap computing.

Traditional BI is a symptom of hardware restrictions and legacy architecture that cannot take advantage of newer data frameworks such as Hadoop and the many others in the current landscape. Unfortunately, retrofitting an existing technology stack may not be easy if an organization truly wants to reap the benefits of Big Data. In many cases, buying some disruptive technology is nothing more than a line item on many CIOs' wish lists. I would urge them to think differently. This is not BI 2.0. This is not BI at all as you have known it.


Myth #1: A data scientist is a glorified data analyst

The role of the data scientist has grown exponentially in popularity. Recently, DJ Patil, data scientist in residence at Greylock, was featured in Generation Flux by Fast Company. He is the kind of guy you want on your team. I know of quite a few companies that are unable to hire good data scientists despite their willingness to offer above-market compensation. It's also a controversial role: people argue that a data scientist is just a glorified data analyst. This is not true. The data scientist is the human side of Big Data, and the role is real.

If you closely examine the skill sets of people in the traditional BI ecosystem, you'll recognize that they fall into two main categories: database experts and reporting experts. Either people specialize in complicated ETL processes, database schemas, vendor-specific data warehousing tools, SQL, etc., or they specialize in reporting tools, working with the "business" and delivering dashboards, reports, etc. This is a broad generalization, but you get the point. There are two challenges with this setup: a) people are hired based on vendor-specific skills such as a particular database or reporting tool, and b) they have a narrow mandate to get things done within those constraints, which typically leads to silos and a lack of the bigger picture.

The role of a data scientist is not to replace any existing BI people but to complement them. You could expect data scientists to have the following skills:

  • A deep understanding of data and data sources, exploring and discovering the patterns in how data is being generated.
  • A theoretical as well as practical (tool-level) understanding of advanced statistical algorithms and machine learning.
  • A strategic connection with the business at all levels, to understand broad as well as deep business challenges and translate them into experiments with data.
  • The ability to design and instrument environments and applications to generate and gather new data, and to establish an enterprise-wide data strategy, since one of the promises of Big Data is to leave no data behind and to have no silos (a minimal sketch of such instrumentation follows this list).
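As promised above, here is a hedged, minimal sketch of application instrumentation that gathers new data as structured events. The `track_event` helper, the event schema, and the JSON-lines sink are all illustrative assumptions rather than any particular product's API:

```python
# A minimal sketch of instrumenting an application to gather new data
# as structured events. The schema and sink are illustrative assumptions.
import json
import time

def track_event(sink, user_id, name, **properties):
    """Append one structured event to an append-only JSON-lines sink."""
    event = {
        "ts": time.time(),
        "user": user_id,
        "event": name,
        "props": properties,
    }
    sink.write(json.dumps(event) + "\n")

if __name__ == "__main__":
    with open("events.jsonl", "a") as sink:
        track_event(sink, "u42", "search", query="software patents", results=17)
        track_event(sink, "u42", "click", url="/prior-art/123")
```

The design choice worth noting is "leave no data behind": events are appended raw, with arbitrary properties, so questions you haven't thought of yet can still be asked later.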

I have seen enterprises that have a few people with some of these skills, but they are scattered around the company and typically lack high-level visibility and executive buy-in.

Whether data scientists should be domain experts or not is still being debated. I would strongly argue that the primary thing to look for when hiring a data scientist is how they deal with data, with great curiosity and a lot of whys, not what kind of data they have dealt with before. In my opinion, if you ask a domain expert to be a data expert, preconceived biases and assumptions — the curse of knowledge — will hinder discovery. Being naive but curious about a specific domain actually works better, since such people have no preconceived biases and are open to looking for insights in unusual places. Also, when they look at data across different domains, it helps them connect the dots and apply insights gained in one domain to solve problems in another.

No company would ever confess that its decisions are not based on hard facts derived from extensive data analysis and discovery. But, as I have often seen, most companies don't even know that many of their decisions could prove to be completely wrong had they had access to the right data and insights. It's scary, but that's the truth. You don't know what you don't know. BI never had one human face that we could all point to. Now, in the new world of Big Data, we can. And it's called a data scientist.

Photo courtesy: Flickr