Tuesday, July 31, 2012

Data Scientists Should Be Design Thinkers

World Airline Routes

Every company is looking for that cool data scientist who will come equipped with all the knowledge of data, domain expertise, and algorithms to turn around their business. The inconvenient truth is there are no such data scientists. Mike Loukides discusses the overfocus on tech skills and cites DJ Patil:

But as DJ Patil said in “Building Data Science Teams,” the best data scientists are not statisticians; they come from a wide range of scientific disciplines, including (but not limited to) physics, biology, medicine, and meteorology. Data science teams are full of physicists. The chief scientist of Kaggle, Jeremy Howard, has a degree in philosophy. The key job requirement in data science (as it is in many technical fields) isn’t demonstrated expertise in some narrow set of tools, but curiosity, flexibility, and willingness to learn. And the key obligation of the employer is to give its new hires the tools they need to succeed.
I do agree there's a skill gap, but it is that of "data science" and not of "data scientists." What concerns me more about this skill gap is not the gap itself but the misunderstanding around how to fill it.

There will always be a skill gap when we encounter a new domain or a rapidly changing technology that promises to help people do something radically different. You can't just create data scientists out of thin air, but if you look at the problem a little differently — perhaps educating people on what data scientists are actually required to do and having them follow the data science behind it — the solution may not be as far-fetched as it appears.

The data scientists I am proposing, the ones who would practice "data science," should be design thinkers: people who practice design thinking. Here's why:

Multidisciplinary approach

Design thinking encourages people to work in a multidisciplinary team where each team member champions his or her domain to ensure a holistic approach to a solution. The philosophy behind this approach: a solution should be economically viable, technologically feasible, and desirable to end users. Without effective participation from a broader set of disciplines, data scientists are unlikely to be effective at solving the problems they are hired and expected to solve.

Outside-in thinking and encouraging wild ideas

As I have argued before, the data external to a company is far more valuable than the data it has internally, since Big Data is an amalgamation of a few trends - data growth of a magnitude or two, external data becoming more valuable than internal data, and a shift in computing business models. Big Data is about redefining (yet another design thinking element, referred to as "reframing the problem") what data actually means to you, and its power resides in combining and correlating these two data sets.

In my experience working with customers, this is the biggest challenge. You can't solve a problem with a constrained, inside-out mindset. This is where we need to encourage wild ideas and help people stretch their imagination without worrying about the underlying technical constraints that have created data silos, invariably resulting in organizational silos. A multidisciplinary team, by virtue of having people from different domains, is well-suited for this purpose.

What do you do once you have plenty of ideas and a vision of where you want to go? That brings me to this last point.

Rapid prototyping

Rapid prototyping is at the heart of design thinking. One of the common beliefs I often challenge is the overemphasis on perfecting an algorithm. Data is more important than algorithms; getting to an algorithm should be the core focus, not fixating on finding the perfect algorithm. Using the power of technology and a design thinking mindset, iterating rapidly on multiple data sets, you are much more likely to discover insights based on a good-enough algorithm. This does sound counterintuitive to people who are trained in designing, perfecting, and practicing complex algorithms, but the underlying technology and tools have shifted the dynamics.
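To make the point concrete, here is a minimal sketch (in Python, with made-up numbers) of what prototyping with a good-enough algorithm might look like: try simple candidate models against a small data set and stop as soon as one clears an acceptable error threshold, instead of chasing the perfect algorithm.

```python
import statistics

# Toy observations: (hour_of_day, items_sold) -- all numbers are hypothetical
data = [(9, 12), (10, 15), (11, 14), (12, 30), (13, 28), (14, 16), (15, 13)]

def mean_predictor(train):
    # First prototype: always predict the overall average
    avg = statistics.mean(y for _, y in train)
    return lambda x: avg

def lunch_rush_predictor(train):
    # One iteration later: a separate average for the 12-13h lunch window
    rush = [y for x, y in train if 12 <= x <= 13]
    rest = [y for x, y in train if not (12 <= x <= 13)]
    rush_avg, rest_avg = statistics.mean(rush), statistics.mean(rest)
    return lambda x: rush_avg if 12 <= x <= 13 else rest_avg

def mae(model, test):
    # Mean absolute error of a model over a data set
    return statistics.mean(abs(model(x) - y) for x, y in test)

GOOD_ENOUGH = 3.0  # acceptable error for the decision at hand
for make in (mean_predictor, lunch_rush_predictor):
    model = make(data)
    err = mae(model, data)
    print(f"{make.__name__}: MAE = {err:.2f}")
    if err <= GOOD_ENOUGH:
        break  # good enough; go back to the data, not the algorithm
```

The threshold, the models, and the data are all placeholders; the design choice being illustrated is the loop itself, where each iteration is driven by looking at the data rather than by refining the algorithm.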

Wednesday, July 18, 2012

Learn To Fail And Fail To Learn




"I have never let my schooling interfere with my education" - Mark Twain

During a casual conversation over a Little League baseball game on a breezy Bay Area evening, the dad of an eight-year-old, who also happens to be an elementary school teacher, told me that teaching cursive writing to kids isn't a particularly bright idea. He said, "it's a dying skill." The only thing he cares about is teaching kids to write legibly. He even wonders whether kids will learn typing the way some of us learned, or whether they will learn tap-typing due to the growing popularity of tablets. He is right.

When kids still have to go to a "lab" to work on a "computer" while "buffering" is among the first ten words of a two-year-old's vocabulary, I conclude that the schools haven't managed to keep pace with today's reality.

I am a passionate educator. I teach graduate classes, and I have worked very hard to ensure that my classes — the content as well as the delivery methods — are designed to prepare students for today's and tomorrow's world. At times, I feel ashamed that we haven't managed to change our K-12 system, especially the elementary schools, to prepare kids for the world they will work in.

This is what I want kids to learn in school:

Learn to look for signal in noise:

Today's digital world is full of noise with very little signal. It's almost an art to comb through this vast ocean of real-time information and make sense of it. Despite the current generation being digital natives, kids are not trained to effectively look for signal in noise. While conceited pundits still debate whether multitasking is a good idea, in reality the only way to deal with an endless digital workflow and the associated interactions is to multitask. I want schools to teach kids to differentiate between the tasks that can be accomplished by multitasking and the ones that require their full attention. Telling them not to multitask is no longer an option.

I spend a good chunk of time reading books, blogs, magazines, papers, and a lot of other stuff. I taught myself when to scan and when to read. I also taught myself to read fast. Schools place a lot of emphasis on developing reading skills early on, but they don't teach kids how to read fast. They also don't teach kids how to scan - to look for signal in noise. The reading skills kids develop early on are based solely on print books. Most kids will stop reading print books as soon as they graduate, or even before that. Their reading skills won't necessarily translate well to the digital medium. I want schools to teach kids when to scan and how to read fast, and most importantly how to differentiate between the two based on context and content.

Learn to speak multiple languages:

I grew up learning to read, write, and speak three languages fluently. I cannot overemphasize how much it has helped me overall. One of the drawbacks of the US education system is that the emphasis on a second or third language starts very late. I also can't believe it's optional to learn a second language. In this highly globalized economy, why would you settle for just one language? Can you imagine if a very large number of Americans were to speak Mandarin, Portuguese, Russian, or Hindi? Imagine the impact this country would have.

Recent research has shown that bilinguals have a heightened ability to monitor their environment and switch contexts. A recent study also found that bilinguals are more resistant to dementia and other symptoms of Alzheimer's disease.

Learn to fail and fail to learn:

"For our children, everything they will 'know' is wrong – in the sense it won’t be the primary determinant of their success. Everything they can learn anew will matter – forever in their multiple and productive careers." - Rohit Sharma

As my friend Rohit says, you actually want to teach kids how to learn. The ability to learn is far more important than what you know, because what you know is going to become irrelevant very soon. Our schools are not designed to deal with this. On top of that, there is too much emphasis on incentivizing kids at every stage to be perfect. Teachers are not trained to provide the constructive feedback that helps kids fail fast, iterate, and get better.

An education system that measures students on what and how much they know, as opposed to how quickly they can learn what they don't know, is counterproductive to its own purpose.

Learn to embrace unschooling:


Peter Thiel's 20 Under 20 fellowship program has received a good deal of criticism from people suggesting that dropping out of college to pursue entrepreneurship is not a good idea. I really liked the response from one of the fellows of this program, Dale Stephens, where he discusses unschooling. He is also the founder of UnCollege. Unschooling is not about not going to school; it's about not accepting school as your only option. If you have looked lately at education startups, especially my favorite ones — Khan Academy, Coursera, and Codecademy — you will realize the impact of technology and social networks on radically changing the way people learn. Our schools are designed neither to comprehend this idea nor to embrace it. This is what disruption looks like: students finding different ways to compensate for the things they can't get from a school. This trend will not only continue but is likely to accelerate. It is a leading indicator that we need a change. Education is what has made this country great, and it is one of the main reasons skilled immigrants are attracted to the US. Let's not take it for granted, and let's definitely not lose that advantage.


Originally, I had written this as a guest post for Vijay Vijayasankar's blog

Photo courtesy: BarbaraLN

Monday, June 25, 2012

With Yammer, Microsoft Begins Its Journey From Collaborative To Social


Confirming what we already knew, today Microsoft announced it is acquiring Yammer for $1.2 billion in cash. Here's a blog post by David Sacks, the CEO of Yammer.

Microsoft doesn't report a revenue breakdown for its individual products, but SharePoint is believed to be one of the fastest growing, with annual revenue of more than $1 billion. Regardless of how Microsoft markets and positions SharePoint, it has always been collaboration software and not really social software. Microsoft does seem to understand the challenges it faces in moving its portfolio of products to the cloud, including SharePoint. Microsoft also understands the value of having end users on its side even though SharePoint is sold as enterprise software. Microsoft's challenges in transitioning to the cloud are similar to those faced by other on-premise enterprise software vendors.

But I really admire Microsoft's commitment to not giving up on any of these things. The Skype acquisition was about reaching those millions of end users, and Microsoft continues that with the acquisition of Yammer. Going from collaborative to social requires being able to play at the grassroots level in an organization, as opposed to a top-down push, and, more importantly, being able to create and leverage network effects. It's incredibly difficult to lead with an on-premise solution retrofitted for the cloud and still create network effects; native cloud solutions do have this advantage. Yammer will do this really well while helping Microsoft strengthen SharePoint as a product and maintain its revenue without compromising margins. If Microsoft executes this well, it might unlock a solution to its Innovator's Dilemma.

With Yammer, Microsoft does have an opportunity to fill in the missing half of social enterprise by transforming productivity silos into collaborative content curation. As a social enterprise software enthusiast, I would love to see it happen, sooner rather than later.

At a personal level, I am excited to see the push for social in enterprise software and a strong will and desire to cater to end users and not just decision makers. I hope more entrepreneurs recognize that enterprise software can be social, cool, and lucrative. This also strengthens the market position of vendors such as Box and Asana.

It's impressive what an incumbent can do when it decides to execute on its strategy. Microsoft is fighting multiple battles. They do have the right cards. It remains to be seen how they play the game.

Friday, June 15, 2012

Proxies Are As Useful As Real Data


Last year I ran a highly unscientific experiment. Every late Monday afternoon, I would put a DVD in an open mail bin in my office to mail it back to Netflix. I would also count the total number of Netflix DVDs put into that bin by other people. Over a period of time I observed a continuous and consistent decline in the number of DVDs. I compared my results with the numbers released by Netflix. They matched. I'm not surprised. Even though this was an unscientific experiment on a very small sample size with many uncontrolled variables, it still gave me insight into the real data that I otherwise had no access to.

Proxies are as useful as real data.

When Uber decides to launch a service in a new city, or when they are assessing demand in an existing city, they use crime data as a surrogate to measure neighborhood activity. This measurement is a basic input in calculating demand. There are many scenarios and applications where access to a real dataset is either prohibitively expensive or impossible. But a proxy is almost always available, and in many cases it is good enough to make certain decisions that can eventually be validated by real data. This approach, simple as it is, is ignored by many product managers and designers. Big Data is not necessarily solving the problem of access to a certain data set that you may need to design your product or make decisions, but it is certainly opening up an opportunity that didn't exist before: the ability to analyze proxy data and use algorithms to correlate it with your own domain.

As I have argued before, the data external to an organization is probably far more valuable than the data it has internally. Until now, organizations barely had the capability to analyze a subset of all their internal data; they could not even think of doing anything interesting with external data. This is going to change rapidly as more and more organizations dip their toes into Big Data. Don't discriminate against any data source, internal or external.

Probably the most popular proxy is per-capita GDP as a measure of the standard of living. The Hemline Index is yet another example: it holds that women's skirts become shorter (higher hemline) during good economic times and longer during not-so-good times.

Source: xkcd
Proxies are just the beginning of how you could correlate several data sources. But be careful. As wise statisticians will tell you, correlation doesn't imply causation. One of my personal favorite examples is the correlation between the Yankees winning the World Series and a Democratic president in the Oval Office. Correlation doesn't guarantee causation, but it gives you insights into where to begin, what question to ask next, and which dataset might hold the key to that answer. This iterative approach simply wasn't feasible before; by the time people got an answer to their first question, it was too late to ask the second. The ability to go after any dataset anytime you want opens up a lot more opportunities. At the same time, as Big Data tools, computing, and access to several external public data sources become a commodity, it will come down to human intelligence prioritizing the right questions to ask. As Peter Skomoroch, a principal data scientist at LinkedIn, puts it, "'Algorithmic Intuition' is going to be as important a skill as 'Product Sense' in the next decade."
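As a toy illustration of putting a proxy to work, here is a short Python sketch (all numbers are hypothetical, loosely modeled on the mail-bin experiment above) that checks how well a hand-counted proxy series tracks a "real" series using the Pearson correlation coefficient:

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length series
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Proxy: DVDs counted in the office mail bin each week (hypothetical)
proxy = [14, 13, 13, 11, 10, 9, 9, 8]
# "Real" data: subscriber numbers, in millions, for the same weeks (hypothetical)
real = [11.2, 11.0, 10.9, 10.5, 10.2, 9.9, 9.8, 9.6]

print(f"correlation: {pearson(proxy, real):.3f}")  # close to 1.0: the proxy tracks the trend
```

A high correlation here only tells you the proxy moves with the real data over this window; as the surrounding text warns, it says nothing about causation, and the proxy still needs eventual validation against real data.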

Thursday, May 31, 2012

I Want USPS To Think Outside The Box


Recently I had to go to a consulate to get a visa, and the consulate would only accept a USPS money order and a USPS pre-paid envelope. I went to a post office to get those. That particular post office had decided to change its business hours that day and open late. I hurriedly drove to a different post office, where two of the three clerks didn't know how to issue a pre-paid envelope! On a personal level, I never look forward to going to a post office. It invariably delays my schedule, and I am met with unpleasant customer service and inefficiency everywhere. This is also true of some of the other services I use, but there's one major difference: I cannot opt out of USPS.

USPS anticipates losing about $7 billion during the fiscal year that ends in September. They even have their own conference, PostalVision 2020, where they have invited technology thought leaders such as Vint Cerf and many others to honestly and seriously look at the issues they have. The stated agenda:
"Postal Vision 2020/2.0 is as much a movement as it is a Conference.  It is a forum for an open and honest dialog to better understand the future of postal communications and shipping, and what this means to those who regulate, supply and use mail.  It’s about sharing ideas and knowledge with the hope of sparking innovation and the creation of new successful business models.  It’s about asking each other lots of difficult questions for which there may be many answers to consider before finding those that serve the long term health of the industry and any particular enterprise."
USPS is broken at so many levels; they have short-term as well as long-term issues to deal with, and it is likely to get uglier before it gets better. Channeling Geoffrey Moore, USPS needs to retain its core and redefine its context. A massive fleet of trucks, logistics capability, and outlets in all foreseeable locations are its core strength. Postal mail and other related services are its context, where it is simply unable to compete because of a shrinking addressable market (due to digital communication) and poor service design that applies a legacy mindset to today's and tomorrow's problems.

USPS should think outside the box. No pun intended.  

Here are some ideas/suggestions:

Deliver groceries: Remember Webvan? I loved their service during the dot-com boom. One of the main reasons they went out of business is that they had no expertise in logistics. Since then, nothing much has changed in the home-delivered grocery business. What if USPS delivered groceries to your home? What if it partnered with a local supermarket and took over its logistics? This is a complementary business model: supermarkets are not in the delivery business, and it's not economical for them to enter the logistics business. It is also a sustainable business that helps the environment; USPS trucks are on the road no matter what, but now they can take a few cars off the road. This may sound crazy, but times are changing, and it's time for USPS to rethink what unfair advantage it has over others.

Re-think mail delivery: It's perfectly acceptable to me if I receive my mail only every other day. In many cases, I am fine if I don't get my mail for a week at a time. There's nothing time-sensitive about my mail, and with changing demographics this is true of a lot of other people as well. Incentivize customers to skip mail by offering them a discount on other services, and have fewer trucks and fewer people going around the neighborhoods. This brings the overall cost down and opens up new revenue opportunities.

Double down on self-service: I know USPS is trying hard to add more and more self-service kiosks, but they're not enough. Think like Coinstar and Redbox. I should be able to do everything related to USPS at the places where I can get milk at the 11th hour, money from an ATM, and gas for my car. USPS really needs to work hard to give people a reason to use it when they have much better alternatives for mailing packages. Think of UPS, DHL, and FedEx as incumbents and leapfrog them, using USPS's unfair advantage, in places where they can't possibly compete.

Rethink the identity: USPS doesn't directly receive federal tax dollars and is expected to meet expenses from the revenue it generates. But it's not that black and white: even though USPS doesn't get tax money, it receives plenty of other money via grants and other special funds. It's neither truly a government entity nor truly a business. If USPS is to be fixed, it needs to rethink its identity and decide whether it is a purely public-sector organization or a mix of public and private, and how. Once that identity is set, it can follow through on revenue sources, cost measures, and building an ecosystem of partners. A mixed and complicated business structure introduces complexity at all levels and prevents the organization from thinking and executing in a unified way.

Monday, May 21, 2012

Data Is More Important Than Algorithms


Netflix Similarity Map

In 2006 Netflix offered to pay a million dollars, popularly known as the Netflix Prize, to whoever could help Netflix improve its recommendation system by at least 10%. A year later the Korbell team won the Progress Prize by improving Netflix's recommendation system by 8.43%. They also gave Netflix the source code for their 107 algorithms, representing 2000 hours of work. Netflix looked at these algorithms and decided to implement the two main ones to improve its recommendation system. Netflix did face some challenges, but they managed to deploy these algorithms into their production system.

Two years later Netflix awarded the grand prize of $1 million to work that involved hundreds of predictive models and algorithms. They evaluated these new methods and decided not to implement them. This is what they had to say:
"We evaluated some of the new methods offline but the additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment. Also, our focus on improving Netflix personalization had shifted to the next level by then."
This appears to be strange on the surface but when you examine the details it totally makes sense.

The cost of implementing algorithms to achieve an incremental improvement simply isn't justifiable. While the researchers worked hard on innovating the algorithms, Netflix's business, as well as its customers' behavior, changed. Netflix saw more and more devices being used to stream movies as opposed to getting a DVD in the mail. The main intent behind the million-dollar prize was to perfect the recommendation system for the DVD subscription plan, since those subscribers picked the recommended DVDs carefully: it took some time to receive titles in the mail, and customers wanted to make sure they didn't end up with lousy movies. Netflix didn't get any feedback on those titles until after customers had viewed them and decided to share their ratings.

This customer behavior changed drastically when customers started following recommendations in real time on their streaming subscription. They could instantly try out the recommended movies, and if they didn't like them, they tried something else. The barrier to getting to the next movie a customer might like went down significantly. Netflix also started to receive feedback in real time while customers watched the movies. This was a big shift in user behavior, and hence in the recommendation system, as customers moved from DVDs to streaming.

What does this mean to the companies venturing into Big Data?

Algorithms are certainly important, but they only provide incremental value on your existing business model. They are very difficult to innovate and far more expensive to implement. Netflix had a million-dollar prize to attract the best talent; your organization probably doesn't. Your organization is also less likely to open up its private data to the public domain to discover new algorithms. I do encourage you to be absolutely data-driven and to do everything you can to make data part of your corporate strategy, including hiring a data scientist. But, most importantly, you should focus on your changing business — disruption and rapidly changing customer behavior — and on data, not on algorithms. One of the promises of Big Data is to leave no data source behind. Your data is your business and your business is your data. Don't lose sight of it. Invest in technology, and more importantly in people who have the skills to stay on top of changing business models and unearth insights from data to strengthen and grow the business. Algorithms are cool, but the data is much cooler.
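To illustrate what a good-enough algorithm fed by data looks like, here is a deliberately simple co-occurrence recommender sketched in Python. The titles and viewing histories are made up, and this is not Netflix's actual method; the point is that such a crude algorithm gets better as more histories arrive, with no algorithmic innovation at all.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_recs(histories, liked, top_n=3):
    """Recommend titles that most often co-occur with ones the user liked.
    A deliberately simple, good-enough algorithm: more viewing data
    improves it far more than a cleverer formula would."""
    counts = defaultdict(int)
    for history in histories:
        for a, b in combinations(set(history), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    scores = defaultdict(int)
    for seen in liked:
        for (a, b), c in counts.items():
            if a == seen and b not in liked:
                scores[b] += c
    # Highest co-occurrence first; ties broken alphabetically
    return [t for t, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))][:top_n]

# Hypothetical viewing histories
histories = [
    ["Alien", "Blade Runner", "Dune"],
    ["Alien", "Blade Runner", "Solaris"],
    ["Blade Runner", "Dune", "Solaris"],
    ["Alien", "Dune"],
]
print(cooccurrence_recs(histories, liked=["Alien"]))
```

Every new viewing history added to `histories` refines the co-occurrence counts, which is exactly the sense in which the data, not the algorithm, carries the value.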

Monday, April 30, 2012

Fixing Software Patents, One Hack At Time



Software patents are broken, and patent trolls are seriously hurting innovation. Companies are spending more money on buying patents to launch offensive strikes against other companies than on competing by building great products. There are numerous patent horror stories I could outline where patents are being used for every purpose except innovation. In fact, the software patent system as it stands today has nothing to do with innovation at all. This is the sad side of Silicon Valley. While most people are whining about how software patent trolls are killing innovation, some are trying to find creative ways to fix the problems. This is why it was refreshing to see Twitter announcing their policy on patents, the Innovator's Patent Agreement, informally called the IPA. Under the IPA, patents can only be used in offensive litigation if the employees who were granted the patents consent to it. I have no legal expertise to comment on how well the IPA itself might hold up in patent litigation, but I am thrilled to see companies like Twitter stepping up to challenge the status quo by doing something different. If you're an employee, you want three things: to innovate, to get credit for your innovation, and to avoid your patents being used as an offensive tool. The IPA is also likely to serve as a hiring magnet for great talent. Many other companies are likely to follow suit. I also know of a couple of VCs who are aggressively pushing their portfolio companies to adopt the IPA.

The other major challenge with software patents is the bogus patents granted on obvious ideas. I really like the approach Article One Partners takes to deal with such bogus patents. Article One Partners crowdsources the task of digging up prior art to identify bogus patents and subsequently pushes the US patent office to invalidate them. It turns out you don't have to be a lawyer to find prior art. Many amateurs who love to research this kind of stuff have jumped into this initiative and have managed to find prior art for many bogus patents. It's very hard to change the system, but it's not too hard to find creative ways to fix parts of it.

I would suggest going beyond crowdsourcing the task of finding prior art. We should build open tools to gather and catalog searchable prior art. If you have an idea, just enter it into that database and it becomes prior art. This would make it incredibly difficult for any company to patent an obvious idea, since it would already be prior art. We should create prior art instead of reactively searching for it. Open source has taught us many things, and it's such a vibrant community; I can't imagine the state of our industry without it. Why can't we do the same for patents? I want to see a Creative Commons of patents.

The industry should also create tools to reverse-translate patents, taking all the legal language out of them, to bring transparency to what purposes patents are being granted for.

I would also like to see an open-source-like movement where a ridiculously large set of patents belongs to one group - a GitHub of patents. And that group will go after anyone who attempts to impede innovation by launching an offensive strike. If you can't beat a troll, become one.

Silicon Valley is a hacker community, and hackers should do what they are good at: hack the system — to fix it — using creative ways.

Photo: Opensource.com