Wednesday, May 22, 2013

Lead, Follow, Or Get Out Of The Way



If you have been following this blog you would know that I mainly blog about enterprise software, cloud, and big data with a few occasional posts on design and design thinking. That's what I am most passionate about. Having spent my entire career building enterprise software I have realized that success and competitive differentiation in market place boil down to an organization's unique ability to get three things right where management plays a key role: 1) people who can continuously learn and adapt to change 2) processes that are nimble and evolve as the company evolves 3) products that solve a real problem and delight the end users. While I continue to blog about enterprise software I have decided to evolve this blog further by adding a few management posts going forward.

There are a series of management topics that I am interested in but let's start with the basic one which is about my core management philosophy. My management philosophy is "lead, follow, or get out of the way." In any situation I ask myself whether I should be leading in this situation or following someone's lead and extend my full support to do so. If neither make sense I simply get out of the way and let people do their job. Building, selling, and supporting software, like many other things, require a loosely-connected (to put it in software terms) organization where there are leaders who lead and follow other leaders at the same time. This gets more and more complicated as the size and portfolio of an organization grow over years. People draw artificial boundaries and lose sight of the mission and the big picture.

Leading is hard, following is harder, and getting out of the way is the hardest which requires a conscious attempt to empower people to do their job without getting into their way. But, it is an approach that does work and I encourage you to try it out and share it with others.

Photo courtesy: Pison Jaujip 

Tuesday, April 30, 2013

Justifying Big Data Investment


Traditionally companies invest into software that has been proven to meet their needs and has a clear ROI. This model falls apart when disruptive technology such as Big Data comes around. Most CIOs have started to hear about Big Data and based on their position on the spectrum of conservative to progressive they have either started to think about investing or have already started investing. The challenge these CIOs face is not so much whether they should invest into Big Data or not but what they should do with it. Large companies have complex landscapes that serve multiple LOBs and all these LOBs have their own ideas about what they want to get out of Big Data. Most of these LOB executives are even more excited about the potential of Big Data but are less informed about the upstream technical impact and the change of mindset that IT will have to go through to embrace it. But, these LOBs do have a stronger lever - money to spend if they see that technology can help them accomplish something that they could not accomplish before.

As more and more IT executives get excited over the potential of Big Data they are underestimating the challenges to get access to meaningful data in single repository. Data movement has been one of the most painful problems of a traditional BI system and it continues to stay that way for Big Data systems. A vast majority of companies have most of their data locked into their on-premise systems. Not only it is inconvenient but it's actually impractical to move this data to the cloud for the purposes of analyzing it if Big Data platform happens to be a cloud platform. These companies also have a hybrid landscape where a subset of data resides in the cloud inside some of the cloud solutions that they use. It's even harder to get data out from these systems to move it to either a cloud-based or an on-premise Big Data platform. Most SaaS solutions are designed to support ad hoc point-to-point or hub and spoke REST-ful integration but they are not designed to efficiently dump data for external consumption.

Integrating semantics is yet another challenge. As organizations start to combine several data sources the quality as well as the semantics of data still remain big challenges. Managing semantics for single source in itself isn't easy. When you add multiple similar or dissimilar sources to the mix this challenge is further amplified. It has been the job of an application layer to make sense out of underlying data but when that layer goes away the underlying semantics become more challenging.

If you're a vendor you should work hard thinking about business value of your Big Data technology - not what it is to you but what it could do for your customers. The spending pie for customers hasn't changed and coming up with money to spend on (yet another) technology is quite a challenge. My humble opinion on this situation is that vendors have to go beyond technology talk and start understanding the impact of Big Data, understand the magnitude of these challenges, and then educate customers on the potential and especially help them with a business case. I would disagree with people who think that Big Data is a technology play/sale. It is not.

Photo Courtesy: Kurtis Garbutt

Sunday, March 31, 2013

Thrive For Precision Not Accuracy


Jake Porway who was a data scientist at the New York Times R&D labs has a great perspective on why multi-disciplinary teams are important to avoid bias and bring in different perspective in data analysis. He discusses a story where data gathered by Über in Oakland suggested that prostitution arrests increased in Oakland on Wednesdays but increased arrests necessarily didn't imply increased crime. He also outlines the data analysis done by Grameen Foundation where the analysis of Ugandan farm workers could result into the farmers being "good" or "bad" depending on which perspective you would consider. This story validates one more attribute of my point of view regarding data scientists - data scientists should be design thinkers. Working in a multi-disciplinary team to let people champion their perspective is one of the core tenants of design thinking.

One of the viewpoints of Jake that I don't agree with:

"Any data scientist worth their salary will tell you that you should start with a question, NOT the data."

In many cases you don't even know what question to ask. Sometimes an anomaly or a pattern in data tells a story. This story informs us what questions we might ask. I do see that many data scientists start with knowing a question ahead of time and then pull in necessary data they need but I advocate the other side where you bring in the sources and let the data tell you a story. Referring to design, Henry Ford once said, ""Every object tells a story if you know how to read it." Listen to the data—a story—without any pre-conceived bias and see where it leads you.

You can only ask what you know to ask. It limits your ability to unearth groundbreaking insights. Chasing a perfect answer to a perfect question is a trap that many data scientists fall into. In reality what business wants is to get to a good enough answer to a question or insight that is actionable. In most cases getting to an answer that is 95% accurate requires little effort but getting that rest 5% requires exponentially disproportionate time with disproportionately low return.

Thrive for precision, not accuracy. The first answer could really be of low precision. It's perfectly acceptable as long as you know what the precision is and you can continuously refine it to make it good enough. Being able to rapidly iterate and reframe the question is far more important than knowing upfront what question to ask; data analysis is a journey and not a step in the process.

Photo credit: Mario Klingemann

Friday, March 15, 2013

We Got Hacked, Now What?



Hopefully you really have a good answer for this. Getting hacked is no longer a distant probability; it's a harsh reality. The most recent incident was Evernote losing customer information including email addresses and passwords to a hacker. I'm an Evernote customer and I watched the drama unfold from the perspective of an end user.  I have no visibility into what level of security response planning Evernote had in place but this is what I would encourage all the critical services to have:

Prevent

You are as secured as your weakest link; do anything and everything that you can to prevent such incidents. This includes hardening your systems, educating employees on social engineering, and enforce security policies. Broadly speaking there are two kinds of incidents - hijacking of a specific account(s) and getting unauthorizd access to a large set of data. Both of these could be devastating and they both need to prevented differently. In the case of Evernote they did turn on two-factor authentication but it doesn't solve the problem of data being stolen from their systems. Google has done an outstanding job hardening their security to prevent account hijacking. Explore shared-secret options where partial data loss doesn't lead to compromised accounts.

Mitigate

If you do get hacked, is your system instrumented to respond to such an incident? It includes locking acconts down, taking critical systems offline, assess the extent of damage etc. In the case of Evernote I found out about the breach from Twitter long before Evernote sent me an email asking to change the password. This approach has a major flaw: if someone already had my password (hard to decrypt a salted and hashed value but still) they could have logged in and changed the password and would have had full access to my account. And, this move—logging in and changing the password—wouldn't have raised any alarms on the Evernote side since that's exactly what they would expect users to do. A pretty weak approach. A slightly better way would have been to ask users to reset the password and then follow up with an email verification process before users could access the account.

Manage

If the accounts did get hacked and the hackers did get control over certain accounts and got access to certain sensitive information what would you do? Turns out the companies don't have a good answer or any answer for this. They just wish such things won't happen to them. But, that's no longer true. There have been horror stories on people losing access to their Google accounts. Such accounts are further used for malicious activities such as sending out emails to all contacts asking to wire you money due to you being robbed in . Do you have a multi-disciplinary SWAT team—tech, support, and communication—identified when you end up in such a situation? And, lastly, have you tested your security response? Impact of many catastrophes, natural or otherwise, such as flood earthquakes, and terrorist attacks can be reduced if people were prepared to anticipate and respond. Getting hacked is no different.

Photo courtesy: Daniele Margaroli

Thursday, February 28, 2013

A Data Scientist's View On Skills, Tools, And Attitude



I recently came across this interview (thanks Dharini for the link!) with Nick Chamandy, a statistician a.k.a a data scientist at Google. I would encourage you to read it; it does have some great points. I found the following snippets interesting:

Recruiting data scientists:
When posting job opportunities, we are cognizant that people from different academic fields tend to use different language, and we don’t want to miss out on a great candidate because he or she comes from a non-statistics background and doesn’t search for the right keyword. On my team alone, we have had successful “statisticians” with degrees in statistics, electrical engineering, econometrics, mathematics, computer science, and even physics. All are passionate about data and about tackling challenging inference problems.
I share the same view. The best scientists I have met are not statisticians by academic training. They are domain experts and design thinkers and they all share one common trait: they love data! When asked how they might build a team of data scientists I highly recommend people to look beyond traditional wisdom. You will be in good shape as long as you don't end up in a situation like this :-)

Skills:
The engineers at Google have also developed a truly impressive package for massive parallelization of R computations on hundreds or thousands of machines. I typically use shell or python scripts for chaining together data aggregation and analysis steps into “pipelines.”
Most companies won't have the kind of highly skilled development army that Google has but then not all companies would have Google scale problem to deal with. Though I suggest two things: a) build a very strong community of data scientists using social tools so that they can collaborate on challenges and tools they use b) make sure that the chief data scientist (if you have one) has very high level of management buy-in to make things happen otherwise he/she would be spending all the time in "alignment" meetings as opposed to doing the real work.

Data preparation:
There is a strong belief that without becoming intimate with the raw data structure, and the many considerations involved in filtering, cleaning, and aggregating the data, the statistician can never truly hope to have a complete understanding of the data.
I disagree. I do strongly believe the tools need to involve to do some of these things and the data scientists should not be spending their time to compensate for the inefficiencies of the tools. Becoming intimate with the data—have empathy for the problem—is certainly a necessity but spending time on pulling, fixing, and aggregating data is not the best use of their time.

Attitude:
To me, it is less about what skills one must brush up on, and much more about a willingness to adaptively learn new skills and adjust one’s attitude to be in tune with the statistical nuances and tradeoffs relevant to this New Frontier of statistics.
As I would say bring tools and knowledge but leave bias and expectations aside. The best data scientists are the ones who are passionate about data, can quickly learn a new domain, and are willing to make and fail and fail and make.

Image courtesy: xkcd

Friday, February 15, 2013

Commoditizing Data Science



My ongoing conversations with several people continue to reaffirm my belief that Data Science is still perceived to be a sacred discipline and data scientists are perceived to be highly skilled statisticians who walk around wearing white lab coats. The best data scientists are not the ones who know the most about data but they are the ones who are flexible enough to take on any domain with their curiosity to unearth insights. Apparently this is not well-understood. There are two parts to data science: domain and algorithms or in other words knowledge about the problem and knowledge about how to solve it.

One of the main aspects of Big Data that I get excited about is an opportunity to commoditize this data science—the how—by making it mainstream.

The rise of interest in Big Data platform—disruptive technology and desire to do something interesting about data—opens up opportunities to write some of these known algorithms that are easy to execute without any performance penalty. Run K Means if you want and if you don't like the result run Bayesian linear regression or something else. The access to algorithms should not be limited to the "scientists," instead any one who wants to look at their data to know the unknown should be able to execute those algorithms without any sophisticated training, experience, and skills. You don't have to be a statistician to find a standard deviation of a data set. Do you really have to be a statistician to run a classification algorithm?

Data science should not be a sacred discipline and data scientists shouldn't be voodoos.

There should not be any performance penalty or an upfront hesitation to decide what to do with data. People should be able to iterate as fast as possible to get to the result that they want without worrying about how to set up a "data experiment." Data scientists should be design thinkers.

So, what about traditional data scientists? What will they do?

I expect people that are "scientists" in a traditional sense would elevate themselves in their Maslow's hierarchy by focusing more on advanced aspects of data science and machine learning such as designing tools that would recommend algorithms that might fit the data (we have already witnessed this trend for visualization). There's also significant potential to invent new algorithms based on existing machine learning algorithms that have been into existence for a while. What algorithms to execute when could still be a science to some extent but that's what the data scientists should focus on and not on sampling, preparing, and waiting for hours to analyze their data sets. We finally have Big Data for that.

Image courtesy: scikit-learn

Thursday, January 31, 2013

Empathize Not Sympathize

Many enterprise software vendors sympathize. "We know it's a bad experience" or "We will fix the usability." One of the reasons the software is not usable is because the makers never had any empathy for the end users who would use it. In many cases the makers didn't even know who their end users were; they only knew who would buy the software. As far as enterprise software is concerned people who write checks don't use the software and people who use software don't write checks and have a little or no influence in what gets bought. Though the dynamics are now changing.

Usability is the last step; it's about making software usable for the tasks that it is designed for. It's not useful at all when the software is designed to solve a wrong problem. Perfectly usable software could be completely useless.


It's the job of a product manager, designer, and a developer to assess the end user needs—have empathy for them—and then design software that meets or exceeds their needs in a way that is usable. That way they don't have to sympathize later on.

Design Thinking encourages people to stay in the problem space for a longer duration without jumping to a solution. What problem is being solved—needs—is far more important than how it is solved—usability. Next time you hear someone say software is not usable, ask whether it's the what or how. The how part is relatively easy to fix, what part is not. For fixing the "what" you need to have empathy for your end users and not sympathy.