Friday, February 15, 2013

Commoditizing Data Science



My ongoing conversations with several people continue to reaffirm my belief that Data Science is still perceived as a sacred discipline, and data scientists as highly skilled statisticians who walk around in white lab coats. The best data scientists are not the ones who know the most about data, but the ones who are flexible and curious enough to take on any domain and unearth insights. Apparently this is not well understood. There are two parts to data science: domain and algorithms, or in other words, knowledge about the problem and knowledge about how to solve it.

One of the aspects of Big Data that excites me most is the opportunity to commoditize this part of data science, the how, by making it mainstream.

The rising interest in Big Data platforms (disruptive technology combined with a desire to do something interesting with data) opens up opportunities to implement some of these well-known algorithms so that they are easy to execute without any performance penalty. Run k-means if you want, and if you don't like the result, run Bayesian linear regression or something else. Access to algorithms should not be limited to the "scientists"; instead, anyone who wants to look at their data to discover the unknown should be able to execute those algorithms without sophisticated training, experience, or skills. You don't have to be a statistician to find the standard deviation of a data set. Do you really have to be a statistician to run a classification algorithm?
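To make the point concrete, here is a minimal Python sketch that assumes only the standard library: the standard deviation is a one-line call to `statistics.pstdev`, and a naive k-means (Lloyd's algorithm) on 2-D points fits in a couple dozen lines. The `kmeans` function, its parameters, and the sample points are hypothetical illustrations; in practice you would call a library implementation such as scikit-learn's `sklearn.cluster.KMeans` rather than write your own.

```python
import math
import random
import statistics

# One line, no statistics degree required.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(statistics.pstdev(values))  # population standard deviation -> 2.0


def kmeans(points, k, iterations=20, seed=0):
    """Naive k-means (Lloyd's algorithm) on 2-D points; returns (centroids, labels).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


# Usage: two well-separated groups of hypothetical points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The caller only supplies the data and `k`; the iteration details stay inside the function. That kind of packaging is what lets someone without statistical training run an algorithm, look at the result, and try a different one.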

Data science should not be a sacred discipline, and data scientists shouldn't be voodoo practitioners.

There should be no performance penalty and no upfront hesitation in deciding what to do with data. People should be able to iterate as fast as possible to get to the result they want without worrying about how to set up a "data experiment." Data scientists should be design thinkers.

So, what about traditional data scientists? What will they do?

I expect people who are "scientists" in the traditional sense to move up Maslow's hierarchy by focusing on more advanced aspects of data science and machine learning, such as designing tools that recommend algorithms that might fit the data (we have already witnessed this trend in visualization). There is also significant potential to invent new algorithms based on existing machine learning algorithms that have been in existence for a while. Which algorithms to execute, and when, could still be a science to some extent, but that is what data scientists should focus on, not sampling, preparing, and waiting for hours to analyze their data sets. We finally have Big Data for that.

Image courtesy: scikit-learn

Thursday, January 31, 2013

Empathize Not Sympathize

Many enterprise software vendors sympathize: "We know it's a bad experience," or "We will fix the usability." One of the reasons the software is not usable is that its makers never had any empathy for the end users who would use it. In many cases the makers didn't even know who their end users were; they only knew who would buy the software. In enterprise software, the people who write checks don't use the software, and the people who use the software don't write checks and have little or no influence over what gets bought. The dynamics are now changing, though.

Usability is the last step; it's about making software usable for the tasks it is designed for. Usability doesn't help at all when the software is designed to solve the wrong problem. Perfectly usable software could be completely useless.


It's the job of the product manager, the designer, and the developer to assess end users' needs, empathizing with them, and then design software that meets or exceeds those needs in a usable way. That way they don't have to sympathize later on.

Design Thinking encourages people to stay in the problem space longer without jumping to a solution. What problem is being solved (the needs) is far more important than how it is solved (the usability). The next time you hear someone say software is not usable, ask whether it's the what or the how. The how is relatively easy to fix; the what is not. To fix the "what" you need empathy, not sympathy, for your end users.

Wednesday, January 16, 2013

A Journey From SQL to NoSQL to NewSQL


Two years back I wrote that the primary challenge with NoSQL is that it's not SQL. SQL has played a huge role in making relational databases popular for the last forty years or so. Whenever developers designed an application, they put an RDBMS underneath and used SQL from every possible layer. Over time, the RDBMS grew in functions and features such as binary storage, faster access, clusters, and sophisticated access control, and applications reaped these benefits. The traditional RDBMS, however, was a poor fit for cloud-scale applications, which fundamentally required scale at a whole different level. Traditional RDBMSs could not support this scale, and even where they could, they became prohibitively expensive for developers to use. They also became too restrictive due to their strict upfront schema requirements, which are not suitable for modern large-scale consumer web and mobile applications. For these two primary reasons, and many others, we saw the rise of NoSQL. The cloud movement further fueled this growth, and we started to see a variety of NoSQL offerings.

Each NoSQL store is unique in how a programmer accesses it. NoSQL did solve the scalability and flexibility problems of a traditional database, but it introduced a set of new problems, the primary ones being the lack of ubiquitous access and of consistency options for schema-less data stores, especially for OLTP workloads.

This has now led to the NewSQL movement (a term initially coined by Matt Aslett in 2011), whose working definition is: "NewSQL is a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for OLTP workloads while still maintaining the ACID guarantees of a traditional single-node database system." NewSQL's focus appears to be on gaining performance and scalability for OLTP workloads by supporting SQL as well as custom programming models, and on eliminating cumbersome, error-prone management tasks such as manual sharding, without breaking the bank. It's a good first step toward a scalable distributed database that supports SQL. It doesn't say anything, however, about mixed OLTP and OLAP workloads, which are one of the biggest challenges for organizations that want to embrace Big Data.
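To illustrate what "manual sharding" pushes onto application developers, here is a minimal Python sketch that uses in-memory SQLite databases from the standard library's `sqlite3` module as stand-ins for separate database servers. The shard count, the `orders` table, and the helper names `shard_for`, `insert_order`, and `total_revenue` are hypothetical. The point is that the application must route every write by hand and gather results from every shard even for a simple aggregate.

```python
import sqlite3
import zlib

NUM_SHARDS = 3  # hypothetical shard count

# Each in-memory SQLite database stands in for a separate database server.
shards = [sqlite3.connect(":memory:") for _ in range(NUM_SHARDS)]
for conn in shards:
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")


def shard_for(customer: str) -> sqlite3.Connection:
    """Route a row to a shard by hashing its key (crc32 is stable across runs)."""
    return shards[zlib.crc32(customer.encode()) % NUM_SHARDS]


def insert_order(customer: str, amount: float) -> None:
    """The application, not the database, decides where each row lives."""
    conn = shard_for(customer)
    conn.execute("INSERT INTO orders VALUES (?, ?)", (customer, amount))
    conn.commit()


def total_revenue() -> float:
    """A one-line SQL aggregate becomes a scatter-gather in application code."""
    return sum(
        conn.execute("SELECT COALESCE(SUM(amount), 0) FROM orders").fetchone()[0]
        for conn in shards
    )


for customer, amount in [("alice", 30.0), ("bob", 12.5), ("carol", 7.5), ("alice", 10.0)]:
    insert_order(customer, amount)

print(total_revenue())  # 60.0
```

The sketch also hints at the consistency problem: a write touching two customers that hash to different shards would no longer be a single transaction, so the single-node ACID guarantee in the definition quoted above would be lost unless the application coordinated it itself.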

From SQL to NoSQL to NewSQL, one thing that is common: SQL.

Let's not underestimate the power of a simple non-procedural language such as SQL. I believe programmers should focus on the what (non-procedural, such as SQL) and not the how. Exposing the "how" invariably makes a system harder to learn and harder to use. Hadoop is a great example of this phenomenon. Even though Hadoop has seen widespread adoption, it's still limited to silos within organizations. You won't find a large number of applications written exclusively for Hadoop. Developers first have to learn how to structure and organize data in a way that makes sense for Hadoop and then write extensive procedural logic to operate on that dataset. Hive is an effort to simplify many of these steps, but it still hasn't gained the desired popularity. The lesson here for the NewSQL vendors is: don't expose the internals to application developers. Let a few developers who are closer to the database deal with storing and configuring the data, but provide easy, ubiquitous access to application developers. Enterprise software is all about SQL. Embracing, extending, and augmenting SQL is a smart thing to do. I expect all the vendors to converge somewhere. This is how RDBMS and SQL grew. The initial RDBMSs were far from perfect, but SQL always worked, and the RDBMSs eventually got better.
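To make the "what versus how" contrast concrete, here is a minimal Python sketch that computes the same page-view counts twice over a hypothetical log: once declaratively with SQL through the standard library's `sqlite3` module, and once procedurally with hand-written map, shuffle, and reduce functions that loosely mimic the phases of a MapReduce job. The table, the data, and the function names are assumptions for illustration; this is not Hadoop's API.

```python
import sqlite3
from collections import defaultdict

# Hypothetical page-view log: (visitor, page).
page_views = [
    ("ann", "/home"), ("bob", "/home"), ("ann", "/docs"),
    ("cat", "/home"), ("bob", "/docs"),
]

# The "what": one declarative query; the engine decides how to execute it.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE page_views (visitor TEXT, page TEXT)")
db.executemany("INSERT INTO page_views VALUES (?, ?)", page_views)
declarative = dict(
    db.execute("SELECT page, COUNT(*) FROM page_views GROUP BY page ORDER BY page")
)


# The "how": the developer spells out every phase by hand.
def map_phase(records):
    for _visitor, page in records:
        yield page, 1  # emit one (key, value) pair per record


def shuffle_phase(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)  # group all values by key
    return groups


def reduce_phase(groups):
    return {key: sum(values) for key, values in sorted(groups.items())}


procedural = reduce_phase(shuffle_phase(map_phase(page_views)))

print(declarative)  # {'/docs': 2, '/home': 3}
print(procedural)   # {'/docs': 2, '/home': 3}
```

Both versions produce the same answer, but only the procedural one forces the developer to decide how data is keyed, grouped, and combined; the SQL version leaves those decisions to the engine, which is the abstraction the paragraph argues vendors should preserve.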

Distributed databases are just one part of the bigger puzzle. Enterprise software is more about mixing OLAP and OLTP workloads, and this is the biggest challenge. SQL skills and tools are highly prevalent in this ecosystem, and, more importantly, people have a SQL mindset that is much harder to change. The challenge for vendors is to keep this abstraction intact and extend it without exposing the underlying architectural decisions to end users.

The challenge that I threw out a couple of years back was:

"Design a data store that has ubiquitous interface for the application developers and is independent of consistency models, upfront data modeling (schema), and access algorithms. As a developer you start storing, accessing, and manipulating the information treating everything underneath as a service. As a data store provider you would gather upstream application and content metadata to configure, optimize, and localize your data store to provide ubiquitous experience to the developers. As an ecosystem partner you would plug-in your hot-swappable modules into the data stores that are designed to meet the specific data access and optimization needs of the applications."

We are not there yet, but I do see signs of convergence. As a Big Data enthusiast I love this energy. Curt Monash has started his year blogging about NewSQL. I have blogged about a couple of NewSQL vendors, NimbusDB (NuoDB) and GenieDB, in the past, and I have also discussed the challenges of running OLAP workloads in the cloud due to their I/O-intensive nature. I am hoping that NewSQL will be inclusive of OLAP and keep SQL its first priority. The industry is finally on to something, and some of these start-ups have set out to disrupt in a big way.

Photo Courtesy: Liz

Thursday, December 27, 2012

Minimize Regrets And Not Failures



While I ponder 2012 and plan for 2013, I always keep the regret minimization framework in the back of my mind. Of course luck plays a huge part in people's success, but we owe a lot to Jeff Bezos applying this framework to his own decisions: without it we probably wouldn't have seen Amazon.com, and we most certainly would not have seen EC2. No one predicted that Amazon would become a key cloud player. A few years back Twitter didn't exist and Facebook was limited to college kids. I do make plans, but I have stopped predicting, since I would most certainly get it wrong.

"Plans are useless, but planning is indispensable." - Dwight Eisenhower

I use the regret minimization framework not only as a long-term thinking tool but also to make decisions in the short term. It helps me assess, prioritize, and focus on the right opportunities. While long-term thinking is a good thing, I strongly believe in setting short-term goals, meeting them, and, more importantly, cherishing them. If you're not minimizing regret, you're minimizing the fear of failure. I don't fear failures; I celebrate them, since they're learning opportunities. As Bill Cosby put it, "In order to succeed, your desire for success should be greater than your fear of failure."

All the best with your introspection and indispensable planning for 2013. Focus on the journey, the planning, and not the destination, the plan.

Tuesday, December 18, 2012

Objectively Inconsistent




During his recent visit to the 37signals office, Jeff Bezos said, "to be consistently objective, one has to be objectively inconsistent." I find this perspective very refreshing, and it applies to all things and all disciplines in life, well beyond product design. As a product designer you need a series of points of view (POVs) that would be inconsistent when seen together, but each of which is consistently objective at any given time. This is what design thinking, especially prototyping, is all about. It shifts a subjective conversation between people to an objective conversation about a design artifact.

As I have blogged before, I see data scientists as design thinkers. Most data scientists I know suffer from the knowledge curse. I would like them to be consistently objective by going through the journey of analyzing data without any preconceived bias. The knowledge curse makes people commit more mistakes. It also makes them defend their POV instead of looking for new information and having the courage to challenge and change it. I am a big fan of Daniel Kahneman's work. I would argue that prototyping helps deal with what Kahneman describes as "cognitive sophistication."
The problem with this introspective approach is that the driving forces behind biases—the root causes of our irrationality—are largely unconscious, which means they remain invisible to self-analysis and impermeable to intelligence.
This very cognitive sophistication works against people who cannot analyze themselves and be critical of their own POV. Prototyping brings in objectivity and external validation to eliminate this unconsciously driven irrationality. It's fascinating what happens when you put prototypes in the hands of users: they interact with them in unanticipated ways. These discoveries are not possible if you hold on to a single POV and defend it.

Let it go. Let the prototype speak your design—your product POV—and not your unconscious.

Photo courtesy: New Yorker

Friday, November 30, 2012

Enterprise Software Needs Flow And Not Gamification



I don't believe in gamifying enterprise applications. As I have argued before, the primary drivers behind the revenue and valuation of consumer software companies are the number of users, traffic (unique views), and engagement (average time spent + conversion). This is why gamification is critical to consumer applications: it is an effort to increase adoption of an application among users and to maintain stickiness, so that users keep coming back and enjoy using the application. This isn't true for enterprise applications at all. Worse, gamifying enterprise applications is counterproductive: it makes existing tasks more complex and creates an artificial carrot that does not quite work.

A design philosophy that we really need for enterprise applications is flow. I am a big fan of Mihaly Csikszentmihalyi and his book "Flow: The Psychology of Optimal Experience," and I highly recommend reading it. Mihaly describes flow in terms of autotelic experiences: activities that consume us and become intrinsically rewarding. The core intent of gamification is to make applications a pleasure to use. What people really want is enjoyment, not just pleasure. They are different. Enjoyment is about moving forward and accomplishing something. Enjoyment comes from an unusual investment of attention. It comes from tasks that you have a chance to complete, that have clear goals, that provide feedback, and that make you lose your self-consciousness.

All the gamification efforts by innovative new entrants that I see are disproportionately focused on "edge" applications, since it's relatively easy for an entrant to beat an incumbent with an edge application, as opposed to redesigning a core application. But most users I know spend their lives using the core systems. They have no intrinsic or extrinsic motivation to use these systems. Integrate flow into these systems to create intrinsic rewards and autotelic experiences. Application designers have traditionally ignored flow since it's a physical element that is external to an application, but life and social status extend beyond digital life and enterprise applications. You get to be known as that finance guy or that marketing gal who is really awesome at work and helps people solve their problems and get work done. Needless to say, helping people and getting work done are intrinsically rewarding. Help these people with their core activities and make non-core activities as minimal or transparent as possible. If I am hiking, make my drive to the trailhead as easy as possible, but make my hike as rewarding as possible. That should be the design principle for integrating flow into enterprise applications.

Also, focus on perpetual intermediates: design applications to reduce or eliminate the learning curve, but introduce users to advanced features as they make progress, increasing their productivity on repeated tasks. This creates the intrinsic reward of having learned and mastered a system. As people learn new things they become more complex and unique human beings, and, believe it or not, you can influence that through the design of the enterprise software they spend their lives using.

Photo Courtesy: Mark Chadwick

Tuesday, November 20, 2012

5 Tips On How To Network Effectively At Conferences


I go to a lot of conferences and quite a few people, including the ones that I mentor, have asked me how they can effectively network at a conference. Here are five simple but effective tips. Start practicing them at local meetups and refine them for large conferences.

Connect before the conference: 

Your networking efforts should start as soon as you decide to go to a conference, or even before. Go through the speaker list and search Twitter exhaustively to find and follow these folks, either directly or via a list. Interact with these speakers on Twitter by asking them meaningful questions. Also ask them if you can have 5 minutes of their time at the conference. Check LinkedIn and Plancast to identify who is going to be at the conference. Ask the organizer to send you a list of attendees; some organizers will happily oblige. If any of these folks sound interesting, follow them on Twitter and reach out to them with a request to meet at the conference. Be specific about why you want to meet them. Do your homework to get up to speed on the topics you're interested in hearing more about at the conference. Use the conference sessions to enrich yourself, not to learn the basics.

Be smart with your time:

Design your agenda upfront and put the sessions that you want to attend on your calendar. Spend your time wisely by not going to too many sessions. At the extreme, for certain conferences, I would suggest not going to any sessions at all. Differentiate between content and inspirational sessions: ask yourself why you are there. Once you sit down, you're in zombie mode, receiving content. Some speakers and panelists are good and some are not. Don't hesitate to leave or join a session in the middle. I closely monitor my Twitter stream in real time for the conference hashtag. If I see tweets from people praising other sessions, I walk out and go there. If you want to ask questions, the worst time to approach a speaker is right before or right after the session, when you're competing with everyone else for his or her attention. Find (don't stalk) the person later during the conference and follow up with your questions. I have sent emails to speakers after their sessions and have received great responses.

Don't waste your time watching vendor pitches in the exhibit area or talking to a marketing guy or gal for the purpose of gathering information. Research the vendors' products ahead of time and have a list of exhibits that you want to visit. Write down what you want to know and who you want to meet. Go to the booth and ask those specific questions or ask to see a specific person. Even better, set up appointments ahead of time. If they can't answer your question, or if you don't get to see the person you wanted to see, leave your business card and ask them to reach out to you. Don't become a victim of meaningless marketing and sales pitches. Your time at a conference is far more valuable than that.

Don't miss coffee breaks and cocktail receptions:

Meet any and all people you can. Have meaningful conversations. Offer to help them and ask for help. Experts don't become experts merely based on what they think; they extensively collect information from other people and synthesize it to form a point of view. Ask yourself how you might be able to help them so that they can help you. Use your smartphone to send them a LinkedIn invitation while you are at the conference, and take some notes on the conversation you had. I typically use the back of the business cards I receive to take notes. Use Highlight to instrument and take advantage of serendipity.

Do not run out of business cards:

I have come across people at conferences telling me they don't have their business cards with them. If they are not lying, it's just ridiculous. You should never run out of business cards at a conference, ever. Keep them in your bag and keep them in your coat pocket. I even have a designated coat pocket for my business cards so that I don't have to shuffle things to look for one. I use another designated pocket to collect the business cards I receive, and I keep a pen in my coat to take notes on them. I keep two sets of business cards: one with my cell phone number and the other with my landline number. I never use my landline to take incoming calls, only voicemail. If you want the person to call you, give them the card with your mobile number. If not, give them the other card. Instead of a landline number you can also use a Google Voice number. Print a small QR code on your business card that directs people to your web presence, which could be your LinkedIn page, your about.me page, or your blog. Make it easy for people to find you and learn more about you. Needless to say, you should have a fairly detailed profile on the internet before you decide to go to a conference. If your company doesn't allow you to print your personal social media details on your company business cards, keep two sets of cards: business and personal.

Follow-up after the conference: 

Not following up is the biggest mistake I see people make. Once you're back from a conference, you have accomplished only 50% of your task. Follow up with all the people you met. Send them emails with enough relevant information to jog their memory. Influential people meet a lot of people during a conference, so don't just say hi; go back to your notes and refer to a very specific conversation you had with them. Ask them if it would be okay to follow up with them. Typically no one says no, but you should ask. This gives you the right to send them a second email. Do NOT call them even if you have their phone number. That's what salespeople do. Some people scan the business cards they receive using cloud-based services such as Cloud Contacts. If it works for you, do it. I don't.

Read the analysis and coverage of the conference by thought leaders and bloggers (i.e., do read what I write :-)). Compare and contrast your views. Comment on their blogs and tweets and keep interacting with them. Even better, create a Storify of what you liked the most. Use Delicious to tag all the research material that you went through. Share your Delicious tags on Twitter and let people add to them.