Wednesday, January 16, 2013

A Journey From SQL to NoSQL to NewSQL


Two years back I wrote that the primary challenge with NoSQL is that it's not SQL. SQL has played a huge role in making relational databases popular for the last forty years or so. Whenever developers wanted to design an application they put an RDBMS underneath and used SQL from all possible layers. Over time, the RDBMS grew in functions and features such as binary storage, faster access, clusters, and sophisticated access control, and the applications reaped these benefits. But the traditional RDBMS became a poor fit for cloud-scale applications that fundamentally required scale at a whole different level. Traditional RDBMS could not support this scale, and even where they could, they became prohibitively expensive for developers to use. Traditional RDBMS also became too restrictive because of their strict upfront schema requirements, which are not suitable for modern large-scale consumer web and mobile applications. For these two primary reasons, and many others, we saw the rise of NoSQL. The cloud movement further fueled this growth and we started to see a variety of NoSQL offerings.

Each NoSQL store is unique in how a programmer accesses it. NoSQL did solve the scalability and flexibility problems of a traditional database, but it introduced a set of new problems, the primary ones being the lack of ubiquitous access and of consistency options for schema-less data stores, especially for OLTP workloads.

This has now led to the NewSQL movement (a term initially coined by Matt Aslett in 2011), whose working definition is: "NewSQL is a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for OLTP workloads while still maintaining the ACID guarantees of a traditional single-node database system." NewSQL's focus appears to be on gaining performance and scalability for OLTP workloads by supporting SQL as well as custom programming models, and on eliminating cumbersome, error-prone management tasks such as manual sharding, all without breaking the bank. It's a good first step in the direction of a scalable distributed database that supports SQL. But the definition doesn't say anything about mixed OLTP and OLAP workloads, which are one of the biggest challenges for organizations that want to embrace Big Data.

From SQL to NoSQL to NewSQL, one thing is common: SQL.

Let's not underestimate the power of a simple non-procedural language such as SQL. I believe programmers should focus on the "what" (non-procedural, as in SQL) and not the "how". Exposing the "how" invariably ends up making the system harder to learn and harder to use. Hadoop is a great example of this phenomenon. Even though Hadoop has seen widespread adoption, it's still limited to silos in organizations. You won't find a large number of applications that are written exclusively for Hadoop. Developers first have to learn how to structure and organize data in a way that makes sense for Hadoop and then write extensive procedural logic to operate on that dataset. Hive is an effort to simplify a lot of these steps, but it still hasn't gained the desired popularity. The lesson here for the NewSQL vendors is: don't expose the internals to the application developers. Let a few developers who are closer to the database deal with storing and configuring the data, but provide easy, ubiquitous access to the application developers. Enterprise software is all about SQL. Embracing, extending, and augmenting SQL is a smart thing to do. I expect all the vendors to converge somewhere. This is how RDBMS and SQL grew. The initial RDBMS were far from perfect, but SQL always worked and the RDBMS eventually got better.
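To make the "what" versus "how" distinction concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The orders table, its rows, and the function names are made up for illustration; they are not taken from any product mentioned in this post. The first function states what result is wanted and leaves the access strategy to the database, while the second spells out how to compute the same thing step by step.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a small, hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [
            ("alice", 30.0),
            ("bob", 15.0),
            ("alice", 20.0),
            ("bob", 5.0),
            ("carol", 40.0),
        ],
    )
    return conn


def totals_declarative(conn):
    """Say WHAT we want: total amount per customer. The engine picks the plan."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    return dict(rows)


def totals_procedural(conn):
    """Say HOW to get it: scan every row and accumulate totals by hand."""
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


if __name__ == "__main__":
    conn = build_db()
    print(totals_declarative(conn))
    print(totals_procedural(conn))
```

Both functions return the same totals, but only the declarative one survives unchanged if the data store later adds an index, partitions the table, or moves the data across nodes. That is the abstraction I would like NewSQL vendors to preserve.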

Distributed databases are just one part of the bigger puzzle. Enterprise software is more about mixing OLAP and OLTP workloads. This is the biggest challenge. SQL skills and tools are highly prevalent in this ecosystem and, more importantly, people have a SQL mindset that is much harder to change. The challenge for vendors is to keep this abstraction intact and extend it without exposing the underlying architectural decisions to the end users.

The challenge that I threw out a couple of years back was:

"Design a data store that has ubiquitous interface for the application developers and is independent of consistency models, upfront data modeling (schema), and access algorithms. As a developer you start storing, accessing, and manipulating the information treating everything underneath as a service. As a data store provider you would gather upstream application and content metadata to configure, optimize, and localize your data store to provide ubiquitous experience to the developers. As an ecosystem partner you would plug-in your hot-swappable modules into the data stores that are designed to meet the specific data access and optimization needs of the applications."

We are not there yet, but I do see signs of convergence. As a Big Data enthusiast I love this energy. Curt Monash has started his year blogging about NewSQL. I have blogged about a couple of NewSQL vendors, NimbusDB (NuoDB) and GenieDB, in the past, and I have also discussed the challenges with OLAP workloads in the cloud due to their I/O-intensive nature. I am hoping that NewSQL will be inclusive of OLAP and keep SQL as its first priority. The industry is finally on to something, and some of these start-ups have set out to disrupt in a big way.

Photo Courtesy: Liz

8 comments:

Mazhar Iqbal Rana said...

How is this referring to cloud storage or the best cloud storage service? Isn't it coming under the development umbrella?

Just have a look at http://www.cloudstoragebest.com/. This would tell you what the cloud actually is instead of posting irrelevant stuff on cloud bloggers.

john said...

great article...
about cloud computing you can also follow here...
http://webtech.frontplayers.com/what-is-cloud-computing/

marksmith.bvs said...

Cloud computing provides a reliable hosting platform. Cloud storage especially provides huge space.

Ulf Wendel said...

I agree that the key is in using a declarative language: say what data you want, don't say how it shall be accessed. That lesson seems to spread more and more in NoSQL land. For example, see Google F1. Unfortunately standard ISO SQL is no good match for schema-less, loosely typed data stores. SQL:1999/SQL:2003 have eNF extensions for handling nested data but it gets tricky when talking about types. And, the query statements using standard-compliant SQL don't look "nice" to a novice.

Martin Farach-Colton said...

Great post. I couldn't agree more about OLTP vs OLAP. I have my own indexing-centric take on how the great divide arose between these two approaches, and what needs to happen to fix the situation. Check out an interview I did on ODBMS Industry Watch (http://www.odbms.org/blog/2012/10/scaling-mysql-and-mariadb-to-tbs-interview-with-martin-farach-colton/).

Cloudways Hosting said...

Cloud computing is about maintaining applications and data through central remote servers and the internet. With the help of this technology, consumers can use various applications and can access their personal folders from any system that has internet access. The main idea behind this technology is to make the entire process of computing centralized and efficient.

cloud ways said...

There are numerous hosting solutions available in the market, but few are acquiescent for a development environment in comparison to Drupal hosting. Drupal hosting had humble origins and started out as a message board, but due to its ease of use and user-friendly features it garnered the attention of the developer community.

Puneet Srivastava said...

Nice post ... Agreed with all the facts except a few ...
Ati-Erp Inc.