1) Failing to design for rollback
"...you can only make one tweak to your current process, make it so that you can always roll back any code changes..."
This is a universal truth for any design decision for a platform irrespective of the delivery model, SaaS or on-premise. eBay makes it a good case study to understand the code change management process called "trains" that can track down code in a production system for a specific defect and can roll back only those changes. A philosophical mantra for the architects and developers would be not to make any decisions that are irreversible. Framing it positively prototype as fast as you can, fail early and often, and don't go for a big bang design that you cannot reverse. Eventually the cumulative efforts would lead you to a sound and sustainable design.
2) Confusing product release with product success
"...Do you have “release” parties? Don’t — you are sending your team the wrong message. A release has little to do with creating shareholder value..."
I would not go to the extreme of celebrating only customer success and not release milestones. Product development folks do work hard towards a release and a celebration is a sense of accomplishment and a motivational factor that has indirect shareholder value. I would instead suggest a cross-functional celebration. Invite the sales and marketing people to the release party. This helps create empathy for the people in the field that developers and architects never or rarely meet and this could also be an opportunity for the people in the field to mingle, discuss, and channel customer's perspective. Similarly include non-field people while celebrating field success. This helps developers, architects, and product managers understand their impact on the business and an opportunity to get to know who actually bought and started using their products.
5) Scaling through third parties
"....If you’re a hyper-growth SaaS site, you don’t want to be locked into a vendor for your future business viability..."
I would argue otherwise. A SaaS vendor or any other platform vendor should really focus on their core competencies and rely on third parties for everything that is non-core.
"Define how your platform scales through your efforts, not through the systems that a third-party vendor provides."
This is partially true. SaaS vendors do want to use Linux, Apache, or JBoss and still be able to describe the scalability of a platform in the context of these external components (that are open source in this case). The partial truth is you still can use the right components the wrong way and not scale. My recommendation to a platform vendor would be to be open and tell their customers why and how they are using the third party components and how it helps them (the vendor) to focus on their core and hence helps customers get the best out of their platform. A platform vendor should share the best practices and gather feedback from customers and peers to improve their own processes and platform and pass it on to third parties to improve their components.
6) Relying on QA to find your mistakes:
"QA is a risk mitigation function and it should be treated as such"
The QA function has always been underrated and misunderstood. QA's role extends way beyond risk mitigation. You can only fix defects that you can find and yes I agree that mathematically it is impossible to find all the defects. That's exactly why we need QA people. The smart and well-trained QA people think differently and find defects that developers would have never imagined. The QA people don't have any code affinity and selection bias and hence they can test for all kinds of conditions that otherwise would have been missed out. Though I do agree that the developers should put themselves in the shoes of the QA people and make sure that they rigorously test their code, run automated unit tests, and code coverage tools and not just rely on QA people to find defects.8) Not taking into account the multiplicative effect of failure:
"Eliminate synchronous calls wherever possible and create fault-isolative architectures to help you identify problems quickly."
No synchronous calls and swimlane architecture are great concepts but a vendor should really focus on automated recovery and self-healing and not just failure detection. A failure detection could help vendor isolate a problem and help mitigate the overall impact of that failure on the system but for a competitive SaaS vendor that's not good enough. Lowering MTBF is certainly important but lowering MDT (Mean down time) is even more important. A vendor should design a platform based on some of the autonomic computing fundamentals.
10) Not having a business continuity/disaster recovery plan:
"Even worse is not having a disaster recovery plan, which outlines how you will restore your site in the event a disaster shuts down a critical piece of your infrastructure, such as your collocation facility or connectivity provider."
Having a disaster plan is like posting a sign by an elevator instructing people not to use it when there is a fire. Any disaster recovery plan is, well, just a plan unless it is regularly tested, evaluated, and refined. Fire drills and post-drill debriefs are a must-have.
I will describe some of the design and architectural must have characteristics of a SaaS platform in the part 2 of this post.