
A Perspective for More Accurately Estimating Cloud App Development Costs

December 15, 2014

My previous blog article, “Build Cloud Apps that Deliver Superior Business Value” (March 16, 2014), lists 4 references that were quite helpful in shifting my perspective toward understanding what it takes to build scalable, failsafe “cloud-native apps”. A “cloud-native application” is an app whose architecture and design have been guided by software engineering practices used repeatedly in highly successful cloud apps [Wilder, p ix]. The body of knowledge required to build cloud-native apps is generally described in the 4 sources listed in that article.

Upon reading these 4 sources, plus others I have read since, it becomes clear that it takes significantly more effort to develop failsafe, robust cloud-native apps when compared to the effort typically involved in developing a functionally similar normal app hosted in a data center. Beyond failsafe, it takes even more development effort to make the cloud-native app highly scalable.

To aid in more accurate estimation of the development effort for cloud-native apps, this article explains the root causes of why that effort is significantly larger. It also identifies specific areas that require the extra effort.

As developers coming from developing normal apps that run in data centers, we have all developed a set of expectations that guide our ideas of what it takes to develop apps. In early 2014 as I began coming up to speed on developing apps for Azure I found my “normal data center” expectations being violated time after time due to the following underlying characteristics of the cloud: Multi-tenancy, Commodity Hardware, and Programmatic Error Compensation versus Transactions to Deal with Errors. Thanks to Bill Wilder’s Cloud Architecture Patterns for providing some of these categories. Below is a sketch of how each category impacts the effort required to develop robust cloud-native apps.

Multi-tenancy


We often think of multi-tenancy as in a SaaS app, where multiple organizations share the capabilities of the app such that a number of users in each organization can use the SaaS app concurrently without “getting in each other’s way”. The same multi-tenancy concept is used for many of the basic services offered by cloud platforms as well.

Do you think that PaaS load balancer your cloud app is using belongs only to you? No. Likely it is a multi-tenant load balancing cloud service, shared by other cloud apps as well. The same applies to many other PaaS features, like worker roles, web roles, data storage, identity management, etc. [Wilder, pp 77 – 79]. It may well apply to IaaS features as well.

So, when an individual multi-tenant resource becomes overloaded and slows down, or becomes temporarily unavailable due to the cloud “fabric” itself shifting some of the overloaded resource’s users to another less loaded resource behind the scenes, what do you need to do to ensure your cloud app remains robust and responsive?

Design, write, and test code in your app to deal with this situation. Some of the patterns listed in the previous blog article’s 4 references that deal with this are: Auto-Scaling, Busy Signal, Throttling, Retry, and Circuit Breaker for starters.
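
To give a feel for the kind of “extra code” involved, here is a minimal sketch of the Retry pattern with exponential backoff, written in Python purely for illustration. The names TransientError and call_with_retry are hypothetical and do not come from the references or from any Azure library; a real app would map the cloud service’s own busy-signal or throttling errors onto something like them.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a failure worth retrying (e.g. a busy signal)."""


def call_with_retry(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on TransientError wait and try again, up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what to do
            # Exponential backoff (0.5s, 1s, 2s, ...) plus a little random
            # jitter so that many clients do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 10))
```

Even this small helper raises estimation questions: which errors count as transient, how many attempts are acceptable, and how the retries get tested. The sleep parameter is there so a test can substitute a fake and avoid real waiting.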

Commodity Hardware

Our data centers and much of their hardware have been designed to maximize the Mean Time Between Failures (MTBF). We don’t want our hardware crashing, so we use failover hardware designs and high-end (and expensive) hardware like RAID disk drives. As a result, we have deep-seated expectations that hardware failures are rather rare, and our software designs and development techniques reflect this expectation.

One main reason cloud computing is often more cost efficient than data center computing is that clouds rely upon commodity hardware that has a high value-to-cost ratio. But commodity hardware is also more likely to fail. Therefore, cloud computing focuses on minimizing the Mean Time to Recovery (MTTR) rather than maximizing the MTBF [Wilder, pp 79 – 82]. This is a completely different dynamic from what developers are used to in data center apps.
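
A rough way to see why a focus on MTTR can pay off is the standard steady-state availability formula, MTBF / (MTBF + MTTR). The Python sketch below applies it to two made-up configurations. The numbers are hypothetical, chosen only to illustrate the trade-off, and do not come from Wilder or from any cloud vendor.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: the fraction of time a component is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical numbers, for illustration only.
premium = availability(mtbf_hours=10_000, mttr_hours=10)    # rarely fails, slow repair
commodity = availability(mtbf_hours=1_000, mttr_hours=0.5)  # fails 10x as often, fast recovery

print(f"premium:   {premium:.5f}")
print(f"commodity: {commodity:.5f}")
```

With these assumed figures the commodity setup comes out slightly ahead even though it fails ten times as often, because each failure is recovered from far more quickly. That fast recovery is exactly what the app’s own code has to support.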

So what do you need to do to make your cloud app robust (not crash) and responsive in the face of significantly more frequent hardware failures?

Design, write, and test code in your app to deal with this situation. Some of the patterns listed in the previous blog article’s 4 references that deal with this are: Node Failure Pattern, Busy Signal, Retry, Circuit Breaker, and perhaps Health Endpoint Monitoring for starters.
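
As one illustration, a Circuit Breaker can be sketched in a few dozen lines of Python. This is a simplified, single-threaded sketch under my own assumptions, not the implementation from any of the references. The names CircuitBreaker and CircuitOpenError, and the default threshold and timeout values, are all hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without being tried."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast; dependency presumed down")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A production version would also need thread safety, logging, and a decision about which exceptions should count as failures, all of which add to the estimate.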

Programmatic Error Compensation versus Transactions to Deal with Errors

Most of us routinely use transactions as a way of compensating for errors when developing traditional apps. An exception occurs. No problem, just design the software to roll back the transaction that wrapped the operation in progress and all the changes made thus far are gone. Easy!

In the cloud one cannot always rely upon transactions as the common means for compensating for errors. Why? Part of the reason is that some cloud resources do not support transactions! Nada. You will need to closely examine this when selecting the kinds of resources you plan to use. Does that data storage support transactions? Maybe not! How about that queue you want to use? Some do not support transactions, while others support them only in certain limited configurations.

Another part of the reason for not using transactions is the Eventual Consistency that is common in the cloud. How can one do a transaction on an operation that involves an eventually consistent piece of data? Hmmm… For more on Eventual Consistency and Data Consistency please read Cloud Architecture Patterns or Cloud Design Patterns: Prescriptive Architecture Guidance for Cloud Applications.

Without transactions to do error compensation automatically, how will you compensate for errors?

Design, write, and test code in your app to deal with this situation. Some of the patterns listed in the previous blog article’s 4 references that deal with this are: the Compensating Transaction pattern and the Scheduler Agent Supervisor pattern, plus the primers concerning consistency. These 2 articles are also key: “Failsafe: Guidance for Resilient Cloud Architectures” and “Best Practices for the Design of Large-Scale Services on Windows Azure Cloud Services”.
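
The core idea of a compensating transaction can be sketched as follows, again in Python for illustration only. The function name run_with_compensation and the step structure are my own assumptions, not an API from the references: each step pairs an action with the code that undoes it, and on failure the completed steps are undone in reverse order.

```python
def run_with_compensation(steps):
    """Run (action, compensate) pairs in order; on failure undo completed steps.

    Each action and each compensate is a zero-argument callable.
    """
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        # No rollback is available, so undo in reverse order with explicit code.
        for compensate in reversed(completed):
            compensate()
        raise
```

For a hypothetical order workflow, the steps might be (reserve stock, release stock), (charge card, refund card), and (enqueue shipment, cancel shipment). Note what the sketch leaves out: the compensating steps can themselves fail, so in practice they need to be idempotent and retryable, and the workflow’s progress usually has to be recorded durably. That is more code to design, write, and test.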

This is only a partial list of areas that require more code to be developed in a cloud-native app. Other areas are Scalability (including the awesome capability to auto-scale), Health Monitoring, Instrumentation and Telemetry, and Service Metering. Perhaps throw in deployment as well.

Bill Wilder summarizes the situation by saying “Architecting to deal with failure is part of what distinguishes a cloud-native application from a traditional application” [Wilder, p 82].

Above we have seen that cloud-native apps need a substantial amount of code to be designed, written and tested that is not required in a normal app running in a data center. Much of this “extra code” has absolutely nothing to do with providing the basic functionality of the app itself. Rather, the “extra code” is required to make the app highly usable in the cloud. In other words, it is code to ensure the app meets its supplemental requirements. The effort to develop this “extra code” needs to be estimated in addition to the effort to develop the code that produces the app’s functionality.

Please note that many of the areas requiring more code, and many of the patterns mentioned above, are amenable to code reuse. Much of this “extra code” is infrastructure code that can be reused in the form of libraries. That’s good news for organizations developing a steady stream of cloud-native apps, so that the full amount of “extra effort” may not have to be expended for each cloud-native app developed.

The following links will provide the reader with code examples of some of the above-mentioned solutions to robustness and scalability challenges in the cloud. Also consult Cloud Design Patterns for code examples.

The following links are all more or less complete apps with downloadable code samples that show how they have dealt with many of the above areas, including the use of some of the above-mentioned patterns. Taking a look at the code samples in the books, plus reading the actual source code, is invaluable for getting an idea of the kind of complexity (and potential effort) that is involved in developing cloud-native apps. And all of the books are available online for free!

  1. Building Hybrid Applications in the Cloud on Microsoft Azure, circa 2012, by Microsoft Patterns and Practices.
  2. Developing Multi-tenant Applications for the Cloud, 3rd Edition, circa 2012, by Microsoft Patterns and Practices.
  3. Exploring CQRS and Event Sourcing, circa 2012, by Microsoft Patterns and Practices.
  4. Cloud Service Fundamentals in Windows Azure, circa 2013. A body of sample code from MSDN that contains “fundamental building blocks for scale-out Azure apps”. Built based on “real world customer learnings of the Windows Azure Customer Advisory Team (CAT)”.


  1. Cloud Architecture Patterns by Bill Wilder, Copyright 2012 by Bill Wilder, O’Reilly Media, Sebastopol, CA

I hope you find this article and its references as useful as I have.

George Stevens

dotnetsilverlightprism blog by George Stevens is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Based on a work at

