
SO Apps 4, Coping with the Rapid Rise of Distributed Systems

September 27, 2015

We have been building distributed computing systems since about 1970 [Rotem-Gal-Oz]. Their use has accelerated in the last 15 years, and has accelerated even more in the last 5 years (in part due to the demise of Moore’s Law).  Distributed systems get their work done by distributing the computing work required by an app over a number of separate computers, rather than doing it all on a single computer.  A specific part of the work is intentionally done on one computer, and other specific portions of the work are done on other computers.   Some of the reasons for using distributed computing are to facilitate the reuse of the capabilities of a piece of software by multiple apps and services, and to produce extensibility and/or location transparency in a system.

Back in the “old days” most apps did all their computing work on only one or two computers.  For example in 2008 Word or Excel would typically be run on your own desktop computer.  And back then a typical website would usually involve running a browser on your desktop computer which communicated with a server running on a single remote computer – Just 2 computers.

Nowadays a single app is likely to use a number of different services and servers to get its work done.  An individual service often runs on its own separate computer.  So the computers used by today’s apps can be:  A computer to run the browser or user interface (UI); another computer to run the primary server used by the browser/UI computer; plus other computers as well that the primary server may itself call upon for other specialized services; plus the app running in the browser/UI may also use multiple servers (running on separate computers) for their specialized services in addition to the primary server.  Then add in the cloud (an interconnected system of distributed computers) with its elastic ability to dynamically scale the computers used by apps to handle varying workloads and you’ll find even more computers being used by a distributed system.

Internet-of-Things (IoT) systems are complex distributed computing systems as well.  IoT system development will fuel the accelerating rise of distributed systems for years to come.  Figure 1 below shows how much things have changed.

Figure 1: A conceptual sketch of the increase in complexity of today’s apps and their supporting distributed systems, versus typical apps of 10 to 15 years ago.

And take note of this:  All the different services and computers used by the apps of today are usually completely hidden from the end user, and rightly so.   App users don’t want to be distracted by all these details.  They just want to get their work done via a good user experience.  So the accelerating rise of distributed computing is hidden from app users and public awareness.  Out of sight and out of mind.

Thoughtful consideration of the accelerating rise of distributed systems produces three vital conclusions:

1.      Distributed computing has become a permanent, disruptive part of a “new era” of software and systems rather than being something that only banks and the Department of Defense used when it first started 4 and a half decades ago [Rotem-Gal-Oz].  Today almost every time you use an app it involves at least 2 computers, and typically more.  In spades for IoT systems.  Distributed computing is becoming the rule, rather than the exception as was the case prior to 2008 or so.  The “new era of ubiquitous distributed computing” is a significant paradigm shift that is well underway.  You cannot afford to let this disruptive change go unnoticed.

2.      The accelerating use of distributed computing has largely been off the radar of general awareness and has created a “knowledge gap” in the software industry:

o   We in the software industry currently do not have widespread knowledge of the best practices of designing, developing, planning, and maintaining the software of distributed systems and apps.  Why?  1)  The backlog of software needing to be developed is vast, creating very high demand for software developers that need to be put to work ASAP.  And 2) software technology is changing so fast on so many fronts at once in recent years that keeping up with all the changes is very time consuming.  So new topics fall through the cracks, especially those not at the forefront of awareness and which are not specific products and technologies sold by software and systems vendors.  Thus, a relatively small percentage of software industry participants now possess the knowledge of the best practices of developing distributed systems.  And these best practices are increasingly required for successful development projects as more and more projects involve distributed systems.

3.      Apps and software systems have become much, much more complex since their logic (contained in their software) is now spread over multiple computers that are connected by networks.  And complex systems require more work and time to develop.  The complexity and amount of work to develop distributed apps increases as follows:

o   Apps themselves are both more complex and more numerous than in decades past.  How many apps are on your smart phone?  How many apps were on your desktop computer in 2008?  There is more complexity since we have more apps.  Plus each app typically requires more functionality and usability than in the “old days”, resulting in more complexity.

o   The app software is now distributed, rather than most of it being on one or two computers — It takes substantial extra work to manage, coordinate, secure, deploy, do robust error recovery, debug, and adequately test the app’s software logic that is now spread out over a number of distributed computers.

o   Apps now use far more networks and connectivity — It takes substantial extra work to manage, coordinate, secure, deploy, do robust error recovery, debug, and adequately test the networks and connections used by the interconnected distributed computers running the distributed software.

o   The software, computers, networks and connectivity all must be effectively integrated — It takes significant extra work to integrate, coordinate, secure, deploy, do robust error recovery, debug, and adequately test all of the above so that the software, the computers, the networks and the connections cooperatively interact with each other to behave as if the app were running as a single unit on a single computer.

o   To summarize, in distributed systems the whole is indeed greater than the sum of its many parts, including the whole amount of work it takes to design, plan, develop, test, debug, secure, deploy, operate, and maintain complex distributed systems and apps.

  • Note, most design and planning methods, plus development processes, in use today do not account for the significant increase in complexity and the resultant increase in work caused by distributing an app’s computing work over many computers.  This is a key “missing link” to a much less bumpy transition into the “new era”.

o   And, the increase in complexity and the work required is not a simple linear straight line increase.  Rather it is an upward accelerating curve,  as shown below in Figure 2.  A crude, yet fairly accurate, example of measuring the increase in the “distributed” complexity today, as compared to the “old days”, is to count the number of arrows (aka connections) and the number of “Server X  Distributed Computer” boxes (aka functionality) in each of the diagrams in Figure 1 above.  Then add the arrow count to the box count for each diagram [Sessions], and finally add in 1 to represent the UI computer at the top of each diagram, as follows:

In the “old days” we have:  1 arrow  + 1 box     +  1 UI =   3 total complexity.

Today we have:                     7 arrows + 6 boxes + 1 UI = 14 total complexity.

By this measure the complexity has increased 14/3 = 4.67 times (about 4.7x, or a 367% increase)!  Note that I have held constant the “functionality” measurement of complexity per box above for brevity.
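The count above is easy to reproduce.  Here is a minimal Python sketch of the arithmetic only; the function name `distributed_complexity` and its parameter names are my own illustration rather than anything defined by [Sessions], and the per-box “functionality” is held constant, as in the text.

```python
def distributed_complexity(arrows: int, boxes: int, ui_computers: int = 1) -> int:
    """Crude complexity count: connections (arrows) + server boxes + the UI computer."""
    return arrows + boxes + ui_computers


# Counts taken from the two diagrams in Figure 1.
old_days = distributed_complexity(arrows=1, boxes=1)
today = distributed_complexity(arrows=7, boxes=6)
ratio = today / old_days

print(f"old days: {old_days}, today: {today}, ratio: {ratio:.2f}x")
```

Plugging in the counts from your own system diagram gives a quick, rough feel for how far it has moved from the “old days” baseline.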

This is not to say the work required to produce and maintain the app will always increase this much.  But it does say that the work required will not increase like a straight line.  Rather the work increases in a non-linear curve since it is directly related to the non-linear increase in complexity of the app and its supporting distributed system.  In other words, the work required to produce a distributed system app is a function of the complexity of the connections in the system (arrows), plus the complexity of the required functionality implemented in each part (boxes) of the system.  Therefore, beware of the temptation to estimate the amount of effort required to build a distributed system by extrapolating from the actual effort previously used to build non-distributed systems.  Your estimate will fall far short.

Below is a family of curves showing how a typical measure of interconnection complexity increases.  Note what happens when the number of items (n, the number of services or distributed computers, i.e. boxes) is doubled:  Double the items (a 2x increase) and the complexity can increase by 3x, 4x, or more, even when the number of connections is tightly constrained (e.g. by a 75% reduction in connections).  This is the driving force behind how the amount of work required in a complex distributed systems development project can undergo non-linear expansion behind the scenes over several months, eventually catching everyone by surprise when they find the project is buried in unanticipated work.   This also applies to the features of an app.  Research shows adding 25% more features can increase the complexity by 100% [Sessions], thus non-linearly increasing the work needed to build the app.

Figure 2: Graph of formulas of interconnection complexity, with curves that constantly accelerate upward.
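To get a feel for curves like these, here is a small Python sketch.  It makes two assumptions of my own, since the exact formulas plotted in Figure 2 are not spelled out in this article.  First, it uses one common measure of interconnection complexity: the number of possible pairwise connections among n items, n(n-1)/2.  Second, it treats the “25% more features, 100% more complexity” figure as a power law calibrated to that single data point.  The names `pairwise_connections`, `keep_fraction`, `growth_when_doubled`, and `relative_complexity` are illustrative only.

```python
import math


def pairwise_connections(n: int, keep_fraction: float = 1.0) -> float:
    """Possible connections among n items, n*(n-1)/2, scaled by the
    fraction of those connections the design actually allows."""
    return keep_fraction * n * (n - 1) / 2


def growth_when_doubled(n: int, keep_fraction: float = 1.0) -> float:
    """Factor by which the connection count grows when n is doubled."""
    before = pairwise_connections(n, keep_fraction)
    after = pairwise_connections(2 * n, keep_fraction)
    return after / before


# Assumed power law: complexity = features ** FEATURE_EXPONENT, with the
# exponent chosen so that 1.25x the features gives 2x the complexity.
FEATURE_EXPONENT = math.log(2) / math.log(1.25)


def relative_complexity(feature_ratio: float) -> float:
    """Complexity multiplier for a given feature multiplier (assumed power law)."""
    return feature_ratio ** FEATURE_EXPONENT


for n in (4, 8, 16):
    full = growth_when_doubled(n)
    constrained = growth_when_doubled(n, keep_fraction=0.25)  # 75% fewer connections
    print(f"n={n:>2} -> {2 * n:>2}: full {full:.2f}x, constrained {constrained:.2f}x")

print(f"25% more features -> {relative_complexity(1.25):.2f}x complexity")
```

Under these assumptions, doubling the number of items more than quadruples the connection count.  Cutting connections by a constant 75% lowers the absolute count but leaves the growth factor unchanged, which is why constraining connections alone does not flatten the curve.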

Above, the complexity of software systems and distributed systems (as well as the communication paths in groups of people) increases in an ever accelerating upward curve.  This upward acceleration will happen unless specific complexity reducing software design techniques are used to prevent such accelerating expansion.  Such simplifying techniques are key parts of the best practices for the software architectural design of distributed systems listed below.  Reductions in complexity significantly reduce the amount of work that must be done in a project.  And they also reduce the number of failures and security vulnerabilities [Sessions].  That’s a lot of return for the effort spent reducing complexity!

The Way Forward

Welcome to the “new era of ubiquitous distributed computing” being used by more and more new apps, plus IoT systems.  Happily, despite the obstacles of the non-linear expansion of complexity and work, we have learned how to effectively develop distributed systems.  Below I outline how to adapt to the disruptions caused by the “new era”, rather than being at their mercy.

If you are developing distributed systems you should not assume your development projects will proceed like they did in the past with non-distributed systems. Many of the techniques we are accustomed to routinely using in software projects were developed back in the “old days”, before the explosion of complexity and work in distributed systems.  And many of these are not up to the task required of them today.  Things have changed.  Not mitigating the risk of “no-longer-effective techniques” will almost surely result in troubled distributed system development projects.

Here is what to do to have successful software projects that produce robust distributed apps and systems which deliver the functionality, usability, business agility, data security, and operational efficiency required of software systems today.

o        Adopt a distributed system mindset, a “new era” mindset, recognizing there is an important disruptive paradigm shift underway.  Delay in adopting this mindset will only cause you problems and pain.  Instead, consider this as an opportunity to move ahead of the pack.

o        Accept that the techniques used in the past in project estimating, planning, organization, and sequencing of activities will need modification to work well in developing distributed systems.  Then use new techniques that adequately quantify cost, risk, the extra work due to complexity, the new development activities, their new sequencing, and the interaction of all of these with the overall project cost, schedule, and risk [PDMC].

o        Recognize the key differences between complex distributed apps, versus simpler “old style” apps.  You’ll not have to do things differently when developing the “old style” apps that do not use distributed computing.  Although applying some of the best practices of distributed systems to “old style” apps can add significant value.

o        Use proven Service Oriented software architecture and design techniques, including those that reduce the complexity of a system.  These, combined with the improvements to project planning and organization, can significantly reduce the amount of work required to build a distributed system by reining in the explosion of complexity.  Plus these will keep the level of complexity and amount of work required to extend and maintain the system much, much lower for years throughout the life of the software system, in addition to speeding up the time-to-market of subsequent releases [AMC], [PDMC].

o        Executives, product owners, project managers, architects, developers, and test engineers must explicitly focus beyond the features, the software and its logic, and the data (all were the main focus in the “old days”).  Now they must also fully embrace dealing with complexity, the network, connectivity, plus the substantial integration and testing work (on multiple levels) that is required to make it all play well together.  And please do this long, long before the end part of the project when you are out of time.  This new focus will have an impact on both the amount of work required (more work) and the sequencing of activities in a project.  However it will also produce a noticeable increase in quality, security, and being on schedule and on budget [AMC], [PDMC].

o        Plan on the participants of a software development project having to climb a substantial learning curve. Not only for learning any new technologies involved (the usual focus of new learning), but also plan for learning the new techniques and best practices required to deal with complexity, the distributed software, the network, the connectivity, the integration, security, deployment, automated health monitoring of the app, automated scaling, distributed error recovery, distributed testing, plus new ways of project estimation, planning, and project design.

o        Augment your staff with temporary outside experts in key areas as necessary.  Not only can they immediately add great value to design, planning, organization and implementation, but they can also significantly reduce risk plus mentor your staff to bring them up to speed faster and more thoroughly than some other forms of learning.  An example is to utilize the services of a security expert to design the security for your new app and its supporting distributed systems, and to train your staff in the security design and its implementation [PDMC].

You can learn the details of many of the above items in the IDesign Architects Master Class and Project Design Master Class.  These classes are developed and taught by Juval Lowy, an internationally recognized expert in distributed systems and service oriented design, who has been architecting and planning such systems for decades.

For more information on the details of any of the above items, please contact me by posting a comment on this blog.  For more detailed information on aspects of building distributed systems please see the following, some of which also serve as references:

o        An article on the additional effort it takes to build cloud apps in my December 2014 blog post:  “A Perspective for More Accurately Estimating Cloud App Development Costs”.

o        My February 2012 blog article “Software Structure Can Reduce Costs and Time-to-Market” shows that the post-initial-development cost of a software system can vary by over 400%.  Now you see one key source of that variation – the complexity of a software system.  Another key source is the encapsulation of volatility, but I’ll save that for another article.

o        A must read article looking at the current mindset and architectural practices, and then showing diagrams of decoupled, scalable software architectures that work with both distributed and not-so-distributed systems:  “From the Field: Escaping Appland” by Monty Montgomery, Master Architect at IDesign.

o        [Rotem-Gal-Oz] An excellent in depth article on 8 specific challenges in building distributed systems – The reliability, latency, bandwidth, security, topology, administration, transport, and homogeneity of networks: “Fallacies of Distributed Computing Explained” by Arnon Rotem-Gal-Oz, Service Oriented Architect.

o        [Sessions] An article on system complexity by Roger Sessions, an expert in IT Complexity who has been working in this area since at least 2008:  “Thirteen Laws of Highly Complex IT Systems”.

o        [AMC] IDesign Architects Master Class.

o        [PDMC] IDesign Project Design Master Class.

I hope this article has been helpful to you, despite its length.  In presenting the long winded “whole story” in rather broad brush strokes I have glossed over a number of details.  But I believe it is vital to comprehend the big picture here.  Major technology paradigm changes do not often happen.  And when they do it is often not apparent until way after the fact.  The “new era of ubiquitous distributed computing” has clearly taken shape.  Understanding the big picture and the “way forward” can put you well ahead of the curve as this wave of technology change washes over human societies for decades to come.

Be mindful of the significance of this “new era” in all of human history – In the twenty-teens the human race definitively crossed the threshold into the “era of ubiquitous distributed computing”. Just connect to the internet from anywhere on earth and the awesome capabilities of multiple, powerful distributed information processing engines are available! This is a clear milestone of major technological and social change that will be visible far into the future as humans look back on their past. One cannot even begin to imagine the cascading changes that will result and how they will affect the course of history.

George Stevens


dotnetsilverlightprism blog by George Stevens is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Based on a work at dotnetsilverlightprism.wordpress.com.
