
Tips on Time Series Data

Over the years as a software engineer I have repeatedly come into contact with time series data.  Some of the useful “time series data” lessons I’ve learned have become increasingly relevant in our current age of IoT, the Cloud, and Big Data.  This blog article sketches a few of them.  To start, please review Wikipedia’s definition of time series data.

Race Conditions and Queues Can Disorder Time Series Data

As time series data flows through a software system, like an IoT system, it typically flows through a number of components, each doing a step of processing the data in what can be viewed as a data pipeline or data stream.  Downstream components often assume that incoming data is in proper time series sequence (temporal order), which means that all upstream components must absolutely keep that data in proper time series order.  When this assumption is violated, bad things can happen.

For example moving averages are typically used to smooth time series data, removing the spiky, short term fluctuations so that the underlying trends in the data can be more clearly seen, measured and acted upon.  Many times the alerts and alarms in IoT systems are based on moving averages so as to avoid spurious trigger conditions often present in the more rapidly fluctuating raw time series data.

If time series data becomes disordered, then moving averages, plus any statistics based upon them (like the rate of change of the moving average), can become meaningless, depending on the magnitude and duration of the temporal disorder.  And, critically, any alerts and alarms based on the moving average of the disordered time series also become meaningless.  In this situation a system has lost its capability to notify its operators of dangerous conditions.  This can result in serious trouble.
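To make this concrete, here is a minimal sketch of how disorder can hide an alarm condition. The temperature readings, the 3-sample window, and the alarm threshold are all made-up values chosen for illustration; they do not come from any real system.

```python
from collections import deque


def moving_average(values, window):
    """Yield the simple moving average once `window` samples have arrived."""
    recent = deque(maxlen=window)
    for value in values:
        recent.append(value)
        if len(recent) == window:
            yield sum(recent) / window


# Hypothetical temperature readings that rise steadily, in true time order.
ordered = [70, 72, 74, 76, 78, 80, 82, 84]
# The same readings after a queue or race condition has shuffled them.
disordered = [70, 84, 72, 82, 74, 80, 76, 78]

ALARM_THRESHOLD = 80  # hypothetical alarm level for the 3-sample average

ordered_avgs = list(moving_average(ordered, 3))
disordered_avgs = list(moving_average(disordered, 3))

# The alarm fires if any moving average exceeds the threshold.
ordered_alarm = any(avg > ALARM_THRESHOLD for avg in ordered_avgs)
disordered_alarm = any(avg > ALARM_THRESHOLD for avg in disordered_avgs)

print("ordered alarm:", ordered_alarm)
print("disordered alarm:", disordered_alarm)
```

With these numbers the ordered stream ends with an average above the threshold and raises the alarm, while the disordered stream contains exactly the same readings yet never does.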

Disordering of time series data can be prevented by the following means:

  • Calculate time series dependent measures, like moving averages, as close to the data source as possible. This reduces the number of components that process or hold the data as it flows through the pipeline to its final destination, and so reduces the opportunities for disordering it.
  • Don’t use Azure Service Bus Queues to contain time series data unless message sessions are enabled, since without sessions they are not guaranteed to be first-in-first-out queues. Generally an Azure Event Hub is a better choice to contain time ordered data, keeping in mind that it preserves order within a partition rather than across partitions.  If you must use Service Bus Queues without sessions, you must reorder the data downstream of the queue, or take other measures to ensure the data stays in proper time series order.
  • A potential race condition exists when using Azure Functions to remove time series data from an IoT Hub, Event Hub, or blob, or to do any kind of processing of a data stream that must remain time ordered. Why?  Azure Functions can scale out, with multiple instances of the same function executing in parallel, and there is no guarantee of the order in which those instances complete.  Thus, a simple transient fault (very common in the cloud) could easily result in a burst of time series data arriving at an Azure Function, which then runs instances in parallel to handle the load.  Each instance completes in its own time, rather than in a way that preserves the time series order of the data stream.  Many times the temporal ordering downstream of the Azure Functions will be OK, but it cannot be guaranteed under all conditions.  If you must use Azure Functions like this, then you need to reorder the data downstream of the Azure Functions output.
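One way to do the downstream reordering mentioned above is a small buffer that holds readings for a bounded time and releases them in timestamp order. The sketch below is only an illustration under stated assumptions: the `ReorderBuffer` class, its `max_delay` parameter, and the integer timestamps are hypothetical names and values, not part of any Azure API, and it assumes each reading carries a timestamp assigned at the source.

```python
import heapq
from itertools import count


class ReorderBuffer:
    """Re-emit (timestamp, value) readings in timestamp order.

    A reading is released once it is at least `max_delay` older than the
    newest timestamp seen. A reading older than one already released can
    no longer be placed in order, so it is reported as late instead.
    """

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self._heap = []            # min-heap ordered by timestamp
        self._seq = count()        # tie-breaker so values are never compared
        self._newest = None        # newest timestamp seen so far
        self._last_emitted = None  # timestamp of the last reading released

    def push(self, timestamp, value):
        """Accept one reading; return (ready, late) lists of (timestamp, value)."""
        if self._last_emitted is not None and timestamp < self._last_emitted:
            return [], [(timestamp, value)]
        heapq.heappush(self._heap, (timestamp, next(self._seq), value))
        if self._newest is None or timestamp > self._newest:
            self._newest = timestamp
        watermark = self._newest - self.max_delay
        ready = []
        while self._heap and self._heap[0][0] <= watermark:
            ready.append(self._pop())
        return ready, []

    def flush(self):
        """Release everything still buffered, e.g. at end of stream."""
        ready = []
        while self._heap:
            ready.append(self._pop())
        return ready

    def _pop(self):
        timestamp, _, value = heapq.heappop(self._heap)
        self._last_emitted = timestamp
        return (timestamp, value)


# Hypothetical arrivals: the reading stamped 3 arrives after the one stamped 4.
arrivals = [(1, 10.0), (2, 11.0), (4, 13.0), (3, 12.0), (5, 14.0), (6, 15.0)]
buffer = ReorderBuffer(max_delay=2)
ordered = []
for ts, val in arrivals:
    ready, late = buffer.push(ts, val)
    ordered.extend(ready)

# A reading stamped 2 arriving now is older than what was already released.
too_late = buffer.push(2, 99.0)[1]
ordered.extend(buffer.flush())
print(ordered)
print(too_late)
```

The trade-off is latency: a larger `max_delay` tolerates more disorder but delays every reading by that much. The buffer itself must also run in a single consumer, otherwise it reintroduces the parallelism problem it is meant to fix.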

The Display of Missing Data

What happens when an IoT device stops transmitting its operational telemetry data for a period of time?  And how can this scenario be presented to a user in a helpful way?  This is especially important in a mission critical measure.  An example is an equipment high temperature condition that will endanger the equipment or people’s safety.

There are 3 key concepts in this area to be aware of:

  • The distinction between Operational Data and Health Monitoring Data:
    • Operational data is the main data of interest emitted from an IoT device or a sensor (like temperature, pressure, etc.). Operational data is closely related to why that device or sensor is there.
    • Health monitoring data is about the health state of a device or sensor, including its ability to transmit and/or process data. For example health data can include “IsAlive” information that the device is responsive, exceptions encountered by the device or sensor when doing its job, etc.  Health monitoring data can also be about the health of other components in a data pipeline downstream of devices and sensors.
  • LKV – The Last Known Value of a data stream. This will either be a valid measurement or “MISSING DATA”. It applies to both operational and health monitoring data.
  • LGV – The Last Good Value of a data stream. This will always be a valid measurement and will never be “MISSING DATA”.  This also applies to both operational and health monitoring data.

In a user interface displaying mission critical operational data, my past experience has shown that such a display is vastly more useful when the following data items are displayed near each other on a dashboard or control panel.  Note this only applies to key mission critical data items since the display takes up a lot of space.

  • The LKV of the mission critical operational data item. This may be a valid measurement value like a number, or it may display “MISSING DATA”, indicating that the expected time series data has not arrived.
  • The LGV of the mission critical operational data item. This will always be a valid measurement value, specifically the last good measurement value of the mission critical operational LKV data item.  If there are no problems, then LKV will equal LGV.
    • The LGV turns out to be immensely helpful for dealing with emergencies since at least you have some information, even if it is out of date.
    • And it is also highly useful to display the time of the LGV as well, so the user will know how stale the LGV is.  That information could greatly aid the user in making effective decisions and reduce uncertainty when mission critical measurements have missing LKV data.
  • The health monitoring data related to the device or sensor that is doing the measurements of the operational data making up the LKV and LGV.
    • It turns out to be very helpful to know the health status of a device or sensor, or another relevant component of a data pipeline, when the LKV starts displaying “MISSING DATA”.
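The display items above can be sketched as a small tracker that produces one dashboard row per data stream. Everything here is an assumption made for illustration: the `StreamTracker` and `DashboardRow` names, the idea of declaring data missing after a fixed timeout, the health status strings, and the use of `None` for the LGV before any valid reading has ever arrived.

```python
from dataclasses import dataclass
from typing import Optional, Union

MISSING = "MISSING DATA"


@dataclass
class DashboardRow:
    lkv: Union[float, str]            # Last Known Value, or MISSING
    lgv: Optional[float]              # Last Good Value (None before any reading)
    lgv_age_seconds: Optional[float]  # how stale the LGV is
    device_health: str                # latest health monitoring status


class StreamTracker:
    """Track LKV, LGV, LGV age, and device health for one data stream."""

    def __init__(self, timeout_seconds):
        # Assumption: data counts as missing once no valid reading has
        # arrived for longer than this timeout.
        self.timeout_seconds = timeout_seconds
        self._lgv = None
        self._lgv_time = None
        self._health = "UNKNOWN"

    def record_reading(self, timestamp, value):
        """Store a valid operational reading and when it was taken."""
        self._lgv = value
        self._lgv_time = timestamp

    def record_health(self, status):
        """Store the latest health monitoring status for the device."""
        self._health = status

    def row(self, now):
        """Build the values to display at time `now`."""
        if self._lgv_time is None:
            return DashboardRow(MISSING, None, None, self._health)
        age = now - self._lgv_time
        lkv = self._lgv if age <= self.timeout_seconds else MISSING
        return DashboardRow(lkv, self._lgv, age, self._health)


tracker = StreamTracker(timeout_seconds=30)
tracker.record_reading(timestamp=100.0, value=71.5)
tracker.record_health("ALIVE")
healthy_row = tracker.row(now=110.0)

# The device goes quiet; only the health status changes.
tracker.record_health("NOT RESPONDING")
stale_row = tracker.row(now=200.0)
print(healthy_row)
print(stale_row)
```

In the second row the LKV shows “MISSING DATA” while the LGV, its age, and the health status still give the operator something to act on.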

Using the above ideas will make your time series data processing systems more robust and more effective for humans to operate.

George Stevens

Software Architect, Sr. Software Engineer at Solid Value Software, LLC.

Creative Commons License

dotnetsilverlightprism blog by George Stevens is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Based on a work at dotnetsilverlightprism.wordpress.com.


SO Apps 9: Creating Success with Microservices — Info Sources

Developing microservices might be more involved than you initially think. While a single microservice itself may be relatively small and simple, systems composed of multiple microservices are much more complex than one may expect since microservices dramatically affect numerous key areas in the entire software development and operations lifecycles.  This article describes the key areas affected through learning from people’s experiences with microservices, both successes and failures.  Herein I present a list of info sources I have found to be helpful during my own journey of learning to design and implement microservice based systems that started in 2012.  Also included is a synopsis of the “Key Areas for Success with Microservices” gleaned from those info sources.

About Microservices

Microservice systems are also distributed systems since each microservice, or group of closely collaborating microservices, is hosted within its own process.  And each such process is most likely to be hosted on a separate computer or virtual machine, thus acting to distribute portions of a single client’s workload across multiple computers.  These characteristics require the microservices to communicate with each other via a network, rather than communicating via much simpler direct method/function calls as is done in monolithic non-distributed systems.  Note that software running on distributed systems (e.g. microservices) is notably more difficult to design, develop, debug, and operate than monolithic software for non-distributed systems due to this networking, plus the, at times complex, orchestration needed to manage the concurrency of multiple processes running independently of each other.  Thus, microservices and distributed systems represent a major sea change for the mental models, tools, techniques and processes of software engineering, development, and operations due to their different characteristics, challenges and higher level of difficulty when compared to the monolithic systems and “apps” of the past.

The rise of microservices in the last few years is driven by 3 main factors:

  1. The great and rapid increase in the use of distributed systems, as noted in Monty Montgomery’s “Escaping Appland” article below. In the “Great Expectations” section he describes a “major industry inflection point” occurring now where most “apps” developed now are in fact distributed systems.  He also describes the implications of that inflection point.  One key implication is that since this growth in distributed systems has happened extremely rapidly, our current software development workforce has relatively few people having experience designing and building distributed systems, as described in his “Still Stuck” section.  This leaves organizations with the need to grow their own expertise in microservices and distributed systems mental models, technologies, and techniques. Note that microservice software architectures were originally developed to support distributed systems.  For more on distributed systems see my blog “SO Apps 4, Coping with the Rapid Rise of Distributed Systems”, 9/27/2015.
  2. The rise of the need for highly scalable software systems and the cloud computing platforms that supply such scalability. A cloud is a distributed system by definition.  And microservices allow effective horizontal scaling way, way better than monolithic services do.
  3. The rise of the need for Continuous Delivery of feature updates to software systems. In many businesses it is no longer good enough to release new software feature updates each quarter, half-year, or year.  Microservice architectures can greatly facilitate a more rapid schedule of software feature update releases.  With sufficient automation, plus the other things listed below, an accelerated release pace is quite achievable and sustainable.  Please see the “Architecting for Speed” article below by Jesper Nordstrom and Thomas Betzholtz for an excellent discussion of this area.

Given the higher level of difficulty of developing and operating microservices, we can learn about the main factors that create success with microservices from those who have had hands on experience developing them and facilitating their adoption by organizations.  Many of the authors of the below info sources have such experience. The general gist of the following info sources is this:

It’s a new world! Developing microservice systems requires new and different mental models and skills in each of several key areas, listed below.  Some of these areas are easy to miss.  Some areas are unforgiving.  And some of the knowledge and practices that worked so well for you in the past can be anti-patterns in the microservice realm.

Beware. As you will see from many of the articles in the following “Info Sources” section, if you are unable to provide what it takes to be successful in all the key areas below, the probability of large failure increases.  Large failure in software development typically results in unplanned longer development schedules, unplanned higher development and sustaining costs, and more software quality problems (more bugs).  These usually significantly increase the Time-to-Market of the initial release and most subsequent releases as well.  Plus, large failure entails a heightened risk of having to abandon a development project gone wrong or do a major rewrite of it, as you will see from one of the below info sources.  Large failure scenarios can be avoided by explicitly focusing on what creates success in microservice development projects, and making sure your organization is doing all the success creating things.

Key Areas for Success with Microservices

To be successful with microservices you need to adopt new mental models in each of the following key areas, plus develop the skills and ability to effectively execute within the new mental models:

o   The ability to decompose a business solution into a system of right-sized microservices – not too big, and not too small – so as to have an effective microservice architecture (the software’s structure) that fulfills your organization’s needs. Think of it this way: “Effective software structure significantly reduces Time-to-Market and Total-Cost-of-Ownership, plus speeds innovation”.  Copyright © 2017 by Solid Value Software, LLC. All rights reserved.

o   The diverse technologies needed to support microservices. This includes the mental models and technologies of microservices themselves, plus those concerned with the broad and rapidly growing set of public cloud services that can be usefully consumed by microservices, new database and data storage models, new hosting models, plus those of software development, quality and testing, operations, and project planning and management as they relate to the specifics of microservice based development. This shows the very broad scope of the main areas involved in developing and operating microservices. It’s a new world!

o   Developers and other staff experienced in utilizing the above technologies, plus thinking in the terms of the new mental models (as opposed to habitually thinking in the terms of previous mental models and technology concepts that can well be microservice anti-patterns). Given the current workforce’s relatively low level of knowledge of the mental models and techniques required to develop solid microservices and distributed systems, there is a large need here for organizations to grow their own expertise.  Suggestions to support this are listed subsequently.

o   DevOps Processes and DevOps Tools. It’s a new world!

o   The willingness and ability to do a major reorganization of the development and operations processes of your organization in areas involving microservices so that the processes explicitly excel at supporting success with microservices. This includes processes involving requirements, architecture, project design/planning and project management, software development, quality control, hosting, deployment, upgrades/rollbacks, microservice health monitoring, and heavy integration testing (an absolute must for distributed systems of any kind).  Plus a number of these areas need to use significant amounts of automation.  It’s a new world!

o   Effective governance, i.e. exercising control over the microservices and their support systems: Who has decision making authority in key areas? Who is accountable for producing what key results?  What to do to rapidly resolve issues when things go wrong?  How does effective communication happen within the organization given the preceding?  How does effective planning of features, development, testing, and releases happen?  As some of the below info sources show, just “winging it” here produces severe problems.  It’s a new world here too!

o   Adequate levels of individual, team and organization discipline required to effectively make all the above happen.

o   Strong management support, including support of people learning all the above over time so they can effectively do what is required of them to create success in this “new world”, plus calm acceptance of occasional small and medium failures.

That is quite a learning curve.  If your organization is new to microservice development here are some proven ways to reduce risk and climb the learning curve more surely:

  1. Start by discovering the overall goals and objectives for your microservice system, plus its key high level business requirements. Get buy-in from all stakeholders and write them down in a wiki. Things are much easier when you are clear on what you are trying to accomplish.
  2. Use staged development to reduce the risk of large failure by climbing the learning curve stage by stage, rather than attempting to do everything in a single “big bang” project. Each successive stage gives teams a chance to learn new areas and then refine their approach to the area in subsequent stages.  For example, after the mission, objectives and high level requirements are done, then do a very high level architecture to act as a “road map” and “vision” to guide the stages of development.  Immediately after the architecture is done, do two or three very quick “proof-of-concept” code implementations of important, but very narrowly scoped, portions of the architecture.  Then learn from these and make adjustments in the next stage where you develop a Minimum Viable Product (MVP) or Pilot Project.  Then learn from that as well and make adjustments.  After the MVP you will have climbed a significant part of the learning curve.  You will now be in a much stronger position to do the subsequent development stages at a much lower risk and much faster, ultimately resulting in a fully functional production microservice system.  Plus an MVP/Pilot Project will give the business valuable insights and feedback irrespective of microservice design, technology, implementation, and operation matters.
  3. Set up your budget to fund your developers, operations, and other personnel to attend classes (in person or online), and to have time to do hands-on work with the new mental models and technologies by doing short exploratory coding projects in areas critical to the microservices project. This is key to growing your own expertise.
  4. Use appropriately experienced outside specialists engaged on a shorter term basis to assist your staff in some of the above areas. Such specialists can teach new mental models, technologies, techniques, and processes, plus also develop requirements, software architecture, project plans, detailed designs, and code to get your staff started in areas new to them.  Plus such specialists can assist individuals and teams in becoming much more productive with these new areas once they’ve learned the basics.

Finally, in my experience, becoming conversant with new mental models seems to be an important tipping point in climbing up the learning curve in new areas.  Once a person has “grokked” a new mental model, all subsequent learning and application of the model becomes much easier and proceeds faster.  This can be used as a leverage point in the process of learning and change.

Microservice Info Sources

The below info sources supply details and examples of the key areas in the above list:

  1. “Microservices – Not a free lunch!” by Benjamin Wootton, 4/8/2014.  This is highly worthwhile, by someone with hands on experience with microservices, devops, plus assisting organizations in adopting new technologies. Be sure to read the section titled “Distributed System Complexity” to understand what we are up against, i.e. “Distributed systems are an order of magnitude more difficult to develop and test against…” compared to monolithic systems.
  2. This is a worthwhile real world microservice failure story – “so we had no idea who implemented what and where”, and it gets worse! “Seven Microservices Anti-Patterns”, by Vijay Alagarasan, 8/24/15, InfoQ.
  3. “From the Field: Escaping Appland”, by Michael “Monty” Montgomery, March 2015 in the IASA online journal.  Monty has been commercially architecting and developing microservice based systems for years, way before they became popular.  He’s seen it all, states the challenges very well, and has lots of experience with how to rise to meet and exceed these challenges and create success with microservices.  This article includes killer diagrams that make his point that there is a better way that has a much higher probability of success with microservices.  I consider this a must read and refer to it often.  Monty is a Master Software Architect with IDesign.  You can learn how to design microservices at IDesign’s classes, some of which he teaches.
  4. “Architecting for speed: How agile innovators accelerate growth through microservices”, by Jesper Nordstrom, Thomas Betzholtz, 8/30/2016. This is an excellent wide ranging overview of microservices, good reasons why to use them, what the costs and risks are, and how their use requires new forms of organization of development and operations teams.  And, it shows how the architecture (software structure) of a software system is directly connected to business agility, i.e. reducing time-to-market.  The authors specialize in facilitating organizations in effectively adopting strategically critical technologies.
  5. Here is another worthwhile real world microservice failure story. “Failing at Microservices”, by Richard Clayton, 7/15/2014.  Note that common themes in both of the failure stories involve issues with architecture, people, leadership and organization as major causes of problems developing microservice systems.  These are things that technology alone cannot cure.
  6. “The Astonishingly Underappreciated Azure Service Fabric”, by Ben Spencer, 2/10/2017. Microsoft’s Service Fabric microservice-oriented compute platform innately provides effective solutions for many of the microservice software development, hosting, and operations issues outlined in the above articles.  Service Fabric provides amazing DevOps features that automate many of the hard spots in deployments, upgrades, rollbacks, and resiliency/availability.  That is because Service Fabric was explicitly built to do just that for Microsoft’s internal use.  Microsoft has found their internal Service Fabric to be so useful in streamlining their own development and operation processes for their hyper-scale cloud services like Office 365, Azure SQL, etc. that they are now selling it for use by their customers.  Service Fabric can be hosted in Azure, or in a customer’s on-premises data center, or in a private cloud, or in the new Azure Stack private cloud appliance.  And it can be hosted on either Windows or Linux servers.

Thanks to the authors of all these articles.

To the readers of this blog – I hope you find this information as useful as I have.

George Stevens

Software Architect, Sr. Software Engineer at Solid Value Software, LLC.

SO Apps 8: Event Driven Architecture Info Sources

Event Driven Architecture (EDA) greatly facilitates building systems that are more readily extensible than most other forms of software architecture.  Nowadays in the cloud era, such extensibility is a valuable characteristic of many service oriented systems. EDA also:

  • Is a highly scalable way of achieving reliable data integration in widely distributed systems (David Chou, below).
  • Supports near-real-time information dissemination and reactive business processes. (David Chou, below).

This article briefly outlines what EDA is, and its primary components. Then it presents a list of links to various info sources I have found to be most helpful in understanding and using EDA.

EDA systems use a message bus through which several services communicate with each other via messaging as the primary integration mechanism.  Keep in mind messaging is an asynchronous form of communication.  A service is an autonomous process with an API that implements a key chunk of specific business functionality in a system.  A message bus is a logically related group of queues and topics shared by several services so they can send messages to one another, plus a message bus API component providing functionality so services can send and receive messages via the queues and topics in the bus.  A message bus is NOT an enterprise service bus (ESB) since it is much, much lighter weight than an ESB.

Since EDA services mainly communicate with each other via messages sent over a message bus, message sender services are called “publishers” and message receiver services are called “subscribers”.  Any one service can typically both publish and subscribe to messages flowing through the message bus.  And multiple services can subscribe to receive the same types of messages, which opens the door to parallel execution.  EDA messages are known as “event messages” or just “events”.  Drawing from David Chou’s below article, plus others, here is a definition of events:

  • Events usually represent the change of state of a system, and/or the change of state of a business process. For example an event may represent the beginning or ending of the execution of a use case in a business process.
  • Events have a unique type (the event type) and name.
  • Events typically have past tense names since they represent things that have already happened.
  • Events have a payload containing just enough data needed by the event’s subscribers to do the work relevant to the purpose of the event, i.e. to do the work required by the change of state represented by the event.

    • For example, an event payload may contain only a key used by the subscriber to do a database query to get all the data it needs to do the required work. Or the payload may contain all data used directly by a subscriber.
    • The event payload’s data is equivalent to the arguments of a remote-procedure-call service operation. The payload is also equivalent to the “EventArgs” of a UI event (e.g. button click) in UI frameworks like WPF.
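The definition above can be sketched as a small data structure. This is only an illustration: the `Event` class, the `OrderShipped` event type, and the payload fields are hypothetical, and the `event_id` and `occurred_at` fields are my own additions rather than part of the definition drawn from the articles.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Mapping


@dataclass(frozen=True)
class Event:
    """An event message: a past tense type name plus a minimal payload."""

    event_type: str               # past tense name, e.g. "OrderShipped"
    payload: Mapping[str, Any]    # just enough data for subscribers
    # Assumed extras, not part of the definition above:
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


# Payload carrying only a key; the subscriber queries a database for the rest.
key_only = Event("OrderShipped", {"order_id": 1234})

# Payload carrying all the data a subscriber uses directly.
self_contained = Event(
    "OrderShipped",
    {"order_id": 1234, "carrier": "ExampleCarrier", "item_count": 3},
)

print(key_only.event_type, dict(key_only.payload))
print(self_contained.event_type, dict(self_contained.payload))
```

The dataclass is frozen because an event describes something that has already happened, so its fields should not be reassigned after publication.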

As mentioned above, one key characteristic of EDA architectures is that they result in software code bases requiring much less work to extend, since their services are highly decoupled from each other by the message bus.  In other words, one service does not know about the existence of other services, nor their messaging endpoints, etc.   Thus EDA services are loosely coupled, as opposed to being tightly coupled to each other as often happens in remote-procedure-call architectures.  EDA services using a message bus have the absolute minimal knowledge required for services to communicate with each other.  A service must know only the following to communicate with other services using events via a message bus:

  • A service subscribes to a message bus to receive certain events that are published to the message bus by various unknown services.
  • A service’s API service operations do a specific kind of work for each specific kind of event type it receives. Each event type is often a business domain “unit of work” of some sort.  (See David Chou’s article, below).
  • A service’s API service operations also publish certain events to the message bus when a notable step of its processing is completed, or when the service operation’s entire processing sequence is complete, so that other services can act in response to those events.
  • Whether a sender or receiver, a service must use the proper data contract (aka event payload format) when dealing with a given event.

In EDA architectures only the message bus API component knows the various queue and topic names, the endpoints, how to map events to an endpoint, and, for subscribers, how to map an event to the proper service operation of the subscribing service’s API.  As such, the message bus acts as the intermediary between services, decoupling them from each other.  This strong decoupling allows rapid extension of a system’s capabilities: just add new services and the new events unique to them, make use of existing events as necessary, and update the message bus API component with new endpoints, etc.  With a message bus, services are coupled to each other only through the events and the event payload data format they share in common.  This is very loose coupling that promotes a great deal of independent variation between a system’s services.  Please see Hohpe’s “Programming Without a Call Stack – Event Driven Architectures” below for key details on the great extent of loose coupling EDA provides, plus a few key challenges in EDA worth knowing about as well.  Meeting the challenges of EDA requires substantial software engineering experience.
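The decoupling described above can be illustrated with a toy, in-process message bus. This is a sketch under loud assumptions: a real message bus sits on top of durable queues and topics, while here a single in-memory queue stands in for them, and the `MessageBus` class, the event names, and the three “services” (plain functions) are all hypothetical.

```python
from collections import defaultdict, deque


class MessageBus:
    """Toy bus: only the bus knows which handlers receive which event type."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # event type -> handlers
        self._pending = deque()                # stand-in for queues/topics

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Publishing only enqueues; the publisher never calls a subscriber.
        self._pending.append((event_type, payload))

    def deliver_all(self):
        """Dispatch queued events, including any published by handlers."""
        while self._pending:
            event_type, payload = self._pending.popleft()
            for handler in self._subscribers.get(event_type, ()):
                handler(payload)


bus = MessageBus()
log = []


def billing_on_order_placed(payload):
    # Billing knows the events it handles and publishes, not who else listens.
    log.append(f"billing: invoice for order {payload['order_id']}")
    bus.publish("InvoiceCreated", {"order_id": payload["order_id"]})


def shipping_on_order_placed(payload):
    log.append(f"shipping: packing order {payload['order_id']}")


def email_on_invoice_created(payload):
    # Added later as an extension; billing and shipping were not changed.
    log.append(f"email: invoice sent for order {payload['order_id']}")


bus.subscribe("OrderPlaced", billing_on_order_placed)
bus.subscribe("OrderPlaced", shipping_on_order_placed)
bus.subscribe("InvoiceCreated", email_on_invoice_created)

bus.publish("OrderPlaced", {"order_id": 42})
bus.deliver_all()
print(log)
```

Adding the email handler required one new `subscribe` call and no change to the existing handlers, which is the extensibility point made above. The services here share only the event names and payload keys.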

Also note that EDA can be used to extend existing non-EDA systems, due to EDA’s excellent support of extensibility.  While this will require retrofitting some existing software components to support messaging, it may be a worthwhile avenue to explore when considering cloud/on-premises hybrid systems.

The following reference links contain more in depth information about Event Driven Architecture:

  1. The “Event Message” on pp 151 – 153 of Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions by Gregor Hohpe and Bobby Woolf, Copyright 2004 by Pearson Education.  And at http://www.enterpriseintegrationpatterns.com/patterns/messaging/EventMessage.html
  2. The “Message Bus” on pp 137 – 141 of Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions by Gregor Hohpe and Bobby Woolf, Copyright 2004 by Pearson Education.  And at http://www.enterpriseintegrationpatterns.com/patterns/messaging/MessageBus.html.
  3. “Programming Without a Call Stack – Event-driven Architectures” by Gregor Hohpe at eaipatterns.com. This is an excellent article that focuses upon how EDA changes the programming model and reduces coupling, and the consequences of that.
  4. “Using Events in Highly Distributed Architectures” by David Chou, October 2008, in Microsoft’s The Architecture Journal. This article provides a great overview of EDA and definition of terms.
  5. “Event-Driven Architecture: SOA Through the Looking Glass” by Udi Dahan.  Provides an overview of EDA with a focus on data consistency.
  6. “Event-Driven Data Management for Microservices” by Chris Richardson of NGINX. Good examples of EDA scenarios, especially in regards to data persistence and consistency.
  7. John Mathon’s 3 article series on EDA in his Cloud Ramblings blog is useful — March 31, 2015;  April 1, 2015, April 2, 2015.  The first blog has a great history of EDA and SOA.  John Mathon is the founder of TIBCO and was CEO for a long time.  TIBCO was a key player in EDA tools in the early days and beyond. John knows the area very well.
  8. “Message Driven Architecture Explained – Basics” by Mike at Sapiens Works. A succinct summary of the very basics.

I hope you find this information as useful as I did.

George Stevens

Most Useful IoT Security Info Sources

Want to know exactly how to assess the end-to-end security needs of an IoT system?  Want to understand the limitations of IoT devices acting like servers, and a much, much simpler alternative?  Want to inform a non-technical person about IoT security?

I’ve found the 3 links listed below to be some of the most useful ones I’ve encountered in the area of IoT Security since I got started in IoT a couple years ago.  For more IoT security links please see my November 30, 2016 blog article Internet-of-Things Security — Info Sources.

1. Want to know exactly how to assess the end-to-end security needs of an IoT system? While this link includes the traditional IT security required for IoT systems, it also brings in other areas of security as well, e.g. physical security of IoT devices.  The Microsoft white paper “Evaluating Your IoT Security” published in March 2017 presents an “IoT Security Evaluation Framework” that deals with a complete list of threats, their consequences, and security evaluation strategies.  It also contains links to other valuable info sources, like the European Union Agency for Network and Information Security (ENISA) Threat Taxonomy:  “a rich and multi-level definition of threats” that goes far beyond the bounds of traditional IT security.

2.  Want to understand the limitations of IoT devices acting like servers, and a much simpler alternative? Clemens Vasters’ excellent blog article “Service Assisted Communication” for Connected Devices provides a deep dive into this area, and more. This is a must read article for IoT system developers since it shows specifically how using services to communicate with devices can both dramatically simplify an IoT solution and significantly increase its security.  One key concept is that devices always act in the role of a client, calling services.  “Devices do not accept unsolicited network information.  All connections and routes are established in an outbound-only fashion.”  Mr. Vasters presents 7 fundamental principles.  They are backed up with a detailed exploration of device connectivity through a number of the network layers, covering the strengths and weaknesses of various security techniques.

3.  Want to inform a non-technical person about IoT security on Azure? The Microsoft white paper “Microsoft Azure and Data Compliance – In the Context of the Internet of Things (IoT)” published in March 2017 presents a good non-technical discussion of the IoT security capabilities Microsoft provides in Azure, and why they are necessary.  This is aimed at the decision makers in an organization.

I hope you find these links as useful as I have.

George Stevens

.NET Async/Await Best Practices

On a recent project involving the Microsoft ASP.NET WebAPI and Service Fabric, I needed to use the .NET Async/Await and Task features of the Task Parallel Library (TPL).  Most code in Service Fabric Reliable Services and Actors requires the use of Tasks so that each individual service operation runs asynchronously instead of blocking its caller.  This code often uses Async/Await as well.  Asynchronous methods are often used via Async/Await in the controllers of the ASP.NET MVC and WebAPI frameworks to keep user interfaces highly responsive.

However, Async/Await can cause deadlocks in certain scenarios.  And in other scenarios its use can cause unexpected forms of exceptions.  To ensure my code was rock solid I searched the web for “.NET async await best practices”.  I found several sources of guidance that focused on two key areas:

  1. Coding techniques that avoid problems with Async/Await and Tasks in .NET.
  2. The underlying reasons why these coding techniques avoid problems.

The most helpful references I found are an article by Stephen Cleary and an article by Jordan Morris.

Both of these articles are valuable to review, and contain roughly the same best practices.  Stephen Cleary’s article appeared in MSDN Magazine, with Stephen Toub doing the technical review of the article.  Stephen Toub is a Microsoft employee and a widely known expert in TPL, asynchronous programming, and parallelism.  Thus, you know the common information presented in both articles is accurate.

The article by Jordan Morris contains more code examples, plus it points to runnable example code in GitHub projects, i.e. with some of these code examples you can create your very own deadlock on your development system!  Plus, the whole gist of this article is aimed at providing guidance to Jordan’s development team, rather than at being a magazine article.  As such it provides a practical approach to documenting and effectively maintaining a code base that uses Tasks and Async/Await.

I hope you find these as helpful as I have.

George Stevens

Why I Like Modeling IoT Devices with Azure Service Fabric Actors

Service Fabric Actors offer a simple, reliable programming model to efficiently act as an IoT Device Shadow.  Device Shadow is the concept used when a software entity (a class, a service, etc.) is used to contain the recent state of an individual remote IoT device.  By “state” I mean the current data that an IoT device is designed to furnish to an IoT system.  For example, the state of a remote oil temperature sensor device would be the current temperature of the oil it is monitoring.   Device Shadows are often implemented with a JSON document, but here I use Service Fabric Actors.

Essentially, a Device Shadow is a virtual representation of a remote IoT device.  It contains the persistent recent state of the device.  The rest of the software in an IoT solution uses a Device Shadow to access a device’s state, which avoids the time consuming and resource expensive process of gathering the state information from various places in an IoT system, perhaps including a trip all the way out to the remote IoT device to get its state.  Continuing the previous example, with a Device Shadow for an oil temperature sensor device the software does not need to ask the sensor device directly for its current measurement.  Nor does the software have to do a series of database queries to get the current state of a device saved in disk storage.  Rather, the software can simply query the associated Device Shadow, since it contains all of a device’s recent state in one place.  This assumes the rest of the system is designed to have the IoT devices periodically report their state to their Device Shadow, which is a standard practice.  There may be anywhere from dozens to hundreds of thousands or more of remote devices and their Device Shadows in an IoT system!  The ability to deal with such massive scale is extremely important here.
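To make the Device Shadow idea concrete, here is a minimal Python sketch.  It is not Service Fabric code and not the code from my POC; the names DeviceShadow, ShadowRegistry, report, and get are hypothetical and chosen only for illustration.  It shows a device reporting its state to its shadow, and the rest of the system reading the shadow instead of asking the device.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class DeviceShadow:
    """Holds the most recently reported state of one remote device (hypothetical shape)."""
    device_id: str
    reported: Dict[str, float] = field(default_factory=dict)
    last_reported_at: Optional[datetime] = None

    def report(self, telemetry: Dict[str, float], at: Optional[datetime] = None) -> None:
        # Called whenever a periodic report from the device arrives.
        self.reported.update(telemetry)
        self.last_reported_at = at or datetime.now(timezone.utc)

    def get(self, name: str) -> Optional[float]:
        # Readers query the shadow; no trip to the device or to a database.
        return self.reported.get(name)


class ShadowRegistry:
    """Keeps one shadow per device id, created on first use."""

    def __init__(self) -> None:
        self._shadows: Dict[str, DeviceShadow] = {}

    def shadow_for(self, device_id: str) -> DeviceShadow:
        if device_id not in self._shadows:
            self._shadows[device_id] = DeviceShadow(device_id)
        return self._shadows[device_id]
```

A caller would use it roughly like this: `registry.shadow_for("oil-sensor-17").report({"oil_temp_c": 93.5})` when a report arrives, and `registry.shadow_for("oil-sensor-17").get("oil_temp_c")` when the current oil temperature is needed.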

Service Fabric is Microsoft’s next-generation middleware platform that has strong support for high scale microservices.  And it is aimed at providing very high reliability achieved by having multiple replicas of each service it hosts running on multiple virtual machines that make up a Service Fabric cluster.   Only the “primary replica” is used, but the “secondary replicas” in standby on other virtual machines in the cluster have their state always kept up to date with that of the primary replica.  This allows for very quick and precisely accurate automatic recovery from a crashed virtual machine in the cluster.  No operator involvement is required for this recovery.

A Service Fabric Actor is a highly reliable, single threaded, persistently stateful microservice that runs in a Service Fabric cluster.  Actors support high scale and high reliability since each one is a special Service Fabric Reliable Stateful Service that runs under control of the Service Fabric Actor Service which provides them with the unique characteristics of Actors.  Thus, SF Stateful Actors are very well suited to act as a Device Shadow.  And, since they are individual microservices, they can scale out without a negative impact on the code, operations, or performance of the overall software system.

Here is a conceptual data flow diagram showing how Service Fabric Actors can implement the Device Shadow role in an Azure IoT solution.

[Figure 1: Conceptual data flow diagram of Device Shadow Actors in an IoT system]

I have omitted detail in Figure 1 so as to focus only on the Actors and the flow of commands, command responses, and telemetry in the solution.  Please see “Microsoft IoT Reference Architecture” for definitions of terms used above.

The Business Backend System (Backend for short) is responsible for the UI display of alerts, dashboards, visualizations, and reports, plus some analysis of data, provisioning the system’s resources, and monitoring the health of the system as it runs.  The Backend is also responsible for issuing commands to remote IoT devices through Device Shadows, as directed by user interaction with the UI or via a programmatic workflow.  The Device Shadow Actors are intermediaries, standing between the Backend System and the Cloud IoT System (with its IoT Gateway data ingestion buffer to which the IoT devices connect).

During December 2016 and January 2017 I implemented an exploratory proof of concept (POC) system that used SF Stateful Actors as Device Shadows.  A challenging feature of my POC is that it had a requirement to implement significant commanding from the Backend System to the IoT devices.  Specifically, I had to make the IoT devices start and stop and, when running, go online and offline.  Commanding devices entails a lot more than just passively collecting telemetry from sensors, as is common in many IoT solutions.

Commanding involves the Device Shadows maintaining the command state of a remote device, in addition to its telemetry state.  It also requires a Device Shadow to implement behavior to support commanding.  Specifically, the Device Shadow must receive a command from the Backend System, send the command to the correct remote device, allow time for the device to execute the command, and then receive a command response from the IoT device when the device has completed command execution.  In addition, the Device Shadow needs the ability to time out when no command response is received within a specified time.  And the Device Shadows must know when a command is in the process of being executed by a device so as to prevent attempts to have the device execute multiple commands concurrently.  Finally, in all of the above scenarios a Device Shadow Actor must notify the Backend System of the command state at critical junctures in the commanding process, e.g. normal command completion, command time-out, device error, etc.  These command response notifications allow the Backend to push them to UI clients for alerts and dashboard updates.
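The commanding behavior described above is essentially a small state machine.  Here is a rough Python sketch of it, under stated assumptions: the class CommandingShadow, the CommandState values, and the send_to_device and notify_backend callbacks are all hypothetical names, time is passed in explicitly to keep the example deterministic, and nothing here is Service Fabric API or my POC code.

```python
import enum
from typing import Callable, Optional


class CommandState(enum.Enum):
    IDLE = "idle"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    TIMED_OUT = "timed_out"
    DEVICE_ERROR = "device_error"


class CommandingShadow:
    """Tracks the command state of one device (illustrative only)."""

    def __init__(self, device_id: str,
                 send_to_device: Callable[[str, str], None],
                 notify_backend: Callable[[str, str, CommandState], None],
                 timeout_s: float = 30.0) -> None:
        self.device_id = device_id
        self._send = send_to_device
        self._notify = notify_backend
        self._timeout_s = timeout_s
        self.state = CommandState.IDLE
        self.command: Optional[str] = None
        self._deadline: Optional[float] = None

    def issue(self, command: str, now: float) -> bool:
        # Refuse a second command while one is still executing on the device.
        if self.state is CommandState.IN_PROGRESS:
            return False
        self.command = command
        self.state = CommandState.IN_PROGRESS
        self._deadline = now + self._timeout_s
        self._send(self.device_id, command)
        return True

    def on_response(self, ok: bool) -> None:
        # Ignore responses that arrive when no command is in progress (e.g. after a time-out).
        if self.state is not CommandState.IN_PROGRESS:
            return
        self._finish(CommandState.COMPLETED if ok else CommandState.DEVICE_ERROR)

    def tick(self, now: float) -> None:
        # Called periodically (e.g. by a timer) to detect a missing response.
        if self.state is CommandState.IN_PROGRESS and now >= self._deadline:
            self._finish(CommandState.TIMED_OUT)

    def _finish(self, state: CommandState) -> None:
        self.state = state
        self._deadline = None
        self._notify(self.device_id, self.command, state)
```

The design choice worth noting is that every exit from the in-progress state goes through one method that also notifies the Backend, so a completion, a device error, and a time-out are all reported the same way.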

Service Fabric Stateful Actors offer a great programming model to efficiently act as a Device Shadow that models the telemetry and command needs of an IoT device, to allow it to work well with the Backend System as well as other parts of an IoT solution.   As such, SF Actors speed up development time, plus reduce the level of skill required to effectively program and debug potentially complex logic associated with commanding and telemetry.

Keep in mind, with many cloud solutions there is usually no guarantee that difficult problems will not arise when scaling out to 1) service tens of thousands or even millions of IoT devices, or to 2) service high data rates moving to and from the devices.  Such problems often involve throughput and contention issues in accessing storage, the need to use multi-threaded programming techniques, excessive latency, and the need to orchestrate the timing of several different collaborating components in a cloud environment which is eventually consistent by its nature.  Any of these issues by themselves is a challenge to a seasoned developer, and taken together they will likely take a substantial amount of time to get right.  My POC showed me that Service Fabric Stateful Actors can relieve a substantial amount of this burden in the realm of Device Shadows.

In what follows I list what I liked about developing my POC with SF Actors, followed by a list of links that will help you learn more about Actors and Service Fabric, rather than explaining them in this blog.

Why I Like Service Fabric Actors as IoT Device Shadows

  • Most of the behavior required for Device Shadowing is located close together: It is in the Actor microservice itself or in the resource accessor used to provide system wide access to an Actor. Not much of this behavior is spread widely across the entire system.  This reduces development and maintenance time.
  • There is no need to deal with complex multi-threaded code. Actors are single threaded with “turn based concurrency”.  In other words, all the code in one of an Actor’s methods will be completely executed before any of the other methods can begin execution.  This makes development and debugging much, much easier and faster than when dealing with multi-threading.
  • One does not have to worry about data persistence concurrency issues. All of the state of an Actor (i.e. the telemetry and command state in a Device Shadow) is persisted in an Actor’s own private state store.  And, that state can only be accessed via an Actor method or property which is guaranteed to use “turn based concurrency”.  So, there is no worry about data access exceptions due to “record locking”, which can happen with a busy traditional database.  And there is no need to use transactions since they are “built in” to Actor state access.
  • One does not have to worry about slow or intermittent data access to an Actor’s private state. Service Fabric Stateful Actors are built on top of Service Fabric Reliable Services.  Behind the scenes, Stateful Reliable Services save their state on the hard disk of the virtual machine in the SF cluster that is running the Actor instance.  Thus, there is no network access involved in state access by an Actor.  This makes data access much faster than over a network, and it eliminates the ever present network transient faults of cloud computing that serve to complicate data access.
  • There is no need to write code to deal with special considerations to gain massive scalability. By the inherent nature of Service Fabric, its clusters, and Actors (each instance is running as a separate microservice) there is built-in scalability of vast proportions!  The focus shifts from providing scalability in code, to configuring the Service Fabric cluster to provide the required scale.  And configuration takes a whole lot less time and effort.
  • The learning curve is quite reasonable. The documentation is good.  And learning to use Actors in the role of Device Shadows is just not that hard!  However there are some advanced scenarios involving complex parallel computations with Actors that are not for the rank beginner.
  • The development environment is good. A “development cluster” can be set up on your development system, and it runs exactly the same binaries that a production Service Fabric cluster does.  This is not emulation!  Working with the development cluster through Visual Studio is a very time efficient way to develop and debug code before running it on a non-development cluster on Azure or in your own data center.
  • SF provides amazing time saving upgrade capabilities when Actors must have code changes. Upgrades can happen without taking the system down or out of service.  And if an upgrade has problems, the code can be automatically rolled back.  Again, without taking the system out of service.  In general, Service Fabric adds great value in the area of operations, in this and many other ways.
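The “turn based concurrency” mentioned in the list above can be illustrated with a small Python sketch.  It is only an analogy built on asyncio, not how Service Fabric implements Actors, and it ignores details such as Actor reentrancy; TurnBasedActor and its increment method are hypothetical names.  The point it shows is that when each method runs to completion before the next one starts, a read-modify-write sequence cannot be interleaved with another call.

```python
import asyncio


class TurnBasedActor:
    """Runs at most one of its methods at a time (a simplified stand-in for an Actor)."""

    def __init__(self) -> None:
        self._turn = asyncio.Lock()  # one "turn" at a time
        self.count = 0

    async def increment(self) -> int:
        async with self._turn:
            current = self.count
            # Yield to other callers, as a real method would while awaiting I/O.
            await asyncio.sleep(0)
            self.count = current + 1
            return self.count


async def main() -> int:
    actor = TurnBasedActor()
    # 100 concurrent calls; each one still sees a consistent count.
    await asyncio.gather(*(actor.increment() for _ in range(100)))
    return actor.count


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Without the per-actor lock, the concurrent calls in this sketch would all read the same starting value and overwrite each other’s updates, which is exactly the class of bug that turn-based execution removes.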

Links Concerning Service Fabric, Actors, and Code Samples Showing Their Use in IoT Solutions

“Microsoft Azure – Azure Service Fabric and the Microservices Architecture”, MSDN Magazine, December 2015.  This is an excellent overview of Service Fabric, plus has a good section on Actors as well.

Overview of Service Fabric.  This Microsoft article provides a broad view of Service Fabric development and operational features, plus contains a few good videos for more in depth knowledge.

Introduction to Service Fabric Reliable Actors.  This Microsoft document is a good place to start learning about Actors.  It also includes links to other detailed documentation for Actors, in the panel on the left.

Getting to Know Actors in Service Fabric. This is a really good in-depth blog on the details of Service Fabric Actors, plus graphically showing how they fit in with other microservices running under Service Fabric.  This is a must read to get up to speed on Actors quickly, IMO.

IoT Actor Code Example:  Service Fabric IoT Sample, from Microsoft by Vaclav Turecek (Sr. Program Manager on the Service Fabric team).

IoT Actor Code Example:  Paolo Salvatori (Microsoft, Italy) provides IoT code examples using Actors, plus several other Service Fabric examples as well.

Azure Code Samples for Service Fabric provides some IoT related Actor samples, plus others as well.

I hope this introduction to Service Fabric Stateful Actors used as Device Shadows spurs you to further investigate their capabilities and how they can be useful to you.

George Stevens

Internet-of-Things Security — Info Sources

The security of distributed systems, whether cloud based, on-premises, or hybrid cloud/on-premises, is a complex subject by itself.  Add securely connecting a bunch of hardware things to a distributed software system and you have more complexity, new requirements, new techniques, and new technologies to deal with.  Hopefully this article will shed some light on some of the current best mental models, best practices, and technologies to use in designing and building secure Internet-of-Things (IoT) systems.

Please keep in mind the key points made in my previous blog article “Reinventing the Wheel is Not Necessary for IoT Software Architecture”:

  1. It’s best to use an end-to-end system perspective when thinking about IoT Systems. They are much more complex than just the internet and some things.
  2. “When developing IoT Systems we can use all of the software structural (aka software architecture) knowledge we’ve gained over the past decade from developing secure, mission critical distributed systems, and Service Oriented Architectures (SOA), and Cloud Systems.”

The info sources listed below often apply the above perspective and techniques since they generally serve to facilitate the timely development of secure IoT systems, as well as high quality IoT systems.

To get you started, consider what happened with weak IoT security on October 21, 2016: Hacked Cameras, DVRs Powered Today’s Massive Internet Outage, by Brian Krebs.  We can do better than that!  Below are the most useful sources of information on security of IoT systems that I’ve encountered in 2016.

General IoT Security Info Sources

First, if you only have time to consult one of the info sources listed in this blog, watch the recommended parts of the following video.  That is where you will initially get the greatest return for the time you spend.  This video provides an excellent overview of key technology agnostic concepts and techniques in IoT security: Secure your IoT with Azure IoT by Arjmand Samuel of Microsoft.  It is a presentation given at Microsoft’s Ignite conference in September 2016.  The first quarter of the video (about 10 minutes) is an overview of the key general security issues in IoT, including the roles and concerns of various stakeholders.  I found it most helpful in identifying the specific reasons why IoT security is hard.

Then it presents an excellent mental model of a “Trustworthy Internet of Things”, with pressure put on any IoT system by the Environment, Security Threats, Human Error, and System Faults.  Counteracting these pressures are the design and implementation of the IoT system’s aspects of Security, Privacy, Safety, and Reliability throughout the entire system.  I believe this mental model, along with the roles of various stakeholders, are key concepts to drive the effective design and execution of the planning, development and operation of a solid, secure IoT system.

The middle part of the video outlines specifically how various Microsoft technologies fit into this model.  It spans from the Windows 10 IoT operating system down at the “things” level all the way up to the preconfigured Azure cloud IoT Suites available.  These IoT Suites are full cloud software systems specifically targeted at remote monitoring, or predictive maintenance, etc.

The last part of the video is a “must see”.  Starting at around 28 minutes is a super valuable description of the concept of “Defense in Depth”.   Plus, it shows how to use the STRIDE threat analysis model to systematically identify security threats and then counteract each one with a “Defense in Depth” approach.  I found the 10 minutes spent walking through an example of how to apply the STRIDE threat analysis model to be vital to being able to build strong security into an IoT system.  STRIDE is part of Microsoft’s long standing “Security Development Lifecycle” (SDL).  They use it internally on the software products and services they sell, plus they support their customers using it as well with free tools, videos and tutorials at SDL.  The SDL concepts and practices around STRIDE (as well as other areas in SDL) are largely technology agnostic.
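A STRIDE walk-through of the kind described above amounts to asking, for each component, which of six threat categories apply and what counteracts each one.  Here is a minimal Python sketch of such a worksheet.  The six category names are STRIDE’s own; the Threat record, the two helper functions, and the sample entries are hypothetical illustrations and are not taken from the video or from Microsoft’s SDL tools.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Stride(Enum):
    SPOOFING = "Spoofing"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFORMATION_DISCLOSURE = "Information disclosure"
    DENIAL_OF_SERVICE = "Denial of service"
    ELEVATION_OF_PRIVILEGE = "Elevation of privilege"


@dataclass
class Threat:
    component: str            # e.g. "device", "field gateway", "cloud gateway"
    category: Stride
    description: str
    mitigations: List[str] = field(default_factory=list)


def unmitigated(threats: List[Threat]) -> List[Threat]:
    # Threats that have been identified but have no counteraction yet.
    return [t for t in threats if not t.mitigations]


def missing_categories(component: str, threats: List[Threat]) -> List[Stride]:
    # STRIDE categories nobody has considered yet for this component.
    seen = {t.category for t in threats if t.component == component}
    return [c for c in Stride if c not in seen]


# Made-up sample entries, for illustration only.
worksheet = [
    Threat("device", Stride.SPOOFING, "Attacker impersonates a device",
           ["per-device credentials"]),
    Threat("device", Stride.TAMPERING, "Firmware is modified on the device"),
]
```

Listing several mitigations per threat is one simple way to record a “Defense in Depth” approach, and the two helpers make the remaining gaps visible.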

Second is Clemens Vasters’ excellent blog article “Service Assisted Communication” for Connected Devices. This is a must read article since it shows specifically how using services to communicate with devices can both dramatically simplify an IoT solution and significantly increase its security.  One key concept is that devices always act in the role of a client, calling services.  “Devices do not accept unsolicited network information.  All connections and routes are established in an outbound-only fashion.”  Mr. Vasters presents 7 fundamental principles.  They are backed up with a detailed exploration of device connectivity through a number of the network layers, covering the strengths and weaknesses of various security techniques.
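The “devices always act as clients” principle quoted above can be illustrated with a small in-memory Python sketch.  It is a toy under stated assumptions, not Mr. Vasters’ design: CommandService, Device, and their methods are hypothetical names, and the in-process fetch call merely stands in for whatever outbound connection a real device would open to the service.  What it shows is that commands wait in a per-device mailbox on the service side until the device itself comes to collect them.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class CommandService:
    """Cloud-side service. Devices call it; it never calls a device."""

    def __init__(self) -> None:
        self._mailboxes: Dict[str, Deque[str]] = defaultdict(deque)

    def enqueue(self, device_id: str, command: str) -> None:
        # Called by the backend; the command simply waits in the device's mailbox.
        self._mailboxes[device_id].append(command)

    def fetch(self, device_id: str) -> List[str]:
        # Called BY the device over its own outbound connection.
        mailbox = self._mailboxes[device_id]
        commands = list(mailbox)
        mailbox.clear()
        return commands


class Device:
    """Exposes no inbound entry point; it only calls out to the service."""

    def __init__(self, device_id: str, service: CommandService) -> None:
        self.device_id = device_id
        self._service = service
        self.executed: List[str] = []

    def poll(self) -> None:
        for command in self._service.fetch(self.device_id):
            self.executed.append(command)  # stand-in for executing the command
```

Because the device has no listening endpoint in this arrangement, there is nothing on the device for an outside party to connect to, which is the security benefit the article emphasizes.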

Third, the Microsoft article Internet of Things Security Architecture – This is mainly about technology agnostic security techniques.  It has a detailed example of using the STRIDE threat modeling analysis technique as a starting point to secure an IoT system.  It goes on to show how to design the architecture of various portions of an IoT system to counteract threats at each level, with a “defense in depth” perspective.  I consider this a must read article.

Fourth, the Microsoft article Internet of Things Security Best Practices – This deals with “Defense in Depth” and outlines the best practices of various roles in the IoT world.  For example, the roles of the IoT hardware manufacturer/integrator, the IoT solution developer, etc.  This role based approach is useful in being able to focus on security concerns specific to key participants involved in developing and operating an IoT system.

Fifth, in June 2015 the Industrial Internet Consortium released its Industrial Internet Reference Architecture (IIRA) document (click to download a pdf).  It outlines the requirements and the conceptual system architecture needed to build industrial strength IoT systems.  This is about a lot more than hooking up your toaster to the internet!  The 5 founding members of IIC are AT&T, Cisco, GE, Intel, and IBM.  Note that most of them have deep experience in distributed systems and/or Cloud Systems.

Section 9 of this IIRA document, “Security, Trust, and Privacy”, gives extensive coverage to all aspects of IoT security.  Being familiar with the ideas, terms and techniques presented in Section 9 will give you a strong base in what is recommended by many of the leading, highly experienced companies in the IoT realm.  You can greatly advance your knowledge from their experience as expressed in this section.

Microsoft Specific IoT Technology Info Sources

Here are useful links to Microsoft IoT Security documentation generally focused on specific Microsoft technologies.  However, many of them also contain valuable general IoT security concepts that are technology agnostic.

First, the 11 page Microsoft white paper “Securing your Internet of Things from the Ground Up – Comprehensive built-in security features of the Microsoft Azure IoT Suite” (click to download a pdf) provides an introduction to Microsoft’s Azure IoT services using most of the concepts outlined above.  Here you will see them in action.  And, roughly the same information is presented in the online document Internet of Things security from the ground up.

Second, the online Microsoft document Securing your IoT deployment provides the details of securing Azure IoT systems in 3 security areas – Device Security, Connection Security, and Cloud Security.  It provides a more fine-grained look at IoT security than most of the other info sources listed above.

Third, the minute details of device authentication and security credentials used by the Azure IoT Hub service are presented in Control access to IoT Hub.  This shows exactly how robust device security is achieved.

Finally, Azure IoT Hub Developer Guide provides a list of references to documents on over 15 topic areas concerning the use of the Azure IoT Hub.  You can use this as a guide to perusing the IoT Hub documentation.

I hope you benefit as much from the above info sources as I have.

George Stevens
