
Azure Application Insights – Quick and Easy Service Performance and Health Monitoring

September 11, 2016

I am getting solid productivity increases in my microservice development process by using Azure Application Insights (AAI) to monitor the performance and health of services running in the cloud or on-premises. Not only can I quickly put AAI to productive use in the code-test-debug cycle to see where performance bottlenecks are, I can also use it when my services are in normal operation to monitor their health daily (and send me email alerts) and over weeks and months via graphs and charts. These capabilities are easily available and require very little code to be written. Indeed, many health indicators have their telemetry data automatically generated by the AAI DLLs added to a service project.

Microsoft describes Application Insights as follows: “Visual Studio Application Insights is an extensible analytics service that monitors your live” service or “web application. With it you can detect and diagnose performance issues, and understand what users actually do with your app. It’s designed for developers, to help you continuously improve performance and usability. It works for apps on a wide variety of platforms including .NET, Node.js and J2EE, hosted on-premises or in the cloud” (from Application Insights – introduction).

In summary, Azure Application Insights is a software developer/dev-ops Business Intelligence (BI) package.  Similar to BI in other realms, AAI allows one to easily and quickly compose and visualize charts and graphs of key indicators of the “business” of software development and operations, and also drill down into the minute details with other charts and lists of detailed data.  I am really impressed with how quickly one can come up to speed and productively use it.

AAI has charts, searches, and analyses available both in Visual Studio and in the Azure Portal. When you need health alerts sent by email, long-term charts and graphs, and an easy-to-use query language to search through your health telemetry data, use the Azure Portal. Visual Studio’s Application Insights capabilities provide good performance- and usage-oriented charts (with drill-down capabilities) and searches during debug test runs, without leaving Visual Studio.

The following are examples of some of the basic Visual Studio AAI displays for performance analysis that can be had without much work on your part to write code that generates the telemetry data or displays it in a useful way.

Below, Figure 1 is an example of an AAI chart I’ve found highly useful in pinpointing the source of performance bottlenecks. This chart is available via the Visual Studio Application Insights toolbar by clicking the “Explore Telemetry Trends” menu item, which displays an empty chart. You then click the “Analyze Telemetry” button to generate the display. Note how you can set up the chart to display various “Telemetry Types”, “Time Ranges”, etc.

Figure 1 – Visual Studio’s Explore Telemetry Trends: The Analyze Telemetry display.

If you double click one of the blue dots in Figure 1, you start a “drill down” operation that opens the “Search” display shown below in Figure 2. This display lists all the individual measurements that were aggregated into the dot you double clicked. In a pane to the right (not shown) it lists the minute details of the item your cursor is on. Also note that you can use the check boxes to the left and above to further refine your search. Figure 2 below shows the drill down display you get from double clicking the 1 sec – 3 sec small blue dot at the Event Time of 4:48 in Figure 1.

Figure 2 – Visual Studio’s Explore Telemetry Trends: The Drilldown Search display.

The displays in Figures 1 and 2 show the aggregation and breakdown of the elapsed time it takes for a single WCF service to complete about 100 dequeue operations from an Azure Service Bus Queue using the NetMessagingBinding in ReceiveAndDelete mode. After the service dequeues a single item, it checks whether the item is valid and then saves it in Azure Table Storage. You can get a link to the service code from the blog article SO Apps 2, WcfNQueueSMEx2 – A System of Collaborating Microservices. That code does not contain the telemetry-generating code.
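To make the rest of the discussion concrete, here is a minimal sketch of the shape of such a service. The contract, class, and entity names (IWorkItemService, WorkItem, WorkItemEntity) are hypothetical and the connection string is a placeholder; the real code is linked above. The operation is one-way because NetMessagingBinding delivers dequeued messages to the service, and the Table Storage save is one of the external dependencies AAI measures automatically.

```csharp
using System.Runtime.Serialization;
using System.ServiceModel;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical names throughout; a sketch of the service shape described above.
[DataContract]
public class WorkItem
{
    [DataMember] public string Id { get; set; }
    [DataMember] public string Payload { get; set; }
}

// Maps a dequeued item onto an Azure Table Storage row.
public class WorkItemEntity : TableEntity
{
    public WorkItemEntity() { }
    public WorkItemEntity(WorkItem item)
    {
        PartitionKey = "WorkItems";
        RowKey = item.Id;
        Payload = item.Payload;
    }
    public string Payload { get; set; }
}

[ServiceContract]
public interface IWorkItemService
{
    // NetMessagingBinding (in ReceiveAndDelete mode) dispatches each dequeued
    // message to this one-way operation.
    [OperationContract(IsOneWay = true)]
    void SubmitWorkItem(WorkItem item);
}

public class WorkItemService : IWorkItemService
{
    public void SubmitWorkItem(WorkItem item)
    {
        // Validate the dequeued item; invalid items are simply dropped in this sketch.
        if (item == null || string.IsNullOrEmpty(item.Id))
            return;

        // Save the item to Azure Table Storage, the second "dependency" that
        // Application Insights measures automatically.
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true"); // placeholder connection string
        var table = account.CreateCloudTableClient().GetTableReference("WorkItems");
        table.CreateIfNotExists();
        table.Execute(TableOperation.Insert(new WorkItemEntity(item)));
    }
}
```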

Therefore, from the point of view of Application Insights there are a couple of relevant things to measure in this service:

  • The total elapsed time of the “request”, from the start of the service operation until it executes its return statement. This data is generated by a few lines of telemetry code that I had to write (a sketch appears after this list). The telemetry code uses TelemetryClient.Context.Operation, TelemetryClient.TrackRequest(), and TelemetryClient.Flush(), provided by the AAI DLLs added to the service project. These are described in Application Insights API for custom events and metrics in the “Track Request” section. The telemetry code also uses System.Diagnostics.Stopwatch to record the total elapsed time of a service operation.
  • The elapsed time it takes for each of the two “dependencies” (aka external service calls) to execute. The external dependencies are the Azure Service Bus and Azure Table Storage: one dependency is the Service Bus Dequeue operation, the other is the Table Storage Save operation. In both cases the dependency elapsed time is automatically measured by the Application Insights DLLs, and this data is automatically sent as telemetry as well. I did not have to write any code to support dependency analysis. All the work is done by the 5 or 6 Application Insights DLLs that are added to a service project via NuGet. Many of the “automatic” performance monitoring features require that the “Target Framework” of a service’s project be set to .NET 4.6.1; you can use lower versions as well, but you may not get as many automatic measurements. Note that many .NET Performance Counters are automatically generated and sent out as telemetry as well.
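Below is a minimal sketch of the kind of request-tracking code described in the first bullet, wrapping the hypothetical SubmitWorkItem operation from the earlier sketch. It uses only the API members named above (TelemetryClient.Context.Operation, TrackRequest(), Flush()) plus a Stopwatch; the operation name and response codes are illustrative assumptions, not the linked service’s actual code.

```csharp
using System;
using System.Diagnostics;
using Microsoft.ApplicationInsights;

public class WorkItemService : IWorkItemService
{
    // A single TelemetryClient per service instance is sufficient; it is thread-safe.
    private static readonly TelemetryClient Telemetry = new TelemetryClient();

    public void SubmitWorkItem(WorkItem item)
    {
        // Name the logical operation so the request is easy to find in the AAI displays.
        Telemetry.Context.Operation.Name = "SubmitWorkItem";

        var startTime = DateTimeOffset.UtcNow;
        var stopwatch = Stopwatch.StartNew();
        bool success = false;
        try
        {
            ProcessWorkItem(item); // validate the item and save it to Table Storage
            success = true;
        }
        finally
        {
            stopwatch.Stop();
            // Report the total elapsed time of the "request"; this is what Figure 1 aggregates.
            Telemetry.TrackRequest("SubmitWorkItem", startTime, stopwatch.Elapsed,
                                   success ? "200" : "500", success);
            Telemetry.Flush(); // push telemetry out promptly instead of waiting for the buffer
        }
    }

    private void ProcessWorkItem(WorkItem item)
    {
        // Validation and the Azure Table Storage save (the measured dependencies) go here.
    }
}
```

The Service Bus and Table Storage dependency timings from the second bullet require none of this; the Application Insights DLLs record them on their own.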

Figure 1 measures the first item, the total elapsed time of the request, from start to finish, including the elapsed time of any dependencies. Figure 1 shows two performance test runs – one at an Event Time of 4:23 and the other at an Event Time of 4:48. It is obvious that the run at 4:48 (at the right of the chart) had the vast majority of the service requests complete in <= 250 milliseconds. That is fast!

In the 4:23 run (at the left of the chart) the majority of the service requests took between 500 milliseconds and 1 second to complete. That is much longer. Why? The 4:23 run had the WCF service running on my development system, while the 4:48 run had the service running in an Azure WorkerRole. It is not surprising to see much faster elapsed times in the cloud, since the overall network latency is much, much lower there when the service does Service Bus and Table Storage operations. Plus, there is more CPU power available to the Azure-based service, since the WorkerRole host did not also have to run the test client. Both runs had the test client running on my desktop development system in my office, using a single thread enqueuing 100 items one after another.

Being able to quickly separate the execution time of the service code from the code it depends upon is key to rapidly pinpointing the source of performance problems. From Figure 2’s drill down display, which you get from double clicking the 1 sec – 3 sec small blue dot at the Event Time of 4:48, you can clearly see where the slowness in these two independent dequeue-and-save operations came from: one was entirely due to the slowness of the service code, while the other was largely due to the slowness of the Service Bus during that service operation.

Figure 3 below shows the drill down display you get from double clicking the 3 sec – 7 sec small blue dot at the Event Time of 4:48. Note that the source of slowness this time is NOT the Service Bus or Table Storage dependencies, but rather solely the service code itself. Perhaps there was some thread or resource contention going on between service instances here that deserves further investigation. AAI has capabilities to aid in pinpointing these sorts of things as well, but they are not covered here.

Figure 3 – Visual Studio’s Explore Telemetry Trends: Another Drilldown Search display.

The above displays (and more) are available in Visual Studio.  And you get even more displays and capabilities in the Azure Portal via the Application Insights resource that collects and analyzes the telemetry sent from services and clients. 

Please see the following links for more info on AAI:

Visual Studio Application Insights Preview

Application Insights – introduction

Application Insights for Azure Cloud Services

WCF Monitoring with Application Insights – With the DLL that comes with this, you do not have to write the request-tracking code yourself. It takes care of that for you, providing code-less performance telemetry data.

More telemetry from Application Insights

Application Insights API for custom events and metrics

I hope this introduction to Application Insights spurs you to further investigate its capabilities and how it can be useful to you.

George Stevens

Creative Commons License

dotnetsilverlightprism blog by George Stevens is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Based on a work at dotnetsilverlightprism.wordpress.com.
