Monday, February 29, 2016

Microsoft 70-486: Debug an Azure application

Exam Objectives


Collect diagnostic information by using Azure Diagnostics API and appropriately implement on demand versus scheduled; choose log types (for example, event logs, performance counters, and crash dumps); debug an Azure application by using IntelliTrace, Remote Desktop Protocol (RDP), and remote debugging; interact directly with remote Azure websites using Server Explorer.

Quick Overview of Training Materials


Cloud Diagnostics - Take Control of Logging and Tracing in Windows Azure

This post has about 4 billion screenshots, so I apologize in advance...

First, a word of warning


I created a few cloud services in my Azure account for the purposes of writing this post, and when I was done I left one of them up.  Big mistake.  Burned through a good chunk of my monthly credit, so now I'm on life support until next month's credit comes.  So, just a friendly reminder to clean up the mess if you try any of this....  ᕙ(⇀‸↼‶)ᕗ


Collect diagnostic information (MSDN version)


In classic fashion, the Exam Ref gives a very high-level summary of this topic without any real code to play with.  So I turned to the Microsoft article Enabling Diagnostics in Azure Cloud Services and Virtual Machines, which provides a detailed, step-by-step walkthrough on how to actually do this.  What a concept.  Though to be fair, the comments aren't kind to that walkthrough either, but it's still better than nothing, which is what the Exam Ref has to offer here...

So right out of the gate I stumbled a bit on this one.  Installed the Azure SDK (I'm using version 2.8 as of this writing), created the Azure Cloud Service, created the worker role... then I got stuck on the configuration step.  The instructions say "In Solution Explorer, double-click on the WorkerRole1 properties file."  Ok, simple enough.  Now go to the "Configuration" tab... hmm, seeeeeem to have a problem here:

What configuration tab???

After a bit of sleuthing I discovered my mistake...

Oh... THAT tab...

So, first gotcha... the diagnostic configuration properties are in the cloud service project, NOT the worker role project itself.  'Course, I went through all the trouble of figuring that out only to notice after the fact that the "disable diagnostics" step only applies if you are on Azure SDK 2.4 or earlier... *facepalm*... moving on.

Walked through the steps for deploying the worker role, got as far as this screen, and all seemed well...


Brief hiccup (didn't like the storage account, probably just a comm failure), republished and everything started as expected.  Below are the properties for the service (be sure to change to the "staging" slot):



The next couple of steps involved doing some PowerShell, which I found rather annoying (the commenters on the article weren't too happy about it either).  I had to install Azure PowerShell using the Web Platform Installer (thankfully the article links to another article with a link to the installer).  The real fun was Step 5, which requires creating a PowerShell script to install diagnostics on the worker role.  For both of these, in case you are a bit rusty on your Azure PowerShelling... you have to do a couple of preliminary steps:
  • Add-AzureAccount
    • Lets you log in to your Microsoft account
  • Get-AzureSubscription
    • Gets you all the metadata about your subscriptions... If you have more than one subscription you'll need the name to select which one to use
  • Select-AzureSubscription
    • If you have multiple, this picks which one to use
It took me a couple of iterations of copy-paste corrections to get the PS script just right, but once it ran, I got the expected results.  I can see the log files through the management portal, but looking at the contents requires Server Explorer in Visual Studio:






Collect diagnostic information (Built-in way)


Based on all the bellyaching and hand-wringing over the MSDN article related to installing and using the new version of Azure Diagnostics, I did some more poking around and found a different article: Microsoft Azure Diagnostics.  A quick glance through it looked promising, as there was no "disable this, then install that, then fire up PowerShell" kind of hacky shenanigans going on.  So I figured I would work through that one, if for no other reason than to get a different perspective.  The article is from 2014, so it's a little older, and it shows its age early.

Part 1 is basically a conceptual introduction, and Part 2 actually includes some code.  I dropped the first chunk of code into a fresh Azure Cloud Service project, and immediately ran into a bit of a problem...



The DiagnosticMonitor class was deprecated.  The MSDN article confirmed that this was the case as of Azure Diagnostics 1.3 (the version we painstakingly installed above), in favor of configuration files.  What I found interesting is that I didn't have to go through the hoops described in the other article, so I wonder if you only have to install it once... hmm...

Since it's deprecated and the article's author argues pretty heavily against it anyway, I didn't spend any more time on "imperative" (programmatic) configuration.
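
For flavor, the imperative pattern that the deprecation warning is complaining about looks roughly like this (a sketch of the classic Microsoft.WindowsAzure.Diagnostics usage from a role's OnStart; I didn't run this, and the article's actual snippet may differ):

public override bool OnStart()
{
    // Start from the default diagnostic configuration for this role...
    var config = DiagnosticMonitor.GetDefaultInitialConfiguration();

    // ...turn up what gets captured and how often it gets shipped to storage...
    config.Logs.ScheduledTransferLogLevelFilter = LogLevel.Information;
    config.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);

    // ...then start the monitor against the diagnostics connection string from the role settings
    DiagnosticMonitor.Start("Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);

    return base.OnStart();
}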

The configuration file method (declarative style) was actually very easy to understand and implement.  For a worker role, simply put the file "diagnostics.wadcfg" in the root of the WorkerRole project.  It can also be configured in the Cloud Service project by going into the "Roles" folder and changing the role properties.  I threw in a bit of code from the Microsoft example to make sure it was generating events, and published to Azure.

Moments later this completed successfully

I threw my code onto GitHub: https://github.com/sabotuer99/AzureDiagnosticsExample.  You'll just need to insert your own Subscription ID, storage account name (instead of "deamontest"), and storage account key.  It worked for me right out of the gate; moments after the role started, the logging tables appeared in the storage account and I could browse them from Server Explorer in Visual Studio just as with the Microsoft example:



On Demand vs Scheduled


Scheduled transfers of log data are set up as part of the diagnostic configuration.


The transfer intervals can also be set directly in the configuration files.  Intervals are specified using ISO 8601 notation (credit to the Simple Talk article for this one).  Here is a code snippet from the above example:

  <PerformanceCounters bufferQuotaInMB="512" scheduledTransferPeriod="PT0M">
    <PerformanceCounterConfiguration 
      counterSpecifier="\Memory\Available MBytes" sampleRate="PT3M" />
    <PerformanceCounterConfiguration 
      counterSpecifier="\ASP.NET\Requests Queued" sampleRate="PT3M"/>
    <PerformanceCounterConfiguration 
      counterSpecifier="\ASP.NET\Requests Rejected" sampleRate="PT3M"/>
  </PerformanceCounters>

"P" means a period of time, "T" means a time component, and "0M" means zero minutes.  So the data collected by the performance counters is transfered immediately.  We can see that the performance counter data is sampled every three minutes based on the "PT3M" designation.

"On Demand" transfers, on the other hand, are less well defined.  The Exam Ref sites programmatic methods but doesn't give any examples.  The article on Storing diagnostic data makes passing mention that transfers can be initialized programmatically for SDK 2.4 and earlier, but seems to imply that only the configuration (scheduled) method is supported after this version.  If programmatic transfers require the DiagnosticMonitor class this would make sense (since it was deprecated).

Part 3 of the Microsoft Azure Diagnostics articles on the Simple Talk blog covers three different ways that diagnostic configuration can be changed remotely, the first of which is a code-based method.  The author created a console application that utilizes several classes from the Microsoft.WindowsAzure.Diagnostics namespace (DeploymentDiagnosticManager, PerformanceCounterConfiguration, BasicLogsBufferConfiguration, etc.), and all of them are currently marked as deprecated.

The second method mentioned is using Server Explorer in Visual Studio to connect to the role:


The last method the article mentions is third party tools, which I'm not going in to here.


Choose log types


Choosing the types and granularity of diagnostic data your application collects is going to depend on your particular circumstances.  The value of the information that can be gleaned from this data must be balanced with performance and resource considerations.  At the most verbose logging levels, the volume of data can be both expensive to store and difficult to interpret.

Logging data can come from a number of different sources.  I'll look at Application logs, Windows Event logs, Performance Counters, Infrastructure logs, Event Tracing for Windows (ETW) logs, as well as crash dumps and other file-based logs.  Application, Windows Event, and Infrastructure logs all have the same log levels available:

  • Critical (1)
  • Error (2)
  • Warning (3)
  • Information (4)
  • Verbose (5)


Application Logs


The application logs are written by the trace listener active in the application.  The Cloud Service example project uses a template that automatically configures the DiagnosticMonitorTraceListener class to listen for trace events.  This is the app.config file in the WorkerRole:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <system.diagnostics>
    <trace>
      <listeners>
       <add type="Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener, 
                  Microsoft.WindowsAzure.Diagnostics, Version=2.8.0.0, Culture=neutral, 
                  PublicKeyToken=31bf3856ad364e35"
            name="AzureDiagnostics">
        </add>
      </listeners>
    </trace>
  </system.diagnostics>
</configuration>

Calls to System.Diagnostics.Trace will populate this log.  This log is ultimately written to a table called WADLogsTable.  In the example app, a trace message is emitted when the role instance is created, and every time the loop is executed:

public override void Run()
{
    // This is a sample worker implementation. Replace with your logic.
    Trace.TraceInformation("WorkerRole1 entry point called");

    int value = 0;

    while (true)
    {
        Thread.Sleep(10000);
        Trace.TraceInformation("Working");
        // ... (rest of the sample loop omitted in this snippet)
    }
}

When I opened the table up in Server Explorer there were something like 1000+ pages of results, so you'll probably want to be judicious in what you save to the tracing logs.  Between this and the copious volumes of ETW logs generated by the example, I believe I maxed out my configured quota for table size (something like 2.7 GB).  So it's easy for logging data to get stupid big really fast.



Windows Event Logs


Windows events are emitted by the operating system.  Locally, these can be browsed using the Windows Event Viewer.  The following image illustrates the relationship between these local events and the types that can be configured through Visual Studio (the Windows Azure channel obviously doesn't have a local analog):


The example application produced a handful of level 2 (Error) events from the Application Channel (exceptions, DLL load failures), while the Windows Azure channel produced some level 4 (Information) events related to role life cycle.



Performance Counters


The Azure documentation contains an entire article on creating and using performance counters.  Performance counters track the health of the application and the VM it lives on: information such as CPU usage, memory availability, number of successful and failed requests, etc.  Unlike some of the other logging data, there are no logging levels for performance counters.  It is simply a matter of selecting the resources you care about (or that are acting up).  This screenshot shows the resources that can be tracked:


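As an aside (this isn't how WAD collects them, just a way to see what a counterSpecifier maps to), the same counters can be read locally with System.Diagnostics.PerformanceCounter; the specifier "\Memory\Available MBytes" is just the category and counter name glued together:

  // Read the "\Memory\Available MBytes" counter directly
  // ("Memory" is a single-instance category, so no instance name is needed)
  using (var available = new System.Diagnostics.PerformanceCounter("Memory", "Available MBytes"))
  {
      Console.WriteLine("Available MBytes: {0}", available.NextValue());
  }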


Infrastructure Logs


From MSDN: ...logs generated by the underlying diagnostics infrastructure. The diagnostic infrastructure logs are useful for troubleshooting the diagnostics system itself. 

Log messages from the Diagnostic Infrastructure are saved to the table WADDiagnosticInfrastructureLogsTable.  My sample app produced a few, most of which were complaints about a counter not being found, plus a couple related to storage quotas.  I think these storage quota errors may have been pointing me to a genuine problem that was preventing new Cloud Services from saving any diagnostic data at all... but it may have just been passing wonkiness.  Hard to say.



Log Directories


The "Log Directories" tab on the diagnostics configuration allows you to copy the contents of logging directories to cloud storage.  IIS and Failed Request Logs are generated by IIS, and crash dumps also appear in this category in some screens (though in the configuration screen, crash dumps get their own tab).  Here is an example of some IIS logging:


If I had any idea how to force some failed requests I'd try that but alas, this will have to do...



ETW Logs


"ETW" is Event Tracing for Windows.  The MSDN article on Cloud Diagnostics provides a good overview of how it works.  It provides a vehicle for logging and tracing for Windows applications.

In the sample application, a simple EventSource is implemented that emits several kinds of events:

// The EventSource base class lives in System.Diagnostics.Tracing
// (or the Microsoft.Diagnostics.Tracing NuGet package on older frameworks).
sealed class SampleEventSourceWriter : EventSource
{
    public static SampleEventSourceWriter Log = new SampleEventSourceWriter();

    public void SendEnums(MyColor color, MyFlags flags)
    {
        if (IsEnabled())
            WriteEvent(1, (int)color, (int)flags);  // Cast enums to int for efficient logging.
    }

    public void MessageMethod(string Message)
    {
        if (IsEnabled())
            WriteEvent(2, Message);
    }

    public void SetOther(bool flag, int myInt)
    {
        if (IsEnabled())
            WriteEvent(3, flag, myInt);
    }

    public void HighFreq(int value)
    {
        if (IsEnabled())
            WriteEvent(4, value);
    }
}
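
To give a feel for how the worker exercises this, the calls from the Run loop look something like the following (my own condensed sketch, not the article's exact code; I've skipped SendEnums since it needs the MyColor/MyFlags enums from the Microsoft sample):

  // Inside the while loop in WorkerRole1.Run(), alongside the Trace calls
  SampleEventSourceWriter.Log.MessageMethod("Working");   // event id 2
  SampleEventSourceWriter.Log.SetOther(true, value);      // event id 3
  SampleEventSourceWriter.Log.HighFreq(value++);          // event id 4 - each id maps to a destination table in WadExample.xml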

The application can call these methods, which then saves events to several corresponding tables.  The exact dance that has to take place to make this all work isn't 100% clear to me, but I do know that it wasn't enough just to have the WadExample.xml in the worker role when I redeployed the app after monkeying around in the diagnostics config file.  I had to re-enable transfer of ETW logs in the Role's config (via Server Explorer):


The events generated were stored to tables named according to the configuration in the WadExample.xml file:


The "id" attribute in the <Event> element corresponds to the value passed as the first argument to WriteEvent in the example event writer class.



Crash Dumps


Dump files are snapshots of the app at the time the dump was created.  They can include what processes were running, what modules were loaded, and what was contained in the heap (memory).  The MSDN article on the subject compares opening a dump file in Visual Studio to stopping at a breakpoint in the debugger (without the option to continue execution).  Such a snapshot of the application at the time of a crash has obvious value for determining the cause.  If enabled, crash dump files will be saved to a container called "wad-crashdumps" on the cloud storage instance being used to save diagnostic data.
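
For what it's worth, on the older SDKs crash dump collection could also be switched on in code, using the same classic API family that's now deprecated; with the current tooling you'd just use the Crash Dumps tab instead.  A sketch (not something I ran for this post):

  // Classic opt-in to crash dump collection, typically called from a role's OnStart
  // (true = full dumps, false = mini dumps)
  Microsoft.WindowsAzure.Diagnostics.CrashDumps.EnableCollection(true);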

As an experiment, I added the following method onto the Home controller for an MVC WebRole:

public void StackOverflow()
{
    var hurryup = "<4KB OF RANDOM STRING DATA>";   // placeholder for a big literal string
    hurryup = hurryup.ToLower();                   // touch the string so it's actually used
    StackOverflow();                               // recurse forever; the uncatchable StackOverflowException kills the process and produces a dump
}

After waiting a friggin half an hour for the redeployment to finish, I blew it all up.  Navigated to example.cloudapp.net/Home/StackOverflow... waited a bit... then just got a "Connection Reset" error.  At first I wasn't sure if it even worked as intended... did the deployment finish completely?  I checked the diagnostic data and was not disappointed:


The infinite recursive nastiness not only created some crash dumps, but a few Windows Events as well (which were appropriately cryptic and unhelpful).  Visual Studio allowed me to download and examine the dump files, noting that "The thread used up its stack." Now that's what I'm talkin bout!




Debug an Azure application


In order to debug a deployed application remotely, it has to be configured to allow that.  This is set up as part of the publishing settings:


The next step is to attach a debugger to the Role instance.  This is done through server explorer:


Now you can set breakpoints in Visual Studio, and hit code running in Azure.  Here I hit the "Contact" method on the home controller and the breakpoint caught it:


Microsoft's article on remote debugging cautions against staying on a breakpoint for too long, as it can result in Azure shutting down the Role instance (thinking it has hung).  While the article makes it sound like the debugger will attach to the right process automatically, in practice you have to pick the right process yourself.  When I attached the debugger to WaIISHost.exe, it loaded the symbols for my app (so my breakpoint looked "live"), but when I hit the controller methods, nothing happened.  When I reattached to w3wp.exe, the breakpoint stopped execution as expected.  For a Worker Role, attach to WaWorkerHost.exe.  The example app was easy enough to test since it ran through the loop every ten seconds:


While not required, it would probably be a good idea to use the "Staging" environment for a real application.  Running with the debugger enabled can incur a performance hit, never mind stopping execution at breakpoints and such.

The Exam Ref and Microsoft's documentation indicate that it is also possible to download IntelliTrace logs from each role instance, but despite apparently having the correct configuration, I didn't seem to see the option to download the logs in Server Explorer.... scratch that, you have to go alllll the way down to the instance to get the option to download IntelliTrace logs:


IntelliTrace logs can be collected and downloaded even with Diagnostics disabled (a fact I stumbled on right as I found where to download them lol).  When you trigger an IntelliTrace log download, the logs are copied to the cloud storage account specified in the publish settings (along with all the rest of the diagnostic data), into a Blob container called "vsdiagnostics".  In my logs you can see a couple exceptions I threw from a test method on the home controller (all it does is throw this exception):


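For the record, the test action was nothing fancier than a throwaway controller method along these lines (the name and exception here are made-up stand-ins for what's in the screenshot):

  // Hypothetical stand-in for my test method; it exists purely to throw so the
  // exception shows up in the IntelliTrace log
  public ActionResult Boom()
  {
      throw new InvalidOperationException("IntelliTrace test exception");
  }
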
It is also possible to connect to the Role instance via Remote Desktop (from the same right-click menu we download the IntelliTrace logs from).  Of course, I hadn't actually enabled Remote Desktop connections in the publish settings, so when I first tried this I got an error.  After enabling it (and setting a username, password, and account expiration) and redeploying the app (and the requisite half an hour wait... uuugggghhhhh...), I was... still unable to successfully connect.  If I cared I might beat this part to death, but since all the logs from the local instance are transferred to cloud storage anyway, I see this as a pretty low value proposition.  If it ever proves to be absolutely vital to something, I'll figure it out and share.  Moving on...


Interacting with Server Explorer


I touched on this a bit already, and nearly omitted a stand-alone section on it, but it's worth a quick visit.  Using Server Explorer to interact with the application instances allows you to manipulate the level of diagnostics you are capturing without having to redeploy the code or even restart instances.  Diagnostic data can be viewed by navigating to the instance:

Cloud Services -> {SERVICE NAME} -> Production/Staging -> {ROLE NAME} -> {INSTANCE}

... probably easier to see on a screenshot:


This allows you to easily explore some of the blob data from IIS and failed requests, the ETW, Application, Infrastructure, and Event logs, and the crash dumps.  As you can see, it is also possible to update or disable diagnostics.  I tried updating the Application logs for my WorkerRole in this example, which had been emitting a Warning, Error, and Information trace on each iteration of the run loop.  After setting the log level to "Error", only the Error messages were saved:



It's worth noting that the diagnostic data shown on the summary page is filtered to the "Error" level, even though the data actually collected may be at a more verbose level.
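
For reference, the per-iteration traces mentioned above were just the standard System.Diagnostics.Trace calls at the three severities (the messages here are placeholders, not my actual ones):

  Trace.TraceError("something actually broke");       // Error (2) - still saved after the filter change
  Trace.TraceWarning("something looks off");          // Warning (3) - filtered out
  Trace.TraceInformation("just passing through");     // Information (4) - filtered out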

