Identifying a Rogue Azure Web Role Instance

When dealing with a web “farm” situation, there are various reasons a developer needs to identify which machine responded to a request, especially when dealing with particularly nefarious bugs.

Recently, an Azure production application I was monitoring was intermittently showing stale data as users navigated the website. I began to suspect that a single instance of the application was serving from a cache that had not been updated properly. I needed to know which Web Role Instance was causing the problem so I could restart it. Obviously, there was a bug that would need to be tracked down later, but the immediate need was to stop the problem.

For lack of other information, I had to restart each instance of the web role individually, waiting for it to come back up before moving on to the next. I couldn’t trust the situation until every single instance had been restarted.

I eventually found that bug and fixed it, but I wanted to mitigate this type of situation in the future. At first, I thought about adding an additional “standard” field to our JSON structures that showed which role instance handled the request, but I realized that wouldn’t help us with a regular web page request or a failed WebAPI call. To cover every kind of HTTP request, I chose to add an HTTP header called “Azure-WebRole-Instance-ID” to every web response. This way, we’re covered in every scenario, since all the calls are HTTP calls.

I wrote a simple HttpModule to add this header.  The code, in its entirety, follows:

using System.Web;
using Microsoft.WindowsAzure.ServiceRuntime;

namespace AppliedIS.Web.Modules
{
    /// <summary>
    /// Append "Azure-WebRole-Instance-ID" HTTP Header to all responses.
    /// </summary>
    public class WebRoleInfoModule : IHttpModule
    {
        static bool _isAzure;
        static string _instanceID;

        static WebRoleInfoModule()
        {
            // Resolve the instance ID once per process.
            _isAzure = RoleEnvironment.IsAvailable;
            if (_isAzure)
            {
                _instanceID = RoleEnvironment.CurrentRoleInstance.Id;
            }
        }

        public void Init(HttpApplication context)
        {
            if (_isAzure)
            {
                // Runs after the handler for each request has executed.
                context.PostRequestHandlerExecute += (sender, e) =>
                {
                    HttpContext httpContext = ((HttpApplication)sender).Context;
                    HttpResponse response = httpContext.Response;
                    response.AppendHeader("Azure-WebRole-Instance-ID", _instanceID);
                };
            }
        }

        public void Dispose() { /* Not needed */ }
    }
}

Adding this module to a project is simple. Register it in your web.config; for the IIS integrated pipeline, that means inside the <modules> element under <system.webServer>:

<add name="WebRoleInfo"
     type="AppliedIS.Web.Modules.WebRoleInfoModule, AppliedIS.Web" />

Since the code also checks to make sure that we’re running in Azure, this won’t adversely affect the application when it’s running in a non-Azure environment.

With the module in place, the header shows up among the response headers in the Chrome developer tools (Network tab).
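
The header can also be read from code, which is handy when you want to sample many responses rather than click through a browser. What follows is only a rough sketch, not part of the original module: the InstanceProbe class, its method names, and the "(no header)" placeholder are all hypothetical, and it assumes the header is named "Azure-WebRole-Instance-ID" as in the module's doc comment. It also assumes that asking for a new connection on each request gives the load balancer a chance to route to a different instance, which may not hold for every configuration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of the original module.
public static class InstanceProbe
{
    // Assumed to match the header name the module emits.
    public const string HeaderName = "Azure-WebRole-Instance-ID";

    // Returns the instance ID from a response, or null if the header is absent.
    public static string ReadInstanceId(HttpResponseMessage response)
    {
        return response.Headers.TryGetValues(HeaderName, out IEnumerable<string> values)
            ? values.FirstOrDefault()
            : null;
    }

    // Sends several GET requests and counts how often each instance answered.
    public static async Task<Dictionary<string, int>> ProbeAsync(string url, int requests)
    {
        var counts = new Dictionary<string, int>();
        using (var client = new HttpClient())
        {
            // Assumption: closing the connection after each request lets the
            // load balancer pick an instance again for the next one.
            client.DefaultRequestHeaders.ConnectionClose = true;

            for (int i = 0; i < requests; i++)
            {
                using (HttpResponseMessage response = await client.GetAsync(url))
                {
                    string id = ReadInstanceId(response) ?? "(no header)";
                    counts.TryGetValue(id, out int seen);
                    counts[id] = seen + 1;
                }
            }
        }
        return counts;
    }
}
```

Calling InstanceProbe.ProbeAsync with your site's URL and a request count of, say, 20 returns a tally per instance ID. To pin down a misbehaving instance, you would also record something from each response body that reveals stale data and see which ID it lines up with.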

Now you can track down that rogue instance and keep your production sites running properly. After that, you can go find the real problem with proper instrumentation of your code, but that’s a whole separate blog post!