Gaurav Mantri's Personal Blog.

Keeping Your Azure OpenAI App Running Smoothly During Service Interruptions

In this post, I’ll walk you through a simple yet effective approach we use at Purple Leaf to ensure our application stays online, even when the Azure OpenAI service faces throttling or downtime. By deploying Azure OpenAI in multiple regions and implementing a smart failover strategy, we’re able to provide a seamless experience for our users, regardless of unexpected service disruptions.

Here’s how it works:

1. Deploy Azure OpenAI Across Multiple Regions

We deploy the Azure OpenAI service in at least two regions — one as the primary and another as the secondary.
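
To keep the two regions interchangeable, it helps to keep their settings in configuration rather than hard-coding them. Here is a minimal sketch of what that might look like, assuming the values come from environment variables; the interface, variable names, and API version shown are my own illustrations, not part of our actual setup.

// Illustrative configuration shape; the environment variable names are placeholders.
interface AzureOpenAIRegionConfig {
  endpoint: string;
  apiKey: string;
  deployment: string;
  apiVersion: string;
}

const primaryConfig: AzureOpenAIRegionConfig = {
  endpoint: process.env.AZURE_OPENAI_PRIMARY_ENDPOINT ?? "",
  apiKey: process.env.AZURE_OPENAI_PRIMARY_API_KEY ?? "",
  deployment: process.env.AZURE_OPENAI_PRIMARY_DEPLOYMENT ?? "",
  apiVersion: "2024-06-01", // use the API version appropriate for your deployment
};

const secondaryConfig: AzureOpenAIRegionConfig = {
  endpoint: process.env.AZURE_OPENAI_SECONDARY_ENDPOINT ?? "",
  apiKey: process.env.AZURE_OPENAI_SECONDARY_API_KEY ?? "",
  deployment: process.env.AZURE_OPENAI_SECONDARY_DEPLOYMENT ?? "",
  apiVersion: "2024-06-01",
};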

2. Failover Logic

With both regions set up, our application needs a way to switch smoothly between them. For example, if the primary region fails to process a request, the system should automatically retry the request against the secondary region. This approach ensures that, even if one service instance is down or throttled, your application can continue running without significant interruptions, as sketched below.
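
To make the idea concrete before we get to the full implementation, here is a minimal, generic sketch of the pattern; the withFailover helper and the four-attempt limit shown here are illustrative choices, not a prescribed API.

// A generic sketch of the failover pattern: alternate between two callables,
// retrying up to a fixed number of attempts before giving up.
async function withFailover<T>(
  primary: () => Promise<T>,
  secondary: () => Promise<T>,
  maxAttempts: number = 4,
): Promise<T> {
  let usePrimary = true;
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await (usePrimary ? primary() : secondary());
    } catch (error) {
      lastError = error;
      usePrimary = !usePrimary; // switch regions on every retry
    }
  }
  throw lastError;
}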

By deploying Azure OpenAI service across multiple regions and integrating this failover strategy, we protect our application from unexpected downtime and keep our users happy.

In the next sections, I’ll provide sample code to demonstrate how to implement this approach effectively.

Code

import { AzureOpenAI } from "openai";

export default class AzureOpenAIHelper {
  private readonly _azureOpenAIPrimaryClient: AzureOpenAI;
  private readonly _azureOpenAISecondaryClient: AzureOpenAI;

  constructor() {
    // Each client targets an Azure OpenAI deployment in a different region.
    this._azureOpenAIPrimaryClient = new AzureOpenAI({
      endpoint: "primary-end-point",
      apiKey: "primary-end-point-api-key",
      deployment: "primary-deployment-id",
      apiVersion: "primary-api-version",
    });
    this._azureOpenAISecondaryClient = new AzureOpenAI({
      endpoint: "secondary-end-point",
      apiKey: "secondary-end-point-api-key",
      deployment: "secondary-deployment-id",
      apiVersion: "secondary-api-version",
    });
  }

  async processChatRequest(
    systemMessage: string,
    userMessage: string,
    isPrimary: boolean = true,
    attempt: number = 1,
  ) {
    try {
      // Pick the client for the region we are targeting on this attempt.
      const client = isPrimary
        ? this._azureOpenAIPrimaryClient
        : this._azureOpenAISecondaryClient;
      const messages = [
        { role: "system" as const, content: systemMessage },
        { role: "user" as const, content: userMessage },
      ];
      const result = await client.chat.completions.create({
        messages,
        // The deployment configured on the client determines which model
        // serves the request; the SDK still requires the model property.
        model: "",
        response_format: { type: "text" },
      });
      return result;
    } catch (error: unknown) {
      // On failure, retry against the other region, up to 4 attempts in total.
      if (attempt < 4) {
        return await this.processChatRequest(
          systemMessage,
          userMessage,
          !isPrimary,
          attempt + 1,
        );
      }
      throw error;
    }
  }
}

In the class constructor, we create client instances for both the primary and secondary Azure OpenAI services.

The processChatRequest method makes the call to the Azure OpenAI service. By default, this method connects to the primary service.

If the call fails for any reason, we execute the same method again, but this time we connect to the secondary service. If that call fails as well, we fall back to the primary, and so on.

We do this a limited number of times (four attempts in total) before we give up and rethrow the exception.
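
For completeness, here is how the helper might be invoked; the system and user messages below are just illustrative placeholders.

// Hypothetical usage of the helper class shown above.
const helper = new AzureOpenAIHelper();
const completion = await helper.processChatRequest(
  "You are a helpful assistant.",
  "Write a one-line status update for our users.",
);
console.log(completion.choices[0]?.message?.content);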

Summary

When working with services like Azure OpenAI, throttling and downtime can potentially bring your entire application down. That’s why having a plan for redundancy isn’t just nice to have — it’s essential.

By deploying Azure OpenAI in multiple regions and using a simple failover strategy, you’re not only adding a layer of resilience but also ensuring your app keeps running smoothly, even when things go wrong.

This setup has been super helpful for us at Purple Leaf, enabling us to deliver a stable, reliable experience without leaving our users hanging.

I hope you have found this blog post useful. Please do share your thoughts on how you are handling Azure OpenAI service issues.
