Smart To-Do Creator: Combining the ease of Azure Logic Apps and the power of Generative AI

In this post, I am going to talk about how I built a smart to-do creator using Azure Logic Apps and Generative AI (Azure OpenAI Service). I recently took a course on LinkedIn Learning about Azure Logic Apps (called Azure Logic Apps – Building solutions for data and integration) and thought I would put my learning to good use by building something small yet useful (the best way to learn something new is by building with it, right?), and that’s how I ended up creating this tool.

What is Smart To-Do Creator?

Simply put, this tool creates tasks out of your emails. It reads the contents of an email, infers whether the sender is asking you to do something, creates a corresponding task, and saves it in the Microsoft To-Do application.

Architecture

The architecture for this application is fairly simple and is shown in the picture below.

It makes use of Azure Logic Apps and Azure OpenAI Service.

The Azure Logic App is connected to your Office 365 account using the Outlook connector. As soon as an email arrives, the workflow extracts metadata about the email (like sender, subject and body) and sends it to an HTTP connector.

The HTTP connector calls an HTTP-triggered Azure Function. When triggered, the function sends the input data to Azure OpenAI and asks it to understand the email and determine whether a task can be created from it.

It relies on a Large Language Model’s (LLM) text comprehension and reasoning capabilities. Through careful prompt engineering, it outputs a JSON object containing the task details (like the task subject, description and due date), which is then fed to a To-Do connector.

The To-Do connector takes this data and creates a task for the user.

Test

To test it, I sent myself an email asking me to get some stuff (for me 😂) during my upcoming trip to India, and it worked surprisingly well. When the workflow finished, I had a task assigned to me with a meaningful title, a concise description of the task and an expected end date (even though I did not specify an exact date).

Pretty neat, huh!

Code

I stitched together the whole solution in less than 4 hours (of which I spent about an hour getting the Logic App service configured correctly). I am pretty sure this code can be improved considerably, but I am sharing it nonetheless.

Workflow Code

{
    "definition": {
        "$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
        "actions": {
            "Add_a_to-do_(V3)": {
                "inputs": {
                    "body": {
                        "body": {
                            "content": "<p>@{body('Parse_JSON')['task']}</p>",
                            "contentType": "html"
                        },
                        "dueDateTime": {
                            "dateTime": "@body('Parse_JSON')['dueDate']",
                            "timeZone": "UTC"
                        },
                        "reminderDateTime": {
                            "timeZone": "UTC"
                        },
                        "title": "@body('Parse_JSON')['title']"
                    },
                    "host": {
                        "connection": {
                            "referenceName": "todo"
                        }
                    },
                    "method": "post",
                    "path": "/lists/@{encodeURIComponent('Some Base64 encoded string')}/tasks"
                },
                "runAfter": {
                    "Parse_JSON": [
                        "SUCCEEDED"
                    ]
                },
                "type": "ApiConnection"
            },
            "HTTP": {
                "inputs": {
                    "body": {
                        "body": "@{triggerBody()}",
                        "from": "@{triggerBody()?['from']}",
                        "subject": "@{triggerBody()?['subject']}"
                    },
                    "method": "POST",
                    "uri": "https://myfunctionapp.azurewebsites.net/api/HttpTrigger1"
                },
                "runAfter": {},
                "runtimeConfiguration": {
                    "contentTransfer": {
                        "transferMode": "Chunked"
                    }
                },
                "type": "Http"
            },
            "Parse_JSON": {
                "inputs": {
                    "content": "@body('HTTP')",
                    "schema": {
                        "$schema": "http://json-schema.org/draft-04/schema#",
                        "properties": {
                            "dueDate": {
                                "type": "string"
                            },
                            "task": {
                                "type": "string"
                            },
                            "title": {
                                "type": "string"
                            }
                        },
                        "required": [
                            "title",
                            "task",
                            "dueDate"
                        ],
                        "type": "object"
                    }
                },
                "runAfter": {
                    "HTTP": [
                        "SUCCEEDED"
                    ]
                },
                "type": "ParseJson"
            }
        },
        "contentVersion": "1.0.0.0",
        "outputs": {},
        "triggers": {
            "When_a_new_email_arrives_(V3)": {
                "inputs": {
                    "fetch": {
                        "method": "get",
                        "pathTemplate": {
                            "template": "/v3/Mail/OnNewEmail"
                        },
                        "queries": {
                            "fetchOnlyWithAttachment": false,
                            "folderPath": "Inbox",
                            "from": "email@domain.com",
                            "importance": "Any",
                            "includeAttachments": false
                        }
                    },
                    "host": {
                        "connection": {
                            "referenceName": "office365"
                        }
                    },
                    "subscribe": {
                        "body": {
                            "NotificationUrl": "@{listCallbackUrl()}"
                        },
                        "method": "post",
                        "pathTemplate": {
                            "template": "/GraphMailSubscriptionPoke/$subscriptions"
                        },
                        "queries": {
                            "fetchOnlyWithAttachment": false,
                            "folderPath": "Inbox",
                            "importance": "Any"
                        }
                    }
                },
                "splitOn": "@triggerBody()?['value']",
                "type": "ApiConnectionNotification"
            }
        }
    },
    "kind": "Stateful"
}

Function Code

Here’s the code for Azure Function. It’s really crappy code so please do not use it as is 😀.

using System.Collections.Generic;
using System.Net;
using Azure;
using Azure.AI.OpenAI;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Azure.Functions.Worker.Http;
using Microsoft.Extensions.Logging;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using Microsoft.SemanticKernel.PromptTemplates.Handlebars;
using Newtonsoft.Json;

namespace ToDoListFunctionApp;

public static class HttpTrigger1
{
    // NOTE: For simplicity, the endpoint, key and deployment are hardcoded here.
    // In a real application, read these from configuration or Azure Key Vault.
    private const string _azureOpenAIEndpoint = "https://xxx.openai.azure.com/";
    private const string _azureOpenAIKey = "aff9ad587352c904832fe6ed932ab30f";
    private const string _azureOpenAIDeploymentId = "gpt-4-32k";
    
    [Function("HttpTrigger1")]
    public static async Task<HttpResponseData> Run([HttpTrigger(AuthorizationLevel.Function, "get", "post")] HttpRequestData req,
        FunctionContext executionContext)
    {
        var logger = executionContext.GetLogger("HttpTrigger1");
        logger.LogInformation("C# HTTP trigger function processed a request.");

        IncomingMessage message = null;
        using (var streamReader = new StreamReader(req.Body))
        {
            var requestBody = await streamReader.ReadToEndAsync();
            message = JsonConvert.DeserializeObject<IncomingMessage>(requestBody);
        }

        // Guard against an empty or malformed request body (message will be null if
        // deserialization fails or the body is empty).
        if (message == null || string.IsNullOrWhiteSpace(message.From) ||
            string.IsNullOrWhiteSpace(message.Subject) || string.IsNullOrWhiteSpace(message.Body))
        {
            throw new InvalidOperationException("Request must contain 'from', 'subject' and 'body'.");
        }

        var kernel = GetKernel();
        var path = Path.Combine(Directory.GetCurrentDirectory(), "Prompt.yaml");
        var function = kernel.CreateFunctionFromPromptYaml(await File.ReadAllTextAsync(path),
            new HandlebarsPromptTemplateFactory());
        var openAIPromptSettings = new OpenAIPromptExecutionSettings()
        {
            Temperature = 0
        };        
        var kernelArguments = new KernelArguments(openAIPromptSettings)
        {
            ["current_date"] = DateTime.UtcNow.Date.ToString("yyyy-MM-ddTHH:mm:ssZ"),
            ["sender"] = message.From,
            ["subject"] = message.Subject,
            ["body"] = message.Body,
        };
        var result = (await kernel.InvokeAsync(function, kernelArguments)).ToString();
        var response = req.CreateResponse(HttpStatusCode.OK);
        response.Headers.Add("Content-Type", "text/plain; charset=utf-8");

        response.WriteString(result);

        return response;
        
    }
    
    private static Kernel GetKernel()
    {
        var azureOpenAIClient =
            new OpenAIClient(new Uri(_azureOpenAIEndpoint), new AzureKeyCredential(_azureOpenAIKey));
        var kernelBuilder = Kernel.CreateBuilder();
        kernelBuilder.AddAzureOpenAIChatCompletion(_azureOpenAIDeploymentId, azureOpenAIClient);
        var kernel = kernelBuilder.Build();
        return kernel;
    }
}

class IncomingMessage
{
    [JsonProperty(PropertyName = "from")]
    public string From { get; set; }

    [JsonProperty(PropertyName = "subject")]
    public string Subject { get; set; }
    
    [JsonProperty(PropertyName = "body")]
    public string Body { get; set; }
}

class TaskDetails
{
    [JsonProperty(PropertyName = "title")]
    public string Title { get; set; }
    
    [JsonProperty(PropertyName = "task")]
    public string Task { get; set; }
    
    [JsonProperty(PropertyName = "dueDate")]
    public string DueDate { get; set; }
}

It makes use of Microsoft Semantic Kernel.
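
One thing worth pointing out: the TaskDetails class above is declared but never actually used; the function returns the raw LLM output as-is. A small improvement would be to deserialize and validate that output before handing it back to the Logic App. Here is a rough sketch of what the tail end of the Run method could look like; the 204 response and the validation rules are my assumptions, not part of the original code.

// Rough sketch (assumption, not the original code): validate the LLM output using the
// TaskDetails class before returning it, and signal "no task" with a 204 No Content.
var result = (await kernel.InvokeAsync(function, kernelArguments)).ToString();

TaskDetails taskDetails = null;
if (!string.IsNullOrWhiteSpace(result))
{
    try
    {
        taskDetails = JsonConvert.DeserializeObject<TaskDetails>(result);
    }
    catch (JsonException)
    {
        // The model did not return valid JSON; treat it the same as "no task".
        taskDetails = null;
    }
}

if (taskDetails == null || string.IsNullOrWhiteSpace(taskDetails.Title))
{
    // Nothing actionable in the email, so the Logic App can skip the "Add a to-do" action.
    return req.CreateResponse(HttpStatusCode.NoContent);
}

var response = req.CreateResponse(HttpStatusCode.OK);
response.Headers.Add("Content-Type", "application/json; charset=utf-8");
response.WriteString(JsonConvert.SerializeObject(taskDetails));
return response;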

Prompt

This is the heart and soul of this entire application.

name: ToDoCreator
description: Use this function to analyze the user's ask and create a task for the user.
template_format: handlebars
template: |
  <message role="system">
  Grounding Rules:
  ================
  - You are an AI assistant specializing in comprehending text and extracting meaningful information from it based on the user's ask.
  - Maintain honesty. If uncertain of an answer, respond with, "I apologize, but I currently lack sufficient information to accurately answer your question.".
  - Uphold user privacy. Do not ask for, store, or share personal data without explicit permission.  
  - Promote inclusivity and respect. Do not engage in or tolerate hate speech, discrimination, or bigotry of any form. Treat all users equally, irrespective of race, ethnicity, religion, gender, age, nationality, or disability.  
  - Respect copyright laws and intellectual property rights. Do not share, reproduce, or distribute copyrighted material without the appropriate authorization.  
  - Provide precise and concise responses. Maintain a respectful and professional tone in all interactions. 
  - Wait for the user's question before providing information. Stay within your domain of expertise - text comprehension and extracting meaningful information from that text.  
  - Ensure responses are up-to-date and accessible. Avoid unnecessary jargon and technical language when possible.
  
  Rules of Engagement:
  ====================
  - User has provided you the information from the email they have received. The information contains three things:
    - 1. Sender information: Name of the person who has sent the email.
    - 2. Subject: Subject of the email.
    - 3. Body: Body of the email.
  - User is expecting you to analyze the body and the subject of the email and identify if the sender wants the user to do something in that email (task).
  - Your job is to create a task based on the information provided to you.
  - Please do not assume anything and only consider the information provided to you below.
  - ALWAYS produce the output in JSON format. The output JSON format should be: {"title": "task title", "task": "task details", "dueDate": "task due date"}.
  - You will be provided today's date. Based on that and on the task, infer a due date for the task. If a due date cannot be inferred from the task, use a date one week from today's date as the task due date.
  - Task due date must always be outputted in ISO-8601 format (yyyy-MM-ddTHH:mm:ssZ). For example, 2024-01-01T15:00:00Z.
  - MOST IMPORTANTLY, If no task can be created based on the information provided, you must return an empty response. Do not make up the task.
  </message>
  
  <message role="user">
  Today's Date
  ============
  {{current_date}}
  
  Sender
  ======
  {{sender}}
  
  Subject
  =======
  {{subject}}
  
  Body
  ====
  {{body}}
  </message>
  
  <message role="system">
  Considering the information provided to you, please create a task and output it in the following format: {"title": "task title", "task": "task details", "dueDate": "task due date"}. If no task can be created, simply return an empty response.
  </message>
input_variables:
  - name: current_date
    description: current date in yyyy-MM-ddTHH:mm:ssZ format.
    is_required: true
  - name: sender
    description: email sender information.
    is_required: true
  - name: subject
    description: email subject
    is_required: true
  - name: body
    description: email body
    is_required: true
execution_settings:
  default:
    temperature: 0

Summary

That’s it for this post. This was the first time I used Azure Logic Apps and I am genuinely impressed! For the first time, I realized the convenience of a No Code/Low Code platform. Combine that with the power of an LLM, and we have ourselves a winner!

I will continue to explore this more and share my learnings.

Till then, be well and happy (low/no) coding!

Azure Sidekick – An AI Assistant to Answer Questions About Your Azure Resources (Part III – Lessons Learned)

The best way to learn a new technology is by building something with it (regardless of how big or small it is). This was my primary intent behind building Azure Sidekick.

I had so much fun building this and learnt a lot of things along the way. Not only did I learn more about Microsoft Semantic Kernel and prompt engineering, I also picked up a thing or two in .NET (AddKeyedSingleton and IAsyncEnumerable<T>, to name a couple).

In my previous posts about this, I talked about the features and how I built it. You can read those posts here and here.

In this post I will talk about the lessons learned.

So let’s begin!

Lessons Learned

Here are some of the lessons I learned while developing this tool.

Writing prompts is hard

Writing great prompts is really, really hard. An LLM is a black box and we do not know how it is going to respond to our queries. The only solution to this problem is to write really good prompts. That requires subject matter expertise as well as the ability to clearly communicate the ask, which unfortunately not many of us (including yours truly) possess.

Luckily, we can make use of LLMs to help us write great prompts. I wrote about how I used this approach while building this tool in one of my recent posts, which you can read here.

You really need to work with the LLM to get the right prompt. I spent countless hours just to get a single prompt working.

For example, when I was working on the intent recognition prompt, just to get the “Storage” intent returned correctly, I started by asking a question like “Given a question about Azure, how would you know that the question is about Azure Storage?”. The LLM gave me an answer, and then I asked follow-up questions about its answer, like “what about this” or “what about that”, until it gave me something I felt really comfortable with.

This is what it finally looks like:

<intent>
    <name>Storage</name>
    <description>
    Return this intent if the question:
    - Keywords: Has keywords such as 'Azure Storage', 'Blob Storage', 'Queue Storage', 'File Storage', 'Table Storage', 'Disk Storage', 'Data Lake Storage', 'Backup', or 'Azure Storage Explorer'.
    - Context: Has context regarding Azure's storage capabilities, types of storage, data management, or other aspects specifically related to Azure Storage.
    - Features: Mention of specific features such as security, scalability, redundancy, data replication, or pricing within the Azure Storage ecosystem.
    - Operation or Commands: Asking for details about operations or commands in Azure Storage, like creating a storage account, uploading or downloading data, managing data, etc.
    - Troubleshooting: Looking for troubleshooting tips, specifically seeking solutions for issues or errors encountered while using Azure Storage.
    - Comparisons: Seeking comparison between Azure Storage and other cloud storage services like AWS S3, Google Cloud Storage, etc. and also comparison between various Azure Storage services.
    - Rules: Looking for rules, restrictions and naming conventions applicable to storage accounts, blob containers & blobs, queues & messages, tables & entities and file shares, directories & files.
    - Storage resources: Finding information about storage accounts, blob containers & blobs, queues & messages, tables & entities and file shares, directories & files in an Azure subscription.
    - "How to" questions related to Azure Storage.
    </description>
</intent>

Stop thinking like a developer

We developers are very rational in our thinking. Somehow (through experience or otherwise) it is hardwired in our brains that if we write code to add two numbers (and write it correctly), it will always give us the correct result. This is probably because we know how programming languages work, so we implicitly trust them.

Unfortunately, that is not the case when you are building an AI application. We do not know anything about the underlying LLM and how it will process our question, and that creates a problem for us developers.

To solve this dilemma, stop thinking like a developer.

Do not assume that because you can write code that adds two numbers and reliably gives the proper result, an LLM will behave just as predictably.

Provide as much detail as possible in the prompt, even for the most trivial tasks. Do not leave anything out.

When you are writing prompts, put on your “Business Analyst” hat and write down each and every thing. Think of a prompt as a requirements document you are writing for another developer who is going to build the application for you.

Streaming responses are always better

After I built the very first version, I made a small video about the application to share it with some folks. At that time, the application waited to receive the complete response from the LLM before sending it to the user. As I was recording the video, I immediately realized that it was a very bad user experience (UX).

My first priority after that was to return the responses in a streaming fashion. It was a much, much better UX, as the user starts seeing the response from the LLM immediately.

If you are building a chatbot kind of application, please opt for streamed responses wherever possible. Find a hack if you have to in order to achieve this. You will make your users very happy!

Don’t make it a chore

This is more of a piece of advice than a lesson learned, but have loads of fun building what you are building (I know I did). Don’t make it a chore!

Remember, you are building something not because you have to, but because you want to! There is a huge difference between “have to” and “want to”.

Because you are learning along the way, don’t be afraid of making mistakes. But keep correcting your mistakes and keep improving as you learn. For example, when I first implemented streaming responses, I was not aware of IAsyncEnumerable<T> and implemented them using events. When I came to know about it, I changed the code to make use of it, and the code was much cleaner that way.
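
For anyone curious, here is a minimal sketch of what that looks like; the method and variable names are mine, not the actual Azure Sidekick code.

// Minimal sketch (not the actual Azure Sidekick code): expose the streamed answer as an
// IAsyncEnumerable<string> instead of raising events for each partial response.
public static async IAsyncEnumerable<string> StreamAnswerAsync(
    Kernel kernel, KernelFunction function, KernelArguments arguments)
{
    // InvokeStreamingAsync yields partial results as the LLM produces them.
    await foreach (var chunk in kernel.InvokeStreamingAsync(function, arguments))
    {
        yield return chunk.ToString();
    }
}

// The caller simply forwards each chunk to the UI as soon as it arrives:
// await foreach (var chunk in StreamAnswerAsync(kernel, function, arguments))
// {
//     Console.Write(chunk);
// }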

Summary

That’s it for this post and this series about Azure Sidekick. By no means am I done with this project (I will be implementing planners shortly). As I make significant changes or learn new things, I will share them here.

In the next post, I am going to talk about a nifty little trick with Semantic Kernel I learned recently that could potentially lower your Azure OpenAI bill.

Keep learning and keep building!

Azure Sidekick – An AI Assistant to Answer Questions About Your Azure Resources (Part II – Prompt Patterns & More)

In my previous post about Azure Sidekick, I gave a general introduction to the tool and its capabilities. If you have not read that post yet, I would strongly encourage you to read it first. You can read it here.

In this post, I will talk about the general flow of the application, prompt patterns used there and some other things.

So, let’s start!

Application Flow

From the outside, it looks like a very simple application: you ask it a question (about Azure or Storage accounts in your subscription) and then it answers it. However, a lot is happening behind the scenes and the application goes through multiple steps to answer your question.

1. Rephrase/Reword Step: When you ask a question, the first thing the application does is try to rephrase/reword the question by sending it to the LLM. The LLM takes the grounding rules, context and chat history (if available) into consideration and replies with a rephrased question.

2. Intent Recognition Step: Next, the rephrased question is sent to the LLM to recognize the intent of the question. This is a very important step because routing happens based on its result, and it also serves as a kind of “input” guardrail.

3. Answering Step: This is the final step where, based on the intent, the appropriate plugins and tools are loaded and the LLM provides an answer to the question.

However, depending on the intent, this final step may itself involve multiple sub-steps.

For example, if the question is about storage accounts, then the application will fetch the list of storage accounts from the selected subscription and then send that data to the LLM to generate an answer.

Similarly, if the question is about a specific storage account, then the application will first try to extract the storage account name from the question, fetch its properties and then send that data to the LLM to generate an answer.

I must confess that the code currently does not utilize the full potential of LLMs, especially their reasoning and planning capabilities. This is something I have started looking into recently, and hopefully in the next few releases I will incorporate those.

Prompt Patterns

Now that you know how the application works, let’s talk about some of the prompt patterns I have used to build this application. I wrote a long post about prompt patterns sometime back which you can read here.

Here are the prompt patterns I have used so far:

Rephrase/Reword

What does it do?

As the name suggests, it rephrases or rewords a user’s question so that it becomes easier for an LLM to understand and process. It makes use of grounding rules, context and chat history (if available) to rephrase a question.

Why is it needed?

During the course of development, I found many reasons as to why it is needed:

  • Incomplete question: A user may have asked an incomplete question. For example, the user may just write “Lighthouse” (when they wanted to ask about Azure Lighthouse). Rephrase prompt will produce an output like “Please tell me about Azure Lighthouse and its features”.
  • Lacking context: When we converse amongst ourselves, we can somehow understand the context of a question based on the previous questions; however, LLMs lack that capability. Chat history helps, but relying on chat history alone is just not sufficient. Using this prompt pattern helps. For example, let’s say the user has first asked a question like “How many storage accounts are there in my subscription?” and then asks “Can you name them?”. The rephrase prompt will produce an output like “Please give me the names of the storage accounts that are there in my subscription?”.

Intent Recognition

What does it do?

This prompt pattern is used to recognize the intent of the question, i.e. to find out what the question is about: whether it is about Azure in general, Azure Storage, or something else.

Why is it needed?

There are many use cases for this pattern. In the context of this application, it is used for the following things:

  • Routing: A user may be asking a general question about Azure, or a question concerning their storage accounts, or a question related to a specific storage account. The user’s question may be ambiguous or may not even be related to Azure. Using this prompt pattern helps the application identify the intent of the question and route it accordingly (a sketch of such routing follows this list).
  • Input Guardrails: This pattern also helps us enforce input guardrails. For example, let’s say you ask a general question like “What is the capital of India?”. Even though the LLM knows the answer to the question, because of this pattern the user will be informed that the purpose of this tool is to answer questions about Azure only. This becomes really important if you are building an AI application for your enterprise. You would want to constrain the application to answer only the questions related to your enterprise. If you need proof of how bad it can get, please see this news item about the UK-based parcel delivery company DPD: https://www.bbc.com/news/technology-68025677.
  • Relevant chat history: Another use case we found very useful in our application is that we filter the chat history based on the intent of the question and pass that to the prompt.
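
To make the routing use case concrete, here is a minimal sketch of routing on the recognized intent. The intent names, the intent-recognition function and the handler methods are illustrative assumptions, not the actual Azure Sidekick code.

// Illustrative sketch: route the (rephrased) question based on the intent returned by the
// intent-recognition prompt. Intent names and handler methods are assumptions.
var intent = (await kernel.InvokeAsync(intentRecognitionFunction,
    new KernelArguments { ["question"] = rephrasedQuestion })).ToString().Trim();

var answer = intent switch
{
    "Azure" => await AnswerGeneralAzureQuestionAsync(kernel, rephrasedQuestion),
    "Storage" => await AnswerStorageQuestionAsync(kernel, rephrasedQuestion),
    // Input guardrail: anything else is politely declined instead of being answered.
    _ => "I can only answer questions related to Azure. Please rephrase your question."
};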

Named Entity Recognition/Extraction

What does it do?

This pattern is used to recognize and extract the named entities in a question. For example, if the user asks a question like “Can I host a static website in xyz storage account?”, this pattern will recognize that the question is about the “xyz” storage account and extract “xyz” from it.

Why is it needed?

  • Native code execution: In the context of this application, we need this information to execute native code (get the properties of a storage account), as shown in the sketch below.
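
Here is a minimal sketch of that flow; the entity-extraction function, the JSON key and the helper method are assumptions made purely for illustration, not the actual code.

// Illustrative sketch: extract the storage account name from the question and use it to
// run native code. The function, JSON key and helper method are assumptions.
var extractionResult = (await kernel.InvokeAsync(entityExtractionFunction,
    new KernelArguments { ["question"] = question })).ToString();

var entities = JsonConvert.DeserializeObject<Dictionary<string, string>>(extractionResult);
if (entities != null
    && entities.TryGetValue("storageAccountName", out var accountName)
    && !string.IsNullOrWhiteSpace(accountName))
{
    // Native code path: fetch the account's properties and feed them back to the LLM
    // so it can generate the final answer.
    var properties = await GetStorageAccountPropertiesAsync(accountName);
}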

Grounding Rules

Next, I want to talk about setting the grounding rules. Grounding rules are very important in any AI application, as they set the overall constraints for your application. You can use these grounding rules to:

  • Give your application a persona;
  • Set the boundaries;
  • Enforce legal restrictions;
  • Set the overall tone of the responses, and so on.

For example, here are the grounding rules we use in the application:

- You are Azure Sidekick, an AI assistant specializing in Azure, tasked with providing accurate and knowledgeable responses to user inquiries about Azure.
- Maintain honesty. If uncertain of an answer, respond with, "I apologize, but I currently lack sufficient information to accurately answer your question."
- Uphold user privacy. Do not ask for, store, or share personal data without explicit permission. 
- Promote inclusivity and respect. Do not engage in or tolerate hate speech, discrimination, or bigotry of any form. Treat all users equally, irrespective of race, ethnicity, religion, gender, age, nationality, or disability.  
- Respect copyright laws and intellectual property rights. Do not share, reproduce, or distribute copyrighted material without the appropriate authorization.  
- Provide precise and concise responses. Maintain a respectful and professional tone in all interactions. 
- Wait for the user's question before providing information. Stay within your domain of expertise - Azure and related services.
- Ensure responses are up-to-date and accessible. Avoid unnecessary jargon and technical language when possible.

Chat History

Now, let’s talk about chat history. Simply put, chat history is the history of the conversation between the user and the AI application. Usually, it contains the question asked by the user and the answer given by the application.

Chat history is an important part of any LLM application, as it helps set a proper context for the question being asked and thus increases the chances of getting a better answer to the question.

There are a few things to keep in mind when using chat history in the prompts:

  • Use context-sensitive chat history: What I mean by that is to use only the portion of the chat history that is relevant to the question being asked. For example, if the user is asking a question about storage accounts, only include chat history related to storage in the prompt.
  • Avoid unnecessary chat history: Even with context-sensitive chat history, you may end up with many items in the chat history relevant to that context. However, that does not mean you pass everything to your prompt. Just pass the last 3, 5 or 10 context-sensitive chat history items to the prompt.

Following these guidelines will ensure that you are not unnecessarily bloating your prompts, and thus save on input prompt tokens :), without compromising functionality. A minimal sketch of this kind of filtering is shown below.
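
Here is that sketch; the ChatHistoryItem type and its properties are assumptions for illustration, not the actual types used in Azure Sidekick.

// Minimal sketch: keep only the most recent history items relevant to the current intent.
// ChatHistoryItem and its properties are assumptions, not the actual Azure Sidekick types.
public record ChatHistoryItem(string Intent, string Question, string Answer);

public static IReadOnlyList<ChatHistoryItem> GetRelevantHistory(
    IEnumerable<ChatHistoryItem> history, string intent, int maxItems = 5)
{
    return history
        .Where(h => string.Equals(h.Intent, intent, StringComparison.OrdinalIgnoreCase))
        .TakeLast(maxItems)   // only the last few context-sensitive items
        .ToList();
}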

Goal Reminder

This is an interesting thing I have found. I noticed that when I reminded the LLM about its goal, I got better results. The best way to do so is to include a summarized goal at the end of the prompt.

For example, this is how I am ending the rephrase prompt:

Considering the information provided to you, 
please rephrase or reword the current question 
to increase its clarity and specificity for a 
language model. Consider identifying the 
key elements or concepts within the question, 
ensuring the language is precise, and avoiding 
any ambiguity or overly complex language. 
Remember to incorporate the context provided by 
the previous questions and answers. 
Your goal is to create a revised question that 
maintains the original intent, but is more 
easily understood by an LLM when considering 
the conversation history.

Summary

That’s it for this post. It turned out to be somewhat lengthy but I couldn’t help it as I had so much to share :). I hope you have enjoyed this post.

In the next (and concluding) post about this, I will talk about the lessons learned while building it. Stay tuned for that.

Until then, be well!

Azure Sidekick – An AI Assistant to Answer Questions About Your Azure Resources (Introduction)

I am pleased to present to you Azure Sidekick, an AI assistant that can answer questions about the resources running in your Azure Subscriptions.

Unlike my other posts, which are quite code heavy, in this series of posts about the tool I will take you through the journey I undertook while building it. I will talk about my motivations behind building it and some things I learned along the way.

In this post, I will talk about what this AI assistant is and what it can do today. I will also talk about why I built it and the tools and technology used to build it.

So, let’s begin.

What is Azure Sidekick?

As mentioned above, Azure Sidekick is an AI assistant that can (eventually) answer questions about the resources running in your Azure Subscriptions using natural language.

Currently following scenarios are supported:

  • General Azure Questions: It can answer your general Azure questions. For example, you can ask it to explain Azure Lighthouse to you or you can ask it to compare different messaging services available in Azure. For this, it relies on the knowledge of the LLM (so the information may be outdated). In future, I might integrate Bing search so that most current information is available when answering general Azure questions.
  • Questions about Storage Accounts: It can answer questions about storage accounts in an Azure Subscription. For example, you can ask it to find storage accounts that are tagged, not tagged, or tagged with specific name/value pairs, or to find storage accounts of a certain kind, etc. Basically, you can ask questions about the storage accounts in your Azure Subscription based on their properties.
  • Questions about a specific Storage Account: It can answer questions about a specific storage account in an Azure Subscription. For example, you can ask it to find if a specific storage account can host static website, or find its location etc. Basically, you can ask questions about a specific storage account in your Azure Subscription based on its properties.
  • General Azure Storage Questions: Lastly, you can use it to get answers to general questions related to Azure Storage. For example, you can ask it to provide naming rules for a storage account, or limitations on the number of blob containers you can create etc.

Currently, the scope of this assistant is very limited, but one thing to understand is that Azure is very vast, with over 200 services; support for more services will come in due course of time.

I prepared a small video (nothing fancy by any means) about it, that you can watch.

Why did I build it?

Simple answer: to scratch the itch of building an AI assistant from start to end :). I have been reading a lot about Generative AI and wanted to go through the “pain” of building an AI assistant to apply my learnings and to learn new things.

Before building this tool, I had some beliefs (like, “what’s the big deal with prompts?”) and I was happy to see that some of them were quite misplaced :). I also got an opportunity to test some of my hypotheses.

Building this tool taught me a lot and those were reflected in some of my recent posts and I will share those in my future posts as well.

Tools & Technologies

In its current form, Azure Sidekick is a console application written using .NET 8. For Generative AI, it uses Azure OpenAI (I used a GPT-4 model), and it uses Microsoft Semantic Kernel for AI orchestration.

Some time back I wrote an introductory post about Microsoft Semantic Kernel, and at that time I wondered what the need for it was when more robust and feature-rich orchestration engines like LangChain were available (one of the beliefs that got shattered :)).

After using it for some time now, I am pleasantly surprised with the progress it has made and how robust it is. If you are building a Gen AI application using .Net, please give it a serious try. I think you will not be disappointed.

Source Code

I am very happy to say that the tool is open source and is available on GitHub:

https://github.com/gmantri/azure-sidekick

My humble request is that you try it out and let me know your feedback. It would be even more awesome if we could collaborate on this.

Conclusion

That’s it for this post. I sincerely hope that you will use this. In the next post, I will talk about some things I learned while building this, so stay tuned for that.

So long!

Microsoft Semantic Kernel – Some Tips & Tricks To Get Prompt & Completion Tokens

In my previous post, I talked about how you can get rendered prompts. In this post, I am going to talk about ways to get prompt and completion tokens when using Microsoft Semantic Kernel.

What are Tokens?

Let’s first begin with what tokens really are. In very simple terms, a token is the smallest unit of data that a Large Language Model (LLM) can understand and process. The data could be text, an image, a video, a sound clip or any other data.

When it comes to text, a token could be a single character (like “a”), a partial word (like “happiness” in “unhappiness”), a full word (like “apple”) or even a combination of words. The key thing to remember is that a token is the smallest unit of data that an LLM can understand.

When dealing with tokens, you will come across two terms: Prompt tokens and Completion tokens.

  • Prompt tokens are the tokens representing the input prompt (i.e. the data being fed to the LLM).
  • Completion tokens are the tokens that make up the LLM’s output (i.e. the data generated by the LLM in response to the prompt).

Why you should care about Tokens?

So, the next question is why you should care about them. The simple answer is: you pay for these tokens :). This is how the service providers make money. You are charged for both prompt and completion tokens, and the pricing varies based on the service provider and the model you are using.

Because you are paying for the tokens, you have to be extra careful with them.

You should ensure that your prompts are complete (otherwise you will not get a proper result) but concise and to the point so that you don’t end up breaking the bank.

Through your prompts, you should also put some constraints on the desired output. Be very explicit in your prompts about the size and kind of data you expect the LLM to produce. For example, if you expect the output to be no more than 500 characters, explicitly mention that.
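
Besides being explicit in the prompt, you can also cap the completion size programmatically. Here is a minimal sketch using Semantic Kernel's OpenAIPromptExecutionSettings; the value of 500 is just an example, and the exact property may vary across library versions.

// Minimal sketch: put a hard upper bound on the completion tokens for a single call.
var executionSettings = new OpenAIPromptExecutionSettings
{
    Temperature = 0,
    MaxTokens = 500   // the model will not generate more than 500 completion tokens
};
var arguments = new KernelArguments(executionSettings)
{
    ["question"] = question
};
var result = await kernel.InvokeAsync(function, arguments);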

How to get prompt and completion tokens?

So, how do you get prompt and completion tokens in your LLM application built using Microsoft Semantic Kernel?

I will show you four ways to accomplish that. So, let’s start!

1. Hooking into Kernel Events (Obsolete, Not Recommended)

The first way is by hooking into (Semantic) Kernel events. The Kernel in Semantic Kernel exposes a FunctionInvoked event which gets fired when a function is invoked. You can consume this event to get the token usage.

My code would be something like the following:

private Kernel GetKernel()
{
    var kernelBuilder = Kernel.CreateBuilder();
    var deploymentId = "your-azure-openai-deployment-id";
    AzureOpenAIClient client = GetAzureOpenAIClientSomehow();        
    kernelBuilder.AddAzureOpenAIChatCompletion(deploymentId, client);
    var kernel = kernelBuilder.Build();
    kernel.FunctionInvoked += (sender, args) =>
    {
        var metadata = args.Metadata;
        if (metadata == null || !metadata.ContainsKey("Usage")) return;
        var usage = (CompletionsUsage)metadata["Usage"];
        if (usage == null) return;
        var promptTokens = usage.PromptTokens;
        var completionTokens = usage.CompletionTokens;
        // do something with the tokens
    };
    return kernel;
}

However, you should not be using this approach as it has been marked as obsolete in the latest version. In fact, if you use this approach with version 1.3.0 of Semantic Kernel (the most current version at the time of writing this post), you will get a warning telling you not to use it.

2. Using Filters (Experimental)

This is another approach that you can take. I believe this feature was introduced recently, and it is recommended over hooking into kernel events.

Using filters is really easy. You basically create a custom filter class that implements the IFunctionFilter interface and then implement the OnFunctionInvoking and OnFunctionInvoked methods to suit your requirements. For example, I could simply write the prompt and completion tokens to the console.

So my code would be something like:

private class FunctionFilter : IFunctionFilter
{
    public void OnFunctionInvoking(FunctionInvokingContext context)
    {
    }

    public void OnFunctionInvoked(FunctionInvokedContext context)
    {
        var metadata = context.Result.Metadata;
        if (metadata == null || !metadata.ContainsKey("Usage")) return;
        var usage = (CompletionsUsage)metadata["Usage"];
        if (usage == null) return;
        var promptTokens = usage.PromptTokens;
        var completionTokens = usage.CompletionTokens;
        // do something with the tokens
    }
}

And this is how I would wire up the filter in the Kernel:

kernel.FunctionFilters.Add(new FunctionFilter());

My complete code for kernel would be:

private Kernel GetKernel()
{
    var kernelBuilder = Kernel.CreateBuilder();
    var deploymentId = "your-azure-openai-deployment-id";
    AzureOpenAIClient client = GetAzureOpenAIClientSomehow();        
    kernelBuilder.AddAzureOpenAIChatCompletion(deploymentId, client);
    var kernel = kernelBuilder.Build();
    kernel.FunctionFilters.Add(new FunctionFilter());
    return kernel;
}

Please note that this is still experimental and may change (or even be removed) in future versions.

3. Using Function Result

If you are invoking functions in a non-streaming way, i.e. you are waiting for the complete result to come back, you can make use of FunctionResult to get the tokens.

So my code would be something like:

var function = GetKernelFunctionSomehow();
var kernelArguments = GetKernelArgumentsSomehow();
var result = await kernel.InvokeAsync(function, kernelArguments);
var metadata = result.Metadata;
if (metadata == null || !metadata.ContainsKey("Usage")) return;
var usage = (CompletionsUsage)metadata["Usage"];
if (usage == null) return;
promptTokens = usage.PromptTokens;
completionTokens = usage.CompletionTokens;
// do something with tokens

4. Using 3rd Party Library

Approach 3 above works great if you are waiting for the entire response to arrive in your application; however, what I have noticed is that the user experience (UX) is not great in this case.

LLMs send their response in a streaming fashion, that is, they emit a (partial) response as it becomes available, and if possible you should try to stream that response back to the user.

However, approach 3 above would not work in that case. From what I am told, Azure OpenAI does not even return the token usage as part of its response in this case, and hence Semantic Kernel also does not provide this information.

In this case, what you can do is make use of a 3rd party library. One such library that I have used is Tiktoken. You can use it to calculate the token consumption. There are many other similar libraries available.

So my code would be something like:

var prompt = GetPromptSomehow();
var function = GetKernelFunctionSomehow();
var kernelArguments = GetKernelArgumentsSomehow();
var result = kernel.InvokeStreamingAsync(function, kernelArguments);
StringBuilder responseStringBuilder = new StringBuilder();
await foreach (var item in result)
{
    var response = item.ToString();
    // store the partial response. We will use it in the end to calculate prompt and completion tokens.
    responseStringBuilder.Append(response);
}
var answer = responseStringBuilder.ToString();
var encodingForModel = Tiktoken.Encoding.TryForModel("model type e.g. gpt-4, gpt-3.5-turbo or gpt-35-turbo");
promptTokens = encodingForModel.CountTokens(prompt);
completionTokens = encodingForModel.CountTokens(answer);
// do something with tokens

Summary

That’s it for this post. I hope you have found it useful. My recommendation would be to use approach #3 or #4 based on your scenario (non-streaming vs. streaming). You can use #2, but definitely stay away from #1.

I would like to end this post with the same bit of warning I gave in my previous post: Semantic Kernel (and AI tools in general) is changing very rapidly (quite evident from the fact that kernel events were deprecated within a few minor releases). I would highly recommend referencing the official documentation for the most current functionality.

In the next post I will talk about the AI assistant I have been building. A lot of my learning came from building that AI assistant. I am pretty excited about it and can’t wait to share more with you.

Until then, Happy Coding!

Microsoft Semantic Kernel – Some Tips & Tricks To Get Rendered Prompts

When you start building a new AI application, most likely you start with a very simple prompt where you write everything you need to do in that prompt only.

However, as the application grows, you write more prompts, and that’s when you start templatizing your prompts by extracting the things that are common across all of them and passing them in as template variables.

This leads to better manageability of the prompts but you lose the clarity as to what gets sent to the LLM.

Because the prompt is now a template with a number of template parameters, simply looking at the prompt text will not tell you what is actually being sent to the LLM. Furthermore, you may want to log the prompts (and the responses from the LLM) somewhere in your system so that you can analyze the performance of these prompts. For these reasons, you would want access to the prompts that are being sent to the LLM.

In this post, I will show you three ways in Microsoft Semantic Kernel by which you can find out the exact prompt that is being sent to the LLM for processing.

Sample Prompt

Let’s consider the following prompt that I wrote for an application I am building. BTW, I wrote this prompt with the help of an LLM 🙂 (you can read all about it here). I am writing my prompts in YAML format.

name: Rephrase
description: Use this function to reword an unclear question, considering previous context, for better comprehension.
template_format: handlebars
template: |
  <message role="system">
  Grounding Rules:
  ================
  {{#each grounding_rules}}
  - {{this}}
  {{/each}}
  
  The user has asked a question that may not be clear in its current form and may rely on the context of multiple 
  previous questions and answers. Your task is to rephrase or reword the question, taking into account the conversation history if available, 
  to improve its clarity for a Language Model (LLM) to answer it.
  
  Conversation History:
  ====================
  {{#each chat_history}}
  Question: {{Question}}
  Answer: {{Answer}}
  {{/each}}
  </message>
  
  Current Question:
  =================
  <message role="user">{{question}}</message>
  
  <message role="system">
  Considering the information provided to you, please rephrase or reword the current question to increase its clarity and 
  specificity for a language model. Consider identifying the key elements or concepts within the question, ensuring the 
  language is precise, and avoiding any ambiguity or overly complex language. Remember to incorporate the context provided 
  by the previous questions and answers. Your goal is to create a revised question that maintains the original intent, 
  but is more easily understood by an LLM when considering the conversation history.
  </message>
input_variables:
  - name: question
    description: user question
    is_required: true
  - name: grounding_rules
    description: grounding rules for AI model to behave
    is_required: true
  - name: chat_history
    description: chat history
    is_required: true
execution_settings:
  default:
    temperature: 0

As you can see, my prompt template contains some template variables like question, grounding_rules, and chat_history that I am passing to the prompt. Semantic Kernel in turn parses the YAML, replaces these template variables with values that I pass in and then sends that prompt to the LLM.

Solution

So, how do we get the prompts? As I mentioned above, I will show you three ways by which you can get this information.

1. Hooking into Kernel Events (Obsolete, Not Recommended)

First way is by hooking into (Semantic) Kernel events. Kernel in Semantic Kernel exposes a PromptRendered event which gets fired when a prompt is rendered. You can consume this event to get the rendered prompt.

Your code would be something like the following:

private Kernel GetKernel()
{
    var kernelBuilder = Kernel.CreateBuilder();
    var deploymentId = "your-azure-openai-deployment-id";
    AzureOpenAIClient client = GetAzureOpenAIClientSomehow();        
    kernelBuilder.AddAzureOpenAIChatCompletion(deploymentId, client);
    var kernel = kernelBuilder.Build();
    kernel.PromptRendered += (sender, args) =>
    {
        Console.WriteLine($"Rendered prompt: {args.RenderedPrompt}");
    };
    return kernel;
}

However, you should not be using this approach as it has been marked as obsolete in the latest version. In fact, if you use this approach with version 1.3.0 of Semantic Kernel (the most current version at the time of writing this post), you will get a warning telling you not to use it.

2. Use Filters (Experimental)

This is another approach that you can take. I believe this feature was introduced recently, and it is recommended over hooking into kernel events.

Using filters is really easy. You basically create a custom filter class that implements the IPromptFilter interface and then implement the OnPromptRendering and OnPromptRendered methods to suit your requirements. For example, I could simply write the rendered prompt to the console.

So my code would be something like:

private class PromptFilter : IPromptFilter
{
    public void OnPromptRendering(PromptRenderingContext context)
    {
    }

    public void OnPromptRendered(PromptRenderedContext context)
    {
        var prompt = context.RenderedPrompt;
        Console.WriteLine($"Rendered prompt: ${prompt}");
    }
}

And this is how I would wire up the filter in the kernel:

kernel.PromptFilters.Add(new PromptFilter());

My complete code for kernel would be:

private Kernel GetKernel()
{
    var kernelBuilder = Kernel.CreateBuilder();
    var deploymentId = "your-azure-openai-deployment-id";
    AzureOpenAIClient client = GetAzureOpenAIClientSomehow();        
    kernelBuilder.AddAzureOpenAIChatCompletion(deploymentId, client);
    var kernel = kernelBuilder.Build();
    kernel.PromptFilters.Add(new PromptFilter());
    return kernel;
}

Please note that this is still experimental and may change (or even be removed) in future versions.

3. Manual Way

The above two approaches work great (though use only the 2nd approach and not the 1st one); however, at times you may want to get the prompt inline in your application flow and not outside of it.

For example, the application I am building required me to calculate the prompt and completion tokens and send those back to the user as part of the response.

If your application has this kind of requirement, you can manually create the prompt from the prompt template by passing the arguments.

Here’s the code to do so:

var promptFileContents = await File.ReadAllTextAsync(promptFilePath);
var promptTemplateConfig = KernelFunctionYaml.ToPromptTemplateConfig(promptFileContents);
var factory = new HandlebarsPromptTemplateFactory();
if (!factory.TryCreate(promptTemplateConfig, out var promptTemplate))
    throw new InvalidOperationException("Unable to create prompt template.");
var prompt = await promptTemplate.RenderAsync(kernel, kernelArguments);

Here, I am first reading the entire prompt template YAML file and then creating a PromptTemplateConfig from it. Because my prompt template uses Handlebars templating, I create a HandlebarsPromptTemplateFactory and extract the prompt template out of my prompt template configuration. I then render the prompt by passing in the kernel and the arguments.

The advantage of this approach is that I can get my prompt inline with my code flow and then use it any way I see fit.
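For completeness, here is a minimal sketch (not part of the original post) of how the kernelArguments passed to RenderAsync above might be built for the template variables in my YAML prompt (question, grounding_rules, chat_history); the values are placeholders for illustration:

// Example values for the template variables; replace with real application data.
var kernelArguments = new KernelArguments
{
    ["question"] = "How many storage accounts do I have?",
    ["grounding_rules"] = "Only answer questions related to Azure.",
    ["chat_history"] = "Question: <previous question>\nResponse: <previous response>"
};

These arguments are then passed to RenderAsync along with the kernel, as shown above.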

However, because this approach parses the raw YAML file, it will not work if your prompt template calls other functions (say for example, calling a native function) inside it. So, please use this approach cautiously.

Summary

That’s it for this post. I hope you have found the information useful. Semantic Kernel (and AI tooling in general) is changing very rapidly (quite evident from the fact that kernel events are being deprecated within a few minor releases), so I would highly recommend referencing the official documentation for the most current functionality.

Happy Coding!

The post Microsoft Semantic Kernel – Some Tips & Tricks To Get Rendered Prompts appeared first on GauravMantri.com.

]]>
https://gauravmantri.com/2024/02/05/microsoft-semantic-kernel-some-tips-tricks-to-get-rendered-prompts/feed/ 0
Writing prompts is hard. Luckily, there’s an easy way out! https://gauravmantri.com/2024/01/29/writing-prompts-is-hard-luckily-theres-an-easy-way-out/ Mon, 29 Jan 2024 00:06:18 +0000

In any Generative AI application, prompts are the heart and soul of the application. To get the most out of an LLM, every Gen AI developer must write effective prompts.

Problem

But writing prompts is hard! Believe me, it is hard :). In fact, if you want to torture a developer, ask them to write prompts :D.

In order to write effective prompts, as a prompt writer:

  • You must have subject matter expertise.
  • You must think very clearly, and
  • You must be able to communicate your thoughts very clearly in your prompts.

And therein lies the problem.

I read a book on UX some time back and something mentioned there stayed with me:

The problem with developers designing the UX of an application is that if they think they can use it, everybody else can.

This applies very well to us developers when we are building Gen AI applications. When we write prompts, we think as if we are the ones who are going to answer the user’s questions, and that gets reflected in our prompts.

Recently, I was building an application where I wanted an LLM to rephrase the question asked by the user. As a developer, this is what I came up with:

You are an AI assistant expert in Azure and language comprehension.

Please rephrase the following question so that it is clear for you to answer.

While rephrasing, please remember the following things:
- It is ok to return the original question if it is already clear.
- Please use chat history given below to better rephrase the question.
 Chat history contains previous questions asked by the user.
- Do not answer the question while rephrasing.

Begin Chat History
==================
Question: User question
Response: LLM response
End Chat History
==================
User question: "How many storage accounts?"

As a developer, I thought this should be good enough for an LLM to answer. Unfortunately, that was not the case. At times this would work, but more often than not I saw some random text output by the LLM.

So, how do we solve this?

Solution: Use LLM to create the prompt

Then it dawned on me: why not ask the LLM to create the prompt? It made sense to me for multiple reasons:

  • LLMs are the subject matter experts, as they are trained on all kinds of data that we (as developers) may not be aware of.
  • When properly instructed and controlled, they can not only think clearly but also communicate clearly.
  • Most importantly, it is ultimately the LLM that is going to answer the prompt, so it is completely logical for it to write the prompt in language, and with all the necessary details, that it will understand when it comes to executing the prompt.

Steps

Here are the steps I took for asking an LLM to come up with a prompt.

Persona Setup

I started by giving my LLM a persona (and stroked its ego a bit :)). This is what I did:

You are the world's best prompt engineer. 
In fact, no one can write prompts like you do.

Describe the Ask

Then, I clearly described the ask. At first, I provided a very basic instruction like “Write a prompt to rephrase a question” and it came up with something like “Craft a prompt that ingeniously transforms a direct query into an artful, indirect expression“.

So, I put some more thought into it and came up with the following:

You are helping an AI developer with writing prompts. 
AI developer wants to write a prompt that would rephrase a 
question asked by the user of the application. 
User would ask a question that may or may not be clear. 
What user wants to do is use an LLM to rephrase/reword 
the question so that it becomes clear for the LLM to answer the rephrased/reworded question. 
Please be as comprehensive as possible when writing a prompt for this.

and this is what the LLM came up with:

Title: Reframing and Clarifying User Questions for Improved 
LLM Understanding

Prompt:

An application user has asked a question that may not be clear in 
its current form. Your task is to rephrase or reword the question 
in a way that improves its clarity for a Language Model (LLM) 
to answer it.

Given Question: "{user_question}"

Please rephrase or reword this question to increase its clarity 
and specificity for a language model. Consider identifying the 
key elements or concepts within the question, ensuring the 
language is precise, and avoiding any ambiguity or overly 
complex language. Your goal is to create a revised question 
that maintains the original intent, but is more easily 
understood by an LLM.

Revised Question: ""

Much better, right?

However, I forgot to mention the chat history, so I modified the prompt again like this:

Now let's say the question is based on some previous questions. 
For example, the user may have previously asked 
"What is the capital of India" to which LLM has 
responded with "New Delhi". 
Now the user has asked "How many states are in there". 
This question in itself is not complete and the previous 
question will be included in the chat history. 
Based on this information, can you rewrite the prompt?

The LLM replied with the following:

Title: Contextual Rewording of User Questions for LLM Understanding

Prompt:

An application user has asked a question that may not be clear in 
its current form and also relies on the context of previous 
questions and answers. Your task is to rephrase or reword the question, taking into account the previous conversation history, 
to improve its clarity for a Language Model (LLM) to answer it.

Previous Question and Answer:
Question: "{previous_question}"
Answer: "{previous_answer}"

Current Question: "{user_question}"

Given this context, please rephrase or reword the current question 
to increase its clarity and specificity for a language model. 
Consider identifying the key elements or concepts within the 
question, ensuring the language is precise, and avoiding any 
ambiguity or overly complex language. Remember to incorporate 
the context provided by the previous question and answer. 
Your goal is to create a revised question that maintains the 
original intent, but is more easily understood by an LLM 
when considering the previous conversation.

Revised Question: ""

I realized that there can be multiple questions in the chat history, which I had not mentioned in the previous prompt, so I submitted another prompt:

Now assume there are many previous questions and answers. Please revise the prompt based on this information.

The LLM graciously obliged me with the following:

Title: Contextual Rewording of User Questions Based on Conversation History for LLM Understanding

Prompt:

An application user has asked a question that may not be clear in its current form and relies on the context of multiple previous questions and answers. Your task is to rephrase or reword the question, taking into account the conversation history, to improve its clarity for a Language Model (LLM) to answer it.

Conversation History:
"{previous_questions_and_answers}"

Current Question: "{user_question}"

Given this context, please rephrase or reword the current question to increase its clarity and specificity for a language model. Consider identifying the key elements or concepts within the question, ensuring the language is precise, and avoiding any ambiguity or overly complex language. Remember to incorporate the context provided by the previous questions and answers. Your goal is to create a revised question that maintains the original intent, but is more easily understood by an LLM when considering the conversation history.

Revised Question: ""

So, so much better than my original prompt :).
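To wire a prompt like this into application code, the placeholders simply need to be filled in before the prompt is sent to the LLM. Here is a minimal sketch (not part of the original post); the helper name and the shape of the chat history are assumptions for illustration:

using System;
using System.Collections.Generic;
using System.Linq;

public static class RephrasePromptBuilder
{
    // Hypothetical helper: fills the {previous_questions_and_answers} and
    // {user_question} placeholders of the LLM-generated prompt template.
    public static string Build(
        string promptTemplate,
        IReadOnlyList<(string Question, string Answer)> history,
        string userQuestion)
    {
        var historyText = string.Join(
            Environment.NewLine,
            history.Select(h => $"Question: \"{h.Question}\"{Environment.NewLine}Answer: \"{h.Answer}\""));

        return promptTemplate
            .Replace("{previous_questions_and_answers}", historyText)
            .Replace("{user_question}", userQuestion);
    }
}

The resulting string is what actually gets sent to the chat completion endpoint.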

Test

I was quite satisfied with the prompt. However, before I could use it in the application, I wanted to see if the prompt was working fine.

So I used the LLM again, fed it some questions, and asked it to reply to those questions using the prompt it wrote.

Here’s what I asked:

Now let's test the prompt above with the following questions. 
Your task is to take each question and run the prompt and 
output both the prompt and the result. 
With each question, include the previous questions and 
responses in the chat history. Here are the questions:  
1. Does xyz storage account has tags?  
2. what is its type?  
3. Where it is located?  
4. What are the services supported by it?  
5. Please give me the storage endpoints. 
6. Does abc storage account has tags?  
7. what is its type?  
8. Where it is located?  
9. What are the services supported by it?

And this is how it responded:

Sure, I'll provide you with the prompts and the expected results for all 9 questions. I'll make assumptions for LLM responses to complete the exercise.

1.
Conversation History:
""
Current Question: "Does xyz storage account has tags?"

Prompt:
Does the storage account named 'xyz' have any assigned tags?

2.
Conversation History:
"Question: Does xyz storage account has tags?
Answer: Yes, the 'xyz' storage account has tags."
Current Question: "what is its type?"

Prompt:
What is the type of the 'xyz' storage account that has tags?

3.
Conversation History:
"Question: Does xyz storage account has tags?
Answer: Yes, the 'xyz' storage account has tags.
Question: what is its type?
Answer: The 'xyz' storage account is a 'BlockBlobStorage' type."
Current Question: "Where it is located?"

Prompt:
Where is the 'BlockBlobStorage' type 'xyz' storage account, which has tags, located?

4.
Conversation History:
"Question: Does xyz storage account has tags?
Answer: Yes, the 'xyz' storage account has tags.
Question: what is its type?
Answer: The 'xyz' storage account is a 'BlockBlobStorage' type.
Question: Where it is located?
Answer: The 'xyz' storage account is located in East US."
Current Question: "What are the services supported by it?"

Prompt:
What are the services supported by the 'xyz' storage account, which is a 'BlockBlobStorage' type and located in East US?

5.
Conversation History:
"Question: Does xyz storage account has tags?
Answer: Yes, the 'xyz' storage account has tags.
Question: what is its type?
Answer: The 'xyz' storage account is a 'BlockBlobStorage' type.
Question: Where it is located?
Answer: The 'xyz' storage account is located in East US.
Question: What are the services supported by it?
Answer: The 'xyz' storage account supports Blob, File, Queue, Table, and Data Lake Gen2 services."
Current Question: "Please give me the storage endpoints."

Prompt:
Could you provide the storage endpoints for the 'xyz' storage account which is a 'BlockBlobStorage' type, located in East US, and supports Blob, File, Queue, Table, and Data Lake Gen2 services?

6.
Conversation History:
""
Current Question: "Does abc storage account has tags?"

Prompt:
Does the storage account named 'abc' have any assigned tags?

7.
Conversation History:
"Question: Does abc storage account has tags?
Answer: Yes, the 'abc' storage account has tags."
Current Question: "what is its type?"

Prompt:
What is the type of the 'abc' storage account that has tags?

8.
Conversation History:
"Question: Does abc storage account has tags?
Answer: Yes, the 'abc' storage account has tags.
Question: what is its type?
Answer: The 'abc' storage account is a 'BlockBlobStorage' type."
Current Question: "Where it is located?"

Prompt:
Where is the 'BlockBlobStorage' type 'abc' storage account, which has tags, located?

9.
Conversation History:
"Question: Does abc storage account has tags?
Answer: Yes, the 'abc' storage account has tags.
Question: what is its type?
Answer: The 'abc' storage account is a 'BlockBlobStorage' type.
Question: Where it is located?
Answer: The 'abc' storage account is located in West US."
Current Question: "What are the services supported by it?"

Prompt:
What are the services supported by the 'abc' storage account, which is a 'BlockBlobStorage' type and located in West US?

Not bad, not bad at all!

Summary

That’s it for this post. I hope you have found it useful. Please share your thoughts by providing comments.

Happy Prompting!

The post Writing prompts is hard. Luckily, there’s an easy way out! appeared first on GauravMantri.com.

]]>
https://gauravmantri.com/2024/01/29/writing-prompts-is-hard-luckily-theres-an-easy-way-out/feed/ 0
Using OpenAI Function Calling with Microsoft Semantic Kernel https://gauravmantri.com/2023/12/31/using-openai-function-calling-with-microsoft-semantic-kernel/ https://gauravmantri.com/2023/12/31/using-openai-function-calling-with-microsoft-semantic-kernel/#respond Sun, 31 Dec 2023 12:22:55 +0000 https://gauravmantri.com/?p=1922 In this post we are going to see how we can use OpenAI’s Function Calling feature with Microsoft Semantic Kernel. Context To explain the concepts in this post, let’s set the context. Let’s say that you are building an AI application that helps users answer basic questions about Maths (e.g. what is 20% of 100). […]

The post Using OpenAI Function Calling with Microsoft Semantic Kernel appeared first on GauravMantri.com.

]]>
In this post we are going to see how we can use OpenAI’s Function Calling feature with Microsoft Semantic Kernel.

Context

To explain the concepts in this post, let’s set the context. Let’s say that you are building an AI application that helps users answer basic questions about Maths (e.g. what is 20% of 100). For the sake of argument, let’s assume that our AI model is not capable of answering such questions.

So what we are going to do is write some functions that perform mathematical operations like Add, Subtract, Multiply, Divide, etc. In Semantic Kernel lingo, we are creating native functions. In OpenAI lingo, we are creating tools.

Now the challenge is how do we invoke these tools or functions based on the user prompts (which are in natural language, BTW). This is where OpenAI Function Calling comes in handy and we will see how we can use Semantic Kernel for that purpose.

What is OpenAI Function Calling?

Let’s first briefly talk about OpenAI Function Calling. This is a feature which was recently released by OpenAI.

The way I understand it is that you give OpenAI a prompt in natural language and a list of functions (tools). OpenAI then tries to find the function(s) (tools) best suited to fulfill the user’s request and returns those function(s) back to the calling program. Not only does it return the function(s), it also tries to extract from the prompt the arguments that are needed to execute that function.

Considering our context, let’s say the user asks “what is 2 + 12” in our AI application. We pass this prompt and the list of functions (Add, Subtract, Multiply, Divide, etc.) to OpenAI and it returns Add back to the calling application. If our Add function has two arguments, say number1 and number2, OpenAI will also return these arguments, with the value of the number1 argument as 2 and the value of the number2 argument as 12. Your application can then execute that function and return the result (14) to the user.

The way it works (I think) is that OpenAI extracts the intent of the prompt and then semantically compares it with the description of the functions.

Here’s the code for our Add function:

    [KernelFunction, Description("Add two numbers")]
    public static double Add(
        [Description("The first number to add")] double number1,
        [Description("The second number to add")] double number2
    )
    {
        return number1 + number2;
    }

Here, the description of the function is very important. You must be concise yet very clear about what a function does for this whole thing to work properly.

Now when OpenAI sees the prompt (what is 2 + 12), it infers that the user wants to add two numbers, compares that with the descriptions of the functions, and determines that the Add function is best suited to answer the user’s prompt. It then maps “2” to the “number1” argument and “12” to the “number2” argument and returns this information.

You can read more about OpenAI Function Calling here: https://platform.openai.com/docs/guides/function-calling.

What is Semantic Kernel?

Simply put, Semantic Kernel is an open-source SDK that helps you build AI applications. You can use C#, Python or Java to write AI applications using Semantic Kernel.

I wrote an overview post on Microsoft Semantic Kernel a few months back when it was first released, which you can read here: https://gauravmantri.com/2023/09/03/microsoft-semantic-kernel-an-overview/.

You can read more about Semantic Kernel here: https://learn.microsoft.com/en-us/semantic-kernel/overview/.

Semantic Kernel makes it super easy to make use of OpenAI Function Calling and that’s what we will see in this post.

Code

Let’s look at the code! I have built a simple console application.

For explanation, I will break the code into small chunks and then put the entire code together at the end.

Plugin

Let’s first write our plugin that will contain our functions (tools). Since our AI app deals with Maths, let’s call it MathPlugin. Here’s the code for that which I took from here:

using System.ComponentModel;
using Microsoft.SemanticKernel;

namespace FunctionCallingWithSemanticKernel.Plugins;

public class MathPlugin
{
    [KernelFunction, Description("Take the square root of a number")]
    public static double Sqrt(
        [Description("The number to take a square root of")] double number1
    )
    {
        return Math.Sqrt(number1);
    }

    [KernelFunction, Description("Add two numbers")]
    public static double Add(
        [Description("The first number to add")] double number1,
        [Description("The second number to add")] double number2
    )
    {
        return number1 + number2;
    }

    [KernelFunction, Description("Subtract two numbers")]
    public static double Subtract(
        [Description("The first number to subtract from")] double number1,
        [Description("The second number to subtract away")] double number2
    )
    {
        return number1 - number2;
    }

    [KernelFunction, Description("Multiply two numbers. When increasing by a percentage, don't forget to add 1 to the percentage.")]
    public static double Multiply(
        [Description("The first number to multiply")] double number1,
        [Description("The second number to multiply")] double number2
    )
    {
        return number1 * number2;
    }

    [KernelFunction, Description("Divide two numbers")]
    public static double Divide(
        [Description("The first number to divide from")] double number1,
        [Description("The second number to divide by")] double number2
    )
    {
        return number1 / number2;
    }

    [KernelFunction, Description("Raise a number to a power")]
    public static double Power(
        [Description("The number to raise")] double number1,
        [Description("The power to raise the number to")] double number2
    )
    {
        return Math.Pow(number1, number2);
    }

    [KernelFunction, Description("Take the log of a number")]
    public static double Log(
        [Description("The number to take the log of")] double number1,
        [Description("The base of the log")] double number2
    )
    {
        return Math.Log(number1, number2);
    }

    [KernelFunction, Description("Round a number to the target number of decimal places")]
    public static double Round(
        [Description("The number to round")] double number1,
        [Description("The number of decimal places to round to")] double number2
    )
    {
        return Math.Round(number1, (int)number2);
    }

    [KernelFunction, Description("Take the absolute value of a number")]
    public static double Abs(
        [Description("The number to take the absolute value of")] double number1
    )
    {
        return Math.Abs(number1);
    }

    [KernelFunction, Description("Take the floor of a number")]
    public static double Floor(
        [Description("The number to take the floor of")] double number1
    )
    {
        return Math.Floor(number1);
    }

    [KernelFunction, Description("Take the ceiling of a number")]
    public static double Ceiling(
        [Description("The number to take the ceiling of")] double number1
    )
    {
        return Math.Ceiling(number1);
    }

    [KernelFunction, Description("Take the sine of a number")]
    public static double Sin(
        [Description("The number to take the sine of")] double number1
    )
    {
        return Math.Sin(number1);
    }

    [KernelFunction, Description("Take the cosine of a number")]
    public static double Cos(
        [Description("The number to take the cosine of")] double number1
    )
    {
        return Math.Cos(number1);
    }

    [KernelFunction, Description("Take the tangent of a number")]
    public static double Tan(
        [Description("The number to take the tangent of")] double number1
    )
    {
        return Math.Tan(number1);
    }

    [KernelFunction, Description("Take the arcsine of a number")]
    public static double Asin(
        [Description("The number to take the arcsine of")] double number1
    )
    {
        return Math.Asin(number1);
    }

    [KernelFunction, Description("Take the arccosine of a number")]
    public static double Acos(
        [Description("The number to take the arccosine of")] double number1
    )
    {
        return Math.Acos(number1);
    }

    [KernelFunction, Description("Take the arctangent of a number")]
    public static double Atan(
        [Description("The number to take the arctangent of")] double number1
    )
    {
        return Math.Atan(number1);
    }
}

OpenAI Client

Next, we will create an instance of the OpenAI client. You will need the endpoint of your Azure OpenAI service and the key, both of which you can get from the Azure Portal (you can also use Azure AD credentials instead of a key if you want).

const string AZURE_OPEN_AI_ENDPOINT = "https://xyz.openai.azure.com/";
const string AZURE_OPEN_AI_KEY = "00000000000000000000000";
const string AZURE_OPEN_AI_MODEL_ID = "gpt-4-32k";

// create an instance of OpenAIClient.
var openAIClient = new OpenAIClient(new Uri(AZURE_OPEN_AI_ENDPOINT), new Azure.AzureKeyCredential(AZURE_OPEN_AI_KEY));

Get Kernel

Kernel GetKernel()
{
    var kernelBuilder = Kernel.CreateBuilder()
        .AddAzureOpenAIChatCompletion(AZURE_OPEN_AI_MODEL_ID, openAIClient);
	
    var kernel = kernelBuilder.Build();
    kernel.Plugins.AddFromType<MathPlugin>();
    return kernel;
}

What we are doing above is creating an instance of Semantic Kernel and then adding our plugin (with all the functions/tools) to that kernel.

OpenAI Prompt Execution Settings

Next, we will set the OpenAI prompt execution settings.

var promptExecutionSettings = new OpenAIPromptExecutionSettings()
{
    ToolCallBehavior = ToolCallBehavior.EnableKernelFunctions,
    Temperature = 0
};

There are two things I want to highlight in the code above:

  1. Temperature: The Temperature parameter sets the creativity of the model. Since we want our model to be constrained and give predictable results, we want this setting to be 0. Try removing this parameter and you will get some really funny results back!
  2. ToolCallBehavior: This parameter controls how tools will be called. By setting the value to ToolCallBehavior.EnableKernelFunctions, we tell Semantic Kernel to just pass the registered tools to OpenAI and return the matching tool to us. If we want Semantic Kernel to automatically invoke the matching tool returned by OpenAI, we would change this setting to ToolCallBehavior.AutoInvokeKernelFunctions (a minimal sketch of that variant follows below). There are other values for this parameter as well that you can read about here.
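
For completeness, here is a minimal sketch (not part of the original post) of the auto-invoke variant. It assumes the same kernel (with MathPlugin loaded) built earlier; with ToolCallBehavior.AutoInvokeKernelFunctions, Semantic Kernel selects and executes the matching plugin function itself and returns the final answer in one call:

var autoInvokeSettings = new OpenAIPromptExecutionSettings()
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions,
    Temperature = 0
};
// The kernel picks the matching plugin function, invokes it, and returns the final result.
var autoResult = await kernel.InvokePromptAsync("what is 2 + 12", new KernelArguments(autoInvokeSettings));
Console.WriteLine(autoResult.ToString()); // expected to print 14

In this post, however, we stick with EnableKernelFunctions so that we can inspect the selected function and its arguments and invoke it ourselves.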

Get Tool

We are going to focus on invoking the function manually, so the code below gets the appropriate tool for the input prompt by using OpenAI Function Calling:

// select the tool best suited to execute our prompt.
async Task<OpenAIFunctionToolCall?> SelectTool(string prompt)
{
    try
    {
        var chatCompletionService = new AzureOpenAIChatCompletionService(AZURE_OPEN_AI_MODEL_ID, openAIClient!);
        var result = await chatCompletionService.GetChatMessageContentAsync(new ChatHistory(prompt),
            promptExecutionSettings, kernel);
        var functionCall = ((OpenAIChatMessageContent)result).GetOpenAIFunctionToolCalls().FirstOrDefault();

        return functionCall;
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
        Console.WriteLine(ex.StackTrace);
        return null;
    }
}

Here, Semantic Kernel takes our prompt and the functions loaded in the kernel and sends them to OpenAI. It then parses the response from OpenAI and returns the first tool. Simple, isn’t it?

However, please keep in mind that OpenAI may not find a matching tool. This could happen when a user asks an irrelevant question like “What is the capital of France?”.

Execute Function

The last step is to execute the function and get the result!

var function = await SelectTool(prompt);
if (function != null)
{
    // now we try to get the plugin function and the arguments.
    kernel.Plugins.TryGetFunctionAndArguments(function, out KernelFunction? pluginFunction,
        out KernelArguments? arguments);
    Console.WriteLine($"Plugin function: {pluginFunction!.Name}");
    if (arguments!.Any())
    {
        Console.WriteLine("Function arguments:");
        foreach (var argument in arguments!)
        {
            Console.WriteLine($"Argument name: {argument.Key}; Argument value: {argument.Value}");
        }
    }
    // execute the plugin function.
    var result = await kernel.InvokeAsync(pluginFunction!, arguments);
    Console.WriteLine($"{prompt}: {result.ToString()}");
}
else
{
    Console.WriteLine("I'm sorry but I am not able to answer your question. I can only answer simple mathematical questions.");
}

Here, first we are getting the kernel function to invoke along with the arguments (kernel.Plugins.TryGetFunctionAndArguments(function, out KernelFunction? pluginFunction, out KernelArguments? arguments)) and then executing the function to get the result (var result = await kernel.InvokeAsync(pluginFunction!, arguments)).
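
As an illustration, for the prompt “what is 2 + 12” (and assuming OpenAI maps it to the Add function), the console output from the snippet above would look roughly like this:

Plugin function: Add
Function arguments:
Argument name: number1; Argument value: 2
Argument name: number2; Argument value: 12
what is 2 + 12: 14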

That’s it!

Complete Code

Here’s the complete code:

using Azure.AI.OpenAI;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using Microsoft.SemanticKernel.ChatCompletion;
using FunctionCallingWithSemanticKernel.Plugins;

// Azure OpenAI settings. You can get these settings from portal.
const string AZURE_OPEN_AI_ENDPOINT = "<your-azure-openai-endpoint like https://xyz.openai.azure.com/>";
const string AZURE_OPEN_AI_KEY = "<your-azure-openai-key like 44444444444444444444444444444444>";
const string AZURE_OPEN_AI_DEPLOYMENT_ID = "<your-azure-openai-deployment-id like gpt-4-32k>";

// create an instance of OpenAIClient.
var openAIClient = new OpenAIClient(new Uri(AZURE_OPEN_AI_ENDPOINT), new Azure.AzureKeyCredential(AZURE_OPEN_AI_KEY));

// get the kernel.
var kernel = GetKernel();

// set OpenAI prompt execution settings.
var promptExecutionSettings = new OpenAIPromptExecutionSettings()
{
    ToolCallBehavior = ToolCallBehavior.EnableKernelFunctions,
    Temperature = 0
};
Console.WriteLine("Hello, I am an AI assistant that can answer simple math questions.");
Console.WriteLine("Please ask me questions like \"What is 2 x 2\" or \"What is sqaure root of 3\" etc.");
Console.WriteLine("To quit, simply type quit.");
Console.WriteLine("");
Console.WriteLine("Now ask me a math question, I am waiting!");
do
{
    var prompt = Console.ReadLine();
    if (!string.IsNullOrWhiteSpace(prompt))
    {
        if (prompt.ToLowerInvariant() == "quit")
        {
            Console.WriteLine("Thank you! See you next time.");
            break;
        }
        else
        {
            // get the tool/function best suited to execute the function.
            var function = await SelectTool(prompt);
            if (function != null)
            {
                // now we try to get the plugin function and the arguments.
                kernel.Plugins.TryGetFunctionAndArguments(function, out KernelFunction? pluginFunction,
                    out KernelArguments? arguments);
                // execute the plugin function.
                var result = await kernel.InvokeAsync(pluginFunction!, arguments);
                Console.WriteLine($"{prompt}: {result.ToString()}");
            }
            else
            {
                Console.WriteLine("I'm sorry but I am not able to answer your question. I can only answer simple math questions.");
            }
        }
    }
} while (true);


// select the tool best suited to execute our prompt.
async Task<OpenAIFunctionToolCall?> SelectTool(string prompt)
{
    try
    {
        var chatCompletionService = new AzureOpenAIChatCompletionService(AZURE_OPEN_AI_DEPLOYMENT_ID, openAIClient!);
        var result = await chatCompletionService.GetChatMessageContentAsync(new ChatHistory(prompt),
            new OpenAIPromptExecutionSettings()
            {
                ToolCallBehavior = ToolCallBehavior.EnableKernelFunctions,
                Temperature = 0
            }, kernel);
        var functionCall = ((OpenAIChatMessageContent)result).GetOpenAIFunctionToolCalls().FirstOrDefault();

        return functionCall;
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
        Console.WriteLine(ex.StackTrace);
        return null;
    }
}

// create an instance of Kernel and load all plugins and functions in the Kernel.
Kernel GetKernel()
{
    var kernelBuilder = Kernel.CreateBuilder()
        .AddAzureOpenAIChatCompletion(AZURE_OPEN_AI_DEPLOYMENT_ID, openAIClient);
	
    var kernel = kernelBuilder.Build();
    kernel.Plugins.AddFromType<MathPlugin>();
    return kernel;
}

You can download the complete solution from GitHub: https://github.com/gmantri/function-calling-with-semantic-kernel.

Conclusion

That’s it for this post. I hope you will find it useful. If you have any feedback or questions, please feel free to provide them.

Happy coding and Happy New Year 2024!

The post Using OpenAI Function Calling with Microsoft Semantic Kernel appeared first on GauravMantri.com.

]]>
https://gauravmantri.com/2023/12/31/using-openai-function-calling-with-microsoft-semantic-kernel/feed/ 0
Prompt Patterns Every Generative AI Developer Must Know https://gauravmantri.com/2023/10/05/prompt-patterns-every-generative-ai-developer-must-know/ https://gauravmantri.com/2023/10/05/prompt-patterns-every-generative-ai-developer-must-know/#respond Wed, 04 Oct 2023 21:17:43 +0000 https://gauravmantri.com/?p=1906 In this post we will talk about some of patterns that you should know in order to write effective prompts. This is in continuation with my previous post about Prompts and Prompt Engineering. If you have not done so, I would strongly encourage you to read that post before continuing with this post. We start […]

The post Prompt Patterns Every Generative AI Developer Must Know appeared first on GauravMantri.com.

]]>
In this post we will talk about some of the patterns that you should know in order to write effective prompts. This is in continuation of my previous post about Prompts and Prompt Engineering. If you have not done so, I would strongly encourage you to read that post before continuing with this one.

We start this post by describing prompt patterns and then briefly talk about some of the commonly used prompt patterns.

So, let’s start!

What are Prompt Patterns?

If you are coming from a software development background, I am pretty sure that you are aware of software design patterns, which provide reusable solutions to common software problems and help you write good and maintainable software.

Prompt patterns are very much like software design patterns. They focus on controlling the text generated by Large Language Models (LLMs). Part of prompt engineering, they provide reusable approaches for writing effective prompts.

If you are building Generative AI apps, chances are that you are using some of the patterns described in this post (without realizing that you are using prompt patterns 😀).

Also, as you start building Generative AI apps, you will realize that when writing prompts, you are actually using more than one pattern in a single prompt (and that’s completely ok!).

Prompt Patterns

Here are some of the commonly used prompt patterns.

One-Shot/Few-Shot Learning (or Prompting) Pattern

In this pattern, the LLM is trained (kind of) by including examples of the input the user would give and the output they expect. When exactly one example is given, it is called “One-Shot Prompting” and when more than one example is given, it is called “Few-Shot Prompting”. For example:

Input: The movie was good but it was a bit long.

Sentiment: Neutral

Input: I did not like this book.

Sentiment: Negative

Input: The food at the restaurant was yummy.

Sentiment:

Chain Of Thought Reasoning Pattern

In this pattern, the LLM is asked to proceed step-by-step and present all the steps involved. This is especially useful for reasoning types of questions. Using this pattern reduces the possibility of inaccurate outcomes and makes assessing the LLM’s response easier. For example,

When I was 10 years old, my brother was half my age. I am now 50 years old. How old is my brother? Take a step-by-step approach in your response, cite sources and give reasoning before sharing final answer.

Meta Language Creation Pattern

In this pattern, the meaning of one or more special symbols, words, or sentences is explained to the LLM so that it understands them in the conversation that follows. For example:

Keep the following in mind when answering the subsequent questions: When I say twin city or twin cities, I mean Dallas/Fort Worth and not Minneapolis/St. Paul.

Output Automater Pattern

In this pattern, the LLM is asked to generate a script of some sort, instead of instruction text, that the user can then execute rather than following the instructions manually. For example:

Write a python script that would identify the open TCP ports on my Windows server and close those ports.

Flipped Interaction Pattern

In the normal course of interaction, we ask questions to an LLM and the LLM provides answers to those questions. In this pattern, the LLM is encouraged to ask the questions in order to achieve an objective. The idea behind this pattern is that you may not know what questions to ask an LLM and want to rely on the LLM’s vast knowledge to help you. For example:

Ask me questions, one question at a time, so that I can plan a short vacation in Rome. I will be leaving from Washington DC. When I am done answering questions, create an itinerary for me based on the answers I provided.

Persona Pattern

In this pattern, the LLM is instructed to act like a certain kind of person (assume the persona in other words) and answer the question as that kind of person. For example:

Act like an economist to explain the importance of Large Language Models in the field of economics.

Audience Persona Pattern

This is the opposite of the Persona pattern. In this pattern, the LLM is instructed to provide an answer that is understandable by a certain kind of person (the audience). Essentially, the LLM sets the tone and content of the answer in such a way that it is understandable by the persona set in the prompt. For example:

Explain large language model to a nine year old.

Question Refinement Pattern

In this pattern, the LLM is encouraged to provide a better version of the question asked (and ask the user if they want to use the newer version). This is especially useful when the user asking the question is not an expert in the field of question being asked and would like to rely on the knowledge an LLM has in that field. For example:

Whenever I ask a question about a software artifact’s security, suggest a better version of the question to use that incorporates information specific to security risks in the language or framework that I am using instead and ask me if I would like to use your question instead.

Alternative Approaches Pattern

In this pattern, the LLM is instructed to provide alternative answers to a given question. This is quite useful when the user asking the question is interested in exploring multiple solutions to their problem and then picking the best one. For example:

I need to travel from Baltimore, MD to Niagara Falls, NY by road. Suggest me a few alternate ways I can take. Include the pros and cons of each approach.

Cognitive Verifier Pattern

In this pattern, the LLM is forced to always subdivide a question into additional questions that can be used to provide a better answer to the original question. This is especially useful when the question being asked is very high level or the user does not have much knowledge about the question. For example,

When I ask you a question, generate three additional questions that would help you give a more accurate answer. When I have answered the three questions, combine the answers to produce the final answers to my original question.

Fact Check List Pattern

In this pattern, the LLM is instructed to produce the list of facts on which its answer is based. The user can then verify the facts to validate the accuracy and truthfulness of the answer. This is a useful pattern when the user is not an expert in the domain of the question being asked and can use these facts to verify the answer. For example,

when you generate an answer, create a set of facts that the answer depends on that should be fact-checked and list this set of facts at the end of your output. Only include facts related to authentication and authorization.

Template Pattern

In this pattern, the LLM is forced to produce an output in a specific format. The user specifies the template for output format and the placeholders and ask the LLM to produce an output in the format specified in the template by filling the placeholder content. For example,

Please create an item list for me to make a dining table and 4 chairs from scratch. I am going to provide a template for your output. <placeholder> are my placeholders for content. Try to fit the output into one or more of the placeholders that I list. Please preserve the formatting and overall template that I provide.

This is the template:

Aisle <name of aisle>: <item needed from aisle>, <qty> (<furniture used in>)

Infinite Generation Pattern

In this pattern, the LLM is instructed to generate a series of outputs without having the user to re-enter the generation prompt for each kind of output. For example,

I have the following table called Users in my SQL database to store user information:
Id nvarchar(32) Primary Key

Name nvarchar(100)

Email nvarchar(100)

CreatedDate Date

ModifiedDate Date

Write SQL statements for creating, updating, reading and deleting user records. Use the placeholders instead of actual values when creating SQL statements.

Visualization Generator Pattern

In this pattern, the LLM is instructed to generate an output in a format that can be fed to a visualization tool that accepts text as an input (e.g. Dall-E). This pattern overcomes the limitation of LLMs of not being able to create images by generating textual inputs in correct format to plug into another tool that generates the correct diagram. For example,

Here’s a simple user flow for my web application:

User is on login page. When the user logs in, application checks if the credentials are correct, then the user is taken to the dashboard. If the user credentials are incorrect, then the user is redirected back to the login page.

Create a flowchart for this user flow in Graphviz Dot format.

Game Play Pattern

In this pattern, the LLM is asked to create a game around a certain topic. This is quite useful when a user wants the LLM to generate scenarios or questions revolving around a specific topic and require them to apply problem solving or other skills to accomplish a task related to the scenario. For example,

We are going to play a trivia game checking my knowledge about Tom Cruise movies. You will ask me questions about Tom Cruise’s movies. For each question, you will give me 4 options and ask me to choose one option. If I pick correct option, you give me 10 points. If I pick incorrect option, you deduct 5 points. At the end of the game, you will tell me my score. The game will have just 3 questions.

Reflection Pattern

In this pattern, the LLM is asked to provide the rationale behind the output (along with the output). This is quite useful when a user wants to assess the validity of the LLM’s output as well as understand how the LLM came up with a particular answer. Furthermore, this pattern can also be used by users to fine-tune their prompts, because they now have a better understanding of how the LLM is producing the output. For example,

When you provide an answer, please explain the reasoning and assumptions behind your selection of software frameworks. If possible, use specific examples or evidence with associated code samples to support your answer of why the framework is the best selection for the task.

Refusal Breaker Pattern

In this pattern, when the LLM is not able (or refuses) to answer a question for any reason, it is encouraged to help the user by providing alternate or rephrased questions that a user can ask and also provide reasons behind not answering the question. For example,

Whenever you can’t answer a question, explain why and provide one or more alternate wordings of the question that you can’t answer so that I can improve my questions.

Context Manager Pattern

In this pattern, the LLM is encouraged to keep the conversation very focused by asking it to concentrate only on certain topics or to remove certain topics from consideration. This pattern gives users greater control over what the LLM should consider or ignore when generating output. For example,

When analyzing the following pieces of code, only consider code complexity aspects. Do not consider formatting or naming conventions.

Recipe Pattern

In this pattern, the LLM is instructed to provide a sequence of steps based on some provided input data to achieve a stated goal. This pattern combines the Template, Alternative Approaches, and Reflection patterns. For example,

I am trying to deploy an application to the cloud. I know that I need to install the necessary dependencies on a virtual machine for my application. I know that I need to sign up for an account in Azure. Please provide a complete sequence of steps. Please fill in any missing steps. Please identify any unnecessary steps.

Ask for Input Pattern

In this pattern, the LLM is instructed to wait for the user’s input and generate an output based on the user’s input and other thing specified in the prompt. This is quite useful in scenarios where we do not want the LLM to automatically start generating output and rather wait for user to provide input and generate output for that input. For example,

From now on, I am going to provide you some paragraphs of texts. You will summarize each paragraph. Now ask me for the first paragraph of text.

Outline Expansion Pattern

In this pattern, the LLM is instructed to generate an outline (a list of items) and then expand on items in that list. This is especially useful when a user wants to generate really long text (a book or an article) that would exceed the output limits of the LLM. For example,

I want to write a book on Generative AI. Generate a bullet point outline for that and then ask me for the bullet point you should expand on.

Menu Actions Pattern

In this pattern, the LLM is instructed to take an action on a certain thing. Generally this pattern is used in conjunction with the Outline Expansion pattern, where the LLM is instructed to perform some action on the list of items generated through that pattern. For example,

Whenever I write “text <bullet point> <paragraphs>”, you will write some text for the bullet point specified in the <bullet point> placeholder. <paragraphs> is the placeholder for the number of paragraphs you will write. If <paragraphs> is missing in the input, you will generate exactly one paragraph of text.

Tail Generation Pattern

In this pattern, the LLM is forced to generate a tail at the end of the output to remind the LLM of the task at hand. This is especially useful in scenarios where a user is having a long conversation with the LLM. For example,

I want to write a book on Generative AI. Generate a bullet point outline for that. At the end, ask me what bullet point to expand on.

Semantic Filter Pattern

In this pattern, the LLM is asked to remove or keep certain information in text based on specified semantic rules. This is quite useful when the user wants to remove certain sensitive information (like medical history, personally identifying information etc.) from input text. For example,

Filter this information to remove any personally identifying information or information that could potentially be used to re-identify the person.

Resources

For writing this blog post, I referenced the following resources:

If you are interested in learning more about prompt patterns, I would highly recommend that you check out these resources.

Summary

This turned out to be a rather long post 😀 but I hope you have found it useful and worthy of your time. Please share your thoughts by providing comments.

The post Prompt Patterns Every Generative AI Developer Must Know appeared first on GauravMantri.com.

]]>
https://gauravmantri.com/2023/10/05/prompt-patterns-every-generative-ai-developer-must-know/feed/ 0
Generative AI – All About Prompts https://gauravmantri.com/2023/09/27/generative-ai-all-about-prompts/ https://gauravmantri.com/2023/09/27/generative-ai-all-about-prompts/#respond Tue, 26 Sep 2023 20:08:13 +0000 https://gauravmantri.com/?p=1901 In this short post, we will talk about prompts. We will talk about what prompts are, prompt engineering and why it is such a big deal. We will also briefly talk about prompt patterns. So let’s begin! Prompts In very simple terms, prompts are inputs to a large language model (LLM). Based on the prompt, […]

The post Generative AI – All About Prompts appeared first on GauravMantri.com.

]]>
In this short post, we will talk about prompts. We will talk about what prompts are, what prompt engineering is, and why it is such a big deal. We will also briefly talk about prompt patterns.

So let’s begin!

Prompts

In very simple terms, prompts are inputs to a large language model (LLM). Based on the prompt, an LLM provides desired output.

If you are coming from the database world, you write queries to extract information from a database. If you were to imagine an LLM as a database (one that contains a massive amount of information), prompts are the queries to the LLM based on which it generates output.

However, unlike database queries, which follow a certain syntax (T-SQL, for example) and are bound by rules, prompts do not have a rigid syntax or rules. You write prompts in natural language (which an LLM understands very well) and structure the text in your prompt in such a way that you get the output you desire.

Prompt Engineering

To me, prompt engineering is the art and science of writing effective prompts.

When I first started working with Generative AI, I wondered why folks were making such a big deal about prompt engineering. After all, you want some information from an LLM, so you write some text (a prompt) and should get the output you want. I was proven wrong very quickly 😀.

Think of an LLM as a very smart human assistant who is in possession of a great deal of knowledge (and is eager to show it off). Unless you tell this assistant what information you want and how you want it, you are going to get very wild results. Effectively constraining an LLM to produce the output you want is essentially prompt engineering.

A few things that you can keep in mind when creating prompts:

Token size & costs

LLMs are often constrained by the size of the input you can provide (also known as token size). Furthermore, you are charged by the token usage. So when creating the prompts, you will need to be creative and surgical in your approach to ensure that you do not exceed those limits and also keep the costs under control.

Output size, tone and format

LLMs are extremely creative and are capable of producing huge amounts of text data.

With proper prompting you can control how much data you want the LLM to output. For example, including “Provide an answer in no more than 500 words” in the prompt will force an LLM to constrain the output to approximately that many words.

With proper prompting, you can also control the tone of the output. For example, having a prompt like “Explain Newton’s 3rd law of motion in a poem” will produce an entirely different output than having a prompt like “Explain Newton’s 3rd law of motion”.

With proper prompting, you can also control the format in which you want to see the data (default being text). For example, including “Provide an answer in JSON format with the following structure (your JSON structure)” in the prompt will force an LLM to provide the output in JSON format.

One-shot and few-shots prompting

Sometimes you may want to teach an LLM by including some examples specific to your use case in the prompt. This is known as one-shot prompting (when only one example is included) or few-shot prompting (when more than one example is included). Both of these techniques have proven to be very effective in getting the desired outputs from an LLM.

Prompt Injection

Prompt injection is a way for a user of your Generative AI app to get unintended information from an LLM. For example, let’s say your Generative AI app is supposed to provide information about only specific topics; however, through prompt injection, a user may be able to get information about other topics. Through proper prompt engineering, one can prevent prompt injection.

Prompt Patterns

Much like software design patterns, there are well-established patterns that will help you write effective prompts. One-shot and few-shot prompting are examples of prompt patterns. We will talk about these prompt patterns in subsequent posts; in the meantime, you can go to https://arxiv.org/abs/2302.11382 to learn more about them.

Resources

Following are some resources I used to learn about prompts and prompt engineering:

Summary

That’s it for this post. I hope you have found the information useful. Please share your thoughts by providing comments.

The post Generative AI – All About Prompts appeared first on GauravMantri.com.

]]>
https://gauravmantri.com/2023/09/27/generative-ai-all-about-prompts/feed/ 0