RAG with Azure & Python: Step-by-Step Guide

In today’s AI era, developers building applications that use Retrieval Augmented Generation face several recurring challenges: integrating AI search into existing workflows, dealing with large or unstructured datasets, ensuring answers are both accurate and relevant to the context, and keeping performance fast and reliable at scale. Without the right setup, the system may produce incomplete, misleading, or slow responses, resulting in a poor user experience.
The solution to these challenges is to combine Azure AI Search with Azure OpenAI. This approach uses vector embeddings, semantic search, and GPT-powered summarization to pull the most relevant information from your data and turn it into clear, precise answers. With Azure’s capabilities such as Cognitive Search, parallel indexing, and strong security features, you can create AI solutions that are accurate, scalable, and integrate smoothly into your existing applications.
Azure Vector Search
When building Retrieval Augmented Generation (RAG) systems, one of the most important steps is storing and searching your data efficiently. In Azure, Azure AI Search can act both as a vector storage option and as a powerful information retrieval tool. This makes it a natural choice for implementing RAG workflows.
What is Azure AI Search?
Azure AI Search lets you store and search your own data. This data could be:
- Text content
- Images with embedded text
- Extracted key phrases or named entities (people, organizations, locations)
- Structured and unstructured documents
You can upload many types of files, including PDF documents, Word files, text files, PowerPoint presentations, and Excel spreadsheets. Even if your documents contain both text and images (multi‑modal data), Azure AI Search can process them. It will extract the text, generate captions for images, detect languages, and pull out important keywords or named entities. Azure AI Search can ingest data from various Azure sources, such as JSON files and Blob Storage containers, but it cannot pull data directly from the internet.
Search Capabilities
Azure AI Search supports three query types:
- Vector Search: Finds documents based on semantic meaning.
- Full‑Text Search: Matches exact keywords.
- Hybrid Search: Combines keyword matching with vector similarity for the best results.
Hybrid search is especially powerful because it balances precise keyword matches with context‑aware semantic matches.
AI Enrichment
Azure AI Search can connect to an Azure AI resource to enrich your data with features like image analysis and captioning, text analytics, keyword extraction, language detection for 56 languages, phonetic matching, and entity recognition. This allows it to add captions to images and identify locations, people, or organizations within your documents.
How Azure AI Search Works
Azure AI Search uses three main components:
- Indexer: Breaks (chunks) your documents into smaller pieces for easier processing.
- Skillset: Connects to Azure AI services to perform enrichment like keyword extraction or image captioning.
- Index: Stores the processed data so it can be searched quickly.
Workflow Example:
- Choose your data source, such as Blob Storage or Cosmos DB.
- The Indexer chunks your documents and sends them to the Skillset.
- The Skillset, using Azure AI, performs text and image analysis.
- The results go back to the Indexer, which maps fields (e.g., key phrases, locations) to the Index.
- The Index makes the enriched data searchable using Lucene syntax (similar to SQL queries).
Lucene syntax can also support autocomplete and geo‑search filtering for location‑based searches.

Azure AI Search in a RAG Workflow
In a RAG setup:
- Data is ingested into Azure AI Search from your chosen source.
- The Skillset calls a vector embedding model in Azure OpenAI to create embeddings for the data.
- These embeddings are stored in the Index alongside text fields.
- When a user submits a query, the query is also converted into a vector using the embedding model.
- Azure AI Search performs a vector similarity search or hybrid search to find the most relevant documents.
- The retrieved documents are sent to a GPT model with the user’s question as context.
- The GPT model generates a final, context‑aware answer for the user.
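The steps above can be sketched as a single orchestration function. This is only an illustration of the flow: the `embed`, `search`, and `generate` callables are hypothetical placeholders for the Azure OpenAI embedding call, the Azure AI Search query, and the GPT chat completion that later sections wire up for real.

```python
def rag_answer(question, embed, search, generate):
    """Minimal RAG orchestration: embed the query, retrieve, then generate.

    `embed`, `search`, and `generate` are placeholder callables standing in
    for the Azure OpenAI embedding call, the Azure AI Search query, and the
    GPT chat completion.
    """
    query_vector = embed(question)              # 1. convert the query to a vector
    documents = search(query_vector, question)  # 2. vector/hybrid retrieval
    return generate(question, documents)        # 3. grounded, context-aware answer
```

Keeping the three stages as injected callables also makes the flow easy to unit-test with stubs before any Azure resource exists.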

Hybrid Search with Azure AI Search
Hybrid search combines the power of vector search and keyword search to give more accurate results. In this lab, Azure AI Search acts as both an intelligent AI toolset and a vector database, while the embedding engine and GPT engine are provided by Azure OpenAI.
To understand this search, we will work with hotel reviews stored in PDF format. These documents contain customer reviews of hotels around the world. They are unstructured, meaning the content is not in a table or database but in plain text inside PDFs. Some reviews are short, some are detailed, and they may contain location names, reviewer names, and other useful details. The PDF folder accompanying this documentation contains the same PDFs used in these examples.
Step 1: Setting up the Storage Account
We first need a place to store our PDF files.
- Go to the Azure Portal and search for Storage Accounts in the search bar.
- Create a new storage account:
- Choose an existing resource group or create a new one.
- Give the account a unique name.
- Select a region.
- Set redundancy to Geo-redundant storage.

Click Create. Once the account is created, enable anonymous blob access in its configuration settings and save the change. This lets us access the files directly without generating a Shared Access Signature (SAS) key each time.

Create a container inside the storage account (for example, call it dummy-pdfs) and set its access level to container.

Upload all PDF files into this container. Click any file to view its details; each file has a public URL that you can open directly in your browser.

Step 2: Creating the Azure AI Search Resource
Next, we create a search service that will allow us to query and analyze the documents.
- In the Azure Portal, search for Azure AI Search.
- Create a new search service:
- Place it in the same region as your Azure OpenAI resource to avoid compatibility issues.
- Choose the Basic pricing tier for better performance.

Once deployed, this service will have three main components:
- Index: the searchable database structure.
- Indexer: moves data from the source into the index.
- Skill Set: adds AI enrichment, like extracting key phrases or generating captions.
Step 3: Preparing the Azure OpenAI Resource
Hybrid search requires a GPT model and an embedding model. Deploy an Azure OpenAI resource in AI Foundry (S0 pricing tier recommended). After it deploys, click Explore Azure AI Foundry Portal to open the Foundry interface.
- Deploy a GPT model (e.g., GPT‑4 or GPT‑3.5 Turbo if GPT‑4 is unavailable).
- Deploy the text-embedding-ada‑002 model for vector embeddings. This model generates 1,536-dimensional embeddings for each chunk of text.

Step 4: Connecting Data and Creating the Index
Now we connect the storage container to Azure AI Search and create the index. In your Azure AI Search service, click Import and vectorize data.

Select Azure Blob Storage as the data source and choose the RAG option. Then pick your storage account and the container you created earlier.

Connect to your Azure OpenAI embedding model (text-embedding-ada‑002).

Skip image vectorization (our PDFs do not contain images), but note that Azure AI Search can vectorize images if needed.

Choose an indexing schedule (manual or automatic every few minutes). Provide a name prefix for the index, indexer, and skill set.

Create the index; this will process the documents, break them into chunks, generate embeddings, and store them in the index.
Step 5: Understanding How Hybrid Search Works
- Keyword Search matches exact words and phrases from the user’s query to the text in documents.
- Vector Search compares the semantic meaning of the query with document embeddings to find conceptually similar results.
- Hybrid Search combines both, increasing the accuracy of retrieved documents.
When you run a query like “The Al Fahidi Inn”, the system:
- Converts the query into an embedding vector.
- Finds documents with high cosine similarity to this vector.
- Also checks for keyword matches.
- Returns results ranked by a search score (0 to 1, where closer to 1 is a better match).
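Azure AI Search computes the vector side of this ranking for you, but it helps to see what cosine similarity actually measures. As an illustration only (not part of the lab), here it is in plain Python: identical directions score 1.0, unrelated (orthogonal) directions score 0.0.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors.

    1.0 means the vectors point the same way (semantically very similar);
    0.0 means they are orthogonal (unrelated).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

In the real system the vectors compared are the 1,536-dimensional embeddings of the query and of each stored chunk.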

Step 6: Integrating with Azure OpenAI Studio
To make the system interactive:
- In Azure AI Foundry, open the Azure OpenAI Chat playground and click Chat.
- Select your GPT model.
- Add a data source → choose Azure AI Search and connect to your index.
- Choose the text-embedding-ada‑002 model for embeddings.
- Set the search type to Hybrid.
- Save and start chatting with your data.

Now, when you ask a question like “What is the review of The Bloomsbury Retreat?”, the GPT model retrieves the relevant documents from Azure AI Search and uses them to answer, including citations linking directly to the original PDFs.

Step 7: Deployment Options
You can deploy this hybrid search solution as:
- A web app: includes an optional chat history stored in Azure Cosmos DB.
- A Microsoft Teams app: package the solution into a zip file and upload it to Teams.
Connecting Azure with Python
Now that our hybrid search architecture is ready in Azure AI Search with vector embeddings from Azure OpenAI, the next step is to integrate it into Python so we can use it in any custom application instead of relying only on the Azure AI Studio interface.
We’ll do this in VS Code using a Jupyter Notebook.
1. Cloning the Project Repository
The example code is available on GitHub in the repository rag-with-azure-openai. To get started, clone or download the repository. The folder we’ll work with contains:
- .env: environment variables like keys, endpoints, and model names.
- run.ipynb: the main Jupyter Notebook with the integration code.
2. Setting Environment Variables
In the .env file, fill in values from your Azure resources:
Azure AI Search
- API Key → from Keys under your search service.
- Endpoint → from the Overview page of the search service.
- Index Name → from the Indexes section (e.g., demo-index).

Azure OpenAI
- API Key → from the Keys and Endpoint tab of your Azure OpenAI resource.
- Endpoint → from the same section.
- Embedding Engine Name → e.g., text-embedding-ada-002.
- Chat Engine Name → e.g., gpt-4 or gpt-35-turbo.

Save the file when done.
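Put together, the .env file might look like the sketch below. The variable names here are assumptions for illustration; match them to whatever run.ipynb actually reads.

```
# Illustrative layout only -- align the names with the repository's code.
AZURE_SEARCH_API_KEY=<your-search-admin-key>
AZURE_SEARCH_ENDPOINT=https://<your-service>.search.windows.net
AZURE_SEARCH_INDEX_NAME=demo-index

AZURE_OPENAI_API_KEY=<your-openai-key>
AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com
AZURE_OPENAI_EMBEDDING_ENGINE=text-embedding-ada-002
AZURE_OPENAI_CHAT_ENGINE=gpt-4
```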

3. Installing Dependencies
Open the notebook run.ipynb and run the first cell to install the dependencies (note that the .env loader is published on PyPI as python-dotenv, not dotenv):
!pip install openai
!pip install python-dotenv
!pip install requests
4. Creating the Azure OpenAI Client
The first step in the code is creating an Azure OpenAI client using our .env variables.
This client lets us:
- Generate vector embeddings for user queries.
- Send prompts to the GPT engine for answers.
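A minimal sketch of that setup is below, using the `AzureOpenAI` class from the openai (v1+) SDK. The environment variable names and the API version string are assumptions; adjust them to your .env file and to a version your resource supports.

```python
import os

def load_settings():
    """Read the Azure OpenAI settings from the environment.

    Variable names are assumptions -- match them to your .env file
    (load it first with python-dotenv's load_dotenv()).
    """
    return {
        "openai_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
        "openai_key": os.environ["AZURE_OPENAI_API_KEY"],
        "embedding_engine": os.environ.get("AZURE_OPENAI_EMBEDDING_ENGINE",
                                           "text-embedding-ada-002"),
        "chat_engine": os.environ.get("AZURE_OPENAI_CHAT_ENGINE", "gpt-4"),
    }

def make_client(settings):
    """Build the Azure OpenAI client (openai>=1.0 SDK)."""
    from openai import AzureOpenAI  # imported lazily; helper above stays stdlib-only
    return AzureOpenAI(
        azure_endpoint=settings["openai_endpoint"],
        api_key=settings["openai_key"],
        api_version="2024-02-01",  # assumption: use a version your resource supports
    )
```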
5. Generating Embeddings for User Queries
Whenever a user asks something (e.g., “What is the review of The Al Fahidi Inn in Dubai?”), we:
- Generate vector embeddings using the embedding engine.
- Compare them with the stored document embeddings in Azure AI Search.
Embedding arrays have 1,536 dimensions for text-embedding-ada-002.

6. Searching the Azure AI Search Index
We send an API request to Azure AI Search that:
- Targets the
chunk
field (contains document text). - Uses cosine similarity between:
- The query’s vector embeddings.
- The stored document embeddings in the
text_vector
field.
- Returns the top 3 most relevant documents (
k=3
). - Includes a search score (0–1, where closer to 1 means higher relevance).
Example:
{
"chunk": "Hotel review text…",
"score": 0.93
}
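As a sketch, that request can be built and POSTed as below. The field names chunk and text_vector follow what the import wizard created above; the api-version (2023-11-01) is an assumption. The standard library’s urllib is used here so the snippet is self-contained; the requests package installed earlier works the same way.

```python
import json
import urllib.request

def build_hybrid_query(query_text, query_vector, k=3):
    """Request body for a hybrid query: keyword search over the text
    plus vector similarity against the text_vector field."""
    return {
        "search": query_text,          # keyword side of hybrid search
        "select": "chunk",             # return only the document text field
        "top": k,
        "vectorQueries": [{
            "kind": "vector",
            "vector": query_vector,    # the query's embedding
            "fields": "text_vector",   # field holding the stored embeddings
            "k": k,                    # top-k nearest chunks
        }],
    }

def run_search(endpoint, index_name, api_key, body, api_version="2023-11-01"):
    """POST the query to Azure AI Search and return the matching documents."""
    url = f"{endpoint}/indexes/{index_name}/docs/search?api-version={api_version}"
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["value"]
```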
7. Passing Results to GPT for Summarization
The top documents are passed to the GPT engine with a system prompt that:
- Instructs the model to answer only from the provided reviews.
- Tells it to politely refuse if no relevant answer exists.
- Forbids adding outside information or links.
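One way to assemble that final call is sketched below, again assuming the openai v1+ SDK client. The exact system prompt wording is illustrative; the grounding rules are the three points above.

```python
def build_messages(question, chunks):
    """Assemble chat messages: a grounding system prompt carrying the
    retrieved review chunks, followed by the user's question."""
    system_prompt = (
        "Answer only from the provided hotel reviews. "
        "If the reviews do not contain the answer, politely say you don't know. "
        "Do not add outside information or links."
    )
    context = "\n\n".join(chunks)
    return [
        {"role": "system", "content": f"{system_prompt}\n\nReviews:\n{context}"},
        {"role": "user", "content": question},
    ]

def answer(client, engine, question, chunks):
    """Send the grounded prompt to the deployed GPT model via the
    Chat Completions API."""
    response = client.chat.completions.create(
        model=engine,
        messages=build_messages(question, chunks),
        temperature=0,  # keep answers close to the source text
    )
    return response.choices[0].message.content
```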

Common Issues in RAG Systems and Their Solutions:
1. Model Gives Wrong Answers
Problem:
Model replies even when the answer isn’t in the documents.
Solution:
- Use a clear system prompt like: “Only answer using provided documents; say ‘I don’t know’ otherwise.”
- Clean your documents to remove junk or outdated info.
2. Bad Chunking or Retrieval
Problem:
Chunks are too big/small, or irrelevant documents are retrieved.
Solution:
- Use smaller chunk sizes (e.g., 200–300 tokens).
- Tune top-k (e.g., top 3 or 5) to fetch better matching chunks.
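If you want to experiment with chunk sizes yourself, a rough word-based chunker along these lines works as a stand-in (word counts only approximate token counts; a small overlap keeps sentences from being cut off at chunk boundaries):

```python
def chunk_words(text, max_words=250, overlap=25):
    """Split text into overlapping word-based chunks of ~max_words words.

    A rough stand-in for token-based chunking; word counts only
    approximate token counts.
    """
    words = text.split()
    chunks = []
    step = max_words - overlap  # each chunk shares `overlap` words with the previous one
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the remaining words are already covered
    return chunks
```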
3. Conflicting or Noisy Data
Problem:
Inconsistent documents confuse the model.
Solution:
- Review and clean your data.
- Remove or rewrite contradicting statements.
4. Wrong Output Format
Problem:
Model responds in the wrong format (e.g., paragraph instead of JSON).
Solution:
- Prompt clearly: “Respond in bullet points/JSON format.”
- Use a parser in your code to extract what you need.
5. Slow Indexing with Large Files
Problem:
Indexing too many files one by one is slow.
Solution:
- Use parallel indexing: split into containers, run multiple indexers, then combine results.
6. Trouble Extracting from PDFs or Tables
Problem:
Important data is inside PDFs, tables, or scanned docs.
Solution:
- Use Azure AI Search for PDFs and basic unstructured docs.
- Use Document Intelligence or Form Recognizer for complex layouts.
7. Security Risks
Problem:
API keys, data, and access controls may be exposed.
Solution:
- Store secrets in Azure Key Vault.
- Use RBAC, MFA, and Private Endpoints to secure access.
Conclusion
Building a RAG (Retrieval Augmented Generation) system might sound technical, but with Azure’s tools, it becomes much more approachable even for beginners. We’ve walked through how vector embeddings work, how Azure AI Search finds relevant content, how to solve common problems, and how to tie it all together using GPT.
Once you understand the flow, everything starts making sense and you realize you don’t need to be a data scientist to make it work.
The Key Takeaway
- First, make sure your content is well prepared: clean documents, meaningful text, and useful information.
- Generate vector embeddings so the AI can understand the deeper meaning behind words and phrases.
- Set up Azure AI Search to store your documents and their embeddings, allowing quick and smart retrieval.
- Fix problems as they come up, like inaccurate answers or indexing issues, by refining your documents and configuration.
- Finally, connect everything with the GPT model using the Chat Completions API so your AI can respond with context-aware answers.