Project with Azure AI: Automatically Turning Text into Speech

Muhammad Zubair, July 15, 2025

Microsoft Azure is a comprehensive cloud computing platform created by Microsoft, offering a vast array of services including computing, analytics, storage, and networking. Among its most powerful offerings are its Azure AI Services, which provide pre-built artificial intelligence capabilities that developers can easily integrate into their applications without needing a background in data science.

In this project, we used Azure to build a simple automated system that converts text files into natural-sounding speech. The workflow has three parts: we store plain text files in Azure Blob Storage, fetch them with a Python script, and pass their content to the Azure Speech Service for text-to-speech conversion. The audio files are generated automatically, so no manual recording is needed.

This setup is incredibly useful for creating audiobooks, building accessibility tools for the visually impaired, generating voiceovers for videos, or providing audio updates in applications and IoT devices.

Overview of the Azure Resources

To build this system, we provisioned and used three key services within the Microsoft Azure ecosystem:

Azure Speech Service: The core AI service that converts our text into lifelike speech using neural voices.
Azure Blob Storage: A scalable cloud storage solution where we upload and manage our source text files.
Azure OpenAI Service (Future-Proofing): While not strictly necessary for basic text-to-speech, we deployed a model to potentially generate dynamic text content for conversion in the future, making the pipeline fully end-to-end from text generation to speech synthesis.

Step-by-Step: Creating the Azure Resources

1. Azure Speech Service

Log into the Azure Portal.
Search for “Speech” in the top search bar.
Click “Create” and fill in the details:
- Subscription: Your Azure subscription.
- Resource Group: Select an existing group or create a new one (e.g., Text-To-Speech-Resources).
- Region: Choose a region close to your users (e.g., East US).
- Name: Provide a unique name (e.g., MySpeechService).
- Pricing tier: The Free F0 tier is perfect for testing and low-volume use.
Click “Review + create” and then “Create”.

Information to Retrieve:
- Navigate to your new Speech resource.
- Go to Keys and Endpoint under Resource Management.
- Copy Key 1 and your Location/Region (e.g., eastus). You will need these for the Python code.

2. Azure Storage Account

This is where we will store the text files we want to convert into speech.

Steps to Create:
1. In the Azure Portal, search for “Storage Accounts”.
2. Click “Create” and fill in the details:
  - Subscription & Resource Group: Use the same as above.
  - Storage account name: Must be globally unique (e.g., mytexttospeechstorage).
  - Region: Same as your Speech service (East US).
  - Performance: Standard.
  - Redundancy: Locally-redundant storage (LRS) is sufficient for this demo.
3. Click “Review + create” and then “Create”.
4. Once deployed, go to the resource and create a container:
  - Under “Data storage,” click “Containers”.
  - Click “+ Container”, give it a name (e.g., text-files), and choose an access level. Private is sufficient here, because the Python script authenticates with the connection string.
Information to Retrieve:
- Go to your Storage Account.
- Navigate to Access Keys under Security + networking.
- Copy the Connection String. This securely connects your Python code to the storage.

3. Azure Blob Storage

Purpose: Store dummy text files for speech processing.
Steps Taken:
- Created a Storage Account.
- Created a Blob Container (public or private).
- Enabled the required storage option under Settings > Configuration (for a public container, this is most likely the “Allow Blob anonymous access” setting).
- Uploaded 7 dummy .txt files containing sample sentences.
- Noted down:
  - Container Name
  - Storage Connection String

Dummy Data Preparation

Before writing our code, we need sample text to convert.
On your local machine, create 7 simple text files (dummy_text_1.txt, dummy_text_2.txt, etc.).
Add different sample sentences to each. For example:
- dummy_text_1.txt: “Hello, and welcome to this demonstration of Azure Text to Speech. This audio was generated entirely by artificial intelligence.”
- dummy_text_2.txt: “The weather today is sunny with a high of seventy-five degrees. A perfect day for a walk in the park.”
In the Azure Portal, navigate to your Blob Storage container (text-files) and click Upload to add all your text files.
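
If you prefer to script this preparation step, the sketch below creates the seven files locally and, optionally, uploads them instead of using the Portal's Upload button. The first two sentences are the examples given above. The remaining five are placeholder text invented for illustration, and the helper names write_dummy_files and upload_files are hypothetical.

```python
from pathlib import Path

# The first two sentences come from the examples above; files 3-7 get filler text.
SAMPLE_SENTENCES = [
    "Hello, and welcome to this demonstration of Azure Text to Speech. "
    "This audio was generated entirely by artificial intelligence.",
    "The weather today is sunny with a high of seventy-five degrees. "
    "A perfect day for a walk in the park.",
]


def write_dummy_files(folder, count=7):
    """Create dummy_text_1.txt ... dummy_text_<count>.txt and return their paths."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(1, count + 1):
        if i <= len(SAMPLE_SENTENCES):
            text = SAMPLE_SENTENCES[i - 1]
        else:
            text = f"This is placeholder sentence number {i}."  # hypothetical filler
        path = folder / f"dummy_text_{i}.txt"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


def upload_files(paths, connection_string, container_name="text-files"):
    """Upload each file to the container, replacing blobs that share its name."""
    from azure.storage.blob import BlobServiceClient  # needs azure-storage-blob

    service = BlobServiceClient.from_connection_string(connection_string)
    container = service.get_container_client(container_name)
    for path in paths:
        with open(path, "rb") as data:
            container.upload_blob(name=path.name, data=data, overwrite=True)
```

For example, upload_files(write_dummy_files("dummy_data"), connection_string) would create the files in a local dummy_data folder and push them to the text-files container.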

Implementation Steps

1. Fetch Text from Azure Blob Storage

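We used the Azure Storage Blob SDK for Python to retrieve a file's content from the container. This requires:
- Connection String
- Container Name
- Blob File Name
Install the dependency (the leading ! runs the command from a notebook cell; omit it in a terminal):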

!pip install azure-storage-blob

from azure.storage.blob import BlobServiceClient

# Azure Storage Account connection string (copy it from the Portal)
connection_string = ""

# Name of the blob container where the dummy files were uploaded
container_name = "text-files"

# Name of the blob (file) to read
blob_name = "dummy_text_1.txt"

# Create the blob service client
blob_service_client = BlobServiceClient.from_connection_string(connection_string)

# Get a client for the container
container_client = blob_service_client.get_container_client(container_name)

# Get a client for the blob
blob_client = container_client.get_blob_client(blob_name)

# Download the blob and decode its bytes as UTF-8 text
blob_data = blob_client.download_blob().readall()
text_content = blob_data.decode("utf-8")

print("Text content from blob:")
print(text_content)
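
The snippet above reads a single blob. Since the container holds several text files, one possible extension is to list the blobs and read each one. This is a sketch rather than part of the original project: the helper name read_all_text_blobs is hypothetical, and it assumes every blob ending in .txt is UTF-8 text.

```python
def read_all_text_blobs(container_client, suffix=".txt"):
    """Return {blob_name: text} for every blob whose name ends with suffix."""
    texts = {}
    # list_blobs() yields one properties object per blob, each with a .name
    for blob in container_client.list_blobs():
        if blob.name.endswith(suffix):
            data = container_client.download_blob(blob.name).readall()
            texts[blob.name] = data.decode("utf-8")
    return texts
```

Passing the container_client created earlier, read_all_text_blobs(container_client) returns a dictionary keyed by blob name, which can then be looped over to convert each file in turn.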

2. Convert Text to Speech

We used the Azure Speech SDK for Python to synthesize the text. This requires:
- speech_key
- service_region
The generated audio is saved as a .wav file.
Install the dependency:

!pip install azure-cognitiveservices-speech

import azure.cognitiveservices.speech as speechsdk

# Azure Speech key (Key 1 under Keys and Endpoint) and region
speech_key = ""
service_region = "eastus"

speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
audio_config = speechsdk.audio.AudioOutputConfig(filename="outputaudio.wav")
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

# Text fetched from Azure Blob Storage (text_content from Step 1)
text = text_content

# Convert text to speech; the audio is written to outputaudio.wav
result = speech_synthesizer.speak_text_async(text).get()

if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech generated successfully!")
elif result.reason == speechsdk.ResultReason.Canceled:
    # Report why synthesis was canceled (e.g., a wrong key or region)
    details = result.cancellation_details
    print(f"Speech generation failed: {details.reason}")
    if details.reason == speechsdk.CancellationReason.Error:
        print(f"Error details: {details.error_details}")
else:
    print("Speech generation failed.")
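
The code above uses the service's default voice. To pick a specific neural voice, one option is to send SSML instead of plain text. The sketch below is an illustration, not the original project's code: the helper names build_ssml and synthesize_to_file are hypothetical, and en-US-JennyNeural is only an example voice, so check which voices are available in your region.

```python
from xml.sax.saxutils import escape


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US"):
    """Wrap plain text in a minimal SSML document that selects a voice."""
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="{lang}">'
        f'<voice name="{voice}">{escape(text)}</voice>'  # escape &, <, > in the text
        "</speak>"
    )


def synthesize_to_file(text, speech_key, service_region, out_path, voice="en-US-JennyNeural"):
    """Synthesize text with the given voice into a .wav file; return True on success."""
    import azure.cognitiveservices.speech as speechsdk  # needs azure-cognitiveservices-speech

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_config = speechsdk.audio.AudioOutputConfig(filename=out_path)
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
    result = synthesizer.speak_ssml_async(build_ssml(text, voice)).get()
    return result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted
```

For example, synthesize_to_file(text_content, speech_key, service_region, "dummy_text_1.wav") would write the first file's audio under its own name instead of reusing outputaudio.wav.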

Running the code above writes the generated speech to outputaudio.wav.

Conclusion

This project showed how written text can be turned into spoken words with Microsoft Azure. We stored our text files in Azure Blob Storage, then used Python to read them and send the content to the Azure Speech Service. The service returned clear, natural-sounding audio files with no manual recording. This simple setup is useful in practice, for example for creating audiobooks, making apps more accessible for people with visual impairments, or sending voice updates automatically. It saves time and effort when turning written information into audio.