Code with natural speech
The open-source voice assistant for developers.
With Serenade, you can write code using natural speech. Serenade's speech-to-code engine is designed from the ground up for developers and is fully open-source.
Take a break from typing
Give your hands a break without missing a beat. Whether you have an injury or you're looking to prevent one, Serenade can help you be just as productive without typing at all.
Secure, fast speech-to-code
Serenade can run in the cloud, to minimize impact on your system's resources, or completely locally, so all of your voice commands and source code stay on-device. It's up to you, and everything is open-source.
Add voice to any application
Serenade integrates with your existing tools, from writing code with VS Code to messaging with Slack, so you don't have to learn an entirely new workflow. Serenade also provides the right speech engine to match what you're editing, whether that's code or prose.
Code more flexibly
Don't get stuck at your keyboard all day. Break up your workflow by using natural voice commands without worrying about syntax, formatting, and symbols.
Customize your workflow
Create powerful custom voice commands and plugins using Serenade's open protocol, and add them to your workflow. Or, try customizations shared by the Serenade community.
Start coding with voice today
Ready to supercharge your workflow with voice? Download Serenade for free and start using speech alongside typing, or leave your keyboard behind.
The top free Speech-to-Text APIs, AI Models, and Open Source Engines
Choosing the best Speech-to-Text API, AI model, or open-source engine to build with can be challenging. You need to compare accuracy, model design, features, support options, documentation, security, and more.
This post examines the best free Speech-to-Text APIs and AI models on the market today, including ones that have a free tier, to help you make an informed decision. We’ll also look at several free open-source Speech-to-Text engines and explore why you might choose an API or AI model vs. an open-source library, or vice versa.
Free Speech-to-Text APIs and AI Models
APIs and AI models are generally more accurate, easier to integrate, and come with more out-of-the-box features than open-source options. However, at large scale, APIs and AI models can cost more than open-source options.
If you’re looking to use an API or AI model for a small project or a trial run, many of today’s Speech-to-Text APIs and AI models have a free tier. This means that the API or model is free for anyone to use up to a certain volume per day, per month, or per year.
Let’s compare three of the most popular Speech-to-Text APIs and AI models with a free tier: AssemblyAI, Google, and AWS Transcribe.
AssemblyAI
AssemblyAI offers speech AI models via an API that product teams and developers can use to build powerful AI solutions based on voice data for their users.
AssemblyAI offers cutting-edge AI models such as Speaker Diarization, Topic Detection, Entity Detection, Automated Punctuation and Casing, Content Moderation, Sentiment Analysis, Text Summarization, and more. These models help users get more out of voice data, and their accuracy is continuously improved.
The company offers a $50 credit to get users started with speech-to-text.
AssemblyAI also offers Speech Understanding models, including Audio Intelligence models and LeMUR. LeMUR enables users to leverage Large Language Models (LLMs) to pull valuable information from their voice data—including answering questions, generating summaries and action items, and more.
Its high accuracy and diverse collection of AI models built by AI experts make AssemblyAI a sound option for developers looking for a free Speech-to-Text API. The API also supports virtually every audio and video file format out-of-the-box for easier transcription.
AssemblyAI offers two options for Speech-to-Text: "Best" and "Nano." Best is the default model and gives users access to the company's most accurate and advanced Speech-to-Text offering, helping them capture the nuances of voice data. Nano offers high-quality Speech-to-Text at an accessible price point for users who need cost efficiency.
AssemblyAI now supports 17 languages with its Best offering and 102 languages with its Nano offering, with additional languages released monthly. The full list is available in AssemblyAI's documentation.
AssemblyAI’s easy-to-use models also allow for quick setup and transcription in any programming language. You can copy and paste code examples in your preferred language directly from the AssemblyAI Docs, or use the AssemblyAI Python SDK or one of its other ready-to-use integrations.
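AssemblyAI's docs provide ready-made snippets. As a rough, dependency-free illustration of the same flow, the sketch below builds (without sending) a POST request to the v2 transcript endpoint using only Python's standard library. The endpoint, the `authorization` header, and the `audio_url` and `speaker_labels` fields follow AssemblyAI's REST API as we understand it. The `speech_model` field and its `"best"`/`"nano"` values are an assumption based on the two tiers described above, so check the current API reference before relying on them. The helper name is hypothetical.

```python
import json
import urllib.request

# AssemblyAI v2 REST endpoint for submitting audio; verify against the current docs.
ASSEMBLYAI_TRANSCRIPT_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(api_key: str, audio_url: str,
                             speech_model: str = "best",
                             speaker_labels: bool = False) -> urllib.request.Request:
    """Build (but do not send) a request that submits audio for transcription."""
    # Assumed values for the two Speech-to-Text tiers described in this post.
    if speech_model not in {"best", "nano"}:
        raise ValueError(f"unknown speech_model: {speech_model!r}")
    payload = {
        "audio_url": audio_url,           # publicly reachable URL of the audio file
        "speech_model": speech_model,     # assumption: "best" (default) or "nano"
        "speaker_labels": speaker_labels, # turns on Speaker Diarization
    }
    return urllib.request.Request(
        ASSEMBLYAI_TRANSCRIPT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_transcript_request("YOUR_API_KEY", "https://example.com/meeting.mp3")
    print(req.get_method(), req.full_url)
    # Sending it would be urllib.request.urlopen(req); the JSON response
    # contains a transcript "id" that you then poll for the finished result.
```

Submitting the request returns a transcript object with an `id`. You then poll `GET /v2/transcript/<id>` until its `status` is `completed` or `error`.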
Pricing
- Free to test in the AI playground, plus $50 in credits with an API sign-up
- Speech-to-Text Best – $0.37 per hour
- Speech-to-Text Nano – $0.12 per hour
- Streaming Speech-to-Text – $0.47 per hour
- Speech Understanding – varies
- Volume pricing is also available
See AssemblyAI's pricing page for the full price list.
Pros
- High accuracy
- Breadth of AI models available, built by AI experts
- Continuous model iteration and improvement
- Developer-friendly documentation and SDKs
- Pay-as-you-go and custom plans
- White-glove support
- Strict security and privacy practices
Cons
- Models are not open-source
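To see what AssemblyAI's per-hour list prices mean in practice, here is a small Python sketch that estimates cost per tier and how many hours the $50 sign-up credit covers. The rates are hard-coded from this post and may have changed since publication. The function names are illustrative and are not part of any AssemblyAI SDK.

```python
# Per-hour list prices quoted in this post; they may have changed since publication.
RATES_PER_HOUR = {
    "best": 0.37,
    "nano": 0.12,
    "streaming": 0.47,
}

FREE_CREDIT_USD = 50.00  # sign-up credit mentioned in this post


def estimate_cost(hours: float, tier: str) -> float:
    """List-price cost in USD for `hours` of audio on `tier`, rounded to cents."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return round(hours * RATES_PER_HOUR[tier], 2)


def hours_covered_by_credit(tier: str, credit: float = FREE_CREDIT_USD) -> float:
    """How many hours of audio the sign-up credit covers at list price."""
    return credit / RATES_PER_HOUR[tier]


if __name__ == "__main__":
    for tier in RATES_PER_HOUR:
        print(f"{tier:>9}: 100 h costs ${estimate_cost(100, tier):.2f}, "
              f"$50 credit covers ~{hours_covered_by_credit(tier):.0f} h")
```

At these rates, the $50 credit covers roughly 135 hours on Best and roughly 417 hours on Nano.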
Google Speech-to-Text
Google Speech-to-Text is a well-known speech transcription API. Google gives users 60 minutes of free transcription, plus $300 in free credits for Google Cloud hosting.
Beyond short clips, Google only transcribes files already stored in a Google Cloud Storage bucket, so the free credits won’t get you very far. Google also requires you to sign up for a GCP account and project, whether you're on the free tier or a paid plan.
With good accuracy and 125+ languages supported, Google is a decent choice if you’re willing to put in some initial work.
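The Cloud Storage requirement shows up in the request itself. The sketch below builds the JSON body for Google's v1 `speech:longrunningrecognize` REST method, pointing `audio.uri` at a `gs://` object. The bucket name and audio settings (16 kHz `LINEAR16`) are placeholders to adjust for your files, and the helper name is hypothetical. Actually sending the request also requires an OAuth access token tied to a GCP project, which is part of the setup overhead described above.

```python
import json

# Google Cloud Speech-to-Text v1 REST method for asynchronous (longer) audio.
LONG_RUNNING_URL = "https://speech.googleapis.com/v1/speech:longrunningrecognize"


def build_recognize_body(gcs_uri: str, language_code: str = "en-US",
                         sample_rate_hz: int = 16000,
                         encoding: str = "LINEAR16") -> dict:
    """JSON body for a long-running recognition request on a file in Cloud Storage."""
    if not gcs_uri.startswith("gs://"):
        # Longer audio has to be uploaded to a Cloud Storage bucket first.
        raise ValueError("expected a Cloud Storage URI like gs://bucket/object")
    return {
        "config": {
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
        "audio": {"uri": gcs_uri},
    }


if __name__ == "__main__":
    body = build_recognize_body("gs://my-bucket/interview.wav")  # hypothetical bucket
    print("POST", LONG_RUNNING_URL)
    print(json.dumps(body, indent=2))
```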
Pricing
- 60 minutes of free transcription
- $300 in free credits for Google Cloud hosting
Pros
- Decent accuracy
- Multi-language support
Cons
- Beyond short clips, only transcribes files in a Google Cloud Storage bucket
- Difficult to get started
- Lower accuracy than other similarly priced APIs
AWS Transcribe
AWS Transcribe offers one hour free per month for the first 12 months of use.
As with Google, you must first create an AWS account if you don’t already have one. AWS Transcribe is also less accurate than alternative APIs, and its batch transcription only supports files already stored in an Amazon S3 bucket.
However, if you need a specific feature such as medical transcription, AWS has options: Amazon Transcribe Medical is a medical-focused ASR service that is available today.
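Batch jobs on AWS Transcribe point at media already in S3 rather than accepting an upload. As a hedged sketch, the helper below assembles the keyword arguments for boto3's `start_transcription_job`. The helper, job name, and bucket are hypothetical, and the helper deliberately accepts only the `s3://` URI form. Transcribe Medical uses a separate call, `start_medical_transcription_job`, which has additional required fields.

```python
def build_transcription_job_params(job_name: str, s3_uri: str,
                                   media_format: str = "mp3",
                                   language_code: str = "en-US") -> dict:
    """Keyword arguments for boto3's Transcribe client `start_transcription_job`."""
    if not s3_uri.startswith("s3://"):
        # This sketch only accepts the s3://bucket/key form.
        raise ValueError("expected an S3 URI such as s3://bucket/key")
    return {
        "TranscriptionJobName": job_name,  # must be unique in your AWS account
        "Media": {"MediaFileUri": s3_uri},
        "MediaFormat": media_format,
        "LanguageCode": language_code,
    }


if __name__ == "__main__":
    params = build_transcription_job_params("demo-job-001", "s3://my-bucket/call.mp3")
    print(params)
    # With boto3 installed and AWS credentials configured, you would then run:
    #   import boto3
    #   boto3.client("transcribe").start_transcription_job(**params)
```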
Pricing
- One hour free per month for the first 12 months of use
- Tiered pricing, based on usage, ranges from $0.02400 to $0.00780 per minute
Pros
- Integrates into existing AWS ecosystem
- Medical language transcription
Cons
- Difficult to get started from scratch
- Only supports transcribing files already in an Amazon S3 bucket
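AWS Transcribe's per-minute rate drops as monthly volume grows, with each tier's rate applying only to the minutes inside that tier. The sketch below estimates a monthly bill and subtracts the free hour during the first 12 months. Only the $0.024 and $0.0078 endpoints come from this post. The tier boundaries and the two middle rates are assumptions based on AWS's US pricing page at the time of writing, actual rates vary by region, and the sketch also assumes free minutes are deducted before tiering.

```python
# (upper bound in minutes per month, USD per minute).
# Only the first and last rates come from this post; the boundaries and
# middle rates are assumptions, so confirm them on AWS's pricing page.
TIERS = [
    (250_000, 0.02400),
    (1_000_000, 0.01500),
    (5_000_000, 0.01020),
    (float("inf"), 0.00780),
]

FREE_MINUTES_PER_MONTH = 60  # free tier: one hour per month...
FREE_TIER_MONTHS = 12        # ...for the first 12 months


def monthly_cost(minutes: float, account_age_months: int) -> float:
    """Estimated monthly cost in USD using graduated tiers."""
    if account_age_months < FREE_TIER_MONTHS:
        # Assumption: free minutes are deducted before tiered pricing applies.
        minutes = max(0.0, minutes - FREE_MINUTES_PER_MONTH)
    cost, lower = 0.0, 0.0
    for upper, rate in TIERS:
        if minutes <= lower:
            break
        # Bill only the minutes that fall inside this tier.
        cost += (min(minutes, upper) - lower) * rate
        lower = upper
    return round(cost, 2)


if __name__ == "__main__":
    for usage in (60, 10_000, 300_000, 6_000_000):
        print(f"{usage:>9,} min after free tier: ${monthly_cost(usage, 12):,.2f}")
```

For example, 300,000 minutes in one month after the free-tier period come to 250,000 × $0.024 + 50,000 × $0.015 = $6,750 under these assumed tiers.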
Open-Source Speech Transcription Engines
As an alternative to APIs and AI models, open-source Speech-to-Text libraries are free to use, with no limits on usage. Some developers also see data security as a plus, since your data doesn’t have to be sent to a third party or the cloud.
Open-source engines take work: you must be comfortable putting in significant time and effort to get the results you want, especially if you plan to use these libraries at scale. Open-source Speech-to-Text engines are also typically less accurate than the APIs discussed above.
If you want to go the open-source route, here are some options worth exploring:
DeepSpeech
DeepSpeech is an open-source embedded Speech-to-Text engine designed to run in real time on a range of devices, from high-powered GPUs to a Raspberry Pi 4. The DeepSpeech library uses an end-to-end model architecture pioneered by Baidu.
DeepSpeech also has decent out-of-the-box accuracy for an open-source option and is easy to fine-tune and train on your own data.
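The released DeepSpeech English models expect 16-bit mono audio sampled at 16 kHz, and feeding them anything else is a common stumbling block. This standard-library sketch loads and validates a WAV file before inference. The function name is hypothetical. With the `deepspeech` package installed, the samples would then be converted to a NumPy `int16` array and passed to `deepspeech.Model(model_path).stt(...)`.

```python
import array
import wave

EXPECTED_RATE_HZ = 16_000  # sample rate of the released DeepSpeech English models


def load_pcm16_mono(path: str, expected_rate: int = EXPECTED_RATE_HZ) -> array.array:
    """Read a WAV file and return its samples as 16-bit signed integers.

    Raises ValueError unless the file is mono, 16-bit, and at `expected_rate` Hz;
    convert other files (for example with sox or ffmpeg) before inference.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("audio must be mono")
        if wav.getsampwidth() != 2:
            raise ValueError("audio must be 16-bit PCM")
        if wav.getframerate() != expected_rate:
            raise ValueError(f"audio must be sampled at {expected_rate} Hz")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    return samples
```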
Pros
- Easy to customize
- Can use it to train your own model
- Can be used on a wide range of devices
Cons
- Lack of support
- No model improvement outside of individual custom training
- Heavy lift to integrate into production-ready applications
Kaldi
Kaldi is a speech recognition toolkit that has been widely popular in the research community for many years.
Like DeepSpeech, Kaldi has good out-of-the-box accuracy and supports training your own models. It has also been thoroughly tested: many companies have used Kaldi in production for years, which gives developers more confidence in it.
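Kaldi is driven by shell recipes, and most of them start from a data directory of plain-text index files. The sketch below writes the four core files (`wav.scp`, `text`, `utt2spk`, `spk2utt`) from a list of utterances. It assumes one utterance per recording, so no `segments` file is needed. The helper and the example IDs are hypothetical. In a real setup, Kaldi's own `utils/utt2spk_to_spk2utt.pl` and `utils/validate_data_dir.sh` can generate `spk2utt` and check the result.

```python
import os
from collections import defaultdict


def write_kaldi_data_dir(out_dir: str, utterances: list[tuple[str, str, str, str]]) -> None:
    """Write wav.scp, text, utt2spk and spk2utt for a Kaldi data directory.

    Each utterance is (utt_id, speaker_id, wav_path, transcript). Kaldi expects
    utterance IDs to start with the speaker ID and every file to be sorted.
    """
    os.makedirs(out_dir, exist_ok=True)
    wav_scp, text, utt2spk = [], [], []
    spk2utt = defaultdict(list)
    for utt_id, spk_id, wav_path, transcript in sorted(utterances):  # sort by utt_id
        if not utt_id.startswith(spk_id):
            raise ValueError(f"{utt_id!r} should be prefixed by speaker {spk_id!r}")
        wav_scp.append(f"{utt_id} {wav_path}")  # recording ID == utterance ID here
        text.append(f"{utt_id} {transcript}")
        utt2spk.append(f"{utt_id} {spk_id}")
        spk2utt[spk_id].append(utt_id)
    files = {
        "wav.scp": wav_scp,
        "text": text,
        "utt2spk": utt2spk,
        "spk2utt": [f"{spk} {' '.join(utts)}" for spk, utts in sorted(spk2utt.items())],
    }
    for name, lines in files.items():
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
            f.write("".join(line + "\n" for line in lines))
```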
Pros
- Can use it to train your own models
- Active user base
Cons
- Can be complex and expensive to use
- Uses a command-line interface
Flashlight ASR (formerly Wav2Letter)
Flashlight ASR, formerly Wav2Letter, is Facebook AI Research’s Automatic Speech Recognition (ASR) toolkit. It is written in C++ and uses the ArrayFire tensor library.
Like DeepSpeech, Flashlight ASR is decently accurate for an open-source library and is easy to work with on a small project.
Pros
- Customizable
- Easier to modify than other open-source options
- Processing speed
Cons
- Very complex to use
- No pre-trained libraries available
- Need to continuously source datasets for training and model updates, which can be difficult and costly
SpeechBrain
SpeechBrain is a PyTorch-based transcription toolkit. The platform releases open implementations of popular research works and offers a tight integration with Hugging Face for easy access.
Overall, the platform is well-defined and constantly updated, making it a straightforward tool for training and fine-tuning.
Pros
- Integration with PyTorch and Hugging Face
- Pre-trained models are available
- Supports a variety of tasks
Cons
- Even its pre-trained models take a lot of customization to make them usable
- Limited documentation makes it less user-friendly for anyone without extensive experience
Coqui
Coqui is another deep learning toolkit for Speech-to-Text transcription. It has been used for projects in over twenty languages and also offers a variety of essential inference and productionization features.
The platform also releases custom-trained models and has bindings for various programming languages for easier deployment.
Pros
- Generates confidence scores for transcripts
- Large support community
Cons
- No longer updated and maintained by Coqui
Whisper
Whisper by OpenAI, released in September 2022, is comparable to other current state-of-the-art open-source options.
Whisper can be used either in Python or from the command line, and it can also translate speech from other languages into English.
Whisper comes in five model sizes (tiny, base, small, medium, and large) with different capabilities to suit different use cases; the large model was updated to large-v3 in November 2023.
However, to run Whisper at a large scale you’ll need substantial computing power and an in-house team to maintain, scale, update, and monitor the model, which makes its total cost of ownership higher than that of other options.
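Compute requirements scale with model size. As a rough aid, this sketch picks the largest Whisper model that fits a given amount of GPU memory, using the approximate VRAM figures published in the openai/whisper README. Treat those figures as estimates, not guarantees. The function name is hypothetical. The chosen name can then be passed to `whisper.load_model(...)`, whose `transcribe(...)` method returns a dict with a `"text"` key, or to the `whisper` command-line tool via `--model`.

```python
# Approximate VRAM needs (GB) from the openai/whisper README at the time of
# writing, ordered from smallest to largest model. Rough guides only.
WHISPER_MODELS_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def pick_whisper_model(available_vram_gb: float) -> str:
    """Return the largest Whisper model whose approximate VRAM need fits."""
    fitting = [name for name, need in WHISPER_MODELS_GB if need <= available_vram_gb]
    if not fitting:
        raise ValueError("not enough VRAM for the tiny model; consider running on CPU")
    return fitting[-1]  # list is ordered by size, so the last fit is the largest


if __name__ == "__main__":
    for vram in (1, 4, 8, 24):
        print(f"{vram:>2} GB -> {pick_whisper_model(vram)}")
```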
As of March 2023, Whisper is also available via OpenAI's API, with on-demand pricing starting at $0.006/minute.
Pros
- Multilingual transcription
- Can be used in Python
- Five models are available, each with different sizes and capabilities
Cons
- Need an in-house research team to maintain and update
- Costly to run
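Because this post quotes some prices per hour and others per minute, normalizing them makes the comparison easier. The sketch below converts the list prices mentioned in this post to dollars per hour and ranks them. The labels are illustrative, the AWS figure is its highest (Tier 1) rate, and all prices may have changed since publication.

```python
# List prices quoted in this post as (label, price in USD, unit).
# Check each provider's current pricing page before relying on them.
QUOTED_PRICES = [
    ("AssemblyAI Best", 0.37, "hour"),
    ("AssemblyAI Nano", 0.12, "hour"),
    ("AWS Transcribe (Tier 1)", 0.024, "minute"),
    ("Whisper API", 0.006, "minute"),
]


def to_hourly(price: float, unit: str) -> float:
    """Convert a per-minute or per-hour price to USD per hour."""
    if unit == "hour":
        return price
    if unit == "minute":
        return round(price * 60, 4)
    raise ValueError(f"unsupported unit: {unit!r}")


def rank_by_hourly_price(prices=QUOTED_PRICES) -> list[tuple[str, float]]:
    """Return (label, USD per hour) pairs, cheapest first."""
    hourly = [(label, to_hourly(price, unit)) for label, price, unit in prices]
    return sorted(hourly, key=lambda pair: pair[1])


if __name__ == "__main__":
    for label, rate in rank_by_hourly_price():
        print(f"{label:<26} ${rate:.2f}/hour")
```

On these figures, the Whisper API's $0.006/minute works out to $0.36/hour, close to AssemblyAI Best's $0.37/hour, while AWS's Tier 1 rate of $0.024/minute is $1.44/hour.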
Which free Speech-to-Text API, AI model, or Open Source engine is right for your project?
The best free Speech-to-Text API, AI model, or open-source engine will depend on your project. Do you want something that is easy to use, highly accurate, and packed with out-of-the-box features? If so, one of these APIs might be right for you:
- AssemblyAI
- Google Speech-to-Text
- AWS Transcribe
Alternatively, you might want a completely free option with no data limits, if you don’t mind the extra work it takes to tailor a toolkit to your needs. If so, you might choose one of these open-source libraries:
- DeepSpeech
- Kaldi
- Flashlight ASR
- SpeechBrain
- Coqui
- Whisper
Whichever you choose, make sure the product can keep meeting your project's needs, both now and as the project evolves.