If you're using an audio chunker for transcription, choosing the right segment size is critical. Too large, and your transcription API may fail or produce poor results. Too small, and you'll waste time and money on unnecessary API calls.
This guide covers the optimal audio chunk sizes for transcription across all major services, including OpenAI Whisper, Google Speech-to-Text, AWS Transcribe, and more.
Transcription API Limits: Quick Reference
Here's a comprehensive comparison of file limits for popular transcription services:
| Service | Max File Size | Max Duration | Recommended Chunk |
|---|---|---|---|
| OpenAI Whisper API | 25 MB | Bounded by file size (~26 min at 128 kbps MP3) | 10-15 minutes |
| Google Speech-to-Text | 10 MB (sync) | 1 min (sync) / 480 min (async) | ≤1 minute (sync) / 30-60 minutes (async) |
| AWS Transcribe | 2 GB | 4 hours | 30-60 minutes |
| AssemblyAI | 5 GB | Unlimited | 30-60 minutes |
| Rev.ai | 2 GB | 17 hours | 30-60 minutes |
| Deepgram | 2 GB | Unlimited | 30-60 minutes |
OpenAI Whisper API: Optimal Chunk Size
The Whisper API is one of the most popular transcription services, but it has a strict 25 MB file size limit. Here's how to optimize your audio for Whisper:
Whisper API Recommendations
- Optimal chunk size: 10-15 minutes of audio
- File format: MP3 at 128kbps (best size/quality ratio)
- Max file size: 25 MB
At 128kbps MP3 (16,000 bytes per second, or about 0.96 MB per minute), you can fit roughly 26-27 minutes of audio into the 25 MB limit. However, we recommend 10-15 minute chunks for several reasons:
- Better error handling - if one chunk fails, you only need to retry that segment
- Improved context accuracy - shorter segments tend to produce more accurate transcriptions
- Easier to manage timestamps when combining transcripts
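If you want to check the size math for your own bitrate, here's a quick sketch. It assumes constant-bitrate MP3, reads "25 MB" as 10^6-byte megabytes (the more conservative reading), and ignores container and ID3 overhead. The helper names are just for illustration.

```python
# Sketch assumptions (not from any provider's docs): constant bitrate,
# "MB" read as 10**6 bytes, and container/ID3 overhead ignored.
WHISPER_LIMIT_BYTES = 25 * 10**6


def bytes_per_second(kbps: int) -> float:
    """A bitrate in kilobits per second, expressed in bytes per second."""
    return kbps * 1000 / 8


def max_minutes(limit_bytes: int, kbps: int) -> float:
    """Longest duration, in minutes, that stays under limit_bytes."""
    return limit_bytes / bytes_per_second(kbps) / 60


def chunk_bytes(minutes: float, kbps: int) -> float:
    """Approximate size in bytes of a chunk of the given length."""
    return minutes * 60 * bytes_per_second(kbps)


print(f"At 128 kbps: {max_minutes(WHISPER_LIMIT_BYTES, 128):.1f} min fit in 25 MB")
for minutes in (10, 15):
    size = chunk_bytes(minutes, 128)
    print(f"{minutes}-minute chunk: {size / 10**6:.1f} MB, fits: {size <= WHISPER_LIMIT_BYTES}")
```

Reading 25 MB as 25 MiB (26,214,400 bytes) instead gives about 27.3 minutes. Either way, a 15-minute chunk at 128 kbps (about 14.4 MB) leaves comfortable headroom.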
Pro Tip: When using ChunkAudio as your audio chunker for transcription, enable Smart Silence Detection. This ensures your chunks don't cut off mid-sentence, which improves transcription accuracy.
Google Speech-to-Text: Chunk Size Guide
Google offers two transcription modes with very different limits:
Synchronous Recognition
- Max duration: 1 minute
- Max file size: 10 MB
- Best for: Short clips, real-time applications
Asynchronous Recognition
- Max duration: 480 minutes (8 hours)
- File must be in Google Cloud Storage
- Best for: Long-form content
For synchronous transcription, you'll need to cut audio into segments of 1 minute or less. For async, larger chunks of 30-60 minutes work well.
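To plan the cuts, a simple approach is to use the fewest equal-length chunks that each stay within the limit, which avoids leaving a tiny fragment at the end. Here's a minimal sketch; `plan_chunks` is a hypothetical helper that only computes boundaries in seconds, and the actual splitting would still be done in ChunkAudio or another tool.

```python
import math


def plan_chunks(total_seconds: float, max_chunk_seconds: float) -> list[tuple[float, float]]:
    """Split [0, total_seconds] into the fewest equal chunks no longer than max_chunk_seconds."""
    if total_seconds <= 0:
        return []
    count = math.ceil(total_seconds / max_chunk_seconds)
    length = total_seconds / count
    # The last chunk ends exactly at total_seconds to avoid floating-point drift.
    return [
        (i * length, total_seconds if i == count - 1 else (i + 1) * length)
        for i in range(count)
    ]


SYNC_LIMIT_S = 60  # Google synchronous recognition: 1 minute per request

print(plan_chunks(150, SYNC_LIMIT_S))  # a 2.5-minute clip becomes three 50-second chunks
```

For async jobs, pass something like `30 * 60` instead. If you'd rather not sit exactly on the 60-second boundary, planning with a small margin (say 55 seconds) is a cautious choice.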
AWS Transcribe: Chunk Size Guide
AWS Transcribe is more lenient with file sizes but has a 4-hour duration limit:
AWS Transcribe Recommendations
- Optimal chunk size: 30-60 minutes
- Max file size: 2 GB
- Max duration: 4 hours
- Supported formats: MP3, MP4, WAV, FLAC, OGG, AMR, WebM
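A pre-flight check against these limits can save a failed job. The sketch below uses only the values listed above; the function name is hypothetical, "2 GB" is read as 2 × 10^9 bytes (an assumption), and the format test just looks at the file extension, so it won't catch a mislabeled or unusual codec.

```python
from pathlib import Path

# Limits taken from the AWS Transcribe recommendations above.
AWS_MAX_BYTES = 2 * 10**9        # 2 GB (decimal reading is an assumption)
AWS_MAX_SECONDS = 4 * 60 * 60    # 4 hours
AWS_FORMATS = {".mp3", ".mp4", ".wav", ".flac", ".ogg", ".amr", ".webm"}


def aws_problems(path: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return reasons a file would likely be rejected; an empty list means it passes."""
    problems = []
    suffix = Path(path).suffix.lower()
    if suffix not in AWS_FORMATS:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > AWS_MAX_BYTES:
        problems.append("file larger than 2 GB")
    if duration_seconds > AWS_MAX_SECONDS:
        problems.append("longer than 4 hours; split it first")
    return problems


# A 5-hour, 300 MB MP3 passes the size check but needs splitting.
print(aws_problems("interview.mp3", 300 * 10**6, 5 * 3600))
```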
How to Split Audio for Transcription
Here's the recommended workflow for preparing long audio files for transcription:
- Analyze your audio: Check the total duration and file size
- Choose your transcription service: Different services have different limits
- Calculate chunk size: Use the table above to determine optimal segment length
- Use an audio chunker: Split your audio into equal parts using ChunkAudio
- Enable silence detection: Ensure cuts happen at natural pauses
- Process and combine: Transcribe each chunk, then combine the results
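How ChunkAudio's silence detection works internally isn't described here, so the sketch below is not its algorithm. It's a generic energy-based approach to the "cut at natural pauses" step: start from ideal cut points (for example, every 15 minutes) and nudge each one to the quietest frame nearby. It assumes you already have per-frame loudness values (such as RMS per 100 ms frame), and `snap_to_silence` is a hypothetical name.

```python
def snap_to_silence(energies: list[float], targets: list[int], window: int) -> list[int]:
    """Move each target cut (a frame index) to the quietest frame within +/- window frames.

    energies: per-frame loudness values, assumed precomputed (e.g. RMS per 100 ms).
    targets:  ideal cut positions as frame indices.
    window:   how far, in frames, a cut may move in either direction.
    Assumes targets are spaced well apart compared with the window, so cuts stay ordered.
    """
    cuts = []
    for t in targets:
        lo = max(0, t - window)
        hi = min(len(energies) - 1, t + window)
        # Pick the quietest frame; on ties, prefer the frame closest to the target.
        best = min(range(lo, hi + 1), key=lambda i: (energies[i], abs(i - t)))
        cuts.append(best)
    return cuts


levels = [5, 6, 7, 1, 8, 9, 0.5, 7, 6, 5]
print(snap_to_silence(levels, [4], 1))  # searches frames 3-5, picks the pause at frame 3
print(snap_to_silence(levels, [4], 2))  # wider search reaches the quieter frame 6
```

A wider window finds better pauses but makes chunk lengths less uniform, so keep it small relative to the chunk size.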
Important: When combining transcripts from multiple chunks, pay attention to the segment boundaries. Even with silence detection, you may need to manually check for repeated or cut-off words at chunk boundaries.
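Part of that boundary check can be automated. Here's a minimal sketch (hypothetical helper, not a ChunkAudio feature) that drops words repeated across a boundary by finding the longest run at the end of one chunk that matches the start of the next, ignoring case and punctuation. It can wrongly remove a genuine repetition and does nothing for cut-off partial words, so it's a first pass before a manual review, not a replacement for one.

```python
import re


def _norm(word: str) -> str:
    """Lowercase and strip punctuation so 'Fox,' compares equal to 'fox'."""
    return re.sub(r"[^\w']", "", word.lower())


def merge_transcripts(chunks: list[str], max_overlap: int = 8) -> str:
    """Join chunk transcripts, dropping words repeated across each boundary."""
    merged: list[str] = []
    for text in chunks:
        words = text.split()
        # Find the longest k where the last k merged words equal the first k new words.
        overlap = 0
        for k in range(min(max_overlap, len(merged), len(words)), 0, -1):
            if [_norm(w) for w in merged[-k:]] == [_norm(w) for w in words[:k]]:
                overlap = k
                break
        merged.extend(words[overlap:])
    return " ".join(merged)


print(merge_transcripts(["The quick brown fox", "brown fox jumps over"]))
```

`max_overlap` caps how far the search looks back, which limits both the cost and the chance of a false match.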
File Format Considerations
The file format affects both size and compatibility:
| Format | Size per Minute | Compatibility | Recommendation |
|---|---|---|---|
| MP3 128kbps | ~1 MB | Universal | Best for most APIs |
| MP3 320kbps | ~2.4 MB | Universal | Better quality if needed |
| WAV 16-bit (44.1 kHz stereo) | ~10.6 MB | Universal | Avoid - too large |
| FLAC | ~5 MB | Most APIs | Good for quality priority |
For most transcription use cases, MP3 at 128kbps offers the best balance of file size and audio quality. Transcription accuracy is rarely affected by lossy compression at this bitrate for speech.
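The per-minute sizes in the table follow from simple arithmetic. The sketch below reproduces the MP3 and WAV rows; the WAV row is assumed to be CD-style 44.1 kHz stereo (the table doesn't state a sample rate), header bytes are ignored, and FLAC is left out because its size depends on the audio.

```python
def mp3_mb_per_minute(kbps: int) -> float:
    """Constant-bitrate MP3: (kbps * 1000 / 8) bytes per second, times 60, in 10**6-byte MB."""
    return kbps * 1000 / 8 * 60 / 10**6


def pcm_mb_per_minute(sample_rate: int, bits: int, channels: int) -> float:
    """Uncompressed PCM (the WAV payload), header ignored, in 10**6-byte MB."""
    return sample_rate * (bits // 8) * channels * 60 / 10**6


print(f"MP3 128 kbps:            {mp3_mb_per_minute(128):.2f} MB/min")
print(f"MP3 320 kbps:            {mp3_mb_per_minute(320):.2f} MB/min")
print(f"WAV 16-bit 44.1k stereo: {pcm_mb_per_minute(44_100, 16, 2):.2f} MB/min")
print(f"WAV 16-bit 16k mono:     {pcm_mb_per_minute(16_000, 16, 1):.2f} MB/min")
```

The last line shows why sample rate and channel count matter as much as the format: 16 kHz mono PCM is about 1.92 MB per minute, more than five times smaller than 44.1 kHz stereo.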
Handling Long Recordings
For very long recordings (4+ hours), consider this workflow:
- Split into 30-minute chunks using ChunkAudio
- Transcribe chunks in parallel (most APIs accept concurrent requests, within their rate limits)
- Use timestamps to align and merge transcripts
- Review boundary points for accuracy
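The parallel and merge steps can look roughly like this. It's a sketch under stated assumptions: `transcribe_chunk` is a hypothetical callback wrapping whichever API you use and returning segments timed relative to the chunk's start, and `max_workers=4` is an arbitrary starting point you'd tune to your provider's rate limits. Threads suit this because the work is mostly waiting on network calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A segment is (start_seconds, end_seconds, text), timed relative to its own chunk.
Segment = tuple[float, float, str]


def transcribe_all(
    chunk_offsets: list[float],
    transcribe_chunk: Callable[[int], list[Segment]],
    max_workers: int = 4,
) -> list[Segment]:
    """Transcribe chunks concurrently, then shift each chunk's segments onto one global timeline."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, even when chunks finish out of order.
        results = list(pool.map(transcribe_chunk, range(len(chunk_offsets))))
    merged: list[Segment] = []
    for offset, segments in zip(chunk_offsets, results):
        merged.extend((start + offset, end + offset, text) for start, end, text in segments)
    return merged


# Hypothetical stand-in for a real API call: two segments per 30-minute chunk.
def fake_transcribe(index: int) -> list[Segment]:
    return [(0.0, 4.0, f"chunk {index} start"), (1795.0, 1800.0, f"chunk {index} end")]


offsets = [i * 1800.0 for i in range(3)]  # 30-minute chunks start every 1800 s
for segment in transcribe_all(offsets, fake_transcribe):
    print(segment)
```

Keeping the offsets alongside the chunks is what makes the final review easy: every boundary sits at a known global timestamp.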
Split Audio for Transcription Now
Use ChunkAudio to prepare your audio files for any transcription service. Free, private, and instant.