Best Audio Chunk Sizes for Transcription Services

If you're using an audio chunker for transcription, choosing the right segment size is critical. Too large, and your transcription API may reject the file or produce poor results. Too small, and you'll waste time (and, with some pricing models, money) on extra API calls, and every extra boundary is another place where words can be cut off.

This guide covers the optimal audio chunk sizes for transcription across the major services, including OpenAI Whisper, Google Speech-to-Text, AWS Transcribe, and more.

Transcription API Limits: Quick Reference

Here's a comprehensive comparison of file limits for popular transcription services:

Service | Max File Size | Max Duration | Recommended Chunk Length
OpenAI Whisper API | 25 MB | Depends on bitrate (~26 min at 128 kbps MP3) | 10-15 minutes
Google Speech-to-Text | 10 MB (sync) | 1 min (sync) / 480 min (async) | 1 minute or less (sync) / 30-60 minutes (async)
AWS Transcribe | 2 GB | 4 hours | 30-60 minutes
AssemblyAI | 5 GB | Unlimited | 30-60 minutes
Rev.ai | 2 GB | 17 hours | 30-60 minutes
Deepgram | 2 GB | Unlimited | 30-60 minutes

OpenAI Whisper API: Optimal Chunk Size

The Whisper API is one of the most popular transcription services, but it has a strict 25 MB file size limit. Here's how to optimize your audio for Whisper:

Whisper API Recommendations

  • Optimal chunk size: 10-15 minutes of audio
  • File format: MP3 at 128kbps (best size/quality ratio)
  • Max file size: 25 MB

At 128 kbps, MP3 audio takes about 0.96 MB per minute, so roughly 26 minutes fits under the 25 MB limit. However, we recommend 10-15 minute chunks: at 128 kbps they come to roughly 9.6-14.4 MB, leaving comfortable headroom below the limit.
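The size arithmetic is easy to check yourself. This sketch assumes constant-bitrate audio and decimal megabytes (1 MB = 1,000,000 bytes, slightly smaller than a binary MiB, so the estimate errs on the safe side) and ignores container and metadata overhead; the function names are illustrative, not part of any API.

```python
def mb_for_minutes(minutes: float, bitrate_kbps: float) -> float:
    """Approximate size, in decimal MB, of constant-bitrate audio."""
    return minutes * 60 * bitrate_kbps * 1000 / 8 / 1_000_000


def minutes_for_limit(limit_mb: float, bitrate_kbps: float) -> float:
    """Approximate minutes of constant-bitrate audio that fit in limit_mb."""
    return limit_mb * 1_000_000 * 8 / (bitrate_kbps * 1000) / 60


print(round(minutes_for_limit(25, 128), 1))  # 26.0
print(round(mb_for_minutes(15, 128), 1))     # 14.4
```

A 15-minute chunk lands around 14.4 MB, so variable-bitrate encodes or embedded metadata still have room before the 25 MB cap.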

Pro Tip: When using ChunkAudio as your audio chunker for transcription, enable Smart Silence Detection. It places cuts at natural pauses, so chunks are less likely to end mid-sentence, which improves transcription accuracy.
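ChunkAudio's own Smart Silence Detection algorithm isn't described here, so the sketch below shows one common, generic heuristic instead: scan short frames within a window around the ideal cut time and cut in the frame with the lowest RMS energy. It assumes mono audio already decoded to floating-point samples; `quietest_cut` and its parameters are hypothetical.

```python
import math
from typing import Sequence


def quietest_cut(samples: Sequence[float], sample_rate: int, target_s: float,
                 search_s: float = 2.0, frame_s: float = 0.02) -> int:
    """Return a sample index near target_s that lies in the quietest frame.

    Generic RMS-energy heuristic; hypothetical helper, not ChunkAudio's
    actual Smart Silence Detection implementation.
    """
    frame = max(1, int(frame_s * sample_rate))      # samples per frame
    target = int(target_s * sample_rate)            # ideal cut position
    reach = int(search_s * sample_rate)             # how far to look either way
    lo = max(0, target - reach)
    hi = min(len(samples) - frame, target + reach)  # last full-frame start
    if hi < lo:
        # Nothing to search: fall back to the target, clamped to the audio length.
        return min(target, len(samples))
    best_start, best_rms = lo, math.inf
    for start in range(lo, hi + 1, frame):
        window = samples[start:start + frame]
        rms = math.sqrt(sum(x * x for x in window) / len(window))
        if rms < best_rms:
            best_start, best_rms = start, rms
    # Cut in the middle of the quietest frame.
    return best_start + frame // 2
```

Calling this once per planned boundary (for example at 600 s, 1200 s, and so on for 10-minute chunks) nudges each cut by at most `search_s` seconds toward a pause.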

Google Speech-to-Text: Chunk Size Guide

Google offers two transcription modes with very different limits:

Synchronous Recognition

  • Max duration: 1 minute
  • Max file size: 10 MB

Asynchronous Recognition

  • Max duration: 480 minutes (8 hours)

For synchronous transcription, you'll need to cut audio into segments of 1 minute or less. For async, larger chunks of 30-60 minutes work well.
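Finding the fewest equal segments that respect a maximum length is a small calculation: divide, round the count up, then spread the duration evenly. This sketch assumes you only need the boundary times in seconds, to hand to whatever tool does the actual cutting; `equal_segments` is a hypothetical helper.

```python
import math


def equal_segments(total_s: float, max_segment_s: float) -> list[tuple[float, float]]:
    """Split [0, total_s) into the fewest equal segments no longer than max_segment_s."""
    if total_s <= 0:
        return []
    count = math.ceil(total_s / max_segment_s)
    length = total_s / count
    # Set the final edge to total_s exactly so rounding never drops the tail.
    edges = [i * length for i in range(count)] + [float(total_s)]
    return list(zip(edges[:-1], edges[1:]))


# A 150-second clip for synchronous recognition (1-minute cap).
print(equal_segments(150, 60))  # [(0.0, 50.0), (50.0, 100.0), (100.0, 150.0)]
```

The same helper works for async jobs with a larger cap, for example `equal_segments(90 * 60, 60 * 60)` for a 90-minute file with 60-minute chunks.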

AWS Transcribe: Chunk Size Guide

AWS Transcribe is more lenient with file sizes but has a 4-hour duration limit:

AWS Transcribe Recommendations

  • Optimal chunk size: 30-60 minutes
  • Max file size: 2 GB
  • Max duration: 4 hours
  • Supported formats: MP3, MP4, WAV, FLAC, OGG, AMR, WebM
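Before uploading, it can help to sanity-check each chunk against the limits listed above. This sketch hard-codes those values (reading 2 GB as 2,000,000,000 bytes, the conservative interpretation), checks only the file extension rather than the codec inside the file, and uses a hypothetical `check_aws_chunk` helper; confirm current limits in the AWS documentation.

```python
from pathlib import Path

# Limits as listed in this guide; confirm against current AWS documentation.
AWS_FORMATS = {"mp3", "mp4", "wav", "flac", "ogg", "amr", "webm"}
AWS_MAX_BYTES = 2_000_000_000       # 2 GB, read conservatively as decimal
AWS_MAX_SECONDS = 4 * 60 * 60       # 4 hours


def check_aws_chunk(filename: str, size_bytes: int, duration_s: float) -> list[str]:
    """Return a list of problems; an empty list means the chunk looks OK.

    Only the extension is checked, not the actual codec inside the file.
    """
    problems = []
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in AWS_FORMATS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > AWS_MAX_BYTES:
        problems.append("larger than 2 GB")
    if duration_s > AWS_MAX_SECONDS:
        problems.append("longer than 4 hours")
    return problems


print(check_aws_chunk("part01.mp3", 28_800_000, 1800))  # []
```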

How to Split Audio for Transcription

Here's the recommended workflow for preparing long audio files for transcription:

  1. Analyze your audio: Check the total duration and file size
  2. Choose your transcription service: Different services have different limits
  3. Calculate chunk size: Use the table above to determine optimal segment length
  4. Use an audio chunker: Split your audio into equal parts using ChunkAudio
  5. Enable silence detection: Ensure cuts happen at natural pauses
  6. Process and combine: Transcribe each chunk, then combine the results
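Steps 1 to 3 reduce to a small calculation: start from the recommended chunk length, then add chunks if a chunk would still exceed the service's file-size cap. This sketch assumes constant-bitrate audio and decimal megabytes, ignores container overhead, and uses a hypothetical `plan_chunks` helper; the example plugs in the Whisper figures from the table.

```python
import math


def plan_chunks(duration_min: float, bitrate_kbps: float,
                target_chunk_min: float, max_file_mb: float) -> tuple[int, float]:
    """Return (chunk_count, minutes_per_chunk) for equal-length chunks."""
    mb_per_min = bitrate_kbps * 60 / 8 / 1000                     # 0.96 at 128 kbps
    by_length = math.ceil(duration_min / target_chunk_min)        # honor the recommended length
    by_size = math.ceil(duration_min * mb_per_min / max_file_mb)  # honor the size cap
    count = max(1, by_length, by_size)
    return count, duration_min / count


# A 95-minute recording at 128 kbps, with a 15-minute target and a 25 MB cap.
count, minutes = plan_chunks(95, 128, 15, 25)
print(count, round(minutes, 1))  # 7 13.6
```

The size check only takes over at high bitrates: at 320 kbps the same recording needs 10 chunks, because a 15-minute chunk would be about 36 MB.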

Important: When combining transcripts from multiple chunks, pay attention to the segment boundaries. Even with silence detection, you may need to manually check for repeated or cut-off words at chunk boundaries.
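Part of that boundary check can be automated. The sketch below is a simple heuristic, not any service's built-in feature: when joining chunk transcripts, it drops the longest run of words (up to a hypothetical `max_overlap` limit) that ends one chunk and repeats at the start of the next, comparing words case-insensitively and ignoring surrounding punctuation.

```python
def merge_transcripts(parts: list[str], max_overlap: int = 10) -> str:
    """Join chunk transcripts, keeping one copy of words repeated at a boundary."""
    def norm(word: str) -> str:
        # Compare words without case or surrounding punctuation.
        return word.lower().strip(".,!?;:\"'")

    merged: list[str] = []
    for text in parts:
        words = text.split()
        limit = min(max_overlap, len(merged), len(words))
        overlap = 0
        # Try the longest candidate overlap first.
        for k in range(limit, 0, -1):
            if [norm(w) for w in merged[-k:]] == [norm(w) for w in words[:k]]:
                overlap = k
                break
        merged.extend(words[overlap:])
    return " ".join(merged)


print(merge_transcripts(["so the next step is", "step is to export the file."]))
# so the next step is to export the file.
```

A short match such as a single repeated "the" can be genuine speech, so treat the result as a first pass and still review each boundary.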

File Format Considerations

The file format affects both size and compatibility:

Format | Size per Minute | Compatibility | Recommendation
MP3 128 kbps | ~1 MB | Universal | Best for most APIs
MP3 320 kbps | ~2.4 MB | Universal | Better quality if needed
WAV 16-bit (44.1 kHz stereo) | ~10 MB | Universal | Avoid (too large)
FLAC | ~5 MB | Most APIs | Good when quality is the priority

For most transcription use cases, MP3 at 128kbps offers the best balance of file size and audio quality. Transcription accuracy is rarely affected by lossy compression at this bitrate for speech.
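The size figures follow directly from the encoding parameters: compressed formats use bitrate × 60 / 8 bytes per minute, and uncompressed WAV uses sample rate × bit depth × channels × 60 / 8. The ~10 MB WAV figure matches 16-bit audio at 44.1 kHz stereo; the 16 kHz mono line is only an illustrative comparison. Sizes are in decimal MB and ignore WAV headers; the function names are illustrative.

```python
def mb_per_minute_compressed(bitrate_kbps: float) -> float:
    """Size of one minute of constant-bitrate audio (e.g. MP3), in decimal MB."""
    return bitrate_kbps * 1000 / 8 * 60 / 1_000_000


def mb_per_minute_pcm(sample_rate_hz: int, bits: int, channels: int) -> float:
    """Size of one minute of uncompressed PCM audio (WAV data, header ignored)."""
    return sample_rate_hz * bits / 8 * channels * 60 / 1_000_000


print(mb_per_minute_compressed(128))     # 0.96
print(mb_per_minute_compressed(320))     # 2.4
print(mb_per_minute_pcm(44_100, 16, 2))  # 10.584
print(mb_per_minute_pcm(16_000, 16, 1))  # 1.92
```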

Handling Long Recordings

For very long recordings (4+ hours), consider this workflow:

  1. Split into 30-minute chunks using ChunkAudio
  2. Transcribe chunks in parallel (send concurrent requests, staying within your API's rate limits)
  3. Use timestamps to align and merge transcripts
  4. Review boundary points for accuracy
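Steps 2 and 3 can be combined: run transcription requests concurrently, then shift each chunk's timestamps by that chunk's start time so every segment sits on one timeline. In this sketch `transcribe` is a placeholder for your own wrapper around whichever API you use (no real client library is assumed), and it must return `(start_s, end_s, text)` segments relative to the chunk. Threads fit here because API calls are I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# (start_s, end_s, text); timestamps are relative to the chunk they came from.
Segment = tuple[float, float, str]


def transcribe_all(chunk_paths: list[str], chunk_starts_s: list[float],
                   transcribe: Callable[[str], list[Segment]],
                   max_workers: int = 4) -> list[Segment]:
    """Transcribe chunks concurrently and return segments on one timeline."""
    if len(chunk_paths) != len(chunk_starts_s):
        raise ValueError("need one start time per chunk")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, even if calls finish out of order.
        results = list(pool.map(transcribe, chunk_paths))
    merged: list[Segment] = []
    for offset, segments in zip(chunk_starts_s, results):
        # Shift chunk-relative times by the chunk's start within the recording.
        merged.extend((start + offset, end + offset, text)
                      for start, end, text in segments)
    return merged
```

For exact 30-minute chunks, `chunk_starts_s` would be 0, 1800, 3600 seconds and so on; if chunks were cut at silence points, pass the actual cut times instead. Keep `max_workers` within your provider's rate limits.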

Split Audio for Transcription Now

Use ChunkAudio to prepare your audio files for any transcription service. Free, private, and instant.
