Audio · MiniMax

Text-to-Speech

Convert text to natural-sounding speech using MiniMax speech-02-hd with voice cloning, voice design, and streaming support.

POST /v1/audio/speech

Supported Models

Model         Provider
speech-02-hd  MiniMax

Request

Body Parameters

model (string, required)

Use "speech-02-hd"

text (string, required)

Text to convert to speech

voice_setting (object, required)

Voice configuration: voice_id, speed, vol, pitch

audio_setting (object)

Output audio configuration: sample_rate, bitrate, format, channel

Use a built-in voice ID like male-qn-qingse, or a custom voice created via Voice Cloning or Voice Design.

voice_setting Fields

voice_id (string, required)

Built-in or cloned voice identifier

speed (number)

Speech speed multiplier

Default: 1.0

vol (number)

Volume level

Default: 1.0

pitch (number)

Pitch adjustment

Default: 0

audio_setting Fields

sample_rate (integer)

Sample rate in Hz

Default: 32000

bitrate (integer)

Bitrate in bps

Default: 128000

format (string)

Audio format

Default: mp3

Options: mp3, wav, pcm, flac

channel (integer)

Number of audio channels

Default: 1
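
The field tables above can be turned into a small request-body builder. The following is a minimal sketch, not part of any SDK: `build_speech_request` is a hypothetical helper that fills in the defaults listed above and rejects formats outside the listed options. Any server-side validation beyond what this page states is not modeled here.

```python
import json

# Defaults and options copied from the field tables on this page.
VOICE_DEFAULTS = {"speed": 1.0, "vol": 1.0, "pitch": 0}
AUDIO_DEFAULTS = {"sample_rate": 32000, "bitrate": 128000, "format": "mp3", "channel": 1}
AUDIO_FORMATS = {"mp3", "wav", "pcm", "flac"}


def build_speech_request(text, voice_id, model="speech-02-hd", voice=None, audio=None):
    """Return a request body with documented defaults filled in (hypothetical helper)."""
    if not text:
        raise ValueError("text is required")
    if not voice_id:
        raise ValueError("voice_setting.voice_id is required")
    # Caller-supplied overrides win over the documented defaults.
    voice_setting = {"voice_id": voice_id, **VOICE_DEFAULTS, **(voice or {})}
    audio_setting = {**AUDIO_DEFAULTS, **(audio or {})}
    if audio_setting["format"] not in AUDIO_FORMATS:
        raise ValueError(f"unsupported format: {audio_setting['format']}")
    return {
        "model": model,
        "text": text,
        "voice_setting": voice_setting,
        "audio_setting": audio_setting,
    }


body = build_speech_request(
    "Hello, welcome to Metriqual!", "male-qn-qingse", audio={"format": "wav"}
)
print(json.dumps(body, indent=2))
```

Because optional fields fall back to the documented defaults, a caller only needs to pass the values that differ.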

cURL
curl https://api.metriqual.com/v1/audio/speech \
  -H "Authorization: Bearer mql_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "speech-02-hd",
    "text": "Hello, welcome to Metriqual!",
    "voice_setting": {
      "voice_id": "male-qn-qingse",
      "speed": 1.0,
      "vol": 1.0,
      "pitch": 0
    },
    "audio_setting": {
      "sample_rate": 32000,
      "bitrate": 128000,
      "format": "mp3"
    }
  }' --output speech.mp3
TypeScript SDK
const audio = await mql.audio.speech({
  model: 'speech-02-hd',
  text: 'Hello, welcome to Metriqual!',
  voice_setting: {
    voice_id: 'male-qn-qingse',
    speed: 1.0,
    vol: 1.0,
    pitch: 0
  },
  audio_setting: {
    sample_rate: 32000,
    bitrate: 128000,
    format: 'mp3'
  }
});
Python SDK
audio = mql.audio.speech(
    model="speech-02-hd",
    text="Hello, welcome to Metriqual!",
    voice_setting={
        "voice_id": "male-qn-qingse",
        "speed": 1.0,
        "vol": 1.0,
        "pitch": 0,
    },
    audio_setting={
        "sample_rate": 32000,
        "bitrate": 128000,
        "format": "mp3",
    },
)
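
For environments without the SDK, the cURL call can be reproduced with the Python standard library. This is a sketch under one assumption: that the response body is the raw audio, which the `--output speech.mp3` flag in the cURL example suggests but this page does not state outright. `make_speech_http_request` and `save_speech` are hypothetical helper names.

```python
import json
import urllib.request

API_URL = "https://api.metriqual.com/v1/audio/speech"  # endpoint from this page


def make_speech_http_request(api_key, body):
    """Build (but do not send) the POST request that the cURL example describes."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(api_key, body, path="speech.mp3"):
    """Send the request and write the response bytes to `path`.

    Assumption: the response body is the audio file itself.
    """
    req = make_speech_http_request(api_key, body)
    with urllib.request.urlopen(req, timeout=60) as resp, open(path, "wb") as out:
        out.write(resp.read())
    return path
```

Calling `save_speech("mql_your_key", body)` with a body shaped like the examples above would perform the network request. Building the request separately keeps that step easy to inspect or test offline.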

Async Speech (Long-form)

POST /v1/audio/speech/async

For long text, use async speech generation. Submit the request, then poll for status and download when complete.

Related Endpoints

GET /v1/audio/speech/async/:task_id

Check async task status

GET /v1/audio/speech/async/:task_id/download

Download completed audio

cURL
# Start async generation
curl -X POST https://api.metriqual.com/v1/audio/speech/async \
  -H "Authorization: Bearer mql_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "speech-02-hd",
    "text": "Very long text content goes here...",
    "voice_setting": { "voice_id": "male-qn-qingse" }
  }'

# Check status
curl https://api.metriqual.com/v1/audio/speech/async/TASK_ID \
  -H "Authorization: Bearer mql_your_key"

# Download when ready
curl https://api.metriqual.com/v1/audio/speech/async/TASK_ID/download \
  -H "Authorization: Bearer mql_your_key" --output output.mp3
Python SDK
# Start async generation
task = mql.audio.speech_async(
    model="speech-02-hd",
    text="Very long text content goes here...",
    voice_setting={"voice_id": "male-qn-qingse"},
)

# Or submit and wait for completion in one call
audio = mql.audio.speech_async_and_wait(
    model="speech-02-hd",
    text="Very long text content goes here...",
    voice_setting={"voice_id": "male-qn-qingse"},
)
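
When polling the status endpoint by hand instead of using `speech_async_and_wait`, the submit, poll, download flow needs a bounded wait loop. The sketch below shows one way to write it. This page does not document the status response, so the `"status"` key and the values `"processing"`, `"completed"`, and `"failed"` are assumptions for illustration only. `wait_for_task` is a hypothetical helper that takes the fetch and the completion checks as arguments for that reason.

```python
import time


def wait_for_task(fetch_status, is_done, is_failed,
                  interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch_status() until is_done(status); raise on failure or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        if is_failed(status):
            raise RuntimeError(f"task failed: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish in time")
        sleep(interval)


# Simulated status responses; the field name and values are assumed, not documented.
responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed"},
])
final = wait_for_task(
    fetch_status=lambda: next(responses),
    is_done=lambda s: s["status"] == "completed",
    is_failed=lambda s: s["status"] == "failed",
    sleep=lambda _: None,  # skip real waiting in this demo
)
print(final)
```

In real use, `fetch_status` would call GET /v1/audio/speech/async/:task_id, and the download request would follow once the loop returns.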

WebSocket Streaming

WS /v1/audio/speech/stream

Streams TTS audio in real time over WebSocket. Connect, send text chunks, and receive audio as it is generated. Ideal for interactive voice applications.
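
Sending text chunks implies splitting the input first. The message format for the stream is not specified on this page, so the sketch below covers only the client-side chunking step. `chunk_text` is a hypothetical helper, and the 200-character limit is an arbitrary illustrative value, not a documented constraint.

```python
import re


def chunk_text(text, max_chars=200):
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split any single sentence that exceeds the limit on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current chunk
        else:
            chunks.append(current)  # flush and start a new chunk
            current = sentence
    if current:
        chunks.append(current)
    return chunks


for chunk in chunk_text("Hello there. How are you today? I hope all is well.", max_chars=32):
    print(repr(chunk))
```

Splitting on sentence boundaries is a design choice aimed at keeping prosody natural within each chunk. A different boundary, such as clauses or fixed windows, may suit other applications better.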