Audio Transcription

Transcribe audio files to text using OpenAI Whisper and GPT-4o transcription models through the Metriqual gateway.

POST /v1/audio/transcriptions

Supported Models

Model	Provider	Description
`whisper-1`	OpenAI	Whisper v2, general-purpose transcription
`gpt-4o-transcribe`	OpenAI	GPT-4o powered transcription with higher accuracy
`gpt-4o-mini-transcribe`	OpenAI	Faster GPT-4o transcription, lower cost

Request

Multipart Form Parameters

file (file, form, required)

Audio file (mp3, mp4, mpeg, mpga, m4a, wav, webm; max 25 MB)

model (string, form, required)

Model ID

language (string, form)

ISO-639-1 language code (improves accuracy)

prompt (string, form)

Hint text to guide transcription style

response_format (string, form)

Output format

Default: json

Options: json, text, srt, verbose_json, vtt, diarized_json

temperature (number, form)

Sampling temperature (0-1)

Default: 0

include[] (string, form)

Extra fields to include

Options: logprobs
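
The constraints on the file parameter can be checked client-side before uploading. The following is a minimal sketch, assuming the file extension is a good enough proxy for the audio format and that 25 MB means 25 × 1024 × 1024 bytes (this page does not say whether the limit is decimal or binary). `check_audio_file` is a hypothetical helper, not part of any Metriqual SDK.

```python
import os

# Extensions and size limit taken from the `file` parameter description.
ALLOWED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
MAX_BYTES = 25 * 1024 * 1024  # assumption: binary megabytes


def check_audio_file(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, limit is {MAX_BYTES}")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every problem at once; the gateway remains the authority on what it accepts.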

cURL — Whisper

curl https://api.metriqual.com/v1/audio/transcriptions \
  -H "Authorization: Bearer mql_your_key" \
  -F file=@audio.mp3 \
  -F model=whisper-1 \
  -F language=en

cURL — GPT-4o Transcribe

curl https://api.metriqual.com/v1/audio/transcriptions \
  -H "Authorization: Bearer mql_your_key" \
  -F file=@meeting.mp3 \
  -F model=gpt-4o-transcribe \
  -F response_format=verbose_json \
  -F "include[]=logprobs"
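
The same multipart request can be built without curl. The sketch below uses only the Python standard library and assumes the gateway accepts a standard multipart/form-data body, as the cURL examples imply. `build_multipart` and `transcribe` are hypothetical helper names, and the API key is a placeholder.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://api.metriqual.com/v1/audio/transcriptions"


def build_multipart(fields, filename, file_bytes):
    """Encode text fields plus one `file` part as multipart/form-data.

    `fields` is a list of (name, value) pairs so that repeated names such
    as `include[]` are possible. Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
    ).encode()
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, api_key, fields):
    """POST the file; assumes a JSON response format was requested."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(fields, os.path.basename(path), f.read())
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A call equivalent to the second cURL example would be `transcribe("meeting.mp3", "mql_your_key", [("model", "gpt-4o-transcribe"), ("response_format", "verbose_json"), ("include[]", "logprobs")])`.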

TypeScript SDK

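// Assumes `mql` is an initialized Metriqual client and `audioBuffer` holds the audio file contents.
const result = await mql.audio.transcribe({
  file: audioBuffer,
  model: 'gpt-4o-transcribe',
  language: 'en',
  response_format: 'verbose_json',
  include: ['logprobs'], // request token-level log probabilities
});

console.log(result.text);
console.log(result.logprobs); // token-level log probabilities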

Python SDK

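# Assumes `mql` is an initialized Metriqual client.
with open("audio.mp3", "rb") as f:  # open in binary mode for upload
    result = mql.audio.transcribe(
        file=f,
        model="gpt-4o-transcribe",
        language="en",
    )
print(result["text"])  # transcribed text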

Response

The shape of the response body depends on the response_format parameter.

diarized_json (GPT-4o models only) adds a speaker label to each segment.
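
Because the body shape varies, a client has to branch on the format it requested. The sketch below assumes that json, verbose_json and diarized_json return JSON bodies while text, srt and vtt return plain text; this mirrors the upstream OpenAI behaviour and is not confirmed for the gateway on this page. `parse_transcription` is a hypothetical helper.

```python
import json

JSON_FORMATS = {"json", "verbose_json", "diarized_json"}
TEXT_FORMATS = {"text", "srt", "vtt"}  # assumption: returned as plain text


def parse_transcription(body: bytes, response_format: str = "json"):
    """Decode a raw response body according to the requested response_format."""
    if response_format in JSON_FORMATS:
        return json.loads(body)
    if response_format in TEXT_FORMATS:
        return body.decode("utf-8")
    raise ValueError(f"unknown response_format: {response_format}")
```

With the default format, `parse_transcription(raw)["text"]` yields the transcript; with text, srt or vtt the decoded string is the transcript or subtitle file itself.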

200

json (default)

{
  "text": "Hello, this is a test transcription of audio content."
}

200

verbose_json

{
  "text": "Hello world.",
  "task": "transcribe",
  "language": "english",
  "duration": 3.42,
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 3.42,
      "text": " Hello world.",
      "temperature": 0
    }
  ],
  "logprobs": [
    { "token": "Hello", "logprob": -0.12 },
    { "token": " world", "logprob": -0.05 }
  ]
}
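
Each logprobs entry can be converted to a probability with exp(logprob), assuming natural-log values as in the upstream OpenAI API. The sketch below applies that to the sample response, using the field names shown in the example. `token_confidences` is a hypothetical helper, and averaging the probabilities is one simple choice of summary, not a documented metric.

```python
import math


def token_confidences(response: dict):
    """Return ([(token, probability), ...], mean probability or None)."""
    pairs = [
        (lp["token"], math.exp(lp["logprob"]))
        for lp in response.get("logprobs", [])
    ]
    mean = sum(p for _, p in pairs) / len(pairs) if pairs else None
    return pairs, mean


# Trimmed copy of the verbose_json sample response.
sample = {
    "text": "Hello world.",
    "logprobs": [
        {"token": "Hello", "logprob": -0.12},
        {"token": " world", "logprob": -0.05},
    ],
}

pairs, mean = token_confidences(sample)
for token, prob in pairs:
    print(f"{token!r}: {prob:.3f}")
```

Tokens with a low probability are reasonable candidates for manual review; responses requested without include[]=logprobs simply produce an empty list and a mean of None.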

Quick Start