Speech-to-Text (STT) API

Overview

The Speech-to-Text API provides endpoints for converting audio to text transcriptions. It supports both synchronous (real-time) and asynchronous (batch) processing modes.

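Base URL: https://asr-api.stg.prosa.ai/v2/speech/stt

The /stt prefix in the endpoint headings below is the final segment of the base URL, so GET /stt/models resolves to https://asr-api.stg.prosa.ai/v2/speech/stt/models and POST /stt resolves to the base URL itself.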

Authentication: All endpoints require the x-api-key header with your API key.


List Models (GET /stt/models)

List all available ASR (Automatic Speech Recognition) models.

Authentication: API key required

Request:

curl -X GET 'https://asr-api.stg.prosa.ai/v2/speech/stt/models' \
  -H 'x-api-key: your-api-key'

Response (200 OK):

[
  {
    "name": "stt-general",
    "label": "ASR General",
    "language": "Bahasa Indonesia",
    "domain": "general",
    "acoustic": "recording",
    "channels": 1,
    "samplerate": 16000
  }
]

Response Fields:

| Field | Type | Description |
| --- | --- | --- |
| name | string | Model identifier (use this in transcribe requests) |
| label | string | Human-readable model name |
| language | string | Supported language |
| domain | string | Model specialization domain |
| acoustic | string | Optimal audio source type |
| channels | integer | Optimal number of audio channels |
| samplerate | integer | Optimal sample rate in Hz |

Error Responses:

  • 401 Unauthorized: Invalid or missing API key
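As a rough illustration of how the model list might be consumed, the Python sketch below builds the request with the standard library and picks a model whose optimal audio settings match a recording. The helper names (build_models_request, pick_model, list_models) are hypothetical and not part of the API; only the URL, the x-api-key header, and the response fields are taken from this page.

```python
import json
import urllib.request
from typing import List, Optional

BASE_URL = "https://asr-api.stg.prosa.ai/v2/speech/stt"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for the model list."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"x-api-key": api_key},
        method="GET",
    )


def pick_model(models: List[dict], samplerate: int, channels: int = 1) -> Optional[str]:
    """Return the name of the first model whose optimal settings match, else None."""
    for model in models:
        if model.get("samplerate") == samplerate and model.get("channels") == channels:
            return model["name"]
    return None


def list_models(api_key: str) -> List[dict]:
    """Send the request and decode the JSON array of models (needs network access)."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=30) as resp:
        return json.load(resp)
```

The value returned by pick_model is what goes into the engine field of a transcribe request.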


Transcribe Audio (POST /stt)

Submit an audio file for transcription. Supports both synchronous and asynchronous modes.

Input (data vs URI): For small amounts of audio (e.g. below one minute; the threshold may vary by configuration), include base64-encoded audio in the request body. For larger audio, provide a publicly accessible URI to the audio file instead.

Currently supported URI types:

  • HTTP URL that returns the audio file, e.g. https://storage.example.com/file.wav

  • Google Drive: URL to a Google Drive audio file or a Google Drive file ID, e.g. googledrive://file_id

Processing behavior:

  • Short ASR requests: The job is processed on the fly and the client is expected to wait for the result in the response. If the job cannot be completed within the allotted time, it is queued and only the job ID is returned.

  • Long ASR requests: The job is always queued. Poll for results using the job endpoints, or set up a webhook endpoint to receive notifications. See Receiving Webhook.

Authentication: API key required

Request:

Request Body - Config:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| engine | string | ✅ Yes | - | ASR model name (from list models) |
| wait | boolean | No | false | true for sync, false for async |
| speaker_count | integer | No | 1 | Expected number of speakers |
| include_filler | boolean | No | false | Include filler words (um, uh) |
| include_partial_results | boolean | No | false | Include partial transcriptions |
| auto_punctuation | boolean | No | false | Auto-add punctuation |
| enable_spoken_numerals | boolean | No | false | Convert "one" to "1" |
| enable_speech_insights | boolean | No | false | Enable speech analytics |
| enable_voice_insights | boolean | No | false | Enable voice analytics |

Request Body - Request:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| data | string | Conditional | Base64-encoded audio (required if no uri) |
| uri | string | Conditional | URL to audio file (required if no data) |
| label | string | No | Optional label for the job |
| duration | number | No | Audio duration in seconds |
| mime_type | string | No | Audio MIME type |
| sample_rate | integer | No | Audio sample rate |
| channels | integer | No | Number of audio channels |

⚠️ Important: Either data or uri must be provided, but not both. URI-based requests are only allowed for asynchronous requests (wait: false).
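The constraints above can be checked on the client before a request is sent. The following Python sketch is one possible way to do that. It assumes the two field groups are sent as top-level config and request objects in a JSON body, which the section titles suggest but this page does not show explicitly. The function names are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Optional

BASE_URL = "https://asr-api.stg.prosa.ai/v2/speech/stt"


def build_transcribe_payload(engine: str, *, audio_bytes: Optional[bytes] = None,
                             uri: Optional[str] = None, wait: bool = False,
                             **config_options) -> dict:
    """Assemble the JSON body, enforcing the data/uri rules stated above."""
    if (audio_bytes is None) == (uri is None):
        raise ValueError("provide exactly one of audio_bytes or uri")
    if uri is not None and wait:
        raise ValueError("URI-based requests must be asynchronous (wait=False)")
    config = {"engine": engine, "wait": wait, **config_options}
    if audio_bytes is not None:
        request = {"data": base64.b64encode(audio_bytes).decode("ascii")}
    else:
        request = {"uri": uri}
    # Assumption: "config" and "request" are the top-level keys of the body.
    return {"config": config, "request": request}


def build_transcribe_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request carrying the JSON payload."""
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Extra keyword arguments such as auto_punctuation=True are passed through into the config object unchanged, so any of the optional config fields can be set the same way.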

Response (200 OK) - Synchronous:

Response (200 OK) - Asynchronous:

Response Fields:

| Field | Type | Description |
| --- | --- | --- |
| job_id | string (UUID) | Unique job identifier |
| status | string | Job status (see status values below) |
| created_at | string (datetime) | Job creation timestamp |
| modified_at | string (datetime) | Last modification timestamp |
| result.data | array | Array of transcription segments |
| result.data[].transcript | string | Transcribed text |
| result.data[].final | boolean | Whether segment is complete |
| result.data[].time_start | number | Start time in seconds |
| result.data[].time_end | number | End time in seconds |
| result.data[].channel | integer | Audio channel number |
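Because a synchronous request can fall back to returning only the job ID, client code usually has to handle both response shapes. The Python sketch below shows one way to extract a transcript from the fields listed above. The function name is hypothetical, and treating a missing result as "not ready yet" is an assumption based on the processing behavior described earlier.

```python
from typing import Optional


def final_transcript(job: dict) -> Optional[str]:
    """Join the completed segments in time order, or return None if no result yet."""
    result = job.get("result")
    if not result or not result.get("data"):
        # Only job_id/status came back: the job was queued and must be polled.
        return None
    # Keep segments marked final; a segment without the flag is treated as final.
    segments = [seg for seg in result["data"] if seg.get("final", True)]
    segments.sort(key=lambda seg: seg.get("time_start", 0.0))
    return " ".join(seg["transcript"] for seg in segments)
```

When this returns None, the job_id from the same response can be used with the job endpoints to fetch the result later.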

Error Responses:

  • 400 Bad Request: Invalid audio data

  • 400 Bad Request: Model not found

  • 400 Bad Request: No audio provided

  • 401 Unauthorized: Invalid or missing API key


List Jobs (GET /stt)

Retrieve all STT jobs with optional filtering.

Authentication: API key required

Request:

Query Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| page | integer | No | Page number (default: 1) |
| per_page | integer | No | Items per page (default: 10) |
| from_date | string (date) | No | Filter from date (YYYY-MM-DD) |
| until_date | string (date) | No | Filter until date (YYYY-MM-DD) |
| sort_by | string | No | Sort field (time or label) |
| sort_ascend | boolean | No | Sort ascending |
| query_text | string | No | Search in transcription text |
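A minimal Python sketch of building the list URL from these parameters follows. The function name is hypothetical, and serializing the boolean as lowercase true/false is an assumption, since this page does not specify the accepted spelling.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://asr-api.stg.prosa.ai/v2/speech/stt"


def build_list_jobs_url(page: int = 1, per_page: int = 10,
                        from_date: Optional[date] = None,
                        until_date: Optional[date] = None,
                        sort_by: Optional[str] = None,
                        sort_ascend: Optional[bool] = None,
                        query_text: Optional[str] = None) -> str:
    """Return the GET URL for listing jobs, omitting filters that are not set."""
    if sort_by is not None and sort_by not in ("time", "label"):
        raise ValueError("sort_by must be 'time' or 'label'")
    params = {"page": page, "per_page": per_page}
    if from_date is not None:
        params["from_date"] = from_date.isoformat()  # YYYY-MM-DD
    if until_date is not None:
        params["until_date"] = until_date.isoformat()
    if sort_by is not None:
        params["sort_by"] = sort_by
    if sort_ascend is not None:
        # Assumption: booleans are sent as lowercase strings.
        params["sort_ascend"] = "true" if sort_ascend else "false"
    if query_text:
        params["query_text"] = query_text
    return f"{BASE_URL}?{urlencode(params)}"
```

The resulting URL is requested with the same x-api-key header as every other endpoint.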

Response (200 OK):


Get Job (GET /stt/{job_id})

Retrieve a specific STT job with full results.

Authentication: API key required

Request:

Path Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| job_id | string (UUID) | ✅ Yes | Job identifier |

Response (200 OK):

Error Responses:

  • 404 Not Found: Job not found


Get Job Status (GET /stt/{job_id}/status)

Retrieve only the status of a job (lightweight endpoint).

Authentication: API key required

Request:

Path Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| job_id | string (UUID) | ✅ Yes | Job identifier |

Response (200 OK):

Response Fields:

| Field | Type | Description |
| --- | --- | --- |
| job_id | string (UUID) | Job identifier |
| status | string | Current job status |
| progress.total | number | Overall progress percentage |
| progress.details.transfer | number | Transfer progress % |
| progress.details.transcribe | number | Transcription progress % |


Archive Job (DELETE /stt/{job_id})

Soft delete (archive) a job.

Authentication: API key required

Request:

Path Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| job_id | string (UUID) | ✅ Yes | Job identifier |

Response (200 OK):

Error Responses:

  • 403 Forbidden: Job is in progress

  • 404 Not Found: Job not found or already archived

Note: This is a soft delete. The job is marked as archived, but its data is retained for audit purposes.
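The two documented error cases can be turned into readable messages on the client side. The Python sketch below shows one possible shape. The helper names are hypothetical, and wrapping failures in RuntimeError is just one design choice.

```python
import urllib.error
import urllib.request

BASE_URL = "https://asr-api.stg.prosa.ai/v2/speech/stt"

# Error meanings as documented for this endpoint.
ARCHIVE_ERRORS = {
    403: "Job is in progress",
    404: "Job not found or already archived",
}


def build_archive_request(api_key: str, job_id: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request that archives a job."""
    return urllib.request.Request(
        f"{BASE_URL}/{job_id}",
        headers={"x-api-key": api_key},
        method="DELETE",
    )


def explain_archive_error(status_code: int) -> str:
    """Map an HTTP status code to the documented reason, if there is one."""
    return ARCHIVE_ERRORS.get(status_code, f"Unexpected HTTP status {status_code}")


def archive_job(api_key: str, job_id: str) -> None:
    """Send the request (needs network access) and raise with a readable reason."""
    try:
        with urllib.request.urlopen(build_archive_request(api_key, job_id), timeout=30):
            pass
    except urllib.error.HTTPError as err:
        raise RuntimeError(explain_archive_error(err.code)) from err
```

Since a 403 means the job is still in progress, a caller may prefer to wait for the job to finish and then retry rather than treat it as fatal.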


Job Status Values

| Status | Description |
| --- | --- |
| created | Job has been created |
| queued | Job is waiting to be processed |
| in_progress | Job is being processed |
| complete | Job completed successfully |
| failed | Job failed due to an error |
| cancelled | Job was cancelled |
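These status values are what a polling client watches for. The Python sketch below polls the status endpoint until the job stops changing. Treating complete, failed, and cancelled as terminal is an assumption drawn from the descriptions above, and the function names, interval, and timeout are illustrative only.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://asr-api.stg.prosa.ai/v2/speech/stt"

# Assumption: these three statuses never change again once reached.
TERMINAL_STATUSES = {"complete", "failed", "cancelled"}


def make_status_fetcher(api_key: str, job_id: str) -> Callable[[], dict]:
    """Return a function that calls GET /stt/{job_id}/status (needs network access)."""
    def fetch() -> dict:
        req = urllib.request.Request(
            f"{BASE_URL}/{job_id}/status", headers={"x-api-key": api_key}
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)
    return fetch


def wait_for_job(fetch_status: Callable[[], dict], interval: float = 2.0,
                 timeout: float = 600.0, sleep=time.sleep,
                 clock=time.monotonic) -> dict:
    """Poll until a terminal status is seen; raise TimeoutError after `timeout` seconds."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError("job did not reach a terminal status in time")
        sleep(interval)
```

A typical call would be wait_for_job(make_status_fetcher(api_key, job_id)), followed by GET /stt/{job_id} to retrieve the full result once the status is complete. Passing the fetcher in as a function keeps the loop easy to exercise without network access.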


Supported Audio Formats

| Format | Extensions |
| --- | --- |
| Audio | .wav, .mp3, .m4a, .ogg, .weba, .webm, .flac, .gsm, .wma |
| Video | .mp4, .webm, .mov, .avi, .wmv, .mpg |
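A client can reject unsupported files before uploading anything. The short Python sketch below checks a filename against the extensions listed above. The function name is hypothetical, and the check looks only at the extension, not at the actual file contents.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".ogg", ".weba", ".webm", ".flac", ".gsm", ".wma"}
VIDEO_EXTENSIONS = {".mp4", ".webm", ".mov", ".avi", ".wmv", ".mpg"}


def is_supported_file(filename: str) -> bool:
    """Return True if the file extension appears in either supported list."""
    suffix = Path(filename).suffix.lower()  # case-insensitive comparison
    return suffix in AUDIO_EXTENSIONS or suffix in VIDEO_EXTENSIONS
```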


Limits

| Limit | Value |
| --- | --- |
| Max audio duration (sync) | 60 seconds |
| Max audio duration (async) | 4 hours |
| Max request size | 10 MB |
| Max concurrent jobs | Contact support |
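These limits, together with the data/uri rules for transcribe requests, suggest a simple decision procedure for how to submit a file. The Python sketch below is one way to express it. It reads "10 MB" as 10 × 1024 × 1024 bytes and ignores the small JSON overhead around the base64 payload; both are assumptions, and the function names are hypothetical.

```python
MAX_SYNC_SECONDS = 60
MAX_ASYNC_SECONDS = 4 * 60 * 60
MAX_REQUEST_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" interpreted as 10 MiB


def base64_length(raw_bytes: int) -> int:
    """Length of the base64 text for raw_bytes bytes of input (with padding)."""
    return 4 * ((raw_bytes + 2) // 3)


def plan_submission(duration_seconds: float, audio_bytes: int) -> dict:
    """Choose inline data or a URI, and sync or async, from the documented limits."""
    if duration_seconds > MAX_ASYNC_SECONDS:
        raise ValueError("audio exceeds the 4 hour maximum for asynchronous jobs")
    if base64_length(audio_bytes) > MAX_REQUEST_BYTES:
        # Too large to embed: must go by URI, which is asynchronous only.
        return {"input": "uri", "wait": False}
    return {"input": "data", "wait": duration_seconds <= MAX_SYNC_SECONDS}
```

Base64 inflates the payload by about a third, so the largest file that fits inline is noticeably smaller than the request size limit itself.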


Webhooks

Instead of polling for job status, you can receive real-time notifications when jobs complete or fail.

See Webhooks for:

  • Creating webhook endpoints

  • Event types (stt.job.completed, stt.job.failed)

  • Verifying webhook signatures

  • Managing deliveries
