Speech-to-Text (STT) API
Overview
The Speech-to-Text API provides endpoints for converting audio to text transcriptions. It supports both synchronous (real-time) and asynchronous (batch) processing modes.
Base URL: https://asr-api.stg.prosa.ai/v2/speech/stt
Authentication: All endpoints require the x-api-key header with your API key.
List Models (GET /stt/models)
List all available ASR (Automatic Speech Recognition) models.
Authentication: API key required
Request:
curl -X GET 'https://asr-api.stg.prosa.ai/v2/speech/stt/models' \
  -H 'x-api-key: your-api-key'
Response (200 OK):
[
  {
    "name": "stt-general",
    "label": "ASR General",
    "language": "Bahasa Indonesia",
    "domain": "general",
    "acoustic": "recording",
    "channels": 1,
    "samplerate": 16000
  }
]
Response Fields:
| Field | Type | Description |
| --- | --- | --- |
| name | string | Model identifier (use this in transcribe requests) |
| label | string | Human-readable model name |
| language | string | Supported language |
| domain | string | Model specialization domain |
| acoustic | string | Optimal audio source type |
| channels | integer | Optimal number of audio channels |
| samplerate | integer | Optimal sample rate in Hz |
Error Responses:
401 Unauthorized: Invalid or missing API key
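As an illustration of how the model list might be consumed, the following is a minimal Python sketch using only the standard library. The base URL, the x-api-key header, and the response fields come from this page; the `pick_model` helper and its matching rule (exact domain and sample rate) are hypothetical and only show one way to choose an engine name.

```python
import json
import urllib.request

BASE_URL = "https://asr-api.stg.prosa.ai/v2/speech/stt"  # base URL from the Overview


def list_models(api_key):
    """Call GET /stt/models and return the decoded JSON array."""
    req = urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"x-api-key": api_key},
        method="GET",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def pick_model(models, domain, samplerate):
    """Hypothetical helper: name of the first model matching domain and sample rate."""
    for model in models:
        if model.get("domain") == domain and model.get("samplerate") == samplerate:
            return model["name"]
    return None
```

The returned name is the value to pass as engine in a transcribe request, for example `pick_model(list_models(api_key), "general", 16000)`.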
Transcribe Audio (POST /stt)
Submit an audio file for transcription. Supports both synchronous and asynchronous modes.
Input (data vs URI): For small amounts of audio (e.g. below one minute; the threshold may vary by configuration), include base64-encoded audio in the request body. For larger audio, provide a publicly accessible URI to the audio file instead.
Currently supported URI:
- HTTP URL that returns the audio file, e.g. https://storage.example.com/file.wav
- Google Drive: URL to a Google Drive audio file or a Google Drive file ID, e.g. googledrive://file_id
Processing behavior:
Short ASR requests: The job is processed on the fly and the client is expected to wait for the result in the response. If the job cannot be completed within the allotted time, it is queued and only the job ID is returned.
Long ASR requests: The job is always queued. Poll for results using the job endpoints, or set up a webhook endpoint to receive notifications. See Receiving Webhook.
Authentication: API key required
Request:
Request Body - Config:
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| engine | string | ✅ Yes | - | ASR model name (from list models) |
| wait | boolean | No | false | true for sync, false for async |
| speaker_count | integer | No | 1 | Expected number of speakers |
| include_filler | boolean | No | false | Include filler words (um, uh) |
| include_partial_results | boolean | No | false | Include partial transcriptions |
| auto_punctuation | boolean | No | false | Auto-add punctuation |
| enable_spoken_numerals | boolean | No | false | Convert "one" to "1" |
| enable_speech_insights | boolean | No | false | Enable speech analytics |
| enable_voice_insights | boolean | No | false | Enable voice analytics |
Request Body - Request:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| data | string | Conditional | Base64-encoded audio (required if no uri) |
| uri | string | Conditional | URL to audio file (required if no data) |
| label | string | No | Optional label for the job |
| duration | number | No | Audio duration in seconds |
| mime_type | string | No | Audio MIME type |
| sample_rate | integer | No | Audio sample rate |
| channels | integer | No | Number of audio channels |
⚠️ Important: Either data or uri must be provided, but not both. URI-based requests are only allowed for asynchronous requests (wait: false).
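The input rules above can be checked on the client before a request is sent. The sketch below is a hedged Python example: the field names (engine, wait, data, uri) are taken from the tables above, but nesting them under top-level "config" and "request" keys is an assumption inferred from the section headings, and `build_transcribe_body` is a hypothetical helper name.

```python
import base64


def build_transcribe_body(engine, audio_bytes=None, uri=None, wait=False, **config_options):
    """Assemble a POST /stt body. The "config"/"request" nesting is an assumption."""
    # Exactly one of data or uri must be present.
    if (audio_bytes is None) == (uri is None):
        raise ValueError("provide exactly one of audio_bytes or uri")
    # URI input is only allowed for asynchronous requests.
    if uri is not None and wait:
        raise ValueError("URI input requires wait=False")
    if uri is not None:
        request = {"uri": uri}
    else:
        request = {"data": base64.b64encode(audio_bytes).decode("ascii")}
    config = {"engine": engine, "wait": wait, **config_options}
    return {"config": config, "request": request}
```

Optional config flags such as auto_punctuation can be passed as keyword arguments, for example `build_transcribe_body("stt-general", audio_bytes=raw, wait=True, auto_punctuation=True)`.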
Response (200 OK) - Synchronous:
Response (200 OK) - Asynchronous:
Response Fields:
| Field | Type | Description |
| --- | --- | --- |
| job_id | string (UUID) | Unique job identifier |
| status | string | Job status (see status values below) |
| created_at | string (datetime) | Job creation timestamp |
| modified_at | string (datetime) | Last modification timestamp |
| result.data | array | Array of transcription segments |
| result.data[].transcript | string | Transcribed text |
| result.data[].final | boolean | Whether segment is complete |
| result.data[].time_start | number | Start time in seconds |
| result.data[].time_end | number | End time in seconds |
| result.data[].channel | integer | Audio channel number |
Error Responses:
400 Bad Request: Invalid audio data
400 Bad Request: Model not found
400 Bad Request: No audio provided
401 Unauthorized: Invalid API key
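Because a synchronous request may still come back with only a job ID, client code has to handle both a full result and a queued job. The following Python sketch shows one way to do that, under stated assumptions: the JSON Content-Type header is assumed (only x-api-key is documented here), and `submit_job` and `extract_transcript` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "https://asr-api.stg.prosa.ai/v2/speech/stt"  # base URL from the Overview


def submit_job(api_key, body):
    """POST a prepared request body to /stt and return the decoded job object."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(body).encode("utf-8"),
        # Content-Type is an assumption; this page only documents x-api-key.
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def extract_transcript(job):
    """Join final segments in time order; return None if no result is attached."""
    result = job.get("result")
    if not result or "data" not in result:
        return None  # job was queued: fetch it later by job_id
    segments = [seg for seg in result["data"] if seg.get("final")]
    segments.sort(key=lambda seg: seg["time_start"])
    return " ".join(seg["transcript"] for seg in segments)
```

When `extract_transcript` returns None, the job ID from the response can be used with the job endpoints to retrieve the result later.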
List Jobs (GET /stt)
Retrieve all STT jobs with optional filtering.
Authentication: API key required
Request:
Query Parameters:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| page | integer | No | Page number (default: 1) |
| per_page | integer | No | Items per page (default: 10) |
| from_date | string (date) | No | Filter from date (YYYY-MM-DD) |
| until_date | string (date) | No | Filter until date (YYYY-MM-DD) |
| sort_by | string | No | Sort field (time or label) |
| sort_ascend | boolean | No | Sort ascending |
| query_text | string | No | Search in transcription text |
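To make the query parameters concrete, here is a small Python sketch that builds a List Jobs URL. The parameter names and defaults come from the table above; encoding booleans as lowercase true/false is an assumption, and `build_list_jobs_url` is a hypothetical helper.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "https://asr-api.stg.prosa.ai/v2/speech/stt"  # base URL from the Overview


def build_list_jobs_url(page=1, per_page=10, from_date=None, until_date=None,
                        sort_by=None, sort_ascend=None, query_text=None):
    """Build the GET /stt URL, omitting any filter that was not supplied."""
    if sort_by not in (None, "time", "label"):
        raise ValueError("sort_by must be 'time' or 'label'")
    params = {
        "page": page,
        "per_page": per_page,
        # date.isoformat() yields the documented YYYY-MM-DD form.
        "from_date": from_date.isoformat() if from_date else None,
        "until_date": until_date.isoformat() if until_date else None,
        "sort_by": sort_by,
        # Lowercase true/false is an assumed boolean encoding.
        "sort_ascend": None if sort_ascend is None else str(sort_ascend).lower(),
        "query_text": query_text,
    }
    query = urlencode({key: value for key, value in params.items() if value is not None})
    return f"{BASE_URL}?{query}"
```

The resulting URL is requested with the x-api-key header, like every other endpoint on this page.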
Response (200 OK):
Get Job (GET /stt/{job_id})
Retrieve a specific STT job with full results.
Authentication: API key required
Request:
Path Parameters:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| job_id | string (UUID) | ✅ Yes | Job identifier |
Response (200 OK):
Error Responses:
404 Not Found: Job not found
Get Job Status (GET /stt/{job_id}/status)
Retrieve only the status of a job (lightweight endpoint).
Authentication: API key required
Request:
Path Parameters:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| job_id | string (UUID) | ✅ Yes | Job identifier |
Response (200 OK):
Response Fields:
| Field | Type | Description |
| --- | --- | --- |
| job_id | string (UUID) | Job identifier |
| status | string | Current job status |
| progress.total | number | Overall progress percentage |
| progress.details.transfer | number | Transfer progress % |
| progress.details.transcribe | number | Transcription progress % |
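The status endpoint lends itself to a polling loop for queued jobs. The Python sketch below is one possible shape, not a prescribed client: treating complete, failed, and cancelled as terminal is an inference from the Job Status Values list, and the interval and timeout defaults are arbitrary placeholders.

```python
import json
import time
import urllib.request

BASE_URL = "https://asr-api.stg.prosa.ai/v2/speech/stt"  # base URL from the Overview
# Assumption: these statuses are final and will not change again.
TERMINAL_STATUSES = {"complete", "failed", "cancelled"}


def fetch_status(api_key, job_id):
    """Call GET /stt/{job_id}/status and return the decoded JSON object."""
    req = urllib.request.Request(
        f"{BASE_URL}/{job_id}/status",
        headers={"x-api-key": api_key},
        method="GET",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_job(job_id, fetch, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch(job_id) until the status is terminal or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(job_id)
        if status["status"] in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status['status']} after {timeout}s")
        sleep(interval)
```

A typical call would be `wait_for_job(job_id, lambda j: fetch_status(api_key, j))`; passing the fetch function in keeps the loop testable without network access.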
Archive Job (DELETE /stt/{job_id})
Soft delete a job. Archived jobs are retained for audit purposes.
Authentication: API key required
Request:
Path Parameters:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| job_id | string (UUID) | ✅ Yes | Job identifier |
Response (200 OK):
Error Responses:
403 Forbidden: Job is in progress
404 Not Found: Job not found or already archived
Note: This performs a soft delete - the job is marked as archived but data is retained for audit purposes.
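A client may want to translate the documented archive errors into readable outcomes. The following Python sketch assumes nothing beyond the DELETE path and the 403 and 404 meanings listed above; `archive_job` and its `opener` parameter are hypothetical, the latter existing only so the function can be exercised without a network call.

```python
import urllib.error
import urllib.request

BASE_URL = "https://asr-api.stg.prosa.ai/v2/speech/stt"  # base URL from the Overview
# Meanings of the documented error responses for this endpoint.
ARCHIVE_ERRORS = {
    403: "job is in progress",
    404: "job not found or already archived",
}


def archive_job(api_key, job_id, opener=urllib.request.urlopen):
    """Call DELETE /stt/{job_id}; return (archived, reason)."""
    req = urllib.request.Request(
        f"{BASE_URL}/{job_id}",
        headers={"x-api-key": api_key},
        method="DELETE",
    )
    try:
        with opener(req):
            return True, ""
    except urllib.error.HTTPError as err:
        return False, ARCHIVE_ERRORS.get(err.code, f"unexpected HTTP {err.code}")
```

Since a 403 means the job is still running, one reasonable design is to retry the archive call once the job reaches a finished status.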
Job Status Values
| Status | Description |
| --- | --- |
| created | Job has been created |
| queued | Job is waiting to be processed |
| in_progress | Job is being processed |
| complete | Job completed successfully |
| failed | Job failed due to an error |
| cancelled | Job was cancelled |
Supported Audio Formats
| Type | Extensions |
| --- | --- |
| Audio | .wav, .mp3, .m4a, .ogg, .weba, .webm, .flac, .gsm, .wma |
| Video | .mp4, .webm, .mov, .avi, .wmv, .mpg |
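A quick client-side check against the extension list can catch unsupported files before upload. This Python sketch copies the extensions from the table above; matching on the file extension alone (case-insensitively) is an assumption, since this page does not say how the service detects formats, and `is_supported_file` is a hypothetical helper.

```python
from pathlib import PurePosixPath

AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".ogg", ".weba", ".webm", ".flac", ".gsm", ".wma"}
VIDEO_EXTENSIONS = {".mp4", ".webm", ".mov", ".avi", ".wmv", ".mpg"}


def is_supported_file(filename):
    """Return True if the file extension appears in the supported formats list."""
    suffix = PurePosixPath(filename).suffix.lower()
    return suffix in AUDIO_EXTENSIONS | VIDEO_EXTENSIONS
```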
Limits
| Limit | Value |
| --- | --- |
| Max audio duration (sync) | 60 seconds |
| Max audio duration (async) | 4 hours |
| Max request size | 10 MB |
| Max concurrent jobs | Contact support |
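These limits can be turned into a rough pre-flight check that decides between inline data and a URI. The Python sketch below rests on several assumptions: 10 MB is read as 10 × 1024 × 1024 bytes, the whole request is approximated by the base64 payload alone (which is about 4/3 the size of the raw file), and `plan_request` is a hypothetical helper.

```python
import math

MAX_SYNC_SECONDS = 60
MAX_ASYNC_SECONDS = 4 * 60 * 60
MAX_REQUEST_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" means 10 MiB


def base64_size(raw_bytes):
    """Size in bytes of the base64 encoding of raw_bytes bytes, with padding."""
    return 4 * math.ceil(raw_bytes / 3)


def plan_request(duration_seconds, file_bytes):
    """Return (input_kind, sync_allowed) for an audio file, or raise if too long."""
    if duration_seconds > MAX_ASYNC_SECONDS:
        raise ValueError("audio exceeds the 4 hour async limit")
    fits_inline = base64_size(file_bytes) < MAX_REQUEST_BYTES
    if duration_seconds <= MAX_SYNC_SECONDS and fits_inline:
        return "data", True   # short clip: inline base64, wait: true is possible
    return "uri", False       # otherwise: provide a URI, which requires wait: false
```

The check is conservative in spirit only; the JSON envelope adds a little overhead on top of the base64 payload, so files close to the limit are safer sent by URI.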
Webhooks
Instead of polling for job status, you can receive real-time notifications when jobs complete or fail.
See Webhooks for:
- Creating webhook endpoints
- Event types (stt.job.completed, stt.job.failed)
- Verifying webhook signatures
- Managing deliveries