Voice

Voice Models on the Same Hicap Workflow

Run ElevenLabs text-to-speech and speech-to-text through Hicap without adding a separate auth flow, endpoint surface, or billing path. Keep the same Hicap base URL and ship voice next to the rest of your AI stack.

Voice quickstart

Same base URL. Same auth header. Voice added cleanly.

The integration model stays simple: route requests through Hicap, keep using your Hicap key, and target the ElevenLabs-compatible voice endpoints you need.

1. Point requests at https://api.hicap.ai/v1.

2. Send your Hicap key in the api-key header.

3. Use ElevenLabs model IDs for TTS and STT requests.
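Put together, those three steps describe one request shape. A minimal Python sketch of assembling it (the helper name is illustrative, not an official SDK; sending is a plain HTTP POST):

```python
# Sketch: assemble the pieces of an ElevenLabs-style TTS request routed
# through Hicap. The helper is illustrative, not an official client.
HICAP_BASE_URL = "https://api.hicap.ai/v1"

def tts_request(voice_id: str, text: str, api_key: str,
                model_id: str = "eleven_v3"):
    """Return (url, headers, payload) for a text-to-speech call."""
    url = f"{HICAP_BASE_URL}/text-to-speech/{voice_id}"
    headers = {"Content-Type": "application/json", "api-key": api_key}
    payload = {"text": text, "model_id": model_id}
    return url, headers, payload

# POSTing `payload` as JSON to `url` with `headers` returns the audio bytes,
# e.g. with requests: requests.post(url, headers=headers, json=payload).content
```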

Text to speech: Eleven v3, expressive generation across 70+ languages
Long-form voice: Multilingual v2, stable multilingual output across 29 languages
Speech to text: Scribe v2, high-accuracy transcription with diarization support
Platform

What Stays the Same

The point of the voice route is consolidation, not a separate setup track. Teams already using Hicap should not have to think about voice as a second platform.

Same Hicap base URL

Keep requests on https://api.hicap.ai/v1 and authenticate with the same api-key header you already use for chat and other model traffic.

ElevenLabs request shape

Use the ElevenLabs-style voice paths and model IDs while routing traffic through Hicap instead of wiring up a separate voice integration.

One platform for AI + voice

Keep billing, access, and operational routing in one place whether your app is generating text, audio, or transcripts.

Text to Speech

Choose the Right Voice Model

Hicap currently exposes ElevenLabs' main speech generation models so teams can cover both expressive voice work and steadier long-form narration from one route.

Eleven v3

Expressive speech synthesis

Best fit when voice tone, character, and performance matter. ElevenLabs positions Eleven v3 as its most emotionally rich text-to-speech model.

70+ supported languages
Built for dynamic, expressive delivery
Multi-speaker dialogue support
Up to 5,000 characters per request
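The 5,000-character cap means long scripts need to be split before generation. A sketch of one way to chunk text at sentence boundaries so each request stays under the limit (the helper name and splitting heuristic are illustrative):

```python
# Sketch: split long text into chunks under Eleven v3's 5,000-character
# per-request limit, preferring to cut at sentence boundaries.
def chunk_text(text: str, limit: int = 5000) -> list[str]:
    chunks = []
    while len(text) > limit:
        # Find the last sentence-ending punctuation inside the window.
        cut = max(text.rfind(end, 0, limit) for end in (". ", "! ", "? "))
        if cut == -1:
            cut = limit      # no boundary found: hard cut at the limit
        else:
            cut += 1         # keep the punctuation with its chunk
        chunks.append(text[:cut].strip())
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```

Each chunk can then be sent as its own text-to-speech request and the resulting audio segments concatenated.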

Eleven Multilingual v2

Stable long-form generation

A steadier option for narration, explainers, and multilingual production where consistency over longer passages matters more than theatrical range.

29 supported languages
Natural long-form generation
Consistent multilingual delivery
Up to 10,000 characters per request
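The two generation models suggest a simple selection rule based on the limits and strengths described above. A sketch (the helper name is illustrative; the model IDs are the ElevenLabs identifiers used elsewhere on this page):

```python
# Sketch: pick a TTS model_id from the constraints described above.
# Eleven v3 caps requests at 5,000 characters and favors expressive delivery;
# Multilingual v2 allows 10,000 and favors steadier long-form output.
def pick_tts_model(text: str, expressive: bool = False) -> str:
    if expressive and len(text) <= 5000:
        return "eleven_v3"
    if len(text) <= 10000:
        return "eleven_multilingual_v2"
    raise ValueError("Text exceeds the per-request limit; chunk it first.")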
Speech to Text

Transcription Models for Production Audio

Both Scribe models are available through Hicap, covering everything from broad language support to newer transcription features like speaker diarization and transcript cleanup.

Scribe v1

Broad language coverage

A straightforward speech-to-text option for turning recorded audio into searchable text across a wide language set.

90+ supported languages
Word-level timestamps
Audio and video transcription
Available through the same Hicap gateway

Scribe v2

Higher-accuracy transcription

The more capable transcription option for production workflows that need better recognition, speaker separation, and cleaner transcripts.

Keyterm prompting up to 1000 terms
Speaker diarization up to 32 speakers
Dynamic audio tagging
Optional transcript cleanup
Quickstart

Same Hicap URL. Same api-key header.

These examples keep the ElevenLabs endpoint shapes and model IDs while moving authentication and routing onto Hicap.

Text to Speech

Generate audio through Hicap

bash
curl --request POST \
  --url "https://api.hicap.ai/v1/text-to-speech/JBFqnCBsd6RMkjVDRZzb" \
  --header "Content-Type: application/json" \
  --header "api-key: $HICAP_API_KEY" \
  --data '{
    "text": "The first move is what sets everything in motion.",
    "model_id": "eleven_v3"
  }' \
  --output speech.mp3

Replace JBFqnCBsd6RMkjVDRZzb with the ElevenLabs voice ID you want to use. If you need a different response format, follow the ElevenLabs-compatible request options while keeping the Hicap base URL and auth header.

Speech to Text

Transcribe files through Hicap

bash
curl --request POST \
  --url "https://api.hicap.ai/v1/speech-to-text" \
  --header "api-key: $HICAP_API_KEY" \
  --form "file=@./meeting.mp3" \
  --form "model_id=scribe_v2"

Send audio or video as multipart form data and switch the model_id between scribe_v1 and scribe_v2 based on the transcription quality and feature set you need.
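In code, that switch can be a single flag. A sketch (the helper name is illustrative; the model IDs are the ones used in the quickstart):

```python
# Sketch: choose a Scribe model_id from the feature needs described above.
# Scribe v2 adds diarization, keyterm prompting, and transcript cleanup;
# Scribe v1 covers broad-language transcription.
def pick_stt_model(need_diarization: bool = False,
                   need_cleanup: bool = False) -> str:
    return "scribe_v2" if (need_diarization or need_cleanup) else "scribe_v1"
```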

Current Hicap voice coverage includes Eleven v3 and Eleven Multilingual v2 for generation, plus Scribe v1 and Scribe v2 for transcription. That keeps the voice surface focused and predictable while the rest of the Hicap model catalog remains available through the same account.

Ready to add voice to your stack?

Bring speech generation and transcription into the same Hicap workflow your team already understands.