# Models, Voice & Transcriber

> Configure LLM models, voice providers, and transcribers for your AI agents

## Overview

Callkaro AI provides flexibility in choosing the right combination of **LLM Model**, **Voice Provider**, and **Transcriber** for your AI agents.

<CardGroup cols={3}>
  <Card title="LLM Models" icon="brain">
    Core intelligence powering your agent's conversation and decision-making capabilities
  </Card>

  <Card title="Voice Providers" icon="microphone">
    Text-to-speech engines that give your agent its unique voice and personality
  </Card>

  <Card title="Transcribers" icon="closed-captioning">
    Speech-to-text engines that convert customer speech into text for processing
  </Card>
</CardGroup>

***

## LLM Models

The Large Language Model (LLM) is the brain of your AI agent. It processes the conversation, makes decisions, and generates responses based on your system prompt.

### Available Models

#### OpenAI Models

<AccordionGroup>
  <Accordion title="gpt-4o" icon="star">
    OpenAI's flagship model with superior reasoning capabilities, multimodal understanding, and excellent performance across diverse tasks.

    **Realtime Compatible:** Yes (use `gpt-4o-realtime-preview`)
  </Accordion>

  <Accordion title="gpt-4o-mini" icon="bolt">
    Compact version of GPT-4o offering excellent balance between cost and performance.

    **Realtime Compatible:** Yes (use `gpt-4o-mini-realtime-preview`)
  </Accordion>

  <Accordion title="gpt-4.1" icon="sparkles">
    **Recommended for most use cases.** Advanced GPT-4 variant with improved performance and reliability.
  </Accordion>

  <Accordion title="gpt-4.1-mini" icon="zap">
    **Recommended for most use cases.** Optimized mini variant offering strong performance.
  </Accordion>

  <Accordion title="gpt-4.1-nano" icon="feather">
    Ultra-lightweight model for simple, straightforward interactions.
  </Accordion>
</AccordionGroup>

#### Open Source Models

<AccordionGroup>
  <Accordion title="llama-3.3-70b-versatile" icon="chess-knight">
    Meta's 70-billion-parameter Llama 3.3 model, offering versatile performance across various tasks.
  </Accordion>

  <Accordion title="llama-4-maverick-17b-128e-instruct" icon="rocket">
    Efficiency-optimized Llama 4 variant with extended context window.
  </Accordion>

  <Accordion title="llama-4-scout-17b-16e-instruct" icon="shield">
    Fast and affordable Llama 4 model for standard conversational tasks.
  </Accordion>

  <Accordion title="llama-3.1-8b-instant" icon="flash">
    Lightweight, fast model for quick responses.
  </Accordion>

  <Accordion title="gemma2-9b-it" icon="gem">
    Google's Gemma model optimized for instruction-following and conversation.
  </Accordion>
</AccordionGroup>

#### Realtime Models

<Warning>
  **Realtime models require the OpenAI voice provider.** When you select a realtime model, your voice provider automatically switches to OpenAI.
</Warning>

<AccordionGroup>
  <Accordion title="gpt-4o-realtime-preview" icon="tower-broadcast">
    Premium realtime model with ultra-low latency and natural conversation flow.

    **Features:**

    * Sub-200ms response time
    * Natural interruptions and turn-taking
    * Real-time voice streaming
    * Advanced emotion detection
  </Accordion>

  <Accordion title="gpt-4o-mini-realtime-preview" icon="signal-stream">
    Cost-effective realtime model for most voice applications.

    **Features:**

    * Low latency responses
    * Real-time voice streaming
    * Natural conversation flow
  </Accordion>
</AccordionGroup>
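
The pairing rules on this page (realtime models always use the OpenAI voice provider, and OpenAI voices are only available with realtime models) can be sketched as a small check. This is a minimal illustration, not a Callkaro API: the function name `resolve_voice_provider` and the lowercase provider identifiers such as `"openai"` and `"cartesia"` are assumptions made for the example.

```python
# Model names taken from the "Realtime Models" list on this page.
REALTIME_MODELS = {"gpt-4o-realtime-preview", "gpt-4o-mini-realtime-preview"}


def resolve_voice_provider(model: str, voice_provider: str) -> str:
    """Return a voice provider that is compatible with the given model.

    Hypothetical helper that mirrors the documented pairing rules.
    """
    if model in REALTIME_MODELS:
        # Realtime models switch the voice provider to OpenAI.
        return "openai"
    if voice_provider == "openai":
        # OpenAI voices are only available with realtime models.
        raise ValueError(f"OpenAI voices require a realtime model, got {model!r}")
    return voice_provider
```

For example, `resolve_voice_provider("gpt-4o-realtime-preview", "cartesia")` returns `"openai"`, while pairing `gpt-4.1` with `"openai"` raises a `ValueError`.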

### Model Parameters

#### Temperature

**Range:** 0.0 to 1.0

Temperature controls the randomness and creativity of the model's responses.

| Value         | Behavior                            |
| ------------- | ----------------------------------- |
| **0.0 - 0.3** | Deterministic, focused, predictable |
| **0.4 - 0.7** | Balanced, natural, consistent       |
| **0.8 - 1.0** | Creative, varied, spontaneous       |
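
As a rough illustration of the table above, the sketch below validates a temperature against the documented 0.0 to 1.0 range and returns the matching behavior label. The function name `describe_temperature` is hypothetical. The table leaves small gaps between rows (for example 0.35), so assigning such values to the higher band is an assumption of this example.

```python
def describe_temperature(temperature: float) -> str:
    """Map a temperature to the behavior band from the table above."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if temperature <= 0.3:
        return "Deterministic, focused, predictable"
    # Assumption: values between two documented bands fall into the higher one.
    if temperature <= 0.7:
        return "Balanced, natural, consistent"
    return "Creative, varied, spontaneous"
```

A check like this can catch out-of-range values before an agent configuration is saved.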

***

## Voice Providers

Voice providers convert your agent's text responses into natural-sounding speech.

### Available Providers

<Tabs>
  <Tab title="Cartesia">
    Cartesia offers high-quality, low-latency voice synthesis with extensive multilingual support and emotion control.

    ### Features

    * Ultra-low latency (\< 300ms)
    * Extensive Indian language support
    * Emotion and style control
    * Multiple voice models (sonic, sonic-turbo, sonic-2, sonic-3)

    ### Available Voices

    #### English (Indian Accent)

    * **Janvi** - Slower, conversational female voice (customer support, hotel reception)
    * **Kiara** - Versatile, engaging female voice (commercials, narrations, promos)
    * **Aditi** - Slower female voice (commercials, narrations)
    * **Devansh** - Friendly, neutral male voice (call center support)
    * **Neil** - Clear and crisp male voice (customer support, sales, reception)
    * **Indian Lady** - Young, rich, curious voice (narrator, fictional character)
    * **Indian Man** - Smooth male voice (narrator)

    #### Hindi & Hinglish

    * **Apoorva** - Warm, friendly female (Hinglish sales, commercials)
    * **Ananya** - Warm, friendly female (Hinglish sales)
    * **Hinglish Speaking Woman** - Versatile bilingual voice
    * **Ishan** - Conversational male (Hinglish sales, support)
    * **Ayush** - Confident young male (Hindi demos, instructions)
    * **Rupali** - Firm young female (Hindi natural conversation)
    * **Aadhya** - Slower Hindi female conversational voice
    * **Amit** - Calm, clear Hindi male (narration, conversation)
    * **Indian Conversational Woman** - Warm feminine voice
    * **Parvati** - Young, friendly female (customer support)
    * **Mihir** - Deeper toned male (casual conversation, support)
    * **Hindi Reporter Man** - Clear, authoritative (news, documentaries)
    * **Hindi Narrator Man** - Warm, authoritative (audiobooks, documentaries)

    #### Regional Languages

    * **Prakash** (Kannada) - Instructor voice
    * **Divya** (Kannada) - Joyful narrator
    * **Suresh** (Marathi) - Instruction voice
    * **Anika** (Marathi) - Enthusiastic seller
    * **Vikram** (Telugu) - Folk narrator
    * **Sindhu** (Telugu) - Conversational partner
    * **Amit** (Gujarati) - Sports student
    * **Isha** (Gujarati) - Learner voice

    #### American Accent

    * **Brooke** - Friendly, natural female
    * **Wise Lady** - Authoritative narrator
    * **Corinne** - Smooth, conversational (phone calls, support)
    * **Cathy** - Enthusiastic coder
    * **Friendly Sidekick** - Supportive male (games, videos)

    ### Voice Models

    * **sonic-3** - Latest model, best quality
    * **sonic-turbo** - Optimized for speed
    * **sonic-2** - Stable, reliable
    * **sonic** - Original model

    ### Voice Parameters

    #### Speed (-1.0 to 1.0)

    Adjusts speaking rate relative to normal.

    ### Emotion Control

    Cartesia supports emotion control, which lets you adjust:

    * Curiosity
    * Positivity
    * Surprise
    * Anger
    * Sadness
  </Tab>

  <Tab title="Eleven Labs">
    Premium voice synthesis with industry-leading quality and extensive multilingual support.

    ### Features

    * Highest voice quality
    * Extensive language support (32+ languages)
    * Voice cloning capabilities
    * Advanced voice customization

    ### Voice Parameters

    #### Stability (0.0 - 1.0)

    Controls consistency and predictability of voice generation.

    #### Similarity Boost (0.0 - 1.0)

    Enhances similarity to the original voice sample.

    #### Style (0.0 - 1.0)

    Controls style exaggeration and expressiveness.

    #### Speed (0.5 - 2.0)

    Adjusts speaking rate.

    ### Voice Models

    * **eleven\_flash\_v2\_5** - Latest, fastest (32 languages)
    * **eleven\_turbo\_v2\_5** - High quality, low latency (32 languages)
    * **eleven\_turbo\_v2** - English-optimized, fast
    * **eleven\_flash\_v2** - English-optimized, ultra-fast

    ### Supported Languages

    * Hindi
    * English
    * Kannada
    * Marathi
    * Tamil
    * Telugu
    * Bengali
    * Gujarati
    * Malayalam

    <Note>
      **Need another language?** Contact support to request additional language support.
    </Note>
  </Tab>

  <Tab title="Sarvam">
    Indian-focused voice provider with excellent support for Indian languages.

    ### Features

    * Optimized for Indian accents
    * 11 Indian languages
    * Built for the Indian market

    ### Available Voices

    * **Anushka** - Clear and professional
    * **Manisha** - Warm and friendly
    * **Vidya** - Articulate and precise
    * **Arya** - Young and energetic
    * **Abhilash** - Deep and authoritative
    * **Karun** - Natural and conversational
    * **Hitesh** - Professional and engaging

    ### Voice Models

    * **bulbul:v2** - Latest model with best quality

    ### Voice Parameters

    #### Pitch (0.0 - 1.0)

    Adjusts voice pitch.

    #### Speed (0.5 - 2.0)

    Adjusts speaking rate.

    ### Supported Languages

    * Hindi (hi-IN)
    * English (en-IN)
    * Kannada (kn-IN)
    * Marathi (mr-IN)
    * Tamil (ta-IN)
    * Telugu (te-IN)
    * Bengali (bn-IN)
    * Gujarati (gu-IN)
    * Malayalam (ml-IN)

    <Note>
      **Need another language?** Contact support to request additional language support.
    </Note>
  </Tab>

  <Tab title="Azure">
    Microsoft Azure's reliable Text-to-Speech service with extensive global language support.

    ### Features

    * Wide language coverage
    * Voice style support
    * Enterprise-grade reliability

    ### Voice Parameters

    #### Speed (-1.0 to 1.0)

    Adjusts speaking rate relative to normal.

    #### Pitch (-1.0 to 1.0)

    Adjusts voice pitch.

    #### Volume (0 - 100)

    Controls voice volume level.

    #### Voice Style

    Available styles vary by voice. Common styles include:

    * default
    * cheerful
    * calm
    * empathetic
    * newscast
    * customerservice

    #### Style Degree (0.01 - 2.0)

    Controls intensity of the voice style.
  </Tab>

  <Tab title="OpenAI">
    OpenAI's built-in voices, available with realtime models only.

    ### Features

    * Ultra-low latency
    * Natural conversation flow
    * Automatic emotion detection

    ### Available Voices

    * **sage** - Balanced, professional
    * **alloy** - Neutral, versatile
    * **ash** - Clear, articulate
    * **ballad** - Smooth, melodic
    * **coral** - Warm, friendly
    * **echo** - Deep, resonant
    * **shimmer** - Light, energetic
    * **verse** - Expressive, dynamic

    <Warning>
      **OpenAI voices are only available with realtime models** (`gpt-4o-realtime-preview` or `gpt-4o-mini-realtime-preview`).
    </Warning>
  </Tab>
</Tabs>
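
Each provider accepts a different set of numeric voice parameters, so it can help to check values against the documented ranges before saving a configuration. The sketch below collects the ranges from the tabs above into one table. The provider keys, the parameter key names (such as `similarity_boost` and `style_degree`), and the function `validate_voice_params` are assumptions for illustration and may not match the names Callkaro uses.

```python
# Ranges copied from the provider tabs above; key names are hypothetical.
VOICE_PARAM_RANGES = {
    "cartesia": {"speed": (-1.0, 1.0)},
    "elevenlabs": {
        "stability": (0.0, 1.0),
        "similarity_boost": (0.0, 1.0),
        "style": (0.0, 1.0),
        "speed": (0.5, 2.0),
    },
    "sarvam": {"pitch": (0.0, 1.0), "speed": (0.5, 2.0)},
    "azure": {
        "speed": (-1.0, 1.0),
        "pitch": (-1.0, 1.0),
        "volume": (0, 100),
        "style_degree": (0.01, 2.0),
    },
}


def validate_voice_params(provider: str, params: dict[str, float]) -> list[str]:
    """Return a list of problems; an empty list means every value is in range."""
    ranges = VOICE_PARAM_RANGES.get(provider)
    if ranges is None:
        raise ValueError(f"no documented numeric parameters for {provider!r}")
    problems = []
    for name, value in params.items():
        if name not in ranges:
            problems.append(f"{name}: not a documented numeric parameter for {provider}")
            continue
        low, high = ranges[name]
        if not low <= value <= high:
            problems.append(f"{name}: {value} is outside {low} to {high}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every out-of-range value at once.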

***

## Transcribers

Transcribers convert customer speech into text that the LLM can process.

### Available Providers

<Tabs>
  <Tab title="Deepgram">
    Industry-leading speech recognition with high accuracy and extensive model options.

    ### Features

    * Highest accuracy for English
    * Domain-specific models
    * Multilingual support
    * Keyword boosting

    ### Models

    #### Nova Series

    * **nova-3-general** - Latest general-purpose model, best overall accuracy
    * **nova-3** - Advanced multi-language support
    * **nova-3-medical** - Optimized for medical terminology
    * **nova-2-general** - Stable, reliable general-purpose
    * **nova-2-phonecall** - Optimized for phone call quality
    * **nova-2-meeting** - Best for meetings and conference calls
    * **nova-2-conversationalai** - Optimized for AI conversations
    * **nova-2-medical** - Medical terminology support
    * **nova-2-finance** - Financial terminology support
    * **nova-2-voicemail** - Voicemail optimization
    * **nova-2-drivethru** - Drive-through environments
    * **nova-2-automotive** - In-vehicle environments

    #### Other Models

    * **flux-general** - Fast, lightweight option
    * **voicemail** - Voicemail-specific

    ### Supported Languages

    * Hindi
    * English
    * Kannada
    * Marathi
    * Tamil
    * Telugu
    * Bengali
    * Gujarati
    * Malayalam

    <Note>
      **Need another language?** Contact support to request additional language support.
    </Note>

    ### Keyword Boosting

    Deepgram supports **keyword boosting** to improve recognition of specific terms like company names, product names, and industry-specific jargon.
  </Tab>

  <Tab title="Groq">
    Ultra-fast transcription powered by Whisper models on specialized hardware.

    ### Features

    * Extremely fast inference
    * Whisper model quality
    * Good multilingual support

    ### Models

    * **whisper-large-v3-turbo** - Fastest, recommended for most use cases
    * **whisper-large-v3** - Highest quality
    * **distil-whisper-large-v3-en** - Optimized for English

    ### Supported Languages

    * Hindi
    * English
    * Kannada
    * Marathi
    * Tamil
    * Telugu
    * Bengali
    * Gujarati
    * Malayalam

    <Note>
      **Need another language?** Contact support to request additional language support.
    </Note>
  </Tab>

  <Tab title="Sarvam">
    Indian-language-focused transcription with excellent Hindi and regional language support.

    ### Features

    * Best-in-class Hindi recognition
    * 12 Indian languages
    * Code-mixing support (Hinglish)
    * Unknown language detection

    ### Models

    * **saarika:v2.5** - Latest, highest accuracy (Recommended)
    * **saaras:v2.5** - Alternative model variant
    * **saarika:v2.0** - Stable version
    * **saarika:v2** - Standard version
    * **saarika:v1** - Legacy version
    * **saarika:flash** - Fast inference

    ### Supported Languages

    * Hindi (hi-IN)
    * English (en-IN)
    * Kannada (kn-IN)
    * Marathi (mr-IN)
    * Tamil (ta-IN)
    * Telugu (te-IN)
    * Bengali (bn-IN)
    * Gujarati (gu-IN)
    * Malayalam (ml-IN)

    <Note>
      **Need another language?** Contact support to request additional language support.
    </Note>
  </Tab>

  <Tab title="Azure">
    Microsoft Azure Speech-to-Text with strong Indian language support.

    ### Features

    * Multi-language detection
    * Reliable accuracy
    * Extensive Indian language support
    * Enterprise-grade service

    ### Multi-Language Support

    Azure transcriber supports **simultaneous multi-language detection**, allowing your agent to automatically detect and transcribe multiple languages within the same conversation.

    ### Supported Languages

    * Hindi (hi-IN)
    * English (en-IN)
    * Kannada (kn-IN)
    * Marathi (mr-IN)
    * Tamil (ta-IN)
    * Telugu (te-IN)
    * Bengali (bn-IN)
    * Gujarati (gu-IN)
    * Malayalam (ml-IN)

    <Note>
      **Need another language?** Contact support to request additional language support.
    </Note>
  </Tab>

  <Tab title="Eleven Labs">
    Eleven Labs' transcription service with multilingual support.

    ### Features

    * Integrated with Eleven Labs voice
    * Good multilingual support
    * Decent Indian language coverage

    ### Models

    * **scribe\_v2** - Latest transcription model

    ### Supported Languages

    * Hindi
    * English
    * Kannada
    * Marathi
    * Tamil
    * Telugu
    * Bengali
    * Gujarati
    * Malayalam

    <Note>
      **Need another language?** Contact support to request additional language support.
    </Note>
  </Tab>
</Tabs>
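
The Sarvam and Azure tabs list each language with a locale code, while the other providers list language names only. A minimal lookup sketch for moving between the two forms is shown below. The helper name `locale_code` is hypothetical, and whether Deepgram, Groq, or Eleven Labs accept these same codes is not stated on this page.

```python
# Language names and locale codes as listed in the Sarvam and Azure tabs.
LOCALE_CODES = {
    "Hindi": "hi-IN",
    "English": "en-IN",
    "Kannada": "kn-IN",
    "Marathi": "mr-IN",
    "Tamil": "ta-IN",
    "Telugu": "te-IN",
    "Bengali": "bn-IN",
    "Gujarati": "gu-IN",
    "Malayalam": "ml-IN",
}


def locale_code(language: str) -> str:
    """Look up the locale code for a language name, ignoring case and padding."""
    try:
        return LOCALE_CODES[language.strip().title()]
    except KeyError:
        raise ValueError(f"{language!r} is not in the documented language list") from None
```

For example, `locale_code("hindi")` returns `"hi-IN"`, and a language outside the list raises a `ValueError`.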
