
Overview

Callkaro AI provides flexibility in choosing the right combination of LLM Model, Voice Provider, and Transcriber for your AI agents.

LLM Models

Core intelligence powering your agent’s conversation and decision-making capabilities

Voice Providers

Text-to-speech engines that give your agent its unique voice and personality

Transcribers

Speech-to-text engines that convert customer speech into text for processing

LLM Models

The Large Language Model (LLM) is the brain of your AI agent. It processes the conversation, makes decisions, and generates responses based on your system prompt.

Available Models

OpenAI Models

OpenAI’s flagship model with superior reasoning capabilities, multimodal understanding, and excellent performance across diverse tasks. Realtime Compatible: Yes (use gpt-4o-realtime-preview)
Compact version of GPT-4o offering an excellent balance between cost and performance. Realtime Compatible: Yes (use gpt-4o-mini-realtime-preview)
Recommended for most use cases. Advanced GPT-4 variant with improved performance and reliability.
Recommended for most use cases. Optimized mini variant offering strong performance.
Ultra-lightweight model for simple, straightforward interactions.

Open Source Models

Large parameter Meta model offering versatile performance across various tasks.
Efficiency-optimized Llama 4 variant with extended context window.
Fast and affordable Llama 4 model for standard conversational tasks.
Lightweight, fast model for quick responses.
Google’s Gemma model optimized for instruction-following and conversation.

Realtime Models

Realtime models require the OpenAI voice provider. When you select a realtime model, your voice provider automatically switches to OpenAI.
Premium realtime model with ultra-low latency and natural conversation flow. Features:
  • Sub-200ms response time
  • Natural interruptions and turn-taking
  • Real-time voice streaming
  • Advanced emotion detection
Cost-effective realtime model for most voice applications. Features:
  • Low latency responses
  • Real-time voice streaming
  • Natural conversation flow
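
The provider-switching rule for realtime models can be sketched as follows. This is only an illustration: the `AgentConfig` class, its field names, and the provider identifiers `"openai"` and `"cartesia"` are assumptions, not part of the Callkaro AI interface. Only the two realtime model names come from the "Realtime Compatible" notes above.

```python
from dataclasses import dataclass, replace

# Realtime model names as given in the "Realtime Compatible" notes.
REALTIME_MODELS = {"gpt-4o-realtime-preview", "gpt-4o-mini-realtime-preview"}


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical agent settings; field names are illustrative only."""
    llm_model: str
    voice_provider: str


def apply_realtime_rule(config: AgentConfig) -> AgentConfig:
    """Return a copy whose voice provider is OpenAI when the model is realtime."""
    if config.llm_model in REALTIME_MODELS and config.voice_provider != "openai":
        return replace(config, voice_provider="openai")
    return config


if __name__ == "__main__":
    before = AgentConfig(llm_model="gpt-4o-realtime-preview", voice_provider="cartesia")
    print(apply_realtime_rule(before))
```

Non-realtime models pass through unchanged, so the same check can run on every configuration update.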

Model Parameters

Temperature

Range: 0.0 to 1.0 (Default: 0.8)

Temperature controls the randomness and creativity of the model’s responses.

Value        Behavior
0.0 - 0.3    Deterministic, focused, predictable
0.4 - 0.7    Balanced, natural, consistent
0.8 - 1.0    Creative, varied, spontaneous
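
As a rough aid for choosing a value, the bands above can be expressed as a small lookup. The function name is hypothetical, and one assumption is made beyond the source: values that fall between the listed bands (for example 0.35 or 0.75) are assigned to the next band up.

```python
DEFAULT_TEMPERATURE = 0.8  # default stated in this section


def describe_temperature(value: float = DEFAULT_TEMPERATURE) -> str:
    """Map a temperature to the behavior band listed in this section."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"temperature must be between 0.0 and 1.0, got {value}")
    if value <= 0.3:
        return "deterministic, focused, predictable"
    if value <= 0.7:
        # Assumption: values between 0.3 and 0.4 are treated as balanced.
        return "balanced, natural, consistent"
    # Assumption: values between 0.7 and 0.8 are treated as creative.
    return "creative, varied, spontaneous"


if __name__ == "__main__":
    for t in (0.2, 0.5, DEFAULT_TEMPERATURE):
        print(t, "->", describe_temperature(t))
```

Rejecting out-of-range values early keeps a typo such as 8 instead of 0.8 from reaching the model.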

Voice Providers

Voice providers convert your agent’s text responses into natural-sounding speech.

Available Providers

Cartesia offers high-quality, low-latency voice synthesis with extensive multilingual support and emotion control.

Features

  • Ultra-low latency (< 300ms)
  • Extensive Indian language support
  • Emotion and style control
  • Multiple voice models (sonic, sonic-turbo, sonic-2, sonic-3)

Available Voices

English (Indian Accent)

  • Janvi - Slower, conversational female voice (customer support, hotel reception)
  • Kiara - Versatile, engaging female voice (commercials, narrations, promos)
  • Aditi - Slower female voice (commercials, narrations)
  • Devansh - Friendly, neutral male voice (call center support)
  • Neil - Clear and crisp male voice (customer support, sales, reception)
  • Indian Lady - Young, rich, curious voice (narrator, fictional character)
  • Indian Man - Smooth male voice (narrator)

Hindi & Hinglish

  • Apoorva - Warm, friendly female (Hinglish sales, commercials)
  • Ananya - Warm, friendly female (Hinglish sales)
  • Hinglish Speaking Woman - Versatile bilingual voice
  • Ishan - Conversational male (Hinglish sales, support)
  • Ayush - Confident young male (Hindi demos, instructions)
  • Rupali - Firm young female (Hindi natural conversation)
  • Aadhya - Slower Hindi female conversational voice
  • Amit - Calm, clear Hindi male (narration, conversation)
  • Indian Conversational Woman - Warm feminine voice
  • Parvati - Young, friendly female (customer support)
  • Mihir - Deeper toned male (casual conversation, support)
  • Hindi Reporter Man - Clear, authoritative (news, documentaries)
  • Hindi Narrator Man - Warm, authoritative (audiobooks, documentaries)

Regional Languages

  • Prakash (Kannada) - Instructor voice
  • Divya (Kannada) - Joyful narrator
  • Suresh (Marathi) - Instruction voice
  • Anika (Marathi) - Enthusiastic seller
  • Vikram (Telugu) - Folk narrator
  • Sindhu (Telugu) - Conversational partner
  • Amit (Gujarati) - Sports student
  • Isha (Gujarati) - Learner voice

American Accent

  • Brooke - Friendly, natural female
  • Wise Lady - Authoritative narrator
  • Corinne - Smooth, conversational (phone calls, support)
  • Cathy - Enthusiastic coder
  • Friendly Sidekick - Supportive male (games, videos)

Voice Models

  • sonic-3 - Latest model, best quality
  • sonic-turbo - Optimized for speed
  • sonic-2 - Stable, reliable
  • sonic - Original model

Voice Parameters

Speed (-1.0 to 1.0)

Adjusts speaking rate relative to normal.

Emotion Control

Cartesia supports emotion control, which lets you adjust:
  • Curiosity
  • Positivity
  • Surprise
  • Anger
  • Sadness
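
The speed range and emotion list above can be checked before a voice configuration is saved. The sketch below is an assumption about how such a check might look: the function name and the returned dictionary shape are hypothetical, and the source does not say how emotion intensity is expressed, so only the emotion names are validated.

```python
from typing import Dict, List, Union

# Emotion names listed in this section, lowercased for comparison.
SUPPORTED_EMOTIONS = {"curiosity", "positivity", "surprise", "anger", "sadness"}


def validate_voice_params(speed: float, emotions: List[str]) -> Dict[str, Union[float, List[str]]]:
    """Check speed against -1.0..1.0 and emotion names against the supported set."""
    if not -1.0 <= speed <= 1.0:
        raise ValueError(f"speed must be between -1.0 and 1.0, got {speed}")
    normalized = [name.strip().lower() for name in emotions]
    unknown = sorted(set(normalized) - SUPPORTED_EMOTIONS)
    if unknown:
        raise ValueError(f"unsupported emotions: {', '.join(unknown)}")
    return {"speed": speed, "emotions": normalized}


if __name__ == "__main__":
    print(validate_voice_params(0.2, ["Positivity", "Curiosity"]))
```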

Transcribers

Transcribers convert customer speech into text that the LLM can process.

Available Providers

Deepgram offers industry-leading speech recognition with high accuracy and extensive model options.

Features

  • Highest accuracy for English
  • Domain-specific models
  • Multilingual support
  • Keyword boosting

Models

Nova Series

  • nova-3-general - Latest general-purpose model, best overall accuracy
  • nova-3 - Advanced multi-language support
  • nova-3-medical - Optimized for medical terminology
  • nova-2-general - Stable, reliable general-purpose
  • nova-2-phonecall - Optimized for phone call quality
  • nova-2-meeting - Best for meetings and conference calls
  • nova-2-conversationalai - Optimized for AI conversations
  • nova-2-medical - Medical terminology support
  • nova-2-finance - Financial terminology support
  • nova-2-voicemail - Voicemail optimization
  • nova-2-drivethru - Drive-through environments
  • nova-2-automotive - In-vehicle environments

Other Models

  • flux-general - Fast, lightweight option
  • voicemail - Voicemail-specific
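
One way to pick among these models is a lookup from use case to model name. In the sketch below the model names come from the list above, while the use-case keys, the function name, and the choice of `nova-3-general` as the fallback are assumptions made for illustration.

```python
DEFAULT_MODEL = "nova-3-general"  # described above as best overall accuracy

# Hypothetical use-case keys mapped to model names from the list above.
MODEL_BY_USE_CASE = {
    "phone call": "nova-2-phonecall",
    "meeting": "nova-2-meeting",
    "conversational ai": "nova-2-conversationalai",
    "medical": "nova-3-medical",
    "finance": "nova-2-finance",
    "voicemail": "nova-2-voicemail",
    "drive-thru": "nova-2-drivethru",
    "automotive": "nova-2-automotive",
}


def pick_transcriber_model(use_case: str) -> str:
    """Return the model for a use case, falling back to the general model."""
    return MODEL_BY_USE_CASE.get(use_case.strip().lower(), DEFAULT_MODEL)


if __name__ == "__main__":
    print(pick_transcriber_model("Phone Call"))
    print(pick_transcriber_model("retail kiosk"))
```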

Supported Languages

  • Hindi
  • English
  • Kannada
  • Marathi
  • Tamil
  • Telugu
  • Bengali
  • Gujarati
  • Malayalam
Need another language? Contact support to request additional language support.

Keyword Boosting

Deepgram supports keyword boosting to improve recognition of specific terms like company names, product names, and industry-specific jargon.
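
For orientation, here is a sketch of how boosted keywords might be encoded. Deepgram's public API documents a `keywords` query parameter in `term:intensifier` form for Nova-2 and earlier models. Whether Callkaro AI forwards keywords in this form is an assumption, and the function name and example terms are hypothetical.

```python
from typing import Dict
from urllib.parse import urlencode


def build_keyword_query(model: str, boosts: Dict[str, int]) -> str:
    """Encode a model name and boosted terms as URL query parameters.

    Each term becomes a repeated `keywords` parameter in `term:intensifier`
    form. This mirrors Deepgram's documented format for Nova-2 models; it is
    not a description of Callkaro's own interface.
    """
    params = [("model", model)]
    for term, intensifier in boosts.items():
        params.append(("keywords", f"{term}:{intensifier}"))
    return urlencode(params)


if __name__ == "__main__":
    # Example terms are placeholders for company or product names.
    print(build_keyword_query("nova-2-phonecall", {"Callkaro": 2, "Sonic": 1}))
```

Keeping the boosted terms in one dictionary makes it easy to review which company and product names receive extra weight.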