Overview
Callkaro AI provides flexibility in choosing the right combination of LLM Model, Voice Provider, and Transcriber for your AI agents.
LLM Models
Core intelligence powering your agent’s conversation and decision-making capabilities
Voice Providers
Text-to-speech engines that give your agent its unique voice and personality
Transcribers
Speech-to-text engines that convert customer speech into text for processing
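As a rough illustration of how the three choices fit together, the sketch below models an agent setup as a plain Python dataclass. The `AgentConfig` class and its field names are hypothetical and are not part of any Callkaro API; only the model, provider, and voice names are taken from this page, and pairing the `nova-3-general` model with the Deepgram transcriber is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical container for the three component choices of an agent."""

    llm_model: str          # e.g. "gpt-4.1-mini"
    voice_provider: str     # e.g. "Cartesia"
    voice_name: str         # e.g. "Janvi"
    transcriber: str        # e.g. "Deepgram"
    transcriber_model: str  # e.g. "nova-3-general" (assumed to belong to Deepgram)


# One possible combination for an English customer-support agent.
support_agent = AgentConfig(
    llm_model="gpt-4.1-mini",
    voice_provider="Cartesia",
    voice_name="Janvi",
    transcriber="Deepgram",
    transcriber_model="nova-3-general",
)
```

Keeping the three choices in one immutable object makes it easy to swap a single component, for example the voice, without touching the others.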
LLM Models
The Large Language Model (LLM) is the brain of your AI agent. It processes the conversation, makes decisions, and generates appropriate responses based on your system prompt.
Available Models
OpenAI Models
gpt-4o
OpenAI’s flagship model with superior reasoning capabilities, multimodal understanding, and excellent performance across diverse tasks.
Realtime Compatible: Yes (use gpt-4o-realtime-preview)
gpt-4o-mini
Compact version of GPT-4o offering excellent balance between cost and performance.
Realtime Compatible: Yes (use gpt-4o-mini-realtime-preview)
gpt-4.1
Recommended for most use cases. Advanced GPT-4 variant with improved performance and reliability.
gpt-4.1-mini
Recommended for most use cases. Optimized mini variant offering strong performance.
gpt-4.1-nano
Ultra-lightweight model for simple, straightforward interactions.
Open Source Models
llama-3.3-70b-versatile
Large parameter Meta model offering versatile performance across various tasks.
llama-4-maverick-17b-128e-instruct
Efficiency-optimized Llama 4 variant with extended context window.
llama-4-scout-17b-16e-instruct
Fast and affordable Llama 4 model for standard conversational tasks.
llama-3.1-8b-instant
Lightweight, fast model for quick responses.
gemma2-9b-it
Google’s Gemma model optimized for instruction-following and conversation.
Realtime Models
gpt-4o-realtime-preview
Premium realtime model with ultra-low latency and natural conversation flow.
Features:
- Sub-200ms response time
- Natural interruptions and turn-taking
- Real-time voice streaming
- Advanced emotion detection
gpt-4o-mini-realtime-preview
Cost-effective realtime model for most voice applications.
Features:
- Low latency responses
- Real-time voice streaming
- Natural conversation flow
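The pairing between base OpenAI models and their realtime counterparts, as noted in the model entries above, can be captured in a small lookup. This is a minimal sketch: the `realtime_variant` helper is hypothetical and not a Callkaro function, and it only covers the two pairings this page lists.

```python
from typing import Optional

# Base model -> realtime variant, as listed on this page.
REALTIME_VARIANTS = {
    "gpt-4o": "gpt-4o-realtime-preview",
    "gpt-4o-mini": "gpt-4o-mini-realtime-preview",
}


def realtime_variant(model: str) -> Optional[str]:
    """Return the realtime model name for a base model.

    Returns None when this page lists no realtime variant for the model.
    """
    return REALTIME_VARIANTS.get(model)
```

For example, `realtime_variant("gpt-4o-mini")` yields `gpt-4o-mini-realtime-preview`, while a model such as `gpt-4.1` returns `None` because no realtime variant is documented for it here.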
Model Parameters
Temperature
Range: 0.0 to 1.0 (Default: 0.8)
Temperature controls the randomness and creativity of the model’s responses.

| Value | Behavior |
|---|---|
| 0.0 - 0.3 | Deterministic, focused, predictable |
| 0.4 - 0.7 | Balanced, natural, consistent |
| 0.8 - 1.0 | Creative, varied, spontaneous |
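To make the bands in the table concrete, the following sketch validates a temperature value and reports which behavior band it falls into. The `describe_temperature` function is hypothetical. The table leaves small gaps between bands (for example 0.35), so this sketch assumes such values belong to the next band up.

```python
DEFAULT_TEMPERATURE = 0.8  # default stated on this page


def describe_temperature(value: float = DEFAULT_TEMPERATURE) -> str:
    """Map a temperature in [0.0, 1.0] to the behavior band from the table."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"temperature must be between 0.0 and 1.0, got {value}")
    if value <= 0.3:
        return "Deterministic, focused, predictable"
    if value <= 0.7:
        # Assumption: values in the 0.3-0.4 gap are treated as "balanced".
        return "Balanced, natural, consistent"
    # Assumption: values in the 0.7-0.8 gap are treated as "creative".
    return "Creative, varied, spontaneous"
```

Note that the default of 0.8 sits at the bottom of the most creative band, so agents that must follow a strict script may benefit from a lower setting.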
Voice Providers
Voice providers convert your agent’s text responses into natural-sounding speech.
Available Providers
- Cartesia
- Eleven Labs
- Sarvam
- Azure
- OpenAI
Cartesia offers high-quality, low-latency voice synthesis with extensive multilingual support and emotion control.
Features
- Ultra-low latency (< 300ms)
- Extensive Indian language support
- Emotion and style control
- Multiple voice models (sonic, sonic-turbo, sonic-2, sonic-3)
Available Voices
English (Indian Accent)
- Janvi - Slower, conversational female voice (customer support, hotel reception)
- Kiara - Versatile, engaging female voice (commercials, narrations, promos)
- Aditi - Slower female voice (commercials, narrations)
- Devansh - Friendly, neutral male voice (call center support)
- Neil - Clear and crisp male voice (customer support, sales, reception)
- Indian Lady - Young, rich, curious voice (narrator, fictional character)
- Indian Man - Smooth male voice (narrator)
Hindi & Hinglish
- Apoorva - Warm, friendly female (Hinglish sales, commercials)
- Ananya - Warm, friendly female (Hinglish sales)
- Hinglish Speaking Woman - Versatile bilingual voice
- Ishan - Conversational male (Hinglish sales, support)
- Ayush - Confident young male (Hindi demos, instructions)
- Rupali - Firm young female (Hindi natural conversation)
- Aadhya - Slower Hindi female conversational voice
- Amit - Calm, clear Hindi male (narration, conversation)
- Indian Conversational Woman - Warm feminine voice
- Parvati - Young, friendly female (customer support)
- Mihir - Deeper toned male (casual conversation, support)
- Hindi Reporter Man - Clear, authoritative (news, documentaries)
- Hindi Narrator Man - Warm, authoritative (audiobooks, documentaries)
Regional Languages
- Prakash (Kannada) - Instructor voice
- Divya (Kannada) - Joyful narrator
- Suresh (Marathi) - Instruction voice
- Anika (Marathi) - Enthusiastic seller
- Vikram (Telugu) - Folk narrator
- Sindhu (Telugu) - Conversational partner
- Amit (Gujarati) - Sports student
- Isha (Gujarati) - Learner voice
American Accent
- Brooke - Friendly, natural female
- Wise Lady - Authoritative narrator
- Corinne - Smooth, conversational (phone calls, support)
- Cathy - Enthusiastic coder
- Friendly Sidekick - Supportive male (games, videos)
Voice Models
- sonic-3 - Latest model, best quality
- sonic-turbo - Optimized for speed
- sonic-2 - Stable, reliable
- sonic - Original model
Voice Parameters
Speed (-1.0 to 1.0)
Adjusts speaking rate relative to normal.
Emotion Control
Cartesia supports emotion control allowing you to adjust:
- Curiosity
- Positivity
- Surprise
- Anger
- Sadness
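The voice model, speed range, and emotion names above can be checked before a configuration is saved. The sketch below is one way to do that. The `CartesiaVoiceSettings` class and its fields are hypothetical and do not reflect the Callkaro or Cartesia API. It also assumes that a speed of 0.0 means the normal speaking rate.

```python
from dataclasses import dataclass, field
from typing import List

# Names taken from this page.
VOICE_MODELS = {"sonic", "sonic-turbo", "sonic-2", "sonic-3"}
EMOTIONS = {"curiosity", "positivity", "surprise", "anger", "sadness"}


@dataclass
class CartesiaVoiceSettings:
    """Hypothetical settings object with basic validation."""

    voice: str
    model: str
    speed: float = 0.0  # assumption: 0.0 is the normal speaking rate
    emotions: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.model not in VOICE_MODELS:
            raise ValueError(f"unknown voice model: {self.model}")
        if not -1.0 <= self.speed <= 1.0:
            raise ValueError(f"speed must be between -1.0 and 1.0, got {self.speed}")
        unknown = set(self.emotions) - EMOTIONS
        if unknown:
            raise ValueError(f"unsupported emotions: {sorted(unknown)}")


# A slightly slower, upbeat support voice.
settings = CartesiaVoiceSettings(
    voice="Janvi", model="sonic-3", speed=-0.2, emotions=["positivity"]
)
```

Validating at construction time surfaces a typo in a model or emotion name immediately rather than during a live call.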
Transcribers
Transcribers convert customer speech into text that the LLM can process.
Available Providers
- Deepgram
- Groq
- Sarvam
- Azure
- Eleven Labs
Deepgram offers industry-leading speech recognition with high accuracy and extensive model options.
Features
- Highest accuracy for English
- Domain-specific models
- Multilingual support
- Keyword boosting
Models
Nova Series
- nova-3-general - Latest general-purpose model, best overall accuracy
- nova-3 - Advanced multi-language support
- nova-3-medical - Optimized for medical terminology
- nova-2-general - Stable, reliable general-purpose
- nova-2-phonecall - Optimized for phone call quality
- nova-2-meeting - Best for meetings and conference calls
- nova-2-conversationalai - Optimized for AI conversations
- nova-2-medical - Medical terminology support
- nova-2-finance - Financial terminology support
- nova-2-voicemail - Voicemail optimization
- nova-2-drivethru - Drive-through environments
- nova-2-automotive - In-vehicle environments
Other Models
- flux-general - Fast, lightweight option
- voicemail - Voicemail-specific
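Because several of the models above target a specific domain or environment, choosing one can be reduced to a lookup by use case. The sketch below is illustrative only: the use-case keys and the `pick_transcriber_model` helper are hypothetical, and falling back to `nova-3-general` is a design choice made here because this page describes it as the latest general-purpose model.

```python
# Hypothetical use-case keys mapped to model names listed on this page.
MODEL_BY_USE_CASE = {
    "general": "nova-3-general",
    "medical": "nova-3-medical",
    "phone_call": "nova-2-phonecall",
    "meeting": "nova-2-meeting",
    "conversational_ai": "nova-2-conversationalai",
    "finance": "nova-2-finance",
    "voicemail": "nova-2-voicemail",
    "drive_thru": "nova-2-drivethru",
    "automotive": "nova-2-automotive",
}

FALLBACK_MODEL = "nova-3-general"


def pick_transcriber_model(use_case: str) -> str:
    """Return the model for a use case, or the general-purpose fallback."""
    return MODEL_BY_USE_CASE.get(use_case, FALLBACK_MODEL)
```

For instance, `pick_transcriber_model("phone_call")` selects `nova-2-phonecall`, and an unrecognized use case falls back to the general-purpose model.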
Supported Languages
- Hindi
- English
- Kannada
- Marathi
- Tamil
- Telugu
- Bengali
- Gujarati
- Malayalam
Need another language? Contact support to request additional language support.