Conversations
Import, analyze, and review conversations.
| Status | Client ID | Protocol ID | Conversation ID | Work Unit | Type | Call Status | Conv. Date | Imported | Analyzed | Duration | Score | Issues | Rev. | Audio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Reports
Aggregated insights across analyzed conversations
Evaluators
Configure evaluators, scoring, and profiles.
How Evaluators Work
Each evaluator assesses a single aspect of a conversation during analysis. Enable only what's relevant — the system prompt is compiled automatically from all enabled evaluators in the order shown below.
- Toggle to enable or disable evaluators
- Drag the ≡ handle to reorder priority
- Click the pencil to edit details and rubric
- Category headers let you enable/disable entire groups
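For illustration, here is a minimal Python sketch of how enabled evaluators might be compiled into rubric text in list order; the `Evaluator` shape and function names are assumptions, not the app's actual internals.

```python
from dataclasses import dataclass

@dataclass
class Evaluator:
    name: str
    rubric: str
    enabled: bool = True

def compile_rubrics(evaluators: list[Evaluator]) -> str:
    """Concatenate the rubrics of enabled evaluators, preserving list order."""
    sections = [f"### {e.name}\n{e.rubric.strip()}" for e in evaluators if e.enabled]
    return "\n\n".join(sections)

# Disabled evaluators are skipped; reordering the list reorders the compiled prompt.
evaluators = [
    Evaluator("Greeting", "Did the agent greet the caller and state their name?"),
    Evaluator("Compliance", "Were the required disclosures read verbatim?", enabled=False),
    Evaluator("Resolution", "Was the issue resolved or escalated correctly?"),
]
print(compile_rubrics(evaluators))
```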
Framework Prompt
Edit the base prompt template that wraps all evaluators
<<SCHEMA_JSON>> - Where the JSON schema will be inserted
<<EVALUATOR_RUBRICS>> - Where the rubric definitions will be inserted
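A hedged sketch of the placeholder substitution: the two markers above are replaced with the compiled rubrics and the JSON schema. The sample template, schema, and `render_framework_prompt` helper are illustrative only.

```python
import json

def render_framework_prompt(template: str, schema: dict, rubrics: str) -> str:
    """Replace the two placeholders with the compiled rubrics and the JSON schema."""
    return (template
            .replace("<<EVALUATOR_RUBRICS>>", rubrics)
            .replace("<<SCHEMA_JSON>>", json.dumps(schema, indent=2)))

template = (
    "You are a QA analyst. Score the conversation against these rubrics:\n"
    "<<EVALUATOR_RUBRICS>>\n\n"
    "Return JSON matching this schema:\n"
    "<<SCHEMA_JSON>>"
)
schema = {"type": "object", "properties": {"greeting": {"type": "string"}}}
rubrics = "### Greeting\nDid the agent greet the caller and state their name?"
print(render_framework_prompt(template, schema, rubrics))
```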
Scoring Scale
Configure rating levels and their numeric scores
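As a sketch, a scoring scale can be thought of as a mapping from rating labels to numbers that feed an overall score; the labels, values, and normalization below are examples rather than the app's defaults.

```python
# Hypothetical scale: each rating label maps to a numeric score.
SCORING_SCALE = {"fail": 0, "needs_improvement": 1, "meets": 2, "exceeds": 3}

def overall_score(ratings: dict[str, str]) -> float:
    """Average the per-evaluator ratings, normalized to 0-100 (an assumed aggregation)."""
    values = [SCORING_SCALE[label] for label in ratings.values()]
    return 100 * sum(values) / (len(values) * max(SCORING_SCALE.values()))

print(overall_score({"greeting": "meets", "compliance": "exceeds", "resolution": "fail"}))  # ~55.6
```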
What are Profiles?
Profiles are snapshots of your entire evaluation configuration. They let you switch between different setups instantly — useful when you need different configurations for different protocols, teams, or audit types.
A profile captures:
- Enabled evaluators and their order
- Framework prompt template
- Scoring scale configuration
To create a profile:
- Go to the Evaluator List and enable/disable the evaluators you need
- Adjust Framework and Scoring if needed
- Come back here and click Create Profile
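Conceptually, a profile is just a frozen copy of those three pieces of configuration. The sketch below assumes that shape; all names are illustrative.

```python
import copy
from dataclasses import dataclass

@dataclass
class EvaluationConfig:
    enabled_evaluators: list[str]   # evaluator IDs, in priority order
    framework_prompt: str           # template with the <<...>> placeholders
    scoring_scale: dict[str, int]   # rating label -> numeric score

@dataclass
class Profile:
    name: str
    config: EvaluationConfig

def create_profile(name: str, current: EvaluationConfig) -> Profile:
    """Freeze the current configuration so later edits don't change the snapshot."""
    return Profile(name, copy.deepcopy(current))

def apply_profile(profile: Profile) -> EvaluationConfig:
    """Switching setups just restores the saved snapshot."""
    return copy.deepcopy(profile.config)
```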
No profiles yet
Create a profile to save your current evaluation setup and switch between configurations quickly.
Alignment
Measure and improve the accuracy of your AI evaluators by comparing them against human reviews
How do you know your evaluators are accurate?
The Alignment section helps you answer this question. You manually review a sample of conversations, then compare your human ratings against the AI's ratings. The agreement rate tells you how much you can trust each evaluator.
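A worked sketch of the agreement rate: the fraction of reviewed conversations where the human rating matches the AI rating for a given evaluator. The function and sample data are illustrative.

```python
def agreement_rate(pairs: list[tuple[str, str]]) -> float:
    """pairs = [(human_rating, ai_rating), ...] for a single evaluator."""
    if not pairs:
        return 0.0
    matches = sum(1 for human, ai in pairs if human == ai)
    return matches / len(pairs)

reviews = [("meets", "meets"), ("fail", "meets"), ("exceeds", "exceeds"), ("meets", "meets")]
print(f"{agreement_rate(reviews):.0%}")  # 75%
```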
Filter calibration metrics by protocol and/or dataset.
Agreement by Evaluator
Click an evaluator to view its detailed confusion matrix below.
No manual reviews yet.
To see calibration data, open a conversation's detail view and manually review the AI ratings.
⚠️ Below 75% agreement with 5+ reviews — consider reviewing the rubric
Confusion Matrix
Select an evaluator above to view its confusion matrix.
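For reference, a confusion matrix can be built from the same (human, AI) rating pairs; the nested-dict representation below is just one possible sketch.

```python
from collections import defaultdict

def confusion_matrix(pairs: list[tuple[str, str]]) -> dict[str, dict[str, int]]:
    """matrix[human_rating][ai_rating] = number of reviews with that combination."""
    matrix: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for human, ai in pairs:
        matrix[human][ai] += 1
    return {human: dict(row) for human, row in matrix.items()}

pairs = [("meets", "meets"), ("fail", "meets"), ("meets", "exceeds")]
print(confusion_matrix(pairs))
# {'meets': {'meets': 1, 'exceeds': 1}, 'fail': {'meets': 1}}
```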
No Test Datasets Yet
Datasets are named collections of conversations that you use to test evaluator accuracy. Group conversations by protocol or any dimension you want to measure against.
After creating a dataset, add conversations to it, then manually review them in the Conversations section. Come back to the Metrics tab to see how the AI's ratings compare to your human ratings.
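If it helps to picture it, a dataset can be treated as little more than a named set of conversation IDs that calibration metrics are filtered by; the structure below is an assumption for illustration.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    conversation_ids: set[str]

def reviews_in_dataset(reviews: list[dict], dataset: Dataset) -> list[dict]:
    """Keep only manual reviews whose conversation belongs to the dataset."""
    return [r for r in reviews if r["conversation_id"] in dataset.conversation_ids]

sample = Dataset("Intake protocol sample", {"c-101", "c-102", "c-107"})
reviews = [
    {"conversation_id": "c-101", "evaluator": "greeting", "human": "meets", "ai": "meets"},
    {"conversation_id": "c-999", "evaluator": "greeting", "human": "fail", "ai": "meets"},
]
print(reviews_in_dataset(reviews, sample))  # only the c-101 review remains
```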
Compare Cohorts
Compare two sets of conversations side by side to detect quality differences
Cohort A
Cohort B
No comparison yet
Select filters for two cohorts and click Compare Cohorts to see side-by-side results.
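As a sketch of what the side-by-side comparison computes, the example below averages per-evaluator scores for each cohort and reports the delta; the data shape and function names are assumptions.

```python
from collections import defaultdict

def cohort_averages(conversations: list[dict]) -> dict[str, float]:
    """Average each evaluator's numeric score across one cohort."""
    totals: dict[str, list[float]] = defaultdict(list)
    for conv in conversations:
        for evaluator, score in conv["scores"].items():
            totals[evaluator].append(score)
    return {e: sum(v) / len(v) for e, v in totals.items()}

def compare_cohorts(cohort_a: list[dict], cohort_b: list[dict]) -> dict[str, tuple[float, float, float]]:
    """Return (avg_a, avg_b, delta) per evaluator present in both cohorts."""
    a, b = cohort_averages(cohort_a), cohort_averages(cohort_b)
    return {e: (a[e], b[e], b[e] - a[e]) for e in a.keys() & b.keys()}

cohort_a = [{"scores": {"greeting": 80, "resolution": 60}}, {"scores": {"greeting": 90, "resolution": 70}}]
cohort_b = [{"scores": {"greeting": 70, "resolution": 75}}]
print(compare_cohorts(cohort_a, cohort_b))  # e.g. {'greeting': (85.0, 70.0, -15.0), 'resolution': (65.0, 75.0, 10.0)}
```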
Settings
Configure model parameters.
Model Architecture
Configure the AI model and its behavior parameters
The AI model used for analysis. Flash models are faster and cheaper; Pro models are more capable.
Higher values make output more random and creative. Lower values make it more focused and deterministic.
Controls reasoning depth for 2.5 models. Higher budgets may improve quality for complex analysis tasks.
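A hedged example of how these settings might map onto a Gemini call, assuming the google-genai Python SDK (the app's actual integration is not specified here); the chosen model and values are placeholders.

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

config = types.GenerateContentConfig(
    temperature=0.3,  # lower = more focused and deterministic analysis
    thinking_config=types.ThinkingConfig(thinking_budget=2048),  # reasoning depth for 2.5 models
)

response = client.models.generate_content(
    model="gemini-2.5-flash",  # Flash: faster and cheaper; Pro: more capable
    contents="Analyze the following conversation transcript: ...",
    config=config,
)
print(response.text)
```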
Batch Processing
Configure how conversations are processed in bulk
Number of conversations to analyze simultaneously. Higher values speed up batch analysis but use more API quota.
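A sketch of bounded concurrency with an asyncio semaphore, where the semaphore limit plays the role of the batch size setting; the analysis call itself is stubbed out.

```python
import asyncio

BATCH_SIZE = 5  # "number of conversations to analyze simultaneously"

async def analyze(conversation_id: str) -> str:
    await asyncio.sleep(1)  # stand-in for the real analysis call
    return f"{conversation_id}: analyzed"

async def analyze_all(conversation_ids: list[str]) -> list[str]:
    semaphore = asyncio.Semaphore(BATCH_SIZE)

    async def bounded(cid: str) -> str:
        async with semaphore:  # at most BATCH_SIZE analyses in flight at once
            return await analyze(cid)

    return await asyncio.gather(*(bounded(c) for c in conversation_ids))

print(asyncio.run(analyze_all([f"c-{i}" for i in range(12)])))
```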
Transcription
Extract raw transcriptions from audio using multimodal AI
When enabled, audio files will be transcribed during pipeline extraction. This adds processing time and API costs.
The multimodal model used to transcribe audio. This can be different from the analysis model.
Flash models are faster and recommended for transcription tasks.
Instructions for the model on how to transcribe the audio. Customize to adjust speaker labels, formatting, or language handling.
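A hedged sketch of audio transcription with a multimodal Gemini model, again assuming the google-genai Python SDK; the file name, prompt text, and model choice are placeholders.

```python
from google import genai

client = genai.Client()

TRANSCRIPTION_PROMPT = (
    "Transcribe this call verbatim. Label the speakers as Agent and Customer "
    "and keep the original language."
)

audio = client.files.upload(file="call_recording.mp3")  # placeholder file name
response = client.models.generate_content(
    model="gemini-2.5-flash",  # can differ from the analysis model
    contents=[TRANSCRIPTION_PROMPT, audio],
)
print(response.text)
```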