Conversations
Import, analyze, and review conversations.
How does conversation analysis work?
Import conversations from Firestore, run AI-powered analysis using your configured evaluators, then review quality ratings for each conversation.
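As a rough sketch of that flow, the snippet below reads conversation documents from a Firestore collection and writes evaluator ratings back onto each document. The collection name, field names, and the `runEvaluators` helper are illustrative assumptions, not the app's actual schema or API.

```ts
import { initializeApp } from "firebase-admin/app";
import { getFirestore } from "firebase-admin/firestore";

initializeApp();
const db = getFirestore();

// Hypothetical shape of an imported conversation document.
interface Conversation {
  id: string;
  clientId: string;
  protocolId: string;
  transcript: string;
}

// Stand-in for the AI analysis step: returns one rating per configured evaluator.
async function runEvaluators(transcript: string): Promise<Record<string, string>> {
  // ...call the configured model with the framework prompt and rubrics...
  return { safety: "good", compliance: "acceptable" }; // placeholder ratings
}

// 1. Import: read conversations for a client from Firestore.
async function importConversations(clientId: string): Promise<Conversation[]> {
  const snap = await db
    .collection("conversations") // assumed collection name
    .where("clientId", "==", clientId)
    .get();
  return snap.docs.map((d) => ({ id: d.id, ...(d.data() as Omit<Conversation, "id">) }));
}

// 2. Analyze: rate each conversation and store the ratings for later review.
async function analyzeAll(clientId: string): Promise<void> {
  for (const convo of await importConversations(clientId)) {
    const ratings = await runEvaluators(convo.transcript);
    await db.collection("conversations").doc(convo.id).update({ ratings, analyzedAt: new Date() });
  }
}
```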
| Status | Client ID | Client Name | Protocol ID | Protocol Name | Conversation ID | Work Unit | Type | Call Status | Conv. Date | Imported | Analyzed | Duration | Score | Issues | Rev. | Audio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Reports
Analyze patterns across your evaluated conversations and generate quality insights.
What can reports tell you?
Reports analyze patterns across your evaluated conversations -- recurring quality issues, trends over time, and actionable insights. Generate reports filtered by client or protocol to focus on specific areas.
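Conceptually, a report boils down to an aggregation like the sketch below, which counts recurring issue labels for a chosen client or protocol. The field names are illustrative, not the app's real data model.

```ts
// Illustrative shapes; the app's real fields may differ.
interface EvaluatedConversation {
  clientId: string;
  protocolId: string;
  issues: string[]; // issue labels found during analysis
  score: number;    // overall quality score
}

// Count recurring issues for one client or protocol to surface patterns.
function recurringIssues(
  conversations: EvaluatedConversation[],
  filter: { clientId?: string; protocolId?: string }
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const c of conversations) {
    if (filter.clientId && c.clientId !== filter.clientId) continue;
    if (filter.protocolId && c.protocolId !== filter.protocolId) continue;
    for (const issue of c.issues) {
      counts.set(issue, (counts.get(issue) ?? 0) + 1);
    }
  }
  return counts;
}
```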
Evaluators
Configure evaluators, scoring, and profiles.
What are evaluators?
Evaluators are the quality criteria the AI uses to rate your conversations. Each evaluator checks for a specific aspect -- like safety, compliance, or communication quality -- and assigns a rating based on its rubric.
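As an illustration, an evaluator can be thought of as a small configuration object like the one below; the field names and the example rubric are assumptions, not the app's exact schema.

```ts
// Illustrative evaluator definition.
interface Evaluator {
  id: string;
  name: string;     // e.g. "Safety", "Compliance"
  category: string;
  enabled: boolean;
  rubric: string;   // criteria the AI applies when assigning a rating
}

// Example: a communication-quality evaluator rated on the configured scoring scale.
const communication: Evaluator = {
  id: "communication-quality",
  name: "Communication quality",
  category: "Communication",
  enabled: true,
  rubric:
    "Rate how clearly and respectfully the agent communicates, " +
    "from lowest to highest on the configured scoring scale.",
};
```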
Evaluators List
| Evaluator | Category | Status |
|---|---|---|
Prompt Template
- `<<SCHEMA_JSON>>`: where the JSON schema will be inserted
- `<<EVALUATOR_RUBRICS>>`: where the rubric definitions will be inserted
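To picture how the template is resolved at analysis time, the sketch below substitutes both placeholders before the prompt is sent. The template text and the `buildPrompt` helper are purely illustrative.

```ts
// Fill both placeholders in the framework prompt template (illustrative helper).
function buildPrompt(template: string, schemaJson: string, evaluatorRubrics: string): string {
  return template
    .replace("<<SCHEMA_JSON>>", schemaJson)
    .replace("<<EVALUATOR_RUBRICS>>", evaluatorRubrics);
}

const prompt = buildPrompt(
  "Rate the conversation using these rubrics:\n<<EVALUATOR_RUBRICS>>\n" +
    "Answer with JSON matching this schema:\n<<SCHEMA_JSON>>",
  JSON.stringify({ type: "object" }),   // placeholder schema
  "- Safety: ...\n- Compliance: ..."    // placeholder rubric definitions
);
```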
Scoring Scale
Profiles
What are profiles?
Profiles save a snapshot of your current evaluator configuration — which evaluators are enabled, the framework prompt, and the scoring scale — so you can switch between setups quickly.
- Evaluators — enabled/disabled state & order
- Framework prompt
- Scoring scale
- Create a profile from your current setup
- Bind it to specific clients or protocols (optional)
- Apply it manually, or set bindings to activate it automatically
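The sketch below shows one way a profile snapshot and its bindings could be represented and resolved. The field names, and the assumption that a bound profile takes priority over the manually applied one, are illustrative rather than the app's documented behavior.

```ts
// Illustrative profile snapshot; real field names may differ.
interface Profile {
  name: string;
  evaluators: { id: string; enabled: boolean; order: number }[];
  frameworkPrompt: string;
  scoringScale: string;
  bindings?: { clientIds?: string[]; protocolIds?: string[] }; // optional auto-activation
}

// Pick the profile to apply; assume a bound profile wins over the manually selected one.
function resolveProfile(
  profiles: Profile[],
  manuallyApplied: Profile,
  clientId: string,
  protocolId: string
): Profile {
  const bound = profiles.find(
    (p) =>
      p.bindings?.clientIds?.includes(clientId) ||
      p.bindings?.protocolIds?.includes(protocolId)
  );
  return bound ?? manuallyApplied;
}
```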
No profiles yet
Create a profile to save your current evaluation setup and switch between configurations quickly.
Alignment
Measure and improve the accuracy of your AI evaluators by comparing them against human reviews.
How do you know your evaluators are accurate?
The Alignment section helps you answer this question. You manually review a sample of conversations, then compare your human ratings against the AI's ratings. The agreement rate tells you how much you can trust each evaluator.
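Conceptually, the agreement rate is the share of manual reviews whose rating matches the AI's. The sketch below computes it per evaluator, assuming an exact rating match counts as agreement, and applies the same warning rule described further down (below 75% agreement with 5 or more reviews).

```ts
// Illustrative record of a manual review compared with the AI rating.
interface ReviewPair {
  evaluatorId: string;
  aiRating: string;    // a label on the scoring scale
  humanRating: string;
}

// Agreement rate = matching reviews / total reviews, per evaluator.
function agreementByEvaluator(pairs: ReviewPair[]): Map<string, { rate: number; reviews: number }> {
  const stats = new Map<string, { matches: number; reviews: number }>();
  for (const p of pairs) {
    const s = stats.get(p.evaluatorId) ?? { matches: 0, reviews: 0 };
    s.reviews += 1;
    if (p.aiRating === p.humanRating) s.matches += 1;
    stats.set(p.evaluatorId, s);
  }
  const result = new Map<string, { rate: number; reviews: number }>();
  for (const [id, s] of stats) {
    result.set(id, { rate: s.matches / s.reviews, reviews: s.reviews });
  }
  return result;
}

// Flag evaluators below 75% agreement with 5+ reviews (the warning shown in this section).
function needsRubricReview(rate: number, reviews: number): boolean {
  return reviews >= 5 && rate < 0.75;
}
```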
Filter calibration metrics by protocol and/or dataset.
Agreement by Evaluator
Click an evaluator to view its detailed confusion matrix below.
No manual reviews yet.
To see calibration data, open a conversation's detail view and manually review the AI ratings.
⚠️ Below 75% agreement with 5+ reviews — consider reviewing the rubric
Confusion Matrix
Select an evaluator above to view its confusion matrix.
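For reference, a confusion matrix for one evaluator can be built as in the sketch below, with human ratings as rows and AI ratings as columns; the orientation and the input shape are assumptions for illustration.

```ts
// Build a confusion matrix: each cell counts how often a (human, AI) rating pair occurred.
function confusionMatrix(
  pairs: { aiRating: string; humanRating: string }[],
  scale: string[] // ratings in scoring-scale order
): number[][] {
  const index = new Map(scale.map((label, i) => [label, i]));
  const matrix = scale.map(() => scale.map(() => 0));
  for (const p of pairs) {
    const row = index.get(p.humanRating);
    const col = index.get(p.aiRating);
    if (row !== undefined && col !== undefined) matrix[row][col] += 1;
  }
  return matrix;
}
```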
No test datasets yet
Datasets are named collections of conversations that you use to test evaluator accuracy. Group conversations by protocol or any dimension you want to measure against.
After creating a dataset, add conversations to it, then manually review them in the Conversations section. Come back to the Metrics tab to see how the AI's ratings compare with your human reviews.
Compare Cohorts
Compare two sets of conversations side by side to detect quality differences across clients, protocols, or time periods.
How does cohort comparison work?
Define two groups of conversations using filters (clients, protocols, date ranges), then compare their quality metrics side by side. This helps you spot trends, measure improvements, or identify regressions.
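The sketch below illustrates that idea: two filters select the cohorts, and their average scores are compared side by side. The field names and the choice of average score as the metric are assumptions for illustration.

```ts
// Illustrative cohort filter and comparison.
interface CohortFilter {
  clientIds?: string[];
  protocolIds?: string[];
  from?: Date;
  to?: Date;
}

interface EvaluatedConvo {
  clientId: string;
  protocolId: string;
  date: Date;
  score: number;
}

// Select the conversations that match a cohort's filters.
function selectCohort(all: EvaluatedConvo[], f: CohortFilter): EvaluatedConvo[] {
  return all.filter(
    (c) =>
      (!f.clientIds || f.clientIds.includes(c.clientId)) &&
      (!f.protocolIds || f.protocolIds.includes(c.protocolId)) &&
      (!f.from || c.date >= f.from) &&
      (!f.to || c.date <= f.to)
  );
}

// Compare average scores; a negative delta suggests cohort B scores lower than cohort A.
function compareCohorts(a: EvaluatedConvo[], b: EvaluatedConvo[]) {
  const avg = (xs: EvaluatedConvo[]) =>
    xs.length ? xs.reduce((s, c) => s + c.score, 0) / xs.length : 0;
  return { avgA: avg(a), avgB: avg(b), delta: avg(b) - avg(a) };
}
```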
Cohort A
Cohort B
No comparison yet
Select filters for two cohorts above and click Compare Cohorts to see side-by-side quality metrics.
Evaluator Details
Select a cohort to view conversation details.
Settings
Configure how conversations are analyzed, transcribed, and processed. Changes affect all future analyses.
How do settings affect your analyses?
These settings control the AI model, reasoning depth, and transcription pipeline used when you analyze conversations. Changes only affect future analyses -- previously analyzed conversations keep their original settings.
The AI model used for analysis. Flash models are faster and cheaper; Pro models are more capable.
Language for AI evaluation output: evaluator names, descriptions, rubrics, the framework prompt, and the transcription prompt.
Higher values make output more random and creative. Lower values make it more focused and deterministic.
Controls reasoning depth for 2.5 models. Higher budgets may improve quality for complex analysis tasks.
Number of conversations to download or analyze simultaneously (1-10). Higher values complete batches faster but consume more API quota per minute.
When enabled, audio files will be transcribed during pipeline extraction. This adds processing time and API costs.
The multimodal model used to transcribe audio. This can be different from the analysis model.
Flash models are faster and recommended for transcription tasks.
Transcription always runs with Thinking Off and Temperature 0.0.
Instructions sent to the model when converting audio to text. Use the ES/EN toggle to edit each language version — the version matching the active language above is sent during transcription.
Configure the cost per million tokens for each model. These rates are used to estimate the cost shown in conversation details.
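Taken together, these options can be pictured as a settings object like the one below, alongside the assumed cost formula: a single per-model rate per million tokens, as described on this page (real pricing may distinguish input and output tokens). All names are illustrative, not the app's exact keys.

```ts
// Illustrative settings object mirroring the options on this page.
interface AnalysisSettings {
  analysisModel: string;       // Flash: faster and cheaper; Pro: more capable
  outputLanguage: "ES" | "EN";
  temperature: number;         // higher = more random, lower = more deterministic
  thinkingBudget: number;      // reasoning depth for 2.5 models
  concurrency: number;         // 1-10 conversations processed at once
  transcribeAudio: boolean;
  transcriptionModel: string;  // may differ from the analysis model
  transcriptionPrompt: { ES: string; EN: string };
}

// Assumed cost estimate: tokens used, scaled by the model's configured rate per million tokens.
function estimateCost(totalTokens: number, ratePerMillionUsd: number): number {
  return (totalTokens / 1_000_000) * ratePerMillionUsd;
}

// Example: 128,000 tokens at $1.25 per million tokens = $0.16.
const estimated = estimateCost(128_000, 1.25);
```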
Admin
Manage admin access and permissions.
Admins can edit evaluators, settings, categories, framework prompt, and scoring scale. Regular users can only manage their own profiles.