# Model Limits & Guarantees
Explicit, honest information about how AI models are used in Rihario. This document explains what guarantees exist, what happens when things fail, and what the models can and cannot do.
## One-Model-Per-Test Rule
Guarantee: Every test run uses exactly ONE reasoning model (GPT-5 Mini) from start to finish.
### What This Means
- No model switching - The same model is used throughout the entire test
- No fallback models - If the model fails, the test fails (with clear error messages)
- Consistent behavior - Same model = same behavior patterns
- Same for all users - Guest and registered tests use the same model
### Why This Matters
- Predictable behavior - You know what to expect
- Consistent costs - No surprise expensive model calls
- Easier debugging - One model to understand, not multiple
- No model-specific quirks - Results aren't affected by switching between models
## Model Architecture

### GPT-5 Mini (Text/Reasoning)
- Used for: All text-based reasoning, planning, and decision-making
- When: Every step of every test
- Retries: Yes, same-model retry (see Retry Policy below)
- Fallback: No - if it fails, the test fails
### GPT-4o (Visual Analysis)
- Used for: Screenshot-based visual issue detection ONLY
- When: Selectively (final steps, layout shifts, errors)
- Retries: No - single attempt only
- Fallback: No - if it fails, visual analysis is skipped
## Retry Policy

### Same-Model Retry Envelope

What it is: A resilience guard for transient API failures (network blips, rate limits).

### How It Works

1. First attempt with GPT-5 Mini
2. If the failure is retryable (429, network error, timeout): wait 200-400ms (randomized backoff), then retry with the SAME model and the SAME prompt
3. Maximum 1 retry (2 total attempts)
4. If the retry fails: the test fails with a clear error message
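The steps above can be sketched as a small wrapper. This is a hypothetical illustration, not Rihario's actual code; `call` and `is_retryable` are assumed callables supplied by the caller:

```python
import random
import time

MAX_RETRIES = 1  # 2 total attempts, per the policy above

def call_with_retry(call, is_retryable, sleep=time.sleep, rng=random.random):
    """Run `call`; on a retryable failure, back off 200-400ms and retry once
    with the same model and the same prompt (the caller closes over both)."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return call()
        except Exception as err:
            if attempts > MAX_RETRIES or not is_retryable(err):
                raise RuntimeError(
                    f"GPT-5 Mini call failed after {attempts} attempt(s): {err}"
                ) from err
            sleep(0.2 + 0.2 * rng())  # randomized backoff in [200ms, 400ms]
```

Injecting `sleep` and `rng` keeps the backoff testable and makes the latency bound explicit in one place.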
### What Is Retryable
- 429 (rate limit exceeded) - Too many requests
- Network errors - Connection resets, timeouts, DNS failures
- Timeouts - Request took too long
### What Is NOT Retryable
- 400 (bad request) - Configuration issue, invalid API key or model name
- 401 (unauthorized) - API key is invalid or expired
- Invalid responses - Malformed JSON, unexpected format
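One way to sketch this classification (hypothetical; Rihario's real checks may inspect exception types rather than bare status codes):

```python
RETRYABLE_STATUSES = {429}  # rate limit exceeded; 400/401 are configuration problems

def is_retryable(status_code=None, network_error=False, timed_out=False):
    """Only transient faults qualify for the single same-model retry.
    Invalid responses (malformed JSON) carry no status and are never retried."""
    if network_error or timed_out:
        return True
    return status_code in RETRYABLE_STATUSES
```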
### Guarantee
- Same model used for retry (no model switching)
- Same prompt used (deterministic behavior)
- Maximum 1 retry (bounded added latency of 200-400ms)
## Token Budgets

### Per-Call Limits
Every LLM call has a strict token budget to prevent uncontrolled growth:
| Call Type | Token Budget | Purpose |
|---|---|---|
| Planning | 3000 | Test plan generation |
| Diagnosis | 3000 | UI diagnosis analysis |
| Testability | 2500 | Testability analysis |
| Action Generation | 2000 | Step-by-step actions |
| Cookie Banner | 1500 | Cookie banner detection |
| Error Analysis | 2000 | Error explanation |
| Self-Healing | 2000 | Alternative selector finding |
| Context Synthesis | 2500 | Multi-source context |
| Summary | 2000 | Final result summary |
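These budgets could live in a single lookup table so every call site shares one source of truth (a sketch; the constant and key names are hypothetical, mirroring the table above):

```python
# Hypothetical mirror of the budget table above, in tokens.
TOKEN_BUDGETS = {
    "planning": 3000,
    "diagnosis": 3000,
    "testability": 2500,
    "action_generation": 2000,
    "cookie_banner": 1500,
    "error_analysis": 2000,
    "self_healing": 2000,
    "context_synthesis": 2500,
    "summary": 2000,
}

def budget_for(call_type: str) -> int:
    """Fail fast on unknown call types instead of guessing a limit."""
    if call_type not in TOKEN_BUDGETS:
        raise KeyError(f"no token budget defined for {call_type!r}")
    return TOKEN_BUDGETS[call_type]
```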
### DOM Pruning Rules
To stay within token budgets, DOM snapshots are pruned:
Removed:
- `<script>` tags and their content
- `<style>` tags and their content
- HTML comments
- Excessive whitespace
Kept:
- Interactive elements (buttons, inputs, links)
- Visible content
- Structural information
Truncation:
- Deterministic: truncated from the start, preserving the end
- Tag-boundary aware when possible
- Never random
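A minimal pruning pass along these lines (a regex sketch, not Rihario's implementation; a real HTML parser would be more robust):

```python
import re

def prune_dom(html: str, max_chars: int) -> str:
    """Strip scripts, styles, and comments, collapse whitespace, then
    deterministically truncate from the start, preserving the end."""
    html = re.sub(r"<script\b[^>]*>.*?</script>", "", html, flags=re.S | re.I)
    html = re.sub(r"<style\b[^>]*>.*?</style>", "", html, flags=re.S | re.I)
    html = re.sub(r"<!--.*?-->", "", html, flags=re.S)
    html = re.sub(r"\s+", " ", html).strip()
    if len(html) <= max_chars:
        return html
    cut = html[-max_chars:]
    # Tag-boundary aware: if the cut begins mid-tag, drop the partial tag.
    lt, gt = cut.find("<"), cut.find(">")
    if gt != -1 and (lt == -1 or gt < lt):
        cut = cut[gt + 1:]
    return cut
```

Because the same input always yields the same output, pruned prompts stay reproducible across runs.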
### Context Limiting
History:
- Limited to last 5 steps for action generation
- Older steps are discarded
Elements:
- Limited to the 50-60 most relevant elements
- Hidden elements filtered out
- Prioritized by interactivity
DOM Snapshots:
- Pruned before inclusion in prompts
- Maximum length enforced per call type
- Deterministic truncation
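The limits above could be applied in one place before prompt assembly (hypothetical shape; the `visible`/`interactive` flags are assumed element attributes):

```python
MAX_HISTORY_STEPS = 5
MAX_ELEMENTS = 60

def limit_context(steps, elements):
    """Keep only the last 5 steps and at most 60 visible elements,
    interactive elements first, so context size stays bounded."""
    recent = steps[-MAX_HISTORY_STEPS:]          # older steps discarded
    visible = [e for e in elements if e.get("visible", True)]
    visible.sort(key=lambda e: not e.get("interactive", False))  # interactive first
    return recent, visible[:MAX_ELEMENTS]
```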
## What Happens on API Failure

### GPT-5 Mini Failures
Scenario 1: Transient Error (429, network)
- Automatic retry after 200-400ms
- If retry succeeds: Test continues
- If retry fails: Test fails with error message
Scenario 2: Non-Retryable Error (400, 401)
- No retry attempted
- Test fails immediately
- Clear error message logged
### Error Messages

- `GPT-5 Mini API error (400): [details]` - Check API key and model name
- `GPT-5 Mini API authentication failed (401)` - Check OPENAI_API_KEY
- `GPT-5 Mini API rate limit exceeded (429)` - Wait and retry
- `GPT-5 Mini call failed after 2 attempt(s): [details]` - Both attempts failed
### GPT-4o Failures
Behavior:
- No retry attempted
- Visual analysis is skipped
- Test continues without visual issue detection
- Warning logged
Why:
- Visual analysis is optional/selective
- Failures don't block test execution
- Cost optimization (don't retry expensive calls)
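The skip-on-failure behavior amounts to a guarded single attempt (a sketch; `analyze` stands in for whatever wraps the GPT-4o request):

```python
import logging

log = logging.getLogger("rihario")

def maybe_visual_analysis(analyze, screenshot):
    """Single attempt at GPT-4o visual analysis; any failure is logged
    as a warning and the test continues without visual issues."""
    try:
        return analyze(screenshot)
    except Exception as err:  # no retry, no fallback
        log.warning("Visual analysis skipped: %s", err)
        return None
```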
## What the Model CAN Do
- ✅ Generate test actions based on page context
- ✅ Analyze DOM structure and identify interactive elements
- ✅ Detect cookie banners and suggest dismissal strategies
- ✅ Explain test failures in plain English
- ✅ Find alternative selectors when primary fails
- ✅ Synthesize context from multiple sources (DOM, logs, errors)
- ✅ Perform testability analysis
- ✅ Generate structured test plans
## What the Model CANNOT Do
- ❌ Guarantee 100% test success rate
- ❌ Handle all edge cases perfectly
- ❌ Work without valid API key
- ❌ Bypass rate limits
- ❌ Process unlimited context (token budgets enforced)
- ❌ Compare visual baselines (not implemented)
- ❌ Auto-fix code (read-only analysis)
## Token Usage Guarantees
Guaranteed:
- No single call exceeds its token budget
- DOM pruning is deterministic (same input = same output)
- Context is limited to prevent unbounded growth
- Large DOMs don't cause unbounded latency
Not Guaranteed:
- Exact token counts (estimates used)
- Perfect pruning (some edge cases may slip through)
- Zero token waste (conservative estimates used)
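"Estimates used" typically means a character-count heuristic on this order (hypothetical; a real tokenizer such as tiktoken gives exact counts at some CPU cost):

```python
import math

def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """~4 characters per token for English text, rounded up so budget
    checks err on the conservative side."""
    return math.ceil(len(text) / chars_per_token)

def within_budget(text: str, budget: int) -> bool:
    return estimate_tokens(text) <= budget
```

Rounding up is why some tokens are "wasted": the check trips slightly early rather than ever letting a call exceed its budget.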
## Cost Implications

### GPT-5 Mini
- Used for every reasoning step
- Token budgets limit per-call costs
- Retries add minimal cost (same model, same prompt)
- No fallback model costs
### GPT-4o
- Used selectively (not every step)
- Only for visual analysis
- No retries (cost control)
- Failures don't block tests
### Optimization
- DOM pruning reduces input tokens
- History limiting reduces context size
- Selective GPT-4o usage reduces visual analysis costs
## Failure Handling Philosophy

### Fail Fast
- Non-retryable errors fail immediately
- Clear error messages for debugging
- No silent degradation
### Resilient
- Transient errors are retried once
- Same model ensures consistency
- Bounded retry latency (200-400ms)
### Honest
- Errors are logged with full context
- No false success indicators
- User knows exactly what failed and why
## No Fallback Models

### Why
- Consistency: Same model = same behavior
- Predictability: No model-specific quirks
- Cost control: No expensive fallback calls
- Simplicity: Easier to debug and maintain
### What Happens Instead
- Same-model retry for transient failures
- Clear error messages for permanent failures
- User can retry the test manually
## Limitations

### Known Limitations
- Token budgets may truncate very large DOMs
- Retry only handles transient failures
- GPT-4o failures are near-silent (visual analysis is skipped; only a warning is logged)
- No baseline comparison for visual tests
- Model responses are not cached between tests
### Acceptable Trade-offs
- Deterministic truncation (predictable behavior)
- Bounded retries (controlled latency)
- Selective visual analysis (cost optimization)
- Fail-fast on permanent errors (clear feedback)
## Support & Troubleshooting

### Common Issues
1. "GPT-5 Mini API error (400)"
- Check OPENAI_API_KEY is valid
- Verify model name is 'gpt-5-mini'
- Check API key has proper permissions
2. "GPT-5 Mini API authentication failed (401)"
- OPENAI_API_KEY is invalid or expired
- Regenerate API key in OpenAI dashboard
3. "GPT-5 Mini API rate limit exceeded (429)"
- Wait a few minutes
- Check OpenAI usage limits
- Retry the test
4. Test fails immediately after starting
- Check API key is set in .env file
- Verify network connectivity
- Check OpenAI service status
5. Large DOMs causing issues
- DOM pruning should handle this automatically
- Token budgets prevent unbounded growth
- If issues persist, check DOM size in logs
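Several of these checks can be done locally before spending a test run (a sketch; the function name is hypothetical and the `sk-` prefix check is only a heuristic):

```python
import os

def preflight_errors(env=os.environ):
    """Cheap local checks for the most common setup failures before a
    test run (key presence and shape only; a 401 is still possible)."""
    problems = []
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        problems.append("OPENAI_API_KEY is not set (check your .env file)")
    elif not key.startswith("sk-"):
        problems.append("OPENAI_API_KEY does not look like an OpenAI key")
    return problems
```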
## Summary
- One model per test: GPT-5 Mini for all reasoning
- Retry policy: Same-model retry, max 1 retry, 200-400ms backoff
- Token budgets: Strict per-call limits with DOM pruning
- Failure handling: Fail-fast with clear errors, retry transient failures
- No fallbacks: Consistency over complexity
- Honest limitations: Documented and acknowledged
This architecture prioritizes predictability, cost control, and clear failure modes over complex fallback strategies.