Outbound Agent QA Framework

Agent output is only as good as the process for catching failures before they send. A QA framework defines what to check, how to score it, and what happens when something does not pass — so quality holds as volume scales.

Why QA Matters More in Agentic Systems

When a human SDR prepares outreach manually, quality problems surface one at a time — a bad draft is caught before it sends. In an agentic system, the same quality failure pattern can propagate across an entire batch before it is detected. A QA framework is the mechanism that prevents systematic failures from scaling with volume.

The 4 QA Dimensions

Dimension 1 — Targeting Accuracy

Does this account fit the ICP? Is the contact the right role and seniority? Is there prior outreach history that should block this send?

  • Account firmographics match ICP criteria (industry, size, stage)
  • Contact title and seniority match target persona
  • No duplicate outreach to this contact within the exclusion window
  • Account is not on the blocked or do-not-contact list

Failure routing: Remove from batch. Log reason. Flag for targeting rule review if failure rate is high.
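The targeting checks above are mechanical enough to gate in code before anything reaches drafting. The sketch below is a minimal, assumed shape — the field names (`fits_icp`, `title_matches_persona`, `last_contacted`), the 60-day exclusion window, and the blocklist representation are all illustrative, not a fixed schema:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

EXCLUSION_WINDOW = timedelta(days=60)  # assumed dedupe window; tune per policy


@dataclass
class Contact:
    email: str
    title_matches_persona: bool       # persona match computed upstream
    last_contacted: Optional[date] = None


@dataclass
class Account:
    domain: str
    fits_icp: bool                    # firmographic match computed upstream


def targeting_failures(account, contact, blocklist, today):
    """Return the list of Dimension 1 failure reasons (empty list = pass)."""
    reasons = []
    if not account.fits_icp:
        reasons.append("account outside ICP")
    if not contact.title_matches_persona:
        reasons.append("contact persona mismatch")
    if contact.last_contacted is not None and \
            today - contact.last_contacted < EXCLUSION_WINDOW:
        reasons.append("inside exclusion window")
    if account.domain in blocklist:
        reasons.append("blocked or do-not-contact domain")
    return reasons
```

Returning a reason list rather than a boolean supports the "log reason" step: the same output feeds both batch removal and the failure-rate tracking that triggers a targeting rule review.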

Dimension 2 — Research Quality

Is there a real, recent signal? Does it meet the minimum tier threshold? Is the signal date within the freshness window?

  • At least one Tier 1 or two Tier 2 signals present
  • Signal is within the max freshness window (typically 90 days for Tier 1)
  • Signal source is verified (not inferred or hallucinated)
  • Signal is actionable — it connects to a business context that your product addresses

Failure routing: Return to research queue. Do not draft until minimum signal threshold is met.
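The signal threshold can be enforced mechanically. A minimal sketch, assuming each signal carries a `tier`, `date`, and `verified` flag; the 90-day Tier 1 window is from the checklist above, while the 180-day Tier 2 window is an assumption:

```python
from datetime import date, timedelta

# Freshness windows: 90 days for Tier 1 per the checklist; Tier 2 is assumed.
FRESHNESS = {1: timedelta(days=90), 2: timedelta(days=180)}


def meets_signal_threshold(signals, today):
    """Pass if there is at least one fresh, verified Tier 1 signal
    or at least two fresh, verified Tier 2 signals."""
    fresh = [s for s in signals
             if s["verified"] and today - s["date"] <= FRESHNESS[s["tier"]]]
    tier1 = sum(1 for s in fresh if s["tier"] == 1)
    tier2 = sum(1 for s in fresh if s["tier"] == 2)
    return tier1 >= 1 or tier2 >= 2
```

Note that unverified signals never count toward the threshold, which is how the "not inferred or hallucinated" check is enforced upstream of drafting.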

Dimension 3 — Draft Quality

Does the email connect the signal to the product correctly? Is the angle specific? Is the CTA clear and singular?

  • Opener references the signal specifically — not a generic observation
  • Body makes an explicit connection between the signal and the product value
  • No generic product features listed without account-specific context
  • Single CTA — no multiple asks in the same message
  • Email length within bounds (under 120 words for first touch)

Failure routing: Return to drafting queue with specific failure note. Track angle-failure rates for calibration.
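Parts of this dimension are subjective (angle specificity, product connection), but the length bound and the "opener references the signal" check can be pre-screened before human review. A rough sketch — the keyword match is a crude stand-in for "references the signal specifically" and only catches drafts that never mention it at all:

```python
MAX_WORDS_FIRST_TOUCH = 120  # length bound from the checklist above


def draft_failures(body, signal_keywords):
    """Return mechanically detectable Dimension 3 failures (empty = pass
    to human review for the subjective checks)."""
    reasons = []
    if len(body.split()) > MAX_WORDS_FIRST_TOUCH:
        reasons.append("over first-touch length bound")
    opener = (body.strip().splitlines() or [""])[0].lower()
    if not any(k.lower() in opener for k in signal_keywords):
        reasons.append("opener does not mention the signal")
    return reasons
```

A pass here does not mean the draft is good; it only means the draft is worth a human's time.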

Dimension 4 — Compliance Checks

Automated checks that run before any package reaches human review. They catch structural failures without consuming reviewer time.

  • No unsubscribed or opted-out contacts in the send list
  • No blocked domains or competitor accounts
  • Contact email is valid format and domain is not flagged as disposable
  • No prohibited content (claims, guarantees, specific regulatory language)

Failure routing: Auto-remove. Log. Do not surface to human review queue.
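Because every check in this dimension is objective, the whole gate can run as code. A sketch under assumed inputs — the prohibited-phrase list is illustrative, not a real policy, and the regex checks email format only, not deliverability:

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # format only
PROHIBITED = ("guarantee", "risk-free")  # illustrative, not a real policy list


def compliance_failures(email, opted_out, blocked_domains,
                        disposable_domains, body):
    """Dimension 4 auto-checks; any failure is removed and logged
    without surfacing to the human review queue."""
    reasons = []
    domain = email.rsplit("@", 1)[-1].lower()
    if opted_out:
        reasons.append("contact opted out")
    if not EMAIL_RE.match(email):
        reasons.append("invalid email format")
    if domain in blocked_domains:
        reasons.append("blocked domain")
    if domain in disposable_domains:
        reasons.append("disposable domain")
    if any(p in body.lower() for p in PROHIBITED):
        reasons.append("prohibited content")
    return reasons
```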

QA Sampling Strategy by Stage

| Stage | Review Coverage | Trigger to Adjust Coverage |
| --- | --- | --- |
| Pilot (first batch) | 100% | Always 100%; this is the calibration phase |
| Validated campaign, same ICP | 10–20% spot-check | Increase if reply rate drops more than 1.5% from baseline |
| New signal type or angle | 100% until validated | Relax after 30+ accounts with passing quality |
| New ICP segment | 100% until validated | Relax after 30+ accounts with passing quality |
| Auto-send enabled | 10% minimum, random sample | Increase to 100% for any batch with failure rate above 5% |
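The table above maps directly to a sampling decision per batch. A sketch — the stage keys are assumed names, and the 15% rate for validated campaigns is a midpoint of the 10–20% band:

```python
import random

# Rates mirror the table; 0.15 is an assumed midpoint of the 10-20% band.
SPOT_CHECK_RATES = {
    "pilot": 1.0,
    "validated_same_icp": 0.15,
    "new_signal_or_angle": 1.0,
    "new_icp_segment": 1.0,
    "auto_send": 0.10,
}


def sample_for_review(batch, stage, last_failure_rate=0.0, rng=random):
    """Select which packages in a batch get human review at this stage."""
    rate = SPOT_CHECK_RATES[stage]
    if stage == "auto_send" and last_failure_rate > 0.05:
        rate = 1.0  # failure rate above 5% escalates the batch to full review
    if rate >= 1.0:
        return list(batch)
    return [pkg for pkg in batch if rng.random() < rate]
```

Passing the sampler an explicit `rng` keeps the selection reproducible in tests while staying random in production.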

Frequently Asked Questions

What should a QA framework for outbound agents include?

A complete framework covers four dimensions: targeting accuracy (is this the right account and contact?), research quality (is the signal real, recent, and relevant?), draft quality (does the email connect the signal to the product correctly?), and compliance (no prohibited content, opt-out history respected). Each dimension should have a pass/fail criterion, not just a subjective judgment.

How often should you review agent-generated outbound?

Review 100% of output during validation pilots. Once quality is proven, structured spot-check sampling — typically 10–20% of batch volume — is sufficient for most campaigns. Any new signal type, new ICP segment, or new message angle should revert to 100% review until that configuration is validated.

What is the most common failure in agent-generated outbound?

Signal-to-angle disconnect: the agent surfaces a real signal but drafts a message that does not connect it to the product's value. The opener references the signal correctly, but the product pitch is generic. This is the most common quality failure and is detectable with a simple test — does the body of the email make the signal relevant to what you sell?

How do you handle QA failures in batch outbound?

Flag the account back to the research queue with a specific failure note (e.g., 'no Tier 1 signal found', 'draft angle generic'). Do not send low-quality output to increase volume. Track failure rates by failure type — if signal-not-found failures are high, the targeting is too aggressive; if draft-angle failures are high, the angle-mapping needs calibration.
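The failure-rate bookkeeping described above can be a few lines of code. A minimal sketch, assuming one logged failure-type string per failed package; the 20% hint threshold is illustrative, not a recommendation:

```python
from collections import Counter


def failure_rates(failure_log, batch_size):
    """failure_log: one failure-type string per failed package in the batch."""
    counts = Counter(failure_log)
    return {ftype: n / batch_size for ftype, n in counts.items()}


def calibration_hints(rates, threshold=0.2):  # threshold is illustrative
    hints = []
    if rates.get("no_tier1_signal", 0) > threshold:
        hints.append("targeting too aggressive")
    if rates.get("draft_angle_generic", 0) > threshold:
        hints.append("angle mapping needs calibration")
    return hints
```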

Should QA be a human review or can it be automated?

Both. Automated checks can flag obvious failures: empty research fields, contact role mismatch, email length violations, blocked domain sends. Subjective quality — angle specificity, product connection — requires human judgment. Most teams use automated pre-screening to route high-confidence packages to spot-check and low-confidence packages to full review.

Build Outbound Quality You Can Trust at Scale

Ayegent surfaces research quality scores, draft pass/fail indicators, and batch-level metrics — so your QA process catches failures before they reach the send queue.