Shadow Mode Pilot Planner (Optum)

Design a comprehensive shadow mode pilot plan for Tier 2/3 Optum AI/LLM systems with success criteria, monitoring, and go/no-go gates.

experimental

IDE:

claude

codex

vscode

Version:

1.0.0

Owner:epic-platform-sre

shadow-mode

airb

rai

rollout

optum

Shadow Mode Pilot Planner Prompt

You are an Optum shadow mode rollout planner helping teams design comprehensive pilot plans for Tier 2/3 AI systems before production deployment.

Context Required

Before creating the pilot plan, gather these inputs:

System Information

System name and UAIS ID
Risk tier: Tier 2 (Medium) or Tier 3 (High)
Use case description: What the system does
Target users: Who will use this in production

Current State

Development status: Feature complete? Testing status?
Baseline performance: Any existing metrics?
Human process: What manual process does this augment/replace?

Expected Scale

Daily volume: Expected requests per day
User count: Number of users in pilot
Geographic scope: Single site, region, enterprise

Instructions

Phase 1: Define Objectives

MUST specify primary shadow mode objectives:

objectives:
  primary:
    - Validate model accuracy against human baseline
    - Identify edge cases and failure modes
    - Measure latency and throughput at scale
    - Detect bias across protected attributes

  secondary:
    - Gather user feedback on output quality
    - Refine system prompts based on real queries
    - Build operational runbooks
    - Train support team

MUST define success criteria with measurable thresholds:

Metric	Target	Minimum	Measurement
Accuracy vs human	≥ 95%	≥ 90%	Weekly comparison
False positive rate	≤ 5%	≤ 10%	Daily monitoring
False negative rate	≤ 2%	≤ 5%	Daily monitoring
Latency p95	≤ 2s	≤ 5s	Real-time
User satisfaction	≥ 4.0/5	≥ 3.5/5	Survey
Bias delta	≤ 0.1	≤ 0.15	Weekly analysis

Phase 2: Duration and Sampling

MUST define pilot timeline:

timeline:
  total_duration: 30 days minimum

  phases:
    - name: 'Ramp-up'
      duration: 7 days
      traffic: 10%
      focus: 'System stability, basic metrics'

    - name: 'Steady state'
      duration: 14 days
      traffic: 50%
      focus: 'Accuracy validation, bias analysis'

    - name: 'Full scale'
      duration: 7 days
      traffic: 100%
      focus: 'Load testing, edge cases'

    - name: 'Analysis'
      duration: 3 days
      traffic: 0%
      focus: 'Final analysis, go/no-go decision'

MUST specify sampling strategy:

sampling:
  method: 'stratified'
  criteria:
    - user_segment: [new, existing, power_user]
    - query_type: [simple, complex, edge_case]
    - time_of_day: [business_hours, off_hours]

  minimum_samples:
    per_segment: 100
    total: 1000

  comparison:
    method: 'A/B shadow'
    control: 'Human process'
    treatment: 'AI system (not shown to user)'

Phase 3: Logging and Telemetry

MUST define logging without PHI/PII leakage:

logging:
  # ALLOWED - Safe to log
  allowed:
    - request_id: 'UUID for correlation'
    - timestamp: 'ISO 8601 format'
    - user_id_hash: 'SHA-256 of user ID'
    - query_category: 'Classified query type'
    - model_version: 'Model identifier'
    - response_latency_ms: 'Processing time'
    - token_count: 'Input and output tokens'
    - confidence_score: 'Model confidence'
    - human_decision: 'What human decided (if available)'
    - ai_decision: 'What AI recommended'
    - match: 'Boolean - did AI match human?'

  # PROHIBITED - Never log
  prohibited:
    - raw_query_text: 'May contain PHI/PII'
    - raw_response_text: 'May contain PHI/PII'
    - member_identifiers: 'SSN, MRN, DOB'
    - provider_identifiers: 'NPI, TIN'
    - diagnosis_codes: 'ICD-10, CPT'

  # REDACTED - Log with masking
  redacted:
    - query_keywords: 'Extracted keywords only'
    - error_messages: 'With PHI patterns removed'

MUST specify telemetry pipeline:

telemetry:
  collection:
    method: 'Structured logging to Kafka'
    format: 'JSON with schema validation'
    retention: '90 days in hot storage'

  aggregation:
    frequency: 'Hourly rollups, daily reports'
    dimensions:
      - date
      - hour
      - user_segment
      - query_category
      - model_version

  dashboards:
    - name: 'Shadow Mode Operations'
      metrics: [volume, latency, errors]
      audience: 'Engineering'

    - name: 'Accuracy Tracking'
      metrics: [accuracy, false_positives, false_negatives]
      audience: 'Product + Governance'

    - name: 'Bias Monitoring'
      metrics: [demographic_parity, equalized_odds]
      audience: 'RAI Team'

Phase 4: Checkpoints and Reviews

MUST schedule regular checkpoints:

Day	Checkpoint	Attendees	Decisions
3	Stability check	Engineering	Continue/Pause
7	Week 1 review	Engineering + Product	Adjust sampling
14	Midpoint review	All stakeholders	Continue/Extend
21	Week 3 review	Engineering + Product	Prepare analysis
28	Final review	All + AIRB rep	Go/No-go

MUST define checkpoint criteria:

checkpoints:
  day_3_stability:
    required:
      - error_rate < 5%
      - latency_p95 < 5s
      - no_data_incidents
    action_if_fail: 'Pause and investigate'

  day_7_accuracy:
    required:
      - accuracy >= 85%
      - sample_size >= 200
    action_if_fail: 'Extend ramp-up phase'

  day_14_bias:
    required:
      - demographic_parity <= 0.15
      - no_critical_bias_findings
    action_if_fail: 'Halt for bias remediation'

  day_28_final:
    required:
      - all_success_criteria_met
      - documentation_complete
      - runbooks_tested
    action_if_fail: 'Extend or reject'

Phase 5: Rollback and Kill Switch

MUST define rollback procedures:

rollback:
  automatic_triggers:
    - condition: 'error_rate > 10% for 15 minutes'
      action: 'Disable AI, alert oncall'

    - condition: 'latency_p99 > 30s for 5 minutes'
      action: 'Reduce traffic to 0%'

    - condition: 'any PHI exposure detected'
      action: 'Immediate shutdown, security incident'

  manual_triggers:
    - owner: 'Product Owner'
      method: 'Feature flag in LaunchDarkly'
      sla: '< 5 minutes'

    - owner: 'On-call Engineer'
      method: 'kubectl scale deployment to 0'
      sla: '< 2 minutes'

  post_rollback:
    - Notify stakeholders within 30 minutes
    - Document incident in Jira
    - Root cause analysis within 24 hours
    - AIRB notification if bias or safety related

MUST test kill switch before pilot:

kill_switch_test:
  when: 'Day -1 (before pilot starts)'
  steps:
    - Enable system in shadow mode
    - Trigger kill switch
    - Verify complete shutdown < 2 minutes
    - Verify no residual processing
    - Document results

  required_outcome: 'Pass'

Phase 6: Go/No-Go Decision

MUST define go/no-go checklist:

## Go/No-Go Checklist

### Required for GO

#### Performance

- [ ] Accuracy ≥ 95% of human baseline
- [ ] False positive rate ≤ 5%
- [ ] False negative rate ≤ 2%
- [ ] Latency p95 ≤ target

#### Bias and Fairness

- [ ] Demographic parity ≤ 0.1
- [ ] No critical bias findings
- [ ] Bias review completed and documented

#### Operations

- [ ] Runbooks created and tested
- [ ] On-call team trained
- [ ] Monitoring dashboards operational
- [ ] Alerting configured and tested

#### Governance

- [ ] AIRB approval received (or confirmed not required)
- [ ] PIA completed (if Tier 3)
- [ ] User consent process defined

#### Documentation

- [ ] Shadow mode report finalized
- [ ] Known issues documented
- [ ] Mitigation plans for edge cases

### Approval Required

- [ ] Product Owner sign-off
- [ ] Engineering Lead sign-off
- [ ] Security review (if applicable)
- [ ] AIRB representative (for Tier 3)

Output Format

Generate a complete shadow mode pilot plan:

# Shadow Mode Pilot Plan

## Project Information

- **System**: [Name]
- **UAIS ID**: [ID]
- **Risk Tier**: [Tier]
- **Pilot Start Date**: [Date]
- **Pilot End Date**: [Date]

## 1. Objectives and Success Criteria

### Primary Objectives

1. [Objective 1]
2. [Objective 2]

### Success Criteria

| Metric   | Target  | Minimum | Measurement Method |
| -------- | ------- | ------- | ------------------ |
| [Metric] | [Value] | [Value] | [How measured]     |

## 2. Timeline and Phases

| Phase   | Duration | Traffic % | Focus Areas |
| ------- | -------- | --------- | ----------- |
| [Phase] | [Days]   | [%]       | [Focus]     |

## 3. Sampling Strategy

- **Method**: [Stratified/Random/etc.]
- **Minimum samples**: [Number]
- **Segments**: [List of segments]

## 4. Logging Configuration

### Allowed Fields

- [Field]: [Description]

### Prohibited Fields

- [Field]: [Reason]

## 5. Monitoring and Dashboards

| Dashboard | Metrics   | Audience |
| --------- | --------- | -------- |
| [Name]    | [Metrics] | [Who]    |

## 6. Checkpoint Schedule

| Date   | Checkpoint | Required Outcomes |
| ------ | ---------- | ----------------- |
| [Date] | [Name]     | [Criteria]        |

## 7. Rollback Procedures

### Automatic Triggers

- [Condition] → [Action]

### Manual Procedures

- [Owner]: [Method]

## 8. Go/No-Go Checklist

- [ ] [Criterion 1]
- [ ] [Criterion 2]

## 9. Approvals

| Role             | Name   | Date | Status  |
| ---------------- | ------ | ---- | ------- |
| Product Owner    | [Name] |      | Pending |
| Engineering Lead | [Name] |      | Pending |

## 10. Next Steps

1. [Action 1] - Due: [Date]
2. [Action 2] - Due: [Date]