
Tutorial 9: Multi-LLM Code Reviews

Learn to leverage multiple AI models for comprehensive code reviews, getting diverse perspectives on security, performance, architecture, and best practices. This powerful technique catches issues that single-model reviews might miss.

Learning Objectives

By the end of this tutorial, you'll:

  • ✅ Understand multi-LLM review benefits
  • ✅ Configure different review focuses
  • ✅ Interpret consensus and disagreements
  • ✅ Act on review recommendations
  • ✅ Create custom review workflows

Prerequisites

  • Completed previous tutorials
  • A project with some code
  • 60 minutes of time

Why Multi-LLM Reviews?

Different AI models have different strengths:

  • GPT-4: Architecture and design patterns
  • Claude: Code quality and readability
  • Gemini: Security and performance
  • Specialized Models: Domain-specific insights

Combined, they provide comprehensive analysis.

Part 1: Basic Multi-LLM Review

Simple Review

In your project:

bash
/review

This triggers:

  1. Code analysis across multiple models
  2. Consensus building on findings
  3. Prioritized recommendations
  4. Actionable suggestions
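Conceptually, the fan-out-and-merge step can be sketched as follows. Note that `Finding` and `mergeFindings` are illustrative names for this sketch, not Orchestre APIs: findings reported by every model become consensus issues, and the rest stay model-specific.

```typescript
// Minimal sketch of the consensus-building step across model reviews.
// `Finding` is an assumed shape, not a real Orchestre type.
interface Finding {
  file: string;
  issue: string;
  severity: "high" | "medium" | "low";
}

// Merge findings from all models: issues every model reported become
// consensus findings; everything else stays model-specific.
function mergeFindings(perModel: Finding[][]): {
  consensus: Finding[];
  modelSpecific: Finding[];
} {
  const key = (f: Finding) => `${f.file}:${f.issue}`;
  const counts = new Map<string, number>();
  for (const findings of perModel) {
    for (const f of findings) {
      counts.set(key(f), (counts.get(key(f)) ?? 0) + 1);
    }
  }
  const seen = new Set<string>();
  const consensus: Finding[] = [];
  const modelSpecific: Finding[] = [];
  for (const f of perModel.flat()) {
    if (seen.has(key(f))) continue;
    seen.add(key(f));
    (counts.get(key(f)) === perModel.length ? consensus : modelSpecific).push(f);
  }
  return { consensus, modelSpecific };
}
```

Prioritization then follows naturally: consensus findings are surfaced first, sorted by severity.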

Understanding the Output

🔍 Multi-LLM Code Review

## Consensus Issues (All Models Agree)
⚠️ HIGH: SQL Injection vulnerability in api/search.ts:45
   - Direct string concatenation in query
   - Suggested fix: Use parameterized queries
   
⚠️ MEDIUM: Missing error handling in services/payment.ts:78
   - Unhandled promise rejection
   - Suggested fix: Add try-catch block

## Model-Specific Insights

### GPT-4 (Architecture)
- Consider extracting business logic from controllers
- Repository pattern would improve testability
- Current coupling score: 7.2/10

### Claude (Readability)
- Function getUserData() is too complex (cyclomatic: 15)
- Variable names could be more descriptive
- Documentation coverage: 45%

### Gemini (Performance)
- Database queries in loop at handlers/sync.ts:123
- Missing indexes on frequently queried fields
- Potential memory leak in websocket handler

Part 2: Focused Reviews

Security-Focused Review

bash
/review --security

Models focus on:

  • Injection vulnerabilities
  • Authentication flaws
  • Data exposure risks
  • Encryption issues
  • OWASP top 10

Example output:

🔒 Security-Focused Review

## Critical Security Issues

### Authentication Bypass
Location: middleware/auth.ts:34
typescript
// Vulnerable code
if (user.role === 'admin' || user.id == adminId) {
  // Type coercion vulnerability!
}

Fix: Use strict equality (===)

Exposed Sensitive Data

Location: api/users/route.ts:67

  • Password hashes returned in API response
  • Remove sensitive fields before sending

Missing Rate Limiting

  • No rate limiting on login endpoint
  • Brute force attacks possible
  • Implement rate limiting middleware
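The rate-limiting fix suggested above could look like this sketch of a fixed-window limiter. In production you would typically back this with a shared store such as Redis rather than in-process memory; the class and parameter names here are illustrative only.

```typescript
// Sketch of a fixed-window rate limiter for a login endpoint.
// In-memory only: illustrates the idea, not production-ready.
class RateLimiter {
  private hits = new Map<string, { count: number; windowStart: number }>();

  constructor(
    private maxAttempts: number, // allowed attempts per window
    private windowMs: number,    // window length in milliseconds
  ) {}

  // Returns true if the request is allowed, false if rate-limited.
  allow(clientId: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(clientId);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(clientId, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.maxAttempts;
  }
}
```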

Performance-Focused Review

bash
/review --performance

Focuses on:

  • Query optimization
  • Caching opportunities
  • Bundle sizes
  • Runtime performance
  • Memory usage

Architecture Review

bash
/review --architecture

Analyzes:

  • Design patterns
  • Code organization
  • Dependency management
  • Scalability concerns
  • Maintainability

Part 3: Consensus Mechanisms

Understanding Agreement Levels

bash
/review --show-consensus

Output shows:

## Consensus Analysis

### Strong Agreement (3/3 models)
✅ Need input validation on user endpoints
✅ Database connection pool not configured
✅ Missing unit tests for payment service

### Partial Agreement (2/3 models)
⚡ Consider caching for getProducts()
⚡ Extract magic numbers to constants
⚡ Add logging to error paths

### Single Model Concerns
🤔 GPT-4: Consider event sourcing pattern
🤔 Claude: Rename variables for clarity
🤔 Gemini: Optimize image loading

Confidence Scores

Each finding includes confidence:

  • High (90%+): Clear issue, definite fix
  • Medium (70-89%): Likely issue, suggested fix
  • Low (50-69%): Potential issue, investigate
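The confidence bands above can be expressed as a simple classifier. The band names and cut-offs mirror this tutorial's table; the function itself is a sketch, not part of Orchestre.

```typescript
// Sketch: map a model's confidence score to the bands described above.
type Band = "high" | "medium" | "low" | "discard";

function confidenceBand(score: number): Band {
  if (score >= 0.9) return "high";   // clear issue, definite fix
  if (score >= 0.7) return "medium"; // likely issue, suggested fix
  if (score >= 0.5) return "low";    // potential issue, investigate
  return "discard";                  // too uncertain to report
}
```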

Part 4: Advanced Review Patterns

Pattern 1: Incremental Reviews

Review only changes:

bash
# Review changes since last commit
/review --since HEAD~1

# Review specific files
/review src/api/**.ts --performance

# Review PR changes
/review --pr

Pattern 2: Custom Review Profiles

Create .orchestre/review-profiles.json:

json
{
  "pre-deploy": {
    "focus": ["security", "performance", "errors"],
    "threshold": "medium",
    "models": ["gpt-4", "gemini"],
    "fail-on": ["critical", "high"]
  },
  "daily": {
    "focus": ["architecture", "maintainability"],
    "threshold": "low",
    "models": ["all"]
  }
}

Use profiles:

bash
/review --profile pre-deploy
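A tool consuming `review-profiles.json` might resolve a profile like this sketch. The `ReviewProfile` shape mirrors the example config above; the loader itself is illustrative, not Orchestre's actual implementation.

```typescript
// Sketch: resolve a named profile from a review-profiles.json-style object.
interface ReviewProfile {
  focus: string[];
  threshold: "low" | "medium" | "high";
  models: string[];
  "fail-on"?: string[];
}

function resolveProfile(
  profiles: Record<string, ReviewProfile>,
  name: string,
): ReviewProfile {
  const profile = profiles[name];
  if (!profile) {
    const known = Object.keys(profiles).join(", ");
    throw new Error(`Unknown review profile "${name}" (known: ${known})`);
  }
  return profile;
}
```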

Pattern 3: Review Automation

Create .orchestre/commands/review-before-deploy.md:

markdown
# /review-before-deploy

Comprehensive review before production deployment.

## Prompt

Perform multi-stage review:

1. **Security Review**
   /review --security --threshold high

2. **Performance Review**  
   /review --performance --show-metrics

3. **Error Handling Review**
   /review --focus "error handling, logging"

4. **Dependencies Review**
   - Check for vulnerabilities
   - Verify licenses
   - Identify outdated packages

5. **Production Readiness**
   - Environment variables
   - Error tracking setup
   - Monitoring configuration
   - Backup procedures

Generate deployment checklist based on findings.

Part 5: Acting on Reviews

Prioritization Strategy

bash
/orchestrate "Fix issues from code review, prioritized by impact"

Creates plan like:

## Review Remediation Plan

### Immediate (Security Critical)
1. Fix SQL injection in search
2. Add authentication to admin endpoints
3. Remove exposed API keys

### Next Sprint (Performance)
4. Implement caching layer
5. Optimize database queries
6. Add connection pooling

### Technical Debt (Architecture)
7. Refactor to repository pattern
8. Extract business logic
9. Improve test coverage

Automated Fixes

Some issues can be auto-fixed:

bash
# Auto-fix simple issues
/execute-task "Apply automated fixes from review"

This handles:

  • Code formatting
  • Simple security fixes
  • Basic optimizations
  • Import organization
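One of the simplest fixes in that list, removing leftover `console.log` calls, can be sketched as below. Real tooling would use an AST (for example via a library like ts-morph) rather than line matching; this regex version only illustrates the idea.

```typescript
// Sketch of one "automated fix": strip leftover console.log lines.
// Line-based regex matching is illustrative only; an AST transform
// is the robust approach.
function stripConsoleLogs(source: string): string {
  return source
    .split("\n")
    .filter((line) => !/^\s*console\.log\(/.test(line))
    .join("\n");
}
```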

Review Validation

After fixes:

bash
# Verify fixes
/review --validate-fixes

Shows:

## Fix Validation

✅ SQL injection: FIXED
✅ Missing error handling: FIXED
⚠️ Performance optimization: PARTIAL
   - Query optimization done
   - Caching still pending
❌ Architecture refactor: NOT STARTED

Real-World Example

E-commerce Checkout Review

bash
/review src/checkout --security --performance

Findings:

## Multi-LLM Review: Checkout System

### 🔴 Critical Issues

1. **Payment Token Exposure**
   - Model: Gemini (Security)
   - Confidence: 95%
   - Location: checkout/payment.ts:134
   typescript
   // BAD: Token in frontend
   const token = await stripe.createToken(card)
   
   // GOOD: Token on backend only
   const { clientSecret } = await api.createPaymentIntent()
  2. **Race Condition in Inventory**
    • Models: GPT-4, Claude (Consensus)
    • Confidence: 88%
    • Location: checkout/inventory.ts:78
    • Issue: No transaction isolation
    • Fix: Use database transactions

🟡 Performance Concerns

  1. Sequential API Calls
    • Model: Gemini (Performance)
    • Impact: +800ms latency
    typescript
    // Current: Sequential
    const shipping = await getShipping()
    const tax = await getTax()
    const total = await getTotal()
    
    // Better: Parallel
    const [shipping, tax, total] = await Promise.all([
      getShipping(),
      getTax(), 
      getTotal()
    ])

🟢 Best Practice Suggestions

  1. Add Idempotency
    • Model: GPT-4 (Architecture)
    • Prevent duplicate charges
    • Add idempotency keys
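The idempotency suggestion above amounts to caching the result of a charge under a client-supplied key, so retries return the original result instead of charging twice. This sketch uses an in-memory map and an invented `IdempotentCharger` class purely for illustration; a real implementation would persist keys alongside the stored response.

```typescript
// Sketch: idempotency-key handling to prevent duplicate charges.
// In-memory map is illustrative; persist keys in production.
class IdempotentCharger {
  private results = new Map<string, string>();

  constructor(private charge: (amountCents: number) => string) {}

  // Re-running with the same key returns the original result
  // instead of charging again.
  chargeOnce(idempotencyKey: string, amountCents: number): string {
    const cached = this.results.get(idempotencyKey);
    if (cached !== undefined) return cached;
    const result = this.charge(amountCents);
    this.results.set(idempotencyKey, result);
    return result;
  }
}
```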

Custom Review Workflows

Workflow 1: Pre-commit Hook

Create .orchestre/commands/pre-commit-review.md:

markdown
# /pre-commit-review

Quick review before committing code.

## Prompt

Perform fast, focused review:

1. Check staged files only
2. Focus on:
   - Obvious bugs
   - Security issues
   - Broken imports
   - Console.logs left in

3. If issues found:
   - Block commit
   - Show fixes
   - Offer to apply

Keep under 5 seconds.

Workflow 2: PR Review Assistant

.orchestre/commands/pr-review.md:

markdown
# /pr-review

Comprehensive PR review with summary.

## Prompt

Review pull request:

1. **Change Analysis**
   - What changed and why
   - Impact assessment
   - Risk evaluation

2. **Code Quality**
   - Run full multi-LLM review
   - Focus on changed files
   - Check test coverage

3. **PR Summary**
   Generate markdown summary:
   - Key changes
   - Risks
   - Testing notes
   - Deployment considerations

Post as PR comment.

Best Practices

1. Regular Reviews

bash
# Daily architecture review
/review --architecture --threshold low

# Pre-deploy security
/review --security --threshold high

2. Focus on Consensus

Issues all models agree on are usually real problems.

3. Context Matters

bash
# Different standards for different code
/review src/experiments --relaxed
/review src/payments --strict

4. Track Progress

bash
# Save review results
/review --output review-report.md

# Track improvements
/review --compare-to last-week

5. Custom Training

Document your decisions:

markdown
<!-- CLAUDE.md -->
## Code Review Decisions
- We accept console.log in development files
- We require 80% test coverage
- We use functional components only

Troubleshooting

Conflicting Recommendations

When models disagree:

  1. Consider the context
  2. Check model specialties
  3. Get human judgment
  4. Document decision

Too Many Issues

Focus reviews:

bash
/review --critical-only
/review --focus "security"
/review --threshold high

False Positives

Train the system:

bash
/learn "This pattern is intentional because..."

Practice Exercises

1. Security Audit

bash
/review --security --output security-audit.md
/execute-task "Fix critical security issues"
/review --validate-fixes

2. Performance Sprint

bash
/review --performance
/orchestrate "Performance improvement sprint"
/performance-check --before-after

3. Architecture Refactor

bash
/review --architecture
/orchestrate "Refactor based on architecture review"
/review --compare-before-after

What You've Learned

  • ✅ Leveraged multiple AI models for reviews
  • ✅ Understood consensus mechanisms
  • ✅ Created focused review workflows
  • ✅ Acted on review findings
  • ✅ Built custom review patterns

Next Steps

You now have AI-powered code quality assurance!

Continue to: Advanced: Complex Orchestration →

Enhance your reviews:

  • Create team-specific profiles
  • Integrate with CI/CD
  • Build review dashboards
  • Train on your patterns

Remember: Multi-LLM reviews are like having a team of expert reviewers available 24/7!

Built with ❤️ for the AI Coding community, by Praney Behl