
Tutorial 9: Multi-LLM Code Reviews

Learn to leverage multiple AI models for comprehensive code reviews, getting diverse perspectives on security, performance, architecture, and best practices. This powerful technique catches issues that single-model reviews might miss.

Learning Objectives

By the end of this tutorial, you'll:

  • ✅ Understand multi-LLM review benefits
  • ✅ Configure different review focuses
  • ✅ Interpret consensus and disagreements
  • ✅ Act on review recommendations
  • ✅ Create custom review workflows

Prerequisites

  • Completed previous tutorials
  • A project with some code
  • 60 minutes of time

Why Multi-LLM Reviews?

Different AI models have different strengths:

  • GPT-4: Architecture and design patterns
  • Claude: Code quality and readability
  • Gemini: Security and performance
  • Specialized Models: Domain-specific insights

Combined, they provide comprehensive analysis.

Part 1: Basic Multi-LLM Review

Simple Review

In your project:

bash
/review

This triggers:

  1. Code analysis across multiple models
  2. Consensus building on findings
  3. Prioritized recommendations
  4. Actionable suggestions
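Conceptually, the fan-out-and-merge step can be sketched as follows. Note that `Finding` and `mergeFindings` are illustrative names for this sketch, not Orchestre APIs: findings reported by every model become consensus issues, and the rest stay model-specific.

```typescript
// Minimal sketch of the consensus-building step across model reviews.
// `Finding` is an assumed shape, not a real Orchestre type.
interface Finding {
  file: string;
  issue: string;
  severity: "high" | "medium" | "low";
}

// Merge findings from all models: issues every model reported become
// consensus findings; everything else stays model-specific.
function mergeFindings(perModel: Finding[][]): {
  consensus: Finding[];
  modelSpecific: Finding[];
} {
  const key = (f: Finding) => `${f.file}:${f.issue}`;
  const counts = new Map<string, number>();
  for (const findings of perModel) {
    for (const f of findings) {
      counts.set(key(f), (counts.get(key(f)) ?? 0) + 1);
    }
  }
  const seen = new Set<string>();
  const consensus: Finding[] = [];
  const modelSpecific: Finding[] = [];
  for (const f of perModel.flat()) {
    if (seen.has(key(f))) continue;
    seen.add(key(f));
    (counts.get(key(f)) === perModel.length ? consensus : modelSpecific).push(f);
  }
  return { consensus, modelSpecific };
}
```

Prioritization then follows naturally: consensus findings are surfaced first, sorted by severity.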

Understanding the Output

🔍 Multi-LLM Code Review

## Consensus Issues (All Models Agree)
⚠️ HIGH: SQL Injection vulnerability in api/search.ts:45
   - Direct string concatenation in query
   - Suggested fix: Use parameterized queries
   
⚠️ MEDIUM: Missing error handling in services/payment.ts:78
   - Unhandled promise rejection
   - Suggested fix: Add try-catch block

## Model-Specific Insights

### GPT-4 (Architecture)
- Consider extracting business logic from controllers
- Repository pattern would improve testability
- Current coupling score: 7.2/10

### Claude (Readability)
- Function getUserData() is too complex (cyclomatic: 15)
- Variable names could be more descriptive
- Documentation coverage: 45%

### Gemini (Performance)
- Database queries in loop at handlers/sync.ts:123
- Missing indexes on frequently queried fields
- Potential memory leak in websocket handler

Part 2: Focused Reviews

Security-Focused Review

bash
/review --security

Models focus on:

  • Injection vulnerabilities
  • Authentication flaws
  • Data exposure risks
  • Encryption issues
  • OWASP top 10

Example output:

🔒 Security-Focused Review

## Critical Security Issues

### Authentication Bypass
Location: middleware/auth.ts:34
typescript
// Vulnerable code
if (user.role === 'admin' || user.id == adminId) {
  // Type coercion vulnerability!
}

Fix: Use strict equality (===)

Exposed Sensitive Data

Location: api/users/route.ts:67

  • Password hashes returned in API response
  • Remove sensitive fields before sending

Missing Rate Limiting

  • No rate limiting on login endpoint
  • Brute force attacks possible
  • Implement rate limiting middleware
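The rate-limiting fix suggested above could look like this sketch of a fixed-window limiter. In production you would typically back this with a shared store such as Redis rather than in-process memory; the class and parameter names here are illustrative only.

```typescript
// Sketch of a fixed-window rate limiter for a login endpoint.
// In-memory only: illustrates the idea, not production-ready.
class RateLimiter {
  private hits = new Map<string, { count: number; windowStart: number }>();

  constructor(
    private maxAttempts: number, // allowed attempts per window
    private windowMs: number,    // window length in milliseconds
  ) {}

  // Returns true if the request is allowed, false if rate-limited.
  allow(clientId: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(clientId);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(clientId, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.maxAttempts;
  }
}
```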

Performance-Focused Review

bash
/review --performance

Focuses on:

  • Query optimization
  • Caching opportunities
  • Bundle sizes
  • Runtime performance
  • Memory usage

Architecture Review

bash
/review --architecture

Analyzes:

  • Design patterns
  • Code organization
  • Dependency management
  • Scalability concerns
  • Maintainability

Part 3: Consensus Mechanisms

Understanding Agreement Levels

bash
/review --show-consensus

Output shows:

## Consensus Analysis

### Strong Agreement (3/3 models)
✅ Need input validation on user endpoints
✅ Database connection pool not configured
✅ Missing unit tests for payment service

### Partial Agreement (2/3 models)
⚡ Consider caching for getProducts()
⚡ Extract magic numbers to constants
⚡ Add logging to error paths

### Single Model Concerns
🤔 GPT-4: Consider event sourcing pattern
🤔 Claude: Rename variables for clarity
🤔 Gemini: Optimize image loading

Confidence Scores

Each finding includes confidence:

  • High (90%+): Clear issue, definite fix
  • Medium (70-89%): Likely issue, suggested fix
  • Low (50-69%): Potential issue, investigate
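The confidence bands above can be expressed as a simple classifier. The band names and cut-offs mirror this tutorial's table; the function itself is a sketch, not part of Orchestre.

```typescript
// Sketch: map a model's confidence score to the bands described above.
type Band = "high" | "medium" | "low" | "discard";

function confidenceBand(score: number): Band {
  if (score >= 0.9) return "high";   // clear issue, definite fix
  if (score >= 0.7) return "medium"; // likely issue, suggested fix
  if (score >= 0.5) return "low";    // potential issue, investigate
  return "discard";                  // too uncertain to report
}
```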

Part 4: Advanced Review Patterns

Pattern 1: Incremental Reviews

Review only changes:

bash
# Review changes since last commit
/review --since HEAD~1

# Review specific files
/review src/api/**.ts --performance

# Review PR changes
/review --pr

Pattern 2: Custom Review Profiles

Create .orchestre/review-profiles.json:

json
{
  "pre-deploy": {
    "focus": ["security", "performance", "errors"],
    "threshold": "medium",
    "models": ["gpt-4", "gemini"],
    "fail-on": ["critical", "high"]
  },
  "daily": {
    "focus": ["architecture", "maintainability"],
    "threshold": "low",
    "models": ["all"]
  }
}

Use profiles:

bash
/review --profile pre-deploy
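A tool consuming `review-profiles.json` might resolve a profile like this sketch. The `ReviewProfile` shape mirrors the example config above; the loader itself is illustrative, not Orchestre's actual implementation.

```typescript
// Sketch: resolve a named profile from a review-profiles.json-style object.
interface ReviewProfile {
  focus: string[];
  threshold: "low" | "medium" | "high";
  models: string[];
  "fail-on"?: string[];
}

function resolveProfile(
  profiles: Record<string, ReviewProfile>,
  name: string,
): ReviewProfile {
  const profile = profiles[name];
  if (!profile) {
    const known = Object.keys(profiles).join(", ");
    throw new Error(`Unknown review profile "${name}" (known: ${known})`);
  }
  return profile;
}
```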

Pattern 3: Review Automation

Create .orchestre/commands/review-before-deploy.md:

markdown
# /review-before-deploy

Comprehensive review before production deployment.

## Prompt

Perform multi-stage review:

1. **Security Review**
   /review --security --threshold high

2. **Performance Review**  
   /review --performance --show-metrics

3. **Error Handling Review**
   /review --focus "error handling, logging"

4. **Dependencies Review**
   - Check for vulnerabilities
   - Verify licenses
   - Identify outdated packages

5. **Production Readiness**
   - Environment variables
   - Error tracking setup
   - Monitoring configuration
   - Backup procedures

Generate deployment checklist based on findings.

Part 5: Acting on Reviews

Prioritization Strategy

bash
/orchestrate "Fix issues from code review, prioritized by impact"

Creates plan like:

## Review Remediation Plan

### Immediate (Security Critical)
1. Fix SQL injection in search
2. Add authentication to admin endpoints
3. Remove exposed API keys

### Next Sprint (Performance)
4. Implement caching layer
5. Optimize database queries
6. Add connection pooling

### Technical Debt (Architecture)
7. Refactor to repository pattern
8. Extract business logic
9. Improve test coverage

Automated Fixes

Some issues can be auto-fixed:

bash
# Auto-fix simple issues
/execute-task "Apply automated fixes from review"

This handles:

  • Code formatting
  • Simple security fixes
  • Basic optimizations
  • Import organization
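One of the simplest fixes in that list, removing leftover `console.log` calls, can be sketched as below. Real tooling would use an AST (for example via a library like ts-morph) rather than line matching; this regex version only illustrates the idea.

```typescript
// Sketch of one "automated fix": strip leftover console.log lines.
// Line-based regex matching is illustrative only; an AST transform
// is the robust approach.
function stripConsoleLogs(source: string): string {
  return source
    .split("\n")
    .filter((line) => !/^\s*console\.log\(/.test(line))
    .join("\n");
}
```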

Review Validation

After fixes:

bash
# Verify fixes
/review --validate-fixes

Shows:

## Fix Validation

✅ SQL injection: FIXED
✅ Missing error handling: FIXED
⚠️ Performance optimization: PARTIAL
   - Query optimization done
   - Caching still pending
❌ Architecture refactor: NOT STARTED

Real-World Example

E-commerce Checkout Review

bash
/review src/checkout --security --performance

Findings:

## Multi-LLM Review: Checkout System

### 🔴 Critical Issues

1. **Payment Token Exposure**
   - Model: Gemini (Security)
   - Confidence: 95%
   - Location: checkout/payment.ts:134
   typescript
   // BAD: Token in frontend
   const token = await stripe.createToken(card)
   
   // GOOD: Token on backend only
   const { clientSecret } = await api.createPaymentIntent()
  2. **Race Condition in Inventory**
    • Models: GPT-4, Claude (Consensus)
    • Confidence: 88%
    • Location: checkout/inventory.ts:78
    • Issue: No transaction isolation
    • Fix: Use database transactions

🟡 Performance Concerns

  1. Sequential API Calls
    • Model: Gemini (Performance)
    • Impact: +800ms latency
    typescript
    // Current: Sequential
    const shipping = await getShipping()
    const tax = await getTax()
    const total = await getTotal()
    
    // Better: Parallel
    const [shipping, tax, total] = await Promise.all([
      getShipping(),
      getTax(), 
      getTotal()
    ])

🟢 Best Practice Suggestions

  1. Add Idempotency
    • Model: GPT-4 (Architecture)
    • Prevent duplicate charges
    • Add idempotency keys
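The idempotency suggestion above amounts to caching the result of a charge under a client-supplied key, so retries return the original result instead of charging twice. This sketch uses an in-memory map and an invented `IdempotentCharger` class purely for illustration; a real implementation would persist keys alongside the stored response.

```typescript
// Sketch: idempotency-key handling to prevent duplicate charges.
// In-memory map is illustrative; persist keys in production.
class IdempotentCharger {
  private results = new Map<string, string>();

  constructor(private charge: (amountCents: number) => string) {}

  // Re-running with the same key returns the original result
  // instead of charging again.
  chargeOnce(idempotencyKey: string, amountCents: number): string {
    const cached = this.results.get(idempotencyKey);
    if (cached !== undefined) return cached;
    const result = this.charge(amountCents);
    this.results.set(idempotencyKey, result);
    return result;
  }
}
```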

Custom Review Workflows

Workflow 1: Pre-commit Hook

Create .orchestre/commands/pre-commit-review.md:

markdown
# /pre-commit-review

Quick review before committing code.

## Prompt

Perform fast, focused review:

1. Check staged files only
2. Focus on:
   - Obvious bugs
   - Security issues
   - Broken imports
   - Console.logs left in

3. If issues found:
   - Block commit
   - Show fixes
   - Offer to apply

Keep under 5 seconds.

Workflow 2: PR Review Assistant

.orchestre/commands/pr-review.md:

markdown
# /pr-review

Comprehensive PR review with summary.

## Prompt

Review pull request:

1. **Change Analysis**
   - What changed and why
   - Impact assessment
   - Risk evaluation

2. **Code Quality**
   - Run full multi-LLM review
   - Focus on changed files
   - Check test coverage

3. **PR Summary**
   Generate markdown summary:
   - Key changes
   - Risks
   - Testing notes
   - Deployment considerations

Post as PR comment.

Best Practices

1. Regular Reviews

bash
# Daily architecture review
/review --architecture --threshold low

# Pre-deploy security
/review --security --threshold high

2. Focus on Consensus

Issues all models agree on are usually real problems.

3. Context Matters

bash
# Different standards for different code
/review src/experiments --relaxed
/review src/payments --strict

4. Track Progress

bash
# Save review results
/review --output review-report.md

# Track improvements
/review --compare-to last-week

5. Custom Training

Document your decisions:

markdown
<!-- CLAUDE.md -->
## Code Review Decisions
- We accept console.log in development files
- We require 80% test coverage
- We use functional components only

Troubleshooting

Conflicting Recommendations

When models disagree:

  1. Consider the context
  2. Check model specialties
  3. Get human judgment
  4. Document decision

Too Many Issues

Focus reviews:

bash
/review --critical-only
/review --focus "security"
/review --threshold high

False Positives

Train the system:

bash
/learn "This pattern is intentional because..."

Practice Exercises

1. Security Audit

bash
/review --security --output security-audit.md
/execute-task "Fix critical security issues"
/review --validate-fixes

2. Performance Sprint

bash
/review --performance
/orchestrate "Performance improvement sprint"
/performance-check --before-after

3. Architecture Refactor

bash
/review --architecture
/orchestrate "Refactor based on architecture review"
/review --compare-before-after

What You've Learned

  • ✅ Leveraged multiple AI models for reviews
  • ✅ Understood consensus mechanisms
  • ✅ Created focused review workflows
  • ✅ Acted on review findings
  • ✅ Built custom review patterns

Next Steps

You now have AI-powered code quality assurance!

Continue to: Advanced: Complex Orchestration →

Enhance your reviews:

  • Create team-specific profiles
  • Integrate with CI/CD
  • Build review dashboards
  • Train on your patterns

Remember: Multi-LLM reviews are like having a team of expert reviewers available 24/7!

Built with ❤️ for the AI Coding community, by Praney Behl