# Tutorial 9: Multi-LLM Code Reviews
Learn to leverage multiple AI models for comprehensive code reviews, getting diverse perspectives on security, performance, architecture, and best practices. This powerful technique catches issues that single-model reviews might miss.
## Learning Objectives
By the end of this tutorial, you'll:
- ✅ Understand multi-LLM review benefits
- ✅ Configure different review focuses
- ✅ Interpret consensus and disagreements
- ✅ Act on review recommendations
- ✅ Create custom review workflows
## Prerequisites
- Completed previous tutorials
- A project with some code
- 60 minutes of time
## Why Multi-LLM Reviews?
Different AI models have different strengths:
- GPT-4: Architecture and design patterns
- Claude: Code quality and readability
- Gemini: Security and performance
- Specialized Models: Domain-specific insights
Combined, they provide comprehensive analysis.
## Part 1: Basic Multi-LLM Review
### Simple Review
In your project:
```bash
/review
```
This triggers:
- Code analysis across multiple models
- Consensus building on findings
- Prioritized recommendations
- Actionable suggestions
### Understanding the Output
🔍 Multi-LLM Code Review
## Consensus Issues (All Models Agree)
⚠️ HIGH: SQL Injection vulnerability in api/search.ts:45
- Direct string concatenation in query
- Suggested fix: Use parameterized queries
⚠️ MEDIUM: Missing error handling in services/payment.ts:78
- Unhandled promise rejection
- Suggested fix: Add try-catch block
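The two consensus fixes above are small enough to sketch in TypeScript. This is an illustrative sketch, not the tutorial's code: it assumes a node-postgres-style `query(text, params)` client with `$1` placeholders, and uses a tiny stub in place of a real database so the example is self-contained.

```typescript
// Stub client with a node-postgres-style query(text, params) signature.
// It records calls instead of hitting a database (assumption for the demo).
type QueryCall = { text: string; params: unknown[] };

const db = {
  calls: [] as QueryCall[],
  async query(text: string, params: unknown[] = []) {
    this.calls.push({ text, params });
    return { rows: [] as unknown[] };
  },
};

// Fix 1: parameterized query — user input travels as a bound parameter
// and is never concatenated into the SQL text.
async function searchProducts(term: string) {
  return db.query('SELECT * FROM products WHERE name ILIKE $1', [`%${term}%`]);
}

// Fix 2: handle the awaited call so a rejected promise cannot go unhandled.
async function searchSafely(term: string) {
  try {
    return await searchProducts(term);
  } catch (err) {
    console.error('search failed', err);
    return { rows: [] as unknown[] };
  }
}
```

Even a hostile term such as `'; DROP TABLE products;--` ends up in the parameter array, not in the SQL string.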
## Model-Specific Insights
### GPT-4 (Architecture)
- Consider extracting business logic from controllers
- Repository pattern would improve testability
- Current coupling score: 7.2/10
### Claude (Readability)
- Function getUserData() is too complex (cyclomatic: 15)
- Variable names could be more descriptive
- Documentation coverage: 45%
### Gemini (Performance)
- Database queries in loop at handlers/sync.ts:123
- Missing indexes on frequently queried fields
- Potential memory leak in websocket handler
## Part 2: Focused Reviews
### Security-Focused Review
```bash
/review --security
```
Models focus on:
- Injection vulnerabilities
- Authentication flaws
- Data exposure risks
- Encryption issues
- OWASP top 10
Example output:
🔒 Security-Focused Review
## Critical Security Issues
### Authentication Bypass
Location: middleware/auth.ts:34
```typescript
// Vulnerable code
if (user.role === 'admin' || user.id == adminId) {
// Type coercion vulnerability!
}
```
Fix: Use strict equality (`===`).
### Exposed Sensitive Data
Location: api/users/route.ts:67
- Password hashes returned in API response
- Remove sensitive fields before sending
### Missing Rate Limiting
- No rate limiting on login endpoint
- Brute force attacks possible
- Implement rate limiting middleware
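Two of these fixes are compact enough to sketch. The names below (`sanitizeUser`, `RateLimiter`) are invented for the example, and the limiter is in-memory only; a production setup would typically back this with a shared store such as Redis.

```typescript
// Fix for exposed sensitive data: strip hash fields before a user object
// leaves the API.
function sanitizeUser<T extends { passwordHash?: string }>(user: T) {
  const { passwordHash, ...safe } = user;
  return safe;
}

// Fix for missing rate limiting: a minimal fixed-window limiter for a login
// endpoint. In-memory, so it resets on restart and is per-process only.
class RateLimiter {
  private hits = new Map<string, { count: number; windowStart: number }>();
  constructor(
    private limit: number,
    private windowMs: number,
    private now: () => number = Date.now, // injectable clock for testing
  ) {}

  allow(key: string): boolean {
    const t = this.now();
    const entry = this.hits.get(key);
    if (!entry || t - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { count: 1, windowStart: t });
      return true;
    }
    entry.count++;
    return entry.count <= this.limit;
  }
}
```

For example, `new RateLimiter(5, 60_000)` allows five login attempts per minute per key (e.g. per IP).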
### Performance-Focused Review
```bash
/review --performance
```
Focuses on:
- Query optimization
- Caching opportunities
- Bundle sizes
- Runtime performance
- Memory usage
### Architecture Review
```bash
/review --architecture
```
Analyzes:
- Design patterns
- Code organization
- Dependency management
- Scalability concerns
- Maintainability
## Part 3: Consensus Mechanisms
### Understanding Agreement Levels
```bash
/review --show-consensus
```
Output shows:
## Consensus Analysis
### Strong Agreement (3/3 models)
✅ Need input validation on user endpoints
✅ Database connection pool not configured
✅ Missing unit tests for payment service
### Partial Agreement (2/3 models)
⚡ Consider caching for getProducts()
⚡ Extract magic numbers to constants
⚡ Add logging to error paths
### Single Model Concerns
🤔 GPT-4: Consider event sourcing pattern
🤔 Claude: Rename variables for clarity
🤔 Gemini: Optimize image loading
### Confidence Scores
Each finding includes confidence:
- High (90%+): Clear issue, definite fix
- Medium (70-89%): Likely issue, suggested fix
- Low (50-69%): Potential issue, investigate
## Part 4: Advanced Review Patterns
### Pattern 1: Incremental Reviews
Review only changes:
```bash
# Review changes since last commit
/review --since HEAD~1
# Review specific files
/review src/api/**.ts --performance
# Review PR changes
/review --pr
```
### Pattern 2: Custom Review Profiles
Create `.orchestre/review-profiles.json`:
```json
{
  "pre-deploy": {
    "focus": ["security", "performance", "errors"],
    "threshold": "medium",
    "models": ["gpt-4", "gemini"],
    "fail-on": ["critical", "high"]
  },
  "daily": {
    "focus": ["architecture", "maintainability"],
    "threshold": "low",
    "models": ["all"]
  }
}
```
Use profiles:
```bash
/review --profile pre-deploy
```
### Pattern 3: Review Automation
Create `.orchestre/commands/review-before-deploy.md`:
```markdown
# /review-before-deploy
Comprehensive review before production deployment.
## Prompt
Perform multi-stage review:
1. **Security Review**
   /review --security --threshold high
2. **Performance Review**
   /review --performance --show-metrics
3. **Error Handling Review**
   /review --focus "error handling, logging"
4. **Dependencies Review**
   - Check for vulnerabilities
   - Verify licenses
   - Identify outdated packages
5. **Production Readiness**
   - Environment variables
   - Error tracking setup
   - Monitoring configuration
   - Backup procedures
Generate deployment checklist based on findings.
```
## Part 5: Acting on Reviews
### Prioritization Strategy
```bash
/orchestrate "Fix issues from code review, prioritized by impact"
```
Creates a plan like:
## Review Remediation Plan
### Immediate (Security Critical)
1. Fix SQL injection in search
2. Add authentication to admin endpoints
3. Remove exposed API keys
### Next Sprint (Performance)
4. Implement caching layer
5. Optimize database queries
6. Add connection pooling
### Technical Debt (Architecture)
7. Refactor to repository pattern
8. Extract business logic
9. Improve test coverage
### Automated Fixes
Some issues can be auto-fixed:
```bash
# Auto-fix simple issues
/execute-task "Apply automated fixes from review"
```
This handles:
- Code formatting
- Simple security fixes
- Basic optimizations
- Import organization
### Review Validation
After fixes:
```bash
# Verify fixes
/review --validate-fixes
```
Shows:
## Fix Validation
✅ SQL injection: FIXED
✅ Missing error handling: FIXED
⚠️ Performance optimization: PARTIAL
- Query optimization done
- Caching still pending
❌ Architecture refactor: NOT STARTED
## Real-World Example
### E-commerce Checkout Review
```bash
/review src/checkout --security --performance
```
Findings:
## Multi-LLM Review: Checkout System
### 🔴 Critical Issues
1. **Payment Token Exposure**
- Model: Gemini (Security)
- Confidence: 95%
- Location: checkout/payment.ts:134
```typescript
// BAD: Token in frontend
const token = await stripe.createToken(card)
// GOOD: Token on backend only
const { clientSecret } = await api.createPaymentIntent()
```
2. **Race Condition in Inventory**
- Models: GPT-4, Claude (Consensus)
- Confidence: 88%
- Location: checkout/inventory.ts:78
- Issue: No transaction isolation
- Fix: Use database transactions
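With a real database, this fix means wrapping the read-check-write in a single transaction (e.g. `SELECT ... FOR UPDATE`). As an in-process illustration of the same isolation idea, here is a sketch with invented names (not the tutorial's code) that serializes reservations through a promise queue, so two concurrent checkouts cannot both claim the last unit:

```typescript
// In-memory stand-in for the inventory fix: serialize the read-check-write
// section so concurrent reservations cannot interleave. A real fix would use
// database transactions / row locks instead of an in-process queue.
class Inventory {
  private lock: Promise<void> = Promise.resolve();
  constructor(private stock: Map<string, number>) {}

  reserve(sku: string, qty: number): Promise<boolean> {
    const attempt = this.lock.then(async () => {
      const available = this.stock.get(sku) ?? 0;
      // Simulated I/O gap where an unserialized version would race:
      await Promise.resolve();
      if (available < qty) return false;
      this.stock.set(sku, available - qty);
      return true;
    });
    // Queue the next caller behind this attempt, success or failure.
    this.lock = attempt.then(() => undefined, () => undefined);
    return attempt;
  }
}
```

Two concurrent `reserve()` calls against a stock of one now yield exactly one success instead of overselling.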
### 🟡 Performance Concerns
1. **Sequential API Calls**
- Model: Gemini (Performance)
- Impact: +800ms latency
```typescript
// Current: Sequential
const shipping = await getShipping()
const tax = await getTax()
const total = await getTotal()

// Better: Parallel
const [shipping, tax, total] = await Promise.all([
  getShipping(),
  getTax(),
  getTotal()
])
```
### 🟢 Best Practice Suggestions
1. **Add Idempotency**
- Model: GPT-4 (Architecture)
- Prevent duplicate charges
- Add idempotency keys
## Custom Review Workflows
### Workflow 1: Pre-commit Hook
`.orchestre/commands/pre-commit-review.md`:
```markdown
# /pre-commit-review
Quick review before committing code.
## Prompt
Perform fast, focused review:
1. Check staged files only
2. Focus on:
- Obvious bugs
- Security issues
- Broken imports
- Console.logs left in
3. If issues found:
- Block commit
- Show fixes
- Offer to apply
Keep under 5 seconds.
```
### Workflow 2: PR Review Assistant
`.orchestre/commands/pr-review.md`:
```markdown
# /pr-review
Comprehensive PR review with summary.
## Prompt
Review pull request:
1. **Change Analysis**
   - What changed and why
   - Impact assessment
   - Risk evaluation
2. **Code Quality**
   - Run full multi-LLM review
   - Focus on changed files
   - Check test coverage
3. **PR Summary**
   Generate markdown summary:
   - Key changes
   - Risks
   - Testing notes
   - Deployment considerations
Post as PR comment.
```
## Best Practices
### 1. Regular Reviews
```bash
# Daily architecture review
/review --architecture --threshold low
# Pre-deploy security
/review --security --threshold high
```
### 2. Focus on Consensus
Issues all models agree on are usually real problems.
### 3. Context Matters
```bash
# Different standards for different code
/review src/experiments --relaxed
/review src/payments --strict
```
### 4. Track Progress
```bash
# Save review results
/review --output review-report.md
# Track improvements
/review --compare-to last-week
```
### 5. Custom Training
Document your decisions:
```markdown
<!-- CLAUDE.md -->
## Code Review Decisions
- We accept console.log in development files
- We require 80% test coverage
- We use functional components only
```
## Troubleshooting
### Conflicting Recommendations
When models disagree:
- Consider the context
- Check model specialties
- Get human judgment
- Document decision
### Too Many Issues
Focus reviews:
```bash
/review --critical-only
/review --focus "security"
/review --threshold high
```
### False Positives
Train the system:
```bash
/learn "This pattern is intentional because..."
```
## Practice Exercises
### 1. Security Audit
```bash
/review --security --output security-audit.md
/execute-task "Fix critical security issues"
/review --validate-fixes
```
### 2. Performance Sprint
```bash
/review --performance
/orchestrate "Performance improvement sprint"
/performance-check --before-after
```
### 3. Architecture Refactor
```bash
/review --architecture
/orchestrate "Refactor based on architecture review"
/review --compare-before-after
```
## What You've Learned
- ✅ Leveraged multiple AI models for reviews
- ✅ Understood consensus mechanisms
- ✅ Created focused review workflows
- ✅ Acted on review findings
- ✅ Built custom review patterns
## Next Steps
You now have AI-powered code quality assurance!
Continue to: Advanced: Complex Orchestration →
Enhance your reviews:
- Create team-specific profiles
- Integrate with CI/CD
- Build review dashboards
- Train on your patterns
Remember: Multi-LLM reviews are like having a team of expert reviewers available 24/7!
