
Building "Raven" - Your AI Trading Agent

Roadmap from Current State to Aurora-Level Capabilities

Created: 2025-10-30
Inspiration: NexusTrade Aurora, OpenAI o1 Trading Strategy
Target: Transform your trading infrastructure into an autonomous AI agent system


🎯 Vision

Build an AI trading agent ("Raven") that combines:
  • Your existing risk management discipline
  • Aurora's genetic optimization capabilities
  • LLM-powered natural language strategy creation
  • Autonomous testing and deployment workflow

Key principle: Enhance what you have, don't replace it.


📊 Current State Analysis

✅ What You Already Have (Strong Foundation)

Infrastructure (tools/portfolio_fetcher.py:1-238):
  • ✅ Multi-broker support (Schwab + Alpaca)
  • ✅ Portfolio tracking with P&L analysis
  • ✅ Paper trading capability (Alpaca $99k)
  • ✅ Trading Command Center dashboard (localhost:3001)

Backtesting (tools/mean_reversion_backtest.py:1-100):
  • ✅ Mean reversion strategy with Bollinger Bands + RSI
  • ✅ Walk-forward backtesting framework
  • ✅ Performance metrics (Sharpe, drawdown, win rate)

Risk Management (~/Documents/memory/entities/preferences/trading-preferences.md:1-202):
  • ✅ Mandatory 3-stage testing (Paper → Small Live → Full Live)
  • ✅ Stop loss discipline (9% crypto, 5-10% stocks)
  • ✅ Position limits ($1,000 default, 50% cash reserve)
  • ✅ Clear success criteria (50% win rate, positive Sharpe)

Current Performance:
  • Win rate: 37.5% (below 50% target)
  • Sharpe: -0.04 (negative)
  • Total P&L: -$11.05
  • Max drawdown: $102.08

Analysis: Strong infrastructure and discipline, but strategy parameters need optimization.
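As a reference point for the numbers above, here is a minimal sketch of how win rate, Sharpe ratio, and max drawdown can be computed. It is not the code in tools/mean_reversion_backtest.py; the function names, the zero risk-free rate, and the 252-period annualization are assumptions, and the trade list is made up (chosen only so that 3 of 8 trades win and the total is -$11.05).

```python
import math
from statistics import mean, stdev


def win_rate(pnls):
    """Fraction of trades with strictly positive P&L."""
    return sum(1 for p in pnls if p > 0) / len(pnls)


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    spread = stdev(returns)
    if spread == 0:
        return 0.0  # no variation: ratio is undefined, report 0
    return mean(returns) / spread * math.sqrt(periods_per_year)


def max_drawdown(equity):
    """Largest peak-to-trough drop of an equity curve, in dollars."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst


# Hypothetical per-trade P&L, for illustration only.
trade_pnls = [12.0, -8.5, -4.0, 6.25, -10.0, -3.0, 5.0, -8.8]
print(f"win rate: {win_rate(trade_pnls):.1%}")
print(f"total P&L: {sum(trade_pnls):.2f}")
```

A win rate below 50% can still be profitable if winners are larger than losers, which is why the success criteria pair win rate with Sharpe rather than using either alone.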

❌ What You're Missing (Aurora's Advantages)

1. Parameter Optimization
  • ❌ Manual parameter selection (RSI thresholds, stop losses, etc.)
  • ❌ No genetic algorithms
  • ❌ No systematic parameter search
  • Impact: Suboptimal parameters → 37.5% win rate instead of 50%+

2. Agentic Orchestration
  • ❌ No ReAct framework (Reasoning + Acting loop)
  • ❌ No autonomous strategy generation
  • ❌ Manual workflow at each step
  • Impact: Slow iteration, requires human intervention

3. Natural Language Interface
  • ❌ Can't describe strategy in plain English → get code
  • ❌ No LLM-powered strategy creation
  • Impact: Must code every strategy variant manually

4. Systematic Validation
  • ❌ No automatic training/validation split
  • ❌ No overfitting detection
  • ❌ No automated paper → live promotion
  • Impact: Risk of deploying overfitted strategies


🚀 Three-Phase Roadmap

Phase 1: Genetic Optimization (2 weeks)

Goal: Add Aurora-style parameter optimization to existing strategies

What You'll Build:
  • Genetic algorithm optimizer (NSGA-II)
  • Multi-objective optimization (Sharpe + Return + Drawdown)
  • Training/validation split with overfitting detection
  • Automated strategy comparison dashboard

Expected Results:
  • Win rate: 37.5% → 50%+
  • Sharpe: -0.04 → 1.0+
  • Optimization time: 21 days → 7 minutes (about 4,300X faster: 21 days is 30,240 minutes)

Deliverables:
  1. tools/genetic_optimizer.py - Core optimizer
  2. tools/optimizable_strategy.py - Base class for strategies
  3. tools/mean_reversion_optimized.py - Optimized mean reversion
  4. CLI: uv run python3 tools/optimize_strategy.py mean_reversion

Success Criteria:
  • ✅ Optimized strategy beats manual on validation data
  • ✅ 1 week paper trading shows positive results
  • ✅ Win rate ≥50%, Sharpe ≥1.0, drawdown ≤20%

Time Investment: 9 days coding + 5 days validation = 2 weeks

Detailed Spec: specs/2025-10-30_genetic-strategy-optimizer.md


Phase 2: Agentic Workflow (2-3 weeks)

Goal: Add autonomous strategy creation and testing loop

What You'll Build:
  • ReAct framework orchestrator
  • FSM (Finite State Machine) for workflow states
  • LLM-powered strategy generation from natural language
  • Autonomous paper trading deployment

Architecture:

┌───────────────────────────────────────────────────────────┐
│                   Raven AI Agent (ReAct)                   │
├───────────────────────────────────────────────────────────┤
│                                                            │
│  User Input (Natural Language):                           │
│  "Create a mean reversion strategy for tech stocks"       │
│                                                            │
│  ┌─────────────────────────────────────────────────┐    │
│  │ State 1: RESEARCH                                │    │
│  │ - Query financial data (MCP)                     │    │
│  │ - Analyze market conditions                      │    │
│  │ - Identify relevant indicators                   │    │
│  └─────────────────────────────────────────────────┘    │
│                        ↓                                   │
│  ┌─────────────────────────────────────────────────┐    │
│  │ State 2: GENERATE                                │    │
│  │ - LLM generates strategy code                    │    │
│  │ - Define parameter space                         │    │
│  │ - Create multiple strategy variants              │    │
│  └─────────────────────────────────────────────────┘    │
│                        ↓                                   │
│  ┌─────────────────────────────────────────────────┐    │
│  │ State 3: BACKTEST                                │    │
│  │ - Run backtests on historical data               │    │
│  │ - Test across multiple time periods              │    │
│  │ - Identify promising strategies                  │    │
│  └─────────────────────────────────────────────────┘    │
│                        ↓                                   │
│  ┌─────────────────────────────────────────────────┐    │
│  │ State 4: OPTIMIZE                                │    │
│  │ - Launch genetic algorithm (Phase 1 tool)        │    │
│  │ - Find optimal parameters                        │    │
│  │ - Validate on unseen data                        │    │
│  └─────────────────────────────────────────────────┘    │
│                        ↓                                   │
│  ┌─────────────────────────────────────────────────┐    │
│  │ State 5: PAPER TRADE                             │    │
│  │ - Deploy to Alpaca paper account                 │    │
│  │ - Monitor for 1 week (mandatory)                 │    │
│  │ - Track win rate, Sharpe, drawdown               │    │
│  └─────────────────────────────────────────────────┘    │
│                        ↓                                   │
│  ┌─────────────────────────────────────────────────┐    │
│  │ State 6: DECISION                                │    │
│  │ IF paper trading successful (50%+ win rate):     │    │
│  │   → Recommend live deployment                    │    │
│  │ ELSE:                                             │    │
│  │   → Return to GENERATE with learnings            │    │
│  └─────────────────────────────────────────────────┘    │
│                                                            │
│  Tools Available:                                         │
│  - search_financial_data()                                │
│  - generate_strategy_code()                               │
│  - run_backtest()                                         │
│  - optimize_parameters()                                  │
│  - deploy_to_paper()                                      │
│  - check_paper_results()                                  │
│                                                            │
└───────────────────────────────────────────────────────────┘
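The six states in the diagram can be sketched as a small state machine. This is only an illustration using the standard library; the planned tools/workflow_fsm.py may look different (for example, built on the transitions package). The RECOMMEND_LIVE terminal state and the next_state function are assumptions introduced here.

```python
from enum import Enum, auto


class State(Enum):
    RESEARCH = auto()
    GENERATE = auto()
    BACKTEST = auto()
    OPTIMIZE = auto()
    PAPER_TRADE = auto()
    DECISION = auto()
    RECOMMEND_LIVE = auto()  # assumed terminal state, not in the diagram


# Unconditional transitions, following the arrows in the diagram.
LINEAR_NEXT = {
    State.RESEARCH: State.GENERATE,
    State.GENERATE: State.BACKTEST,
    State.BACKTEST: State.OPTIMIZE,
    State.OPTIMIZE: State.PAPER_TRADE,
    State.PAPER_TRADE: State.DECISION,
}


def next_state(state, paper_win_rate=None):
    """Return the state that follows `state`.

    DECISION branches on the paper-trading win rate: 50%+ recommends
    live deployment, anything lower loops back to GENERATE.
    """
    if state is State.RECOMMEND_LIVE:
        raise ValueError("RECOMMEND_LIVE is terminal")
    if state is State.DECISION:
        if paper_win_rate is None:
            raise ValueError("DECISION needs a paper-trading win rate")
        return State.RECOMMEND_LIVE if paper_win_rate >= 0.50 else State.GENERATE
    return LINEAR_NEXT[state]
```

Keeping the transition table as plain data makes it easy to review which moves the agent is allowed to make, and the agent never places a live trade itself: the final state only recommends deployment.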

Deliverables:
  1. tools/raven_agent.py - ReAct orchestrator
  2. tools/strategy_generator.py - LLM-powered code generation
  3. tools/workflow_fsm.py - State machine implementation
  4. CLI: uv run python3 tools/raven_agent.py "Create momentum strategy for FAANG stocks"

Success Criteria:
  • ✅ Agent generates valid strategy code from natural language
  • ✅ Autonomous workflow completes without human intervention
  • ✅ Generated strategies pass all validation gates

Time Investment: 2-3 weeks


Phase 3: Multi-Strategy Portfolio (2-3 weeks)

Goal: Optimize portfolio allocation across multiple strategies

What You'll Build:
  • Portfolio optimizer (Modern Portfolio Theory)
  • Strategy correlation analysis
  • Risk parity allocation
  • Ensemble trading system

Key Features:
  • Run 5-10 strategies simultaneously
  • Allocate capital based on Sharpe ratios
  • Rebalance monthly based on performance
  • Diversify across strategy types (momentum, mean reversion, breakout)

Deliverables:
  1. tools/portfolio_optimizer.py - Multi-strategy allocation
  2. tools/strategy_ensemble.py - Ensemble system
  3. Dashboard updates - Multi-strategy view

Success Criteria:
  • ✅ Ensemble Sharpe > best individual strategy
  • ✅ Correlation between strategies <0.7
  • ✅ Drawdown reduced by 30%+
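Two of the Phase 3 ideas, allocating capital by Sharpe ratio and checking that strategies stay below 0.7 correlation, can be sketched in a few lines. This is a simplified illustration, not Modern Portfolio Theory or risk parity; the function names and the rule that non-positive Sharpe strategies receive no capital are assumptions.

```python
import math
from itertools import combinations


def allocate_by_sharpe(sharpes, capital):
    """Split capital in proportion to each strategy's positive Sharpe ratio."""
    weights = {name: max(s, 0.0) for name, s in sharpes.items()}
    total = sum(weights.values())
    if total == 0:
        return {name: 0.0 for name in sharpes}
    return {name: capital * w / total for name, w in weights.items()}


def pearson(xs, ys):
    """Pearson correlation of two equal-length return series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(returns, limit=0.7):
    """Strategy pairs whose correlation meets or exceeds the limit."""
    return [
        (a, b)
        for a, b in combinations(sorted(returns), 2)
        if pearson(returns[a], returns[b]) >= limit
    ]
```

For example, allocate_by_sharpe({"momentum": 1.0, "mean_reversion": 3.0}, 1000) splits the capital 250/750. Pairs returned by correlated_pairs are candidates for dropping or down-weighting, since they add little diversification.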

Time Investment: 2-3 weeks


📋 Phase 1 Detailed Plan (Next 2 Weeks)

Week 1: Core Optimizer

Day 1 (TODAY): Design & Approval
  • ✅ Review roadmap with user
  • ✅ Get approval on Phase 1 scope
  • Install dependencies (pymoo, matplotlib, seaborn)

Day 2-3: Genetic Optimizer Core
  • Implement GeneticOptimizer class
  • Integrate NSGA-II algorithm
  • Test with dummy fitness function
  • Verify Pareto frontier generation

Day 4-5: Strategy Base Class
  • Create OptimizableStrategy abstract class
  • Define parameter space interface
  • Implement train/validation split
  • Add overfitting detection

Day 6-7: Mean Reversion Conversion
  • Convert mean_reversion_backtest.py to optimizable format
  • Define parameter bounds (6 parameters)
  • Implement fitness calculation (3 objectives)
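The Day 4-5 work (abstract base class, parameter space interface, train/validation split) might start from a skeleton like the one below. The method names parameter_bounds, fitness, and sample_parameters are assumptions, not the interface in the Phase 1 spec. The split is chronological on purpose: shuffling price history would leak future data into training.

```python
import random
from abc import ABC, abstractmethod


class OptimizableStrategy(ABC):
    """Hypothetical base class for strategies the optimizer can tune."""

    @abstractmethod
    def parameter_bounds(self):
        """Return {parameter_name: (low, high)} for every tunable parameter."""

    @abstractmethod
    def fitness(self, prices, params):
        """Backtest on `prices` and return (sharpe, total_return, max_drawdown)."""

    def sample_parameters(self, rng):
        """Draw one random parameter set inside the declared bounds."""
        return {
            name: rng.uniform(low, high)
            for name, (low, high) in self.parameter_bounds().items()
        }


def train_validation_split(series, train_fraction=0.7):
    """Chronological split: the first 70% trains, the last 30% validates."""
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Example with ten placeholder bars.
train, validation = train_validation_split(list(range(10)))
rng = random.Random(42)  # seeded so optimization runs are repeatable
```

Because OptimizableStrategy is abstract, a converted mean reversion strategy has to implement both methods before it can be instantiated, which keeps the optimizer's expectations explicit.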

Week 2: Optimization & Validation

Day 8-9: Full Optimization Run
  • Run genetic optimizer on Mean Reversion
  • 20 generations × 20 population = 400 backtests
  • Generate Pareto frontier plots
  • Analyze top 5 strategies

Day 10: Results Analysis
  • Compare optimized vs. manual parameters
  • Create optimization report (PDF)
  • Document parameter distributions

Day 11: Paper Trading Deployment
  • Deploy best strategy to Alpaca paper account
  • Set up monitoring dashboard
  • Configure alerts (win rate, drawdown)

Day 12-14: Paper Trading Monitoring
  • Track performance daily
  • Compare to backtest predictions
  • Detect any anomalies

Day 14 (End of Week 2): Go/No-Go Decision
  • IF successful (50%+ win rate, positive Sharpe):
      • Proceed to Phase 2 (Agentic Workflow)
      • Deploy to small live account ($100-500 positions)
  • IF unsuccessful:
      • Analyze failure modes
      • Iterate on optimizer (more generations, different objectives)


🔧 Technical Architecture

Technology Stack

Existing (keep):
  • Python 3.11+
  • alpaca-py (broker API)
  • pandas, numpy (data processing)
  • FastMCP 2.0 (memory system)
  • Trading Command Center (Next.js dashboard)

New Dependencies (Phase 1):

pymoo = "^0.6.1"       # NSGA-II multi-objective optimization
matplotlib = "^3.8.0"  # Pareto frontier plots
seaborn = "^0.13.0"    # Statistical visualizations

New Dependencies (Phase 2):

langchain = "^0.1.0"   # LLM orchestration
litellm = "^1.0.0"     # Multi-LLM support
transitions = "^0.9.0" # FSM for workflow states

New Dependencies (Phase 3):

scipy = "^1.11.0"      # Optimization algorithms
cvxpy = "^1.4.0"       # Convex optimization

Data Flow (Phase 1)

Historical Data (Alpaca API)
Split: 70% Training / 30% Validation
Genetic Algorithm (NSGA-II)
Generate 20 random parameter sets (Generation 0)
For each parameter set:
    Run backtest on training data
    Calculate: Sharpe, Return, Drawdown
Non-dominated sorting (Pareto ranking)
Selection, Crossover, Mutation
Repeat for 20 generations
Pareto frontier of optimal strategies
Test top 5 on validation data
Best strategy deployed to paper trading
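The "non-dominated sorting" step in the flow above is the part that makes the search multi-objective. The sketch below shows only that step (extracting the first Pareto front) in plain Python; it is not a full NSGA-II, which also needs crowding distance, crossover, and mutation, and which the plan takes from pymoo. The candidate scores are made up for illustration.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and better somewhere.

    Objectives are tuples where larger is always better, so drawdown is
    stored as a negative number.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scored):
    """Names of candidates that no other candidate dominates."""
    return sorted(
        name
        for name, objectives in scored.items()
        if not any(dominates(other, objectives) for other in scored.values())
    )


# Hypothetical scores: (Sharpe, return %, -max drawdown %).
candidates = {
    "A": (2.0, 30.0, -12.0),
    "B": (1.5, 40.0, -15.0),
    "C": (1.4, 25.0, -16.0),  # worse than A on all three objectives
    "D": (1.0, 20.0, -8.0),   # lowest return, but smallest drawdown
}
front = pareto_front(candidates)
```

Here the front contains A, B, and D: each is the best available trade-off for some preference between Sharpe, return, and drawdown, while C is dropped because A beats it on every objective.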

💡 Key Insights from Articles

1. Genetic Algorithms Work (NexusTrade Proof)

Evidence:
  • Austin Starks' UPRO-GLD strategy: 35% return, 3.26 Sharpe (optimized)
  • His Robinhood: +$30,872 YTD (123% return)
  • Tech rebalancing strategy: 43% return (6 months) vs SPY's 16%

Why it works:
  • Explores 1,200+ parameter combinations in 15 seconds
  • Discovers non-obvious patterns (e.g., RSI [28, 72] better than [30, 70])
  • Multi-objective optimization finds balanced solutions

Your opportunity:
  • Current: 37.5% win rate (manual parameters)
  • Expected: 50%+ win rate (optimized parameters)
  • Potential: 2.0+ Sharpe ratio (vs -0.04 current)

2. OpenAI o1 + Agentic Workflow = Market-Beating

Evidence (from o1 trading article):
  • Strategy beat SPY 3X (268% vs ~90%)
  • Single request → full strategy generation
  • Iterative improvement loop

Key difference: o1 model can reason through complex problems
  • Not just predicting next word
  • Actually planning and testing hypotheses

Your advantage:
  • You have discipline (mandatory paper trading)
  • You have infrastructure (backtesting, multi-broker)
  • Adding agentic workflow = best of both worlds

3. MCP for Financial Data = Clean Architecture

Benefits (from Financial Modeling Prep article):
  • Unified interface across data sources
  • No API juggling
  • Structured data (JSON, ready for analysis)
  • Natural fit with LLM agents

Your opportunity:
  • Currently: Direct Alpaca API calls (coupled)
  • Future: MCP layer for Schwab + Alpaca + others
  • Enables: Easy strategy backtesting across brokers


🎯 Success Metrics

Phase 1 (Genetic Optimization)

Quantitative:
  • ✅ Win rate: 37.5% → 50%+
  • ✅ Sharpe ratio: -0.04 → 1.0+
  • ✅ Max drawdown: Reduce by 50%
  • ✅ Optimization time: <10 minutes (vs 21 days manual)

Qualitative:
  • ✅ Clear Pareto frontier visualization
  • ✅ Interpretable parameter recommendations
  • ✅ Confidence in parameter robustness

Phase 2 (Agentic Workflow)

Quantitative:
  • ✅ Strategy generation time: <2 minutes
  • ✅ Valid code generation: 90%+ success rate
  • ✅ End-to-end workflow: <10 minutes

Qualitative:
  • ✅ Natural language interface feels intuitive
  • ✅ Agent explains reasoning clearly
  • ✅ Workflow requires minimal human intervention

Phase 3 (Multi-Strategy Portfolio)

Quantitative:
  • ✅ Portfolio Sharpe > best individual strategy
  • ✅ Strategy correlation <0.7
  • ✅ Drawdown reduction: 30%+

Qualitative:
  • ✅ Diversification benefits visible
  • ✅ Risk-adjusted returns improved
  • ✅ System feels robust


🚨 Risks & Mitigations

Risk 1: Overfitting (HIGH PROBABILITY)

Problem: Optimizer finds parameters perfect for training data, fails on validation/live

Evidence from articles:
  • NexusTrade acknowledges optimization can fail
  • Aurora detects performance degradation automatically
  • Training/validation split is mandatory

Your mitigation strategy:
  1. Always use 70/30 train/validation split
  2. Require validation Sharpe within 30% of training (e.g., 2.0 → 1.4 acceptable)
  3. Mandate 1 week paper trading before live (catches real-world issues)
  4. Use walk-forward windows (test across multiple time periods)

Example of acceptable degradation:
  • Training: Sharpe 2.15, Return 38.7%, Drawdown 12.3%
  • Validation: Sharpe 1.51, Return 27.2%, Drawdown 16.8%
  • Gap: 30% Sharpe drop, 30% return drop, 37% drawdown increase
  • Verdict: Acceptable (within limits)

Example of overfitting (REJECT):
  • Training: Sharpe 3.50, Return 80%, Drawdown 5%
  • Validation: Sharpe 0.40, Return 8%, Drawdown 45%
  • Gap: 89% Sharpe drop, 90% return drop, 9X drawdown
  • Verdict: Severely overfit, do not deploy
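The 30% rule and the two examples above can be turned into a small gate. This is a sketch under the assumption that only the Sharpe drop is checked; the function names and the decision to reject any strategy with a non-positive training Sharpe are choices made here, not part of the spec.

```python
def sharpe_drop(train_sharpe, validation_sharpe):
    """Relative Sharpe degradation from training to validation."""
    return (train_sharpe - validation_sharpe) / train_sharpe


def passes_overfit_check(train_sharpe, validation_sharpe, max_drop=0.30):
    """Accept only if validation Sharpe is within `max_drop` of training."""
    if train_sharpe <= 0:
        return False  # nothing worth validating
    # Small tolerance so a drop of exactly 30% (2.0 -> 1.4) is not
    # rejected because of floating-point rounding.
    return sharpe_drop(train_sharpe, validation_sharpe) <= max_drop + 1e-9


# The two examples from the text.
acceptable = passes_overfit_check(2.15, 1.51)  # drop of about 29.8%
overfit = passes_overfit_check(3.50, 0.40)     # drop of about 88.6%
```

With these inputs the first strategy passes and the second is rejected, matching the verdicts above. The same pattern extends to return and drawdown if you want all three gaps enforced.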

Risk 2: Market Regime Change (MEDIUM PROBABILITY)

Problem: Strategy optimized for bull market, fails in bear market

Evidence:
  • Your current performance: -$11.05 P&L (37.5% win rate)
  • Possible regime mismatch (optimized for different conditions)

Your mitigation strategy:
  1. Include 2020 crash data in training (if available)
  2. Test across multiple market regimes (bull, bear, sideways)
  3. Monitor regime indicators (VIX, market breadth)
  4. Re-optimize monthly (adapt to changing conditions)

Risk 3: Complexity Creep (MEDIUM PROBABILITY)

Problem: System becomes too complex to understand/maintain

Evidence from experience:
  • Aurora is powerful but proprietary
  • Your current system is maintainable (simple Python scripts)

Your mitigation strategy:
  1. Keep Phase 1 simple (just genetic optimizer)
  2. Don't over-engineer (avoid premature generalization)
  3. Document everything (specs, decisions, learnings)
  4. Maintain backward compatibility (keep existing tools working)

Risk 4: Parameter Instability (LOW PROBABILITY)

Problem: Optimal parameters change every week (not robust)

Your mitigation strategy:
  1. Track parameter drift over time
  2. Use ensemble of top 5 strategies (not just #1)
  3. Set conservative parameter bounds (avoid extremes)
  4. Require parameters stable across multiple optimization runs
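One way to check "stable across multiple optimization runs" is to compare the best parameters from several runs and flag any parameter whose range is large relative to its mean. The sketch below assumes a 10% relative spread as the threshold; that number, the function name, and the sample runs are illustrative only.

```python
def unstable_parameters(runs, max_relative_spread=0.10):
    """Names of parameters whose (max - min) / |mean| exceeds the threshold.

    `runs` is a list of {parameter_name: value} dicts, one per
    optimization run, all with the same keys.
    """
    unstable = []
    for name in runs[0]:
        values = [run[name] for run in runs]
        width = max(values) - min(values)
        centre = sum(values) / len(values)
        if width == 0:
            spread = 0.0
        elif centre == 0:
            spread = float("inf")
        else:
            spread = width / abs(centre)
        if spread > max_relative_spread:
            unstable.append(name)
    return unstable


# Hypothetical best parameters from three separate runs.
runs = [
    {"rsi_oversold": 28, "stop_loss_pct": 0.05},
    {"rsi_oversold": 29, "stop_loss_pct": 0.09},
    {"rsi_oversold": 28, "stop_loss_pct": 0.07},
]
flagged = unstable_parameters(runs)
```

In this made-up data the RSI threshold barely moves, while the stop loss swings between 5% and 9% and gets flagged. A flagged parameter suggests the optimizer is fitting noise for that setting, so a fixed conservative value may be safer.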


📚 Learning from NexusTrade/Aurora

What Aurora Does Well (Copy This)

  1. Genetic Optimization
     • Uses NSGA-II (proven algorithm)
     • Multi-objective (not just max returns)
     • 1,200+ backtests in 15 seconds (fast iteration)

  2. Workflow Automation
     • Research → Generate → Backtest → Optimize → Paper Trade
     • Finite State Machine (clear states)
     • Autonomous decision-making (minimize human intervention)

  3. Overfitting Detection
     • Training/validation split mandatory
     • Performance degradation alerts
     • Won't deploy if validation fails

  4. User Communication
     • Clear output ("Best Sharpe: 2.15 | Return: 38.7%")
     • Visualizations (Pareto frontier plots)
     • Multiple options (top 5 strategies, not just #1)

What Aurora Doesn't Do (Your Advantage)

  1. Risk Management Discipline
     • Aurora users can skip paper trading
     • No mandatory stop losses
     • No position size limits
     Your advantage: Enforce these rules in code (no opt-out)

  2. Multi-Broker Support
     • Aurora only supports certain brokers
     • Not integrated with Schwab
     Your advantage: Schwab + Alpaca, potential to add more

  3. Custom Strategies
     • Aurora focuses on rebalancing strategies
     • Limited strategy types
     Your advantage: Build any strategy type (momentum, mean reversion, options, etc.)

  4. Open Source
     • Aurora is proprietary
     • Can't customize deeply
     Your advantage: Full control over code, can adapt to your needs
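"Enforce these rules in code" could look like a pre-trade check that every order must pass. The limits below come from your stated risk rules ($1,000 position limit, 50% cash reserve, 9% crypto stop loss, up to 10% for stocks); the function name, the argument names, and reading the stop-loss figures as maximum allowed distances are assumptions.

```python
MAX_POSITION_USD = 1_000.0
MIN_CASH_RESERVE = 0.50                          # fraction of account equity
MAX_STOP_LOSS = {"crypto": 0.09, "stock": 0.10}  # assumed to be upper bounds


def order_violations(order_usd, cash_usd, equity_usd, asset_class, stop_loss_pct):
    """Return a list of rule violations; an empty list means the order may go out."""
    violations = []
    if order_usd > MAX_POSITION_USD:
        violations.append("position exceeds the $1,000 limit")
    if cash_usd - order_usd < MIN_CASH_RESERVE * equity_usd:
        violations.append("cash reserve would fall below 50%")
    if stop_loss_pct is None:
        violations.append("stop loss is mandatory")
    elif stop_loss_pct > MAX_STOP_LOSS[asset_class]:
        violations.append("stop loss is wider than allowed")
    return violations


# A $1,000 stock order with a 5% stop on a $10,000 account holding $6,000 cash.
ok = order_violations(1_000, 6_000, 10_000, "stock", 0.05)
# A $1,500 crypto order with no stop on the same account.
bad = order_violations(1_500, 6_000, 10_000, "crypto", None)
```

Returning every violation, rather than stopping at the first, gives the agent (or you) a complete explanation of why an order was blocked. There is deliberately no override flag.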


🎓 Lessons from OpenAI o1 Trading Strategy

Key Insight: LLMs Can Reason About Trading

Evidence (from o1 article):
  • Beat market 3X (268% vs SPY's ~90%)
  • "Vibe-trading": describe strategy → get code
  • Iterative improvement loop

What this means for you:
  1. LLMs are ready for trading (not just toy examples)
  2. Natural language interface is viable (Phase 2 goal)
  3. Reasoning models (o1, Sonnet) can plan multi-step workflows

How to Apply This

Phase 1 (no LLM yet):
  • Focus on optimization (mechanical, no LLM needed)
  • Build foundation for Phase 2

Phase 2 (LLM integration):
  • Use LLM to generate strategy code from natural language
  • Use LLM to analyze backtest results
  • Use LLM to decide next actions (ReAct loop)

Phase 3 (advanced LLM):
  • Use LLM for market regime detection
  • Use LLM for portfolio allocation decisions
  • Use LLM for trade execution timing

Model Recommendations:
  • Strategy generation: Claude Sonnet 4 (best coding)
  • Reasoning/Planning: OpenAI o1 (best multi-step reasoning)
  • Data analysis: GPT-4 Turbo (fast, cheap)
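The ReAct loop mentioned for Phase 2 alternates between a reasoning step (pick the next action) and an acting step (call a tool, record the observation). The sketch below shows only that control flow. The policy is a scripted stand-in where an LLM call would go, the tools are stubs named after three entries in the architecture diagram, and react_loop itself is a hypothetical name.

```python
def react_loop(policy, tools, goal, max_steps=10):
    """Run thought -> action -> observation steps until the policy finishes."""
    history = []
    for _ in range(max_steps):
        thought, action, argument = policy(goal, history)
        if action == "finish":
            break
        observation = tools[action](argument)
        history.append(
            {"thought": thought, "action": action, "observation": observation}
        )
    return history


PLAN = ["run_backtest", "optimize_parameters", "deploy_to_paper"]


def scripted_policy(goal, history):
    """Stand-in for the LLM: walks a fixed plan, then finishes."""
    step = len(history)
    if step < len(PLAN):
        return (f"step {step + 1} toward: {goal}", PLAN[step], goal)
    return ("all planned steps done", "finish", None)


# Stub tools that only report they were called.
stub_tools = {name: (lambda arg, name=name: f"{name} ok") for name in PLAN}

trace = react_loop(scripted_policy, stub_tools, "mean reversion for tech stocks")
```

The max_steps cap matters once a real model replaces scripted_policy: it bounds both API cost and the damage from a loop that never decides to finish.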


🔄 Integration with Existing System

What Stays the Same

✅ Keep using:
  • portfolio_fetcher.py (multi-account tracking)
  • Trading Command Center dashboard (UI)
  • Schwab + Alpaca accounts (brokers)
  • Your risk management rules (stop losses, position limits)
  • Your testing protocol (Paper → Small Live → Full Live)

✅ Don't break:
  • Existing execution scripts (execute_*.py)
  • Existing backtest scripts (just enhance them)
  • Memory system (FastMCP)

What Changes

Phase 1 additions:
  • ✅ tools/genetic_optimizer.py (NEW)
  • ✅ tools/optimizable_strategy.py (NEW)
  • ✅ Enhance mean_reversion_backtest.py (add optimization interface)

Phase 2 additions:
  • ✅ tools/raven_agent.py (NEW)
  • ✅ tools/strategy_generator.py (NEW)
  • ✅ tools/workflow_fsm.py (NEW)

Phase 3 additions:
  • ✅ tools/portfolio_optimizer.py (NEW)
  • ✅ Dashboard updates (multi-strategy view)


💰 Cost Analysis

Current Costs (Baseline)

  • Alpaca Paper: $0 (free)
  • Alpaca Live: $0/month (no subscription)
  • Schwab: $0/month (no subscription)
  • LM Studio: $0 (local LLM)
  • Trading Command Center: $0 (self-hosted)

Total current: $0/month

Phase 1 Costs (Genetic Optimization)

  • New dependencies: $0 (all open-source)
  • Compute: $0 (local, 10 min optimizations)
  • No cloud APIs needed

Phase 1 total: $0/month

Phase 2 Costs (Agentic Workflow)

  • LLM API costs:
      • Claude Sonnet 4: ~$3/million input tokens
      • Strategy generation: ~20k tokens/request
      • Cost per strategy: ~$0.06
      • Expected: 10 strategies/month = $0.60/month

Phase 2 total: ~$1/month
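The Phase 2 estimate is simple arithmetic, reproduced below so it can be rerun when prices or usage change. It counts input tokens only, as the figures above do; output tokens are billed separately and are not included, so treat the result as a lower bound. The variable names are illustrative.

```python
INPUT_PRICE_PER_MILLION_TOKENS = 3.00  # USD, figure quoted above
TOKENS_PER_STRATEGY = 20_000
STRATEGIES_PER_MONTH = 10

cost_per_strategy = TOKENS_PER_STRATEGY / 1_000_000 * INPUT_PRICE_PER_MILLION_TOKENS
monthly_cost = cost_per_strategy * STRATEGIES_PER_MONTH

print(f"per strategy: ${cost_per_strategy:.2f}")
print(f"per month:    ${monthly_cost:.2f}")
```

Rounding the monthly figure up to about $1 leaves headroom for retries and for the extra calls a ReAct loop makes while analyzing backtest results.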

Phase 3 Costs (Multi-Strategy Portfolio)

  • No additional API costs
  • Slightly more compute (multiple strategies running)

Phase 3 total: ~$1/month

Comparison to NexusTrade

  • NexusTrade Premium: $20/month
  • Your system: $1/month (95% cheaper)
  • Savings: $19/month × 12 = $228/year

📅 Timeline Summary

| Phase          | Duration  | Start      | End       | Key Deliverable                                     |
|----------------|-----------|------------|-----------|-----------------------------------------------------|
| Phase 1        | 2 weeks   | Week 1     | Week 2    | Genetic optimizer working, Mean Reversion optimized |
| Paper Trading  | 1 week    | Week 3     | Week 3    | Validate optimized strategy in paper account        |
| Decision Point | -         | End Week 3 | -         | Go/No-Go for Phase 2                                |
| Phase 2        | 2-3 weeks | Week 4     | Week 6-7  | Agentic workflow, natural language interface        |
| Phase 3        | 2-3 weeks | Week 7-8   | Week 9-11 | Multi-strategy portfolio optimizer                  |

Total timeline: 7-11 weeks (2-3 months)


🎯 Next Steps (Today)

User Decisions Needed

1. Approve Phase 1 Scope?
  • Build genetic optimizer for Mean Reversion strategy
  • 2 weeks coding + 1 week validation
  • Expected: 50%+ win rate, 1.0+ Sharpe ratio

Your answer: [ ]

2. Primary Objective for Optimization?
  • Option A: Sharpe ratio (risk-adjusted returns) ← Your preference?
  • Option B: Total return (maximize profits, ignore risk)
  • Option C: Balanced (composite score)

Your answer: [ ]

3. Historical Data Period?
  • Option A: about 2 years 10 months (2023-01-01 to 2025-10-30)
  • Option B: 5 years (more robust, if data available)

Your answer: [ ]

4. Paper Trading Duration?
  • Option A: 1 week minimum (fast iteration)
  • Option B: 2 weeks (more confidence)

Your answer: [ ]

5. Proceed with Implementation?
  • [ ] Yes - Start Phase 1 today
  • [ ] No - Need more information (what?)
  • [ ] Modified scope - Explain changes

Your answer: [ ]


Specifications:
  • specs/2025-10-30_genetic-strategy-optimizer.md - Full Phase 1 technical spec

Memory Entities:
  • ~/Documents/memory/entities/preferences/trading-preferences.md - Your risk rules
  • ~/Documents/memory/entities/projects/trading-command-center.md - Dashboard project

Existing Code:
  • tools/portfolio_fetcher.py - Multi-account tracking
  • tools/mean_reversion_backtest.py - Strategy to optimize
  • tools/strategy_tester.py - Paper trading execution

Inspiration Articles (Medium):
  • "How to backtest hundreds of strategies in 15 seconds" - Austin Starks
  • "I wanted to build an AI that trades stocks" - Austin Starks (Aurora)
  • "Using OpenAI o1 to create trading strategy" - 268% returns
  • "Create a Stock Screener with MCP Servers" - MCP for financial data


✅ Status

Document Status: ✅ Complete, awaiting user approval

Next Action: User review + answer 5 questions above

After Approval: Begin Phase 1 implementation (install pymoo, start coding optimizer)


Last updated: 2025-10-30
Questions? Review detailed spec: specs/2025-10-30_genetic-strategy-optimizer.md