HubSpot Developer Agent - Implementation Plan
Status: 🚧 In Progress
Priority: P0 - CRITICAL (10 hrs/week time savings)
Complexity: Medium
Estimated Time: 16-24 hours
ROI: Eliminate manual CRM data entry, proactive client health tracking
Table of Contents
- Executive Summary
- Current Infrastructure
- Architecture Overview
- Implementation Phases
- Code Implementation
- Testing Strategy
- Deployment
- Success Metrics
Executive Summary
Problem Statement
IT Raven generates 10,000+ client emails annually, but HubSpot CRM data entry is manual, time-consuming, and inconsistent. Critical client interactions aren't being tracked, leading to missed follow-ups and at-risk accounts.
Solution
Build an intelligent HubSpot Developer Agent that:
- Auto-syncs M365 emails → HubSpot contacts/companies/deals
- Extracts action items, commitments, and issues from email content
- Tracks client health scores based on communication frequency
- Alerts on at-risk clients (no contact >30 days)
- Generates weekly client intelligence reports
Business Impact
- Time Savings: 10 hrs/week manual data entry eliminated
- Revenue Protection: Proactive at-risk client identification
- Data Quality: 100% of client email interactions tracked
- Scalability: Handle 2x client growth without CRM admin overhead
Current Infrastructure
✅ What You Already Have
1. HubSpot MCP Server (tools/hubspot_mcp_server.py)
- FastMCP-based server with stdio/HTTP transport
- Tools: get_contacts, search_contacts, create_contact, etc.
- OpenTelemetry tracing integrated
- Credentials configured in .env.hubspot
2. IT Raven Automation (tools/it_raven_hubspot_automation.py)
- Auto-links contacts to companies by email domain
- Domain caching to reduce API calls
- Dry-run mode for testing
- Weekly LaunchAgent execution
3. HubSpot Credentials
HUBSPOT_ACCESS_TOKEN=<redacted; stored in .env.hubspot>
HUBSPOT_CLIENT_SECRET=<redacted; stored in .env.hubspot>
HUBSPOT_PORTAL_ID=443524610
4. M365 Email Data
- 10,000+ IT Raven client emails in ~/Documents/memory/entities/itraven-email/
- Already indexed in Qdrant
- Includes: sender, recipient, subject, body, timestamp
- Client discovery JSON with 100+ companies
5. Existing HubSpot Utilities (18 scripts)
- CRM exporters, bulk importers, workflow fetchers
- Audit tools, health checkers
- Proven patterns for rate limiting and error handling
🔨 What Needs to Be Built
1. Email Intelligence Extractor
- Parse M365 emails for CRM-relevant information
- Extract: client names, action items, issues, commitments
- Identify email type (support ticket, sales inquiry, meeting, etc.)
2. Smart CRM Sync Engine
- Auto-create/update HubSpot contacts from email senders
- Auto-create/update companies from email domains
- Auto-create deals for new projects/opportunities
- Log email interactions as HubSpot engagements
3. Client Health Monitor
- Track days since last contact per client
- Calculate health scores (email frequency, issue resolution time)
- Alert on at-risk clients (>30 days, unresolved critical issues)
4. Lifecycle Stage Automation
- Move contacts through stages based on email patterns
- Handle backwards movement (active → inactive → churned)
- Trigger workflows based on stage changes
5. Weekly Intelligence Report Generator
- Top active clients this week
- At-risk clients requiring attention
- Unresolved action items
- Communication trends
Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│ M365 Email Source │
│ ~/Documents/memory/entities/itraven-email/*.md │
│ (10,000+ emails, indexed in Qdrant) │
└────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Email Intelligence Extractor │
│ - Parse email metadata and content │
│ - Identify client company from domain/email │
│ - Extract action items using NER/LLM │
│ - Classify email type (support, sales, meeting) │
│ - Calculate urgency/priority │
└────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Smart CRM Sync Engine │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Contacts: Create/update from email senders │ │
│ │ - Extract: firstname, lastname, email, phone │ │
│ │ - Set: lifecycle stage, last contact date │ │
│ └─────────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Companies: Create/update from domains │ │
│ │ - Match domain → company name (Clearbit/manual) │ │
│ │ - Set: industry, size, website │ │
│ └─────────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Deals: Create from project emails │ │
│ │ - Detect keywords: "quote", "proposal", "RFP" │ │
│ │ - Extract: deal amount, timeline, services │ │
│ └─────────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Engagements: Log all email interactions │ │
│ │ - Type: EMAIL, timestamp, subject, body excerpt │ │
│ │ - Link to: contact + company + deal (if any) │ │
│ └─────────────────────────────────────────────────────┘ │
└────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ HubSpot MCP Server │
│ (Existing: tools/hubspot_mcp_server.py) │
│ - FastMCP with get/create/update/search tools │
│ - Rate limiting with exponential backoff │
│ - Token refresh automation │
└────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ HubSpot CRM (Portal: 443524610) │
│ - Contacts: IT Raven client contacts │
│ - Companies: Client organizations │
│ - Deals: Active projects/opportunities │
│ - Engagements: Email interaction log │
└─────────────────────────────────────────────────────────────┘
Data Flow
Email → CRM Sync (Hourly)
1. Scan new emails in ~/Documents/memory/entities/itraven-email/
2. Extract sender email + company domain
3. Check HubSpot: Contact exists? Company exists?
4. Create/update Contact with latest info
5. Link Contact ← → Company (if not already)
6. Create Engagement record for this email
7. Update "Last Contact Date" on both Contact + Company
8. Extract action items → Create Tasks in HubSpot
Client Health Monitor (Daily)
1. Query all IT Raven companies from HubSpot
2. Calculate days since last contact
3. Flag if >30 days (at-risk), >90 days (churned)
4. Check for unresolved critical issues (from emails)
5. Generate alert email with at-risk client list
Weekly Intelligence Report (Monday 8am)
1. Top 10 most active clients this week
2. At-risk clients requiring attention
3. New companies added
4. Open action items summary
5. Send to IT Raven team + store in Notion
Implementation Phases
Phase 1: Email Intelligence Extractor (4-6 hours)
Goal: Parse M365 emails and extract CRM-relevant structured data
Tasks:
1. ✅ Read email files from ~/Documents/memory/entities/itraven-email/
2. ✅ Parse metadata: From, To, Subject, Date, Body
3. ✅ Extract sender email and company domain
4. ✅ Classify email type (support, sales, meeting, general)
5. ✅ Extract action items using LLM (LM Studio local model)
6. ✅ Identify urgency/priority keywords
7. ✅ Store extracted data in structured JSON
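Tasks 2 and 3 above can be sketched with the standard library alone. The function below is a minimal illustration, not the planned implementation: the name `parse_sender` and the returned keys are assumptions, and the first-name/last-name split is deliberately naive.

```python
from email.utils import parseaddr


def parse_sender(from_header: str) -> dict:
    """Split a From header into name parts, address, and domain (hypothetical helper)."""
    display_name, address = parseaddr(from_header)
    address = address.strip().lower()
    # Everything after the last "@" is treated as the company domain.
    domain = address.rpartition("@")[2] if "@" in address else ""
    # Naive split: first token is the first name, the remainder is the last name.
    first, _, last = display_name.strip().partition(" ")
    return {
        "firstname": first,
        "lastname": last.strip(),
        "email": address,
        "domain": domain,
    }


print(parse_sender("Jane Doe <Jane.Doe@Acme-Corp.com>"))
```

Lower-casing the address before lookup keeps contact matching by email stable when the same sender appears with different capitalization.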
Deliverables:
- tools/email_intelligence_extractor.py
- Output: JSON with parsed email data for CRM sync
Testing:
# Test on 10 sample emails
uv run python3 tools/email_intelligence_extractor.py --limit 10 --verbose
# Expected output: JSON with contacts, companies, action items extracted
Phase 2: Smart CRM Sync Engine (6-8 hours)
Goal: Auto-sync extracted email data to HubSpot CRM
Tasks:
1. ✅ Load extracted email intelligence JSON
2. ✅ For each email sender:
   - Check if contact exists (by email)
   - Create new contact if not found
   - Update existing contact (last contact date, lifecycle stage)
3. ✅ For each company domain:
   - Check if company exists (by domain property)
   - Create new company if not found
   - Update existing company (last contact date)
4. ✅ Link contact ← → company association
5. ✅ Create engagement record for this email interaction
6. ✅ Extract and create tasks for action items
7. ✅ Implement rate limiting (100 req/10s, with exponential backoff)
8. ✅ Handle lifecycle stage updates (including backwards movement)
Deliverables:
- tools/hubspot_email_sync.py
- Dry-run mode for safe testing
- Comprehensive logging
Testing:
# Dry run (no actual CRM changes)
uv run python3 tools/hubspot_email_sync.py --dry-run --limit 50
# Live run on small batch
uv run python3 tools/hubspot_email_sync.py --limit 10
# Full sync (after validation)
uv run python3 tools/hubspot_email_sync.py
Phase 3: Client Health Monitor (3-4 hours)
Goal: Track client engagement and identify at-risk accounts
Tasks:
1. ✅ Query all IT Raven companies from HubSpot
2. ✅ Calculate "days since last contact" for each
3. ✅ Categorize:
   - Active: <30 days
   - At-Risk: 30-90 days
   - Churned: >90 days
4. ✅ Check for unresolved critical issues (from email extraction)
5. ✅ Generate health score (0-100) per client
6. ✅ Send alert email with at-risk list
7. ✅ Store health data in Notion database
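The categorization in task 3 reduces to a date difference and two thresholds. A minimal sketch follows; the function name `categorize_client` and the lowercase labels are assumptions, while the 30- and 90-day boundaries come from the task list above.

```python
from datetime import date


def categorize_client(last_contact: date, today: date) -> str:
    """Map days since last contact to active / at_risk / churned (hypothetical helper)."""
    days = (today - last_contact).days
    if days < 30:
        return "active"
    if days <= 90:
        return "at_risk"  # 30-90 days inclusive
    return "churned"      # more than 90 days


print(categorize_client(date(2025, 10, 1), date(2025, 11, 13)))
```

Passing `today` explicitly rather than calling `date.today()` inside the function keeps the boundaries easy to unit test.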
Deliverables:
- tools/client_health_monitor.py
- Daily LaunchAgent execution (8am)
- Alert email template
Testing:
# Generate health report
uv run python3 tools/client_health_monitor.py --verbose
# Expected: List of at-risk clients with days since last contact
Phase 4: Lifecycle Stage Automation (2-3 hours)
Goal: Auto-update contact lifecycle stages based on email patterns
Tasks:
1. ✅ Define stage transition rules:
   - Lead → MQL: First meaningful response
   - MQL → SQL: Meeting scheduled
   - SQL → Opportunity: Quote/proposal sent
   - Opportunity → Customer: Contract signed
   - Customer → Active: Regular communication (<30 days)
   - Active → Inactive: No contact 30-90 days
   - Inactive → Churned: No contact >90 days
2. ✅ Implement backwards movement workaround (clear + set)
3. ✅ Trigger HubSpot workflows on stage changes
4. ✅ Log stage transitions with reasoning
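Deciding whether a transition counts as "backwards" needs an explicit stage order. The sketch below assumes the order is passed in as a list; `FUNNEL_ORDER` and `is_backwards_movement` are illustrative names, and the labels are shorthand rather than HubSpot internal values. Where Active, Inactive, and Churned sit relative to Customer depends on how the portal's stages are configured, so they are left out of the example order.

```python
# Illustrative order covering only the funnel stages from the rules above.
FUNNEL_ORDER = ["lead", "mql", "sql", "opportunity", "customer"]


def is_backwards_movement(current: str, new: str, order=FUNNEL_ORDER) -> bool:
    """Return True when `new` comes earlier in `order` than `current`."""
    if current not in order or new not in order:
        # Unknown stage: treat as not backwards and let the caller decide.
        return False
    return order.index(new) < order.index(current)


print(is_backwards_movement("customer", "lead"))
print(is_backwards_movement("lead", "mql"))
```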
Deliverables:
- tools/lifecycle_stage_manager.py
- Integration with CRM sync engine
Testing:
Phase 5: Weekly Intelligence Report (2-3 hours)
Goal: Automated weekly summary of client activity and health
Tasks:
1. ✅ Query HubSpot for:
   - Top 10 most active clients (email count this week)
   - At-risk clients (30-90 days since contact)
   - New companies added this week
   - Open action items summary
2. ✅ Generate markdown report
3. ✅ Send via email to IT Raven team
4. ✅ Store in Notion "Weekly Reports" database
5. ✅ Create LaunchAgent for Monday 8am execution
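The "top 10 most active clients" query and the markdown rendering can be sketched as a count over a seven-day window. This is an illustration only: it assumes email records are dicts with hypothetical `company` and `date` keys, and both function names are made up for the example.

```python
from collections import Counter
from datetime import date, timedelta


def top_active_clients(emails, week_start: date, limit: int = 10):
    """Count emails per company in the 7 days starting at week_start."""
    week_end = week_start + timedelta(days=7)
    counts = Counter(
        e["company"] for e in emails if week_start <= e["date"] < week_end
    )
    return counts.most_common(limit)


def render_top_clients(rows) -> str:
    """Render (company, count) pairs as a numbered markdown section."""
    lines = ["## Top Active Clients", ""]
    lines += [f"{i}. {name}: {n}" for i, (name, n) in enumerate(rows, 1)]
    return "\n".join(lines)


sample = [
    {"company": "Acme", "date": date(2025, 11, 11)},
    {"company": "Acme", "date": date(2025, 11, 12)},
    {"company": "Globex", "date": date(2025, 11, 13)},
]
print(render_top_clients(top_active_clients(sample, date(2025, 11, 10))))
```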
Deliverables:
- tools/weekly_intelligence_report.py
- Email template (HTML + markdown)
- Notion integration
Testing:
# Generate test report
uv run python3 tools/weekly_intelligence_report.py --week 2025-11-11
# Expected: Markdown report with client activity summary
Code Implementation
Phase 1: Email Intelligence Extractor
File: tools/email_intelligence_extractor.py
Key Functions:
from pathlib import Path
from typing import Dict, List

# Stubs only: each function is still to be implemented.
def parse_email_file(filepath: Path) -> Dict:
    """Parse email markdown file and extract structured data."""
    pass

def extract_contact_info(email_data: Dict) -> Dict:
    """Extract firstname, lastname, email, company from sender."""
    pass

def classify_email_type(subject: str, body: str) -> str:
    """Classify: support, sales, meeting, general."""
    pass

def extract_action_items(body: str) -> List[Dict]:
    """Use LLM to extract action items with assignee and due date."""
    pass

def calculate_urgency(subject: str, body: str) -> str:
    """Determine: critical, high, medium, low."""
    pass
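As one possible body for the `classify_email_type` stub, a keyword count is enough for a first pass. This is a sketch under assumptions: only "quote", "proposal", and "RFP" come from this plan (the deal-detection keywords in the architecture diagram); every other keyword is a placeholder to be tuned against real IT Raven emails.

```python
# Keyword lists are illustrative; only quote/proposal/rfp appear in the plan.
TYPE_KEYWORDS = {
    "support": ("error", "issue", "outage", "not working", "ticket"),
    "sales": ("quote", "proposal", "rfp", "pricing"),
    "meeting": ("meeting", "schedule", "calendar", "call"),
}


def classify_email_type(subject: str, body: str) -> str:
    """Return the label with the most keyword hits, or 'general' if none match."""
    text = f"{subject} {body}".lower()
    scores = {
        label: sum(keyword in text for keyword in keywords)
        for label, keywords in TYPE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)  # ties resolve to the first label listed
    return best if scores[best] > 0 else "general"


print(classify_email_type("Request for quote", "Please send a proposal"))
```

Substring matching is crude (for example, "call" also matches "recall"), so a word-boundary regex or the local LLM would be a natural refinement once sample emails have been reviewed.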
LLM Prompt for Action Item Extraction:
# If this template is filled with PROMPT.format(subject=..., body=...),
# the literal braces in the JSON example must be doubled, as below.
PROMPT = """
Extract action items from this email:

Subject: {subject}
Body: {body}

Return JSON array with:
- task: Brief description
- assignee: Who should do it (if mentioned)
- due_date: Deadline (if mentioned)
- priority: critical/high/medium/low

Example:
[
  {{
    "task": "Update firewall rules for Acme Corp",
    "assignee": "John",
    "due_date": "2025-11-20",
    "priority": "high"
  }}
]
"""
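Local models do not always return clean JSON, so the response is worth validating before tasks are created in HubSpot. The following is a minimal sketch, assuming the model replies with text that contains a JSON array somewhere in it; `parse_action_items` and the fallback priority of "medium" are assumptions, not part of the plan.

```python
import json

ALLOWED_PRIORITIES = {"critical", "high", "medium", "low"}


def parse_action_items(raw: str) -> list:
    """Extract and validate the JSON array of action items from an LLM reply."""
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        items = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    cleaned = []
    for item in items:
        # Skip entries that are not objects or have no task description.
        if not isinstance(item, dict) or not item.get("task"):
            continue
        priority = str(item.get("priority", "medium")).lower()
        cleaned.append({
            "task": str(item["task"]).strip(),
            "assignee": item.get("assignee"),
            "due_date": item.get("due_date"),
            "priority": priority if priority in ALLOWED_PRIORITIES else "medium",
        })
    return cleaned


reply = 'Here you go:\n[{"task": "Update firewall rules", "priority": "HIGH"}]'
print(parse_action_items(reply))
```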
Phase 2: Smart CRM Sync Engine
File: tools/hubspot_email_sync.py
Key Classes:
import time
from typing import Dict

from hubspot import HubSpot  # hubspot-api-client package


class HubSpotEmailSyncer:
    def __init__(self, access_token: str, dry_run: bool = False):
        self.client = HubSpot(access_token=access_token)
        self.dry_run = dry_run
        self.rate_limiter = RateLimiter(max_req=100, window=10)  # 100/10s

    def sync_contact(self, email_data: Dict) -> str:
        """Create or update contact from email data."""
        contact_id = self._find_contact(email_data['email'])
        if contact_id:
            return self._update_contact(contact_id, email_data)
        else:
            return self._create_contact(email_data)

    def sync_company(self, domain: str, company_name: str) -> str:
        """Create or update company from domain."""
        pass

    def link_contact_to_company(self, contact_id: str, company_id: str):
        """Associate contact with company."""
        pass

    def create_engagement(self, contact_id: str, email_data: Dict):
        """Log email as engagement in HubSpot."""
        pass

    def update_lifecycle_stage(self, contact_id: str, new_stage: str):
        """Update lifecycle stage with backwards movement handling."""
        # Clear stage first if moving backwards
        current_stage = self._get_lifecycle_stage(contact_id)
        if self._is_backwards_movement(current_stage, new_stage):
            self._clear_lifecycle_stage(contact_id)
            time.sleep(0.5)  # Brief pause so the clear is applied before the new value is set
        self._set_lifecycle_stage(contact_id, new_stage)
Rate Limiting Implementation:
import asyncio
import time


class RateLimiter:
    """Sliding-window limiter: at most max_req requests per window seconds."""

    def __init__(self, max_req: int, window: int):
        self.max_req = max_req
        self.window = window
        self.requests = []  # Timestamps of recent requests

    async def acquire(self):
        """Wait if rate limit would be exceeded."""
        while True:
            now = time.monotonic()
            # Remove requests outside window
            self.requests = [r for r in self.requests if now - r < self.window]
            if len(self.requests) < self.max_req:
                break
            sleep_time = self.window - (now - self.requests[0])
            print(f"Rate limit reached. Sleeping {sleep_time:.2f}s...")
            await asyncio.sleep(sleep_time)
        # Record the time the request is released, not the time acquire() was entered
        self.requests.append(now)
Exponential Backoff for 429 Errors:
import asyncio

from hubspot.crm.contacts.exceptions import ApiException


# Method of HubSpotEmailSyncer; func must be an awaitable (async) callable.
async def make_request_with_retry(self, func, *args, max_retries=5, **kwargs):
    """Execute HubSpot API call with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return await func(*args, **kwargs)
        except ApiException as e:
            if e.status == 429:  # Rate limited
                if attempt >= max_retries - 1:
                    raise  # Final attempt failed: give up without sleeping
                delay = 2 ** attempt  # 1s, 2s, 4s, 8s with the default max_retries=5
                print(f"Rate limited. Retry {attempt+1}/{max_retries} after {delay}s...")
                await asyncio.sleep(delay)
            else:
                raise
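The delay schedule used by the retry loop can be checked in isolation. The helper below is a sketch, not part of the planned code: `backoff_delays`, the `cap` parameter, and the optional `jitter` are assumptions added for illustration. It shows that a loop with `max_retries=5` sleeps four times, since no sleep follows the final attempt.

```python
import random


def backoff_delays(max_retries=5, base=1.0, cap=30.0, jitter=0.0, rng=None):
    """Return the sleep durations an exponential-backoff loop would use."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries - 1):  # no sleep after the last attempt
        delay = min(cap, base * 2 ** attempt)
        # Optional random jitter spreads out retries from concurrent workers.
        delays.append(delay + rng.uniform(0, jitter))
    return delays


print(backoff_delays())
print(sum(backoff_delays()))
```

With the defaults the worst case adds 15 seconds of waiting before the call gives up, which is a useful number to keep in mind when sizing the hourly sync.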
Phase 3: Client Health Monitor
File: tools/client_health_monitor.py
Health Score Algorithm:
from datetime import datetime
from typing import Dict


def calculate_health_score(company: Dict) -> int:
    """Calculate 0-100 health score for client."""
    score = 100

    # Days since last contact (max penalty: -40 points)
    # company['last_contact_date'] must be a datetime
    days_since_contact = (datetime.now() - company['last_contact_date']).days
    if days_since_contact > 90:
        score -= 40
    elif days_since_contact > 60:
        score -= 30
    elif days_since_contact > 30:
        score -= 20
    elif days_since_contact > 14:
        score -= 10

    # Unresolved critical issues (max penalty: -30 points)
    critical_issues = company.get('critical_issues', 0)
    score -= min(critical_issues * 15, 30)

    # Response time to emails (max penalty: -20 points)
    avg_response_time_hours = company.get('avg_response_time_hours', 0)
    if avg_response_time_hours > 48:
        score -= 20
    elif avg_response_time_hours > 24:
        score -= 10
    elif avg_response_time_hours > 8:
        score -= 5

    # Communication frequency (max penalty: -10 points)
    emails_this_month = company.get('emails_this_month', 0)
    if emails_this_month == 0:
        score -= 10
    elif emails_this_month < 2:
        score -= 5

    return max(0, score)  # Floor at 0
At-Risk Alert Email:
from html import escape
from typing import Dict, List


def generate_alert_email(at_risk_clients: List[Dict]) -> str:
    """Generate HTML email with at-risk client list."""
    # Escape company names so characters such as < or & cannot break the markup
    rows = ''.join(
        f"<tr><td>{escape(str(c['name']))}</td><td>{c['days']}</td>"
        f"<td>{c['score']}</td><td>{c['issues']}</td></tr>"
        for c in at_risk_clients
    )
    return f"""
<html>
<body>
<h2>🚨 Client Health Alert</h2>
<p>The following clients require attention:</p>
<table>
<tr>
<th>Company</th>
<th>Days Since Contact</th>
<th>Health Score</th>
<th>Critical Issues</th>
</tr>
{rows}
</table>
<p>Review and reach out to prevent churn.</p>
</body>
</html>
"""
Testing Strategy
Unit Tests
# Test email parsing
pytest tests/test_email_intelligence_extractor.py
# Test CRM sync logic (with mocked HubSpot API)
pytest tests/test_hubspot_email_sync.py
# Test health score calculations
pytest tests/test_client_health_monitor.py
Integration Tests
# Test against HubSpot sandbox portal
HUBSPOT_ACCESS_TOKEN=<sandbox_token> pytest tests/integration/test_hubspot_sync.py
# Test full email → CRM workflow
uv run python3 tests/integration/test_full_workflow.py
Manual Testing Checklist
- Email Extraction (Phase 1)
- [ ] Parse 10 sample IT Raven emails
- [ ] Verify correct contact/company extraction
- [ ] Check action item extraction quality
- CRM Sync (Phase 2)
- [ ] Dry-run sync 50 emails (no CRM changes)
- [ ] Live sync 10 emails to test portal
- [ ] Verify contacts created correctly
- [ ] Verify companies linked to contacts
- [ ] Verify engagements logged
- Health Monitoring (Phase 3)
- [ ] Run health check on all companies
- [ ] Verify at-risk client identification
- [ ] Test alert email generation
- Lifecycle Stages (Phase 4)
- [ ] Test forwards movement (Lead → MQL)
- [ ] Test backwards movement (Customer → Inactive)
- [ ] Verify workflows triggered correctly
- Weekly Reports (Phase 5)
- [ ] Generate test report for past week
- [ ] Verify data accuracy
- [ ] Test email delivery
Deployment
LaunchAgents Setup
1. Hourly Email Sync
File: ~/Library/LaunchAgents/com.itraven.hubspot.email_sync.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.itraven.hubspot.email_sync</string>
<key>ProgramArguments</key>
<array>
<string>/Users/bertfrichot/.local/bin/uv</string>
<string>run</string>
<string>python3</string>
<string>/Users/bertfrichot/mem-agent-mcp/tools/hubspot_email_sync.py</string>
</array>
<key>StartInterval</key>
<integer>3600</integer> <!-- Every hour -->
<key>StandardOutPath</key>
<string>/tmp/hubspot_email_sync.log</string>
<key>StandardErrorPath</key>
<string>/tmp/hubspot_email_sync.error.log</string>
</dict>
</plist>
2. Daily Health Check
File: ~/Library/LaunchAgents/com.itraven.hubspot.health_monitor.plist
<!-- Similar structure, runs daily at 8am -->
<key>StartCalendarInterval</key>
<dict>
<key>Hour</key>
<integer>8</integer>
<key>Minute</key>
<integer>0</integer>
</dict>
3. Weekly Intelligence Report
File: ~/Library/LaunchAgents/com.itraven.hubspot.weekly_report.plist
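<!-- Runs Mondays at 8am -->
<key>StartCalendarInterval</key>
<dict>
<key>Weekday</key>
<integer>1</integer> <!-- Monday -->
<key>Hour</key>
<integer>8</integer>
<key>Minute</key>
<integer>0</integer> <!-- Omitted keys act as wildcards, so without Minute the job would fire every minute from 8:00 to 8:59 -->
</dict>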
Load LaunchAgents:
launchctl load ~/Library/LaunchAgents/com.itraven.hubspot.email_sync.plist
launchctl load ~/Library/LaunchAgents/com.itraven.hubspot.health_monitor.plist
launchctl load ~/Library/LaunchAgents/com.itraven.hubspot.weekly_report.plist
Success Metrics
Quantitative Metrics
Time Savings:
- Manual CRM entry: 10 hrs/week → 0 hrs/week
- ROI: about $2,170/month (10 hrs/week × 52/12 weeks per month × $50/hr contractor rate)
Data Quality:
- Email tracking: 0% → 100%
- Contact completeness: 60% → 95%
- Company linkage accuracy: 70% → 98%
Client Retention:
- At-risk client identification: 0 → 15-20 clients/month
- Churn reduction: TBD (measure after 3 months)
Qualitative Metrics
- [ ] IT Raven team no longer manually enters CRM data
- [ ] Proactive outreach to at-risk clients increases
- [ ] Client health trends visible in weekly reports
- [ ] Lifecycle stages accurately reflect client status
- [ ] Action items from emails don't get lost
Next Steps
Immediate Actions (Next 2 weeks)
- Week 1: Implement Phases 1-2 (Email extraction + CRM sync)
- Week 2: Implement Phases 3-5 (Health monitoring + reporting)
Phase 1 First Task
Start with tools/email_intelligence_extractor.py:
# Create the extractor
touch tools/email_intelligence_extractor.py
chmod +x tools/email_intelligence_extractor.py
# Test on 10 sample emails
uv run python3 tools/email_intelligence_extractor.py --limit 10 --verbose
Questions for User
- Client Prioritization: Should certain clients (e.g., top revenue) get higher health score weight?
- Action Item Assignment: Should we auto-assign tasks to specific IT Raven team members?
- Alert Thresholds: Is 30 days the right threshold for "at-risk" status?
- Report Recipients: Who should receive the weekly intelligence reports?
Document Version: 1.0
Last Updated: 2025-11-13
Author: Claude Code (with Bert Frichot)
Implementation Guide Source: Medium article by @Saurabh Rai
Ready to start implementation? Run: uv run python3 tools/email_intelligence_extractor.py --help