The Proof — DocPro's Architecture, Code & Metrics

The short version

Three claims. The receipts are below.

Before the diagrams and the code: here is what DocPro actually does, in one line each. Skip to any section for the implementation behind it.

Claim 01

Memory is persistent.

Each team member keeps a per-account memory store that is appended to in every session and recalled at the next one. Entries are timestamped and source-tagged, so knowledge carries forward instead of resetting.

Claim 02

Review improves over time.

Code Review writes findings and anchors back to a persistent ledger. Repeat passes against a hardened tree return fewer findings, not the same list — the record is what makes that measurable.

Claim 03

Build Mode performs real work.

Build Mode dispatches a milestone as an isolated subagent, runs it against your real files, and chains to the next on completion — or calls your phone when it hits a blocker.

Architecture

The Memory Pipeline

Every session produces knowledge. The pipeline extracts it, stores it, enforces it, compresses it, and recalls it — across every future session.

Session: IDE or web session generates conversation turns

Extraction: preferences, decisions, and knowledge isolated per team member

Memory: persistent per-member storage with timestamped sessions

Enforcement: client preferences injected into every session prompt

Compression: AI-driven reduction preserving critical details and preferences

Recall: topic-relevant memory assembled at each session start
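The Enforcement and Recall stages can be sketched together as prompt assembly: every MUST-ENFORCE line is always injected, and the remaining entries are ranked by keyword overlap with the session topic. This is a minimal sketch under assumptions; the `MemoryEntry` shape, the tag-overlap scoring, and `max_entries` are hypothetical, since the page does not show the production recall logic.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    # Hypothetical shape: one stored piece of knowledge plus search tags
    text: str
    tags: set[str] = field(default_factory=set)


def build_session_prompt(entries: list[MemoryEntry], topic: str, max_entries: int = 3) -> str:
    """Enforcement + recall sketch: preferences always, then topic-relevant memory."""
    topic_words = set(topic.lower().split())

    # Enforcement: every MUST-ENFORCE preference is injected, regardless of topic
    preferences = [e.text for e in entries if "MUST-ENFORCE" in e.text]

    # Recall: score the other entries by tag overlap with the topic words
    scored = [
        (len(e.tags & topic_words), e.text)
        for e in entries
        if "MUST-ENFORCE" not in e.text
    ]
    # Highest overlap first; entries with no overlap are left out
    relevant = [text for score, text in sorted(scored, key=lambda p: -p[0]) if score > 0]

    lines = ["Client preferences:", *preferences, "Relevant memory:", *relevant[:max_entries]]
    return "\n".join(lines)


entries = [
    MemoryEntry("MUST-ENFORCE: use British spelling in client docs"),
    MemoryEntry("Decided to ship the SOW template first", {"sow", "template"}),
    MemoryEntry("Billing webhook retries are flaky", {"billing", "webhook"}),
]
print(build_session_prompt(entries, "SOW template review"))
```

Keeping preferences outside the ranking means a long, highly relevant history can never crowd a client preference out of the prompt.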

Call Pipeline

Voice callbacks with dynamic context generation. The team calls your phone with session-aware openers — not scripts.

context_service · event_service · ide_integration · onboarding_service

Build Mode

Autonomous multi-milestone project execution. Subagent-per-milestone dispatch with fresh context windows.

project_service · advance_milestone · orchestrator · session_cap

Document Engine

Professional output generation — SOWs, SOPs, user manuals, frameworks. Template-driven with brand injection.

sow_builder · sop_builder · manual_builder · framework_gen
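"Template-driven with brand injection" could look roughly like this: a document template with placeholders, filled from a per-client brand profile plus per-document fields. The template text, the brand fields, and the `render_document` helper are hypothetical; the real builders are not shown on this page.

```python
from string import Template

# Hypothetical SOW template; $-placeholders are filled at render time
SOW_TEMPLATE = Template(
    "$company_name | Statement of Work\n"
    "Prepared for: $client\n"
    "Scope: $scope\n"
    "$footer"
)


def render_document(template: Template, brand: dict[str, str], fields: dict[str, str]) -> str:
    """Merge brand values with document fields; document fields win on conflict."""
    values = {**brand, **fields}
    # substitute() raises KeyError for any missing placeholder, so an
    # incomplete brand profile fails loudly instead of shipping blanks
    return template.substitute(values)


brand = {"company_name": "Acme Consulting", "footer": "Confidential. Prepared by Acme Consulting."}
doc = render_document(SOW_TEMPLATE, brand, {"client": "Globex", "scope": "API audit"})
print(doc)
```

Using `substitute()` rather than `safe_substitute()` is the deliberate choice here: a missing brand value is an error, not an empty slot in a client deliverable.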

Real Code

Production Patterns

Not cherry-picked examples: actual patterns from the running system behind persistent memory, call context, and Build Mode.

persona_memory_service.py · Memory

from datetime import datetime, timezone

# Every session appends to persistent memory
# with timestamped source tracking

timestamp = datetime.now(timezone.utc)
separator = f"\n\n--- {source} ({timestamp}) ---\n"

if memory:
    # Existing store: append after a source-tagged separator
    existing = memory.content.strip()
    memory.content = existing + separator + new_content
    memory.updated_at = timestamp  # same instant as the separator
else:
    # First session for this member: create the store
    memory = PersonaMemory(
        user_id=user_id,
        persona_key=persona_key,
        content=new_content,
    )

Memory grows per session, per team member. Each append is timestamped and source-tagged — the system knows where every piece of knowledge came from.
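To see the append pattern in isolation, here is a self-contained version that swaps the database model for a plain dataclass. `PersonaMemory` below is a stand-in, not the production ORM model, and `append_memory` is a hypothetical name; the separator format follows the excerpt above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PersonaMemory:
    # Stand-in for the ORM model: one row per (user_id, persona_key)
    user_id: int
    persona_key: str
    content: str
    updated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def append_memory(memory, user_id, persona_key, source, new_content):
    """Append a source-tagged, timestamped block; create the store on first use."""
    if memory is None:
        return PersonaMemory(user_id=user_id, persona_key=persona_key, content=new_content)
    timestamp = datetime.now(timezone.utc)
    separator = f"\n\n--- {source} ({timestamp}) ---\n"
    memory.content = memory.content.strip() + separator + new_content
    memory.updated_at = timestamp
    return memory


mem = append_memory(None, 1, "carl", "ide", "Prefers pytest over unittest.")
mem = append_memory(mem, 1, "carl", "web", "Decided: Postgres 16 for the staging DB.")
print(mem.content)
```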

context_service.py · Pipeline

# Call context adapts to WHY the call is happening

def get_trigger_context(trigger_type, call_topic=None):
    if trigger_type == "manual":
        if call_topic:
            return (
                "The user asked you to call about "
                f'a SPECIFIC topic: "{call_topic}". '
                "Lead with the topic."
            )
    elif trigger_type == "team_blocked":
        return "The team is working on a milestone..."
    # Remaining trigger types are elided from this excerpt;
    # anything unmatched gets no trigger-specific context
    return None

Six trigger types, two of which appear in this excerpt. Each one shapes how the team member opens the call, from "you requested this" to "the build hit a blocker." Context-aware, not scripted.
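One way to keep six trigger types from growing into a long if/elif chain is a lookup table of opener builders. Only `manual` and `team_blocked` are named on this page, so this sketch registers just those two plus a generic fallback; `trigger_opener`, `TRIGGERS`, and all opener wording other than the manual-topic string are hypothetical.

```python
from typing import Callable, Optional

# Each builder takes the optional call topic and returns opener guidance
OpenerBuilder = Callable[[Optional[str]], str]


def _manual(call_topic: Optional[str]) -> str:
    if call_topic:
        return f'The user asked you to call about a SPECIFIC topic: "{call_topic}". Lead with the topic.'
    return "The user asked for a call. Ask what they want to cover."  # illustrative wording


# Only the two trigger types named in the excerpt; the other four would be added here
TRIGGERS: dict[str, OpenerBuilder] = {
    "manual": _manual,
    "team_blocked": lambda _topic: "Build Mode is paused on a blocker. Explain it first.",  # illustrative
}


def trigger_opener(trigger_type: str, call_topic: Optional[str] = None) -> str:
    """Look up the builder for this trigger; unknown triggers get a generic opener."""
    builder = TRIGGERS.get(trigger_type)
    if builder is None:
        return "Open with a short recap of where the session left off."  # illustrative wording
    return builder(call_topic)


print(trigger_opener("manual", "Q3 invoice backlog"))
```

The table turns a new trigger into a one-line registration and gives unknown triggers an explicit fallback instead of falling through silently.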

persona_memory_service.py · Compression

# Compression preserves critical preferences
# MUST-ENFORCE count validated before and after

lock_timestamp = memory.updated_at  # snapshot taken before compressing
content = memory.content

before_count = content.count("MUST-ENFORCE")
compressed = await compress_with_ai(content)
after_count = compressed.count("MUST-ENFORCE")

if after_count < before_count:
    logger.warning("Compression dropped preferences — aborting")
    return None  # Never lose client preferences

# Optimistic lock prevents concurrent corruption
# (assumes `memory` reflects the latest stored row at this point)
if memory.updated_at != lock_timestamp:
    return None  # Someone wrote while we compressed

Memory gets sharper, not just smaller. AI compression with a hard rule: never lose a client preference. Optimistic locking prevents concurrent writes from corrupting the compressed result.
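The guard can be exercised end to end with fake compressors. In this sketch the optimistic lock compares a version counter rather than `updated_at`, a common variant of the same idea; `MemoryRow`, `compress_guarded`, and the stub compressors are hypothetical stand-ins for the real service and AI call.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class MemoryRow:
    content: str
    version: int = 0  # bumped on every write; plays the role of updated_at


async def compress_guarded(
    row: MemoryRow, compress: Callable[[str], Awaitable[str]]
) -> Optional[str]:
    """Compressed text, or None if it would drop a preference or race a writer."""
    lock_version = row.version  # snapshot before the slow AI call
    before = row.content.count("MUST-ENFORCE")
    compressed = await compress(row.content)
    if compressed.count("MUST-ENFORCE") < before:
        return None  # a MUST-ENFORCE marker was lost: keep the original
    if row.version != lock_version:
        return None  # someone appended while we were compressing
    row.content = compressed
    row.version += 1
    return compressed


async def lossy(text: str) -> str:
    return "summary of the chat"  # drops the MUST-ENFORCE line


async def careful(text: str) -> str:
    return "\n".join(line for line in text.splitlines() if "MUST-ENFORCE" in line)


row = MemoryRow("MUST-ENFORCE: invoice in EUR\nlong chat log\nmore chat log")
print(asyncio.run(compress_guarded(row, lossy)))    # None
print(asyncio.run(compress_guarded(row, careful)))  # MUST-ENFORCE: invoice in EUR
```

Like the excerpt, this check counts markers; it cannot tell whether the wording of a preference survived intact.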

project_service.py · Build Mode

# Each milestone runs as an isolated subagent
# with a fresh 1M context window

async def advance_milestone(project, action, status):
    # `current` is the in-progress milestone (lookup elided from this excerpt)
    if action == "complete":
        sign_off_milestone(current)
        next_ms = kick_off_milestone(project)
        context = build_milestone_context(next_ms)
        # Hand the next milestone's context back so the chain continues
        return {"action": "continue", "context": context}
    elif action == "blocked":
        project.build_mode_paused = True
        initiate_call(persona="carl")  # Carl phones the user about the blocker
        return {"action": "paused"}

Build Mode dispatches an autonomous agent per milestone. When a milestone is blocked, Carl calls your phone with the context; when it completes, the chain continues. Once you start a build, no manual intervention is needed between milestones.
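The chaining behavior reduces to a small state machine: run the current milestone, advance on completion, pause and notify on a blocker. This synchronous sketch uses a hypothetical `Project` type, a `run_milestone` callable in place of the real subagent dispatch, and a `notify` callable in place of the phone call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Project:
    milestones: list[str]
    current: int = 0  # index of the milestone in progress
    paused: bool = False


def run_build(
    project: Project,
    run_milestone: Callable[[str], str],
    notify: Callable[[str], None],
) -> Project:
    """Chain milestones until all are signed off or one reports a blocker."""
    while not project.paused and project.current < len(project.milestones):
        name = project.milestones[project.current]
        status = run_milestone(name)  # stand-in for an isolated subagent run
        if status == "complete":
            project.current += 1  # sign off; the loop kicks off the next one
        elif status == "blocked":
            project.paused = True
            notify(f"Blocked on milestone: {name}")  # stand-in for the phone call
        else:
            raise ValueError(f"unexpected milestone status: {status!r}")
    return project


calls: list[str] = []
outcomes = {"schema": "complete", "api": "blocked", "ui": "complete"}
p = run_build(Project(["schema", "api", "ui"]), outcomes.get, calls.append)
print(p.current, p.paused, calls)  # 1 True ['Blocked on milestone: api']
```

In this sketch, resuming after a blocker would mean clearing `paused` and calling `run_build` again; the page does not say how the real system resumes.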

By the Numbers

Production Metrics

Not aspirational projections. Real numbers from the running system, verifiable against the codebase.

3,000+

Hours of Building

Sessions, commits, calls, architecture — the full investment

1,100+

Automated Tests

Full pytest suite — green before every deploy

73K+

Chars of Memory

Largest team member's persistent knowledge base

2,600+

Commits Shipped

git log — every change tracked, every decision documented

Last verified: June 2026 · Sources: git history, pytest suite, production memory store, release ledger.
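Two of these figures can be re-derived from a checkout. The sketch below assumes `git` and `pytest` are installed and that it runs from the repository root; `git rev-list --count HEAD` counts commits reachable from HEAD, and `pytest --collect-only -q` ends with a summary line of the form `12 tests collected in 0.05s`. The helper names are hypothetical.

```python
import re
import subprocess
import sys


def count_commits(repo: str = ".") -> int:
    """Commits reachable from HEAD, as reported by `git rev-list --count HEAD`."""
    result = subprocess.run(
        ["git", "rev-list", "--count", "HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def parse_collected(pytest_output: str) -> int:
    """Extract N from a summary line like 'N tests collected in 0.50s'."""
    match = re.search(r"(\d+) tests? collected", pytest_output)
    if match is None:
        raise ValueError("no pytest collection summary found")
    return int(match.group(1))


def count_tests(repo: str = ".") -> int:
    """Collect (but do not run) the test suite and return the collected count."""
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "--collect-only", "-q"],
        cwd=repo, capture_output=True, text=True,
    )
    return parse_collected(result.stdout)
```

Calling `count_commits()` and `count_tests()` from the repository root gives numbers to compare against the figures above; squash merges lower the commit count and parametrized tests raise the collected count, so expect some drift.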

Architecture

The Request Path

From your VS Code sidebar to the team and back — the real path every session, review, and call travels.

Your machine

VS Code extension: the DocPro sidebar (sessions, Build Mode, Code Review, calls)

MCP server / Claude Code: runs in user space with your own Anthropic key

↓ HTTPS 443 · outbound

DocPro Cloud · AWS us-east-1

Session orchestration: routes each turn; loads the right specialist and context

Memory · project context · review ledger: per-specialist memory, field-encrypted at rest, recalled every session

↓ server-side · selected workflows

External providers

Anthropic Claude API: reasoning, review, and synthesis via your own key

Gemini: Build Mode image generation, your key, server-side

Voice synthesis: text-to-speech for team phone calls

…and back to your sidebar: results, review findings, and the team’s memory of the session.
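The Session orchestration step routes each turn to a specialist and loads that specialist's context. A minimal sketch, assuming memory is keyed by `(user_id, persona_key)` as the `PersonaMemory` excerpt suggests, so a turn from one account can never load another account's store; `Turn`, `route_turn`, and the in-memory `MEMORY` dictionary are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    user_id: int
    persona_key: str
    message: str


# Hypothetical store: one memory blob per (account, specialist) pair
MEMORY: dict[tuple[int, str], str] = {}


def route_turn(turn: Turn, known_personas: set[str]) -> dict[str, str]:
    """Pick the specialist for this turn and load only that account's memory for it."""
    if turn.persona_key not in known_personas:
        raise ValueError(f"unknown specialist: {turn.persona_key}")
    memory = MEMORY.get((turn.user_id, turn.persona_key), "")
    return {"persona": turn.persona_key, "memory": memory, "message": turn.message}


MEMORY[(1, "carl")] = "Prefers short status calls."
ctx_a = route_turn(Turn(1, "carl", "status?"), {"carl"})
ctx_b = route_turn(Turn(2, "carl", "status?"), {"carl"})
print(ctx_a["memory"])        # Prefers short status calls.
print(repr(ctx_b["memory"]))  # ''
```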

Stack

What Powers Each Layer

Backend

Python 3.12 · FastAPI · SQLAlchemy · asyncpg · Alembic

Frontend

React 18 · Vite · Zustand · Lucide Icons

Database

PostgreSQL 16 · Fernet Encryption · Async Sessions

AI Layer

Claude Sonnet 4.6 · 1M Context · Streaming SSE · Tool Execution

IDE

VS Code Extension · MCP Server · Session State · Build Mode UI

Infrastructure

AWS Lightsail · nginx · systemd

How to verify

Claim, evidence, and the honest limit.

For the three big claims: what we assert, where to confirm it, and where the line sits. We'd rather hand you the limit than have you find it yourself.

“Memory is persistent and carries across sessions.”

Evidence: Each session appends to a per-member store with a timestamped, source-tagged separator; a topic-relevant slice is recalled at the next session start. Pattern shown above from persona_memory_service.py.

Limit: Memory is per-account and isolated; there is no shared or organizational memory today. The lookup metadata that recall searches (names, keyword tags, short summaries) stays in plaintext so search works; the conversation content itself is field-encrypted at rest.

See it: In a later session, ask any team member what you decided last time, then delete that memory and ask again. The Security page documents exactly what is stored and what is plaintext.

“Review improves over time, not just per-run.”

Evidence: Code Review writes findings and anchors to a persistent review ledger that is recalled on the next pass. Compression validates that no MUST-ENFORCE preference is dropped before it commits a smaller memory; the pattern is shown above.

Limit: “Improves” means the team carries prior findings forward and converges on a hardened tree; it is not a formal accuracy benchmark. Review still runs on what you share in a session, not an automatic scan of your whole repo.

See it: Run a review pass, fix the findings, and run it again on the same tree. The second pass should reference the first and return a shorter list, not repeat it.
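To make “the second pass should reference the first” concrete: if every finding carries a stable anchor, comparing two passes becomes a set difference. This sketch assumes an anchor of `(file, rule)` and a hypothetical `Finding` type; the real ledger's anchor format is not shown on this page.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    file: str
    rule: str  # hypothetical anchor = (file, rule)
    message: str


def compare_passes(
    previous: list[Finding], current: list[Finding]
) -> dict[str, list[tuple[str, str]]]:
    """Classify anchors as resolved, still open, or new relative to the last pass."""
    prev = {(f.file, f.rule) for f in previous}
    curr = {(f.file, f.rule) for f in current}
    return {
        "resolved": sorted(prev - curr),
        "still_open": sorted(prev & curr),
        "new": sorted(curr - prev),
    }


first = [
    Finding("api.py", "sql-injection", "user input concatenated into query"),
    Finding("api.py", "missing-auth", "endpoint has no auth check"),
    Finding("ui.tsx", "unused-import", "Badge imported but unused"),
]
second = [Finding("api.py", "missing-auth", "endpoint still unauthenticated")]
print(compare_passes(first, second))
```

A converging tree shows up as a growing `resolved` list and an empty `new` list from one pass to the next.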

“Build Mode performs real, autonomous work.”

Evidence: Build Mode dispatches each milestone as an isolated subagent with a fresh context window, chains to the next on completion, and pauses and calls your phone on a blocker; pattern shown above from project_service.py.

Limit: Build Mode reads the files a requested milestone needs and is user-initiated; it does not run on its own or touch files outside the work you start. Image generation in a build goes to Gemini server-side with your key.

See it: Start a Build Mode project and watch the milestones advance in the sidebar; force a blocker and confirm the call comes through. The For IT page lays out exactly what leaves the machine.

Verify

Trace the Architecture

Every pattern mentioned on this page has a real implementation. Here's where to look.

Orchestrator Facade Pattern

When a 4,000-line module splits into six, every consumer keeps working. The facade re-exports the public API. Zero breaking changes across 11 import sites.
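The facade idea fits in a few lines of `__init__.py`. To keep the example self-contained, the sketch writes a throwaway two-submodule package to a temporary directory and imports it; `facade_demo`, `build_context`, and `record_event` are hypothetical names, not the real `call_pipeline` submodules.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical submodules carved out of one large module
FILES = {
    "context.py": "def build_context(trigger_type):\n    return f'context for {trigger_type}'\n",
    "events.py": "def record_event(name):\n    return f'recorded {name}'\n",
    # The facade: re-export the public API so existing imports keep working
    "__init__.py": (
        "from .context import build_context\n"
        "from .events import record_event\n"
        "\n"
        "__all__ = ['build_context', 'record_event']\n"
    ),
}

root = Path(tempfile.mkdtemp())
package_dir = root / "facade_demo"
package_dir.mkdir()
for filename, source in FILES.items():
    (package_dir / filename).write_text(source)

sys.path.insert(0, str(root))
importlib.invalidate_caches()  # the files were created after interpreter startup
facade = importlib.import_module("facade_demo")

# A consumer written against the old single module still imports from the package root
print(facade.build_context("manual"))       # context for manual
print(facade.record_event("call_started"))  # recorded call_started
```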

call_pipeline/__init__.py → 6 submodules

Memory Compression Engine

AI-driven compression with preference count validation. If a single MUST-ENFORCE preference is lost during compression, the operation aborts and the original is preserved.

persona_memory_service.py → compress_persona_memory()

Build Mode Dispatch

Each milestone runs as an isolated subagent with a fresh context window. The orchestrator chains completion calls — when one finishes, the next begins automatically.

ide.py → advance_milestone endpoint

Post-Extraction Rule

After any component extraction, every JSX identifier is grep-verified against the import list. Three extraction bugs taught this rule. It's now enforced on every split.

Enforced across all frontend decompositions
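The grep check can be approximated in a few lines: collect capitalized JSX tag names from a component file and flag any that are neither imported nor defined locally. This regex-based sketch is an assumption about how such a check might work, not the project's actual tooling, and a regex will miss cases a real parser would catch.

```python
import re

JSX_TAG = re.compile(r"<([A-Z][A-Za-z0-9_]*)")
# `import X from '...'`, `import { A, B as C } from '...'`, `import * as NS from '...'`
IMPORT_STMT = re.compile(r"^\s*import\s+([^;'\"]+?)\s+from\s+['\"]", re.MULTILINE)
LOCAL_DEF = re.compile(r"\b(?:function|const|let|class)\s+([A-Z][A-Za-z0-9_]*)")


def imported_names(source: str) -> set[str]:
    names: set[str] = set()
    for clause in IMPORT_STMT.findall(source):
        # Treat braces as separators so `Default, { A, B as C }` splits cleanly
        for part in clause.replace("{", ",").replace("}", ",").split(","):
            part = part.strip()
            if part:
                names.add(part.split(" as ")[-1].strip())  # `B as C` binds C
    return names


def missing_jsx_identifiers(source: str) -> set[str]:
    """JSX tags used in the file that are neither imported nor defined locally."""
    used = set(JSX_TAG.findall(source))
    return used - imported_names(source) - set(LOCAL_DEF.findall(source))


component = """
import React from 'react'
import { Button, Card as Panel } from './ui'
import * as Icons from 'lucide-react'

function Row() { return <div />; }

export default function Sidebar() {
  return (
    <Panel>
      <Row />
      <Button />
      <Icons.Check />
      <Badge />
    </Panel>
  );
}
"""
print(missing_jsx_identifiers(component))  # {'Badge'}
```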

Every claim. Verified.