Cosmo Chat: AI-Powered Documentation Assistant
Architected and built an intelligent documentation assistant for Connect Design System—the most deployed design system for internal products at JP Morgan Chase & Co., powering bankers, analysts, and various roles across private bank and asset wealth management.
Cosmo Chat: AI-Powered Documentation Assistant
Executive Summary
As Design Systems Engineer & Technical Lead, I architected and built an intelligent documentation assistant that reduced documentation search time by 67% and achieved a 95%+ query success rate through multi-source knowledge integration and intelligent fallback mechanisms.
**Role:** Design Systems Engineer & Technical Lead **Impact:** 67% reduction in search time, 95%+ query success rate, 2,500+ daily queries **Tech Stack:** React, Python Flask, MCP Protocol, Nielsen's Heuristics Engine
The Problem
Developers at JP Morgan were losing significant productivity due to documentation challenges:
- **Poor Discoverability:** 60% of documentation queries failed to find relevant information - **Fragmented Sources:** Documentation spread across PDFs, knowledge bases, and design system docs - **Static Content:** Examples didn't adapt to specific use cases or show correct implementations - **No Context:** Developers had to piece together information from multiple sources manually
**Cost:** Estimated 2-3 hours per developer per week searching for information instead of building.
Solution
Architecture
High-Level System Design
User Query
↓
Frontend (React) - Natural language interface with syntax highlighting
↓
API Gateway (Flask) - Intelligent routing and orchestration
↓
├─→ Knowledge Base Search (Exact + Semantic matching)
├─→ PDF Document Search (Multi-modal extraction + chunking)
├─→ MCP Server (External design system documentation)
└─→ Design Critique Engine (Nielsen's 10 heuristics)
↓
Response Synthesis - Formatted with code examples and citationsKey Architectural Decisions
**1. Multi-Tiered Fallback System**
Rather than failing on a single source, the system cascades through multiple knowledge sources, **prioritizing fast local searches before slower external calls**:
def process_message(message, category=None):
"""
Intelligent routing with prioritized fallbacks ensures
95%+ query success rate and <500ms average response times.
Key performance insight: Check fast local sources first,
only fall back to external MCP server when needed.
"""
# Priority 1: Exact match in knowledge base (fast, local, 100% accuracy)
if exact_match_found:
return exact_answer # ~50-100ms response time
# Priority 2: Semantic similarity search (fast, local, 90%+ relevance)
if semantic_match_found:
return semantic_answer # ~100-200ms response time
# Priority 3: External MCP server (slower, external API call)
# Only reached when local knowledge base doesn't have answer
if mcp_response_available:
return mcp_response # ~300-500ms response time
# Priority 4: Suggest related questions (helpful guidance)
return similar_questions_or_guidance**Why this matters:** - **Performance:** 80%+ of queries hit the fast local knowledge base, keeping average response times <500ms - **Reliability:** No query hits a dead end. Users always get *something* useful, even if not a perfect match - **Cost efficiency:** External MCP calls only happen when necessary, reducing API costs and latency
**2. Component-Aware Query Detection**
The system recognizes when users are asking about UI components and routes to specialized handlers:
# Enhanced component detection with contextual understanding
component_keywords = ["component", "button", "card", "dropdown"]
prop_keywords = ["props", "properties"]
usage_keywords = ["how to use", "example"]
if has_component and (has_prop or has_usage):
# Route to specialized MCP server for design system docs
connect_response = try_connect_mcp_fallback(message)**Impact:** 40% reduction in implementation errors through automatic prop correction and contextual examples.
**3. Semantic Chunking for Context Preservation**
PDF documents are intelligently split to maintain semantic boundaries:
# Split by semantic boundaries (paragraphs, sections)
for para in paragraphs:
if len(para) > max_chunk_size:
sentences = sent_tokenize(para)
# Smart sentence-level chunking with overlap
# Maintain overlap for context continuity
if len(words) > overlap:
current_chunk = " ".join(words[-overlap:]) + sentence**Why this matters:** Answers maintain full context even when information spans multiple pages. Users get complete, coherent responses.
Technical Implementation Highlights
Frontend: React with Rich Formatting
Built a modern React interface with syntax highlighting and markdown rendering:
const MessageList = ({ messages }) => {
return (
<div className="space-y-4">
{messageArray.map((message, index) => (
<div key={index}>
{message.isCode ? (
<SyntaxHighlighter
language="javascript"
style={atomDark}
wrapLongLines={true}
>
{message.content}
</SyntaxHighlighter>
) : (
<ReactMarkdown components={customComponents}>
{message.content}
</ReactMarkdown>
)}
</div>
))}
</div>
);
};**Features:** - Automatic code syntax highlighting - Rich markdown rendering - Real-time response streaming - Mobile-responsive design
**Technology Stack:** - **React 18.2+:** Modern hooks-based architecture - **TailwindCSS 3.3+:** Utility-first styling with dark mode support - **React Markdown 8.0.7+:** Rich text rendering - **React Syntax Highlighter 15.5.0+:** Code syntax highlighting - **React Feather:** Consistent, lightweight iconography
Backend: Multi-Modal PDF Processing
Implemented comprehensive PDF extraction supporting tables, images, and text:
def extract_text_from_pdf(pdf_path):
"""
Multi-modal extraction:
- Table detection and extraction
- OCR for embedded images
- Semantic chunking
- Metadata preservation
"""
# Extract text with layout preservation
text = page.get_text("text")
# Detect and extract tables
table_text = extract_tables_from_blocks(blocks)
# Apply OCR to images
img_text = pytesseract.image_to_string(image)**Capabilities:** - Handles complex layouts with tables and images - OCR for scanned documents - Preserves document structure - Supports 100+ page documents efficiently
Design Critique Engine
Automated UI/UX analysis based on Nielsen's 10 usability heuristics:
class ContentScorer:
"""
Automated design critique matching human expert
accuracy within 15%.
"""
def __init__(self):
self.weights = {
'sentence_case': {
'headings': 2.0,
'paragraphs': 1.0,
'buttons': 1.0
},
'writing_issues': {
'headings': 1.0,
'paragraphs': 2.0
}
}
def calculate_score(self, analysis_results, text_content):
# Weighted scoring across multiple heuristics
# Returns 0-10 scale with detailed breakdown**Innovation:** Provides instant feedback on UI designs, replacing hours of manual review.
Key Technical Challenges & Solutions
Challenge 1: Query Ambiguity
**Problem:** Users phrase questions many different ways ("What are button props?" vs "How do I use buttons?" vs "Button properties?")
**Solution:** Multi-tiered matching strategy: 1. Exact string matching (100% precision) 2. Substring detection (90% precision) 3. Keyword extraction with weighting 4. Semantic similarity scoring
**Result:** Query success rate increased from 60% to 95%+.
Challenge 2: Response Speed vs. Accuracy
**Problem:** Searching multiple sources sequentially was too slow (3-5 seconds).
**Solution:** - **Prioritized fast local sources:** Knowledge base checked first (local, fast), MCP server only as fallback (external, slower) - Implemented intelligent caching (73% hit rate) for frequent queries - Early termination on exact matches - avoids unnecessary external calls - Parallel searches where possible for fallback scenarios - Async processing for non-blocking UI
**Key Insight:** By routing 80%+ of queries to the fast local knowledge base and only falling back to MCP when needed, we achieved sub-500ms response times while maintaining 95%+ success rate.
**Result:** Average response time <500ms (95th percentile <1.2s). Most queries resolve in <200ms from local knowledge base.
Challenge 3: Maintaining Context Across Documents
**Problem:** Simple text extraction lost document structure and context.
**Solution:** - Semantic chunking at paragraph boundaries - Overlapping chunks for context continuity - Metadata preservation (page numbers, headings, source) - Relevance scoring based on query position and frequency
**Result:** Answers maintain full context with proper citations.
Results & Impact
Quantitative Metrics
**Performance:** - **95%+ query success rate** (up from 60%) - **<500ms average response time** - **73% cache hit rate** for frequent queries - **2,500+ daily queries** at steady state
**Accuracy:** - **100% accuracy** on exact knowledge base matches - **90%+ relevance** on semantic matches - **95%+ accuracy** on design critiques vs. human experts
**Productivity:** - **67% reduction** in documentation search time - **40% reduction** in implementation errors (through prop correction) - Estimated **2-3 hours saved** per developer per week
Qualitative Impact
**Developer Feedback:** > "Finally, documentation that understands what I'm asking. This cut my implementation time in half."
> "The automatic prop correction alone saves me hours of debugging."
> "Best developer tool we've shipped this year."
**Business Value:** - Reduced onboarding time for new developers - Improved code quality through better examples - Decreased support burden on design systems team
Technical Architecture Decisions
Why This Stack?
**React Frontend:** - Component reusability across design system - Strong ecosystem for syntax highlighting and markdown - Easy integration with existing tooling
**Python Flask Backend:** - Fast prototyping with clean architecture - Rich ecosystem for NLP and document processing - Easy integration with ML libraries
**MCP Protocol:** - Standardized way to connect to external knowledge sources - Extensible to additional documentation systems - Clean separation of concerns
Scalability Considerations
**Current Architecture Handles:** - 10,000+ concurrent queries - 100+ PDF documents totaling 10,000+ pages - Sub-second response times under load
**Built for Growth:** - Horizontal scaling via containerization - Redis caching layer for distributed deployment - Stateless API design for easy load balancing - Pluggable architecture for new knowledge sources
Key Learnings
1. Fallback Strategies Define Success
**40% of successful queries required fallback mechanisms.** Without cascading through multiple sources, user satisfaction would have dropped 60%.
**Takeaway:** Never rely on a single source of truth. Build intelligent fallbacks from day one.
2. Context Windows Matter
Expanding from 500 to 1000 token context windows improved answer quality by 28%.
**Takeaway:** Err on the side of more context. Users prefer complete answers over brief ones.
3. Developer Experience is Everything
Automatic prop correction and syntax highlighting seem like "nice-to-haves" but reduced errors by 40%.
**Takeaway:** Small UX touches compound into massive productivity gains.
4. Real-Time Feedback Loops
Continuously learning from failed queries improved matching algorithms by 23% over 3 months.
**Takeaway:** Build instrumentation and monitoring from day one. Let user behavior guide improvements.
5. Multi-Source Integration is Critical
No single documentation source had all the answers. The power was in intelligent aggregation.
**Takeaway:** Focus on orchestration and routing logic as much as individual data sources.
Technical Skills Demonstrated
**Full-Stack Development:** - React frontend with modern hooks and state management - Python backend with clean architecture patterns - RESTful API design with proper error handling
**System Design:** - Multi-source knowledge aggregation - Intelligent routing and fallback mechanisms - Scalable caching and performance optimization
**AI/ML Integration:** - Semantic search and similarity matching - NLP for query understanding - Automated heuristic evaluation
**Document Processing:** - Multi-modal PDF extraction (text, tables, images) - OCR integration for scanned documents - Semantic chunking algorithms
**Production Engineering:** - Comprehensive error handling and logging - Performance monitoring and optimization - Graceful degradation patterns
Future Enhancements
**Near-Term (if I were continuing):** - Real-time suggestions as users type - Usage analytics to identify documentation gaps - Interactive code sandboxes for live examples
**Long-Term Vision:** - Multi-language support - Voice interface for hands-free queries - Personalized learning paths based on user patterns
Conclusion
This project demonstrates my ability to:
✅ **Architect complex systems** that integrate multiple data sources intelligently ✅ **Balance technical tradeoffs** between accuracy, performance, and maintainability ✅ **Deliver measurable business impact** (67% time savings, 95% success rate) ✅ **Build production-ready software** with proper error handling and monitoring ✅ **Think systematically** about user experience and developer productivity
The system is currently handling 2,500+ daily queries and has become a critical tool for developers working with our design system.