# PostgreSQL Analyzer - Practical MCP + Skills Demo
A production-ready database analysis tool using **MCP (Model Context Protocol)** and **Skills**. Connect to any PostgreSQL database, explore schemas, query data, and generate insights.
## What This Does
```
┌────────────────────────────────────────────────────────────────┐
│                          User Request                          │
│       "Analyze my orders table and find revenue trends"        │
└─────────────────────────────────┬──────────────────────────────┘
┌─────────────────────────────────▼──────────────────────────────┐
│                    Skill: postgres-analyzer                    │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ Triggers on: database, analysis, query, insights         │  │
│  │                                                          │  │
│  │ Workflow:                                                │  │
│  │ 1. get_schema()      → Understand table structure        │  │
│  │ 2. get_table_stats() → Get row counts, samples           │  │
│  │ 3. execute_query()   → Run revenue analysis SQL          │  │
│  │ 4. analyze_column()  → Check date ranges, distributions  │  │
│  │ 5. Synthesize        → Generate insights report          │  │
│  └──────────────────────────────────────────────────────────┘  │
└─────────────────────────────────┬──────────────────────────────┘
                                  │ MCP Protocol
┌─────────────────────────────────▼──────────────────────────────┐
│                 MCP Server: postgres-analyzer                  │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ Tools:                                                   │  │
│  │ • get_schema()       - List tables/columns               │  │
│  │ • execute_query()    - Run SELECT queries                │  │
│  │ • get_table_stats()  - Stats + sample data               │  │
│  │ • analyze_column()   - Deep column analysis              │  │
│  │                                                          │  │
│  │ Safety: Read-only, query limits, injection protection    │  │
│  └──────────────────────────────────────────────────────────┘  │
└─────────────────────────────────┬──────────────────────────────┘
                                  │ psycopg2
┌─────────────────────────────────▼──────────────────────────────┐
│                      PostgreSQL Database                       │
│              (Any accessible PostgreSQL instance)              │
└────────────────────────────────────────────────────────────────┘
```
## Quick Start
### 1. Install Dependencies
```bash
cd pg_analyzer_demo
pip install -r pg_mcp_server/requirements.txt
```
### 2. Set Database Connection
```bash
export PG_CONNECTION_STRING="postgresql://user:password@host:port/database"
# Examples:
# Local database:
export PG_CONNECTION_STRING="postgresql://postgres:secret@localhost:5432/myapp"
# Supabase:
export PG_CONNECTION_STRING="postgresql://postgres.xxxx:password@aws-0-region.pooler.supabase.com:5432/postgres"
# Railway/Render:
export PG_CONNECTION_STRING="postgresql://user:pass@host.render.com:5432/dbname"
```
### 3. Test with Demo Client
```bash
python demo.py
```
The demo interactively walks you through:
- Schema discovery
- Table analysis
- Custom queries
- Column deep-dives
## Components
### MCP Server (`pg_mcp_server/server.py`)
Exposes 4 tools for safe database access:
| Tool | Parameters | Returns |
|------|------------|---------|
| `get_schema` | `table_name` (optional) | All tables or specific table schema |
| `execute_query` | `query`, `limit` | Query results as markdown table |
| `get_table_stats` | `table_name`, `sample_size` | Row count, column stats, sample rows |
| `analyze_column` | `table_name`, `column_name` | Distribution, nulls, top values |
**Safety Features:**
- Read-only: Rejects INSERT/UPDATE/DELETE/DROP/CREATE
- Query limits: Auto-limits to 100 rows (max 1000)
- Connection pooling: pooled connections (size via `PG_POOL_SIZE`) with proper cleanup
- SQL injection protection: Uses parameterized queries
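The read-only and auto-limit rules above can be condensed into a single guard function. This is an illustrative sketch, not the server's actual code; the name `guard_query` and the exact keyword list are assumptions:

```python
import re

# Keywords that indicate a write or DDL statement (illustrative list)
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|CREATE|ALTER|TRUNCATE|GRANT)\b",
    re.IGNORECASE,
)

def guard_query(query: str, limit: int = 100, max_limit: int = 1000) -> str:
    """Reject non-SELECT statements and append a row limit if missing."""
    stripped = query.strip().rstrip(";")
    if not stripped.upper().startswith(("SELECT", "WITH")):
        raise ValueError("Only SELECT queries are allowed")
    if FORBIDDEN.search(stripped):
        raise ValueError("Write/DDL statements are rejected")
    limit = min(limit, max_limit)
    # Append a LIMIT only when the caller did not supply one
    if not re.search(r"\bLIMIT\s+\d+\s*$", stripped, re.IGNORECASE):
        stripped = f"{stripped} LIMIT {limit}"
    return stripped
```

For example, `guard_query("SELECT * FROM orders")` would return the query with `LIMIT 100` appended, while `guard_query("DROP TABLE orders")` raises.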
### Skill (`pg_analyzer_skill/SKILL.md`)
Teaches the AI:
1. **When to use**: Database questions, analysis needs
2. **Workflow**: Discovery → Deep Dive → Insights
3. **SQL Patterns**: Common analysis queries
4. **Safety Rules**: Read-only, performance, PII warnings
5. **Output Format**: Structured insights with recommendations
### Helper Script (`pg_analyzer_skill/scripts/generate_report.py`)
Generates formatted markdown reports from analysis results.
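As an illustration of the kind of markdown the helper might emit (the function name and signature here are hypothetical, mirroring the insight format shown in the example sessions below, not the script's actual interface):

```python
def render_report(title: str, findings: list[str], recommendations: list[str]) -> str:
    """Assemble analysis results into a markdown report (hypothetical structure)."""
    lines = [f"## {title}", "", "**Key Findings**:"]
    lines += [f"- {f}" for f in findings]
    lines += ["", "**Recommendations**:"]
    # Number the recommendations, matching the report style used in this README
    lines += [f"{i}. {r}" for i, r in enumerate(recommendations, 1)]
    return "\n".join(lines)
```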
## Using with Kimi Code CLI
### Setup
**1. Configure MCP Server**
Add to `~/.kimi/mcp.json`:
```json
{
  "mcpServers": {
    "postgres": {
      "command": "python3",
      "args": ["/absolute/path/to/pg_mcp_server/server.py"],
      "env": {
        "PG_CONNECTION_STRING": "postgresql://user:pass@host:5432/db"
      }
    }
  }
}
```
**2. Install Skill**
```bash
mkdir -p ~/.config/agents/skills/
cp -r pg_analyzer_skill ~/.config/agents/skills/
```
Or use a local skill path:
```bash
kimi --skills-dir ./pg_analyzer_skill
```
**3. Start Kimi**
```bash
kimi
```
### Example Sessions
#### Schema Exploration
```
User: What tables are in my database?
→ get_schema()
← Shows all tables with column counts
Kimi: Your database has 4 tables:
• users (8 columns) - User accounts
• orders (10 columns) - Purchase orders
• products (6 columns) - Product catalog
• order_items (5 columns) - Line items
```
#### Data Analysis
```
User: Analyze the orders table
→ get_table_stats("orders", sample_size=5)
← Row count: 15,420 | Columns: 10 | Sample data...
→ analyze_column("orders", "total_amount")
← Min: $12.50 | Max: $2,450.00 | Avg: $187.30
→ analyze_column("orders", "status")
← completed: 9,252 (60%) | pending: 4,626 (30%) | cancelled: 1,542 (10%)
→ execute_query("SELECT DATE_TRUNC('month', created_at) as month,
                        SUM(total_amount) as revenue
                 FROM orders WHERE status = 'completed'
                 GROUP BY 1 ORDER BY 1 LIMIT 12")
← Monthly revenue trend data
Kimi: ## Orders Table Analysis
**Overview**: 15,420 orders with healthy volume
**Key Findings**:
- Average order value: $187.30 (healthy for retail)
- 10% cancellation rate - investigate why
- Revenue trending up 15% MoM
**Recommendations**:
1. Review cancellation process - rate is above industry avg (5-7%)
2. Promote products in $150-250 range (sweet spot)
3. Follow up on pending orders (30% of total)
```
#### Custom Queries
```
User: Who are my top 10 customers by revenue?
→ execute_query("""
    SELECT u.email, SUM(o.total_amount) as total
    FROM users u
    JOIN orders o ON u.id = o.user_id
    WHERE o.status = 'completed'
    GROUP BY u.id, u.email
    ORDER BY total DESC
    LIMIT 10
""")
Kimi: Here are your top 10 customers:
| Email | Total Revenue |
|-------|---------------|
| john@example.com | $12,450 |
| sarah@example.com | $11,230 |
...
```
## Real-World Use Cases
### 1. Data Quality Audit
```
User: Check data quality in the users table
Kimi runs:
1. get_table_stats("users") - Overview
2. analyze_column("users", "email") - Check for nulls, duplicates
3. analyze_column("users", "created_at") - Date range validation
4. execute_query("SELECT COUNT(*) FROM users WHERE email NOT LIKE '%@%'")
Output: Data quality report with issues and recommendations
```
### 2. Business Metrics Dashboard
```
User: Give me a business overview
Kimi analyzes:
- User growth (signups by month)
- Revenue trends (completed orders)
- Product performance (top sellers)
- Churn indicators (inactive users)
Output: Executive summary with charts (as markdown tables)
```
### 3. Anomaly Detection
```
User: Find any unusual patterns in orders
Kimi checks:
- Orders with extreme amounts (outliers)
- Sudden spikes in cancellations
- Unusual time patterns (3am orders)
- Duplicate transactions
Output: Anomaly report with investigation queries
```
## Configuration Reference
### Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| `PG_CONNECTION_STRING` | Yes | PostgreSQL connection URI |
| `PG_POOL_SIZE` | No | Connection pool size (default: 5) |
| `PG_QUERY_TIMEOUT` | No | Query timeout in seconds (default: 30) |
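A server reading these variables would typically apply the defaults from the table above at startup. A minimal sketch, with `load_config` as an assumed helper name:

```python
import os

def load_config() -> dict:
    """Read connection settings from the environment, with documented defaults."""
    conn = os.environ.get("PG_CONNECTION_STRING")
    if not conn:
        raise RuntimeError("PG_CONNECTION_STRING is required")
    return {
        "connection_string": conn,
        "pool_size": int(os.environ.get("PG_POOL_SIZE", "5")),       # default: 5
        "query_timeout": int(os.environ.get("PG_QUERY_TIMEOUT", "30")),  # seconds
    }
```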
### Connection String Format
```
postgresql://[user[:password]@][host][:port][/dbname][?param1=value1&...]
Examples:
postgresql://localhost/mydb
postgresql://user:secret@localhost:5432/mydb?sslmode=require
postgresql://user:pass@host.supabase.co:5432/postgres?sslmode=require
```
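Connection URIs are easy to get subtly wrong, so a quick way to sanity-check one is to split it into its parts with the standard library (illustrative helper, not part of this project):

```python
from urllib.parse import urlparse, parse_qs

def describe_conn(uri: str) -> dict:
    """Break a PostgreSQL connection URI into its parts for inspection."""
    parts = urlparse(uri)
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,              # PostgreSQL's default port
        "dbname": parts.path.lstrip("/") or None,
        "params": parse_qs(parts.query),         # e.g. {"sslmode": ["require"]}
    }
```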
## Security Considerations
### MCP Server Safety
1. **Read-Only Enforcement**: Only SELECT queries allowed
2. **Query Limits**: Max 1000 rows returned
3. **No DDL**: CREATE/ALTER/DROP rejected
4. **Connection Isolation**: Per-request connections
### Best Practices
- Use read-only database users
- Enable SSL for remote connections
- Monitor query logs
- Set appropriate query timeouts
## Extending
### Adding New Tools
Edit `pg_mcp_server/server.py`:
```python
Tool(
    name="get_slow_queries",
    description="Find slow running queries from pg_stat_statements",
    inputSchema={
        "type": "object",
        "properties": {
            "limit": {"type": "integer", "default": 10}
        }
    },
)
```
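The registered tool also needs a handler that builds and runs the query. A sketch of the query-building half (`build_slow_query_sql` is a hypothetical helper; note that `pg_stat_statements` uses the column name `total_exec_time` from PostgreSQL 13 onward):

```python
def build_slow_query_sql(limit: int = 10) -> str:
    """Build the SQL for the hypothetical get_slow_queries tool."""
    limit = max(1, min(int(limit), 100))  # clamp to a sane range
    return (
        "SELECT query, calls, mean_exec_time, total_exec_time "
        "FROM pg_stat_statements "
        "ORDER BY total_exec_time DESC "
        f"LIMIT {limit}"
    )
```

The `limit` is clamped rather than interpolated directly from user input, keeping with the server's injection-protection stance.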
### Adding Analysis Patterns
Edit `pg_analyzer_skill/SKILL.md`:
````markdown
### Cohort Analysis
```sql
SELECT
    DATE_TRUNC('month', first_order) as cohort,
    COUNT(*) as users
FROM (
    SELECT user_id, MIN(created_at) as first_order
    FROM orders GROUP BY user_id
) first_orders
GROUP BY 1
```
````
## Troubleshooting
### Connection Issues
```bash
# Test connection manually
psql "$PG_CONNECTION_STRING" -c "SELECT 1"
# Check server is running
pg_isready -h localhost -p 5432
```
### Permission Errors
Create a read-only user:
```sql
CREATE USER analyst WITH PASSWORD 'safe_password';
GRANT CONNECT ON DATABASE mydb TO analyst;
GRANT USAGE ON SCHEMA public TO analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO analyst;
```
### Performance
For large tables, add WHERE clauses:
```sql
-- Good: Limited time range
SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '30 days'
-- Bad: Full table scan
SELECT * FROM orders
```
## Comparison: MCP vs Direct Connection
| Aspect | MCP + Skills | Direct SQL |
|--------|--------------|------------|
| **Safety** | ✅ Read-only enforced | ⚠️ User responsibility |
| **Guidance** | ✅ AI knows analysis patterns | ❌ Manual SQL writing |
| **Insights** | ✅ Automatic synthesis | ❌ Raw data only |
| **Reusability** | ✅ Skill applies to any DB | ❌ Custom each time |
| **Setup** | ⚠️ Requires configuration | ✅ Direct access |
## Resources
- [MCP Documentation](https://modelcontextprotocol.io/)
- [PostgreSQL Docs](https://www.postgresql.org/docs/)
- [psycopg2 Guide](https://www.psycopg.org/docs/)
## License
MIT - Use this as a foundation for your own database analysis tools!