# PostgreSQL Analyzer - Practical MCP + Skills Demo

A production-ready database analysis tool using **MCP (Model Context Protocol)** and **Skills**. Connect to any PostgreSQL database, explore schemas, query data, and generate insights.

## What This Does

```
┌────────────────────────────────────────────────────────────────────┐
│                            User Request                            │
│         "Analyze my orders table and find revenue trends"          │
└────────────────────────────────────┬───────────────────────────────┘
                                     │
    ┌────────────────────────────────▼─────────────────────────────┐
    │                   Skill: postgres-analyzer                   │
    │  ┌─────────────────────────────────────────────────────────┐ │
    │  │ Triggers on: database, analysis, query, insights        │ │
    │  │                                                         │ │
    │  │ Workflow:                                               │ │
    │  │ 1. get_schema()      → Understand table structure       │ │
    │  │ 2. get_table_stats() → Get row counts, samples          │ │
    │  │ 3. execute_query()   → Run revenue analysis SQL         │ │
    │  │ 4. analyze_column()  → Check date ranges, distributions │ │
    │  │ 5. Synthesize        → Generate insights report         │ │
    │  └─────────────────────────────────────────────────────────┘ │
    └────────────────────────────────┬─────────────────────────────┘
                                     │ MCP Protocol
    ┌────────────────────────────────▼─────────────────────────────┐
    │                MCP Server: postgres-analyzer                 │
    │  ┌─────────────────────────────────────────────────────────┐ │
    │  │ Tools:                                                  │ │
    │  │ • get_schema()      - List tables/columns               │ │
    │  │ • execute_query()   - Run SELECT queries                │ │
    │  │ • get_table_stats() - Stats + sample data               │ │
    │  │ • analyze_column()  - Deep column analysis              │ │
    │  │                                                         │ │
    │  │ Safety: Read-only, query limits, injection protection   │ │
    │  └─────────────────────────────────────────────────────────┘ │
    └────────────────────────────────┬─────────────────────────────┘
                                     │ psycopg2
    ┌────────────────────────────────▼─────────────────────────────┐
    │                     PostgreSQL Database                      │
    │             (Any accessible PostgreSQL instance)             │
    └──────────────────────────────────────────────────────────────┘
```

## Quick Start

### 1. Install Dependencies

```bash
cd pg_analyzer_demo
pip install -r pg_mcp_server/requirements.txt
```

### 2. Set Database Connection

```bash
export PG_CONNECTION_STRING="postgresql://user:password@host:port/database"

# Examples:
# Local database:
export PG_CONNECTION_STRING="postgresql://postgres:secret@localhost:5432/myapp"

# Supabase:
export PG_CONNECTION_STRING="postgresql://postgres.xxxx:password@aws-0-region.pooler.supabase.com:5432/postgres"

# Railway/Render:
export PG_CONNECTION_STRING="postgresql://user:pass@host.render.com:5432/dbname"
```

### 3. Test with Demo Client

```bash
python demo.py
```

The demo interactively walks you through:

- Schema discovery
- Table analysis
- Custom queries
- Column deep-dives

## Components

### MCP Server (`pg_mcp_server/server.py`)

Exposes 4 tools for safe database access:

| Tool | Parameters | Returns |
|------|------------|---------|
| `get_schema` | `table_name` (optional) | All tables or a specific table's schema |
| `execute_query` | `query`, `limit` | Query results as a markdown table |
| `get_table_stats` | `table_name`, `sample_size` | Row count, column stats, sample rows |
| `analyze_column` | `table_name`, `column_name` | Distribution, nulls, top values |

**Safety Features:**

- Read-only: rejects INSERT/UPDATE/DELETE/DROP/CREATE
- Query limits: results capped at 100 rows by default (1,000 max)
- Connection handling: per-request connections with proper cleanup
- SQL injection protection: uses parameterized queries

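The read-only enforcement can be pictured as a validation step that runs before any SQL reaches the database. This is a minimal sketch of the idea, not the exact code in `server.py`:

```python
import re

# Statement types the server refuses to run
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|CREATE|ALTER|TRUNCATE|GRANT|REVOKE)\b",
    re.IGNORECASE,
)

def validate_query(query: str, limit: int = 100, max_limit: int = 1000) -> str:
    """Reject non-SELECT statements and enforce a row limit."""
    stripped = query.strip().rstrip(";")
    if not stripped.upper().startswith(("SELECT", "WITH")):
        raise ValueError("Only SELECT queries are allowed")
    if FORBIDDEN.search(stripped):
        raise ValueError("Write/DDL statements are rejected")
    # Wrap the query so the cap applies even when the user omits LIMIT
    return f"SELECT * FROM ({stripped}) AS q LIMIT {min(limit, max_limit)}"
```

Note that keyword filtering alone is never bulletproof; connecting as a read-only database user (see Troubleshooting) is the stronger guarantee.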
### Skill (`pg_analyzer_skill/SKILL.md`)

Teaches the AI:

1. **When to use**: Database questions, analysis needs
2. **Workflow**: Discovery → Deep Dive → Insights
3. **SQL Patterns**: Common analysis queries
4. **Safety Rules**: Read-only, performance, PII warnings
5. **Output Format**: Structured insights with recommendations

### Helper Script (`pg_analyzer_skill/scripts/generate_report.py`)

Generates formatted markdown reports from analysis results.

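The report generator follows a simple shape: take tool results, emit markdown. A minimal sketch of the pattern (the actual `generate_report.py` may structure this differently; `render_report` and its parameters are illustrative):

```python
def render_report(table, stats, findings, recommendations):
    """Render analysis results as a markdown report string."""
    lines = [f"## {table.title()} Table Analysis", ""]
    lines.append(f"**Overview**: {stats['row_count']:,} rows, {stats['column_count']} columns")
    lines += ["", "**Key Findings**:"]
    lines += [f"- {f}" for f in findings]
    lines += ["", "**Recommendations**:"]
    lines += [f"{i}. {r}" for i, r in enumerate(recommendations, 1)]
    return "\n".join(lines)

report = render_report(
    "orders",
    {"row_count": 15420, "column_count": 10},
    ["Average order value: $187.30"],
    ["Review cancellation process"],
)
```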
## Using with Kimi Code CLI

### Setup

**1. Configure MCP Server**

Add to `~/.kimi/mcp.json`:

```json
{
  "mcpServers": {
    "postgres": {
      "command": "python3",
      "args": ["/absolute/path/to/pg_mcp_server/server.py"],
      "env": {
        "PG_CONNECTION_STRING": "postgresql://user:pass@host:5432/db"
      }
    }
  }
}
```

**2. Install Skill**

```bash
mkdir -p ~/.config/agents/skills/
cp -r pg_analyzer_skill ~/.config/agents/skills/
```

Or use a local skill path:

```bash
kimi --skills-dir ./pg_analyzer_skill
```

**3. Start Kimi**

```bash
kimi
```

### Example Sessions

#### Schema Exploration

```
User: What tables are in my database?

→ get_schema()
← Shows all tables with column counts

Kimi: Your database has 4 tables:
• users (8 columns) - User accounts
• orders (10 columns) - Purchase orders
• products (6 columns) - Product catalog
• order_items (5 columns) - Line items
```

#### Data Analysis

```
User: Analyze the orders table

→ get_table_stats("orders", sample_size=5)
← Row count: 15,420 | Columns: 10 | Sample data...

→ analyze_column("orders", "total_amount")
← Min: $12.50 | Max: $2,450.00 | Avg: $187.30

→ analyze_column("orders", "status")
← completed: 9,252 (60%) | pending: 4,626 (30%) | cancelled: 1,542 (10%)

→ execute_query("SELECT DATE_TRUNC('month', created_at) AS month,
                        SUM(total_amount) AS revenue
                 FROM orders WHERE status = 'completed'
                 GROUP BY 1 ORDER BY 1 LIMIT 12")
← Monthly revenue trend data

Kimi: ## Orders Table Analysis

**Overview**: 15,420 orders with healthy volume

**Key Findings**:
- Average order value: $187.30 (healthy for retail)
- 10% cancellation rate - investigate why
- Revenue trending up 15% MoM

**Recommendations**:
1. Review cancellation process - rate is above industry average (5-7%)
2. Promote products in the $150-250 range (sweet spot)
3. Follow up on pending orders (30% of total)
```

#### Custom Queries

```
User: Who are my top 10 customers by revenue?

→ execute_query("""
    SELECT u.email, SUM(o.total_amount) AS total
    FROM users u
    JOIN orders o ON u.id = o.user_id
    WHERE o.status = 'completed'
    GROUP BY u.id, u.email
    ORDER BY total DESC
    LIMIT 10
  """)

Kimi: Here are your top 10 customers:

| Email | Total Revenue |
|-------|---------------|
| john@example.com | $12,450 |
| sarah@example.com | $11,230 |
...
```

## Real-World Use Cases

### 1. Data Quality Audit

```
User: Check data quality in the users table

Kimi runs:
1. get_table_stats("users") - Overview
2. analyze_column("users", "email") - Check for nulls, duplicates
3. analyze_column("users", "created_at") - Date range validation
4. execute_query("SELECT COUNT(*) FROM users WHERE email NOT LIKE '%@%'")

Output: Data quality report with issues and recommendations
```

### 2. Business Metrics Dashboard

```
User: Give me a business overview

Kimi analyzes:
- User growth (signups by month)
- Revenue trends (completed orders)
- Product performance (top sellers)
- Churn indicators (inactive users)

Output: Executive summary with charts (as markdown tables)
```

### 3. Anomaly Detection

```
User: Find any unusual patterns in orders

Kimi checks:
- Orders with extreme amounts (outliers)
- Sudden spikes in cancellations
- Unusual time patterns (3am orders)
- Duplicate transactions

Output: Anomaly report with investigation queries
```

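The outlier check can run in SQL or client-side on fetched amounts. A minimal client-side sketch using the standard 1.5 × IQR rule (the amounts below are made-up sample data, not from the demo database):

```python
import statistics

def iqr_outlier_bounds(values):
    """Return (low, high) fences; values outside them count as outliers."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

amounts = [12.50, 90, 110, 120, 130, 150, 180, 200, 2450.00]
low, high = iqr_outlier_bounds(amounts)
outliers = [a for a in amounts if a < low or a > high]
# With this sample, only the 2450.00 order falls outside the fences
```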
## Configuration Reference

### Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `PG_CONNECTION_STRING` | Yes | PostgreSQL connection URI |
| `PG_POOL_SIZE` | No | Connection pool size (default: 5) |
| `PG_QUERY_TIMEOUT` | No | Query timeout in seconds (default: 30) |

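The optional variables fall back to the defaults above when unset; in Python that is the usual `os.environ.get` pattern (a sketch of how the server might read them):

```python
import os

# Defaults match the table above when the variables are unset
pool_size = int(os.environ.get("PG_POOL_SIZE", "5"))
query_timeout = int(os.environ.get("PG_QUERY_TIMEOUT", "30"))
```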
### Connection String Format

```
postgresql://[user[:password]@][host][:port][/dbname][?param1=value1&...]

Examples:
postgresql://localhost/mydb
postgresql://user:secret@localhost:5432/mydb?sslmode=require
postgresql://user:pass@host.supabase.co:5432/postgres?sslmode=require
```

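Because the connection string is a standard URI, you can inspect its parts with Python's standard library, which is handy when debugging a malformed `PG_CONNECTION_STRING` (the credentials below are illustrative):

```python
from urllib.parse import urlsplit, parse_qs

dsn = "postgresql://analyst:secret@db.example.com:5432/mydb?sslmode=require"
parts = urlsplit(dsn)

print(parts.username)                        # analyst
print(parts.hostname)                        # db.example.com
print(parts.port)                            # 5432
print(parts.path.lstrip("/"))                # mydb
print(parse_qs(parts.query)["sslmode"][0])   # require
```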
## Security Considerations

### MCP Server Safety

1. **Read-Only Enforcement**: Only SELECT queries allowed
2. **Query Limits**: Max 1000 rows returned
3. **No DDL**: CREATE/ALTER/DROP rejected
4. **Connection Isolation**: Per-request connections

### Best Practices

- Use read-only database users
- Enable SSL for remote connections
- Monitor query logs
- Set appropriate query timeouts

## Extending

### Adding New Tools

Edit `pg_mcp_server/server.py`:

```python
# Declare the new tool alongside the existing four in the server's tool list
Tool(
    name="get_slow_queries",
    description="Find slow running queries from pg_stat_statements",
    inputSchema={
        "type": "object",
        "properties": {
            "limit": {"type": "integer", "default": 10}
        }
    },
)
```

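Declaring the tool only advertises its schema; the server also needs logic that turns the tool's arguments into SQL. A sketch of that piece (`get_slow_queries_sql` is an illustrative helper, and the `pg_stat_statements` column names assume PostgreSQL 13+):

```python
def get_slow_queries_sql(arguments):
    """Build a parameterized query for the get_slow_queries tool."""
    limit = int(arguments.get("limit", 10))
    limit = max(1, min(limit, 100))  # clamp to a sane range
    sql = (
        "SELECT query, calls, mean_exec_time "
        "FROM pg_stat_statements "
        "ORDER BY mean_exec_time DESC LIMIT %s"
    )
    return sql, (limit,)  # pass both to cursor.execute(sql, params)
```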
### Adding Analysis Patterns

Edit `pg_analyzer_skill/SKILL.md`:

````markdown
### Cohort Analysis

```sql
SELECT
    DATE_TRUNC('month', first_order) AS cohort,
    COUNT(*) AS users
FROM (
    SELECT user_id, MIN(created_at) AS first_order
    FROM orders
    GROUP BY user_id
) first_orders
GROUP BY 1
```
````

## Troubleshooting

### Connection Issues

```bash
# Test connection manually
psql "$PG_CONNECTION_STRING" -c "SELECT 1"

# Check server is running
pg_isready -h localhost -p 5432
```

### Permission Errors

Create a read-only user:

```sql
CREATE USER analyst WITH PASSWORD 'safe_password';
GRANT CONNECT ON DATABASE mydb TO analyst;
GRANT USAGE ON SCHEMA public TO analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO analyst;
```

### Performance

For large tables, add WHERE clauses:

```sql
-- Good: limited time range
SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '30 days';

-- Bad: full table scan
SELECT * FROM orders;
```

## Comparison: MCP vs Direct Connection

| Aspect | MCP + Skills | Direct SQL |
|--------|--------------|------------|
| **Safety** | ✅ Read-only enforced | ⚠️ User responsibility |
| **Guidance** | ✅ AI knows analysis patterns | ❌ Manual SQL writing |
| **Insights** | ✅ Automatic synthesis | ❌ Raw data only |
| **Reusability** | ✅ Skill applies to any DB | ❌ Custom each time |
| **Setup** | ⚠️ Requires configuration | ✅ Direct access |

## Resources

- [MCP Documentation](https://modelcontextprotocol.io/)
- [PostgreSQL Docs](https://www.postgresql.org/docs/)
- [psycopg2 Guide](https://www.psycopg.org/docs/)

## License

MIT - Use this as a foundation for your own database analysis tools!