nanoagent/pg_analyzer_skill/SKILL.md

194 lines
4.3 KiB
Markdown

---
name: postgres-analyzer
description: PostgreSQL database analysis and querying. Use when the user needs to explore database schema, query data, analyze table statistics, or get insights from PostgreSQL databases. Requires PG_CONNECTION_STRING environment variable to be set.
---
# PostgreSQL Analyzer Skill
Guide for analyzing PostgreSQL databases and extracting insights.
## When to Use
- Exploring database structure and schema
- Querying data with SQL
- Analyzing table statistics and data quality
- Understanding column distributions
- Generating data insights and reports
## Available Tools
| Tool | Purpose | Use When |
|------|---------|----------|
| `get_schema` | List tables and columns | Starting analysis, understanding structure |
| `execute_query` | Run SELECT queries | Getting specific data, custom analysis |
| `get_table_stats` | Table statistics + sample | Understanding a table's data profile |
| `analyze_column` | Deep column analysis | Investigating specific column patterns |
## Analysis Workflow
### 1. Discovery Phase
Always start with schema exploration:
```
User: "Analyze my database"
Step 1: get_schema() → See all tables
Step 2: For each interesting table, get_table_stats()
```
### 2. Deep Dive Phase
Focus on specific areas:
```
Step 3: analyze_column() for key columns
Step 4: execute_query() for custom analysis
```
### 3. Insight Phase
Synthesize findings:
```
Step 5: Identify patterns, anomalies, recommendations
```
## Query Patterns
### Common Analysis Queries
**Time-based aggregation:**
```sql
SELECT
DATE_TRUNC('month', created_at) as month,
COUNT(*) as count,
AVG(amount) as avg_amount
FROM orders
GROUP BY 1
ORDER BY 1
```
**Distribution analysis:**
```sql
SELECT
CASE
WHEN age < 18 THEN 'Under 18'
WHEN age BETWEEN 18 AND 30 THEN '18-30'
WHEN age BETWEEN 31 AND 50 THEN '31-50'
ELSE '50+'
END as age_group,
COUNT(*) as count
FROM users
GROUP BY 1
```
**Correlation check:**
```sql
SELECT
category,
CORR(price, quantity) as price_qty_correlation
FROM sales
GROUP BY category
```
## Safety Rules
1. **Read-Only**: The MCP server only allows SELECT queries
2. **Row Limits**: Queries auto-limit to 100 rows (max 1000)
3. **No PII**: Warn users if analyzing tables with potential PII
4. **Performance**: Add appropriate WHERE clauses for large tables
## Analysis Templates
### Data Quality Report
```
For table X:
1. get_table_stats(table_name="X")
2. analyze_column() for each key column
3. Check for:
- High null percentages
- Duplicate values
- Outliers in numeric columns
- Date ranges
```
### User Activity Analysis
```sql
-- Active users over time
SELECT
DATE_TRUNC('week', last_login) as week,
COUNT(DISTINCT user_id) as active_users
FROM users
WHERE last_login >= NOW() - INTERVAL '90 days'
GROUP BY 1
ORDER BY 1
```
### Revenue Analysis
```sql
-- Monthly revenue trends
SELECT
DATE_TRUNC('month', order_date) as month,
SUM(total_amount) as revenue,
COUNT(*) as orders,
AVG(total_amount) as aov
FROM orders
WHERE status = 'completed'
GROUP BY 1
ORDER BY 1
```
## Common Insights to Look For
### Data Quality Issues
- Columns with >50% nulls
- Duplicate primary keys
- Outliers (use percentiles)
- Stale data (old max dates)
### Business Patterns
- Growth trends (time series)
- Seasonality (day of week, month)
- User segmentation
- Product/category performance
### Anomalies
- Sudden spikes/drops
- Unexpected distributions
- Missing expected data
## Output Format
Present findings with:
1. **Executive Summary**: Key findings in 2-3 sentences
2. **Data Overview**: Tables analyzed, row counts
3. **Key Insights**: Bullet points with supporting numbers
4. **Recommendations**: Actionable next steps
5. **Queries Used**: For reproducibility
## Example Session
```
User: "What's in my database?"
→ get_schema()
← Shows 3 tables: users, orders, products
→ get_table_stats("users")
← 10,000 users, created 2020-2024
→ analyze_column("users", "created_at")
← Growth peaked in 2022, slowed in 2023
→ execute_query("SELECT status, COUNT(*) FROM orders GROUP BY status")
← 60% completed, 30% pending, 10% cancelled
Insight: "Your database has healthy order flow but
cancellation rate (10%) is above industry
average (5-7%). Consider investigating
cancellation reasons."
```