194 lines
4.3 KiB
Markdown
194 lines
4.3 KiB
Markdown
---
|
|
name: postgres-analyzer
|
|
description: PostgreSQL database analysis and querying. Use when the user needs to explore database schema, query data, analyze table statistics, or get insights from PostgreSQL databases. Requires PG_CONNECTION_STRING environment variable to be set.
|
|
---
|
|
|
|
# PostgreSQL Analyzer Skill
|
|
|
|
Guide for analyzing PostgreSQL databases and extracting insights.
|
|
|
|
## When to Use
|
|
|
|
- Exploring database structure and schema
|
|
- Querying data with SQL
|
|
- Analyzing table statistics and data quality
|
|
- Understanding column distributions
|
|
- Generating data insights and reports
|
|
|
|
## Available Tools
|
|
|
|
| Tool | Purpose | Use When |
|
|
|------|---------|----------|
|
|
| `get_schema` | List tables and columns | Starting analysis, understanding structure |
|
|
| `execute_query` | Run SELECT queries | Getting specific data, custom analysis |
|
|
| `get_table_stats` | Table statistics + sample | Understanding a table's data profile |
|
|
| `analyze_column` | Deep column analysis | Investigating specific column patterns |
|
|
|
|
## Analysis Workflow
|
|
|
|
### 1. Discovery Phase
|
|
|
|
Always start with schema exploration:
|
|
|
|
```
|
|
User: "Analyze my database"
|
|
Step 1: get_schema() → See all tables
|
|
Step 2: For each interesting table, get_table_stats()
|
|
```
|
|
|
|
### 2. Deep Dive Phase
|
|
|
|
Focus on specific areas:
|
|
|
|
```
|
|
Step 3: analyze_column() for key columns
|
|
Step 4: execute_query() for custom analysis
|
|
```
|
|
|
|
### 3. Insight Phase
|
|
|
|
Synthesize findings:
|
|
|
|
```
|
|
Step 5: Identify patterns, anomalies, recommendations
|
|
```
|
|
|
|
## Query Patterns
|
|
|
|
### Common Analysis Queries
|
|
|
|
**Time-based aggregation:**
|
|
```sql
|
|
SELECT
|
|
DATE_TRUNC('month', created_at) as month,
|
|
COUNT(*) as count,
|
|
AVG(amount) as avg_amount
|
|
FROM orders
|
|
GROUP BY 1
|
|
ORDER BY 1
|
|
```
|
|
|
|
**Distribution analysis:**
|
|
```sql
|
|
SELECT
|
|
CASE
|
|
WHEN age < 18 THEN 'Under 18'
|
|
WHEN age BETWEEN 18 AND 30 THEN '18-30'
|
|
WHEN age BETWEEN 31 AND 50 THEN '31-50'
|
|
ELSE '50+'
|
|
END as age_group,
|
|
COUNT(*) as count
|
|
FROM users
|
|
GROUP BY 1
|
|
```
|
|
|
|
**Correlation check:**
|
|
```sql
|
|
SELECT
|
|
category,
|
|
CORR(price, quantity) as price_qty_correlation
|
|
FROM sales
|
|
GROUP BY category
|
|
```
|
|
|
|
## Safety Rules
|
|
|
|
1. **Read-Only**: The MCP server only allows SELECT queries
|
|
2. **Row Limits**: Queries auto-limit to 100 rows (max 1000)
|
|
3. **No PII**: Warn users if analyzing tables with potential PII
|
|
4. **Performance**: Add appropriate WHERE clauses for large tables
|
|
|
|
## Analysis Templates
|
|
|
|
### Data Quality Report
|
|
|
|
```
|
|
For table X:
|
|
1. get_table_stats(table_name="X")
|
|
2. analyze_column() for each key column
|
|
3. Check for:
|
|
- High null percentages
|
|
- Duplicate values
|
|
- Outliers in numeric columns
|
|
- Date ranges
|
|
```
|
|
|
|
### User Activity Analysis
|
|
|
|
```sql
|
|
-- Active users over time
|
|
SELECT
|
|
DATE_TRUNC('week', last_login) as week,
|
|
COUNT(DISTINCT user_id) as active_users
|
|
FROM users
|
|
WHERE last_login >= NOW() - INTERVAL '90 days'
|
|
GROUP BY 1
|
|
ORDER BY 1
|
|
```
|
|
|
|
### Revenue Analysis
|
|
|
|
```sql
|
|
-- Monthly revenue trends
|
|
SELECT
|
|
DATE_TRUNC('month', order_date) as month,
|
|
SUM(total_amount) as revenue,
|
|
COUNT(*) as orders,
|
|
AVG(total_amount) as aov
|
|
FROM orders
|
|
WHERE status = 'completed'
|
|
GROUP BY 1
|
|
ORDER BY 1
|
|
```
|
|
|
|
## Common Insights to Look For
|
|
|
|
### Data Quality Issues
|
|
- Columns with >50% nulls
|
|
- Duplicate primary keys
|
|
- Outliers (use percentiles)
|
|
- Stale data (old max dates)
|
|
|
|
### Business Patterns
|
|
- Growth trends (time series)
|
|
- Seasonality (day of week, month)
|
|
- User segmentation
|
|
- Product/category performance
|
|
|
|
### Anomalies
|
|
- Sudden spikes/drops
|
|
- Unexpected distributions
|
|
- Missing expected data
|
|
|
|
## Output Format
|
|
|
|
Present findings with:
|
|
1. **Executive Summary**: Key findings in 2-3 sentences
|
|
2. **Data Overview**: Tables analyzed, row counts
|
|
3. **Key Insights**: Bullet points with supporting numbers
|
|
4. **Recommendations**: Actionable next steps
|
|
5. **Queries Used**: For reproducibility
|
|
|
|
## Example Session
|
|
|
|
```
|
|
User: "What's in my database?"
|
|
|
|
→ get_schema()
|
|
← Shows 3 tables: users, orders, products
|
|
|
|
→ get_table_stats("users")
|
|
← 10,000 users, created 2020-2024
|
|
|
|
→ analyze_column("users", "created_at")
|
|
← Growth peaked in 2022, slowed in 2023
|
|
|
|
→ execute_query("SELECT status, COUNT(*) FROM orders GROUP BY status")
|
|
← 60% completed, 30% pending, 10% cancelled
|
|
|
|
Insight: "Your database has healthy order flow but
|
|
cancellation rate (10%) is above industry
|
|
average (5-7%). Consider investigating
|
|
cancellation reasons."
|
|
```
|