--- name: postgres-analyzer description: PostgreSQL database analysis and querying. Use when the user needs to explore database schema, query data, analyze table statistics, or get insights from PostgreSQL databases. Requires PG_CONNECTION_STRING environment variable to be set. --- # PostgreSQL Analyzer Skill Guide for analyzing PostgreSQL databases and extracting insights. ## When to Use - Exploring database structure and schema - Querying data with SQL - Analyzing table statistics and data quality - Understanding column distributions - Generating data insights and reports ## Available Tools | Tool | Purpose | Use When | |------|---------|----------| | `get_schema` | List tables and columns | Starting analysis, understanding structure | | `execute_query` | Run SELECT queries | Getting specific data, custom analysis | | `get_table_stats` | Table statistics + sample | Understanding a table's data profile | | `analyze_column` | Deep column analysis | Investigating specific column patterns | ## Analysis Workflow ### 1. Discovery Phase Always start with schema exploration: ``` User: "Analyze my database" Step 1: get_schema() → See all tables Step 2: For each interesting table, get_table_stats() ``` ### 2. Deep Dive Phase Focus on specific areas: ``` Step 3: analyze_column() for key columns Step 4: execute_query() for custom analysis ``` ### 3. Insight Phase Synthesize findings: ``` Step 5: Identify patterns, anomalies, recommendations ``` ## Query Patterns ### Common Analysis Queries **Time-based aggregation:** ```sql SELECT DATE_TRUNC('month', created_at) as month, COUNT(*) as count, AVG(amount) as avg_amount FROM orders GROUP BY 1 ORDER BY 1 ``` **Distribution analysis:** ```sql SELECT CASE WHEN age < 18 THEN 'Under 18' WHEN age BETWEEN 18 AND 30 THEN '18-30' WHEN age BETWEEN 31 AND 50 THEN '31-50' ELSE '50+' END as age_group, COUNT(*) as count FROM users GROUP BY 1 ``` **Correlation check:** ```sql SELECT category, CORR(price, quantity) as price_qty_correlation FROM sales GROUP BY category ``` ## Safety Rules 1. **Read-Only**: The MCP server only allows SELECT queries 2. **Row Limits**: Queries auto-limit to 100 rows (max 1000) 3. **No PII**: Warn users if analyzing tables with potential PII 4. **Performance**: Add appropriate WHERE clauses for large tables ## Analysis Templates ### Data Quality Report ``` For table X: 1. get_table_stats(table_name="X") 2. analyze_column() for each key column 3. Check for: - High null percentages - Duplicate values - Outliers in numeric columns - Date ranges ``` ### User Activity Analysis ```sql -- Active users over time SELECT DATE_TRUNC('week', last_login) as week, COUNT(DISTINCT user_id) as active_users FROM users WHERE last_login >= NOW() - INTERVAL '90 days' GROUP BY 1 ORDER BY 1 ``` ### Revenue Analysis ```sql -- Monthly revenue trends SELECT DATE_TRUNC('month', order_date) as month, SUM(total_amount) as revenue, COUNT(*) as orders, AVG(total_amount) as aov FROM orders WHERE status = 'completed' GROUP BY 1 ORDER BY 1 ``` ## Common Insights to Look For ### Data Quality Issues - Columns with >50% nulls - Duplicate primary keys - Outliers (use percentiles) - Stale data (old max dates) ### Business Patterns - Growth trends (time series) - Seasonality (day of week, month) - User segmentation - Product/category performance ### Anomalies - Sudden spikes/drops - Unexpected distributions - Missing expected data ## Output Format Present findings with: 1. **Executive Summary**: Key findings in 2-3 sentences 2. **Data Overview**: Tables analyzed, row counts 3. **Key Insights**: Bullet points with supporting numbers 4. **Recommendations**: Actionable next steps 5. **Queries Used**: For reproducibility ## Example Session ``` User: "What's in my database?" → get_schema() ← Shows 3 tables: users, orders, products → get_table_stats("users") ← 10,000 users, created 2020-2024 → analyze_column("users", "created_at") ← Growth peaked in 2022, slowed in 2023 → execute_query("SELECT status, COUNT(*) FROM orders GROUP BY status") ← 60% completed, 30% pending, 10% cancelled Insight: "Your database has healthy order flow but cancellation rate (10%) is above industry average (5-7%). Consider investigating cancellation reasons." ```