nanoagent/pg_analyzer_skill/SKILL.md

4.3 KiB

name description
postgres-analyzer PostgreSQL database analysis and querying. Use when the user needs to explore database schema, query data, analyze table statistics, or get insights from PostgreSQL databases. Requires PG_CONNECTION_STRING environment variable to be set.

PostgreSQL Analyzer Skill

Guide for analyzing PostgreSQL databases and extracting insights.

When to Use

  • Exploring database structure and schema
  • Querying data with SQL
  • Analyzing table statistics and data quality
  • Understanding column distributions
  • Generating data insights and reports

Available Tools

Tool Purpose Use When
get_schema List tables and columns Starting analysis, understanding structure
execute_query Run SELECT queries Getting specific data, custom analysis
get_table_stats Table statistics + sample Understanding a table's data profile
analyze_column Deep column analysis Investigating specific column patterns

Analysis Workflow

1. Discovery Phase

Always start with schema exploration:

User: "Analyze my database"
Step 1: get_schema() → See all tables
Step 2: For each interesting table, get_table_stats()

2. Deep Dive Phase

Focus on specific areas:

Step 3: analyze_column() for key columns
Step 4: execute_query() for custom analysis

3. Insight Phase

Synthesize findings:

Step 5: Identify patterns, anomalies, recommendations

Query Patterns

Common Analysis Queries

Time-based aggregation:

SELECT 
    DATE_TRUNC('month', created_at) as month,
    COUNT(*) as count,
    AVG(amount) as avg_amount
FROM orders
GROUP BY 1
ORDER BY 1

Distribution analysis:

SELECT 
    CASE 
        WHEN age < 18 THEN 'Under 18'
        WHEN age BETWEEN 18 AND 30 THEN '18-30'
        WHEN age BETWEEN 31 AND 50 THEN '31-50'
        ELSE '50+'
    END as age_group,
    COUNT(*) as count
FROM users
GROUP BY 1

Correlation check:

SELECT 
    category,
    CORR(price, quantity) as price_qty_correlation
FROM sales
GROUP BY category

Safety Rules

  1. Read-Only: The MCP server only allows SELECT queries
  2. Row Limits: Queries auto-limit to 100 rows (max 1000)
  3. No PII: Warn users if analyzing tables with potential PII
  4. Performance: Add appropriate WHERE clauses for large tables

Analysis Templates

Data Quality Report

For table X:
1. get_table_stats(table_name="X")
2. analyze_column() for each key column
3. Check for:
   - High null percentages
   - Duplicate values
   - Outliers in numeric columns
   - Date ranges

User Activity Analysis

-- Active users over time
SELECT 
    DATE_TRUNC('week', last_login) as week,
    COUNT(DISTINCT user_id) as active_users
FROM users
WHERE last_login >= NOW() - INTERVAL '90 days'
GROUP BY 1
ORDER BY 1

Revenue Analysis

-- Monthly revenue trends
SELECT 
    DATE_TRUNC('month', order_date) as month,
    SUM(total_amount) as revenue,
    COUNT(*) as orders,
    AVG(total_amount) as aov
FROM orders
WHERE status = 'completed'
GROUP BY 1
ORDER BY 1

Common Insights to Look For

Data Quality Issues

  • Columns with >50% nulls
  • Duplicate primary keys
  • Outliers (use percentiles)
  • Stale data (old max dates)

Business Patterns

  • Growth trends (time series)
  • Seasonality (day of week, month)
  • User segmentation
  • Product/category performance

Anomalies

  • Sudden spikes/drops
  • Unexpected distributions
  • Missing expected data

Output Format

Present findings with:

  1. Executive Summary: Key findings in 2-3 sentences
  2. Data Overview: Tables analyzed, row counts
  3. Key Insights: Bullet points with supporting numbers
  4. Recommendations: Actionable next steps
  5. Queries Used: For reproducibility

Example Session

User: "What's in my database?"

→ get_schema()
← Shows 3 tables: users, orders, products

→ get_table_stats("users")
← 10,000 users, created 2020-2024

→ analyze_column("users", "created_at")
← Growth peaked in 2022, slowed in 2023

→ execute_query("SELECT status, COUNT(*) FROM orders GROUP BY status")
← 60% completed, 30% pending, 10% cancelled

Insight: "Your database has healthy order flow but 
          cancellation rate (10%) is above industry 
          average (5-7%). Consider investigating 
          cancellation reasons."