
You’ve built impressive n8n automations. But can you answer this: how many workflows executed in the last hour? What’s your current CPU usage? If you hesitated, keep reading.
This guide transforms raw operational data into actionable intelligence. By the end, you’ll have dashboards showing exactly what’s happening inside your automation infrastructure.
Running Grafana and Prometheus alongside n8n requires a VPS with stable performance and reliable resource allocation. The comparison table below lists VPS hosting providers that can run a monitoring stack and real-time metrics collection efficiently, which supports better visibility into workflow performance, uptime, and system health.
VPS Hosting Providers Ready for Grafana, Prometheus, and n8n Monitoring Stacks
| Provider | User Rating | Recommended For |
|---|---|---|
| Kamatera | 4.8 | Scalability |
| Hostinger | 4.6 | Affordability |
| IONOS | 4.7 | Developers |
The Ultimate Guide to Using Grafana And Prometheus With n8n
Integrating n8n with industry-standard monitoring tools provides enterprise-grade observability for your automation infrastructure. This isn’t optional for production environments. It’s essential.
Whether you’re running a single instance or a complex multi-worker setup, this combination gives you visibility into system health so you can protect uptime and reliability.
Pro Tip: To maximize the performance of this setup, always deploy your automation stack on reliable, high-performance VPS or dedicated hosting environments.
Why Your n8n Setup Needs a Robust Monitoring Stack
Production environments require real-time visibility into workflow execution performance and system bottlenecks. Without it, you’re guessing.
Here’s a sobering fact: a single n8n instance with a PostgreSQL database can handle approximately 220 workflow executions per second before response times exceed one second. That sounds impressive until you realize you have no idea where you currently stand.
Without proper monitoring, identifying when you hit these throughput ceilings becomes nearly impossible. The same goes for detecting when an external API causes latency spikes. A dedicated monitoring stack allows operators to make data-driven decisions about scaling, resource allocation, and workflow optimization.
The Synergy of Grafana Prometheus Integration
Prometheus and Grafana serve fundamentally different but highly complementary purposes in the observability ecosystem.
Prometheus is a time-series database engine. It actively scrapes metrics from endpoints via a pull-based architecture. Think of it as the collector that gathers all your execution data.
Grafana is the visualization layer. It transforms raw data into professional-grade dashboards without altering the data collection process.
In this pairing, Grafana only reads data that Prometheus has already stored. The result? A seamless, non-intrusive monitoring loop that tells you exactly what’s happening.
Understanding the Core Components
The Role of Prometheus in Metrics Collection

Prometheus connects directly to n8n’s internal /metrics endpoint to pull operational data. It operates without requiring any modifications to your actual n8n workflows.
This matters because it ensures zero interference with your business logic. Your automations keep running normally while Prometheus quietly collects information in the background.
n8n uses the prom-client library to translate its internal events into the standard Prometheus exposition format. Everything from workflow starts to completion times gets captured automatically.
The Role of Grafana in Visualization
Grafana extends Prometheus by offering comprehensive visualization, multi-user support, and role-based access control.
It supports multiple dashboard types within a single instance. Platform engineers can see system-level metrics. Application teams can track workflow performance. Executive management can view high-level KPIs. Everyone gets what they need.
Pros: Highly customizable, industry-standard, and excellent for long-term trend analysis.
Cons: Requires initial setup time and an understanding of PromQL (Prometheus Query Language) to build complex panels.
Step 1: Enabling Prometheus Metrics in n8n
Configuring Environment Variables
By default, n8n disables the /metrics endpoint to save resources. You must enable it via environment variables before anything else works.
Once enabled, metrics are exposed at http://your-n8n-url:5678/metrics. Below are the critical environment variables required for a production-grade setup:
| Environment Variable | Purpose |
|---|---|
| N8N_METRICS=true | Enables the core /metrics endpoint. |
| N8N_METRICS_INCLUDE_WORKFLOW_ID_LABEL=true | Adds workflow identifiers for per-workflow tracking. |
| N8N_METRICS_INCLUDE_DEFAULT_METRICS=true | Exposes Node.js process metrics (CPU, memory, garbage collection). |
| N8N_METRICS_INCLUDE_QUEUE_METRICS=true | Essential for queue mode; tracks job states and bottlenecks. |
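
Once the endpoint is enabled, it returns plain text in the Prometheus exposition format. The following is a minimal sketch of reading that text, not n8n’s or Prometheus’s own parser. The sample payload, the `workflow_id` label, and the memory metric name are assumptions made for illustration; check your own /metrics output for the real names.

```python
import re

# Matches sample lines such as: metric_name{label="value"} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse exposition text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical excerpt of what the endpoint might return.
EXAMPLE = '''
# HELP n8n_workflow_started_total Total number of workflows started.
# TYPE n8n_workflow_started_total counter
n8n_workflow_started_total{workflow_id="42"} 17
n8n_process_resident_memory_bytes 2.5e+08
'''

samples = parse_metrics(EXAMPLE)
```

To try it against a live instance, the text could be fetched with `urllib.request.urlopen` from the URL above and passed to `parse_metrics`.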
Understanding Prometheus Metrics and Data Types
Prometheus defines four metric types: counters, gauges, histograms, and summaries. The three you will meet most often in n8n’s output each require a specific analytical approach:
- Counters: Cumulative totals that only increase and reset to zero when the process restarts. Example: n8n_workflow_started_total. Use these with rate() or increase() to see how many workflows ran in a given window.
- Gauges: Point-in-time values that fluctuate up and down. Example: active connections, memory usage. Perfect for current status monitoring.
- Histograms: Observations counted into buckets by size. Ideal for tracking execution durations and latency percentiles.
Community best practices recommend starting with basic metrics like n8n_workflow_executions_total before moving to complex queries. Walk before you run.
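Because a counter only ever grows, its raw value says little; the useful number is how fast it is growing. Here is a minimal sketch of that calculation, assuming you already have a list of (timestamp, value) samples. It is a simplification of what PromQL’s rate() does (the real function also extrapolates to the edges of the query window), and the sample values are made up.

```python
def counter_rate(samples):
    """Average per-second increase across (timestamp_seconds, value) samples.

    A drop in value is treated as a counter reset (for example after a
    restart), so the value seen after the reset counts as new increase.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            increase += current  # counter restarted from zero
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Hypothetical scrape results, 15 seconds apart, with a restart before t=45.
started_total = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
workflows_per_second = counter_rate(started_total)
```

Treating a drop as a reset is what keeps a restart from showing up as a large negative rate on a dashboard.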
Step 2: Configuring Prometheus to Scrape Data
Setting Up the prometheus.yml File

Prometheus uses a prometheus.yml configuration file to define scraping targets and parameters. This config file tells Prometheus where to look and how often.
The standard scrape_interval for production n8n deployments is typically set to 15 seconds. This balances precision with storage efficiency.
Faster intervals (like 5 seconds) increase precision but consume significantly more storage space. They also add load to the endpoint. Choose wisely based on your needs.
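To put rough numbers on that trade-off, the Prometheus documentation gives a sizing rule of thumb: disk space is approximately retention time in seconds, times samples ingested per second, times bytes per sample, with samples averaging around 1 to 2 bytes each. The sketch below applies that rule. The series count and retention period are assumptions chosen for illustration, not measurements from an n8n instance.

```python
def estimated_disk_bytes(active_series, scrape_interval_s, retention_days,
                         bytes_per_sample=2.0):
    """Rule-of-thumb Prometheus disk estimate.

    Each active series produces one sample per scrape, so the ingestion
    rate is active_series / scrape_interval_s samples per second.
    """
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 86_400
    return retention_seconds * samples_per_second * bytes_per_sample


# Hypothetical deployment: 5,000 active series kept for 15 days.
at_15s = estimated_disk_bytes(5_000, 15, 15)
at_5s = estimated_disk_bytes(5_000, 5, 15)
```

Under these assumptions, the 15-second interval needs roughly 0.86 GB and the 5-second interval roughly 2.6 GB. Storage scales inversely with the interval, so tripling the scrape frequency triples the disk footprint.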
Deploying Your Stack with Docker Compose
When using Docker Compose, keep Prometheus on the same internal Docker network as your n8n containers. This simplifies communication dramatically.
This setup allows Prometheus to address targets by their container names (like n8n-main:5678) rather than relying on static IP addresses. If container IPs change due to restarts or rebalancing, the named addressing ensures continuous, unbroken monitoring.
For more details on containerized deployments, check our guide on scaling n8n on VPS covering Docker, backups, and reverse proxy setup.
Tracking Host Health with Node Exporter
While n8n provides application metrics, you must also monitor the underlying server infrastructure. This is where Node Exporter becomes essential.
Integrating Node Exporter into your prometheus.yml provides crucial host-level metrics. You get CPU usage, disk I/O, network bandwidth, and more. This holistic view helps determine if a workflow failure stems from n8n logic or a bottleneck on your hosting provider’s server.
For self-hosted deployments, this distinction saves hours of troubleshooting. You’ll know immediately whether to investigate your workflows or your infrastructure.
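The scrape setup from this step, with n8n addressed by container name plus a Node Exporter job, can be sketched as a small generator for prometheus.yml. This is an illustration, not a complete production config. The job names and the `node-exporter` container name are assumptions; `n8n-main:5678` comes from the example above, and 9100 is Node Exporter’s default port.

```python
def render_prometheus_config(jobs, scrape_interval="15s"):
    """Render a minimal prometheus.yml from {job_name: [targets]}.

    Prometheus scrapes the /metrics path by default, so no metrics_path
    entry is needed for n8n or Node Exporter.
    """
    lines = [
        "global:",
        f"  scrape_interval: {scrape_interval}",
        "scrape_configs:",
    ]
    for job_name, targets in jobs.items():
        lines.append(f"  - job_name: {job_name}")
        lines.append("    static_configs:")
        lines.append("      - targets:")
        lines.extend(f"          - '{target}'" for target in targets)
    return "\n".join(lines) + "\n"


config_text = render_prometheus_config({
    "n8n": ["n8n-main:5678"],            # container name on the shared network
    "node": ["node-exporter:9100"],      # hypothetical Node Exporter container
})
```

Writing `config_text` to prometheus.yml and mounting it into the Prometheus container would be one way to keep the file identical across environments.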
Step 3: Setting Up Grafana Data Sources
Connecting Prometheus Data Sources to Grafana
Grafana includes built-in support for Prometheus. No additional plugins needed for initial setup.
Navigate to the Grafana Connections menu, add a new data source, and input your Prometheus server URL (like http://prometheus:9090). The connection process takes about two minutes.
Optimization Tip: Set the data source’s query timeout (query_timeout) to 60 seconds. This protects your Prometheus server from runaway queries that exceed normal execution times.
Set the scrape interval in the Grafana data source settings to match the scrape_interval in your Prometheus configuration. Grafana uses this value when choosing query step sizes, so mismatched settings cause unnecessary data requests and can skew your dashboards.
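If you would rather define the data source in code than click through the UI, the settings above can be sketched as a data source definition. The field names (`access`, `jsonData.timeInterval`, `jsonData.queryTimeout`) follow Grafana’s Prometheus data source provisioning format as I understand it; treat them as assumptions and verify them against the documentation for your Grafana version.

```python
import json


def prometheus_datasource(url, scrape_interval="15s", query_timeout="60s"):
    """Build a Grafana data source definition for a Prometheus server."""
    return {
        "name": "Prometheus",
        "type": "prometheus",
        "access": "proxy",  # Grafana's backend makes the requests
        "url": url,
        "jsonData": {
            # Should match scrape_interval in prometheus.yml.
            "timeInterval": scrape_interval,
            "queryTimeout": query_timeout,
        },
    }


definition = prometheus_datasource("http://prometheus:9090")
definition_json = json.dumps(definition, indent=2)
```

Keeping the interval as a single argument makes it harder for the Grafana and Prometheus values to drift apart.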
Step 4: Building the Perfect Grafana Dashboard
Essential Visualization Types for n8n
A comprehensive Grafana dashboard relies on mixing different visualization panels. Each tells part of the story:
- Time-Series Panels: Best for tracking historical trends and execution rates over time. See patterns emerge.
- Stat Panels: Ideal for displaying high-level KPIs. Show total enabled workflows or current success rates at a glance.
- Gauge Panels: Perfect for showing real-time values relative to acceptable thresholds. Memory limits become instantly visible.
Using Pre-Built Community Dashboards

You don’t have to build from scratch. The n8n community provides robust templates that save hours of work.
Dashboard ID 24475: The n8n Workflow & Execution Analytics Dashboard connects to n8n’s Postgres database for deep workflow analytics. It’s ready to install and customize.
Node.js Metrics Dashboard: Focuses heavily on process metrics like heap usage, event loop lag, and open file descriptors. Essential for understanding system-level behavior.
Isolating An n8n Workflow Performance Bottleneck
Set N8N_METRICS_INCLUDE_NODE_TYPE_LABEL=true to add a node-type label to the metrics, then use it to check whether specific node types cause slowdowns. Some nodes are more resource-intensive than others.
Track the n8n_workflow_execution_duration_seconds histogram to see if most workflows complete quickly or suffer from long-tail latencies. This output reveals patterns invisible without proper monitoring.
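A histogram stores cumulative bucket counts rather than individual durations, so percentiles have to be estimated. The sketch below shows the idea behind PromQL’s histogram_quantile(): find the bucket that contains the target rank, then interpolate linearly inside it. It is a simplified illustration, and the bucket boundaries and counts are invented.

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with float("inf").
    """
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                # No finite upper edge to interpolate towards.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return upper_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical execution-duration buckets, in seconds.
duration_buckets = [(0.5, 60), (1.0, 90), (5.0, 99), (float("inf"), 100)]
median = histogram_quantile(0.5, duration_buckets)
p95 = histogram_quantile(0.95, duration_buckets)
```

With these made-up buckets the median lands inside the first bucket while the 95th percentile falls between 1 and 5 seconds, which is the long-tail pattern worth looking for. The estimate is only as fine-grained as the bucket boundaries.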
By filtering labels, you can isolate specific workflows and identify exactly which external API or data transformation consumes CPU resources. No more guessing.
Monitoring AI Agent Workflows in n8n
As automation evolves, tracking complex, multi-step AI agent workflows becomes critical. These behave differently than standard automations.
AI workflows often rely on external LLM APIs, making latency tracking essential. One slow API response can cascade through your entire process.
Use custom PromQL queries to monitor success rates and execution durations specifically for workflows tagged with AI-related labels. This targeted approach keeps you informed without drowning in irrelevant data.
Advanced Monitoring for Queue Mode
Tracking Queue Metrics in Distributed Deployments
In horizontally scaled setups, the main n8n instance queues jobs to Redis while worker instances execute them. Understanding queue mode vs regular mode helps you choose the right architecture.
Monitor n8n_scaling_mode_queue_jobs_waiting to see how many jobs are queued. High numbers indicate insufficient worker capacity. You need more workers or faster ones.
Monitor n8n_scaling_mode_queue_jobs_active to track jobs currently processing. This shows your actual throughput.
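A single high reading of the waiting gauge can be a harmless burst; a backlog that keeps growing is the stronger sign of too few workers. The sketch below encodes that distinction. The "rising at every step" rule and the window size are assumptions chosen for illustration, not an n8n recommendation.

```python
def backlog_is_growing(waiting_samples, min_points=3):
    """Return True when the waiting-jobs gauge rose at every step of the
    most recent min_points readings."""
    recent = waiting_samples[-min_points:]
    if len(recent) < min_points:
        return False  # not enough data to call it a trend
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical readings of n8n_scaling_mode_queue_jobs_waiting.
sustained = backlog_is_growing([2, 5, 9, 14])
burst = backlog_is_growing([9, 4, 6, 3])
```

In Grafana, the same idea is usually expressed as an alert condition that must hold for several minutes rather than a single evaluation.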
Crucial Fact: Redis latency must be monitored closely. High latency here degrades end-to-end workflow performance regardless of n8n’s health. Don’t overlook this component.
Multi-Instance Coordination and Leadership
In high-availability setups with multiple main instances, metrics must be aggregated carefully. Double-counting leads to incorrect dashboards.
n8n automatically handles leadership election among main instances. This ensures only one instance manages certain tasks at any time.
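Use the instance_role_leader gauge in your PromQL queries (with the default metric prefix it appears as n8n_instance_role_leader). Filtering for instance_role_leader == 1 keeps only the series reported by the current leader, so queue metrics are not counted once per main instance. Clean data, accurate dashboards.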
Proactive Incident Detection
Building an Early Warning System for n8n

An effective early warning system detects degradation before users experience timeouts or failures. Prevention beats reaction every time.
Key Metric: Node.js event loop lag. If the 99th percentile (P99) of event loop lag exceeds 500 milliseconds, it signals severe performance degradation. Your instance is struggling.
Monitoring database connection pool exhaustion is also vital. It prevents new executions from starting even if CPU and memory appear healthy. The status might look fine while everything is actually broken.
Configuring Alert Rules for Proactive Monitoring
Grafana’s alerting engine transforms passive dashboards into active operational tools. Instead of constantly watching screens, let the system notify you.
Set up alert rules based on specific query conditions and thresholds. Test them thoroughly before relying on them in production.
Critical Alert: Configure a rule that fires if Prometheus cannot scrape the n8n /metrics endpoint for more than two minutes. This usually means the instance is down or unreachable and needs immediate attention.
Embed dynamic information directly into the alert payload. Include current metric values and runbook links to accelerate troubleshooting. Every second matters during incidents.
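The two-minute rule for a failed scrape relies on a hold period: the alert fires only if the failure persists, so one missed scrape does not page anyone. Prometheus records scrape success per target in its built-in `up` series (1 for success, 0 for failure). The sketch below models the hold-period logic on a list of such samples; it is an illustration of the behaviour, not Grafana’s or Prometheus’s alert evaluator, and the sample data is made up.

```python
def scrape_alert_firing(up_samples, hold_seconds=120):
    """Decide whether a scrape-failure alert should fire.

    up_samples: (timestamp_seconds, up) pairs sorted by time, where up is
    1 for a successful scrape and 0 for a failed one. The alert fires only
    when the latest unbroken run of failures spans at least hold_seconds.
    """
    failing_since = None
    for timestamp, up in up_samples:
        if up == 0:
            if failing_since is None:
                failing_since = timestamp
        else:
            failing_since = None  # a successful scrape resets the timer
    if failing_since is None:
        return False
    return up_samples[-1][0] - failing_since >= hold_seconds


# Hypothetical 15-second scrapes: a full outage versus a brief blip.
outage = [(t, 0) for t in range(0, 135, 15)]
blip = [(0, 1), (15, 0), (30, 1), (45, 1)]
```

Here `scrape_alert_firing(outage)` is true and `scrape_alert_firing(blip)` is false, which is the behaviour a hold period is meant to give you.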
Alert Severity and Routing Strategies
Implement a tiered severity system to prevent alert fatigue among your engineering team. Not every alert deserves the same response:
- P1 (Priority 1): Complete system outages or database connection failures. Requires immediate paging.
- P2 (Priority 2): Performance degradation or high queue wait times. Requires timely investigation.
- P3 (Priority 3): Informational trends for capacity planning. Logged for later review.
Route database alerts to database admins. Send workflow failure alerts to integration teams. Smart routing through appropriate channels ensures the right people respond to each incident type.
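The tiers and routing above can be written down as a lookup table, which makes the policy easy to review and test. This is a sketch only: the category names, receiver names, and the choice to treat workflow failures as P2 are assumptions for illustration, and a real setup would express the same mapping in Grafana’s notification policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    severity: str
    receiver: str
    page: bool  # True means wake someone up


# Hypothetical categories and receivers.
ROUTING_TABLE = {
    "outage": Route("P1", "on-call", True),
    "database": Route("P1", "database-admins", True),
    "workflow_failure": Route("P2", "integration-team", False),
    "queue_wait": Route("P2", "platform-team", False),
    "capacity_trend": Route("P3", "capacity-log", False),
}

# Unknown categories are logged rather than paged.
DEFAULT_ROUTE = Route("P3", "capacity-log", False)


def route_alert(category):
    """Return the severity and receiver for an alert category."""
    return ROUTING_TABLE.get(category, DEFAULT_ROUTE)
```

Sending unknown categories to the lowest tier is one design choice; a stricter team might prefer to page on anything unclassified until it has been triaged.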
Security and Best Practices
Securing Your Monitoring Infrastructure
The /metrics endpoint contains sensitive system performance data and internal architectural details. Treat it accordingly.
Best Practice: Never expose the metrics endpoint publicly through internet-facing reverse proxy configurations. Keep Prometheus strictly on the internal network where clients can access it securely.
Place Grafana behind a secure reverse proxy with strong authentication. Use HTTPS, OAuth, or SAML to protect your dashboards. Basic auth alone isn’t sufficient for sensitive environments.
Enable proper authentication for all services in your stack. Unauthorized access to monitoring data reveals your infrastructure’s weaknesses.
Simplifying Setup with One Script Automation
To maintain consistency across environments, integrate your monitoring setup into CI/CD pipelines. Manual configuration introduces errors.
Use Infrastructure as Code (IaC) to deploy your Prometheus configs and Grafana dashboards automatically. Every environment stays identical.
Implement synthetic monitoring to run test workflows after each deployment. If the tests fail or latency spikes, a single automation script can trigger an immediate rollback, which automates recovery from failed releases.
n8n-trace vs. Prometheus/Grafana: Pros and Cons
Prometheus/Grafana:
- Pros: Industry-standard, excellent for infrastructure monitoring, deep system-level insights.
- Cons: Can be overly complex for non-technical business users; raw data access requires strict security.
n8n-trace:
- Pros: Self-hosted observability designed specifically for business teams; tracks workflow analytics without exposing sensitive payload data.
- Cons: Lacks deep infrastructure and server-level metrics.
Verdict: Use Prometheus/Grafana for your platform engineers. Deploy n8n-trace alongside it for business analysts. Both tools serve different audiences with different needs.
Choosing the Right Hosting Foundation

Your monitoring stack is only as reliable as the infrastructure running it. Prometheus, Grafana, and n8n all require consistent compute resources. Running them on underpowered or unreliable hosting leads to gaps in your metrics.
For n8n hosting options, look for VPS providers offering dedicated CPU cores and sufficient memory. Your monitoring volumes grow over time. Plan for that growth from the start.
Consider your storage requirements carefully. Prometheus retains time-series data that accumulates quickly with 15-second scrape intervals. A solid VPS setup with expandable storage options saves headaches later.
When comparing automation platforms, the monitoring requirements differ. Understanding n8n vs Airflow helps you choose the right tool before building your entire observability stack.
Performance Optimization Strategies
Once your monitoring is running, use the data to optimize. Learn more about performance tuning n8n for large workflow volumes.
Watch your JSON input and output sizes. Large payloads consume memory and slow processing. Your dashboards reveal these patterns.
Monitor heap usage trends over time. Gradual increases suggest memory leaks requiring investigation. Sudden spikes indicate resource-intensive workflows.
Track DNS resolution times if your workflows connect to many external services. Slow DNS adds latency to every API call.
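For the heap-usage check in particular, "gradual increase" can be made measurable by fitting a straight line through the samples and reading off its slope. Below is a minimal sketch using an ordinary least-squares fit. The sample values are invented, and any threshold you compare the slope against would be your own choice.

```python
def heap_growth_per_hour(samples):
    """Least-squares slope of (timestamp_seconds, heap_bytes) samples,
    expressed in bytes per hour."""
    count = len(samples)
    if count < 2:
        return 0.0
    mean_time = sum(t for t, _ in samples) / count
    mean_heap = sum(h for _, h in samples) / count
    covariance = sum((t - mean_time) * (h - mean_heap) for t, h in samples)
    variance = sum((t - mean_time) ** 2 for t, _ in samples)
    if variance == 0:
        return 0.0  # all samples share one timestamp
    return covariance / variance * 3600


# Hypothetical heap readings taken one hour apart.
heap_samples = [(0, 100_000_000), (3600, 110_000_000), (7200, 120_000_000)]
growth = heap_growth_per_hour(heap_samples)
```

A slope that stays positive across several days, through quiet periods as well as busy ones, is the pattern that suggests a leak; a fit over a short window mostly reflects normal garbage-collection noise.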
Conclusion
Building a complete monitoring stack with Grafana and Prometheus transforms how you operate n8n. You move from reactive firefighting to proactive optimization.
Problems become visible before they impact users. Scaling decisions rely on data instead of guesses. The initial setup investment pays dividends through prevented downtime, faster troubleshooting, and confident infrastructure decisions.
Next Steps: What Now?
- Enable Prometheus metrics in your n8n environment variables today.
- Deploy Prometheus and Grafana using Docker Compose on the same network.
- Install Node Exporter for host-level monitoring of your server.
- Import community dashboards as starting templates.
- Configure at least three alert rules for critical conditions.
- Secure your stack behind proper authentication and network isolation.
- Document your setup for team knowledge sharing.



