Remote OpenClaw

Best MCP Servers for Monitoring and Observability

Monitoring and observability are the backbone of reliable software. When something breaks at 2 AM, you need dashboards that show what happened, alerts that fire before users notice, and logs that tell you why. MCP servers now let your AI coding agent tap directly into the tools that power your observability stack — Prometheus, Grafana, Datadog, and PagerDuty — so you can query metrics, build dashboards, and manage incidents without leaving your editor.

This guide covers the best MCP servers for monitoring and observability, with setup instructions, feature breakdowns, and practical use cases for each.

Why MCP Servers for Observability?

Traditionally, monitoring workflows involve jumping between multiple browser tabs, memorizing PromQL syntax, and manually configuring alert thresholds. MCP servers for observability change this by exposing metrics queries, dashboard management, and incident response as tools your AI agent can call directly.

Instead of writing a PromQL query from scratch, you describe what you want to measure in plain language. Instead of clicking through Grafana menus to build a panel, you tell your agent what dashboard you need. The MCP server handles the API calls, authentication, and data formatting behind the scenes.

This approach has three major benefits. First, it reduces context switching — you stay in your editor. Second, it lowers the learning curve for complex query languages. Third, it makes observability accessible to every developer on the team, not just the SRE who memorized PromQL.
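To make "tools your AI agent can call" a little more concrete: MCP is built on JSON-RPC 2.0, and a client invokes a server-side tool by sending a `tools/call` request. The sketch below only builds such a message. The tool name `query_prometheus` and its `query` argument are hypothetical; each server defines its own tool names and input schemas.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and argument, for illustration only.
request = build_tool_call(1, "query_prometheus", {"query": "up"})
print(request)
```

In practice your agent's MCP client writes these messages for you; the point is that every "ask the agent" interaction in this guide ends up as a structured call like this one.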

Prometheus MCP Server

Prometheus is the de facto standard for metrics collection and alerting in cloud-native environments. The Prometheus MCP server connects your agent to any Prometheus instance and exposes its query and metadata APIs as callable tools.

Key Features

  • PromQL query execution: Run instant and range queries through your agent. You describe what you want in natural language, the agent translates it to PromQL, and the server executes the query.
  • Metric discovery: List all available metrics, their types, and their labels without browsing the Prometheus UI.
  • Alert rule inspection: View active alerts, pending alerts, and alerting rule configurations.
  • Target health monitoring: Check the status of all scrape targets to see which exporters are up and which are failing.
  • Recording rule inspection: Inspect and understand pre-computed recording rules in your Prometheus setup.

Setup

Install the Prometheus MCP server and point it at your instance:

openclaw skill install mcp-prometheus

Configure the connection in your MCP server settings:

{
  "mcpServers": {
    "prometheus": {
      "command": "mcp-server-prometheus",
      "args": ["--url", "http://localhost:9090"]
    }
  }
}

If your Prometheus instance requires authentication, pass a bearer token or basic auth credentials through environment variables:

{
  "mcpServers": {
    "prometheus": {
      "command": "mcp-server-prometheus",
      "args": ["--url", "https://prometheus.internal.example.com"],
      "env": {
        "PROMETHEUS_TOKEN": "your-bearer-token"
      }
    }
  }
}
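Before wiring a token into the MCP config, it can help to confirm that the URL and token work against Prometheus directly. Here is a minimal sketch that builds (but does not send) a request to Prometheus's instant-query endpoint, `/api/v1/query`, with a bearer token. Reading the token from `PROMETHEUS_TOKEN` mirrors the config above; how the MCP server itself consumes that variable is an assumption.

```python
import os
import urllib.parse
import urllib.request
from typing import Optional


def build_instant_query(base_url: str, promql: str,
                        token: Optional[str] = None) -> urllib.request.Request:
    """Build a request for Prometheus's instant-query endpoint without sending it."""
    query_string = urllib.parse.urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/api/v1/query?{query_string}"
    request = urllib.request.Request(url)
    if token:
        # Bearer auth is only meaningful if a proxy or gateway in front of
        # Prometheus is configured to check it.
        request.add_header("Authorization", f"Bearer {token}")
    return request


req = build_instant_query(
    "https://prometheus.internal.example.com",
    "up",
    token=os.environ.get("PROMETHEUS_TOKEN"),
)
print(req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=10)
```

If the request succeeds from your machine but the agent still cannot query metrics, the problem is more likely in the MCP server configuration than in your credentials.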

Use Cases

  • Asking your agent to find the P99 latency for a specific service over the past hour.
  • Debugging a spike in error rates by querying HTTP status code distributions.
  • Reviewing all firing alerts before a deployment.
  • Generating PromQL queries for custom metrics without memorizing the syntax.
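As an illustration of the first use case, this is roughly the kind of query an agent might generate for "P99 latency over the past hour". It is a sketch under assumptions: the histogram metric `http_request_duration_seconds_bucket` and the `service` label are placeholders, and your services may expose different names. The `/api/v1/query_range` endpoint and its `query`, `start`, `end`, and `step` parameters are part of the Prometheus HTTP API.

```python
import time
import urllib.parse


def p99_latency_query(service: str, window: str = "5m") -> str:
    """PromQL for p99 latency from a histogram (metric and label names are assumed)."""
    return (
        "histogram_quantile(0.99, sum by (le) ("
        f'rate(http_request_duration_seconds_bucket{{service="{service}"}}[{window}])))'
    )


def range_query_url(base_url: str, promql: str, end: int,
                    lookback_s: int = 3600, step_s: int = 60) -> str:
    """Build a range-query URL covering the last `lookback_s` seconds before `end`."""
    params = {"query": promql, "start": end - lookback_s, "end": end, "step": step_s}
    return f"{base_url.rstrip('/')}/api/v1/query_range?" + urllib.parse.urlencode(params)


query = p99_latency_query("checkout")  # "checkout" is a made-up service name
print(query)
print(range_query_url("http://localhost:9090", query, end=int(time.time())))
```

Seeing the generated PromQL is also a useful habit: it lets you check that the agent picked the right metric and aggregation before you trust the number.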

Grafana MCP Server

Grafana is the visualization layer that sits on top of Prometheus, Loki, Tempo, and dozens of other data sources. The Grafana MCP server lets your agent create, modify, and query Grafana dashboards and data sources programmatically.

Key Features

  • Dashboard management: Create new dashboards, list existing ones, and modify panels without touching the Grafana UI.
  • Panel creation: Add time-series graphs, stat panels, tables, and heatmaps by describing what you want to visualize.
  • Data source queries: Run queries against any configured Grafana data source, including Prometheus, Loki, InfluxDB, and Elasticsearch.
  • Annotation management: Add and query annotations to mark deployments, incidents, and other events on your dashboards.
  • Alert rule configuration: Create and modify Grafana-managed alert rules with notification policies.

Setup

openclaw skill install mcp-grafana

Configure the server with your Grafana instance URL and an API key:

{
  "mcpServers": {
    "grafana": {
      "command": "mcp-server-grafana",
      "args": ["--url", "http://localhost:3000"],
      "env": {
        "GRAFANA_API_KEY": "your-grafana-api-key"
      }
    }
  }
}

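Generate an API key in Grafana under Configuration, then API Keys. Recent Grafana releases deprecate API keys in favor of service account tokens, created from the Service accounts page; check the server's documentation to confirm it accepts such a token in GRAFANA_API_KEY. Use an Editor or Admin role for full dashboard management capabilities.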

Use Cases

  • Building a service health dashboard with latency, error rate, and throughput panels from a single prompt.
  • Adding a deployment annotation across all dashboards when you ship a release.
  • Querying Loki logs through Grafana to correlate log entries with metric spikes.
  • Cloning an existing dashboard and modifying it for a new microservice.
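The deployment-annotation use case maps onto Grafana's annotations HTTP API (`POST /api/annotations`). The sketch below builds that request without sending it. An annotation created without a dashboard reference is organization-level, so it only shows up on dashboards that have an annotation query matching its tags; the tag names and release text here are made up for illustration.

```python
import json
import time
import urllib.request


def build_annotation_request(base_url: str, api_key: str, text: str,
                             tags: list, time_ms: int) -> urllib.request.Request:
    """Build a POST request that would create an organization-level annotation."""
    body = json.dumps({"time": time_ms, "tags": tags, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/annotations",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_annotation_request(
    "http://localhost:3000",
    "your-grafana-api-key",
    text="Deployed checkout v1.4.2",      # hypothetical release note
    tags=["deployment", "checkout"],      # hypothetical tags
    time_ms=int(time.time() * 1000),      # Grafana expects epoch milliseconds
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=10)
```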

Datadog MCP Server

Datadog provides a fully managed monitoring platform that covers metrics, traces, logs, and infrastructure monitoring. The Datadog MCP server brings this entire platform into your agent workflow.

Key Features

  • Metric queries: Run Datadog metric queries using natural language and get back time-series data.
  • Dashboard creation: Build and modify Datadog dashboards with widgets for metrics, logs, and traces.
  • Monitor management: Create, update, and mute Datadog monitors directly from your editor.
  • Log search: Search and filter logs using Datadog log query syntax through your agent.
  • APM trace inspection: Query distributed traces to debug latency issues across services.
  • Infrastructure inventory: List hosts, containers, and services monitored by your Datadog account.

Setup

openclaw skill install mcp-datadog

Configure the server with your Datadog API and application keys:

{
  "mcpServers": {
    "datadog": {
      "command": "mcp-server-datadog",
      "env": {
        "DD_API_KEY": "your-datadog-api-key",
        "DD_APP_KEY": "your-datadog-app-key",
        "DD_SITE": "datadoghq.com"
      }
    }
  }
}

Set the DD_SITE variable to match your Datadog region. Common values include datadoghq.com for US1, datadoghq.eu for EU, and us3.datadoghq.com for US3.
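The site value matters because it determines which API host requests go to: Datadog's API lives at `https://api.` followed by the site. Here is a small sketch of that mapping, plus the URL and headers for a v1 timeseries query. Whether the MCP server derives its base URL exactly this way is an assumption; the header names and the `/api/v1/query` endpoint come from Datadog's public API.

```python
import urllib.parse


def datadog_api_base(site: str = "datadoghq.com") -> str:
    """Map a DD_SITE value to the matching API base URL."""
    return f"https://api.{site}"


def datadog_headers(api_key: str, app_key: str) -> dict:
    """Authentication headers used by the Datadog HTTP API."""
    return {"DD-API-KEY": api_key, "DD-APPLICATION-KEY": app_key}


def metric_query_url(site: str, query: str, start: int, end: int) -> str:
    """Build a timeseries query URL; `start` and `end` are epoch seconds."""
    params = {"from": start, "to": end, "query": query}
    return f"{datadog_api_base(site)}/api/v1/query?" + urllib.parse.urlencode(params)


for site in ("datadoghq.com", "datadoghq.eu", "us3.datadoghq.com"):
    print(site, "->", datadog_api_base(site))

print(metric_query_url("datadoghq.eu", "avg:system.cpu.user{*}", 1700000000, 1700003600))
```

A mismatched site is a common cause of authentication errors, since keys issued in one region are not valid against another region's API host.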

Use Cases

  • Investigating a production incident by querying metrics, logs, and traces from a single conversation.
  • Creating a new monitor for a recently deployed service with appropriate thresholds and notification channels.
  • Muting noisy monitors during a maintenance window.
  • Building a comprehensive service dashboard that combines infrastructure metrics with application-level APM data.

PagerDuty MCP Server

PagerDuty is the incident management platform that connects your monitoring alerts to on-call schedules and escalation policies. The PagerDuty MCP server lets your agent manage incidents, check on-call rotations, and interact with your incident response workflows.

Key Features

  • Incident management: Create, acknowledge, and resolve incidents directly from your agent.
  • On-call lookup: Check who is currently on call for any service or escalation policy.
  • Service status: View the current status of all PagerDuty services and their recent incidents.
  • Escalation management: Trigger escalations and reassign incidents without opening the PagerDuty app.
  • Incident notes: Add notes and status updates to active incidents for better communication during outages.
  • Maintenance windows: Schedule and manage maintenance windows to suppress alerts during planned work.

Setup

openclaw skill install mcp-pagerduty

Configure the server with your PagerDuty API token:

{
  "mcpServers": {
    "pagerduty": {
      "command": "mcp-server-pagerduty",
      "env": {
        "PAGERDUTY_API_TOKEN": "your-pagerduty-api-token"
      }
    }
  }
}

Use a read-write API token if you want your agent to manage incidents. A read-only token works for querying on-call schedules and viewing incident history.
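An on-call lookup is a good example of something a read-only token can do. The sketch below builds (without sending) a request to PagerDuty's REST API `GET /oncalls` endpoint, using the `Token token=...` authorization scheme that PagerDuty API keys use. The escalation policy ID `PABC123` is a placeholder, and the actual MCP server may issue a different set of calls.

```python
import urllib.parse
import urllib.request

PAGERDUTY_API = "https://api.pagerduty.com"


def build_oncall_request(api_token: str,
                         escalation_policy_ids: tuple = ()) -> urllib.request.Request:
    """Build a GET request listing current on-call entries, optionally filtered."""
    params = [("escalation_policy_ids[]", pid) for pid in escalation_policy_ids]
    url = f"{PAGERDUTY_API}/oncalls"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Token token={api_token}",
            "Accept": "application/vnd.pagerduty+json;version=2",
        },
    )


req = build_oncall_request("your-pagerduty-api-token", ("PABC123",))
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=10)
```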

Use Cases

  • Checking who is on call before paging someone about a production issue.
  • Acknowledging an incident and adding initial triage notes without switching apps.
  • Reviewing all open incidents across services to understand the current state of your infrastructure.
  • Scheduling a maintenance window before a planned deployment to prevent false alerts.

Combining MCP Servers for a Full Observability Stack

The real power of monitoring MCP servers comes from combining them. You can connect Prometheus, Grafana, Datadog, and PagerDuty simultaneously and let your agent orchestrate across all of them.

For example, during an incident you might ask your agent to query Prometheus for the error rate spike, search Datadog logs for the root cause, create a Grafana annotation marking the incident start time, and acknowledge the PagerDuty incident — all in a single conversation.
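Running all four servers side by side just means listing them under one `mcpServers` object. As a sketch, the script below assembles the four configurations from this guide into a single document and reads secrets from the environment instead of hardcoding them. The placeholder fallback is an illustrative choice, not something the servers require.

```python
import json
import os


def env(name: str) -> str:
    """Read a secret from the environment, falling back to a visible placeholder."""
    return os.environ.get(name, f"<set {name}>")


def build_config() -> dict:
    """Combine the four server entries shown earlier into one MCP configuration."""
    return {
        "mcpServers": {
            "prometheus": {
                "command": "mcp-server-prometheus",
                "args": ["--url", "http://localhost:9090"],
            },
            "grafana": {
                "command": "mcp-server-grafana",
                "args": ["--url", "http://localhost:3000"],
                "env": {"GRAFANA_API_KEY": env("GRAFANA_API_KEY")},
            },
            "datadog": {
                "command": "mcp-server-datadog",
                "env": {
                    "DD_API_KEY": env("DD_API_KEY"),
                    "DD_APP_KEY": env("DD_APP_KEY"),
                    "DD_SITE": os.environ.get("DD_SITE", "datadoghq.com"),
                },
            },
            "pagerduty": {
                "command": "mcp-server-pagerduty",
                "env": {"PAGERDUTY_API_TOKEN": env("PAGERDUTY_API_TOKEN")},
            },
        }
    }


print(json.dumps(build_config(), indent=2))
```

Paste the printed JSON into your MCP server settings, or keep the script around so the config can be regenerated whenever a key is rotated.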

This integrated approach turns your AI agent into an observability copilot that understands your entire monitoring stack and can take action across every tool in your pipeline.

Getting Started

Pick the MCP servers that match your existing monitoring stack. If you are running Prometheus and Grafana, start with those two. If you are on Datadog, the single Datadog MCP server covers metrics, logs, and traces in one integration. Add PagerDuty when you are ready to bring incident management into your agent workflow.

Each server installs with a single command, and the configuration is a few lines of JSON. Once connected, your AI agent becomes a natural language interface to your entire observability platform.

