Remote OpenClaw Blog

How to Use OpenClaw for API Testing and Monitoring

9 min read

APIs break in subtle ways. An endpoint returns 200 but the response shape changed. A query parameter that used to be optional is now required. Response times crept from 50ms to 500ms and nobody noticed until customers complained. OpenClaw skills help you build a testing and monitoring setup that catches these problems before they reach production — or alerts you the moment they appear in production.

This guide covers five layers of API quality: endpoint testing, contract testing, load testing, health checks, and alerting. Each layer uses OpenClaw skills from the OpenClaw Bazaar skills directory to generate test code, configuration files, and monitoring infrastructure.

Installing API Testing Skills

Set up your testing stack with the relevant skills:

# Core API testing
openclaw skill install api-test-generator
openclaw skill install contract-testing

# Performance and load testing
openclaw skill install load-test-patterns
openclaw skill install performance-benchmarks

# Monitoring and alerting
openclaw skill install health-check-generator
openclaw skill install alerting-config

These skills cover the full API quality lifecycle: writing tests during development, validating contracts between services, stress-testing before releases, and monitoring after deployment.

Endpoint Testing

Endpoint tests verify that your API handles requests correctly — right status codes, right response shapes, right error messages. The api-test-generator skill teaches your agent to produce thorough endpoint tests from your route handlers.

# Ask your agent:
# "Generate endpoint tests for the user routes in src/routes/users.ts"

The agent analyzes your routes and generates tests:

// __tests__/routes/users.test.ts
import request from "supertest";
import { app } from "../../src/app";
import { createTestDatabase, seedTestData, cleanup } from "../helpers";

describe("POST /api/v1/users", () => {
  beforeAll(async () => {
    await createTestDatabase();
  });

  afterEach(async () => {
    await cleanup("users");
  });

  it("creates a user with valid data", async () => {
    const response = await request(app)
      .post("/api/v1/users")
      .send({
        email: "jane@example.com",
        name: "Jane Doe",
        role: "member",
      })
      .expect(201);

    expect(response.body.data).toMatchObject({
      email: "jane@example.com",
      name: "Jane Doe",
      role: "member",
    });
    expect(response.body.data.id).toBeDefined();
    expect(response.body.data.createdAt).toBeDefined();
    // Password hash should never appear in response
    expect(response.body.data).not.toHaveProperty("passwordHash");
  });

  it("rejects invalid email addresses", async () => {
    const response = await request(app)
      .post("/api/v1/users")
      .send({
        email: "not-an-email",
        name: "Jane Doe",
      })
      .expect(400);

    expect(response.body.error).toContain("email");
  });

  it("rejects duplicate email addresses", async () => {
    await seedTestData("users", [
      { email: "existing@example.com", name: "Existing User" },
    ]);

    const response = await request(app)
      .post("/api/v1/users")
      .send({
        email: "existing@example.com",
        name: "Duplicate User",
      })
      .expect(409);

    expect(response.body.error).toBe("Email already exists");
  });

  it("applies default role when not specified", async () => {
    const response = await request(app)
      .post("/api/v1/users")
      .send({
        email: "defaults@example.com",
        name: "Default Role User",
      })
      .expect(201);

    expect(response.body.data.role).toBe("member");
  });
});

describe("GET /api/v1/users/:id", () => {
  it("returns the user for a valid ID", async () => {
    const user = await seedTestData("users", [
      { email: "fetch@example.com", name: "Fetch Me" },
    ]);

    const response = await request(app)
      .get(`/api/v1/users/${user[0].id}`)
      .expect(200);

    expect(response.body.data.email).toBe("fetch@example.com");
  });

  it("returns 404 for a non-existent user", async () => {
    const response = await request(app)
      .get("/api/v1/users/00000000-0000-0000-0000-000000000000")
      .expect(404);

    expect(response.body.error).toBe("User not found");
  });

  it("returns 400 for an invalid UUID format", async () => {
    await request(app)
      .get("/api/v1/users/not-a-uuid")
      .expect(400);
  });
});

The agent generates tests for happy paths, validation failures, conflict handling, default values, and edge cases. It reads your validation schemas to know what invalid inputs look like, checks your error handling to match expected error messages, and verifies that sensitive fields like password hashes are excluded from responses.

Contract Testing

Contract tests verify that services agree on API shapes. When your frontend team expects a response field called userName but your backend sends name, a contract test catches it before deployment. The contract-testing skill generates Pact-compatible contract tests:

// __tests__/contracts/user-api.pact.ts
import { PactV4, MatchersV3 } from "@pact-foundation/pact";

const { like, eachLike, uuid, iso8601DateTimeWithMillis } = MatchersV3;

const provider = new PactV4({
  consumer: "WebApp",
  provider: "UserAPI",
});

describe("User API Contract", () => {
  it("returns a user by ID", async () => {
    await provider
      .addInteraction()
      .given("a user with ID abc-123 exists")
      .uponReceiving("a request for user abc-123")
      .withRequest("GET", "/api/v1/users/abc-123")
      .willRespondWith(200, (builder) => {
        builder.jsonBody({
          data: {
            id: uuid("abc-123"),
            email: like("user@example.com"),
            name: like("Jane Doe"),
            role: like("member"),
            createdAt: iso8601DateTimeWithMillis(),
          },
        });
      })
      .executeTest(async (mockServer) => {
        const response = await fetch(
          `${mockServer.url}/api/v1/users/abc-123`
        );
        const body = await response.json();

        expect(body.data.id).toBeDefined();
        expect(body.data.email).toBeDefined();
        expect(body.data.name).toBeDefined();
      });
  });

  it("returns a list of users with pagination", async () => {
    await provider
      .addInteraction()
      .given("multiple users exist")
      .uponReceiving("a request for the user list")
      .withRequest("GET", "/api/v1/users", (builder) => {
        builder.query({ page: "1", limit: "10" });
      })
      .willRespondWith(200, (builder) => {
        builder.jsonBody({
          data: eachLike({
            id: uuid(),
            email: like("user@example.com"),
            name: like("Jane Doe"),
            role: like("member"),
          }),
          pagination: {
            page: like(1),
            limit: like(10),
            total: like(42),
            totalPages: like(5),
          },
        });
      })
      .executeTest(async (mockServer) => {
        const response = await fetch(
          `${mockServer.url}/api/v1/users?page=1&limit=10`
        );
        const body = await response.json();

        expect(body.data).toBeInstanceOf(Array);
        expect(body.pagination).toBeDefined();
      });
  });
});

Contract tests are generated from both sides. The consumer side (frontend) defines what it expects. The provider side (backend) verifies it can produce that shape. The contract-testing skill generates both sides and sets up the Pact broker configuration for sharing contracts between teams.
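
The broker wiring is the glue between teams. As a sketch, the consumer's CI publishes its generated pact files once the contract tests pass — the `./pacts` output directory, broker URL, and token variables here are assumptions to adapt to your own setup:

```shell
# Publish consumer pacts to the broker after contract tests pass
# (run from the consumer repo; env vars are placeholders)
npx pact-broker publish ./pacts \
  --broker-base-url "$PACT_BROKER_URL" \
  --broker-token "$PACT_BROKER_TOKEN" \
  --consumer-app-version "$GIT_SHA"
```

The provider's pipeline then pulls those contracts from the broker and runs verification against a locally running instance before deploying, so a shape mismatch fails the build rather than the integration.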

Load Testing

Before a release, you want to know how your API handles traffic. The load-test-patterns skill generates k6 load testing scripts:

# Ask your agent:
# "Create a load test for the user API with ramp-up and sustained load phases"

// load-tests/user-api.js
import http from "k6/http";
import { check, sleep } from "k6";
import { Rate, Trend } from "k6/metrics";

const errorRate = new Rate("error_rate");
const responseTime = new Trend("response_time");

export const options = {
  stages: [
    { duration: "2m", target: 50 },    // Ramp up to 50 users
    { duration: "5m", target: 50 },    // Sustained load
    { duration: "2m", target: 200 },   // Ramp to peak load
    { duration: "5m", target: 200 },   // Sustained peak
    { duration: "2m", target: 0 },     // Ramp down
  ],
  thresholds: {
    http_req_duration: ["p(95)<300", "p(99)<500"],
    error_rate: ["rate<0.01"],
    http_req_failed: ["rate<0.01"],
  },
};

const BASE_URL = __ENV.API_URL || "http://localhost:3000";

export default function () {
  // Simulate realistic user behavior
  const createResponse = http.post(
    `${BASE_URL}/api/v1/users`,
    JSON.stringify({
      email: `loadtest-${Date.now()}-${__VU}@example.com`,
      name: "Load Test User",
    }),
    { headers: { "Content-Type": "application/json" } }
  );

  check(createResponse, {
    "create returns 201": (r) => r.status === 201,
    "create response has id": (r) => JSON.parse(r.body).data.id !== undefined,
  });

  errorRate.add(createResponse.status !== 201);
  responseTime.add(createResponse.timings.duration);

  if (createResponse.status === 201) {
    const userId = JSON.parse(createResponse.body).data.id;

    const getResponse = http.get(`${BASE_URL}/api/v1/users/${userId}`);

    check(getResponse, {
      "get returns 200": (r) => r.status === 200,
      "get response time < 100ms": (r) => r.timings.duration < 100,
    });

    errorRate.add(getResponse.status !== 200);
    responseTime.add(getResponse.timings.duration);
  }

  sleep(1); // Simulate think time between requests
}

The agent configures realistic test scenarios with ramp-up phases, sustained load, and peak testing. The thresholds are set based on common SLA targets: 95th percentile under 300ms, error rate under 1%. Adjust these to match your own SLAs.
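
To run the script, point k6 at the environment under test — the staging URL below is a placeholder:

```shell
# Run against staging (placeholder URL); -e sets the __ENV variable the
# script reads. k6 exits non-zero when any threshold fails.
k6 run -e API_URL=https://staging.example.com load-tests/user-api.js
```

Because failed thresholds fail the run, this slots naturally into a pre-release pipeline as a gating step.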


Health Checks

Health check endpoints let monitoring systems verify your API is operational. The health-check-generator skill creates comprehensive health checks that go beyond simple ping responses:

// src/routes/health.ts
import { Router } from "express";
import { Pool } from "pg";
import { Redis } from "ioredis";

const router = Router();

interface HealthStatus {
  status: "healthy" | "degraded" | "unhealthy";
  timestamp: string;
  version: string;
  uptime: number;
  checks: Record<string, ComponentHealth>;
}

interface ComponentHealth {
  status: "pass" | "fail" | "warn";
  responseTime: number;
  message?: string;
}

async function checkDatabase(pool: Pool): Promise<ComponentHealth> {
  const start = Date.now();
  try {
    await pool.query("SELECT 1");
    return { status: "pass", responseTime: Date.now() - start };
  } catch (error) {
    return {
      status: "fail",
      responseTime: Date.now() - start,
      message: "Database connection failed",
    };
  }
}

async function checkRedis(redis: Redis): Promise<ComponentHealth> {
  const start = Date.now();
  try {
    await redis.ping();
    return { status: "pass", responseTime: Date.now() - start };
  } catch (error) {
    return {
      status: "fail",
      responseTime: Date.now() - start,
      message: "Redis connection failed",
    };
  }
}

router.get("/health", async (req, res) => {
  const checks = {
    database: await checkDatabase(req.app.locals.db),
    redis: await checkRedis(req.app.locals.redis),
  };

  const hasFailure = Object.values(checks).some((c) => c.status === "fail");
  const hasWarning = Object.values(checks).some((c) => c.status === "warn");

  const status: HealthStatus = {
    status: hasFailure ? "unhealthy" : hasWarning ? "degraded" : "healthy",
    timestamp: new Date().toISOString(),
    version: process.env.APP_VERSION || "unknown",
    uptime: process.uptime(),
    checks,
  };

  const httpStatus = hasFailure ? 503 : 200;
  res.status(httpStatus).json(status);
});

// Lightweight liveness probe for Kubernetes
router.get("/health/live", (req, res) => {
  res.status(200).json({ status: "alive" });
});

// Readiness probe — only returns 200 when all dependencies are available
router.get("/health/ready", async (req, res) => {
  const db = await checkDatabase(req.app.locals.db);
  const redis = await checkRedis(req.app.locals.redis);

  if (db.status === "fail" || redis.status === "fail") {
    return res.status(503).json({ status: "not ready" });
  }

  res.status(200).json({ status: "ready" });
});

export default router;

The agent generates three health endpoints: a detailed health check that reports component status, a liveness probe for Kubernetes that confirms the process is running, and a readiness probe that confirms all dependencies are available. This follows the standard pattern for containerized deployments.
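
On the Kubernetes side, the liveness and readiness endpoints map directly onto the container spec. A sketch, assuming the API listens on port 3000 — adjust the port and timings to your deployment:

```yaml
# Excerpt from a Deployment's container spec (port and timings are assumptions)
livenessProbe:
  httpGet:
    path: /health/live
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health/ready
    port: 3000
  periodSeconds: 5
  failureThreshold: 3
```

A failing liveness probe restarts the container; a failing readiness probe only removes it from the service's load balancer, which is why the readiness endpoint checks dependencies while the liveness endpoint does not.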

Alerting Configuration

Monitoring is useless without alerts. The alerting-config skill generates alerting rules for popular monitoring platforms:

# monitoring/alerts/api-alerts.yaml
# Prometheus alerting rules

groups:
  - name: api-health
    interval: 30s
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
          / sum(rate(http_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "API error rate above 5%"
          description: "Error rate is {{ $value | humanizePercentage }} over the last 5 minutes"

      - alert: HighLatency
        expr: |
          histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "95th percentile latency above 500ms"
          description: "p95 latency is {{ $value | humanizeDuration }}"

      - alert: HealthCheckFailing
        expr: |
          probe_success{job="api-health-check"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "API health check failing"
          description: "Health check has been failing for over 1 minute"

      - alert: HighMemoryUsage
        expr: |
          process_resident_memory_bytes / 1024 / 1024 > 512
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "API memory usage above 512MB"
          description: "Memory usage is {{ $value | humanize }}MB"

      - alert: DatabaseConnectionPoolExhausted
        expr: |
          pg_pool_active_connections / pg_pool_max_connections > 0.9
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Database connection pool near capacity"
          description: "{{ $value | humanizePercentage }} of connections in use"

For teams using Datadog instead of Prometheus, the agent generates the equivalent Datadog monitor configuration:

{
  "monitors": [
    {
      "name": "API Error Rate > 5%",
      "type": "metric alert",
      "query": "sum(last_5m):sum:http.requests.errors{service:api} / sum:http.requests.total{service:api} > 0.05",
      "message": "API error rate is above 5%. Check logs for details. @pagerduty-api-team",
      "options": {
        "thresholds": { "critical": 0.05, "warning": 0.02 },
        "notify_no_data": true,
        "no_data_timeframe": 10,
        "renotify_interval": 15
      }
    }
  ]
}

The alerting skill adapts its output to your monitoring platform. It also follows alerting best practices: critical alerts have short evaluation windows, warnings have longer ones to avoid noise, and every alert includes a descriptive message that helps the on-call engineer understand the problem without digging through dashboards.

Putting It All Together

A complete API testing and monitoring setup uses all five layers:

# Development: endpoint and contract tests run on every PR
openclaw skill install api-test-generator
openclaw skill install contract-testing

# Pre-release: load tests run before deployments
openclaw skill install load-test-patterns

# Production: health checks and alerts run continuously
openclaw skill install health-check-generator
openclaw skill install alerting-config

Each layer catches different failure modes. Endpoint tests catch logic bugs. Contract tests catch integration mismatches. Load tests catch performance regressions. Health checks catch infrastructure failures. Alerts catch everything else.

The result is an API you can deploy with confidence, knowing that problems will be caught at the right stage — and if something slips through to production, you will know about it before your users do.

