gradient-inference

DevOps & Cloud
v0.1.3
Benign

Community skill (unofficial) for DigitalOcean Gradient AI Serverless Inference.

478 downloads · 478 installs · by @simondelorean

Setup & Installation

Install command

clawhub install simondelorean/gradient-inference

If the CLI is not installed:

Install command

npx clawhub@latest install simondelorean/gradient-inference

Or install with OpenClaw CLI:

Install command

openclaw skills install simondelorean/gradient-inference

Or paste the repo link into your assistant's chat:

Install command

https://github.com/openclaw/skills/tree/main/skills/simondelorean/gradient-inference

What This Skill Does

Connects to DigitalOcean's Gradient AI Serverless Inference API to run chat completions, generate images, and look up available models and pricing. Because the endpoint is OpenAI-compatible, existing SDK code works with only a base URL change, and prompt caching via the Responses API reduces spend on follow-up queries that reuse the same context.
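Since only the base URL differs from OpenAI's API, a request can be assembled with nothing beyond the Python standard library. The sketch below builds (but does not send) a chat-completions request; the helper name `build_chat_request` is illustrative, not part of the skill.

```python
import json
import os
import urllib.request

BASE_URL = "https://inference.do-ai.run/v1"

def build_chat_request(model: str, messages: list) -> urllib.request.Request:
    """Assemble an OpenAI-style chat-completions request for Gradient."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ.get('GRADIENT_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "openai-gpt-oss-120b",
    [{"role": "user", "content": "Hello!"}],
)
# Send with urllib.request.urlopen(req) once GRADIENT_API_KEY is set.
```

Any OpenAI SDK works the same way: point its `base_url` at `https://inference.do-ai.run/v1` and pass a Model Access Key as the API key.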

When to Use It

  • Running LLM queries without managing GPU servers
  • Checking model pricing before committing to a workload
  • Generating images from text prompts via serverless API
  • Using prompt caching to cut costs on repeated context
  • Browsing and filtering available models before hardcoding an ID
# 🦞 Gradient AI — Serverless Inference

> ⚠️ **This is an unofficial community skill**, not maintained by DigitalOcean. Use at your own risk.

> *"Why manage GPUs when the ocean provides?" — ancient lobster proverb*

Use DigitalOcean's [Gradient Serverless Inference](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/) to call large language models without managing infrastructure. The API is **OpenAI-compatible**, so standard SDKs and patterns work — just point at `https://inference.do-ai.run/v1` and swim.

## Authentication

All requests need a **Model Access Key** in the `Authorization: Bearer` header.

```bash
export GRADIENT_API_KEY="your-model-access-key"
```

**Where to get one:** [DigitalOcean Console](https://cloud.digitalocean.com) → Gradient AI → Model Access Keys → Create Key.

📖 *[Full auth docs](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#create-a-model-access-key)*

---

## Tools

### 🔍 List Available Models

Window-shop for LLMs before you swipe the card.

```bash
python3 gradient_models.py                    # Pretty table
python3 gradient_models.py --json             # Machine-readable
python3 gradient_models.py --filter "llama"   # Search by name
```

Use this before hardcoding model IDs — models are added and deprecated over time.

**Direct API call:**
```bash
curl -s https://inference.do-ai.run/v1/models \
  -H "Authorization: Bearer $GRADIENT_API_KEY" | python3 -m json.tool
```

📖 *[Models reference](https://docs.digitalocean.com/products/gradient-ai-platform/details/models/)*
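For scripting on top of the raw endpoint, filtering the model list is a one-liner. The sketch below assumes the usual OpenAI-style list shape (`{"data": [{"id": ...}]}`); `filter_models` is a hypothetical helper, not one of the skill's scripts.

```python
def filter_models(models_response: dict, query: str) -> list:
    """Return model IDs containing `query`, case-insensitively."""
    return sorted(
        m["id"]
        for m in models_response.get("data", [])
        if query.lower() in m["id"].lower()
    )

# Stubbed response in the assumed OpenAI list shape:
sample = {"data": [
    {"id": "llama3.3-70b-instruct"},
    {"id": "openai-gpt-oss-120b"},
    {"id": "qwen3-32b"},
]}
print(filter_models(sample, "llama"))  # ['llama3.3-70b-instruct']
```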

---

### 💬 Chat Completions

The classic. Send structured messages (system/user/assistant roles), get a response. OpenAI-compatible, so you probably already know how this works.

```bash
python3 gradient_chat.py \
  --model "openai-gpt-oss-120b" \
  --system "You are a helpful assistant." \
  --prompt "Explain serverless inference in one paragraph."

# Different model
python3 gradient_chat.py \
  --model "llama3.3-70b-instruct" \
  --prompt "Write a haiku about cloud computing."
```

**Direct API call:**
```bash
curl -s https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $GRADIENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-oss-120b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7,
    "max_tokens": 1000
  }'
```

📖 *[Chat Completions docs](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#chat-completions)*
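When parsing the response yourself, the assistant text lives at `choices[0].message.content` in the OpenAI-compatible shape. A minimal extraction helper (hypothetical, for illustration):

```python
def extract_reply(response: dict) -> str:
    """Pull the assistant's text out of an OpenAI-style chat completion."""
    return response["choices"][0]["message"]["content"]

sample = {"choices": [
    {"message": {"role": "assistant", "content": "Hello there!"}}
]}
print(extract_reply(sample))  # Hello there!
```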

---

### ⚡ Responses API (Recommended)

DigitalOcean's [recommended endpoint](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#responses-api) for new integrations. It has a simpler request format and supports **prompt caching** — a.k.a. "stop paying twice for the same context."

```bash
# Basic usage
python3 gradient_chat.py \
  --model "openai-gpt-oss-120b" \
  --prompt "Summarize this earnings report." \
  --responses-api

# With prompt caching (saves cost on follow-up queries)
python3 gradient_chat.py \
  --model "openai-gpt-oss-120b" \
  --prompt "Now compare it to last quarter." \
  --responses-api --cache
```

**Direct API call:**
```bash
curl -s https://inference.do-ai.run/v1/responses \
  -H "Authorization: Bearer $GRADIENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-oss-120b",
    "input": "Explain prompt caching.",
    "store": true
  }'
```

**When to use which:**
| | Chat Completions | Responses API |
|---|---|---|
| **Request format** | Array of messages with roles | Single `input` string |
| **Prompt caching** | ❌ | ✅ via `store: true` |
| **Multi-step tool use** | Manual | Built-in |
| **Best for** | Structured conversations | Simple queries, cost savings |

📖 *[Responses API docs](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#responses-api)*
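Following the curl example above, a Responses API body reduces to a model, an `input` string, and optionally `store: true` to opt into prompt caching. A minimal builder (the function name is illustrative, not part of the skill):

```python
def build_responses_payload(model: str, user_input: str,
                            cache: bool = False) -> dict:
    """Build a Responses API body; `store: true` enables prompt caching."""
    payload = {"model": model, "input": user_input}
    if cache:
        payload["store"] = True  # opt in to caching for follow-up queries
    return payload

payload = build_responses_payload(
    "openai-gpt-oss-120b", "Explain prompt caching.", cache=True)
```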

---

### 🖼️ Generate Images

Turn text prompts into images. Because sometimes a chart isn't enough.

```bash
python3 gradient_image.py --prompt "A lobster trading stocks on Wall Street"
python3 gradient_image.py --prompt "Sunset over the NYSE" --output sunset.png
python3 gradient_image.py --prompt "Fintech logo" --json
```

**Direct API call:**
```bash
curl -s https://inference.do-ai.run/v1/images/generations \
  -H "Authorization: Bearer $GRADIENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dall-e-3",
    "prompt": "A lobster analyzing candlestick charts",
    "n": 1
  }'
```

📖 *[Image generation docs](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#image-generation)*
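Image endpoints in the OpenAI-compatible shape usually return either base64 data (`data[0].b64_json`) or a URL. Below is a sketch of saving the base64 variant to disk, assuming that shape; check which field your response actually contains before relying on it.

```python
import base64

def save_first_image(response: dict, path: str) -> str:
    """Write the first generated image to `path` and return the path.

    Assumes an OpenAI-style body: {"data": [{"b64_json": "..."}]}.
    """
    item = response["data"][0]
    if "b64_json" not in item:
        raise ValueError(f"expected b64_json, got keys: {sorted(item)}")
    with open(path, "wb") as f:
        f.write(base64.b64decode(item["b64_json"]))
    return path
```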

---

## 🧠 Model Selection Guide

Not all models are created equal. Choose wisely, young crustacean:

| Model | Best For | Speed | Quality | Context |
|-------|----------|-------|---------|---------|
| `openai-gpt-oss-120b` | Complex reasoning, analysis, writing | Medium | ★★★★★ | 128K |
| `llama3.3-70b-instruct` | General tasks, instruction following | Fast | ★★★★ | 128K |
| `deepseek-r1-distill-llama-70b` | Math, code, step-by-step reasoning | Slow | ★★★★★ | 128K |
| `qwen3-32b` | Quick triage, short tasks | Fastest | ★★★ | 32K |

> **🦞 Pro tip: Cost-aware routing.** Use a fast model (e.g., `qwen3-32b`) to score or triage, then only escalate to a strong model (e.g., `openai-gpt-oss-120b`) when depth is needed. Enable prompt caching for repeated context.

Always run `python3 gradient_models.py` to check what's currently available — the menu changes.

📖 *[Available models](https://docs.digitalocean.com/products/gradient-ai-platform/details/models/)*
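The triage-then-escalate pattern from the pro tip can be sketched as a tiny router. The callables here are placeholders for your own model wrappers (e.g. `qwen3-32b` for the triage and fast paths, `openai-gpt-oss-120b` for the strong path); nothing below is part of the skill's CLI.

```python
def route(prompt: str, triage, fast, strong, threshold: int = 7) -> str:
    """Score with a cheap model, escalate only when depth is needed.

    triage(prompt) -> difficulty 1-10; fast/strong wrap model calls.
    """
    difficulty = int(triage(prompt))
    return strong(prompt) if difficulty >= threshold else fast(prompt)

# Stub wrappers standing in for real API calls:
triage = lambda p: 9 if "prove" in p else 2
fast = lambda p: "fast:" + p
strong = lambda p: "strong:" + p
print(route("prove this lemma", triage, fast, strong))  # strong:prove this lemma
```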

---

### 💰 Model Pricing Lookup

Check what models cost *before* you rack up a bill. Scrapes the official [DigitalOcean pricing page](https://docs.digitalocean.com/products/gradient-ai-platform/details/pricing/) — no API key needed.

```bash
python3 gradient_pricing.py                    # Pretty table
python3 gradient_pricing.py --json             # Machine-readable
python3 gradient_pricing.py --model "llama"    # Filter by model name
python3 gradient_pricing.py --no-cache         # Skip cache, fetch live
```

**How it works:**
- Fetches live pricing from DigitalOcean's docs (public page, no auth)
- Caches results for 24 hours in `/tmp/gradient_pricing_cache.json`
- Falls back to a bundled snapshot if the live fetch fails
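The 24-hour cache described above amounts to an mtime check on the cache file. A freshness check along these lines (a sketch of the behavior, not the script's actual code):

```python
import os
import time

CACHE_PATH = "/tmp/gradient_pricing_cache.json"
CACHE_TTL = 24 * 60 * 60  # 24 hours, in seconds

def cache_is_fresh(path: str = CACHE_PATH, ttl: int = CACHE_TTL) -> bool:
    """True if the cache file exists and is younger than `ttl` seconds."""
    try:
        return (time.time() - os.path.getmtime(path)) < ttl
    except OSError:  # missing file counts as stale -> fetch live
        return False
```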

> **🦞 Pro tip:** Run `python3 gradient_pricing.py --model "gpt-oss"` before choosing a model to see the cost difference between `gpt-oss-120b` ($0.10/$0.70) and `gpt-oss-20b` ($0.05/$0.45) per 1M tokens.

📖 *[Pricing docs](https://docs.digitalocean.com/products/gradient-ai-platform/details/pricing/)*
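With per-1M-token prices in hand, estimating a request's cost is simple arithmetic. Using the $0.10 input / $0.70 output figures quoted above for `gpt-oss-120b`:

```python
def request_cost(tokens_in: int, tokens_out: int,
                 price_in: float, price_out: float) -> float:
    """Dollar cost of one request, given per-1M-token prices."""
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

# 50K prompt tokens + 2K completion tokens on gpt-oss-120b:
cost = request_cost(50_000, 2_000, 0.10, 0.70)
print(f"${cost:.4f}")  # $0.0064
```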

---

## CLI Reference

All scripts accept `--json` for machine-readable output.

```
gradient_models.py   [--json] [--filter QUERY]
gradient_chat.py     --prompt TEXT [--model ID] [--system TEXT]
                     [--responses-api] [--cache] [--temperature F]
                     [--max-tokens N] [--json]
gradient_image.py    --prompt TEXT [--model ID] [--output PATH]
                     [--size WxH] [--json]
gradient_pricing.py  [--json] [--model QUERY] [--no-cache]
```

## External Endpoints

| Endpoint | Purpose |
|----------|---------|
| `https://inference.do-ai.run/v1/models` | List available models |
| `https://inference.do-ai.run/v1/chat/completions` | Chat Completions API |
| `https://inference.do-ai.run/v1/responses` | Responses API (recommended) |
| `https://inference.do-ai.run/v1/images/generations` | Image generation |
| `https://docs.digitalocean.com/.../pricing/` | Pricing page (scraped, public) |

## Security & Privacy

- All requests go to `inference.do-ai.run` — DigitalOcean's own endpoint
- Your `GRADIENT_API_KEY` is sent as a Bearer token in the Authorization header
- No other credentials or local data leave the machine
- Model Access Keys are scoped to inference only — they can't manage your DO account
- Prompt caching entries are scoped to your account and automatically expire

## Trust Statement

> By using this skill, prompts and data are sent to DigitalOcean's Gradient Inference API.
> Only install if you trust DigitalOcean with the content you send to their LLMs.

## Important Notes

- Run `python3 gradient_models.py` before assuming a model exists — they rotate
- All scripts exit with code 1 and print errors to stderr on failure



Security Audits

VirusTotal: Benign
OpenClaw: Benign

These signals reflect official OpenClaw status values. A Suspicious status means the skill should be used with extra caution.

Details

Language: Markdown
Last updated: Feb 26, 2026