gradient-inference

Community skill (unofficial) for DigitalOcean Gradient AI Serverless Inference.

Setup & Installation

Install with ClawHub:

```bash
clawhub install simondelorean/gradient-inference
```

If the CLI is not installed:

```bash
npx clawhub@latest install simondelorean/gradient-inference
```

Or install with the OpenClaw CLI:

```bash
openclaw skills install simondelorean/gradient-inference
```

Or paste the repo link into your assistant's chat:

https://github.com/openclaw/skills/tree/main/skills/simondelorean/gradient-inference

What This Skill Does
Connects to DigitalOcean's Gradient AI Serverless Inference API to run chat completions, generate images, and look up available models and pricing. The endpoint is OpenAI-compatible, so existing SDK code works with only a base URL change, and prompt caching via the Responses API reduces spend on follow-up queries that reuse the same context.
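Because only the base URL changes, a plain-stdlib sketch of an authenticated Gradient call looks like this. The helper name and flow are illustrative, not part of the skill's scripts:

```python
import os
import urllib.request

# Gradient's OpenAI-compatible root; the same paths you would use against
# api.openai.com/v1 are rooted here instead.
GRADIENT_BASE = "https://inference.do-ai.run/v1"

def gradient_request(path: str) -> urllib.request.Request:
    """Build an authenticated request against the Gradient endpoint (no I/O)."""
    return urllib.request.Request(
        f"{GRADIENT_BASE}{path}",
        headers={"Authorization": f"Bearer {os.environ.get('GRADIENT_API_KEY', '')}"},
    )

req = gradient_request("/models")
# Uncomment to actually send (requires a valid GRADIENT_API_KEY):
# with urllib.request.urlopen(req) as r:
#     print(r.read().decode())
```

The same pattern applies to any OpenAI SDK: keep your existing code and point its base URL at `https://inference.do-ai.run/v1`.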
When to Use It
- Running LLM queries without managing GPU servers
- Checking model pricing before committing to a workload
- Generating images from text prompts via serverless API
- Using prompt caching to cut costs on repeated context
- Browsing and filtering available models before hardcoding an ID
# 🦞 Gradient AI — Serverless Inference
> ⚠️ **This is an unofficial community skill**, not maintained by DigitalOcean. Use at your own risk.
> *"Why manage GPUs when the ocean provides?" — ancient lobster proverb*
Use DigitalOcean's [Gradient Serverless Inference](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/) to call large language models without managing infrastructure. The API is **OpenAI-compatible**, so standard SDKs and patterns work — just point at `https://inference.do-ai.run/v1` and swim.
## Authentication
All requests need a **Model Access Key** in the `Authorization: Bearer` header.
```bash
export GRADIENT_API_KEY="your-model-access-key"
```
**Where to get one:** [DigitalOcean Console](https://cloud.digitalocean.com) → Gradient AI → Model Access Keys → Create Key.
📚 *[Full auth docs](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#create-a-model-access-key)*
---
## Tools
### 📋 List Available Models
Window-shop for LLMs before you swipe the card.
```bash
python3 gradient_models.py # Pretty table
python3 gradient_models.py --json # Machine-readable
python3 gradient_models.py --filter "llama" # Search by name
```
Use this before hardcoding model IDs — models are added and deprecated over time.
**Direct API call:**
```bash
curl -s https://inference.do-ai.run/v1/models \
-H "Authorization: Bearer $GRADIENT_API_KEY" | python3 -m json.tool
```
📚 *[Models reference](https://docs.digitalocean.com/products/gradient-ai-platform/details/models/)*
---
### 💬 Chat Completions
The classic. Send structured messages (system/user/assistant roles), get a response. OpenAI-compatible, so you probably already know how this works.
```bash
python3 gradient_chat.py \
--model "openai-gpt-oss-120b" \
--system "You are a helpful assistant." \
--prompt "Explain serverless inference in one paragraph."
# Different model
python3 gradient_chat.py \
--model "llama3.3-70b-instruct" \
--prompt "Write a haiku about cloud computing."
```
**Direct API call:**
```bash
curl -s https://inference.do-ai.run/v1/chat/completions \
-H "Authorization: Bearer $GRADIENT_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "openai-gpt-oss-120b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
],
"temperature": 0.7,
"max_tokens": 1000
}'
```
📚 *[Chat Completions docs](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#chat-completions)*
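The request the curl example sends can also be assembled with the standard library alone. This sketch builds (but does not send) the POST so the payload shape is easy to inspect; the function name is illustrative:

```python
import json
import os
import urllib.request

def build_chat_request(model, messages, temperature=0.7, max_tokens=1000):
    """Assemble the OpenAI-compatible chat/completions POST (no network I/O)."""
    body = json.dumps({
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }).encode()
    return urllib.request.Request(
        "https://inference.do-ai.run/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ.get('GRADIENT_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "openai-gpt-oss-120b",
    [{"role": "system", "content": "You are a helpful assistant."},
     {"role": "user", "content": "Hello!"}],
)
# urllib.request.urlopen(req) would send it; the reply follows OpenAI's
# shape, i.e. response["choices"][0]["message"]["content"].
```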
---
### ⚡ Responses API (Recommended)
DigitalOcean's [recommended endpoint](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#responses-api) for new integrations. Simpler request format and supports **prompt caching** — a.k.a. "stop paying twice for the same context."
```bash
# Basic usage
python3 gradient_chat.py \
--model "openai-gpt-oss-120b" \
--prompt "Summarize this earnings report." \
--responses-api
# With prompt caching (saves cost on follow-up queries)
python3 gradient_chat.py \
--model "openai-gpt-oss-120b" \
--prompt "Now compare it to last quarter." \
--responses-api --cache
```
**Direct API call:**
```bash
curl -s https://inference.do-ai.run/v1/responses \
-H "Authorization: Bearer $GRADIENT_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "openai-gpt-oss-120b",
"input": "Explain prompt caching.",
"store": true
}'
```
**When to use which:**
| | Chat Completions | Responses API |
|---|---|---|
| **Request format** | Array of messages with roles | Single `input` string |
| **Prompt caching** | ❌ | ✅ via `store: true` |
| **Multi-step tool use** | Manual | Built-in |
| **Best for** | Structured conversations | Simple queries, cost savings |
📚 *[Responses API docs](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#responses-api)*
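A minimal Responses API body mirrors the curl call above: a single `input` string instead of a messages array, with `store: true` opting the request into caching. This sketch covers the request body only, using the fields documented here:

```python
import json

def build_responses_payload(model: str, text: str, cache: bool = False) -> str:
    """JSON body for POST /v1/responses; store=true enables prompt caching."""
    body = {"model": model, "input": text}
    if cache:
        body["store"] = True
    return json.dumps(body)

payload = build_responses_payload(
    "openai-gpt-oss-120b", "Explain prompt caching.", cache=True
)
```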
---
### 🖼️ Generate Images
Turn text prompts into images. Because sometimes a chart isn't enough.
```bash
python3 gradient_image.py --prompt "A lobster trading stocks on Wall Street"
python3 gradient_image.py --prompt "Sunset over the NYSE" --output sunset.png
python3 gradient_image.py --prompt "Fintech logo" --json
```
**Direct API call:**
```bash
curl -s https://inference.do-ai.run/v1/images/generations \
-H "Authorization: Bearer $GRADIENT_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "dall-e-3",
"prompt": "A lobster analyzing candlestick charts",
"n": 1
}'
```
📚 *[Image generation docs](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#image-generation)*
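OpenAI-compatible image endpoints typically return either a URL or a base64 payload. Assuming the `b64_json` shape (the field name is an assumption here, not verified against Gradient's response), decoding and saving looks like this:

```python
import base64
import json

def save_b64_image(response_json: str, out_path: str) -> int:
    """Decode the first image in an OpenAI-style response and write it to disk."""
    b64 = json.loads(response_json)["data"][0]["b64_json"]
    raw = base64.b64decode(b64)
    with open(out_path, "wb") as f:
        f.write(raw)
    return len(raw)

# Fake response standing in for an actual API reply:
fake = json.dumps({"data": [{"b64_json": base64.b64encode(b"\x89PNG...").decode()}]})
n = save_b64_image(fake, "/tmp/lobster.png")
```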
---
## 🧠 Model Selection Guide
Not all models are created equal. Choose wisely, young crustacean:
| Model | Best For | Speed | Quality | Context |
|-------|----------|-------|---------|---------|
| `openai-gpt-oss-120b` | Complex reasoning, analysis, writing | Medium | ⭐⭐⭐⭐⭐ | 128K |
| `llama3.3-70b-instruct` | General tasks, instruction following | Fast | ⭐⭐⭐⭐ | 128K |
| `deepseek-r1-distill-llama-70b` | Math, code, step-by-step reasoning | Slow | ⭐⭐⭐⭐⭐ | 128K |
| `qwen3-32b` | Quick triage, short tasks | Fastest | ⭐⭐⭐ | 32K |
> **🦞 Pro tip: Cost-aware routing.** Use a fast model (e.g., `qwen3-32b`) to score or triage, then only escalate to a strong model (e.g., `openai-gpt-oss-120b`) when depth is needed. Enable prompt caching for repeated context.
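The routing idea in the tip above boils down to a tiny dispatcher. The threshold and the source of the difficulty score are placeholders you would tune:

```python
FAST_MODEL = "qwen3-32b"               # cheap triage
STRONG_MODEL = "openai-gpt-oss-120b"   # escalate when depth is needed

def pick_model(difficulty: float, threshold: float = 0.6) -> str:
    """Route cheap-first: only pay for the strong model on hard tasks.

    `difficulty` is a 0-1 score, e.g. produced by a first pass with the
    fast model itself.
    """
    return STRONG_MODEL if difficulty >= threshold else FAST_MODEL
```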
Always run `python3 gradient_models.py` to check what's currently available — the menu changes.
📚 *[Available models](https://docs.digitalocean.com/products/gradient-ai-platform/details/models/)*
---
### 💰 Model Pricing Lookup
Check what models cost *before* you rack up a bill. Scrapes the official [DigitalOcean pricing page](https://docs.digitalocean.com/products/gradient-ai-platform/details/pricing/) — no API key needed.
```bash
python3 gradient_pricing.py # Pretty table
python3 gradient_pricing.py --json # Machine-readable
python3 gradient_pricing.py --model "llama" # Filter by model name
python3 gradient_pricing.py --no-cache # Skip cache, fetch live
```
**How it works:**
- Fetches live pricing from DigitalOcean's docs (public page, no auth)
- Caches results for 24 hours in `/tmp/gradient_pricing_cache.json`
- Falls back to a bundled snapshot if the live fetch fails
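The fetch/cache/fallback order described above can be sketched roughly like this. The function name and the injected fetcher are illustrative, not the script's actual internals:

```python
import json
import os
import time

def load_pricing(fetch_live, bundled_snapshot,
                 cache_path="/tmp/gradient_pricing_cache.json",
                 ttl=24 * 3600):
    """Return pricing data: fresh cache > live fetch > bundled fallback."""
    # 1. A cache file younger than the TTL wins outright.
    if os.path.exists(cache_path) and time.time() - os.path.getmtime(cache_path) < ttl:
        with open(cache_path) as f:
            return json.load(f)
    # 2. Otherwise try the live page, refreshing the cache on success.
    try:
        data = fetch_live()
        with open(cache_path, "w") as f:
            json.dump(data, f)
        return data
    # 3. Any failure falls back to the snapshot shipped with the skill.
    except Exception:
        return bundled_snapshot
```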
> **🦞 Pro tip:** Run `python3 gradient_pricing.py --model "gpt-oss"` before choosing a model to see the cost difference between `gpt-oss-120b` ($0.10/$0.70) and `gpt-oss-20b` ($0.05/$0.45) per 1M tokens.
📚 *[Pricing docs](https://docs.digitalocean.com/products/gradient-ai-platform/details/pricing/)*
---
## CLI Reference
All scripts accept `--json` for machine-readable output.
```
gradient_models.py [--json] [--filter QUERY]
gradient_chat.py --prompt TEXT [--model ID] [--system TEXT]
[--responses-api] [--cache] [--temperature F]
[--max-tokens N] [--json]
gradient_image.py --prompt TEXT [--model ID] [--output PATH]
[--size WxH] [--json]
gradient_pricing.py [--json] [--model QUERY] [--no-cache]
```
## External Endpoints
| Endpoint | Purpose |
|----------|---------|
| `https://inference.do-ai.run/v1/models` | List available models |
| `https://inference.do-ai.run/v1/chat/completions` | Chat Completions API |
| `https://inference.do-ai.run/v1/responses` | Responses API (recommended) |
| `https://inference.do-ai.run/v1/images/generations` | Image generation |
| `https://docs.digitalocean.com/.../pricing/` | Pricing page (scraped, public) |
## Security & Privacy
- All requests go to `inference.do-ai.run` — DigitalOcean's own endpoint
- Your `GRADIENT_API_KEY` is sent as a Bearer token in the Authorization header
- No other credentials or local data leave the machine
- Model Access Keys are scoped to inference only — they can't manage your DO account
- Prompt caching entries are scoped to your account and automatically expire
## Trust Statement
> By using this skill, prompts and data are sent to DigitalOcean's Gradient Inference API.
> Only install if you trust DigitalOcean with the content you send to their LLMs.
## Important Notes
- Run `python3 gradient_models.py` before assuming a model exists — they rotate
- All scripts exit with code 1 and print errors to stderr on failure