curated-search
Domain-restricted full-text search over curated technical documentation.
Setup & Installation
Install command:
clawhub install qsmtco/curated-search
If the CLI is not installed:
npx clawhub@latest install qsmtco/curated-search
Or install with the OpenClaw CLI:
openclaw skills install qsmtco/curated-search
Or paste the repo link into your assistant's chat:
https://github.com/openclaw/skills/tree/main/skills/qsmtco/curated-search
What This Skill Does
Domain-restricted full-text search over a curated whitelist of technical documentation sources such as MDN and the Python docs. After an initial crawl builds a local index, all searches run offline with no external network calls. Results include title, URL, snippet, and a BM25 relevance score.
Because the index covers only a pre-approved domain whitelist, results come from authoritative sources rather than general web search noise.
When to Use It
- Looking up CSS property syntax without SEO-spam results
- Searching Python docs offline during travel
- Filtering documentation queries to a single trusted domain
- Paginating through large result sets inside an agent workflow
- Checking async/await behavior across MDN and Python references in one query
# Curated Search Skill
## Summary
Domain-restricted full-text search over a curated whitelist of technical documentation (MDN, Python docs, etc.). Provides clean, authoritative results without web spam.
## External Endpoints
This skill does **not** call any external network endpoints during search operations. The crawler optionally makes outbound HTTP requests during index builds (one‑time setup), but those are user‑initiated (`npm run crawl`) and respect the configured domain whitelist.
## Security & Privacy
- **Search is fully local** – After the index is built, all queries run offline; no data leaves your machine.
- **Crawling is optional and whitelist‑scoped** – The crawler only accesses domains you explicitly list in `config.yaml`. It respects `robots.txt` and configurable delays.
- **No telemetry** – No usage data is transmitted externally.
- **Configuration** is read from local `config.yaml` and the index file in `data/`.
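
To illustrate what whitelist-scoped crawling means, here is a minimal sketch of a hostname check. The helper name `isAllowedUrl` is hypothetical and not taken from the skill's source. Exact hostname matching, with no subdomain wildcards, is an assumption; the real crawler may match differently.

```javascript
// Hypothetical helper, not the skill's actual implementation.
// Assumption: a URL is allowed only when its hostname exactly equals a whitelisted domain.
function isAllowedUrl(rawUrl, allowedDomains) {
  let hostname;
  try {
    hostname = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return false; // unparsable URLs are never allowed
  }
  return allowedDomains.some((d) => d.toLowerCase() === hostname);
}

const domains = ["docs.python.org", "developer.mozilla.org"];
console.log(isAllowedUrl("https://docs.python.org/3/tutorial/", domains)); // true
console.log(isAllowedUrl("https://example.com/python", domains)); // false
```

Comparing parsed hostnames instead of raw strings avoids accepting look-alike URLs such as `https://docs.python.org.example.com/`.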
## Model Invocation Note
The `curated-search.search` tool is invoked **only when the user explicitly calls it**. It does not run autonomously. OpenClaw calls the tool handler (`scripts/search.js`) when the user asks to search the curated index.
## Trust Statement
By using this skill, you trust that the code operates locally and only crawls domains you approve. The skill does not send your queries or workspace data to any third party. Review the open‑source implementation before installing.
---
## Tool: curated-search.search
Search the curated index.
### Parameters
| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| `query` | string | yes | — | Search query terms |
| `limit` | number | no | 5 | Maximum results (capped by config.max_limit, typically 100) |
| `domain` | string | no | null | Filter to specific domain (e.g., `docs.python.org`) |
| `min_score` | number | no | 0.0 | Minimum relevance score (0.0–1.0); filters out low-quality matches |
| `offset` | number | no | 0 | Pagination offset (skip first N results) |
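
The way these parameters combine can be sketched roughly as follows. This is an illustration under assumptions, not the code in `scripts/search.js`: `applySearchParams` is a hypothetical name, the input is an already-scored result array, and a too-large `limit` is rejected with `limit_exceeded` (the table above says "capped", so the real tool may clamp instead).

```javascript
// Sketch only. Defaults mirror the parameter table; maxLimit stands in for config.max_limit.
function applySearchParams(results, params = {}, maxLimit = 100) {
  const { limit = 5, domain = null, min_score = 0.0, offset = 0 } = params;
  if (limit > maxLimit) {
    // Assumption: reject rather than clamp, matching the limit_exceeded error code.
    const err = new Error(`limit ${limit} exceeds max_limit ${maxLimit}`);
    err.code = "limit_exceeded";
    throw err;
  }
  return results
    .filter((r) => domain === null || r.domain === domain) // optional domain filter
    .filter((r) => r.score >= min_score) // drop low-scoring matches
    .sort((a, b) => b.score - a.score) // highest score first
    .slice(offset, offset + limit); // pagination window
}

const sample = [
  { url: "a", domain: "docs.python.org", score: 0.9 },
  { url: "b", domain: "developer.mozilla.org", score: 0.6 },
  { url: "c", domain: "docs.python.org", score: 0.2 },
  { url: "d", domain: "docs.python.org", score: 0.5 },
];
console.log(applySearchParams(sample, { domain: "docs.python.org", min_score: 0.3 }));
```

Paging through a large result set then amounts to repeating the call with `offset` increased by `limit` until fewer than `limit` results come back.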
### Response
JSON array of result objects:
```json
[
{
"title": "Python Tutorial",
"url": "https://docs.python.org/3/tutorial/",
"snippet": "Python is an easy to learn, powerful programming language...",
"domain": "docs.python.org",
"score": 0.87,
"crawled_at": 1707712345678
}
]
```
**Fields:**
- `title` — Document title (cleaned)
- `url` — Source URL (canonical)
- `snippet` — Excerpt (~200 chars) from content
- `domain` — Hostname of source
- `score` — BM25 relevance score (higher is better; not normalized, though values typically fall between 0 and 1)
- `crawled_at` — Unix timestamp of when the page was crawled (in milliseconds, judging by the example above)
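
For readers unfamiliar with BM25, the sketch below shows the textbook formula over pre-tokenized documents. It is not the skill's scoring code: `bm25Scores` is a hypothetical name, `k1 = 1.2` and `b = 0.75` are common defaults rather than values from this skill, and the IDF uses the `ln(1 + ...)` variant. It may help explain why raw scores are not confined to a fixed 0 to 1 range.

```javascript
// Textbook BM25 over arrays of tokens. Parameters are common defaults, not this skill's settings.
function bm25Scores(queryTerms, docs, k1 = 1.2, b = 0.75) {
  const N = docs.length;
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / N;
  return docs.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length; // term frequency in this document
      if (tf === 0) continue;
      const df = docs.filter((d) => d.includes(term)).length; // documents containing the term
      const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));
      // Length normalization: longer-than-average documents are penalized via b.
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgLen));
    }
    return score;
  });
}

const docs = [
  ["python", "tutorial", "for", "beginners"],
  ["css", "grid", "layout", "guide"],
  ["python", "python", "async", "await", "reference"],
];
console.log(bm25Scores(["python", "tutorial"], docs));
```

Each query term adds its own contribution, so a document matching several rare terms can score above 1 under this formula.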
### Example Agent Calls
```
search CuratedSearch for "python tutorial"
search CuratedSearch for "async await" limit=3 domain=developer.mozilla.org
search CuratedSearch for "linux man page" min_score=0.3
```
### Errors
If an error occurs, the tool exits non-zero and prints a JSON error object to stderr, e.g.:
```json
{
"error": "index_not_found",
"message": "Search index not found. The index has not been built yet.",
"suggestion": "Run the crawler first: npm run crawl",
"details": { "path": "data/index.json" }
}
```
Common error codes:
| Code | Meaning | Suggested Fix |
|------|---------|---------------|
| `config_missing` | Configuration file not found | Specify `--config` path or ensure config.yaml exists |
| `config_invalid` | YAML parsing failed | Check syntax in config.yaml |
| `config_missing_index_path` | `index.path` not set | Add index.path to config |
| `index_not_found` | Index file missing | Run `npm run crawl` to build index |
| `index_corrupted` | Index file incompatible or corrupted | Rebuild index with `npm run crawl` |
| `index_init_failed` | Unexpected index initialization error | Check permissions, reinstall dependencies |
| `missing_query` | No query provided | Provide `--query` argument |
| `query_too_long` | Query exceeds 1000 characters | Shorten the query |
| `limit_exceeded` | Limit > config.max_limit | Use a smaller limit |
| `invalid_domain` | Domain filter malformed | Use format like `docs.python.org` |
| `conflicting_flags` | Mutually exclusive flags used (e.g., `--stats` with `--query`) | Use flags correctly |
| `stats_failed` | Could not retrieve index stats | Ensure index is accessible |
| `search_failed` | Search execution threw an error | Check query and index integrity |
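
A caller that captures the tool's stderr could turn the error object into something actionable along these lines. This is a sketch: `parseToolError` and the `needsCrawl` flag are hypothetical names, and the only assumption about the tool is the JSON shape shown above.

```javascript
// Hypothetical helper: converts the tool's stderr text into a structured error.
function parseToolError(stderrText) {
  try {
    const parsed = JSON.parse(stderrText);
    if (parsed && typeof parsed.error === "string") {
      return {
        code: parsed.error,
        message: parsed.message ?? "",
        suggestion: parsed.suggestion ?? null,
        // Both of these codes are fixed by `npm run crawl` according to the table above.
        needsCrawl: ["index_not_found", "index_corrupted"].includes(parsed.error),
      };
    }
  } catch {
    // stderr was not JSON; fall through to the generic result
  }
  return { code: "unknown", message: stderrText.trim(), suggestion: null, needsCrawl: false };
}

const stderrSample = JSON.stringify({
  error: "index_not_found",
  message: "Search index not found. The index has not been built yet.",
  suggestion: "Run the crawler first: npm run crawl",
  details: { path: "data/index.json" },
});
console.log(parseToolError(stderrSample));
```

Falling back to a generic `unknown` code keeps the caller working even if stderr contains something other than the documented JSON.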
## Configuration
Edit `config.yaml` in the skill directory. Key sections:
- `domains` — whitelist of allowed domains (required)
- `seeds` — starting URLs for crawling
- `crawl` — depth, delay, timeout, max_documents
- `content` — min_content_length, max_content_length
- `index` — path to index files
- `search` — default_limit, max_limit, min_score
See `README.md` for full configuration docs.
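
As a rough illustration of the structure listed above, the sketch below checks an already-parsed config object. `validateConfig` is a hypothetical helper, the example values are placeholders apart from the defaults mentioned elsewhere in this document, and two of the checks are assumptions: that an empty `domains` list is invalid, and that `default_limit` should not exceed `max_limit`.

```javascript
// Sketch: checks a parsed config object against the requirements described in this document.
function validateConfig(config) {
  const problems = [];
  if (!Array.isArray(config.domains) || config.domains.length === 0) {
    // Assumption: "required" also means the whitelist must not be empty.
    problems.push("domains: whitelist is required and must not be empty");
  }
  if (!config.index || typeof config.index.path !== "string" || config.index.path === "") {
    problems.push("config_missing_index_path: index.path is not set");
  }
  const search = config.search ?? {};
  if (search.default_limit > search.max_limit) {
    // Assumption: a default above the maximum is treated as a mistake.
    problems.push("search: default_limit is larger than max_limit");
  }
  return problems;
}

// Placeholder values; only the key names come from the section list above.
const exampleConfig = {
  domains: ["docs.python.org", "developer.mozilla.org"],
  seeds: ["https://docs.python.org/3/tutorial/"],
  index: { path: "data/index.json" },
  search: { default_limit: 5, max_limit: 100, min_score: 0.0 },
};
console.log(validateConfig(exampleConfig)); // []
```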
## Support
- Full documentation: `README.md`
- Technical specs: `specs/`
- Build plan: `PLAN.md`
- Contributor guide: `CONTRIBUTING.md`
- Issues: Report on GitHub (or via OpenClaw maintainers)
Security Audits
These signals reflect official OpenClaw status values. A Suspicious status means the skill should be used with extra caution.