webscraper-pulpminer

Web & Frontend Development
v1.0.1
Benign

Convert any webpage into structured JSON data using AI.

4893 downloads893 installsby @melvin2016

Setup & Installation

Install command

clawhub install melvin2016/webscraper-pulpminer

If the CLI is not installed:

Install command

npx clawhub@latest install melvin2016/webscraper-pulpminer

Or install with OpenClaw CLI:

Install command

openclaw skills install melvin2016/webscraper-pulpminer

or paste the repo link into your assistant's chat

Install command

https://github.com/openclaw/skills/tree/main/skills/melvin2016/webscraper-pulpminer

What This Skill Does

PulpMiner converts any webpage into structured JSON using AI. You provide a URL and an optional JSON schema, and it scrapes the page, processes it through an LLM, and returns clean data. APIs can be saved and called programmatically with static or dynamic URLs.

Handles JavaScript-rendered SPAs and lets you define exact output schemas, which removes the need to write and maintain custom scraper code.

When to Use It

  • Track competitor prices across e-commerce pages
  • Extract lead contact info from company directories
  • Monitor job listings for new postings
  • Pull structured product data from vendor catalogs
  • Scrape paginated search results with dynamic query variables
View original SKILL.md file
# PulpMiner — AI Web Scraping & JSON API

PulpMiner converts any webpage into structured JSON using AI. You provide a URL and optionally a JSON template, and PulpMiner scrapes the page, runs it through an LLM, and returns clean structured data.

## Authentication

All API calls require the `apikey` header:

```
apikey: <PULPMINER_API_KEY>
```

Get your API key from https://pulpminer.com/api — click "Regenerate Key" if you don't have one.

## Core Workflow

PulpMiner works in two phases:

1. **Create a saved API** — Configure a URL, scraper, LLM, and optional JSON template via the PulpMiner dashboard at https://pulpminer.com/api
2. **Call the saved API** — Use the external endpoint with your API key to fetch structured JSON

## Calling a Saved API

### Static API (fixed URL)

```bash
curl -X GET "https://api.pulpminer.com/external/<apiId>" \
  -H "apikey: <PULPMINER_API_KEY>"
```

Returns JSON extracted from the configured webpage.

### Dynamic API (URL with variables)

For APIs saved with template URLs like `https://example.com/search?q={{query}}&page={{page}}`:

```bash
curl -X POST "https://api.pulpminer.com/external/<apiId>" \
  -H "apikey: <PULPMINER_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"query": "javascript frameworks", "page": "1"}'
```

The `{{variable}}` placeholders in the saved URL get replaced with the values you provide.

## Response Format

Successful responses return:

```json
{
  "data": { ... },
  "errors": null
}
```

Error responses return:

```json
{
  "data": null,
  "errors": "Error message describing what went wrong"
}
```

## Caching

- API responses are cached for **24 hours** by default
- If cache is older than 15 minutes, PulpMiner serves the cached version while refreshing in the background
- Cache can be disabled per-API in the dashboard settings

## Configuration Options (Set in Dashboard)

When creating a saved API at https://pulpminer.com/api, you can configure:

| Option | Description |
|--------|-------------|
| **URL** | The webpage to scrape |
| **JSON Template** | Optional JSON structure for the LLM to follow (e.g., `{"name": "", "price": ""}`) |
| **Render JS** | Enable for SPAs and JS-heavy pages (uses headless browser) |
| **CSS Selector** | Extract only a specific part of the page (e.g., `.product-list`, `#main-content`) |
| **Extra Instructions** | Additional guidance for the AI (e.g., "Only extract items with prices above $50") |
| **Dynamic URL** | Enable template variables in the URL with `{{variable}}` syntax |
| **Cache** | Toggle response caching on/off |

## Integration with Zapier

For async scraping in Zapier workflows:

```bash
# Static API
curl -X POST "https://api.pulpminer.com/external/zapier/get/<apiId>" \
  -H "apikey: <PULPMINER_API_KEY>" \
  -d '{"callbackURL": "https://hooks.zapier.com/..."}'

# Dynamic API
curl -X POST "https://api.pulpminer.com/external/zapier/post/<apiId>" \
  -H "apikey: <PULPMINER_API_KEY>" \
  -d '{"callbackURL": "https://hooks.zapier.com/...", "query": "value"}'
```

Returns `201` immediately. Sends scraped data to the callback URL when complete.

## Integration with n8n

Verify authentication:

```bash
curl -X GET "https://api.pulpminer.com/external/n8n/auth" \
  -H "apikey: <PULPMINER_API_KEY>"
```

Then use the standard `/external/<apiId>` endpoints for data fetching.

## Credits

- Each API call costs **0.25–0.4 credits** depending on the endpoint
- JavaScript rendering adds **0.1 credits** extra
- New users get **5 free credits**
- Purchase more at https://pulpminer.com/credits

## Tips

- Use **CSS selectors** to narrow down the scraped content and improve accuracy
- Provide a **JSON template** for consistent, predictable output structures
- Enable **JS rendering** only when needed — static pages scrape faster and cost fewer credits
- Use **extra instructions** to guide the AI (e.g., "Return dates in ISO 8601 format")
- For monitoring use cases, keep **caching enabled** to reduce credit usage
- Use the **playground** first to verify a URL is scrapable before saving an API config
- Dynamic APIs are ideal for search pages, paginated content, and parameterized URLs

## Links

- Website: https://pulpminer.com
- API Dashboard: https://pulpminer.com/api

Example Workflow

Here's how your AI assistant might use this skill in practice.

INPUT

User asks: Track competitor prices across e-commerce pages

AGENT
  1. 1Track competitor prices across e-commerce pages
  2. 2Extract lead contact info from company directories
  3. 3Monitor job listings for new postings
  4. 4Pull structured product data from vendor catalogs
  5. 5Scrape paginated search results with dynamic query variables
OUTPUT
Convert any webpage into structured JSON data using AI.

Share this skill

Security Audits

VirusTotalBenign
OpenClawBenign
View full report

These signals reflect official OpenClaw status values. A Suspicious status means the skill should be used with extra caution.

Details

LanguageMarkdown
Last updatedFeb 26, 2026