# url-fetcher

Simple web content fetching without API keys or external dependencies.

## Setup & Installation

Install with ClawHub:

```bash
clawhub install johstracke/url-fetcher
```

If the CLI is not installed:

```bash
npx clawhub@latest install johstracke/url-fetcher
```

Or install with the OpenClaw CLI:

```bash
openclaw skills install johstracke/url-fetcher
```

Or paste the repo link into your assistant's chat:

https://github.com/openclaw/skills/tree/main/skills/johstracke/url-fetcher

## What This Skill Does
Fetches web content using Python's standard library with no external dependencies or API keys. Supports HTML retrieval, basic HTML-to-markdown conversion, and safe file writes with path and URL validation built in.
With no API key, no pip installs, and no cost, it is the fastest zero-setup way to add web fetching to an agent workflow.
## When to Use It
- Saving articles to local markdown files for offline reading
- Batch-downloading documentation pages into a workspace folder
- Extracting plain text from blog posts for further processing
- Previewing raw HTML of a URL before deciding to save it
- Collecting multiple research pages into organized local files
## Original SKILL.md
# URL Fetcher
Fetch web content without API keys or external dependencies. Uses Python standard library only.
## Quick Start
```bash
url_fetcher.py fetch <url> [output_file]
url_fetcher.py fetch --markdown <url> [output_file]
```
**Examples:**
```bash
# Fetch and preview
url_fetcher.py fetch https://example.com
# Fetch and save HTML
url_fetcher.py fetch https://example.com ~/workspace/page.html
# Fetch and convert to basic markdown
url_fetcher.py fetch --markdown https://example.com ~/workspace/page.md
```
## Features
- **No dependencies** - Uses Python stdlib (urllib) only
- **No API keys** - Completely free to use
- **URL validation** - Blocks localhost/internal networks
- **Basic markdown conversion** - Extract content from HTML
- **Path validation** - Safe file writes only (workspace, home, /tmp)
- **Error handling** - Timeout and network error handling
## When to Use
- **Content aggregation** - Collect pages for processing
- **Research collection** - Save articles/pages locally
- **Simple scraping** - Extract text from web pages
- **Markdown conversion** - Basic HTML to text/markdown
- **No-API alternatives** - When you can't use paid APIs
## Limitations
- **Basic markdown** - Simple regex-based conversion, not a full parser (see the sketch after this list)
- **No JavaScript** - Only fetches static HTML
- **Rate limiting** - No built-in rate limiting (add your own if needed)
- **Bot detection** - Some sites may block the default User-Agent
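
The "basic markdown" caveat is worth making concrete. A regex-based converter of this kind typically looks like the sketch below (an illustration of the approach, not the skill's actual source):

```python
import re

def html_to_markdown(html: str) -> str:
    """Crude regex-based HTML -> markdown conversion (illustrative only)."""
    # Drop script/style blocks entirely
    text = re.sub(r"(?is)<(script|style).*?>.*?</\1>", "", html)
    # Headings: <h1>..</h1> through <h6>..</h6> become markdown hashes
    text = re.sub(
        r"(?is)<h([1-6])[^>]*>(.*?)</h\1>",
        lambda m: "#" * int(m.group(1)) + " " + m.group(2).strip() + "\n",
        text,
    )
    # Links: <a href="url">text</a> -> [text](url)
    text = re.sub(r'(?is)<a[^>]*href="([^"]*)"[^>]*>(.*?)</a>', r"[\2](\1)", text)
    # Paragraph and line breaks
    text = re.sub(r"(?i)</p>|<br\s*/?>", "\n\n", text)
    # Strip all remaining tags, then collapse blank lines
    text = re.sub(r"(?s)<[^>]+>", "", text)
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

Nested tags, tables, and malformed HTML will all confuse regexes like these, which is exactly why the output is "basic".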
## Security Features
### URL Validation
- ✅ Allows: http/https URLs
- ❌ Blocks: file://, data://, javascript: URLs
- ❌ Blocks: localhost, 127.0.0.1, ::1 (internal networks)
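
A minimal sketch of this kind of URL gate, using only stdlib parsing (the skill's exact blocklist may differ):

```python
from urllib.parse import urlparse

BLOCKED_HOSTS = {"localhost", "127.0.0.1", "::1"}

def validate_url(url: str) -> None:
    """Allow only http/https URLs that do not target internal hosts."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"Blocked scheme: {parsed.scheme!r}")
    host = (parsed.hostname or "").lower()
    if host in BLOCKED_HOSTS:
        raise ValueError(f"Blocked host: {host!r}")
    # A fuller implementation would also resolve DNS and reject private IP ranges
```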
### File Path Validation
- ✅ Allows: workspace, home directory, /tmp
- ❌ Blocks: system paths (/etc, /usr, /var, etc.)
- ❌ Blocks: sensitive dotfiles (~/.ssh, ~/.bashrc, etc.)
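
The path gate can be sketched similarly (requires Python 3.9+ for `Path.is_relative_to`; the allow/deny lists here are assumptions for illustration):

```python
from pathlib import Path

ALLOWED_ROOTS = [Path.home() / "workspace", Path.home(), Path("/tmp")]
BLOCKED_PARTS = {".ssh", ".bashrc", ".gnupg"}  # assumed examples of sensitive names

def validate_output_path(path_str: str) -> Path:
    """Resolve the path and require it to sit under an allowed root."""
    path = Path(path_str).expanduser().resolve()
    if any(part in BLOCKED_PARTS for part in path.parts):
        raise ValueError(f"Refusing to write to sensitive path: {path}")
    if not any(path.is_relative_to(root) for root in ALLOWED_ROOTS):
        raise ValueError(f"Path outside allowed roots: {path}")
    return path
```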
### Error Handling
- Timeout after 10 seconds
- HTTP error handling
- Network error handling
- Character encoding handling
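
Put together, the fetch path plausibly looks like this stdlib-only sketch, matching the documented 10-second timeout and UTF-8 error-ignore decoding (assumed structure, not the exact source):

```python
import urllib.error
import urllib.request

def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL with stdlib only; decode as UTF-8, dropping bad bytes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="ignore")
    except urllib.error.HTTPError as e:
        raise RuntimeError(f"HTTP {e.code}: {e.reason}") from e
    except urllib.error.URLError as e:
        raise RuntimeError(f"Network error: {e.reason}") from e
    except TimeoutError as e:  # raised on read timeouts (Python 3.10+)
        raise RuntimeError(f"Request timeout after {timeout}s") from e
```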
## Usage Patterns
### Collecting Research
```bash
# Fetch multiple articles
# Fetch multiple articles as raw HTML
url_fetcher.py fetch https://example.com/article1 ~/workspace/research/article1.html
url_fetcher.py fetch https://example.com/article2 ~/workspace/research/article2.html
# Convert an article to markdown for reading
url_fetcher.py fetch --markdown https://example.com/article3 ~/workspace/research/article3.md
```
### Content Aggregation
```bash
# Fetch pages for processing
url_fetcher.py fetch https://news.example.com ~/workspace/content/latest.html
# Extract text
url_fetcher.py fetch --markdown https://blog.example.com ~/workspace/content/post.md
```
### Quick Preview
```bash
# Just preview content (no file save)
url_fetcher.py fetch https://example.com
```
## Advanced Usage
### Batch Fetching
```bash
#!/bin/bash
# batch_fetch.sh
URLS=(
  "https://example.com/page1"
  "https://example.com/page2"
  "https://example.com/page3"
)
OUTPUT_DIR="$HOME/workspace/fetched"
mkdir -p "$OUTPUT_DIR"

for url in "${URLS[@]}"; do
  # Derive a safe filename: strip the scheme, replace separators with underscores
  filename=$(echo "$url" | sed -E 's|^https?://||; s|[/:]|_|g')
  url_fetcher.py fetch --markdown "$url" "$OUTPUT_DIR/$filename.md"
  sleep 1 # Be nice to servers
done
```
### Integration with Other Skills
**Combine with research-assistant:**
```bash
# Fetch article
url_fetcher.py fetch --markdown https://example.com/article ~/workspace/article.md
# Extract key points
# Then use research-assistant to organize findings
```
**Combine with task-runner:**
```bash
# Add task to fetch content
task_runner.py add "Fetch article on topic X" "research"
# Fetch when ready
url_fetcher.py fetch --markdown https://example.com/topic-x ~/workspace/research/topic-x.md
```
## Troubleshooting
### Connection Timeout
```
Error: Request timeout after 10s
```
**Solution:** The server is slow or unreachable. Try again later or check the URL.
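
If transient timeouts are common, a small retry wrapper helps (a hypothetical helper; `fetch` here is any callable that raises on failure):

```python
import time

def fetch_with_retries(fetch, url, attempts=3, base_delay=2.0):
    """Retry a flaky fetch with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except RuntimeError:
            if attempt == attempts - 1:
                raise  # out of attempts; propagate the last error
            time.sleep(base_delay * (2 ** attempt))  # 2s, 4s, 8s, ...
```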
### HTTP 403/429 Errors
```
Error: HTTP 403: Forbidden
```
**Solution:** The site blocks automated requests. Try:
- Add delay between requests
- Use a different User-Agent (modify source; see the sketch below)
- Respect robots.txt
- Consider using an API if available
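
Setting a browser-like User-Agent with the stdlib looks like this; "modify source" means changing the equivalent request construction inside `url_fetcher.py` (the header value below is just an example):

```python
import urllib.request

req = urllib.request.Request(
    "https://example.com",
    headers={"User-Agent": "Mozilla/5.0 (compatible; MyFetcher/1.0)"},
)
with urllib.request.urlopen(req, timeout=10) as resp:
    html = resp.read().decode("utf-8", errors="ignore")
```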
### Encoding Issues
```
Error with special characters
```
**Solution:** The tool decodes responses as UTF-8 with `errors="ignore"`, so unrepresentable characters may be dropped.
### Markdown Quality
```
Note: Basic markdown extraction
```
**Solution:** This tool uses simple regex for HTML→MD conversion. For better results:
- Use dedicated markdown parsers
- Or post-process the output
- Or use a paid API with better parsing
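
A light post-processing pass can at least clean up the roughest artifacts (illustrative only):

```python
import re

def tidy_markdown(md: str) -> str:
    """Trim trailing whitespace and collapse runs of blank lines."""
    md = "\n".join(line.rstrip() for line in md.splitlines())
    return re.sub(r"\n{3,}", "\n\n", md).strip() + "\n"
```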
## Best Practices
1. **Be respectful** - Add delays between requests (don't hammer servers)
2. **Check robots.txt** - Respect the site's crawling policies (see the sketch after this list)
3. **Rate limit yourself** - Don't fetch too fast
4. **Validate URLs** - Only fetch from trusted sources
5. **Save safely** - Always use path-validated outputs
6. **Preview first** - Use preview mode before saving
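
Checking robots.txt needs nothing beyond the stdlib (a sketch using `urllib.robotparser`):

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def allowed_to_fetch(url: str, user_agent: str = "*") -> bool:
    """Return True if the site's robots.txt permits fetching this URL."""
    parsed = urlparse(url)
    rp = RobotFileParser(f"{parsed.scheme}://{parsed.netloc}/robots.txt")
    rp.read()  # downloads and parses robots.txt
    return rp.can_fetch(user_agent, url)
```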
## Integration Examples
### Python Integration
```python
from pathlib import Path
import subprocess

def fetch_and_process(url):
    """Fetch a URL as markdown and return the saved content."""
    output = Path.home() / "workspace" / "fetched" / "page.md"
    output.parent.mkdir(parents=True, exist_ok=True)
    # Fetch (raises CalledProcessError if the fetch fails)
    subprocess.run([
        "python3",
        "/path/to/url_fetcher.py",
        "fetch",
        "--markdown",
        url,
        str(output),
    ], check=True)
    # Process content
    return output.read_text()
```
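
Usage, assuming the script path above is adjusted to your install:

```python
content = fetch_and_process("https://example.com")
print(content[:200])
```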
### Bash Integration
```bash
# Function for fetching
fetch_content() {
  local url="$1"
  local output="$2"
  python3 ~/workspace/skills/url-fetcher/scripts/url_fetcher.py \
    fetch --markdown "$url" "$output"
}

# Usage
fetch_content "https://example.com" ~/workspace/example.md
```
## Alternatives
### When You Need More Features
**For full-featured scraping:**
- Use `requests` + `beautifulsoup4` (requires pip install)
- Or use `scrapy` framework (requires pip install)
- Or use paid APIs (Firecrawl, Apify)
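
If you do take the pip route, the equivalent fetch-and-extract is short (requires `pip install requests beautifulsoup4`):

```python
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com", timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")
print(soup.get_text(separator="\n", strip=True))
```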
**For better markdown:**
- `markdownify` library (requires pip install)
- Or use AI-based parsing (OpenAI, Anthropic APIs)
**For complex workflows:**
- Browser automation (OpenClaw browser tool)
- Headless Chrome (Puppeteer, Playwright)
- Or use scraping APIs (Zyte, ScraperAPI)
## Zero-Cost Advantage
This skill requires:
- ✅ Python 3 (included with OpenClaw)
- ✅ No API keys
- ✅ No external packages
- ✅ No paid services
- ✅ No rate limiting (other than what you add)
Perfect for autonomous agents with budget constraints.
## Contributing
If you improve this skill, please:
1. Test with security-checker
2. Document new features
3. Publish to ClawHub with credit
## License
Use freely in your OpenClaw skills and workflows.