2 points by larryste | 7 hours ago | 1 comment
  • larryste | 7 hours ago
    # Show HN: web-search-tool – Search/scrape web with AI-friendly output

    *Project:* https://github.com/larryste1/web-search-tool
    *PyPI:* https://pypi.org/project/web-search-tool/

    ## The Problem

    Building AI assistants requires reliable search with fallback, clean content extraction, API flexibility, and structured JSON output. Existing solutions are single-backend (they break when the API fails), overly complex, or emit raw HTML.

    ## The Solution

    `web-search-tool` searches and scrapes the web with clean, AI-friendly output:

    ```bash
    pip install web-search-tool
    web-search "Python async best practices"                  # Search with AI answer
    web-search "React hooks tutorial" --scrape                # Full article content
    web-search "machine learning" --include-domain arxiv.org  # Filter domain
    web-search "API design" --json                            # JSON output
    ```

    ## Features

    - *3 Backends with Auto-Fallback*: Tavily → Serper → DuckDuckGo
    - *Content Scraping*: Extract main article text via BeautifulSoup
    - *Domain Filtering*: Include/exclude specific domains
    - *Search Depth*: Basic or advanced
    - *AI-Friendly Output*: Structured results with optional AI answers
    - *JSON Output*: Pipe to jq or parse in scripts
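For a sense of what include/exclude domain filtering typically does under the hood, here is a minimal illustrative sketch. It is not the tool's actual code; `filter_by_domain` and the result-dict shape are assumptions for the example:

```python
from urllib.parse import urlparse

def filter_by_domain(results, include=None, exclude=None):
    """Keep results whose host matches an include domain (if any given),
    and drop results whose host matches an exclude domain."""
    def host(url):
        # Normalize host: lowercase, strip a leading "www."
        return urlparse(url).netloc.lower().removeprefix("www.")

    def matches(h, domain):
        # Match the domain itself or any subdomain of it.
        return h == domain or h.endswith("." + domain)

    kept = []
    for r in results:
        h = host(r["url"])
        if include and not any(matches(h, d) for d in include):
            continue
        if exclude and any(matches(h, d) for d in exclude):
            continue
        kept.append(r)
    return kept

results = [
    {"url": "https://arxiv.org/abs/1234.5678"},
    {"url": "https://example.com/ml-intro"},
]
print(filter_by_domain(results, include=["arxiv.org"]))
```

Matching on the parsed host (rather than substring search over the URL) avoids false positives like `arxiv.org.evil.com`.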

    ## How It Works

    ```
    Query → Tavily (AI, needs key) → Serper (Google, needs key) → DuckDuckGo (free)
    ```
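The fallback chain above is essentially a try-in-order loop. A minimal sketch of the pattern, assuming each backend either returns results or raises; the backend functions and `BackendError` here are hypothetical stand-ins, not the tool's internals:

```python
class BackendError(Exception):
    """Raised when a search backend fails or is unavailable."""

def search_with_fallback(query, backends):
    """Try each (name, fn) backend in order; return the first success."""
    errors = []
    for name, fn in backends:
        try:
            return name, fn(query)
        except BackendError as exc:
            errors.append((name, str(exc)))  # remember failure, try next
    raise BackendError(f"all backends failed: {errors}")

# Simulated backends: the first two fail (e.g. missing API key),
# DuckDuckGo succeeds.
def tavily(q):     raise BackendError("no TAVILY_API_KEY")
def serper(q):     raise BackendError("no SERPER_API_KEY")
def duckduckgo(q): return [{"title": "Result for " + q}]

name, results = search_with_fallback(
    "rust ownership",
    [("tavily", tavily), ("serper", serper), ("duckduckgo", duckduckgo)],
)
print(name)  # duckduckgo
```

Collecting the per-backend errors makes the final failure message diagnosable instead of silently reporting only the last backend's error.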

    ## Examples

    ```bash
    # AI Research with Answer
    $ web-search "What is Rust ownership?"

    Search: What is Rust ownership?
    Backend: Tavily
    Answer: Rust ownership manages memory allocation. Each value has one owner...

    # Scrape Full Articles
    $ web-search "Python decorators" --scrape --num 3

    # Domain-Specific
    $ web-search "type hints" --include-domain realpython.com --include-domain docs.python.org
    ```

    ```python
    # Programmatic Use
    from web_search_tool import search_web

    result = search_web("Python best practices", scrape_urls=True)
    ```

    ## API Keys

    | Backend    | Key      | Get Key             |
    |------------|----------|---------------------|
    | Tavily     | Optional | https://tavily.com/ |
    | Serper     | Optional | https://serper.dev/ |
    | DuckDuckGo | None     | Free                |

    ```bash
    export TAVILY_API_KEY=your-key-here
    export SERPER_API_KEY=your-key-here
    ```

    Without keys, the tool falls back to DuckDuckGo automatically.
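Key detection like this usually amounts to checking environment variables to decide which backends are available. A minimal sketch of the idea (the function name is an assumption, not the package's API):

```python
import os

def available_backends(env=None):
    """Return usable backends in fallback order, based on which
    API keys are present. DuckDuckGo needs no key."""
    if env is None:
        env = os.environ
    chain = []
    if env.get("TAVILY_API_KEY"):
        chain.append("tavily")
    if env.get("SERPER_API_KEY"):
        chain.append("serper")
    chain.append("duckduckgo")  # always available, no key required
    return chain

print(available_backends(env={}))  # ['duckduckgo']
```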

    ## Why I Built This

    While building AI assistants, I kept hitting the same problems: a single point of failure, messy output, and no fallback. This tool tries multiple backends, extracts clean text, returns structured JSON, and works without API keys.

    ## Tech Stack

    Requests, BeautifulSoup4, Tavily API, Serper API, DuckDuckGo HTML

    ## Try It

    ```bash
    pip install web-search-tool
    web-search "Python tutorials"  # No API key needed
    ```

    *GitHub:* https://github.com/larryste1/web-search-tool

    *Feedback:* What backends should I add? How do you handle web search in AI projects?

    ---

    Built after too many API failures with single-backend tools.