2 points by larryste | 7 hours ago | 1 comment
  • larryste | 7 hours ago
    # Show HN: web-search-tool – Search/scrape web with AI-friendly output

    *Project:* https://github.com/larryste1/web-search-tool
    *PyPI:* https://pypi.org/project/web-search-tool/

    ## The Problem

    Building AI assistants requires reliable search with fallback, clean content extraction, API flexibility, and structured JSON output. Existing solutions are single-backend (they break when the API fails), overly complex, or emit raw HTML.

    ## The Solution

    `web-search-tool` searches and scrapes the web with clean, AI-friendly output:

    ```bash
    pip install web-search-tool
    web-search "Python async best practices"                  # Search with AI answer
    web-search "React hooks tutorial" --scrape                # Full article content
    web-search "machine learning" --include-domain arxiv.org  # Filter domain
    web-search "API design" --json                            # JSON output
    ```

    ## Features

    - *3 Backends with Auto-Fallback*: Tavily → Serper → DuckDuckGo
    - *Content Scraping*: Extract main article text via BeautifulSoup
    - *Domain Filtering*: Include/exclude specific domains
    - *Search Depth*: Basic or advanced
    - *AI-Friendly Output*: Structured results with optional AI answers
    - *JSON Output*: Pipe to jq or parse in scripts
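For a sense of what include/exclude domain filtering typically does under the hood, here is a minimal illustrative sketch. It is not the tool's actual code; `filter_by_domain` and the result-dict shape are assumptions for the example:

```python
from urllib.parse import urlparse

def filter_by_domain(results, include=None, exclude=None):
    """Keep results whose host matches an include domain (if any given),
    and drop results whose host matches an exclude domain."""
    def host(url):
        # Normalize host: lowercase, strip a leading "www."
        return urlparse(url).netloc.lower().removeprefix("www.")

    def matches(h, domain):
        # Match the domain itself or any subdomain of it.
        return h == domain or h.endswith("." + domain)

    kept = []
    for r in results:
        h = host(r["url"])
        if include and not any(matches(h, d) for d in include):
            continue
        if exclude and any(matches(h, d) for d in exclude):
            continue
        kept.append(r)
    return kept

results = [
    {"url": "https://arxiv.org/abs/1234.5678"},
    {"url": "https://example.com/ml-intro"},
]
print(filter_by_domain(results, include=["arxiv.org"]))
```

Matching on the parsed host (rather than substring search over the URL) avoids false positives like `arxiv.org.evil.com`.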

    ## How It Works

    ```
    Query → Tavily (AI, needs key) → Serper (Google, needs key) → DuckDuckGo (free)
    ```
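The fallback chain above is essentially a try-in-order loop. A minimal sketch of the pattern, assuming each backend either returns results or raises; the backend functions and `BackendError` here are hypothetical stand-ins, not the tool's internals:

```python
class BackendError(Exception):
    """Raised when a search backend fails or is unavailable."""

def search_with_fallback(query, backends):
    """Try each (name, fn) backend in order; return the first success."""
    errors = []
    for name, fn in backends:
        try:
            return name, fn(query)
        except BackendError as exc:
            errors.append((name, str(exc)))  # remember failure, try next
    raise BackendError(f"all backends failed: {errors}")

# Simulated backends: the first two fail (e.g. missing API key),
# DuckDuckGo succeeds.
def tavily(q):     raise BackendError("no TAVILY_API_KEY")
def serper(q):     raise BackendError("no SERPER_API_KEY")
def duckduckgo(q): return [{"title": "Result for " + q}]

name, results = search_with_fallback(
    "rust ownership",
    [("tavily", tavily), ("serper", serper), ("duckduckgo", duckduckgo)],
)
print(name)  # duckduckgo
```

Collecting the per-backend errors makes the final failure message diagnosable instead of silently reporting only the last backend's error.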

    ## Examples

    ```bash
    # AI Research with Answer
    $ web-search "What is Rust ownership?"

    Search: What is Rust ownership?
    Backend: Tavily
    Answer: Rust ownership manages memory allocation. Each value has one owner...

    # Scrape Full Articles
    $ web-search "Python decorators" --scrape --num 3

    # Domain-Specific
    $ web-search "type hints" --include-domain realpython.com --include-domain docs.python.org
    ```

    ```python
    # Programmatic Use
    from web_search_tool import search_web

    result = search_web("Python best practices", scrape_urls=True)
    ```

    ## API Keys

    | Backend    | Key      | Get Key             |
    |------------|----------|---------------------|
    | Tavily     | Optional | https://tavily.com/ |
    | Serper     | Optional | https://serper.dev/ |
    | DuckDuckGo | None     | Free                |

    ```bash
    export TAVILY_API_KEY=your-key-here
    export SERPER_API_KEY=your-key-here
    ```

    Without keys, the tool falls back to DuckDuckGo automatically.
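Key detection like this usually amounts to checking environment variables to decide which backends are available. A minimal sketch of the idea (the function name is an assumption, not the package's API):

```python
import os

def available_backends(env=None):
    """Return usable backends in fallback order, based on which
    API keys are present. DuckDuckGo needs no key."""
    if env is None:
        env = os.environ
    chain = []
    if env.get("TAVILY_API_KEY"):
        chain.append("tavily")
    if env.get("SERPER_API_KEY"):
        chain.append("serper")
    chain.append("duckduckgo")  # always available, no key required
    return chain

print(available_backends(env={}))  # ['duckduckgo']
```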

    ## Why I Built This

    While building AI assistants, I kept hitting the same problems: a single point of failure, messy output, and no fallback. This tool tries multiple backends, extracts clean text, returns structured JSON, and works without API keys.

    ## Tech Stack

    Requests, BeautifulSoup4, Tavily API, Serper API, DuckDuckGo HTML

    ## Try It

    ```bash
    pip install web-search-tool
    web-search "Python tutorials"  # No API key needed
    ```

    *GitHub:* https://github.com/larryste1/web-search-tool

    *Feedback:* What backends should I add? How do you handle web search in AI projects?

    ---

    Built after too many API failures with single-backend tools.