  • jimmyjonny, 8 hours ago
    This is the most critical post you will make. Hacker News (HN) can crash your server with traffic if you get to the front page, so be ready.

    The Golden Rule of HN: Do not "market." Explain how you built it. They care about the architecture, the code, and the hardware—not the "product benefits."

    Here is the exact template to use.

    The Submission Fields

    Title:

        Show HN: I built a public web interface that tunnels to my home RTX 3090 for inference
    
    Url:

        [Your Render Link]
    
    Text (The "First Comment"): (You must post this immediately after submitting the link. This is where you win them over.)

    Hello HN,

    I wanted to run a privacy-focused LLM service without paying for H100s or leaking data to OpenAI, so I built a hybrid architecture.

    The Architecture:

        Frontend/Gateway: Hosted on Render. This handles the public traffic, auth, and UI.
    
        Inference: Hosted under my desk on my personal rig (RTX 3090).
    
        The Tunnel: The Render server acts as a dumb WebSocket relay. It accepts user prompts, encrypts them, and tunnels them to my local Python worker (a sketch of both ends follows this list).
    
        Processing: My local worker uses Ollama for the LLM (Qwen 2.5 14B/32B) and Playwright for live web scraping/RAG.
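    To make the tunnel concrete, here is a stripped-down sketch of both ends. This is illustrative, not the production code: it assumes a single worker, a single client at a time, no auth, and placeholder names (the /chat and /worker endpoints and the <END> marker are invented for this sketch).

        # relay.py (runs on Render) -- sketch: one worker, no auth.
        import asyncio
        from typing import Optional
        from fastapi import FastAPI, WebSocket

        app = FastAPI()
        worker: Optional[WebSocket] = None  # single outbound connection from home

        @app.websocket("/worker")
        async def worker_conn(ws: WebSocket):
            # The home rig dials OUT to Render, so no ports open at home.
            global worker
            await ws.accept()
            worker = ws
            try:
                while True:
                    await asyncio.sleep(3600)  # just hold the socket open
            finally:
                worker = None

        @app.websocket("/chat")
        async def chat(ws: WebSocket):
            # Public endpoint: forward each prompt, stream tokens back.
            # Sketch assumes one client at a time.
            await ws.accept()
            while True:
                prompt = await ws.receive_text()
                if worker is None:
                    await ws.send_text("[worker offline]")
                    continue
                await worker.send_text(prompt)
                while True:
                    token = await worker.receive_text()
                    if token == "<END>":
                        break
                    await ws.send_text(token)

    The home side dials out and talks to Ollama's local HTTP API. Again a sketch; the relay URL and model tag below are placeholders:

        # worker.py (runs on the home rig) -- sketch.
        import asyncio, json, urllib.request
        import websockets  # pip install websockets

        RELAY_URL = "wss://your-app.onrender.com/worker"  # placeholder
        OLLAMA = "http://localhost:11434/api/generate"

        def generate(prompt):
            # Ollama streams newline-delimited JSON objects.
            req = urllib.request.Request(
                OLLAMA,
                data=json.dumps({"model": "qwen2.5:14b", "prompt": prompt}).encode(),
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req) as resp:  # blocking; fine for one user
                for line in resp:
                    chunk = json.loads(line)
                    yield chunk.get("response", "")
                    if chunk.get("done"):
                        break

        async def main():
            async with websockets.connect(RELAY_URL) as ws:
                async for prompt in ws:
                    for token in generate(prompt):
                        await ws.send(token)
                    await ws.send("<END>")

        asyncio.run(main())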
    
    The Stack:

        Backend: FastAPI + Python
    
        Protocol: Secure WebSockets (WSS)
    
        Inference: Ollama (Local)
    
        RAG: PyMuPDF (for PDF analysis) + Playwright (for "Deep Research" browsing)
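    The "Deep Research" path is mostly these two libraries glued together. Roughly (simplified sketch: no chunking, ranking, or error handling):

        # rag.py (home rig) -- rough shape of the retrieval side.
        import fitz  # PyMuPDF: pip install pymupdf
        from playwright.sync_api import sync_playwright

        def pdf_text(path):
            # Pull the raw text out of every page of a PDF.
            with fitz.open(path) as doc:
                return "\n".join(page.get_text() for page in doc)

        def fetch_page(url):
            # Render the page in a real browser so JS-heavy sites work too.
            with sync_playwright() as p:
                browser = p.chromium.launch()
                page = browser.new_page()
                page.goto(url)
                text = page.inner_text("body")
                browser.close()
                return text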
    
    Why I did this: I wanted "Sovereign AI." The cloud server never stores the unencrypted chat logs; it just passes packets. The actual intelligence lives on my hardware. It’s essentially a free way to expose a local LLM to the web for personal use (and for friends).
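    For the skeptics: one way to back up the "just passes packets" claim is a pre-shared symmetric key between relay and worker. A sketch only (Fernet is my choice here for illustration, and it assumes the key sits in an env var on both boxes):

        # crypto.py -- shared by relay and worker. Sketch only.
        import os
        from cryptography.fernet import Fernet  # pip install cryptography

        f = Fernet(os.environ["TUNNEL_KEY"])  # generate once: Fernet.generate_key()

        def seal(text):      # relay: encrypt before forwarding
            return f.encrypt(text.encode())

        def unseal(blob):    # worker: decrypt before inference
            return f.decrypt(blob).decode()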