Why do I need gpt-oss-120B at all in this scenario? Couldn't I just call, e.g., the gemini-3-pro API directly from the Python script?
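Something like this, I mean; a minimal sketch assuming the public Gemini REST `generateContent` endpoint, where "gemini-3-pro" is just the hypothetical model name from this thread:

```python
# Sketch of calling the Gemini API directly from a Python script,
# no local model in the loop. The model name is hypothetical here.
import os
import requests

API_KEY = os.environ["GEMINI_API_KEY"]
MODEL = "gemini-3-pro"  # hypothetical; substitute an actually available model

url = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:generateContent?key={API_KEY}"
)

resp = requests.post(
    url,
    json={"contents": [{"parts": [{"text": "Solve this integral step by step: ..."}]}]},
    timeout=60,
)
resp.raise_for_status()

# Print the first candidate's text response.
data = resp.json()
print(data["candidates"][0]["content"]["parts"][0]["text"])
```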
What part here is the knowing or understanding? Does solving an integral symbolically provide more knowledge than solving it numerically or by other means?
Understanding the underlying functions themselves and the areas they sweep: has substitution or integration by parts actually provided you with that?
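To make the symbolic-vs-numeric distinction concrete, here's a rough sketch using SymPy and SciPy (my choice of function is arbitrary): the symbolic route hands you a closed form you can inspect, the numeric one just a number.

```python
# Contrast symbolic and numeric integration of the same function.
import math

import sympy as sp
from scipy.integrate import quad

x = sp.symbols("x")
f = x * sp.exp(-x)

# Symbolic: integration by parts happens under the hood; we get a formula.
F = sp.integrate(f, x)              # -(x + 1)*exp(-x), an antiderivative
exact = sp.integrate(f, (x, 0, 1))  # 1 - 2/e, an exact expression

# Numeric: quadrature returns only the area as a float, plus an error estimate.
approx, err = quad(lambda t: t * math.exp(-t), 0, 1)

print(F)                   # a structure you can differentiate, plot, reason about
print(sp.simplify(exact))  # 1 - 2*exp(-1)
print(approx, err)         # ~0.264241, with an error bound
```

Whether having the closed form counts as "understanding" is exactly the question, of course.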
But wasn't it Google Lens that actually identified them?
If something was built by violating ToS, and you then use it to commit more ToS violations against the people who committed the original violations to build the thing, do they cancel each other out?
Not about GPT-OSS specifically, but say you used Gemma for the same purpose in this hypothetical.
> What exact llama model (+ quant I suppose) is it that you've had better results against
Not llama, but Qwen3-coder-next is at the top of my list right now. Q8_K_XL. It's incredible (and not just for coding).
> Jinja threw a bunch of errors and GPT-OSS couldn't make tool calls.
This was an issue for a week or two after GPT-OSS initially launched, as none of the inference engines had properly implemented support for it, especially around tool calling. I'm running GPT-OSS-120B MXFP4 with LM Studio and directly with llama.cpp; recent versions handle it well and I get no errors.
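For reference, both LM Studio and llama.cpp's llama-server expose an OpenAI-compatible endpoint, so a quick tool-calling smoke test looks roughly like this; the port, model name, and the tool itself are placeholders for whatever your setup uses:

```python
# Rough sketch of exercising tool calling against a locally served GPT-OSS
# model via the OpenAI-compatible API. base_url and model are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")

# A hypothetical example tool, just to see whether the model emits a
# well-formed structured call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-oss-120b",  # whatever name your server registered
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)

# If the chat template and parser are working, you get a structured tool
# call here instead of raw tool-syntax tokens leaking into the content.
print(resp.choices[0].message.tool_calls)
```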
However, when I've tried either the 120B or the 20B with additional quantization (not the "native" MXFP4 ones), I've seen them have trouble with the tool-call syntax too.
> Not llama
What does your original comment mean, then? You said Llama was "strictly" better than GPT-OSS. Which specific model variant are you talking about, or did you miswrite somehow?