Visually I also often confuse rho and sigma, and math texts will use psi ψ and phi φ in weird fonts and I can never tell them apart.
Now, that being said, I don't really care about all of this.
The USA population is equivalent to approximately 4% of the total world population.
I am sure people here see it better than I do, so what new class of problems is this genAI going to solve?
They are at best nice features or capabilities to have in a wider enterprise application suite, say for HR or accountancy, but on their own it's just a lot of smart people working extremely hard to sell one tiny solution, a small cogwheel in a much larger problem.
My gut is telling me that very soon, if not now, there will be an opportunity for savvy VCs to sweep up some of these cogwheels and package them together into a real business and it's something I am exploring with a few other angels. Curious what others think on this. Feel free to DM me (details on profile).
I was involved very early with automated speech recognition for transcribing meetings, but then both UberConference and Google Meet just integrated it into their existing offerings, thereby massively reducing the market size for standalone solutions. And given how heavily subsidized AI API calls are at the moment, just relying on them is a huge risk for your business model, because you never know when your suppliers' prices will 10x to represent the true cost of providing those AI services.
In my opinion, the sales of many of these new AI tools are mostly driven by the existing audience of the creator. In many cases, you could just ask an LLM to quickly build you your own competing solution, which you can then use for free. E.g. all those SEO content and LinkedIn blog post bots. Vibe-coded AI "businesses" are the software equivalent of branded white t-shirts.
Now that ChatGPT desktop can read files in my code editor and apply changes, I've pretty much stopped using dev-specific AI tools. Same with spreadsheet problems: uploading data to ChatGPT and hoping for the best actually works pretty well now.
After all, what is the business of such startups of the "AI age"? It's using the AI models on the backend, where users can't reach, to sprinkle some magic onto features. But as a user, I don't want your product to use AI as a tool, I want your product to be a tool the AI uses. The former keeps power away from users, the latter gives it back to them.
At work, I was tasked with building some magical agentic stuff. After working on it for a while, I realized that although HN shouts that OpenAI/xAI/Google/Amazon/Anthropic have no moats because open-source models are available, the actual moat is access to resources at scale (GPUs, power infrastructure, networking), which is very difficult for a random Joe's startup to build.
You must always rent model access from one of these players (even OpenRouter just delegates to them), and that is exactly the moat.
GenAI solves the problem of needing more generalized solutions: instead of a super-customized secret-sauce solution as your product's competitive edge, you now build magic prompts so that GenAI takes the same input and, hopefully, with enough GenAI instances ingesting the same data and coming to a consensus, produces a reasonably useful output comparable to what your custom solution was capable of. Since you no longer have a custom solution, you now pay the GenAI operators (the real moat for everyone hosting GenAI). In the process, you also sacrifice the competitive edge of that super-secret IP, relying instead on prompting GenAI correctly and on numerous verification steps with enough automation, which of course again costs money.
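The "consensus" step described above can be sketched as a simple majority vote over several independent model calls. This is only a toy illustration: `ask_model` is a hypothetical stand-in (here a coin-flip simulation), where a real system would call a hosted API instead.

```python
import random
from collections import Counter

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a rented GenAI API call.
    Simulates a model that answers 'approve' ~70% of the time."""
    return "approve" if random.random() < 0.7 else "reject"

def consensus(ask, prompt: str, n: int = 5) -> str:
    """Send the same input n times and return the majority answer."""
    votes = Counter(ask(prompt) for _ in range(n))
    answer, _count = votes.most_common(1)[0]
    return answer

print(consensus(ask_model, "Does this invoice match the purchase order?", n=25))
```

More calls raise the odds that the majority answer is usable, but every extra call is another metered request to the operator — which is exactly the cost trade-off described above.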
GenAI is the new hammer of visionary leadership and executives (a hefty amount of money has been burned on campaigns and PR to convince these people to use it everywhere), so the operators can ensure they make some profit from the money they have sunk into it. If you superimpose the "AI" of the current year onto the "apps" of the 201x era, where everything had to have an app, you'll suddenly realize we've seen the same thing before. And of course most apps need the cloud, and as clouds have costs, apps moved to a subscription model instead of the 200x-era buy once, use forever.
I suspect that model quality/vibes and integrations will play a role as well though.
How does this preclude the AI "age"? And why is the metric "companies make money off of it"?
I view it more like open source/Linux. When Linux was new, it was immensely useful, but not a means for companies to make money (except for a tiny few).
Or more precisely, far more people used Linux for their own personal benefit than companies making money off of it.
And so it is with generative AI. For me, personally, it's very useful.[1] So assuming the major companies (OpenAI, Anthropic, etc) don't go bankrupt and/or kill it off, gen AI is here to stay, and will continue to disrupt. That startups can't make much money off of it is somewhat irrelevant.
[1] It has revolutionized speech recognition for me. I have several automations running that were coded by Claude. Things that had been in my mind for years but I didn't have time to write them. MCP will add more value to my life soon. Etc.
1. It solves a problem. Doesn't have to be a completely unsolved problem, can just be a new solution. Or even just new packaging on an old solution. But it needs to solve some kind of problem.
2. It's trustworthy. Some people get a tool to suit their own process. But the majority, from anecdotal evidence, will adopt the tool's process. There's this idea that "these guys know how to do invoicing, so I don't have to think about invoicing if I use their invoicing tool".
3. It's known. A bit philosophical, but if something exists that nobody _knows_ solves a problem they might not even know they have, how much of a useful tool is it, really?
Dropbox is an interesting example. It wasn't exactly a major scientific breakthrough, and a lot of people asked "why don't people just use FTP?". If you focus on (1), Dropbox looked close to pointless. But what they did is nail (2) and (3).
Now, if you subscribe to the hype, you might argue (1) and (2) will soon be covered. AI will magically solve your problem and be a universal domain expert telling you what to do, so you don't have to think about it. You might also argue that it will magically solve (3), with stuff like Gemini Live kinda watching you all day and constantly going "let me tell you how to do that" or "let me do that for you".
Seems unlikely to me. Not impossible, most things I can think of are theoretically possible. Just unlikely. And if you think even just _one_ of those three aspects can't be fully automated in the near future, there's still plenty of opportunity left to differentiate in that area.
I think generative AI does unlock a new generation of startups, because it's a genuinely new technology for which we can find at least some valuable use cases. And an army of startups tends to be better at quickly exploring a new solution space than a few big incumbents. So in that sense, it is similar to smartphones, which also brought a new solution space, and with it, startups.
> It's trustworthy.
In what way is AI trustworthy? It's ruining the parts of the internet I use and care about. I can't visit Digikey's site in an incognito tab without having to sit through a ~5 second captcha these days. Mouser is less aggressive, but it's still problematic. Drew's spent how much time combating AI bots instead of improving Sourcehut? In fact, I'd be hard pressed to think of a site that isn't getting boned by AI.
I've seen a depressing amount of people treat LLMs like some sort of oracle. So I can picture a significant number of consumers just trusting ChatGPT with their taxes or whatever, based on the assumption that it knows any domain at least as well as human experts.
I'm not saying _I_ find any LLM trustworthy. But if enough people do, it becomes difficult to differentiate there.
In terms of what problems it solves, I would imagine it will be up to developers and companies to come up with the Uber/Airbnb/TikTok that AI enables, the way the iPhone enabled those. Same as any platform.
"I am getting extremely skeptical of Photoshop. What class of problems is this software going to solve?"
"I am getting extremely skeptical of the internet. What class of problems is this network going to solve?"
We're still in the post Netscape, pre-dotcom-crash bubble.
Real applications are coming.
https://www.theverge.com/news/646458/openai-gpt-4-1-ai-model
Imagine putting dice and random objects (cups, forks..) on a table, pointing your phone at them and asking it to invent a new game for your friends. Tell it to use these objects and also use the live camera as a gameplay element.
Or recognizing bird or plant species.
Or helping a blind person go hiking, helping avoid tree roots and describing the beautiful scenes they’re in.
So much possibility!
> describing the beautiful scenes they’re in
And this part is ableism at its best. Do you really think what people like me are missing are computer-generated descriptions of our surroundings, so that we notice the beauty? Reminds me of the cochlear implant debate some people from the deaf community have. Is my life less interesting because I don't see your colours? Am I a lesser human that needs treatment because I don't see your beauty? Methinks no.
> Please, don't hype accessibility just for your personal fun. There are people out there with real problems, and dangling impossible solutions before their "eyes" is pretty much cruel
Do you think it's going to be impossible forever? Bandwidth and latency seem like the surest things to improve in AI tech.
It has been tried a lot. I saw the first sonar-like navigation aid in the early 90s. It basically translated detected obstacles into vibration. That's where you start to realize that bandwidth is the issue, because a single vibration, or maybe even a group of them, doesn't really tell you anything about the nature of the obstacle. Now we're at a point where vision models (if they don't hallucinate, DANGER!!!) can actually distinguish different obstacles and even describe them. Nice. However, you're pretty much limited to speech synthesis as an information channel. That's not that bad, but it's very distracting when you move about outside. After all, blind people already use their ears to learn about their surroundings. There isn't much, if any, bandwidth left to stick some constantly chatting voice in. You end up diminishing the input from your actual senses, which is also dangerous. Nothing beats the turn-around time of your own ears or the tactile info you might get from your cane...
So, to answer your question: Maybe. I haven't seen a technology that can squeeze some more bandwidth out of the current situation yet.
"If only you could see, then you could appreciate the beauty of the scenery! Let me describe what you are missing out on. I hope your sub-standard experience, as rated by sighted people, is sufficiently adequate for you. May I list more shortcomings of your existence in comparison to able people?"
I did a BeMyEyes test recently, trying to sort about 40 cans according to the existence of a deposit logo. After 90 minutes of submitting photos, and a second round to make sure it doesn't lie too much, I had 16 cans which, according to BeMyEyes (OpenAI), had a deposit logo. Then I went to the shop to bring them back. Turns out only 4 cans had a logo. So even after a second round to eliminate hallucinations, the success rate was only 25%.
Do you call that reliable?
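(For the curious, that 25% is just the precision of the flagged set, using the counts from the test above:)

```python
flagged = 16       # cans BeMyEyes/OpenAI claimed had a deposit logo
truly_logoed = 4   # cans that actually had one
precision = truly_logoed / flagged
print(f"{precision:.0%}")  # -> 25%
```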
But isn't the BeMyEyes assistance happening via other humans? I remember signing up for some "when blind people need your help" thing via BeMyEyes, and I understood it as 100% humans on the other end of the call helping you.
However, somewhere around 1 or 2 years ago, they added an OpenAI vision-model-based way to send in photos and have them described.
In general, it's a very nice feature, if it works. For instance, I do use it successfully to sort laundry.
But the deposit logo test I did gave horrible results...
Deep down though I can feel Apple must have some level of aversion to this. They have been trying to untangle themselves from Google for so long.
Gemini only talked about some useless surface knowledge that would be forgotten quickly, whereas if she actually read the Wikipedia page she would learn more and retain it better.
- explain this symbol to me
- what TV show am I watching
- how much is the house I'm looking at worth (bring in geo-location)
- How often should I be watering this plant
- how many calories are in this meal
- what type of car is this
- what's the exact name/type of this screw part
- what are the exact dimensions of this door frame
- what kind of style of art is this
- how often should I be taking this medication
- how many copies has this book sold
- which part of the world is this pic from
You know ... something more than "reboot that shit"
I will say the video with Gemini live is pretty impressive. My family and I tried it a bit yesterday, and my kids wanted to show Gemini all our pets. My kid showed it our cat, picking it up roughly as she is wont to do, and I was impressed when it asked "Is [name of cat] always so patient being handled like that?"
I'm on a ChatGPT pro plan, been using it for a good while but got an offer on Google One storage so tried it out for a month. Google's models are far behind compared to OpenAI's, and seemingly o1 Pro Mode is still the best out there, albeit slow obviously. But probably the model I've got furthest with on difficult problems, and even the "simpler" models from OpenAI are still better than Gemma 2.5.
It does seem that Google has better tooling available for their models though, so a combination of the tooling of Google with the models of OpenAI would probably be optimal, but unlikely we'll see that happen.
Image search and text recognition have become way better than Google Lens ever was. In-place translation is also nice, and I use it to check idioms and other Grammarly-esque things.
I think it's good at language-related things and transformations, not great at anything else (I wouldn't ask it to create anything new, TBH). I can't even get it to reliably spin up Google Docs from my prompts via the standard web/app interfaces.
Similar to FactolSarin, it's baked into my Google account and subscriptions so I don't really need it to be stellar and wouldn't pay for it as a single product.
For more general things, ChatGPT is still better imo, but that gap is shrinking, at least for what I do. Their analysis plugin is super useful to me.
I will say that it’s hard to evaluate as some of my habits may drive behavior.
> This soon will be available to all Gemini Advanced subscribers on Android devices, and today we're bringing it to more people, starting with all Gemini app users on Pixel 9 and Samsung Galaxy S25 devices.
Edge compute is a long way off, even on desktop without a dedicated GPU, and especially on mobile.
You might have bumped into a media website trying to run a WASM-powered ONNX Runtime background-removal tool, or perhaps a super-slim LLM. You'll notice how slow these are and how they can lock up your browser. That's about the experience you can expect from edge compute.
Nvidia's proclamation that they're going to be working on robotics as their next growth sector could mean more innovation on the edge / low power compute front. But most of the yield will come from better model architectures and models designed specifically to work with compute constraints.
For now, datacenter inference reigns supreme.
Yes, it's slow because these will most likely execute on the CPU (for sure on iOS). WebGPU is still not enabled in Safari, and WebNN is not supported anywhere yet (which would let you use the NPU provider). ONNX Runtime is not even the most optimised option when running natively (instead of via WASM) on iOS: e.g. it doesn't support the MPS (GPU) provider, and the NPU provider (via CoreML) implements only a subset of operators (last time I tried). Safari also limits WASM memory usage.
In practice, when you want the best performance, you have to use a native app and CoreML with the NPU provider, plus a model architecture optimized for the NPU. On iOS, the big limitation for now is available RAM, but even my iPhone 13 mini has the same fast NPU as my MacBook M2 Max; when I tested it, it ran at a speed similar to running on the GPU.
There's really no use for AI outside of making studio ghibli drawings and giving me a bunch of broken code quickly.
Marie Kondo doesn't need to worry about being replaced by AI, yet...
"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."
- Let Gemini watch you drive so you can get advice on driving better
- Let Gemini help you rewire your outlet for the first time
- Let Gemini help you pick edible mushrooms in the forest
"Gemini, what's the best way to bury this body?"
"Gemini, please count the cockroaches in this kitchen."
"Gemini, calculate my survival chances sprinting across this interstate."
"Gemini, speculate on the percentage volume of pee-pee in this swimming pool."
"Gemini, how many days will this meal set back my life expectancy?"
"Gemini, will these socks lead to my arrest by the fashion police?"
"Gemini, my cat is making that face again -- has it found out it's adopted?"
But they can make it more reliable over time by deploying the product now and training on user data:
> Google collects your chats (including recordings of your Gemini Live interactions), what you share with Gemini Apps (like files, images, and screens), related product usage information, your feedback, and info about your location. (...)
> Google uses this data, consistent with our Privacy Policy, to provide, improve, and develop Google products and services and machine-learning technologies, including Google’s enterprise products such as Google Cloud.