1) I find it interesting that the LLM rarely seems trained to understand its own features, or your account, or how the LLM works. Seems strange that it has no idea about its own support.
2) Which leads me to the OpenAI support docs[0]. It seems pretty telling to me that they use old-school search, not an LLM, for their own help docs, right?
It does say it's a beta on the label, but the thing inside doesn't seem to know that, nor what it's supposed to know. Your point 1, for sure.
Point 2 is a SaaS from before LLMs+RAG beat the conventional options. The status page: a SaaS. API membership, metrics, and billing: a SaaS. These are all undifferentiated, but arguably they were good selections when they were made, and unless better help docs are going to sell more users, they shouldn't spend time on undifferentiated heavy lifting.
How do you know it hallucinated? Maybe your task was one too many and it is only able to handle zero tasks (which would appear to be true in your case).
Just not a priority, most likely. Check out the search in the Mintlify docs to see a very well-built implementation.
Example docs site that uses it: https://docs.browserbase.com
"Working like OpenAI said it should" is a weird thing to put at low priority. Why do they continuously put out features that are broken and buggy? I'm tired of stochastic outputs and being told that we should accept sub-90% success rates.
At their scale, being less than 99.99% right results in thousands of problems. So their scale and the outsized impact of their statistical bugs is part of the issue.
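To put rough numbers on that: assuming, purely for illustration, a billion requests per week, here is how error volume scales with accuracy:

```python
# Back-of-envelope: failures at scale. The request count is an assumption
# chosen only to illustrate the point, not an OpenAI figure.
requests_per_week = 1_000_000_000

for accuracy in (0.90, 0.99, 0.9999):
    failures = requests_per_week * (1 - accuracy)
    print(f"{accuracy:.2%} accurate -> {failures:,.0f} failed requests/week")
```

Even at 99.99%, that hypothetical volume still yields ~100,000 failures a week.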
Also, in my earlier comment I was specifically talking about it being able to understand its own features; I don't think that's the same problem as the remind-me feature not working consistently.
Oh, that's because the modern-day product-development culture of "move fast and break things" is its own problem. The whole tech industry is built on principles that are antithetical to the profession of engineering. It's not controversial in product development, because the people doing the development have all decided to loosen their morals and think it's fine to release broken things and fix them later.
That my bar is high and OpenAI's is so low is its own issue. But then again, I haven't released a product that could randomly tell people to poison themselves by combining noxious chemicals, or whatever other dangerous hallucination ChatGPT spews. If I had engineered something like that, with the potential to harm people and no way to guarantee it wouldn't, if I had engineered misinformation that could be produced at scale, I would have trouble sleeping...
"Yeah sure I got you a table at a nice restaurant. Don’t worry."
I agree, but then again, if you're a dev in this space, presumably you know which keywords to use to refine your search. RAG'ed search implies that the user (the dev) is not "in the know".
But I don't understand why their own documentation, their products, and lots of examples using them wouldn't be the number one thing they'd want to train the models on (or fine-tune on, or at least make available through a tool).
Yeah that's not gonna end well. I thought they, of all people, would know the limitations and problems.
Can’t imagine why everyone doesn’t pay $200/mo for even more features. Eventually I bet they can clean out /tmp!
LLM heads will say “it’s not completely unreliable, it works very often”. That is completely unreliable. You cannot rely on it to work.
Please product people, stop putting LLMs at the core of products that need reliability.
The chance that a flipped bit produces a new valid state, and one that does something actually damaging, is astronomically small.
Meanwhile LLM errors are common and directly affect the result.
If you don't understand the tolerance of your scenario, then all this talk about LLM unreliability is wasted. You need to spend time understanding your requirements first.
In practice I think it happens often enough. I remember a Black Hat conference talk from around a decade ago where the hacker registered bit-flipped ("bitsquatted") variants of the domain of a popular Facebook game and caught requests from real end users, basing his attack on the random chance of bit flips during DNS lookups.
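For a sense of how that attack works, here's a rough sketch of generating single-bit-flip ("bitsquat") variants of a domain; the domain used is just a stand-in:

```python
# Generate "bitsquat" candidates: every single-bit flip of each character,
# keeping only results that are still valid hostname characters.
import string

VALID = set(string.ascii_lowercase + string.digits + "-")

def bitsquats(domain: str) -> set[str]:
    variants = set()
    for i, ch in enumerate(domain):
        if ch == ".":
            continue  # leave the label separator alone
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped != ch and flipped in VALID:
                variants.add(domain[:i] + flipped + domain[i + 1:])
    return variants

print(sorted(bitsquats("fb.com"))[:5])
```

An attacker registers a handful of these and simply waits for memory errors on clients or resolvers to deliver traffic.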
Related, but not the video I was referring to
Then it would translate that into cron commands in the background.
It seems like it would be good for summarizing daily updates against a search query, but all it would do is display them. I would probably want to connect it with some tools, at minimum, for it to be useful.
> ChatGPT has a limit on 10 active tasks at any time. If you reach this limit, ChatGPT will not be able to create a new task unless you pause or delete an existing active task or it completes per its scheduled time.
So this is pretty much useless for most real-world use cases.
For context: ~50% of our users use a time-triggered Loop, often with an LLM component.
Simple stuff I've used it for: baby name idea generator, reminder to pay housekeeper, pre-natal notifications, etc.
We're moving away from cron-esque automations as one of our core value props (most new users use us for spinning up APIs really quickly), but the base functionality of LLM+code+cron will still be available (and migrated!) in the next version of our product.
> Simple stuff I've used it for: baby name idea generator, reminder to pay housekeeper, pre-natal notifications, etc.
None of these require an LLM. It seems like you own this service yet can't find any valuable use for it.
---
ChatGPT tasks will become a powerful tool once incorporated into GPTs.
I produce lots of data. Lots of it, and I'd like to have my clients have daily updates on it, or even have content created based on it.
Sorry? My point was that these are the only overlapping features I've personally found useful that could be replaced with the new scheduled tasks from ChatGPT.
Even these shouldn't require an LLM. A simple cron+email would suffice.
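For the reminder case, a minimal sketch of that cron+email route — a script a crontab entry could run; the schedule, addresses, and relay below are all placeholder assumptions:

```python
# Run from cron, e.g.:  0 8 * * 5  python3 remind.py
# (schedule, addresses, and SMTP host are placeholder assumptions)
import smtplib  # for the actual send, shown commented out below
from email.message import EmailMessage

def build_reminder(to_addr: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text reminder email."""
    msg = EmailMessage()
    msg["From"] = "reminders@example.com"
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

msg = build_reminder("me@example.com", "Pay the housekeeper",
                     "Weekly reminder: payment is due today.")
print(msg["Subject"], "->", msg["To"])
# with smtplib.SMTP("localhost") as smtp:   # assumes a local mail relay
#     smtp.send_message(msg)
```

No model call anywhere: the schedule lives in cron and the content is static.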
The web scraping component is neat, but for my personal use cases (tide tracking) I've had to use LLM-generated code to get the proper results. Pure LLMs were lacking in following the rules I wanted (tide less than 1 ft, between sunrise and sunset). Sometimes the LLM would get it right, sometimes it would not.
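Those rules are trivially deterministic once expressed as code, which is why generated code beats a pure LLM here. A sketch with the same filter (field names and sample data are assumptions):

```python
# Deterministic filter for the tide rules above: height < 1 ft,
# between sunrise and sunset. Field names are assumptions.
from datetime import datetime

def good_tide_windows(readings, sunrise: datetime, sunset: datetime):
    """Keep readings under 1 ft that fall between sunrise and sunset."""
    return [
        r for r in readings
        if r["height_ft"] < 1.0 and sunrise <= r["time"] <= sunset
    ]

readings = [
    {"time": datetime(2025, 1, 15, 6, 0),  "height_ft": 0.4},  # before sunrise
    {"time": datetime(2025, 1, 15, 12, 0), "height_ft": 0.8},  # qualifies
    {"time": datetime(2025, 1, 15, 14, 0), "height_ft": 2.1},  # too high
]
sunrise = datetime(2025, 1, 15, 7, 15)
sunset = datetime(2025, 1, 15, 17, 30)
print(good_tide_windows(readings, sunrise, sunset))  # only the 12:00 reading
```

Unlike the LLM, this filter gets it right every time it runs.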
For our customers, purely scheduling an LLM call isn't that useful. They require pairing multiple LLM and code execution steps to get repeatable and reliable results.
> ChatGPT tasks will become a powerful tool once incorporated into GPTs.
Out of curiosity, do you use GPTs?
Baby name generator: why would this be a scheduled task? Surely you aren't having that many children... :)
Reminder to pay, notifications: what value does OpenAI bring to the table here over other apps which provide calendar / reminder functionality?
So far it's helped name two children :) -- my wife and I like to see the same 10 ideas each day (via text), so that we can discuss what we like/don't like daily. We tried the sift-through-1000-names thing and it didn't fit well with us.
> Reminder to pay, notifications: what value does OpenAI bring to the table here over other apps which provide calendar / reminder functionality?
That's exactly my point. Without further utility (i.e. custom code execution), I don't think this provides a ton of value at present.
Am I missing something or is there exactly zero benefit here over native Apple/Google calendar/todo apps?
My point was that this new functionality, while neat at a surface level, doesn't provide much real utility.
Without custom code execution, you're limited to very surface-level tasks that should be doable with a cron+sms/email.
e.g. look up some niche news on a topic and format it in a particular way
But it won't let me reschedule my task's execution time or change its prompting... I guess it will just run forever now.
Being able to nudge the wakeup times by small amounts can make a huge difference to your ability to manage spiky workloads like this.
They added Projects in December:
https://help.openai.com/en/articles/10169521-using-projects-...
I say this as someone who prefers using ChatGPT over Claude, but pays for both. Hoping they figure it out.
edit: restructured text to make sense.
OpenAI creating an AI phone with Microsoft ... releasing H.E.R. (the movie) in your pocket.
Your AI assistant/agent is seen on the lock screen (like a FaceTime-call UI/UX), waiting at your beck and call to do everything for you / be there for you via text, voice, gestures, expressions, etc.
It interfaces with the AI agents of businesses, companies, your doctor, and friends & family to schedule things, and is used as a knowledge base (ask for a friend's birthday, if they allow that info).
Apple is indeed stale & boring to me (heavy GPT user) in 2025.