129 points by isaacdl 2 days ago | 18 comments
  • UmYeahNo2 days ago
    I tried this yesterday, asking it to create a simple daily reminder task, which it happily did. Then when the time came and went, I simply got a chat message saying the task had failed, with no explanation of why or how it failed. When I asked it why, it hallucinated that I had too many tasks (I only had the one). So now I don't know why it failed or how to fix it. Which leads to two related observations:

    1) I find it interesting that the LLM rarely seems trained to understand its own features, your account, or how the LLM itself works. Seems strange that it has no idea about its own support.

    2) Which leads me to the OpenAI support docs[0]. It seems pretty telling to me that they use old-school search and not an LLM for their own help docs, right?

    [0] https://help.openai.com/

    • Terretta2 days ago
      Same experience except mine insisted I had no tasks.

      It does say it's a beta on the label, but the thing inside doesn't seem to know that, nor what it's supposed to know. Your point 1, for sure.

      Point 2 is a SaaS from before LLMs+RAG beat the normal approach. The status page is a SaaS. API membership, metrics, and billing are a SaaS. These are all undifferentiated, but arguably they were good selections when they were made, and unless better help is going to sell more users, they arguably shouldn't spend time on undifferentiated heavy lifting.

    • varispeed2 days ago
      > it hallucinated that I had too many tasks.

      How do you know it hallucinated? Maybe your task was one too many and it is only able to handle zero tasks (which would appear to be true in your case).

    • reustle2 days ago
      > It seems pretty telling to me that they use old-school search and not an LLM for their own help docs, right?

      Just not a priority, most likely. Check out the docs search built by Mintlify to see a very well-built implementation.

      Example docs site that uses it: https://docs.browserbase.com

    • derefr2 days ago
      Re: 2 — for the same reason that you shouldn't host your site's status page on the same infrastructure that hosts your site (if people want to see your status page, that probably means your infra is broken), I would guess that OpenAI think that if you're looking at the support docs, it might be because the AI service is currently broken.
    • fooker2 days ago
      You can hardly blame a product for not doing something that we don't know for certain to be possible.
    • neom2 days ago
      I've thought about this a lot too. My guess is that because foundation models take a lot to train, they aren't retrained very often, and from my experience you can't easily train in new data, so you'd have to have some small, up-to-date side system. I suspect they're very thoughtful about which "side systems" they put in place: from trying to build some agent orchestration stuff myself, nothing ends up being as simple as I expect with side systems, and things easily go off the rails. So my thought was: given the scale they're dealing with, this is probably a low-priority and not actually particularly easy feature.
      • miltonlost2 days ago
        > So my thought was: given the scale they're dealing with, this is probably a low-priority and not actually particularly easy feature.

        "Working like OpenAI said it should" is a weird thing to make a low priority. Why do they continuously put out features that are broken and buggy? I'm tired of stochastic outputs and being told that we should accept sub-90% success rates.

        At their scale, being less than 99.99% right results in thousands of problems. So their scale and the outsized impact of their statistical bugs is part of the issue.

        • neom2 days ago
          Why are you setting your bar this way? Is it because of how they do their feature releases (no warning that it's an alpha or beta feature)? Their product, ChatGPT, was released two years ago and is a fairly complicated product. My understanding is that the whole thing is still a pretty early product generally. It doesn't seem unusual for a startup doing something as big as they are to release features that don't have all the kinks ironed out. I've released some kinda janky features to 100,000s of users before, not totally knowing how they were going to perform at that scale, and I don't think that is very controversial in product development.

          Also, in my earlier comment I was specifically talking about it being able to understand the features it has; I don't think that's the same problem as the remind-me feature not working consistently.

          • miltonlost2 days ago
            > I've released some kinda janky features to 100,000s of users before, not totally knowing how they were going to perform at that scale, and I don't think that is very controversial in product development.

            Oh, that's because modern-day product development of "ship fast, break things" is its own problem. The whole tech industry is built on principles that are antithetical to the profession of engineering. It's not controversial in product development because the people doing the development have all decided to loosen their morals and think it's fine to release broken things and fix them later.

            That my bar is high and OpenAI's is so low is its own issue. But then again, I haven't released a product that could randomly tell people to poison themselves by combining noxious chemicals, or whatever other dangerous hallucination ChatGPT spews. If I had engineered something like that, with the opportunity to harm people and no way to guarantee it wouldn't, if I had engineered something where misinformation could be created at scale, I would have trouble sleeping...

            • neom2 days ago
              So what's your plan? Opt out of ever using the products? You're a hypocrite if you continue to use them with a stance like that.
      • yosito2 days ago
        I regularly use Perplexity and Cursor which can search the internet and documentation to answer questions that aren't in their training data. It doesn't seem that hard for ChatGPT to search and summarize their own docs when people ask about it.
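        As a rough sketch of what that could look like (the snippet store, model name, and prompt below are hypothetical stand-ins, not anything OpenAI actually ships): retrieve a few relevant help-doc excerpts and let the model answer only from them.

          # Hypothetical sketch: answer support questions from help-doc excerpts
          # rather than from the model's training data.
          from openai import OpenAI

          client = OpenAI()  # reads OPENAI_API_KEY from the environment
          HELP_SNIPPETS = {
              "tasks": "Scheduled tasks let ChatGPT run a saved prompt at a set time...",
              "billing": "Plus subscriptions renew monthly and can be cancelled any time...",
          }

          def answer_from_docs(question: str) -> str:
              # Naive keyword retrieval; a real system would use search or embeddings.
              context = "\n\n".join(
                  text for key, text in HELP_SNIPPETS.items() if key in question.lower()
              )
              resp = client.chat.completions.create(
                  model="gpt-4o-mini",
                  messages=[
                      {"role": "system",
                       "content": "Answer only from the provided help-doc excerpts."},
                      {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
                  ],
              )
              return resp.choices[0].message.content
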
        • neom2 days ago
          You would want a feature like "self-aware" to be pretty canonical, not based on a web search. Even if they had a discrete internal side system it could query, one that you controlled, if the training data was a year old, how would you keep the two matched from a systems point of view over time? It's also unclear how the model would interpret that data each time it ran with new context. It seems like a pretty complicated system to build, tbh, especially when human-written help docs and FAQs are a much simpler and more reliable source of truth to maintain. That said, my understanding is that behind the scenes they are working toward the product we experience being built around the foundation model, rather than being the foundation model, as it pretty much is today. Once they have a bunch of smaller LLMs that handle discrete standard tasks, I would guess they will become considerably more "aware".
    • baxtr2 days ago
      Now imagine giving this "agent" a task like booking a table at a restaurant or similar.

      "Yeah sure I got you a table at a nice restaurant. Don’t worry."

    • behnamoh2 days ago
      > 2) Which leads me to the OpenAI support docs[0]. It seems pretty telling to me that they use old-school search and not an LLM for their own help docs, right?

      I agree, but then again, if you're a dev in this space, presumably you know what keywords to use to refine your search. RAG'ed search implies that the user (dev) is not "in the know".

    • m3kw92 days ago
      Buggy af right now; 95% of tasks failed and I get a ton of emails about it.
      • ProofHouse2 days ago
        Very, very, very buggy and really looks extremely low effort as with many OpenAI feature rollouts. Nothing wrong with an MVP feature, but make it at least do what it’s supposed to do and maybe give it 10% more extensibility than the bare bones.
    • netcraft2 days ago
    I question the same things frequently. I routinely ask ChatGPT to help me understand the OpenAI API documentation and how to use it, and it's rarely helpful; it frequently tells me things that are just blatantly untrue. At least nowadays I can link it directly to the documentation for it to read.

    But I don't understand why their own documentation, their products, and lots of examples of using them wouldn't be the number one thing they would want to train the models on (or fine-tune on, or at least make available through a tool).

      • _factor2 days ago
        You mean converting $20 monthly subscribers into less profitable API users?
    • Mo32 days ago
      Wait so... they made the LLM itself control the scheduling?

      Yeah that's not gonna end well. I thought they, of all people, would know the limitations and problems.

    • ElijahLynn2 days ago
      Yeah, I saw the 4o with Tasks today. I tried it and asked "what is 4o with Tasks"; it had no idea. I had to set it to web search mode to figure it out.
      • fooker2 days ago
        If you ask me to describe how a human brain works, I'll have no idea and would have to search the web to get an (incomplete) idea.
  • dgfitz2 days ago
    New killer feature: cron

    Can’t imagine why everyone doesn’t pay $200/mo for even more features. Eventually I bet they can clean out /tmp!

    • chairhairair2 days ago
      cron, but completely unreliable. How nice.

      LLM heads will say “it’s not completely unreliable, it works very often”. That is completely unreliable. You cannot rely on it to work.

      Please product people, stop putting LLMs at the core of products that need reliability.

      • kenjackson2 days ago
        It's all a matter of degree. Even in deterministic systems, bit flipping happens. Rarely, but it does. You don't throw out computers as a whole because of this phenomenon, do you? You just assess the risk and determine whether the scenario you care about sits above or below the threshold.
        • dkjaudyeqooe2 days ago
          A bit flip is a rare occurrence in an array of typically tens of billions of bits.

          The chance that the flipped bit changes a bit that results in a new valid state and one that does something actually damaging is astronomically small.

          Meanwhile, LLM errors are common and directly affect the result.

          • kenjackson11 hours ago
            My point is that your confidence level depends on your task. There are many tasks for which I'll require ECC. There are other tasks where an LLM is sufficient. Just like there are some tasks where dropped packets aren't a big deal and others where it is absolutely unacceptable.

            If you don't understand the tolerance of your scenario, then all this talk about LLM unreliability is wasted. You need to spend time understanding your requirements first.

        • great_psy2 days ago
          When’s the last time you personally had a bit flip on you?
          • mhitza2 days ago
            You generally can't know, because we don't measure for it, especially not on personal computers. Maybe ECC RAM reports this information in some way?

            In practice I think it happens often enough. I remember a Black Hat talk from around a decade ago where the presenter registered bit-flipped variants of the domain of a popular Facebook game and caught requests from real end users, basing the attack on the random chance of bit flips during DNS lookups.

            Related, but not the video I was referring to

            https://news.ycombinator.com/item?id=5446854

    • rsynnott2 days ago
      Not just that, cron, only non-deterministic! The future is now.
    • theshrike792 days ago
      An actual killer feature would be a system that lets me define repeating tasks with natural language.

      Then it would translate that into cron commands in the background.
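      A minimal sketch of that translation step, assuming the OpenAI API for the language part; the model name, prompt, and reminder script are illustrative, and the generated cron expression should be validated before it's installed:

        # Sketch: ask an LLM for a plain five-field cron expression, then append
        # it to the user's crontab.
        import subprocess
        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the environment

        def to_cron(natural_language: str) -> str:
            resp = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[
                    {"role": "system",
                     "content": "Reply with a single five-field cron expression and nothing else."},
                    {"role": "user", "content": natural_language},
                ],
            )
            return resp.choices[0].message.content.strip()

        schedule = to_cron("every weekday at 7:30am")  # e.g. "30 7 * * 1-5"
        command = "/usr/local/bin/send_reminder.sh"    # hypothetical reminder script
        current = subprocess.run(["crontab", "-l"], capture_output=True, text=True).stdout
        subprocess.run(["crontab", "-"], input=f"{current}{schedule} {command}\n", text=True)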

    • postsantum2 days ago
      I feel like the obligatory comment about Dropbox is coming your way
  • headcanon2 days ago
    I'm trying to figure out how this would be useful with the existing feature set.

    It seems like it would be good for summarizing daily updates against a search query, but all it would do is display them. I would probably want to connect it with some tools at minimum for it to be useful.

  • DeepYogurt2 days ago
    They're really trying to juice the usage numbers
    • 42lux2 days ago
      "How chatgpt reminders saved my life and made me more productive." Videos on YouTube in 3,2,1.
    • JTyQZSnP3cQGa8B2 days ago
      As long as it’s generating hype and funding, it brings us closer to their own definition of AGI. It’s the perfect plan.
  • srid2 days ago
    Important caveat:

    > ChatGPT has a limit on 10 active tasks at any time. If you reach this limit, ChatGPT will not be able to create a new task unless you pause or delete an existing active task or it completes per its scheduled time.

    So this is pretty much useless for most real-world use cases.

  • jumploops2 days ago
    I'm surprised it took OpenAI this long to launch scheduled tasks, but as we've seen from our users[0], pure LLM-based responses are quite limited in utility.

    For context: ~50% of our users use a time-triggered Loop, often with an LLM component.

    Simple stuff I've used it for: baby name idea generator, reminder to pay housekeeper, pre-natal notifications, etc.

    We're moving away from cron-esque automations as one of our core-value props (most new users use us for spinning up APIs really quickly), but the base functionality of LLM+code+cron will still be available (and migrated!) to the next version of our product.

    [0] https://magicloops.dev/

    • MattDaEskimo2 days ago
      This was a weak citation.

      > Simple stuff I've used it for: baby name idea generator, reminder to pay housekeeper, pre-natal notifications, etc.

      None of these require an LLM. It seems like you own this service yet can't find any valuable use for it.

      ---

      ChatGPT tasks will become a powerful tool once incorporated into GPTs.

      I produce lots of data. Lots of it, and I'd like my clients to have daily updates on it, or even have content created based on it.

      • jumploopsa day ago
        > None of these require an LLM. It seems like you own this service yet can't find any valuable use for it.

        Sorry? My point was that these are the only overlapping features I've personally found useful that could be replaced with the new scheduled tasks from ChatGPT.

        Even these shouldn't require an LLM. A simple cron+email would suffice.

        The web scraping component is neat, but for my personal use-cases (tide tracking) I've had to use LLM-generated code to get the proper results. Pure LLMs were lacking in following the rules I wanted (tide less than 1 ft, between sunrise and sunset). Sometimes the LLM would get it right, sometimes it would not.
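        As a rough illustration of that split (the data source and names below are hypothetical), the rules end up as a small deterministic filter the LLM writes once, rather than something the LLM has to apply correctly on every run:

          # Sketch: apply the tide rules in plain code instead of asking the model
          # to follow them each time.
          from datetime import datetime

          def good_tide_windows(predictions, sunrise: datetime, sunset: datetime):
              """predictions: list of (timestamp, height_ft) tuples from any tide API."""
              return [
                  (ts, height)
                  for ts, height in predictions
                  if height < 1.0 and sunrise <= ts <= sunset
              ]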

        For our customers, purely scheduling an LLM call isn't that useful. They require pairing multiple LLM and code execution steps to get repeatable and reliable results.

        > ChatGPT tasks will become a powerful tool once incorporated into GPTs.

        Out of curiosity, do you use GPTs?

    • duskwuff2 days ago
      > Simple stuff I've used it for: baby name idea generator, reminder to pay housekeeper, pre-natal notifications, etc.

      Baby name generator: why would this be a scheduled task? Surely you aren't having that many children... :)

      Reminder to pay, notifications: what value does OpenAI bring to the table here over other apps which provide calendar / reminder functionality?

      • jumploopsa day ago
        > Baby name generator: why would this be a scheduled task? Surely you aren't having that many children... :)

        So far it's helped name two children :) -- my wife and I like to see the same 10 ideas each day (via text) so that we can discuss what we like/don't like daily. We tried the sift-through-1,000-names approach and it didn't work well for us.

        > Reminder to pay, notifications: what value does OpenAI bring to the table here over other apps which provide calendar / reminder functionality?

        That's exactly my point. Without further utility (i.e. custom code execution), I don't think this provides a ton of value at present.

    • dimitri-vs2 days ago
      "ok Google, remind me to ____ every ____"

      Am I missing something or is there exactly zero benefit here over native Apple/Google calendar/todo apps?

      • jumploopsa day ago
        You're not missing anything, other than us using Siri :)

        My point was that this new functionality, while neat at a surface level, doesn't provide much real utility.

        Without custom code execution, you're limited to very surface-level tasks that should be doable with a cron+sms/email.

  • joshstrange2 days ago
    This feature is really bad (unreliable) and they don't even make a good case for _why_ you would want to use this over literally any other reminder system. I guess it can run an LLM to decide what to send you at the scheduled time, but its unreliability would never have me relying on it. Some use cases that might be interesting:

    * Let me know the closing stock price for XXXXX
    * Compile a list of highlights from the XXXX game after it finishes

    But everything I can think of is just a toy: cool if it works, but not groundbreaking, and possible with much more reliable methods. OpenAI really seems to just be throwing stuff at the wall to see if it sticks, then moving on and never iterating on previous features. DALL-E is kind of a joke compared to other offerings (one-shot only), I trust Claude more for programming, o1 was ho-hum for my needs, the desktop app still feels like a solution in search of a problem, etc.
    • reustle2 days ago
      Has been consistently working for me, and it does web searching within the tasks.

      i.e. look up some niche news on a topic and format it in a particular way

  • darkteflon2 days ago
    Surely we want to be scheduling and calling LLMs from temporalio, Dagster, or even cron, instead of whatever this is. Why put the LLM in the middle?
  • android5212 days ago
    I tried it and it failed to send me a desktop notification. I did receive emails (at the wrong time). I do think it was too early to launch; a 5-minute test could have found these bugs. It really hurts their brand.
  • ilaksh2 days ago
    This will be a lot more useful when it's able to combine with more tools, such as in custom GPT actions, APIs, "computer use", the Python interpreter, etc.
  • ProofHouse2 days ago
    Yeah, it's pretty bad, embarrassingly so quite honestly. A single developer could probably significantly improve it in a day. I'm sure that's coming, but why not launch these MVP features at least a quarter baked? It's essentially unusable as is. If it could ping me on my phone, and Advanced Voice could open, or I could have it do a basic task, great, I'm back to using it. But as rolled out, it's hilariously minimal and borderline unusable.
  • elif2 days ago
    Works on my machine. (tm)

    But it won't let me reschedule my task execution time or change its prompting... It will just go forever now I guess

  • kifler2 days ago
    Oddly enough, I do not have access to scheduled tasks either on the app or web interfaces and I am a paying customer.
    • delgaudm2 days ago
      It took me a minute to find it. It's a different model -- pull down the models list and you might see one with tasks.
  • amelius2 days ago
    Can I ask it to check for deals on products and make it search the web several times a day?
  • geor9e2 days ago
    Sounds like they're trying to get ahead of cron job wrappers so they don't get slammed at peak times.
    • daveguy2 days ago
      If it works correctly, wouldn't those still be peak times? Except with this they have to process the initial scheduling request in addition to the at-execution task.
      • mh-2 days ago
        Everyone else's crons, synced to wall clocks, vs your centralized cron (task scheduler, really) that is aware of scheduled work and current load on your systems that consume the scheduled tasks.

        Controlling the ability to nudge the wakeup times by small amounts of time can make a huge difference to your ability to manage spiky workloads like this.

      • geor9e2 days ago
        A lot of answers don't go stale for hours or days. They'll do the task early, at an off-peak time, hidden from the user, double-check that it really wasn't time sensitive, then surface the saved answer at the time desired.
        • throwaway3141552 days ago
          How are they going to double check without incurring the cost of running it again?
          • geor9e2 days ago
            Start with a regex (or fast tiny model) to flag obvious time-sensitive tasks. Else, do the task early by prompting it "if this requires up to the minute information, output cancel, else [prompt]". At best, it's 1 regex + 1 full inference. At worst, it's 1 regex + 1 output token + 1 full inference.
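            A rough sketch of that flow (the regex, prompt wording, and model name are illustrative guesses, not OpenAI's actual implementation):

              # Sketch: cheap regex pass first; otherwise run early, letting the model
              # bail out with CANCEL if the task really needs fresh data.
              import re
              from openai import OpenAI

              client = OpenAI()
              TIME_SENSITIVE = re.compile(r"\b(today|latest|current|live|breaking|price|score)\b", re.I)

              def maybe_run_early(task_prompt: str):
                  if TIME_SENSITIVE.search(task_prompt):
                      return None  # defer to the scheduled time
                  resp = client.chat.completions.create(
                      model="gpt-4o-mini",
                      messages=[{
                          "role": "user",
                          "content": ("If this task requires up-to-the-minute information, "
                                      "reply with exactly CANCEL. Otherwise complete it:\n\n"
                                      + task_prompt),
                      }],
                  )
                  answer = resp.choices[0].message.content
                  return None if answer.strip() == "CANCEL" else answer  # cache for later
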
  • UltraSane2 days ago
    This could be done with an API key and AWS Lambda in minutes.
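    For instance, a minimal sketch of that, assuming an EventBridge schedule triggers the function and SES sends the result; the prompt, addresses, and model name are placeholders:

      # Sketch: a scheduled Lambda that runs one prompt and emails the answer.
      import os
      import boto3
      from openai import OpenAI

      client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
      ses = boto3.client("ses")

      def handler(event, context):
          resp = client.chat.completions.create(
              model="gpt-4o-mini",
              messages=[{"role": "user", "content": "Summarize today's AI news in 5 bullets."}],
          )
          ses.send_email(
              Source="reminders@example.com",
              Destination={"ToAddresses": ["me@example.com"]},
              Message={
                  "Subject": {"Data": "Daily scheduled task"},
                  "Body": {"Text": {"Data": resp.choices[0].message.content}},
              },
          )
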
  • retskrad2 days ago
    OpenAI resembles the old Apple: ship the best experience. The ChatGPT app on every platform is the best in the business, and they are shipping polished features relatively quickly. It's quite the contrast to the Apple of today, the world's largest company, which is so inept that it is releasing Apple Intelligence, quite literally ChatGPT 3.5-level tech, in 2025. It just shows how valuable CEOs like Altman, Musk, and Jobs are to a corporation.
    • extr2 days ago
      The ChatGPT UI/UX is pretty middling. They still don't have a proper answer to Claude Projects, plus they are focusing on shipping stuff like this instead of fixing the numerous papercuts in the chat experience in their UI. How is it that I can access the most powerful AI on the planet with o1 pro, but if I paste more than a few pages of text there's no solution for that; it just overflows the input box and makes it impossible to navigate?
      • Jimmc4142 days ago
        > They still don't have a proper answer to Claude Projects

        They added Projects in December:

        https://help.openai.com/en/articles/10169521-using-projects-...

        • mh-2 days ago
          ChatGPT's Projects feature has weird limitations I've run into. Features that work outside projects do not necessarily work inside them.

          I say this as someone who prefers using ChatGPT over Claude, but pays for both. Hoping they figure it out.

          edit: restructured text to make sense.

        • dimitri-vs2 days ago
          Only with the 4o model, which is lacking, not with any of the o1 models. You also can't upload any documents, not even .txt files, which is absurd.
        • throwaway3141552 days ago
          OpenAI projects don't work very well compared to Anthropic (which has its own limitations as is).
    • arghwhat2 days ago
      The "old" Apple certainly didn't ship anything quick or on the bleeding edge, nor did they ship the "best" experience. They did, however, have somewhat different priorities than their competitors. They still do to some extent.
    • TylerE2 days ago
      Apple Intelligence is running on device instead of racks and racks of cloud hardware. Of course it’s less sophisticated.
      • amelius2 days ago
        Yeah, but knowing that doesn't make it much better; it's the wrong design choice.
        • mh-2 days ago
          Agreed. The vast majority of their audience doesn't understand the difference. And among the subset that do, I imagine there's a fair number of us that don't care about the distinction. I just want it to work well.
    • apwell232 days ago
      this has to be sarcasm
      • dmonitor2 days ago
        their commenting behavior is strange. i'm not certain.
    • paul79862 days ago
      Indeed, which makes me excited for...

      OpenAI creating an AI phone with Microsoft ... putting "Her" (the movie) in your pocket.

      Your AI assistant/agent is seen on the lock screen (like a FaceTime call UI/UX), waiting at your beck and call to do everything for you / be there for you via text, voice, gestures, expressions, etc.

      It interfaces with the AI agents of businesses, companies, your doctor, and friends & family to schedule things, and can be used as a knowledge base (ask for a friend's birthday, if they allow that info).

      Apple is indeed stale & boring to me (heavy GPT user) in 2025.