It looks like the status/need-triage label was removed (github.com)

303 points by nickswalker 16 days ago, 20 comments

embedding-shape16 days ago
It's easy to miss, but in the middle of the page:
> 4609 remaining items
Seems gemini-cli and gemini-cli didn't recognize themselves, so each thought someone else had added/removed the label and tried to correct it, which the other then tried to correct, which the other...
Considering that that repository has what seems like ~10 longer term contributors, who probably get email notifications, together with a bunch of other people who get notifications about it, wonder how many emails were sent out because of this? If we just assume ten people get the emails, it's already 46K emails going out in under 24 hours...
Also, who pays for the inference of this gemini-cli? Clicking the "user" links to https://github.com/apps/gemini-cli, and it has a random GitHub user under "Developer", so it doesn't seem like it's an official Google project. Did someone pay for all of these inference calls here? That'd be a pretty sucky bill to pay...
- TACD16 days ago
  This isn't the first time it's happened, either. It's a pretty frequently recurring issue, in fact:
  https://github.com/google-gemini/gemini-cli/issues/16723
  https://github.com/google-gemini/gemini-cli/issues/16725
  https://github.com/google-gemini/gemini-cli/issues/16732
  https://github.com/google-gemini/gemini-cli/issues/16734
  - embedding-shape16 days ago
    All opened on the 15th of January though, same as the instance linked in the submission. It seems more accurate to say "widespread issue" rather than "frequent issue", as it seems to have happened on only one occasion, but it had time to spam many issues that day.
- hirsin16 days ago
  The owner is a Google employee, but for the sake of safety it should be owned by a real Google org. I've just asked them to migrate it to their OSS org.
  Unfortunately the app creation flow on GitHub makes it impossible (for now) for a normal org user to create an app for the org, so apps end up getting created on personal accounts and become load bearing. We've got a work item to make it possible to give app creation rights to your org members, I've got it high on the priority list for the next six months.
  Re: payment: as I understand it, each org that uses the gemini-cli agent puts its API key in its Actions secrets, which gets picked up and used to call Google inference APIs. So the org these comments are in is paying for the inference.
  - htrp16 days ago
    Dear god. This reminds me of all of the things in Google that are "load bearing" and have to be owned by random gmail accounts instead of formal service accounts or org accounts.
    How long has this one been on the roadmap for? (since you actually work for github)
    hirsin15 days ago
    Tbc apps can be owned by orgs today, but the process is annoying - devs create the app and then transfer it to the org, and then are made managers of the app. Really high overhead.
    It's part of the push we've been making over the last year or two to improve custom roles and finer-grained authorization for resources.
- oooyay15 days ago
  The first event-driven agent I ever built ran into this style of bug. The bot had a name and it knew the name, but what it didn't know was that the name could show up as a user ID in various forms, so it didn't know how to recognize itself. Every view the agent has needs to be curated towards the agent's understanding of itself and the world around it; you can't just spew API results at it.
- m0llusk16 days ago
  Some are saying there is no more room for junior employees in all of this, but it seems like these LLM spasms generate lots of disruption that would be at appropriate levels of complexity and priority for juniors to be handling.
  - esafak16 days ago
    What if a junior with an LLM did this?
    DANmode16 days ago
    What about an LLM with a junior?
    fragmede15 days ago
    LM?
    DANmodea day ago
    The joke was that the chatbot is driving the junior developer,
    not the other way around,
    a nonzero amount of the time.
- sneak16 days ago
  “Everyone, just STOP PRESSING REPLY-ALL.”
  It’s not just bots that fall into this trap.
  - embedding-shape16 days ago
    The linked issue literally only has one bot falling into that trap...
    NitpickLawyer16 days ago
    The quote is from one of the old "mail storm" stories out there, where someone misconfigures something, someone e-mails a list, people are out of office, people reply-all, and hilarity ensues. Plenty of them were posted on Slashdot and the like back in the day.
    fragmede15 days ago
    who ate all the bananas?
  - cyberax16 days ago
    Unsubscribe.
- philipwhiuk16 days ago
  > Considering that that repository has what seems like ~10 longer term contributors, who probably get email notifications, together with a bunch of other people who get notifications about it, wonder how many emails were sent out because of this? If we just assume ten people get the emails, it's already 46K emails going out in under 24 hours...
  Unless GitHub are idiots they batch email updates to mitigate this
  - embedding-shape16 days ago
    Yeah, they probably do batching, but not by "day" intervals exactly, probably minute if not second. Still end up with a whole lot of emails, probably 50K+, within some hours.
- eviks16 days ago
  > Thank you for your understanding! × 4609
  - netsharc15 days ago
    Google Gemini 2028!
- Sophira16 days ago
  > did someone pay for all of these inference calls here?
  Considering that these responses are all the exact same two replies in wording, and that this is a task which could be easily automated without AI, I seriously doubt that it's going to be caused by actual inference.
  - magicalist16 days ago
    Yeah, this looks more fun than it is because the bot is named "gemini-cli".
    This is just two github actions conflicting with each other, one that auto-labels with "status/need-triage" and the other that incorrectly sees gemini-cli as lacking the permission to do that.
    The fix looks like it was https://github.com/google-gemini/gemini-cli/pull/16762 because the bot adding labels wasn't passing the org-level ownership check they used at first.
  - embedding-shape16 days ago
    I didn't think that the GitHub responses themselves were LLM generated, but considering the name, I assumed that even incoming responses might be passed through something that ends up doing inference calls, but that very well might not be the case here at all. Doesn't seem like something that'd be even hard to do without inference, so you might be right.
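One point raised in this thread is that a bot has to recognize itself in every form its identity can take in event data (a login, a login with a `[bot]` suffix, a numeric ID). Below is a minimal Python sketch of that idea. The `BotIdentity` class, the field names, and the ID value are hypothetical and not taken from the gemini-cli repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BotIdentity:
    """The forms under which one bot may appear in event payloads (hypothetical)."""
    slug: str      # e.g. "gemini-cli"
    user_id: int   # numeric account ID; the value used in examples is made up

    def matches(self, actor: dict) -> bool:
        """Return True if an event's actor refers to this bot in any known form."""
        # A numeric ID is the most stable identifier, so check it first.
        if actor.get("id") == self.user_id:
            return True
        name = str(actor.get("login", "")).strip().lower()
        # App accounts show up with a "[bot]" suffix; strip it before comparing.
        if name.endswith("[bot]"):
            name = name[: -len("[bot]")]
        return name == self.slug.lower()
```

A handler would call `matches()` on the actor of every incoming event before deciding whether someone else changed a label.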
PyWoody16 days ago
Heh. This reminds me of the time when our newly hired "Salesforce Expert" improved our support queue:
```
  Every time Support received a new email, a ticket in Salesforce would be created and assigned to Support
  
  Every time Support was assigned a new ticket, Salesforce would send a notification email
```
The worst part is he wouldn't admit to the mistake and it took us forever to find where he buried the rule.
- bedatadriven16 days ago
  I can remember something like this a few years ago when a customer emailed our helpdesk with their own internal IT support desk in copy. Our helpdesk at the time sent a complete new email acknowledging the request, which the customer's desk ALSO acknowledged in a new thread...
  I think it took us a good hour and a few hundred tickets to get the helpdesks to stop fighting with each other!
  - pixl9716 days ago
    Ah, mailing loops are great.
    I remember working for an ISP in the mid 90s. We never really had problems with 1 to 1 mailing loops bouncing back and forth, but we ended up with a large circular mailing loop involving a mailing list, and bad addresses on it getting bounced to the previous server which sent a reply to the mailing list, which got bounced and sent to everyone in the group which caused someone else's mailbox to fill up that was in a forward, which for some reason sent a bounce to the mailing list that really started to set off the explosive growth.
    Needless to say the bounces seemed to be growing quadratically and overwhelmed our medium sized ISP, a decent sized college, and a large ISP's mailing system in less time than anyone could figure out how to get it to stop.
- pousada16 days ago
  I only used salesforce once (was “forced” to use it haha) and it was mind boggling how anyone would ever want to use it or even become an expert in using it.
  I’d rather track everything in a giant excel tyvm
  - embedding-shape16 days ago
    > it was mind boggling how anyone would ever want to use it or even become an expert in using it.
    As in a lot of cases, the answer is money. If you have expertise in Salesforce, you can get paid a lot, especially if the company you contract/freelance for is in an "emergency" which, because they use Salesforce, they'll eventually be. As long as you get the foot in the door, you'll have a steady stream of easy money. It fucking sucks though, the entire ecosystem, not for the weak of heart.
  - GuinansEyebrows16 days ago
    nobody who actually uses salesforce for daily work chose it. it's sold directly to CIO/CTOs as a one-stop shop for CRM, ticketing, reports and biz dev, who may occasionally use it for reporting (but more often get their staff to provide the reports directly to them). everybody stuck having to use it to actually track work just has to suffer with it.
    wrs16 days ago
    Or in my case, it was sold directly to the CMO, and as the CTO I was stuck with it!
    GuinansEyebrows16 days ago
    you won't get off that easily in the eyes of your subordinates :) but to be fair, i should have said CxOs. CEOs fall for this dogshit too.
    consp16 days ago
    Isn't this the SAP businesscase as well?
  - DANmode16 days ago
    You become an expert in using Salesforce, or SAP, for the same reason you get a medical license in the US.
    There’s a limited number of you who are willing to traverse that gauntlet of abuse, so you know you’ll always have work.
    mrgoldenbrown15 days ago
    For doctors in the US I think the limit is more artificial than that: a cap on how many med school seats are allowed.
    DANmode15 days ago
    The captive audience after some (mostly) arbitrary grinding,
    (not being drawn to serve the niche by any particular talent or interest besides $),
    is the comparison being drawn.
- bArray16 days ago
  Maybe 20 years ago... As a student, the school had an email server that allowed rules to be set. You could set an email to be sent as a result of another email.
  IT were not stupid though, and set a series of rules:
  1. You cannot have a rule trigger to email yourself.
  2. You cannot reply to an email triggered by a rule.
  3. You have ~50MB max of emails (which was a lot at the time).
  Playing around one lunch, my friend had set up a "not in office" automated reply, and I set up a rule to reply to any emails within our domain with a "not in office", but put their name in TO, CC and BCC. It turns out that this caused rule #2 not to trigger. After setting up the same rule on my friend's email, and sending a single email, the emails fired approximately one every 30 seconds.
  A few hours later we returned to our email boxes to realise that there were thousands and thousands of emails. At some point we triggered rule #3, which in turn sent an email "out of space", with a small embedded school logo. Each one of these emails triggered our email rule, which in turn triggered an email "could not send message", again with an embedded logo. We desperately tried to delete all of the emails, but it just made way for more emails. We eventually had to abandon our efforts to delete the emails, and went to class.
  About an hour later, the email server failed. Several hours later all domain logins failed. It turned out that logins were also run on the email server.
  The events were then (from what I was told by IT):
  * Students could not save their work to their network directory.
  * New students could not login.
  * Teachers could not login to take registers or use the SMART white boards.
  * IT try to login to the server, failure.
  * IT try to reboot the server, failure.
  * IT take the server apart and attempt to mount the disk - for whatever reason, also failure.
  * IT rebuild the entire server software.
  * IT try to restore data from a previous backup, failure. Apparently the backup did not complete.
  * IT are forced to recover from a working backup from two weeks previous.
  All from one little email rule. I was banned from using all computers for 6 months. When I finally did get access, there was a screen in the IT office that would show my display at all times when logged in. Sometimes IT would wiggle my mouse to remind me that they were there, and sometimes I would open up Notepad and chat to them.
  P.S. Something happened on the IT system a year later, and they saw I was logged in. They ran to my class, burst through the door, screamed my username and dragged me away from the keyboard. My teacher was in quite some shock, and then even more shocked to learn that I had caused the outage about a year earlier.
  - inopinatus16 days ago
    You were not the root cause of that outage.
    > IT were not stupid
    Everything else you described points to them being blundering morons. From an email forwarder that didn’t build loop detection into its header prepending, fucking up a restore, and then malware’ing the student that exposed them into kafkaesque technology remand, all I’m taking away here is third-degree weaponised incompetence
    bArray16 days ago
    Yes and no. This was the IT of a school, most likely low-paid College/University graduates trying to patch together a working system on a shoe-string budget 20 years ago. Maybe they were fully aware of the issues and struggled to get time to deal with them - try convincing an uneducated management that you need to fix something that is currently working.
    I remember IT were continuously fixing computers/laptops broken by students, fixing connectivity issues (maybe somebody has pushed crayons into the Ethernet ports), loading up software that teachers suddenly need tomorrow, etc. Maybe they also have to prevent external actors from accessing important information. All the while somebody well above your pay grade is entering into software contracts without knowing anything about software.
    Things are likely far more plug & play now for IT infrastructure, back then (XP I think) it was more the Wild West. Only five years ago I know that a University login system used to send username and password credentials via plaintext, because that's how the old protocols worked. The same University also gave me sudo to install/run programs, which provided sudo over all network drives.
    You would probably be horrified to know how much infrastructure still runs on outdated stuff. Just five years ago the Chinese trains stopped working because Adobe disabled Flash [1]. I know of some important infrastructure that still uses floppy disks. Not so long ago some electrical testing could not be conducted because the machine that performed it got a corrupted floppy disk.
    [1] https://arstechnica.com/tech-policy/2021/01/deactivation-of-...
    inopinatus16 days ago
    Ah well having operated at all levels of institutional hierarchies I include the hapless/indifferent management within functional and operational scope of the term “IT”, and they are accountable in any case, however understanding you choose to be of the struggling folks at the pointy end. So there’s your root cause.
  - direwolf2016 days ago
    Glad I wasn't the only person who did this.
- trgn16 days ago
  > and it took us forever to find where he buried the rule.
  Salesforce is such an ugly beast
- pinkmuffinere16 days ago
  lol, that's amazing. Things like this make me both angry (how could they be so dumb!), and empathetic (what is the rest of their life like?)
ryandrake16 days ago
A similar issue made HN last week, same repo, where an AI bot was having the same kind of argument with itself over and over on an issue. Someone mentioned: This sort of thing is why RAM is 800 bucks now.
- omoikane16 days ago
  This is the thread:
  https://news.ycombinator.com/item?id=46636291
bdmorgan16 days ago
Script author here :-) This was due to two different GitHub Action workflows:
(Workflow 1): Remove the need-triage label under certain conditions.
(Workflow 2): If anyone other than a project maintainer removes a label, re-add it with a friendly message explaining why.
Submitted those at like 10 or 11 pm and went to sleep. Woke up to all issues that got changed overnight with dozens, hundreds, or thousands of these messages.
Cause: Workflow 2 should have checked not only for project maintainers but also for other bots and automation that might be clearing labels. It got fixed immediately once we realized the issue.
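The interaction between the two workflows can be reproduced in a few lines. This is a toy Python simulation under stated assumptions, not the project's actual workflow code: `MAINTAINERS`, `TRUSTED_AUTOMATION`, the logins, and the `allow_bots` flag are all hypothetical.

```python
from typing import Optional, Set

LABEL = "status/need-triage"
MAINTAINERS = {"alice"}                   # hypothetical maintainer logins
TRUSTED_AUTOMATION = {"gemini-cli[bot]"}  # hypothetical allowlist of automation


def workflow_1(labels: Set[str]) -> Optional[str]:
    """Remove the triage label if present and return the login that removed it."""
    if LABEL in labels:
        labels.discard(LABEL)
        return "gemini-cli[bot]"
    return None


def workflow_2(labels: Set[str], remover: str, allow_bots: bool) -> bool:
    """Re-add the label unless the remover is permitted. Returns True if re-added."""
    permitted = MAINTAINERS | (TRUSTED_AUTOMATION if allow_bots else set())
    if remover in permitted:
        return False
    labels.add(LABEL)
    return True


def simulate(allow_bots: bool, max_rounds: int = 10) -> int:
    """Count remove/re-add round trips, capped at max_rounds."""
    labels = {LABEL}
    rounds = 0
    while rounds < max_rounds:
        remover = workflow_1(labels)
        if remover is None or not workflow_2(labels, remover, allow_bots):
            break
        rounds += 1
    return rounds
```

With `allow_bots=False` the simulation only stops because of the `max_rounds` cap; with `allow_bots=True` the label is removed once and stays removed.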
- storystarling15 days ago
  I learned the hard way to always implement a circuit breaker for event-driven triggers like this. We use a simple Redis counter with a short TTL to rate limit execution and fail fast if it detects a loop. It is standard practice in backend queues like Celery but easy to overlook in CI configurations.
- doodlesdev15 days ago
  > Submitted those at like 10 or 11 pm and went to sleep.
  That's a classic :)
  Hopefully this hasn't caused any real harm. At least it sure did give me a good laugh when I first saw it.
  - fragmede15 days ago
    In corporate, that's pushed to prod and then got on an international flight on a Friday afternoon.
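The circuit-breaker idea mentioned in this thread (a counter with a short TTL that fails fast when a trigger fires too often) can be sketched in Python as follows. An in-memory dictionary stands in for Redis here so the example is self-contained; the `LoopBreaker` class and its parameters are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class LoopBreaker:
    """Count events per key within a time window; refuse once a limit is exceeded."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # key -> (count, expiry time); plays the role of a counter with a TTL
        self._counters: Dict[str, Tuple[int, float]] = {}

    def allow(self, key: str) -> bool:
        """Record one event for key and report whether it may proceed."""
        now = self.clock()
        count, expires = self._counters.get(key, (0, now + self.window))
        if now >= expires:
            # The window has elapsed, so start a fresh counter.
            count, expires = 0, now + self.window
        count += 1
        self._counters[key] = (count, expires)
        return count <= self.limit
```

A workflow would call something like `breaker.allow(issue_id)` before acting and exit early when it returns `False`. A shared store would be needed in practice, since separate CI runs do not share memory.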
alwa16 days ago
This issue seems to involve Gemini-cli[bot] squabbling with itself, adding and removing the label from the issue (leaving contradictory explanation comments to itself each time) for a good 4,600 rounds
- add-sub-mul-div16 days ago
  I don't lament the lack of flying cars because they don't seem practical, but I am disappointed that the future turned out to be this stupid.
supernes16 days ago
Finally an example of AI doing something useful. Imagine having to add and remove all those tags 4500+ times by hand!
- heliumtera16 days ago
  Professional GitHub label adder-removers became obsolete. AGI practically achieved
robertclaus16 days ago
Classic CI bug with a flair of LLM fun! We had something similar creep into our custom merge queue a few weeks back.
- embedding-shape16 days ago
  What "classic CI bug" makes bots talk with each other forever? Been doing CI for as long as I've been a professional developer, and not even once have I had that issue.
  I've made "reply bots" before, bunch of times, first time on IRC, and pretty much the second or third step is "Huh, probably this shouldn't be able to reply to itself, then it'll get stuck in a loop". But that's hardly a "classic CI bug", so don't think that is what you're referring to here right?
  - btown16 days ago
    If you’re making a bot in which there will be many sub-behaviors, it can be tempting to say “each sub-behavior should do whatever checks it needs, including basic checks for self-reply.”
    And there lie dragons, because whether a tired or junior or (now) not-even-human engineer is writing new sub-behavior, it’s easy to assume that footguns either don’t exist or are prevented a layer up. There’s nothing more classic than that.
    embedding-shape16 days ago
    I'm kind of understanding, I think, but not fully. Regardless of how you structure this bot, there will be one entrypoint for the webhooks/callbacks, right? Even if there is sub-behaviours, the incoming event is passing through something, or are we talking about "sub-bots" here that are completely independent and use different GitHub users and so on?
    Otherwise I still don't see how you'd end up with your own bot getting stuck in a loop replying to itself, but maybe I'm misunderstanding how others are building these sort of bots.
    btown16 days ago
    Sorry, could have been more clear.
    Someone sets up a bot with: on a trigger, read the message, determine which "skill" to use out of a set of behaviors, then let that skill handle all the behavior about whether or not to post.
    Later, someone (or a vibe coding system) rolls out a new skill, or a change to the skill, that omits/removes a self-reply guard, making the assumption that there are guards at the orchestration level. But the orchestration level was depending on the skill to prevent self-replies. The new code passes linters and unit tests, but the unit tests don't actually mimic a thread re-triggering the whole system on the self-posting. New code gets yolo-pushed into production. Chaos ensues.
    pixl9716 days ago
    All I can think of, and actually have seen is
    1. Bot run a series of steps A through Z.
    2. Step X is calling an external system that runs its own series of steps.
    3. If the external system detects certain outcomes (errors, failed tests, whatever), it kicks back an automated process that runs back through the bot, which makes the same mistake again without any awareness that it's caught in a loop.
    matsemann16 days ago
    1. Set up a bot that runs on every new comment on a PR
    2. The bot comments something on that PR
    Doesn't have to be more advanced than this to get an infinite loop if you don't build anything where it ignores comments from itself or similar.
    embedding-shape16 days ago
    Previously:
    > pretty much the second or third step is "Huh, probably this shouldn't be able to reply to itself, then it'll get stuck in a loop". But that's hardly a "classic CI bug",
    matsemann15 days ago
    If I've previously misunderstood your point, copy pasting it doesn't clear anything up, no..?
    I don't see why it's not a "classic CI bug". It's an easy trap to fall into, and I've seen it multiple times. Same with "action that runs on every commit to main to generate a file and push a new commit if the file changes", that suddenly gets stuck in a loop because the generated file contains a comment with the timestamp of creation.
  - Hamuko16 days ago
    Yeah, a bot replying to itself is pretty poor design. It's one of the first things you do even with toy bots. You can even hardcode knowing itself, since usually you have an unchanging ID. A much more common problem is if someone deploys another bot, which will lead your bot into having an endless back-and-forth with it.
    embedding-shape16 days ago
    > A much more common problem is if someone deploys another bot, which will lead your bot into having an endless back-and-forth with it.
    This I'd understand, bit trickier since you basically end up with a problem typical of distributed systems.
    But one bot? One identity? One GitHub user? Seems really strange to miss something like that, as you say, it's one of the earlier things you tend to try when creating bots for chats and alike.
    matsemann15 days ago
    Being one of the earlier things to catch is what makes it a classic.
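The design question debated in this thread is where the self-reply guard should live. One option is to put it at the single entry point, so that no individual sub-behavior ("skill") has to remember it. Here is a minimal Python sketch of that option; `BOT_LOGIN`, the event shape, and the `greet` skill are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Event = Dict[str, str]
Skill = Callable[[Event], Optional[str]]

BOT_LOGIN = "example-bot[bot]"  # hypothetical login of the bot itself


def dispatch(event: Event, skills: List[Skill]) -> List[str]:
    """Single entry point: drop the bot's own events before any skill runs."""
    if event.get("author") == BOT_LOGIN:
        return []
    replies = []
    for skill in skills:
        reply = skill(event)
        if reply is not None:
            replies.append(reply)
    return replies


def greet(event: Event) -> Optional[str]:
    """Example skill with no self-reply check of its own."""
    return "Thanks for the comment, " + event["author"] + "!"
```

Because the guard runs before skill selection, a newly added skill that omits its own check still cannot respond to the bot's own comments.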
meisel16 days ago
To be clear, is AI actually at play here, aside from the fact that the repo is for Gemini? It just looks like two simple rules that interact poorly, that we could've seen in 2015.
- tuetuopay16 days ago
  Well, it's even more ironic as AI in general is touted as smart. I'd fully expect such bots to notice they're in a loop and one to throw in the towel. Still a long way to AGI. And to AI for that matter.
Elfener16 days ago
Maybe I'm missing something, but this seems to be an issue report claiming to be a PR? Where's the patch?
Edit: there's actually a PR, but this is one of those repos where for some reason, they require every PR to have an associated issue. And in this case, they aren't even linked...
abathologist16 days ago
This will soon be happening with our parents' social security checks, our friends' cancer treatment plans, our international flights' logistics, our ISPs' routing configurations, ...
Fun times are coming.
keriati116 days ago
Today github labels, tomorrow paperclips?
amiga38616 days ago
Project admins setting up automation: https://youtu.be/B4M-54cEduo?t=102
The automation: https://youtu.be/GFiWEjCedzY?t=51
armchairhacker16 days ago
Ironically, this type of issue is common in pre-LLM (rules-based) AI. Given that the back-and-forth messages are the same, I suspect they're generated by a small script, not an LLM. But I wouldn't be surprised if the script was created mostly or entirely by an LLM.
jayd1616 days ago
I think the real irony is an LLM trying to enforce permissions at all. Why is it doing that? If the tag exists, the user had the permission to create it, no?
- throwaway17373816 days ago
  I’m guessing there’s no permissions around labeling an issue in Github.
heliumtera16 days ago
So much productivity accomplished here! Those are numbers management loves to see.
gemini-cli did much more work in this PR than the author himself.
a-dub16 days ago
in the old days one would add and check for a loop detection token when loops like this could be driven by external systems... i wonder if today it would be as simple as adding "ensure you don't get stuck in any loops" to a prompt.
fwiw. doesn't look like gemini at all, the responses are perfectly canned... maybe just good old fashioned ci rules.
- vjekm16 days ago
  I also start all of my prompts with "solve the halting problem."
  - fragmede15 days ago
    Clang manages to have way more useful error messages despite not solving the halting problem. You don't need to solve the halting problem to have caught this problem. Even if you don't solve it for the general case, solving it here for N levels deep and then collapsing the levels would have stopped this problem in its tracks. Sure, someone could just come in and cause the bug at N+1 levels deep because you've only solved it at N, but you can write different tests to mitigate that problem in practice, despite not having infinity RAM *2+1 to solve the general case of the halting problem.
    Hilariously, the halting problem has been written in enough of the LLM training data that it can identify some cases where the code won't terminate.
- Night_Thastus16 days ago
  It's a language model. It doesn't know what a loop is, or have any awareness that the content it's replying to may be made by itself - as it has no sense of 'self'.
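The "loop detection token" mentioned at the top of this thread can be sketched without any model involvement: each automated message carries a hop counter, and the automation stops replying once the counter reaches a limit. The Python below is a minimal illustration; the HTML-comment marker format and `MAX_HOPS` value are assumptions, not something any of the tools discussed here is known to use.

```python
import re
from typing import Optional

MARKER = re.compile(r"<!-- bot-hops:(\d+) -->")  # hypothetical marker format
MAX_HOPS = 3                                     # hypothetical limit


def hops_in(message: str) -> int:
    """Return the hop count embedded in a message, or 0 if there is none."""
    match = MARKER.search(message)
    return int(match.group(1)) if match else 0


def reply_to(message: str, text: str) -> Optional[str]:
    """Build a reply carrying an incremented hop count, or None at the limit."""
    hops = hops_in(message)
    if hops >= MAX_HOPS:
        return None
    return f"{text}\n<!-- bot-hops:{hops + 1} -->"
```

This only bounds a loop if every participating automation propagates the marker, which is the same limitation mail loop-detection headers have.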
Phui3ferubus16 days ago
> 4610 remaining items
Normally I would complain about people spamming in GitHub issues but I don't think it will matter this time
minimaxir16 days ago
It's not wrong.
mise_en_place16 days ago
Now that's what I call job security.