I always have the feeling that I'm chatting with a model oriented towards engineering tasks: the seriousness, the lack of interest in being humorous or cool.
I don't know if this is because I interact with Gemini only through AI Studio, which may have different system instructions (apart from those one can add oneself, which I never do) than the one at gemini.google.com.
I never use gemini.google.com because it lacks a simple export feature. It's not even possible to save a single chat to disk (well, the others can't either); I just wish it could.
AI Studio saving to Google Drive is really useful. It lets you download the chat, strip it of verbose things like the thinking process, and reuse it in a new chat.
I wish gemini.google.com had a "Save as Markdown" option per answer and for the complete chat (with a toggle to include/exclude the thinking process). Then it would be a no-brainer for me.
It's as if Google Docs had no "Download..." menu entry and you could only "save" documents via Takeout.
I love this. When ChatGPT compliments me on my great question or tries to banter, it causes me great despair.
The other day I asked a fairly innocuous question and it LOLed and said it'd give me the 'no Bullshit answer'.
For example, if I want to quickly create a Python script to list all VMs via libvirt and output their attached drives and filesystems, that's a task for ChatGPT.
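For reference, roughly what such a script ends up looking like. This is a minimal sketch, assuming the libvirt-python bindings and a local qemu:///system hypervisor; the URI and the XML fields it reads are assumptions about a typical setup, not anything specific:

    #!/usr/bin/env python3
    # Minimal sketch: list each libvirt domain with its attached disks and
    # filesystem passthroughs, taken from the domain XML.
    import xml.etree.ElementTree as ET

    import libvirt

    # Assumed URI; adjust for qemu:///session or a remote hypervisor.
    conn = libvirt.open("qemu:///system")
    try:
        for dom in conn.listAllDomains():
            print(f"VM: {dom.name()}")
            root = ET.fromstring(dom.XMLDesc(0))
            # <disk> entries: block devices attached to the guest.
            for disk in root.findall("./devices/disk"):
                target = disk.find("target")
                source = disk.find("source")
                dev = target.get("dev") if target is not None else "?"
                src = "?"
                if source is not None:
                    src = source.get("file") or source.get("dev") or "?"
                print(f"  disk {dev}: {src}")
            # <filesystem> entries: host directory passthroughs.
            for fs in root.findall("./devices/filesystem"):
                source = fs.find("source")
                target = fs.find("target")
                sdir = source.get("dir") if source is not None else "?"
                tdir = target.get("dir") if target is not None else "?"
                print(f"  fs {tdir}: {sdir}")
    finally:
        conn.close()

Run it as a user with access to the libvirt socket (e.g. in the libvirt group); the output format is just illustrative.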
But for the things where I don't want an AI to "suck up" to me and instead "stay professional", that's Gemini.
While being entirely wrong, which makes me cringe a little.
I didn't realize just how big the difference was until I tested it.
"How do I clear a directory of all executable files on Debian?"
Gemini 2.0 Flash: (responses manually formatted)
find /path/to/directory -type f -executable -delete
Replace /path/to/directory with the actual path.
ChatGPT: (full link [1]) To clear (delete) all executable files from a directory on Debian (or any Linux system), you can use the find command. Here's a safe and effective way to do it:
# [checkmark emoji] Command to delete all executable files in a directory (not recursively): [..]
# [magnifying glass emoji] Want to preview before deleting? [..]
# [caution sign emoji] Caution: [..]
[1] https://chatgpt.com/share/67f055c8-4cc0-8003-85a6-bc1c7eadcc...
Google never seemed to personify theirs, IIRC. They always presented their AI tools in a utilitarian way.
Ask Claude to generate a .md of the conversation; it will do that, with the option to download it or a PDF of it. A lovely, but well-hidden, feature!
The only drawback I see is that it requires enough free space in the context window to duplicate the visual part of the chat.
^ like that.
I remain skeptical about LLMs in this space, although I might be proven wrong, as often happens. Nevertheless, OSV has already been a big advance, so it's great that it's getting further commitment.
From the description re data integrations it sounds like the latter, unless the data mentioned is in fact used for training.
The distinction is important because a security-tuned model will have different limitations and uses than an actual pre-built security LLM app. Being an app also makes benchmarking against other "models" less straightforward.
But in the affected systems section it states:
> Also Hitachi Energy RTU500 firmware and Siemens Ruggedcom APE1808 firmware.
I cannot find any reference that this Hitachi device is vulnerable to that CVE. Hitachi has a nice interface to list all vulnerabilities of their devices, and this CVE is not part of it. Any mention of Hitachi is also missing from the Mitigation section. Almost as if this device is not vulnerable.
There is some more weirdness, like not mentioning that the "portal" feature is also vulnerable.
As far as I can tell, the only connection between them is that CISA released this alert, which bundles multiple unrelated advisories (the Siemens Palo Alto one and an unrelated Hitachi RTU500 advisory) in one post: https://www.cisa.gov/news-events/alerts/2024/04/25/cisa-rele...
Imagine if a relatively clueless intern left something out of a report because the textbook "seemed wrong".
Saying that the input data is wrong and the AI didn't hallucinate that data is also kind of a "trust me bro" statement. The Mandiant feed is not public, so I cannot check what was fed to it.
I don't really care why it's wrong. It is wrong. And using that as the example prompt in your announcement is an interesting choice.
In other words, even with humans, their skills and experience are never enough. They have to show the reasoning behind their conclusions and then show that the reasoning is backed up by an independent source of fact. Short of that, you can still perform analysis, but then you must clearly state that your analysis is weak and requires more follow-up work and validation.
So with LLMs, I'm torn, because they kind of make your life a lot easier, but does it just feel that way, or are they adding more work and uncertainty where that is intolerable?
I expect attackers will also use AI systems, trained on the latest in effective attacks. What about defense would make defenders' AI systems more effective than attackers'?
I think it's necessary because, if the attackers use AI systems, the defenders need to keep up.
Also, we need to be creating far more secure systems to start with. Right now it is, to a degree, security through obscurity: something is secure when attackers can't find the bugs fast enough. Security through obscurity wouldn't seem to work well when the attacker uses AI software.
Specifically, in their own example they are just citing Mandiant, which may itself be wrong...
So what’s the big breakthrough here?