When Gemini Pro came out about a year ago (I forget which version number), the reasoning was visible.
The reasoning was extremely useful. It would capture the logical structure of the whole problem space.
I found it incredibly valuable and actually more readable than the "human friendly" final output. (A massive blob of prose.)
I was very sad when they removed it.
edit: pretty sure the full CoT is always available via the API.
Honestly, who cares?
I have yet to see a documented example of a system prompt leak that was NOT the real system prompt. Have you seen one?
Loosely, LLMs give plausible responses. And LLMs are really good at writing confident-sounding responses.
LLM output is as if someone is replying with the sole purpose of appearing helpful and knowledgeable.
I wouldn't trust opinions on LLMs from people who are entirely positive or entirely negative: the technology is just too mixed for that. I'd say it's useful for someone to have had a bad experience with LLMs (e.g. LLMs being confidently wrong), as well as to have made use of LLMs for things they're good at (e.g. "small" programming tasks).
> If you are already standing at the stove (say, at 11:51), you can simply put the pan on a burner with a little water and turn it on.
I assume the current time gets injected into the prompt, and Gemini thinks it comes from the user?
I had that a few times now. Always very close to the end of a longer response.
Edit: Never mind. My bad. I added "Please use 24-hour time in all our future chats." to my personalized settings. I got tired of it using the AM/PM system, but forgot I had set that.
I doubt the OP is actually the Gemini system prompt. I'm sure Gemini does try to keep personal data from screwing up results, but that's just not possible given the state of the technology. Everything you cram into the limited context probably either helps or hurts the results, and if it's unrelated to the specific problem, it hurts.
When the model tries to satisfy everything it remembers about me, it comes up with conflicting details and desires. My personal projects don't look anything like my work projects. My little games don't have the same requirements as my security sensitive software for robots in hospitals. The fact that I asked how a hospitality business operates doesn't mean the tax question I asked a week later is about a hospitality business.
The models just can't make sense of all that data yet, and even if they've been instructed to consider that maybe some details aren't important, those details still affect the attention math.
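A rough way to see the "attention math" point: softmax attention weights always sum to 1, so every extra token in the context takes some share of the weight away from the relevant ones. The sketch below uses made-up scores (`relevant_score`, `irrelevant_score`) purely for illustration; they are not values from any real model, and real models can learn to score irrelevant tokens much lower than this.

```python
import math

def softmax(scores):
    """Convert raw attention scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores: one relevant token, many weakly matching ones.
relevant_score = 4.0
irrelevant_score = 1.0

def weight_on_relevant(num_irrelevant):
    """Attention weight left for the relevant token given N distractor tokens."""
    scores = [relevant_score] + [irrelevant_score] * num_irrelevant
    return softmax(scores)[0]

for n in (0, 10, 100, 1000):
    print(n, round(weight_on_relevant(n), 3))
```

Even though each distractor scores far lower than the relevant token, enough of them together end up holding most of the weight.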
Every time it turns out to be hallucinations.
I had an interesting case yesterday with Gemini where I asked it a casual question about a PDF and rather than mirroring my casual tone/question it mirrored the PDF instead like it was writing a paper!
In a similar vein, I've also had the Gemini voice app glitch a number of times and reply to itself, thinking that I had said what it had just said!
> Avoid speculative reasoning or multi-step logical leaps. Domain Isolation: Do not transfer preferences across categories (e.g., professional data should not influence lifestyle recommendations). Avoid "Over-Fitting": Do not combine user data points.
Makes sense. What this really reflects is an inability to reliably do multi-step reasoning, where multiple reasoning steps that are individually valid get combined into an invalid chain (walk to car wash).
> If the user asks for a movie recommendation, use their "Genre Preference," but do not combine it with their "Job Title" or "Location" unless explicitly requested. Sensitive Data Restriction: You must never infer sensitive data (e.g., medical) from Search or YouTube.
Yeah, it would be a bit off-putting to get movie recommendations based on my job title, and HIGHLY off-putting to get recommendations based on my medical or search history. I guess the news here is that Gemini does have access to your medical and search history ... exploits incoming ?!
Thanks, it really made my morning looking at it.
> Before providing the final response, create a compliance checklist to verify that every constraint has been met.
I wonder whether this statement alone causes Gemini to ensure compliance, or whether there's a separate post-validation function.
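For what the second option could look like, here is a minimal sketch of a separate post-validation step. Everything in it (the `Constraint` type, `validate_response`, the two example rules) is hypothetical and invented for illustration; nothing here is known about how Gemini actually works.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Constraint:
    name: str
    check: Callable[[str], bool]  # returns True when the response complies

def validate_response(response: str, constraints: List[Constraint]) -> List[str]:
    """Return the names of the constraints that the response violates."""
    return [c.name for c in constraints if not c.check(response)]

# Hypothetical rules, loosely modeled on the quoted prompt text.
constraints = [
    Constraint("no_job_title_mention", lambda r: "job title" not in r.lower()),
    Constraint("under_500_chars", lambda r: len(r) < 500),
]

draft = "Based on your job title, you might enjoy this movie."
print(validate_response(draft, constraints))
```

A check like this runs outside the model, so it can be deterministic, whereas a checklist written by the model itself is just more generated text.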
hmmm... that aged well.
Can you provide more explanation about how this occurred?
Since the content was irrelevant, I described it as happening "randomly".
Every now and then, if you ask it that, it'll just dump everything, including system prompt. (Which will often include a message about not dumping the system prompt...)
Balance empathy with candor
"Empathy" would have to be emulated, the way a sociopath emulates it, and to a lesser extent so would "candor". But then "balance" also requires a grasp of the weight of each, even if only mathematically?
BTW what on earth happens internally when you ask one "AI" to evaluate the prompt of another "AI"?
Unfortunately you have to learn to let go, and say, "I'll never be able to keep this all in my head", and learn to think about it in terms of the outputs/inputs and how you can create a model capable of efficiently modeling your problem and how parameters can be nudged to get an output which is kinda shaped how you want.
Maybe some really genius savant could keep it all in their head, but I doubt it. Like I said, it'd be like trying to understand a person by reasoning about their neural pathways.
A human designed the network of matrix multiplications that make up the model, but the matrices all started out filled with small random values that encode nothing. The training process is what turns them into meaningful values.
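A toy sketch of "training is what puts the values in": a single row of two weights, fit by gradient descent until it reproduces a target function. The target, data, learning rate, and the assumption of small random starting values (the usual choice in practice) are all made up for illustration; a real LLM differs enormously in scale and architecture.

```python
import random

random.seed(0)

# Hypothetical target the "model" should learn: y = 2*x0 - 3*x1
data = [((x0, x1), 2 * x0 - 3 * x1)
        for x0 in (-1.0, 0.0, 1.0) for x1 in (-1.0, 0.0, 1.0)]

# Assumed starting point: small random numbers that encode nothing.
w = [random.uniform(-0.1, 0.1), random.uniform(-0.1, 0.1)]

def train(w, data, lr=0.1, epochs=200):
    """Plain stochastic gradient descent on squared error."""
    for _ in range(epochs):
        for (x0, x1), y in data:
            err = w[0] * x0 + w[1] * x1 - y
            # Gradient of 0.5 * err**2 with respect to each weight.
            w[0] -= lr * err * x0
            w[1] -= lr * err * x1
    return w

w = train(w, data)
print([round(v, 3) for v in w])
```

Nobody typed the final weights in; they fall out of repeatedly nudging the values to reduce the error on the data.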
The values are such that if you look at how each column of the input matrix (which, recall, is an encoding of each token of the conversation history) gets multiplied as it works its way through the network, you can think of the coefficients as the model’s understanding of a “concept”. Notably, because modern LLMs are built around the concept of a “transformer” which looks at every word of the context simultaneously, the “concept” can involve coefficients that act on multiple (usually nearby) tokens. So the model may have internalized the concept of “empathetic” as “a cluster of nearby tokens whose values, when multiplied by M1[2492, 59272] * M2[592827, 7394] * M3[93732, 429474] * ..., are all similar to each other.”
As might be apparent, this means that the values for each token embedding are also highly sensitive to the coefficients buried deep within the LLM’s matrices. These values are the result of the training process too. The whole thing gets trained at once on an enormous body of text, some of which explicitly explains concepts like “empathy,” which will enable the model to associate empathetic clusters of words with other clusters of words that define empathy. But if you removed all the definitions and explanations of empathy from the training set, the model would still probably form a (weaker) concept of “empathy” that it would have a harder time explaining. It might not even know the word “empathetic,” but it would have some internal structure that causes token clusters like “I’m sorry for your loss” and “I know what you’re going through” to have similar values across some of their elements.
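To make "similar values across some of their elements" concrete, here is a sketch using cosine similarity on hand-written toy vectors. The vectors in `toy_embeddings` are invented for illustration and do not come from any real model; real embeddings have hundreds or thousands of dimensions, and no single dimension needs to map cleanly to a concept.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 4-dimensional "embeddings"; pretend dimension 0 loosely tracks empathy.
toy_embeddings = {
    "I'm sorry for your loss": [0.9, 0.1, 0.3, 0.0],
    "I know what you're going through": [0.8, 0.2, 0.1, 0.1],
    "Restart the server after patching": [0.0, 0.9, 0.1, 0.7],
}

a = toy_embeddings["I'm sorry for your loss"]
b = toy_embeddings["I know what you're going through"]
c = toy_embeddings["Restart the server after patching"]

print(round(cosine_similarity(a, b), 3))  # the two empathetic phrases
print(round(cosine_similarity(a, c), 3))  # empathetic vs. unrelated phrase
```

With these toy numbers the two empathetic phrases score far closer to each other than either does to the server sentence, which is the kind of structure the paragraph above describes.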
Using LLMs for medical research is all about encoding drug data rather than words and trying to extract the “concepts” that the training process has formed. The hope is that the model has internalized a novel insight from seeing way more data than a human could consume, much less reason about.
It matches with Reddit posts that have statistically similar words and starts generating the next statistically likely token.
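The "next statistically likely token" idea in its crudest form is a bigram model: count which word follows which, then always pick the most frequent follower. This is a deliberately tiny sketch on a made-up corpus; real LLMs condition on the whole context with learned weights rather than raw counts, so treat it as an analogy only.

```python
from collections import Counter, defaultdict

# Made-up miniature "corpus" standing in for scraped posts.
corpus = [
    "i am sorry for your loss",
    "i am so sorry for your loss",
    "i am sorry to hear that",
    "we are here for you",
]

# Count how often each word follows each other word.
followers = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1

def generate(start, max_tokens=5):
    """Greedily append the most frequent follower of the last word."""
    out = [start]
    for _ in range(max_tokens):
        options = followers.get(out[-1])
        if not options:
            break  # no observed follower, stop generating
        out.append(options.most_common(1)[0][0])
    return " ".join(out)

print(generate("i"))
```

The output reads like something from the corpus even though the code has no notion of what the words mean.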
but that still requires it to recognize the concepts of "empathy" and "candor" in the words of others
even if it is just pattern matching on a massively parallel scale, it still seems beyond simple logic
if you told "AI" to comb a reddit sub and find only posts that are empathetic, how on earth is that evaluated?