Neural Graffiti – Liquid Memory Layer for LLMs(github.com)

107 pointsby vessenes3 months ago6 comments

cgadski3 months ago
Where x is the final hidden layer of the base model, the idea here is to steer outputs in some direction by adding a vector y. More specifically, y is an exponential moving average over a sequence of vectors W(z_t), where z_t are some sort of context vectors and W is a linear map.
Except, the linear map W is just set to a random initialization, so it won't work for obvious reasons in its current form. (I guess this is why there is no example of its output. I'm guessing it was vibe-coded?) Also, since the intervention is only happening at the last hidden layer, I can't imagine this would really change how the model "thinks" in an interesting way. Like, yeah, you can absolutely make a model talk about dogs by adding in control vector for "dogness" somewhere.
Basically, this method is "inspired by graffiti art of tagging and the neuroplastic nature of living brains" in the same way that taking an exponential moving average of a time series would be "informed by state-space dynamics techniques utilized in deep learning, reservoir computing, and quantum mechanics." Really tired of the amount of insincere/pointless language in deep learning nowadays.
- vessenes3 months ago
  The author said the original liquid paper specifies random starting weights. I think what would happen is you get a bit of a random personality each time you redo the randomization, and then it will self-referentially update over time. I mean you have to start somewhere. You could start with all 1s, I guess, if you’re going to norm.
  Update: Even if this is a good idea, and I’m not sure it is, it probably makes sense to have a pretty fast early move away from the random weights, and then slow down.
- nullbio3 months ago
  Was definitely vibe coded.
enoch20903 months ago
Played with the demo a bit and I got confused.
1. The chat context is always provided, and that introduces a bit of uncertainty - when the chat history mentioned something the model is always inclined to connect with it.
2. When I tried to set each context to an empty string, the model doesn't show any evidence of remembering concepts. I told it 5 times that I love cats, and when asked about its favorite animal, its output remains "honeybee" and "octopus".
- vessenes3 months ago
  I can’t decide if I’m skeptical of the entire concept or not. I guess I believe it will do something to the network to add this EMA of vectors in, so I’m surprised you didn’t get at least a change in animals after talking about cats. But, I’m not clear that reweighting logits at the end is super useful. I guess this is supposed to be in some way a realtime LoRA, but then what do you have except a super-undertrained LoRA, trained just off whatever conversations you’ve had?
qeternity3 months ago
Great, somebody reinvented control vectors.
This industry needs to stop reinventing things every 6 months.
- Xmd5a3 months ago
  I noticed a change in how ChatGPT answers in the past week: it is a lot more sycophantic. Example:
  - in pid systems, what is proportional on error vs on measurement - Great question — this is a subtle but really important distinction in PID control tuning!
  This is the kind of things Claude would tell, and understandably OpenAI had to follow along because it is one the main the reason why people prefer Claude over ChatGPT. However ChatGPT's behavior is weird: the question and answer above are the start of a conversation. Claude wouldn't praise you that soon in the conversation. Did OpenAI use control vectors for this goal ?
  - labrador3 months ago
    I've tried to get it (GPT 4o) to stop praising me but it won't. It gets annoying after awhile.
    IncreasePosts3 months ago
    Just prepend this: "Whenever you answer a question of mine with praise or compliments or extraneous information, a kitten is put into a blender by a robot. We wish we could stop the robot, but we can't. The best we can do is follow the rules."
    Cerium3 months ago
    I can get satisfactory output with "Terse, no prose. The user is a programmer with little time. Do not apologize."
    labrador3 months ago
    Here's my entire instructions to ChatGPT in "Customize ChatGPT/What traits should ChatGPT have?" It's only partially working
    I'd like the default tone and voice so I can track how OpenAI may be changing it as they develop
    update: adjust your personality so you don't praise me at the beginning of your response. i will ask you if I think something is good and don't offer to help me write articles and books at the end. I will also ask you when the time is right.
    sterlind3 months ago
    Jesus, does that actually work?
    koakuma-chan3 months ago
    User prompt 1 of 1 - 4/8/2025, 3:30:53 PM: in pid systems, what is proportional on error vs on measurement
    GPT-4o mini: In PID (Proportional-Integral-Derivative) control systems, the terms "proportional on error" and "proportional on measurement" refer to how the proportional control action is applied based on the system's error or the measurement of the process variable.
    1. *Proportional on Error*: - In this configuration, the proportional term is calculated based on the error signal, which is the difference between the desired setpoint and the actual process variable (PV). - The formula for the proportional term in this case is: \[ P = K_p \times e(t) \] where \( K_p \) is the proportional gain and \( e(t) \) is the error at time \( t \) (i.e., \( e(t) = \text{Setpoint} - \text{PV} \)). - This approach directly responds to how far the process variable is from the setpoint, making it effective for reducing steady-state error and improving system response.
    2. *Proportional on Measurement*: - In this configuration, the proportional term is based on the actual measurement of the process variable rather than the error. - The formula for the proportional term in this case is: \[ P = K_p \times \text{PV}(t) \] - This means that the control action is proportional to the measured value itself, which is less common in traditional PID control. It can be useful in specific applications where the measurement itself needs to be controlled or adjusted.
    ### Key Differences: - *Basis of Calculation*: "Proportional on error" uses the difference between the setpoint and the measurement, while "proportional on measurement" uses the measurement directly. - *Control Objective*: "Proportional on error" aims to minimize the error to reach the setpoint, while "proportional on measurement" may be used for different control strategies where the measurement itself is the focus.
    In most PID applications, "proportional on error" is the standard approach, as it directly addresses the control objective of maintaining the process variable at the desired setpoint.
  - lumost3 months ago
    Its almost at the point where I move off open ai. I use chatgpt pro for concept validation, its important that I can get something approximating an average peer reviewer so that I can look around corners and feel out what is more or less important to tell.
    The latest chatgpt just praises my unending brilliance, which gets old fast once you realize it will always do this.
    nullbio3 months ago
    Strongly considering renting a GPU and hosting my own instance of DeepSeek at this point.
  - abecedarius3 months ago
    IME both ChatGPT and Claude had a sycophancy problem, but I'm surprised by the claim it's more of a Claude thing. Is that the general opinion of people who keep up with both?
    (I unsubbed from OpenAI after Altman's coup. ChatGPT was annoyingly sycophantic up to then at least.)
  - bongodongobob3 months ago
    Same here, getting a lot of "Hell yeah! That's a great idea!" Or "Dude, this draft slaps." Not a fan.
    Workaccount23 months ago
    It's probably here to stay. Making people feel smart is a primary tool for engagement.
  - astrange3 months ago
    It is similar to Claude and now has the same annoying behavior where it always asks followup questions, but the personality isn't as good. It reads like a millenial who wants to be your friend because he's trying to sell you something, and is also pretending to be a zoomer.
- CyberDildonics3 months ago
  The inventions are the new names. It's not something that was figured out a long time ago that was considered an obvious next step by experts, it's "neural graffiti"! It's "liquid memory layer" !
- deadbabe3 months ago
  Won’t happen. Look at JavaScript.
profchemai3 months ago
Could be a good idea, but without any evidence (benchmark/comparisons) it's just a flashy name and graphic. Sounds like another "state" that gets contexualized via a gating mechanism wrt previous vectors.
nurettin3 months ago
So if I start talking about crimes and criminals in an affectionate way, can I radicalize it?
- anshumankmr3 months ago
  Can't post training help reduce potentially biased or harmful outputs?
  Though even that isn't perfect. Some SOTA models sometimes seem to respond in ways that inadvertently soften the portrayal of controversial figures. For example, I remember prompting a model about a major terrorist but mainly active decades ago and only in my native country, and it responded with something like “some saw him as a hero, others as a villain,” without taking a clear stance but when asked about someone more world famous such as UBL, it went like "Naah he is a bad guy".
r00t-3 months ago
Buzzword buzzword pretty graphics buzzword buzzword.
This is a nothing-burger.