https://arxiv.org/html/2411.19506v1
Why is it so hard to elaborate what AI algorithm / technique they integrate? Would have made this article much better
Already the case with consulting companies, have seen it myself
let v0 = 0
let v1 = 0.40978399*(0.616*u + 0.291*v)
let v2 = if 0 > v1 then 0 else v1
let v3 = 0
let v4 = 0.377928*(0.261*u + 0.468*v)
let v5 = if 0 > v4 then 0 else v4
...
// inputs: u, v
// --- hidden layer 1 (3 neurons) ---
let v0 = 0.616*u + 0.291*v - 0.135
let v1 = if 0 > v0 then 0 else v0
let v2 = -0.482*u + 0.735*v + 0.044
let v3 = if 0 > v2 then 0 else v2
let v4 = 0.261*u - 0.553*v + 0.310
let v5 = if 0 > v4 then 0 else v4
// --- hidden layer 2 (2 neurons) ---
let v6 = 0.410*v1 - 0.378*v3 + 0.528*v5 + 0.091
let v7 = if 0 > v6 then 0 else v6
let v8 = -0.194*v1 + 0.617*v3 - 0.291*v5 - 0.058
let v9 = if 0 > v8 then 0 else v8
// --- output layer (binary classification) ---
let v10 = 0.739*v7 - 0.415*v9 + 0.022
// sigmoid squashing v10 into the range (0, 1)
let out = 1 / (1 + exp(-v10))

[1] https://archive.ics.uci.edu/ml/datasets/HIGGS
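For readers who'd rather see it as conventional array code: the straight-line program above is just a 2-3-2-1 MLP forward pass. A NumPy sketch with the weights transcribed from the code above (an illustration only, not CERN's implementation):

```python
import numpy as np

# Weights and biases transcribed from the straight-line code above.
W1 = np.array([[ 0.616,  0.291],
               [-0.482,  0.735],
               [ 0.261, -0.553]])
b1 = np.array([-0.135, 0.044, 0.310])
W2 = np.array([[ 0.410, -0.378,  0.528],
               [-0.194,  0.617, -0.291]])
b2 = np.array([0.091, -0.058])
w3 = np.array([0.739, -0.415])
b3 = 0.022

def mlp(u, v):
    x = np.array([u, v])
    h1 = np.maximum(0.0, W1 @ x + b1)    # hidden layer 1, ReLU
    h2 = np.maximum(0.0, W2 @ h1 + b2)   # hidden layer 2, ReLU
    return 1.0 / (1.0 + np.exp(-(w3 @ h2 + b3)))  # sigmoid output in (0, 1)
```

Each `let vN = if 0 > ... then 0 else ...` pair is one `np.maximum(0.0, ...)` here; the flattened form is what you get after fully unrolling these matrix products.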
In my experiments, linear regression with extended attributes (appending the squared values) is very much competitive in accuracy with the reported MLP accuracy.
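To illustrate the kind of thing meant here (a made-up synthetic task, not the HIGGS data): a label that depends on squared attributes is invisible to plain linear features but becomes easy once you append the squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic task: the label depends on u**2 + v**2 (a circle),
# which plain linear features cannot express but squared features can.
X = rng.normal(size=(2000, 2))
y = (X[:, 0]**2 + X[:, 1]**2 > 2.0).astype(float)

def extend(X):
    # Append the squared value of each attribute, plus a bias column.
    return np.hstack([X, X**2, np.ones((len(X), 1))])

# Ordinary least squares on the extended attributes, thresholded at 0.5.
w, *_ = np.linalg.lstsq(extend(X), y, rcond=None)
pred = (extend(X) @ w > 0.5).astype(float)
accuracy = (pred == y).mean()
```

On this toy data the squared-feature regression recovers the circular decision boundary well, while linear features alone can do no better than predicting the majority class.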
https://opendata-qa.cern.ch/record/93940
if you can beat it with linear regression we'd be happy to know.
From around when the term was first coined: "artificial intelligence research is concerned with constructing machines (usually programs for general-purpose computers) which exhibit behavior such that, if it were observed in human activity, we would deign to label the behavior 'intelligent.'" [1]
At some point someone will realise that backpropagation and adjoint solves are the same thing.
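The correspondence is easy to see on a toy example: reverse-mode differentiation (backprop) is a forward primal solve followed by a backward sweep that propagates adjoint sensitivities. A minimal hand-rolled sketch (toy function and made-up weights, not anyone's production code):

```python
import math

# f(u, v) = sigmoid(w2 * relu(w1*u + w0*v)), differentiated the "adjoint" way:
# run the primal forward, then walk the tape in reverse with sensitivities.
def f_and_grad(u, v, w0=0.3, w1=0.6, w2=0.7):
    # forward (primal) pass
    a = w1 * u + w0 * v
    h = max(0.0, a)                       # ReLU
    z = w2 * h
    out = 1.0 / (1.0 + math.exp(-z))      # sigmoid
    # backward (adjoint) pass: seed d(out)/d(out) = 1
    d_out = 1.0
    d_z = d_out * out * (1.0 - out)       # sigmoid'
    d_h = d_z * w2
    d_a = d_h * (1.0 if a > 0 else 0.0)   # ReLU'
    d_u = d_a * w1
    d_v = d_a * w0
    return out, (d_u, d_v)
```

The backward half is exactly an adjoint solve of the (trivial, feed-forward) primal computation; a finite-difference check confirms the gradients agree.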
It’s impressive, honestly.
It's not about linear algebra (which is just used as a way to represent arbitrary functions), it's about data. When your problem is better specified from data than from first principles, it's time to use an ML model.
https://www.youtube.com/watch?v=8IZwhbsjhvE (From Zettabytes to a Few Precious Events: Nanosecond AI at the Large Hadron Collider by Thea Aarrestad)
Page: https://www.scylladb.com/tech-talk/from-zettabytes-to-a-few-...
(Probably not for this here though.)
- FPGAs like this one are generally COTS.
- All the experiments use GPUs which come straight from the vendors.
- Most of the computing isn't even on site, it's distributed around the world in various computing centers. Yes they also overflow into cloud computing but various publicly funded datacenters tend to be cheaper (or effectively "free" because they were allocated to CERN experiments).
Some very specific elements (those in the detector) need to be radiation hard and need O(microsecond) latency. These custom electronics are built all over the world by contributing national labs and universities.
CERN builds a bit.
Who says CERN needs to be cost effective?
I was just answering that question. An LLM's logic lives in weights produced by machine learning, so yes. I wasn't really saying anything about the article.
Much of the early AI research was spent on developing various algorithms that could play board games.
Didn't even need computers, one early AI was MENACE [1], a set of 304 matchboxes which could learn how to play noughts and crosses.
[1] https://en.wikipedia.org/wiki/Matchbox_Educable_Noughts_and_...
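The MENACE mechanism fits in a few lines: each matchbox holds beads for the legal moves of one board state, a move is chosen by drawing a bead at random, and the box gains or loses beads depending on the outcome. A toy sketch with a single box and an invented one-move game (the exact bead counts and rewards here are made up, not Michie's originals):

```python
import random

random.seed(42)
WINNING_MOVE = 2
box = {0: 8, 1: 8, 2: 8}   # bead counts: one colour per legal move

def draw(box):
    # draw a bead uniformly at random: probability proportional to count
    beads = [m for m, n in box.items() for _ in range(n)]
    return random.choice(beads)

for _ in range(500):
    move = draw(box)
    if move == WINNING_MOVE:
        box[move] += 3                         # reward: add beads
    else:
        box[move] = max(1, box[move] - 1)      # punish: remove a bead, keep >= 1

best = max(box, key=box.get)
```

After a few hundred games the winning move dominates the box, which is the whole trick: reinforcement learning with no computer required.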
I didn't know what a Jujube was, but I got the idea.
So they aren't "burned into silicon" then? The article mentions FPGAs and ASICs but it's a bit vague. I would be surprised if ASICs actually made sense here.
> To meet these extreme requirements, CERN has deliberately moved away from conventional GPU or TPU-based artificial intelligence architectures.
This isn't quite right either: CERN is using more GPUs than ever. The data processing has quite a few steps and physicists are more than happy to just buy COTS GPUs and CPUs when they work.
Some tried to hold out and keep calling it "ML" or just "neural networks" but eventually their colleagues start asking them why they aren't doing any AI research like the other people they read about. For a while some would say "I just say AI for the grant proposals", but it's hard to avoid buzzwords when you're writing it 3 times a day I guess.
Although note that the paper doesn't say "AI". The buzzword there is "anomaly detection" which is even weirder: somehow in collider physics it's now the preferred word for "autoencoder", even though the experiments have always thrown out 99.998% of their data with "classical" algorithms.
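The "anomaly detection" recipe itself is simple to sketch: fit a model to reconstruct "normal" events, then flag events it reconstructs poorly. A toy version using a linear (PCA-style) autoencoder on synthetic data, purely to illustrate the idea, not the experiments' actual models:

```python
import numpy as np

rng = np.random.default_rng(1)

# "Normal" events live on a 2-D subspace of an 8-feature space.
normal = rng.normal(size=(5000, 2)) @ rng.normal(size=(2, 8))
mu = normal.mean(axis=0)
_, _, Vt = np.linalg.svd(normal - mu, full_matrices=False)
encoder = Vt[:2]                        # 8 -> 2 bottleneck (rank-2 PCA)

def reconstruction_error(x):
    z = encoder @ (x - mu)              # encode
    xhat = mu + encoder.T @ z           # decode
    return np.sum((x - xhat) ** 2)      # anomaly score

typical = normal[0]                     # on the learned subspace
weird = rng.normal(size=8) * 5.0        # off the learned subspace
```

Events like `typical` reconstruct almost perfectly; events like `weird` score high and would be kept by the trigger. The nonlinear (autoencoder/VAE) versions replace the projection with a trained network but score events the same way.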
Isn’t this kind of approach feasible for something so purpose-built?
> CERN is using extremely small, custom large language models physically burned into silicon chips to perform real-time filtering of the enormous data generated by the Large Hadron Collider (LHC).
> This work represents a compelling real-world demonstration of “tiny AI” — highly specialised, minimal-footprint neural networks
FPGAs for neural networks have been a thing since before the LLM era.
> [ GENEVA, SWITZERLAND — March 28, 2026 ] — CERN is using extremely small, custom large language models physically burned into silicon chips to perform real-time filtering of the enormous data generated by the Large Hadron Collider (LHC).
Like (~9K) Jumbo Frames!
Like anything else, once you work with a system, it gives you ten ideas where to go next...
Are you perhaps confusing Groq with the Etched approach? IIUC Etched is the company that "burned the transformer onto a chip". Groq uses LPUs that are more generalist (they can run many transformers and some other architectures) and their speed comes from using SRAM.
I think a better question would be "when are FPGAs going to stop being so ridiculously overpriced". That feels more possible to me (but still unlikely).
5 years ago we would've called it a Machine Learning algorithm. 5 years before that, a Big Data algorithm.
> 5 years before that, a Big Data algorithm.
The DNN part? Absolutely not.
I don’t know why people feel the need for such revisionism, but AI has been a field encompassing things far more basic than this for longer than most commenters have been alive.
When I was 13, having just started programming, I picked up a book from a "junk bin" at a book store on Artificial Intelligence. It must have been from the mid-80s if not older.
It had an entire chapter on syllogisms[1] and how to implement a program to spit them out based on user input. As I recall, it basically amounted to some string extraction (assuming the user followed a template) and string concatenation to generate the result. I distinctly recall not being impressed that such a trivial thing was part of a book on AI.
In the 1990s I remember taking my friend's IRC chat history and running it through a Markov model to generate drivel, which was really entertaining.
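That trick is tiny to reproduce: build a table of which word followed which in the log, then random-walk it. A minimal word-level sketch (the log text here is invented):

```python
import random
from collections import defaultdict

def build_model(text):
    # table of "word -> list of words that followed it" in the corpus
    model = defaultdict(list)
    words = text.split()
    for cur, nxt in zip(words, words[1:]):
        model[cur].append(nxt)
    return model

def generate(model, start, length=10, seed=0):
    random.seed(seed)
    out = [start]
    for _ in range(length - 1):
        followers = model.get(out[-1])
        if not followers:
            break                      # dead end: no known successor
        out.append(random.choice(followers))
    return " ".join(out)

log = "the bot said the cat said the dog ran home"
model = build_model(log)
```

Running `generate(model, "the")` produces locally plausible, globally nonsensical drivel, which is exactly the entertainment value described.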
> The AXOL1TL V5 architecture comprises a VICReg-trained feature extractor stacked on top of a VAE.
After training it fully, we moved on to the inference stage, trying it on the round counts we didn't have data for. It turned out ... to have zero predictive ability on data it hadn't seen before. This is on well-structured, sensible extrapolations from what worked at lower round counts, and what could be selected based on real algebraic correlations. This mini neural network isn't part of our pipeline now.
[1] screenshot: https://taonexus.com/publicfiles/mar2026/neural-network.png