Open-sourcing circuit tracing tools(www.anthropic.com)

161 pointsby jlaneve8 days ago7 comments

rob-olmos8 days ago
Anthropic employees Sholto Douglas & Trenton Bricken did an interview recently with Dwarkesh Patel, pieces here and there was about the circuit tracing insights.
https://www.dwarkesh.com/p/sholto-trenton-2 -- search the transcript for "circuit" for the quick bits.
Eg, "If you look at the circuit, you can see that it's not actually doing any of the math, it's paying attention to that you think the answer's four and then it's reasoning backwards about how it can manipulate the intermediate computation to give you an answer of four."
https://transformer-circuits.pub/
Tostino8 days ago
This type of stuff is really important in my opinion. Getting this type of stuff open sourced allows academics and other researchers to try and do this type of interpretability research on a more level playing field.
I think the more people looking at this the better. I have a feeling there will be some breakthroughs in identifying important circuits and being able to make more efficient model architectures that are bootstrapped from some identified primitives.
sanex8 days ago
The conversation about this on Dwarkesh was interesting and I'm glad we're getting access to the tool.
https://open.spotify.com/episode/3H46XEWBlUeTY1c1mHolqh?si=L...
jexp8 days ago
Imported the graph json into Neo4j
Have fun
https://gist.github.com/jexp/8d991d1e543c5a576a3f1ee70132ce7...
ofou8 days ago
Is this Garcon [1], or a new tool?
[1]: https://transformer-circuits.pub/2021/garcon/index.html
- e_ameisen8 days ago
  Hi, paper author here.
  This is a new tool which relies on existing introspection libraries like TransformerLens (which is similar in spirit to Garcon) to build an attribution graph. This graph displays intermediate computational steps the model took to sample a token.
  For more details on the method, see this paper: https://transformer-circuits.pub/2025/attribution-graphs/met....
  For examples of using it to study Gemma 2, check out the linked notebooks: https://github.com/safety-research/circuit-tracer/blob/main/...)
  We also document some findings on Claude 3.5 Haiku here: https://transformer-circuits.pub/2025/attribution-graphs/bio...)
Eduard8 days ago
thought this was about PCB tracing and was disappointed.
- dvh8 days ago
  If you only want to trace veroboards (stripboards) and not full blown PCBs I made a browser tool for that: https://github.com/dvhx/stripboard2schematic
  - Workaccount28 days ago
    By total coincidence I have a project at the prototype stage that I will be building (hopefully starting tonight) on a strip board. Thanks!
- 1wheel8 days ago
  It can be! Here's a circuit showing how the model processes "PCB tracing stands for" to output "printed":
  https://www.neuronpedia.org/gemma-2-2b/graph?slug=pcb-tracin...
- Henchman218 days ago
  Same here! Then I immediately thought: I wish people would stop misusing words followed by I guess I think I’m in charge of words now. Then, idly: I’m starting to resemble that “Old Man Yells at Cloud” meme
  Funny things, thoughts.
- AdamH121138 days ago
  Yeah, I actually have a decades-old two-layer board that I need to reproduce and I would love to be able to feed images of it into some sort of tool and have it generate a schematic (or at least a netlist) automatically.
  - duskwuff8 days ago
    It's not automatic, but one way I've seen people reverse-engineer PCBs (and ICs!) is to import scans of the subject in Kicad, then start tracing out the connections on screen.
- buescher8 days ago
  You and me both. The reverse engineering tools are out there even if most of the search results are AI slop that recommends common layout tools. If I really needed the work done though I'd just pay one of the overseas services and clean up from there.
- Archit3ch7 days ago
  I was excited for a moment.
- tacker20008 days ago
  haha same here
- mrheosuper8 days ago
  Same. Those AI bros keep stealing our terminology.
- asadm8 days ago
  ugh same here.
- forgotpwagain8 days ago
  thought this was about tracing neural circuits in the brain and was disappointed.
qtwhat8 days ago
Curious if we say "thank you", the model will be more activated and result in better answer. ^^