Show HN: CodeTracer – A time-traveling debugger implemented in Nim and Rust (github.com)

334 points by alehander42 4 months ago | 25 comments

JoeAltmaier 4 months ago
Neat!
Years and years ago I had the opportunity to give Intel processor designers (the time of the 386!) requests for features.
I requested a system tick timer for stamping logs (they did that), bus mask and value registers that triggered a debug interrupt on a match (they did that).
And a jump source history. Maybe 10 jumps back. So on a breakpoint you could figure out how you got there. A time travelling debug feature.
At this point Intel sold an expensive debug probe for recording the bus, you plugged this insane cable into the processor socket and it actually executed in their external hardware, recording every instruction.
My jmp history would have replaced much of that, obviating its need for the vast majority of users.
Ah well, it didn't happen. So now we all rebuild code 'debug' so we can add tracing and tracking, disrupting the execution path, changing timing and code size and on and on.
I always regretted not getting that.
- __init 4 months ago
  Intel x86 cores have had Last Branch Records (LBRs) and Branch Trace Store (BTS) since at least Merom in 2006 [1][2]. Nowadays, there's Processor Trace (PT) or Precise Event-Based Sampling (PEBS) which can provide even more information. PT in particular is almost purpose-built to enable this kind of trace reconstruction.
  [1] https://stackoverflow.com/questions/14670586/what-is-the-ove...
  [2] The MSRs for LBRs (MSR_LASTBRANCH_*_{TO,FROM}_IP) and BTS (IA32_DS_AREA) are described in Volume 4, Section 2.2 of the SDM: "MSRS IN THE INTEL® CORE™ 2 PROCESSOR FAMILY". Core 2 was launched in 2006.
  - Veserv 4 months ago
    Those are only sufficient to produce an execution trace, not the full data trace that time-travel debugging needs.
    They are sufficient, though, for what the person you responded to asked for, which is just an execution trace.
- phire 4 months ago
  I'm not entirely surprised, adding 40 bytes of SRAM to the 386 for a debug-only feature would have been hard to justify.
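For illustration, the jump-source history requested above behaves like a small ring buffer: each taken jump overwrites the oldest entry, and at a breakpoint you read the entries back newest first. This is a software model only, not how any Intel feature is implemented. The class name `JumpHistory`, the addresses, and the 4-byte entry size (one 32-bit source address per jump, which would line up with the 40-byte figure for 10 entries) are assumptions.

```python
from collections import deque

BYTES_PER_ENTRY = 4  # assumption: one 32-bit source address per recorded jump


class JumpHistory:
    """Keeps only the source addresses of the most recent `depth` jumps."""

    def __init__(self, depth=10):
        self.depth = depth
        # A bounded deque silently drops the oldest entry when it is full.
        self._sources = deque(maxlen=depth)

    def record(self, source_address):
        self._sources.append(source_address)

    def backtrace(self):
        """Most recent jump first, as you would want it at a breakpoint."""
        return list(reversed(self._sources))

    def storage_bytes(self):
        return self.depth * BYTES_PER_ENTRY


history = JumpHistory(depth=10)
for i in range(25):  # 25 jumps happen, only the last 10 survive
    history.record(0x1000 + 4 * i)

trail = history.backtrace()
print([hex(a) for a in trail[:3]])  # ['0x1060', '0x105c', '0x1058']
print(history.storage_bytes())      # 40
```

The trade-off is the one raised in the replies: this gives you the path into the breakpoint, but none of the data values along the way.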
cxie 4 months ago
The Noir support makes sense given its use in ZK proofs where execution tracing is particularly valuable, but I'm really looking forward to the Python and Ruby implementations. Those languages' dynamic nature makes bugs particularly elusive sometimes.
Has anyone here tried using this with Noir yet? I'm curious about the performance overhead of the tracing mechanism, especially for longer-running programs. Also wondering if there are plans to support JavaScript/TypeScript for web development use cases.
- cxie 4 months ago
  The planned RR recordings integration is what I'm most excited about though. Having this capability for systems languages like Rust and C++ would be transformative for complex debugging scenarios where you're often forced to restart debugging sessions from scratch after stepping past a crucial point.
  - alehander42 4 months ago
    the support for system languages (the rr integration "backend") is currently closed source.
    It's not ready yet, and it might be proprietary: it would be great if we can open source it, if we find a sustainable business model for that
  - Apofis 4 months ago
    This happens all the time and is super irksome. Being able to step backwards as well as forwards is super cool. Also, being able to do that with a loop using a slider is cool.
    cxie 4 months ago
    I need a VSCode extension for this. But alas, it's just sitting in their roadmap... Typical. Guess I'll have to roll up my sleeves and build one myself. Not like I have enough on my plate already. At least their trace files are in an open format, so it shouldn't be impossible to hook into the VS Code debugging API.
    alehander42 4 months ago
    We'd love additional contributors! We also have some more detailed plans for such an extension. If you're interested in chatting about it, you can join our discord[1] (or we can expand here/in a github issue as well)
    1: https://discord.com/invite/aH5WTMnKHT
slifin 4 months ago
If you are using Clojure or ClojureScript check out FlowStorm:
https://www.flow-storm.org/
- udkl 4 months ago
  https://github.com/coekie/flowtracker for Java
- alehander42 4 months ago
  Very impressive! Lisp people are always good at tooling.
dloss 4 months ago
Noir is a Domain Specific Language for SNARK proving systems. https://noir-lang.org/
- 01HNNWZ0MV43FF 4 months ago
  I see, I see. And what is a SNARK proving system?
  - michaelsbradley 4 months ago
    This paper is a great resource if you're unfamiliar with zk-SNARK and how it works:
    Why and How zk-SNARK Works (2019)
    https://arxiv.org/abs/1906.07221
  - conradludgate 4 months ago
    "Succinct Non-interactive Arguments of Knowledge", it's a system for zero-knowledge proofs, which allow proving a fact of some kind without disclosing the inputs
throw-the-towel 4 months ago
Just out of curiosity, why did you use two languages to write CodeTracer and not just one of them?
- alehander42 4 months ago
  Nim is the original language we use. Zahary is a prolific contributor to Nim, and we have a good relationship with the Nim community, they've helped a lot!
  Nim and some Python are used for our closed source rr backend currently, and the frontend is written in Nim (compiling to JavaScript).
  The backend for blockchain and scripting languages, which is open source, is newer, and we used Rust there for several reasons. One of them is that many blockchain languages are implemented in Rust, which makes it easier to interoperate/contribute. There are other aspects as well: both languages have pros and cons.
  Some pros of Nim are e.g. its metaprogramming support and the ability to easily share code/types between backend and frontend (it's an alternative to both e.g. C++/Go and TypeScript for us).
  We're thankful to both language communities!
rubenvanwyk 4 months ago
Looks really cool, but in production systems, won't the trace files proliferate at extreme speed? How would you correlate the files to a certain session for user identification for example?
- alehander42 4 months ago
  We are also planning to develop a distributed tracing platform, similar to Jaeger and OpenTelemetry, that continuously records the execution of many distributed processes (e.g. micro-services).
  Unlike the existing platforms, which capture only message flows and require you to make educated guesses when some anomaly is observed, our system will let you accurately replay the processing code for each message to quickly identify the root cause for the anomaly.
  This would rely on our ability to jump to the specific moment in time when a certain incoming message starts being processed. This moment can be identified either by a log line with a specific format or by a call to some special tracking function (e.g. track_incoming_message(request_id)).
  For the system languages, the RR[1] recordings try to be practical by capturing only the non-deterministic events in the program execution. You can pair this with a ring buffer that discards the data after a certain retention period.
  For the scripting languages (or any implementation using the db-like traces) we might add some advanced record filtering options.
  (But maybe we are misunderstanding the question?)
  1: https://rr-project.org/
  - Veserv 4 months ago
    You cannot just discard the oldest data of a long-running execution trace when doing replay-based time-travel debugging.
    Replay needs a known state plus all of the non-determinism recorded after that state, which is most easily done by starting from the initial state. To discard data, you first need to materialize a state snapshot at the cut-off point so that execution can be reconstructed forward from it.
    alehander42 4 months ago
    You're right. In the RR case this is not merged yet, but an RR contributor is working on persistent checkpoints, which can act as snapshots.
- kreco 4 months ago
  Especially since the trace files are in .json. [0]
  [0] https://github.com/metacraft-labs/runtime_tracing#format
  - alehander42 4 months ago
    True! The next major version should use a more optimized format, as mentioned.
    However, some of the important optimizations we're preparing are less about the format itself and more about recording more specific things and reconstructing more in postprocessing.
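The snapshot point raised earlier in this thread (old replay data can only be discarded once a snapshot exists) can be made concrete with a toy model. This is a sketch under stated assumptions, not RR's or CodeTracer's design: the `Recording` class, its integer "state", and the arithmetic `step` function are all hypothetical stand-ins for a real program and its recorded non-deterministic inputs.

```python
class Recording:
    """Toy replay log: a known snapshot plus the inputs recorded after it."""

    def __init__(self, initial_state=0):
        self.snapshot = initial_state  # state that replay starts from
        self.snapshot_step = 0         # step at which that state was valid
        self.events = []               # non-deterministic inputs since then

    def record(self, value):
        self.events.append(value)

    @staticmethod
    def step(state, value):
        # Deterministic transition: same state + same input => same result.
        return state * 31 + value

    def replay(self, until_step):
        """Reconstruct the state after `until_step` recorded inputs."""
        last_step = self.snapshot_step + len(self.events)
        if not self.snapshot_step <= until_step <= last_step:
            raise ValueError("step is outside the retained recording")
        state = self.snapshot
        for value in self.events[: until_step - self.snapshot_step]:
            state = self.step(state, value)
        return state

    def discard_before(self, step):
        """Drop old inputs safely by folding them into a new snapshot first."""
        self.snapshot = self.replay(step)
        self.events = self.events[step - self.snapshot_step:]
        self.snapshot_step = step


inputs = [3, 1, 4, 1, 5, 9]
rec = Recording()
for value in inputs:
    rec.record(value)
full = rec.replay(6)

# Naive truncation: keep the last two inputs but restart from the initial state.
naive = Recording()
for value in inputs[4:]:
    naive.record(value)
naive_result = naive.replay(2)

# Snapshot-based truncation keeps the replayed state intact.
rec.discard_before(4)
truncated = rec.replay(6)
```

Here `truncated == full` while `naive_result != full`, which is why a ring buffer over the recording only works when paired with snapshots such as the persistent checkpoints mentioned above.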
pzo 4 months ago
I love it, I always wished for something like that. Will try to test it with Python later. Wish there was also something for JS/TS. As for the rr debugger, has it gained any support for macOS, Windows, or Android by now? I'm also wondering how heavy those recordings are for typical apps.
This would also be great for giving an LLM some context via an MCP server, or even letting the LLM pick which variables' history it wants to see instead of handing it the full recording file.
Also nice would be some recording filtering, e.g. picking a few variables and displaying their history over the whole execution, maybe with some specific formatting; numeric variables, tensors, images, etc. could even be passed to rerun for visual debugging, so you can plot them.
- alehander42 4 months ago
  Thank you!
  The Python initial prototype is not yet finished. It's easy to play with, so anyone interested can actually work on it! Currently, in the experimental tracers, Ruby is usable for smaller programs, so one can try Ruby immediately.
  I do plan on improving some of the prototypes, and on adding additional ones, e.g. for Lua, but JavaScript (e.g. V8) is also a good target. Scripting language users who find it useful are welcome to discuss/chat with us, or even directly contribute or propose support for new languages.
  A form of record filtering is planned indeed.
  We have experimented with automatic chart visualizations of some things, we've planned custom visual representation as well, great to see interest in those
- anougaret 4 months ago
  I just released a JS/TS/Python time travel debugger that overlays variable values on top of VS Code. It's just an npm or pip install plus a VS Code extension, so it might be easier for you to use: https://github.com/dedale-dev/ariana I'm also planning to add MCP integration today actually!
- vlovich123 4 months ago
  > Wish there was also for JS/TS.
  There’s Replay for browsers and Wallaby for Node.
forrestthewoods 4 months ago
Windows support? What languages? How does it work?
I don’t like that the headline is “designed to support multiple languages” but it only actually supports an obscure language I’ve never heard of. Feels like a bait and switch.
- alehander42 4 months ago
  We're working on Windows support for the scripting and blockchain languages.
  I am sorry if the headline felt misleading or the current support disappointing: we do have experimental Ruby support, that you can try right now if you record a `<somepath>.rb` program.
  We do design the frontend, trace format/lib and backends to support multiple languages. Ruby already has experimental support, and we try to keep various other languages/usecases in mind. We hope to find a model that lets us work more on supporting many more scripting languages. We'd also love contributors/the community adding support for languages or contributing to codetracer itself!
  We also do have a closed source backend based on RR[1] that has partial C/Rust/Nim support, but it is not yet ready. It might be released as a proprietary solution. (However if we find an alternative sustainable business model, it would be great to be able to open source it.)
  The scripting/blockchain languages backend is more db-like: it collects a trace by hooking into tracing APIs or instrumenting/patching VMs (the trace is later postprocessed before replay).
  The system languages backend is based on RR[1] recordings currently.
  We'd be happy to discuss more usecases or languages!
  1: https://rr-project.org/
esafak 4 months ago
Thank you for building up the nim ecosystem.
- hugs 4 months ago
  came here to say the same. the big problem with nim is not enough people use it. and the way to fix it is with more people using it. (classic catch-22). i struggle with this myself, i might be hiring soon and i know it's going to be hard to find nim programmers. current plan is to recruit python and js developers who wouldn't mind also coding in nim when we'd otherwise need to drop down to the c/c++ layer to integrate with some low-level library for speed/efficiency.
elcritch 4 months ago
Very excited for this! I donated on open collective already. The team is full of talented people. A nice interface to time travel debugging, with Nim support soon nonetheless.
Though if it uses rr it won’t be able to run on macOS. Bummer, macOS seems to get harder and harder to debug on. Luckily lima vms make it easy to remote :/
tester756 4 months ago
Hi alehander42
I've been searching for something like this, so my question is
I have two almost identical versions of a program, 1 and 1.01, and I need to find out how their behaviour changed.
So, I run both of them (./binary1.exe input.txt and ./binary2 input.txt) and record their execution with your tool.
And now, I'd want to extract this kind of data from your tool:
Visited functions and how locals were changing, e.g.
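```
#include <cstdlib>
#include <iostream>

int test(int n) {
    n++;

    std::cout << n << std::endl;

    n += 15;

    if (n > 22)
    {
      n--;
    }
    return n + 1;
}

int main(int argc, char* argv[]) {
    // argv[1] is text, so convert it before passing it to test()
    auto result = test(std::atoi(argv[1]));

    std::cout << result << std::endl;
}
```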
Visited Function: main with arguments (argc: 2, argv ["path", 7])
Visited Function test with arguments (n: 7)
test: N set to 8
test: N set to 23
test: Entered If (n > 22):
N set to 22
Exit If (n > 22)
test: N set to 23
Exit function test
Main: result set to 23
Exit function main
Can I achieve it with your tool / recording data format?
muizelaar 4 months ago
How does the implementation compare to RR?
- alehander42 4 months ago
  We are building our future support for the system languages for now directly on top of RR recordings: credit to Robert (roca) and Kyle and all other contributors for RR and Pernosco, they're amazing technologies.
  We've researched possible alternative approaches/tools as well, especially keeping in mind Windows/Mac support.
  The traces for Noir and the scripting languages work in a completely different way, capturing all the relevant data, which is later indexed into a db-like structure. With some future optimizations this can be very useful for various shorter programs in scripting languages, and generally for blockchain languages (as the running time there is usually low), and we hope that eventually, with flexible record filtering, it can be practical even for capturing important segments/aspects of long-running real world projects.
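To make "indexed into a db-like structure" a little more concrete, here is a minimal sketch of what such an index could look like. It is a guess at the general idea, not CodeTracer's actual schema: the `(step, variable, value)` event shape, the `TraceIndex` class, and its method names are all assumptions.

```python
from bisect import bisect_right


class TraceIndex:
    """Indexes (step, variable, value) events for random-access queries."""

    def __init__(self, events):
        self._steps = {}   # variable -> sorted list of step numbers
        self._values = {}  # variable -> values recorded at those steps
        for step, var, value in sorted(events, key=lambda e: e[0]):
            self._steps.setdefault(var, []).append(step)
            self._values.setdefault(var, []).append(value)

    def history(self, var):
        """Every recorded (step, value) pair for one variable."""
        return list(zip(self._steps.get(var, []), self._values.get(var, [])))

    def value_at(self, var, step):
        """Value of `var` as of `step`, or None if it was not assigned yet."""
        steps = self._steps.get(var, [])
        pos = bisect_right(steps, step)
        return self._values[var][pos - 1] if pos else None


events = [
    (1, "n", 7),
    (2, "n", 8),
    (3, "n", 23),
    (5, "n", 22),
    (6, "result", 23),
]
index = TraceIndex(events)

print(index.history("n"))      # [(1, 7), (2, 8), (3, 23), (5, 22)]
print(index.value_at("n", 4))  # 23
print(index.value_at("n", 0))  # None
```

Once everything is captured up front, stepping backwards is just a lookup at a smaller step number, with no re-execution needed. The cost is trace size, which is where the record filtering mentioned above comes in.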
kingforaday 4 months ago
Congrats on the release! Looks like you have done a great job so far. Doesn't fit a need for me at the moment, but I will keep an eye out for the alternative back-end evolution and additional programming language support. Thanks!
profstasiak 4 months ago
For frontend there is https://www.replay.io/
I loved debugging with that when I was working on a React web app.
Wish there was something like this for React Native :(
dinnertime 4 months ago
Congrats on your release!
One question I have is, how exactly does it record and what are the boundaries of the recording?
For example does it only record the userland execution of a single process, or does it have broader boundaries like including kernel code and/or execution of multiple processes? How does it handle shared memory regions that may be modified outside of the recording?
- alehander42 4 months ago
  We are currently working on two "backends" where recording works in different ways.
  For the scripting languages and smart contract/ZK languages, we instrument the interpreters using high-level hooking APIs or direct patches, and we produce a trace.
  For system languages, we directly build on top of RR[1] recordings for now: RR can record multiple processes, and it works in userland. IIRC it doesn't support modifications of shared memory outside of the recording. It's very well documented in their paper: Engineering Record And Replay For Deployability: Extended Technical Report[2].
  1: https://rr-project.org/
  2: https://arxiv.org/abs/1705.05937
Apofis 4 months ago
Any chance of something like this being available for Java/Kotlin on JVM?
- aoli-al 4 months ago
  https://github.com/cmu-pasta/fray
  Fray is a controlled concurrency testing tool for the JVM that supports record and replay. It could be a perfect backend for codetracer. (I'm the author of Fray)
- bramhaag 4 months ago
  https://undo.io/ supports Java
  - Apofis 4 months ago
    You don't see software going for $8000 frequently! Wow!
jv22222 4 months ago
Would love this for JavaScript if that was even possible.
Alifatisk 4 months ago
Even support for D? Wow they thought of everything
- alehander42 4 months ago
  A contributor to the D language helped us with many aspects of the project. He would really like to see D support haha!
optymizer 4 months ago
Is there anything out there for Android?
nextn 4 months ago
How does it record?
jedisct1 4 months ago
Because in an HN title, the language a tool is written in or the kind of music the author listens to matters more than what the tool actually does.
- esafak 4 months ago
  You can appreciate it for building up good, under-supported languages like nim, zig, and D.