We also plan to convert structured logs into OpenTelemetry attributes [2].
[1] https://demo.coroot.com/p/tbuzvelk/applications/default:Depl...
[2] https://github.com/coroot/coroot/issues/490
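To give a rough idea of what turning a structured log line into attributes could look like, here is a minimal sketch. It is not Coroot's implementation: the function name `log_to_attributes`, the `log.` key prefix, and the choice to serialize lists and nulls as JSON strings are all assumptions made for illustration.

```python
import json


def log_to_attributes(line, prefix="log"):
    """Flatten one JSON log line into a flat dict of attribute key/value pairs.

    Hypothetical helper: the dotted key scheme and the "log" prefix are
    illustrative choices, not something defined by Coroot.
    """
    record = json.loads(line)
    attrs = {}

    def walk(key, value):
        if isinstance(value, dict):
            # Nested objects become dotted keys, e.g. log.http.status
            for child_key, child in value.items():
                walk(f"{key}.{child_key}", child)
        elif isinstance(value, (str, bool, int, float)):
            # Primitive values are kept as they are
            attrs[key] = value
        else:
            # Lists, nulls and anything else are kept as a JSON string
            attrs[key] = json.dumps(value)

    walk(prefix, record)
    return attrs


line = '{"level": "error", "http": {"status": 500, "path": "/cart"}, "tags": ["a", "b"]}'
attrs = log_to_attributes(line)
```

With flat keys like these, a backend can filter or group log records by a single field (say, `log.http.status`) without re-parsing the message body.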
Couple of questions:
1. What's the overhead of tracing + logging observed by users? I see many tools being built on top of the OpenTelemetry eBPF tracer, which is nice to see.
2. The OpenTelemetry eBPF tracer uses sampling to capture traces. Do other types of logging in the tool use sampling as well (HTTP traces)?
3. When finding SLO violations, can this tool find the bug if the latency spikes do not happen frequently (i.e., a latency spike happens once every 5 minutes to 1 hour)? I'm curious whether the team has experienced such events, and whether those pmax latencies even matter to customers, since they may not happen frequently.
4. I see that the flamegraph is a CPU flamegraph. Does off-CPU sampling matter (disk, network, etc.)? Or does the CPU flamegraph provide enough for developers to solve the issue?
2. Coroot’s agent captures pseudo-traces (individual spans) and sends them to a collector via OTLP. This stream can be sampled at the collector level. In high-load environments, you can disable span capturing entirely and rely solely on eBPF-based metrics for analysis.
3. We’ve built automated root cause analysis to help users explain even the slightest anomalies, whether or not SLOs are violated. Under the hood, it traverses the service dependency graph and correlates metrics — for example, linking increased service latency to CPU delay or network latency to a database. [2]
4. Currently, Coroot doesn’t support off-CPU profiling. The profiler we use under the hood is based on Grafana Pyroscope’s eBPF implementation, which focuses on CPU time.
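To illustrate what sampling the span stream at the collector level (point 2) can mean in practice, here is a minimal sketch of deterministic, hash-based sampling keyed on the trace ID. It is an assumption-laden illustration, not how Coroot or any particular collector implements it; `keep_span` and its parameters are hypothetical names.

```python
import hashlib


def keep_span(trace_id: str, ratio: float) -> bool:
    """Keep roughly `ratio` of traces, decided by a hash of the trace ID.

    Hashing the ID (instead of rolling a random number) means every span
    with the same trace ID gets the same keep/drop decision.
    """
    if ratio <= 0.0:
        return False
    if ratio >= 1.0:
        return True
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1)
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < ratio


# Hypothetical trace IDs, just to see the effective sampling rate
ids = [f"trace-{i}" for i in range(10_000)]
kept = sum(keep_span(t, 0.3) for t in ids)
```

Because the decision depends only on the ID, a span kept at a low ratio is also kept at any higher ratio, so the rate can be raised later without losing continuity.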
[1]: https://docs.coroot.com/installation/performance-impact
[2]: https://demo.coroot.com/p/tbuzvelk/anomalies/default:Deploym...
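The idea in point 3 (walk the dependency graph, then correlate metrics with the latency anomaly) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not Coroot's actual algorithm: the service names, metric names, data, and the use of plain Pearson correlation are all made up for the example.

```python
from collections import deque
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_suspects(start, depends_on, metrics, latency):
    """Walk dependencies breadth-first from `start` and rank every metric
    found along the way by |correlation| with the `latency` series."""
    seen, queue, scored = {start}, deque([start]), []
    while queue:
        service = queue.popleft()
        for name, series in metrics.get(service, {}).items():
            scored.append((abs(pearson(latency, series)), service, name))
        for dep in depends_on.get(service, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(scored, reverse=True)


# Hypothetical topology and metrics: frontend -> cart -> db
depends_on = {"frontend": ["cart"], "cart": ["db"], "db": []}
latency = [10, 11, 10, 40, 42, 11]  # frontend latency with one spike
metrics = {
    "frontend": {"cpu_delay": [1, 1, 1, 1, 1, 1]},
    "cart": {"cpu_delay": [2, 3, 2, 3, 2, 3]},
    "db": {"network_latency": [1, 1, 1, 9, 10, 1]},
}
ranked = rank_suspects("frontend", depends_on, metrics, latency)
```

In this made-up data the database's network latency spikes at the same time as the frontend latency, so it ends up at the top of `ranked`. A real system would need to deal with time alignment, noise, and correlation that is not causation.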
From a user perspective, having several tools that overlap heavily but differ in subtle ways makes evaluation and adoption harder. It feels like if any two of these projects consolidated, they’d have a good shot at becoming the "default" eBPF observability solution.
At Coroot, we use eBPF for a couple of reasons:
1. To get the data we actually need, not just whatever happens to be exposed by the app or OS.
2. To make integration fast and automatic for users.
And let’s be real: if all the right data were already available, we wouldn’t be writing all this complicated eBPF code in the first place :)
- Accurate distributed traces with eBPF, including context propagation. Without going into other tools, I highly recommend trying to generate distributed traces using any other eBPF solution and observing the results firsthand.
- We are agent-only. Our data is produced in OpenTelemetry format, allowing you to integrate it seamlessly with your existing observability system.
I hope this clarifies the differences.
Can I use Coroot to show my existing data, without it taking control of my DDL?
Basically anywhere you'd previously have needed to write a kernel module. Now user space can load arbitrary code into the kernel, and that code is verified to be safe so it won't crash the kernel.
You can also now write custom schedulers in eBPF with sched_ext.