Should we be starting to prepare for the original HyperDX product to be deprecated and potentially move over to ClickStack?
HyperDX isn't being deprecated - as you can probably see on the marketing page, it's still prominently featured as an integral part of the stack, so nothing is changing there.
We do of course want to get users onto HyperDX v2 and the overall ClickStack pattern. This doesn't mean HyperDX is going away by any means - just that HyperDX is focused a lot more on the end-user experience, and we get to leverage the flexibility, learnings, and performance of a more exposed ClickHouse-powered core, which is the intent of ClickStack. On the engineering side, we're working on making sure it's a smooth path for both open source and cloud.
side note: weird, I thought I'd replied to this one already, but I've been dealing with spotty wifi today :)
Is HyperDX === ClickStack?
Is ClickStack = HyperDX + something closed source?
Is ClickStack just a cloud version of HyperDX?
Is it the same thing - HyperDX rebranded as ClickStack?
HyperDX v2, the version that is now stable and shipped in ClickStack, focuses more on the querying layer. It gives users more customization around ClickHouse (virtually any schema, any deployment).
Optionally, users can leverage other ways of getting data into ClickHouse, like Vector, S3, etc., but still use HyperDX v2 on top. Previously, in HyperDX v1, you _had_ to use OTel, our ingestion pipeline, and our schemas. This is no longer true in v2.
Let me know if this explanation helps
I'm just asking because you mention OTel and "other ways" in your post, and you must have a good overview of the options and where the market is headed.
My take is that OTel is overall the best investment: it's widely supported across the board by many companies and other vendors. It's also constantly being improved with interesting ideas like otel-arrow, which will make it even more performant (and columnar-friendly!).
We'll also continue to invest in the OTel ecosystem ourselves, making it easier and easier to get started :)
That being said, I'm not saying that the OTel collector is always the right choice; we want to meet users where they are. Some users have data that gets piped into S3 files, and we ingest off of an S3 bucket just due to how they've collected data; some use Vector due to its flexibility with VRL, its focus on logs, or the specific integrations it provides out of the box. So the answer is always - it depends :) but I do like OTel and think the future is bright.
But in this case there's probably no reason for you to :) These improvements will of course come to our cloud offering as we work on rolling out upgrades from HyperDX v1 to v2 in cloud.
I spent some time writing an integration for HyperDX after seeing this post and hope you can help me roll it out! Would love to add a new "integrations" section to my page that links to the docs on how to use HyperDX with LogLayer.
Does ClickStack have a way to ingest statsd data, preferably with Datadog extensions (which add tagging)?
Does ClickStack offer correlations across traces, logging, and metrics via unified service tagging? Does the UI offer the ability to link to related traces, logging, and metrics?
Why does the Elixir SDK use the hyperdx library instead of the otel library?
Are Notebooks on the roadmap?
What about OTel metrics is difficult?
You can set up receivers for other metrics sources like statsd or even the DD agent, so there's no need to immediately replace your metrics stack.
I’d certainly appreciate hearing success stories of OTEL + serverless.
Sometimes, you just want to export ALL of your metrics to the server and let it deal with histograms, aggregation, etc.
Another annoyance is the API: you can't just write `metrics.AddMeasurement('my_metric', 11)`; you have to create a `Meter` (which also requires a library name) and then use it.
OTel Metrics: I get it - it's specified as almost a superset of everyone's favorite metric standards, with config for push/pull, monotonic vs delta, exponential/"native" histograms, etc. I have my preferences as well (which would be a subset of the standard), but I get why a unifying standard needed to be flexible.
Statsd: The great thing about the OTel collector is that it allows ingesting a variety of different data formats, so you can take in statsd and output OTel or write directly to ClickHouse: https://github.com/open-telemetry/opentelemetry-collector-co...
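For example, a minimal collector config sketch for that statsd path might look like this (endpoints and database names are placeholders, and the DogStatsD tag behavior should be double-checked against the receiver docs for your collector version):

```yaml
# Minimal OpenTelemetry Collector (contrib) sketch: statsd in, ClickHouse out.
# Endpoints and database are placeholders - check the statsd receiver and
# ClickHouse exporter READMEs for the options your collector version supports.
receivers:
  statsd:
    endpoint: "0.0.0.0:8125"      # UDP listener for statsd packets
    aggregation_interval: 60s     # flush aggregated metrics once a minute
    # DogStatsD-style tags ("|#key:value") become metric attributes
exporters:
  clickhouse:
    endpoint: tcp://clickhouse:9000   # assumed ClickHouse native endpoint
    database: otel                    # assumed target database
service:
  pipelines:
    metrics:
      receivers: [statsd]
      exporters: [clickhouse]
```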
We correlate across trace/span IDs as well as resource attributes. Correlating logs and traces via span/trace ID is a pretty well-worn path across our product. Correlating metrics to the rest is natively done via resource attributes, and we primarily expose correlation for K8s-based workloads, with more to come. We don't do exemplars _yet_ to solve the more generic correlation case for metrics (though I don't think statsd can transmit exemplars).
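To make the resource-attribute part concrete, here's a rough collector-config sketch (the service name, endpoints, and pipeline layout are placeholders, not our exact setup) showing the same attributes being stamped onto all three signals so they can be joined downstream:

```yaml
# Sketch: stamp the same resource attributes onto logs, traces, and metrics
# so they can be correlated downstream. Names/endpoints are placeholders.
receivers:
  otlp:
    protocols:
      grpc: {}
      http: {}
processors:
  k8sattributes: {}               # enrich with k8s.pod.name, namespace, etc.
  resource:
    attributes:
      - key: service.name         # the join key across signals
        value: checkout           # placeholder service name
        action: upsert
exporters:
  clickhouse:
    endpoint: tcp://clickhouse:9000   # placeholder endpoint
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [k8sattributes, resource]
      exporters: [clickhouse]
    traces:
      receivers: [otlp]
      processors: [k8sattributes, resource]
      exporters: [clickhouse]
    metrics:
      receivers: [otlp]
      processors: [k8sattributes, resource]
      exporters: [clickhouse]
```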
Elixir: We try our best to support users wherever they are; the OTel SDK and ours have continued to change in parallel over time, and we'll likely want to re-evaluate whether we should start pointing towards the base OTel SDK for Elixir. We've been pretty early on the OTel SDK side across the board, so things continue to evolve - for example, our Deno OTel integration came out, I think, over a year before Deno officially launched one with native HyperDX documentation <3
Notebooks: Yes, it should land in an experimental state shortly, stay tuned :) There are a lot of exciting workflows we're looking to unlock with notebooks as well. If you have any thoughts in this direction, please let me know - I'd love to get more user input ahead of the first release.
I think this is enough features for me to seriously take a look at it as a Datadog alternative.
I'm primarily interested in logs, though, and the existing log-shipping pipeline is built around Vector on Kubernetes. Admittedly, Vector has an OTel sink in beta, but I'm curious if that's the best/fastest way to ship logs, especially given that the original data comes out of apps as plain JSON rather than OTel.
The current system is processing several TB/day and needs fairly serious throughput to keep up.
Vector supports directly writing into ClickHouse - several companies use this at scale (iirc Anthropic does exactly this; they spoke about it recently at our user conference).
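As a rough sketch (source, endpoint, and table names are made up for illustration), the Vector side can be as small as:

```yaml
# Sketch of a Vector pipeline writing Kubernetes logs straight into ClickHouse.
# Endpoint, database, and table are placeholders; the table must exist already.
sources:
  k8s_logs:
    type: kubernetes_logs             # tails pod logs on each node
sinks:
  clickhouse:
    type: clickhouse
    inputs: [k8s_logs]
    endpoint: http://clickhouse:8123  # ClickHouse HTTP interface
    database: logs                    # assumed database
    table: app_logs                   # assumed table
    skip_unknown_fields: true         # drop JSON keys the table doesn't have
```

Because the table and schema are yours, you keep control over ordering keys and codecs, which matters at the several-TB/day throughput you mentioned - and HyperDX v2 can then be pointed at that same table.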
Please give it a try and let us know how it goes! Happy to help :)
We don't have any lock-in to our ingestion pipeline or schema. Of course we optimize a lot for the OTel path, but it works perfectly fine without it too.
(The UI looks similar too, although I guess a lot of observability tools seem to adopt that kind of UI).
- We're flexible on top of any ClickHouse instance; you can use virtually any schema in ClickHouse and things will still work. Custom schemas are pretty important either for tuned high performance or once you're at a scale like Anthropic's. This also makes it incredibly easy to get started (especially if you already have data in ClickHouse).
- The above also means you don't need to buy into OTel. I love OTel, but some companies choose to use Vector, Cribl, S3, a custom writing script, etc. for good reasons. All of that is supported natively via the various ClickHouse integrations, which naturally means you can use ClickStack/HyperDX in those scenarios as well.
- We also have some cool tools around wrangling telemetry at scale, from Event Deltas (high-cardinality correlation between slow spans and normal spans to root-cause issues) to Event Patterns (clustering similar logs or spans together automatically with ML) - all of these help users dive into their data in easier ways than just searching & charting.
- We also have session replay capability - to truly unify everything from click to infra metrics.
We're built to work at the 100PB+ scale we run internally here for monitoring ClickHouse Cloud, but flexible enough to pinpoint, end to end, a specific user issue that gets brought up once in a support case.
There's probably a lot more I'm missing. Ultimately, from a product philosophy standpoint, we aren't big believers in the "3 pillars" concept, which tends to manifest as 3 silos/tabs for "logs", "metrics", and "traces" (this isn't just Signoz - it's across the industry). I'm a big believer that we're building tools to unify and centralize signals/clues in one place and give the right datapoint at the right time to the engineer. During an incident, I just think about what's the next clue I can get to root-cause an issue, not whether I'm in the logging product or the tracing product.
I found the UX very difficult to read. The monospace font, the unusually small text, the bold white and bright green text on a dark background… I found it a little more readable by changing the font to system-ui, but not by much. Please consider a more traditional style instead of leaning into the 80s terminal gimmick. This factor alone makes me want to not use it. It needs to be easy to read, not a pain to read.
Is Clickhouse the only stateful part of this stack? Would love to see compatibility with Rotel[0], a Rust implementation of the OTEL collector, so that this becomes usable for serverless runtime environments.
One key thing Datadog has is their own proprietary alternative to the OTEL collector that is much more performant.
I've also been in touch with Mike & Ray for a bit, who've told me they've added ClickHouse support recently, which makes the story even better :)
We're excited to test our ClickHouse integration with ClickStack, as we believe OTel and ClickHouse make for a powerful observability stack. Our open-source Rust OpenTelemetry collector is designed for high performance in resource-constrained environments. We'd love for you to check it out!
The dashboards and their creation are intuitive. Creating alerts and the like from Airflow logs is easy using their DSL. Connecting and sending notifications to things like Slack just works™.
So this is how we justify the Datadog costs: all the engineering time it saves (engineers are still expensive; AI hasn't replaced us yet) and how quickly we can move from raw logs and metrics to useful insights.
Beyond raw performance and cost-effectiveness, which are quite important at scale, we work a lot on making sure the application layer itself is intuitive to use. You can always play around with what ours looks like at play.hyperdx.io :)
While I do like the stack we have, it is a lot of components to run and configure. Don’t think we have ever had any issues once it was up and running.
Does anyone have any thoughts about how this compares? We don't have a huge amount of data: one month of metrics is about 200GB, and logs aren't a whole lot more - less than a TB, I think, for two weeks.
How would I self-host this in k8s? Would I deploy a ClickHouse cluster using the Altinity operator and then connect it using the HyperDX local mode, or what is the recommended approach to self-host ClickStack?
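For concreteness, here's roughly what I have in mind for the ClickHouse half (entirely hypothetical - the name, sizing, and storage are placeholders), with HyperDX then pointed at the resulting service:

```yaml
# Hypothetical Altinity operator resource for the ClickHouse half of the stack.
# Name, sizing, and storage are made-up placeholders.
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "clickstack"
spec:
  defaults:
    templates:
      dataVolumeClaimTemplate: data
  configuration:
    clusters:
      - name: "observability"
        layout:
          shardsCount: 1
          replicasCount: 2        # small HA setup
  templates:
    volumeClaimTemplates:
      - name: data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 500Gi
```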
You see an explosion in offerings, and then eventually it's whittled down to a handful of survivors.
So they switch to Prometheus and Grafana and now have to manage a Prometheus cluster. Far cheaper, but far more annoying.
It doesn't help that most software is typically ancient, spits out heaps of stack traces and wall-of-text output, doesn't use structured logging, and generally doesn't let itself be monitored easily.
So yeah, getting meaningful insights from a highly available observability stack will take some serious time and resources, and I can understand smaller companies just handing it over to a third party so they can get on with their core business (AKA easy billing).
We have this on the medium-term roadmap: investigating proper support for a compatibility layer such as ferret, or, more likely, just using ClickHouse itself as the operational data store.
I've seen single traces over 100KB of absolute pure randomness encoded as base64... Because! Oh and also, we have to pay for the service, so it looks important.
Sure, they tell you it is super helpful for debugging issues, but in a VERY large proportion of cases it is 1) WAY too much, and 2) never used anyway. And most of the time, what's interesting is the last 10 minutes of the debug version; you don't need a "service" for that.
/me gets down off his horse :-)
Otherwise, yes, you can authenticate against the other versions with an email/password (the email doesn't really do anything in the open source distribution - it's just a user identifier, but we keep it there to be consistent).
Because right now, without the message on HN here, I wouldn't know what "open source observability stack" meant: the webpage does not explain what HyperDX is, nor does it provide a link to it or its code. I was expecting the whole "Open Source Datadog" thing to be a ClickStack repo inside the ClickHouse GitHub, which is not found anywhere.
But other than that, congrats! I have long wondered why no one has built anything on top of ClickHouse to compete with Datadog / New Relic.
The ClickHouse DB opened up an ocean of open source "scalable" web analytics that wasn't previously available or possible. I am hoping we see this change happen again for observability platforms as well.
We started building SigNoz as an open-source alternative to Datadog/New Relic four years back, OpenTelemetry-native from day 1. We have shipped some good features on top of OpenTelemetry, and because of OTel's semantic conventions & our query builder, you can correlate any telemetry across signals.
That being said - as you've mentioned, so many different "store tons of data" apps have been enabled by ClickHouse. Observability is at a point where it's in the same category: ClickHouse can store a ton of data, OTel can help you collect/process it, and now we just need that analytics user-experience layer to present it to the engineers who need an intuitive way to dive into it all.