Should we be starting to prepare for the original HyperDX product to be deprecated and potentially move over to ClickStack?
HyperDX isn't being deprecated - as you can probably see on the marketing page, it's still prominently featured as an integral part of the stack, so nothing is changing there.
We do of course want to get users onto HyperDX v2 and the overall ClickStack pattern. This doesn't mean HyperDX is going away by any means - just that HyperDX is focused a lot more on the end-user experience, and we get to leverage the flexibility, learnings, and performance of a more exposed ClickHouse-powered core, which is the intent of ClickStack. On the engineering side, we're working on making sure it's a smooth path for both open source and cloud.
side note: weird, I thought I'd replied to this one already, but I've been dealing with spotty wifi today :)
Is HyperDX === ClickStack?
Is ClickStack = HyperDX + something closed source?
Is ClickStack just a cloud version of HyperDX?
Is it the same thing - HyperDX rebranded as ClickStack?
HyperDX v2, the version that is now stable and shipped in ClickStack, focuses more on the querying layer. It gives users more customization around ClickHouse (virtually any schema, any deployment).
Optionally, users can leverage other ways of getting data into ClickHouse, like Vector, S3, etc., but still use HyperDX v2 on top. Previously, in HyperDX v1, you _had_ to use OTel, our ingestion pipeline, and our schemas. This is no longer true in v2.
Let me know if this explanation helps
I'm just asking because you mention OTel and "other ways" in your post, and you must have a good overview of the options and where the market is headed.
My take is that OTel is overall the best investment: it's widely supported across the board by many companies and other vendors. It's also constantly being improved with interesting ideas like otel-arrow, which will make it even more performant (and columnar-friendly!).
We'll also continue to invest in the OTel ecosystem ourselves, making it easier and easier to get started :)
That being said, I'm not saying that the OTel collector is always the right choice; we want to meet users where they are. Some users have data that gets piped into S3 files, and we ingest off of an S3 bucket just due to how they've collected data; some use Vector due to its flexibility with VRL, its focus on logs, or the specific integrations it provides out of the box. So the answer is always - it depends :) but I do like OTel and think the future is bright.
But in this case there's probably no reason for you to :) These improvements will of course come to our cloud offering as we work on rolling out upgrades from HyperDX v1 to v2 in cloud.
I spent some time writing an integration for HyperDX after seeing this post and hope you can help me roll it out! Would love to add a new "integrations" section to my page that links to the docs on how to use HyperDX with LogLayer.
Does ClickStack have a way to ingest statsd data, preferably with Datadog extensions (which add tagging)?
Does ClickStack offer correlations across traces, logging, and metrics via unified service tagging? Does the UI offer the ability to link to related traces, logging, and metrics?
Why does the Elixir SDK use the hyperdx library instead of the otel library?
Are Notebooks on the roadmap?
What about OTel metrics is difficult?
You can set up receivers for other metrics sources like statsd or even the DD agent, so there's no need to immediately replace your metrics stack.
I’d certainly appreciate hearing success stories of OTEL + serverless.
Sometimes, you just want to export ALL of your metrics to the server and let it deal with histograms, aggregation, etc.
Another annoyance is the API: you can't just write `metrics.AddMeasurement('my_metric', 11)`; you have to create a `Meter` (which also requires a library name) and then use it.
OTel Metrics: I get it - it's specified as almost a superset of everyone's favorite metric standards, with config for push/pull, monotonic vs delta, exponential/"native" histograms, etc. I have my preferences as well (which would be a subset of the standard), but I get why a unifying standard needed to be flexible.
Statsd: The great thing about the OTel collector is that it allows ingesting a variety of different data formats, so you can take in statsd and output OTel or write directly to ClickHouse: https://github.com/open-telemetry/opentelemetry-collector-co...
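For example, a minimal collector config sketch for that statsd path might look like this (endpoints and database names are placeholders, and the DogStatsD tag behavior should be double-checked against the receiver docs for your collector version):

```yaml
# Minimal OpenTelemetry Collector (contrib) sketch: statsd in, ClickHouse out.
# Endpoints and database are placeholders - check the statsd receiver and
# ClickHouse exporter READMEs for the options your collector version supports.
receivers:
  statsd:
    endpoint: "0.0.0.0:8125"      # UDP listener for statsd packets
    aggregation_interval: 60s     # flush aggregated metrics once a minute
    # DogStatsD-style tags ("|#key:value") become metric attributes
exporters:
  clickhouse:
    endpoint: tcp://clickhouse:9000   # assumed ClickHouse native endpoint
    database: otel                    # assumed target database
service:
  pipelines:
    metrics:
      receivers: [statsd]
      exporters: [clickhouse]
```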
We correlate across trace/span IDs as well as resource attributes. Correlating logs and traces via span/trace ID is a pretty well-worn path across our product. Correlating metrics to the rest is natively done via resource attributes, and we primarily expose correlation for K8s-based workloads, with more to come. We don't do exemplars _yet_ to solve the more generic correlation case for metrics (though I don't think statsd can transmit exemplars).
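To make the resource-attribute part concrete, here's a rough collector-config sketch (the service name, endpoints, and pipeline layout are placeholders, not our exact setup) showing the same attributes being stamped onto all three signals so they can be joined downstream:

```yaml
# Sketch: stamp the same resource attributes onto logs, traces, and metrics
# so they can be correlated downstream. Names/endpoints are placeholders.
receivers:
  otlp:
    protocols:
      grpc: {}
      http: {}
processors:
  k8sattributes: {}               # enrich with k8s.pod.name, namespace, etc.
  resource:
    attributes:
      - key: service.name         # the join key across signals
        value: checkout           # placeholder service name
        action: upsert
exporters:
  clickhouse:
    endpoint: tcp://clickhouse:9000   # placeholder endpoint
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [k8sattributes, resource]
      exporters: [clickhouse]
    traces:
      receivers: [otlp]
      processors: [k8sattributes, resource]
      exporters: [clickhouse]
    metrics:
      receivers: [otlp]
      processors: [k8sattributes, resource]
      exporters: [clickhouse]
```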
Elixir: We try our best to support users wherever they are; the OTel SDK and ours have continued to change in parallel over time, and we'll likely want to re-evaluate whether we should start pointing towards the base OTel SDK for Elixir. We've been pretty early on the OTel SDK side across the board, so things continue to evolve - for example, our Deno OTel integration came out, I think, over a year before Deno officially launched one with native HyperDX documentation <3
Notebooks: Yes, it should land in an experimental state shortly, stay tuned :) There are a lot of exciting workflows we're looking to unlock with notebooks as well. If you have any thoughts in this direction, please let me know - I'd love to get more user input ahead of the first release.
I think this is enough features for me to seriously take a look at it as a Datadog alternative.
I'm primarily interested in logs, though, and the existing log-shipping pipeline is built around Vector on Kubernetes. Admittedly, Vector has an OTel sink in beta, but I'm curious if that's the best/fastest way to ship logs, especially given that the original data comes out of apps as plain JSON rather than OTel.
The current system is processing several TB/day and needs fairly serious throughput to keep up.
Vector supports directly writing into ClickHouse - several companies use this at scale (iirc Anthropic does exactly this; they spoke about it recently at our user conference).
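As a rough sketch (source, endpoint, and table names are made up for illustration), the Vector side can be as small as:

```yaml
# Sketch of a Vector pipeline writing Kubernetes logs straight into ClickHouse.
# Endpoint, database, and table are placeholders; the table must exist already.
sources:
  k8s_logs:
    type: kubernetes_logs             # tails pod logs on each node
sinks:
  clickhouse:
    type: clickhouse
    inputs: [k8s_logs]
    endpoint: http://clickhouse:8123  # ClickHouse HTTP interface
    database: logs                    # assumed database
    table: app_logs                   # assumed table
    skip_unknown_fields: true         # drop JSON keys the table doesn't have
```

Because the table and schema are yours, you keep control over ordering keys and codecs, which matters at the several-TB/day throughput you mentioned - and HyperDX v2 can then be pointed at that same table.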
Please give it a try and let us know how it goes! Happy to help :)
We don't have any lock-in to our ingestion pipeline or schema. Of course we optimize a lot for the OTel path, but it works perfectly fine without it too.
(The UI looks similar too, although I guess a lot of observability tools seem to adopt that kind of UI).
- We're flexible on top of any ClickHouse instance; you can use virtually any schema in ClickHouse and things will still work. Custom schemas are pretty important either for tuned high performance or once you're at a scale like Anthropic's. This also makes it incredibly easy to get started (especially if you already have data in ClickHouse).
- The above also means you don't need to buy into OTel. I love OTel, but some companies choose to use Vector, Cribl, S3, a custom writing script, etc. for good reasons. All of that is supported natively via the various ClickHouse integrations, which naturally means you can use ClickStack/HyperDX in those scenarios as well.
- We also have some cool tools around wrangling telemetry at scale, from Event Deltas (high-cardinality correlation between slow spans and normal spans to root-cause issues) to Event Patterns (clustering similar logs or spans together automatically with ML) - all of these help users dive into their data in easier ways than just searching & charting.
- We also have session replay capability - to truly unify everything from click to infra metrics.
We're built to work at the 100PB+ scale we run internally here for monitoring ClickHouse Cloud, but flexible enough to pinpoint, end to end, a specific user issue that gets brought up once in a support case.
There's probably a lot more I'm missing. Ultimately, from a product philosophy standpoint, we aren't big believers in the "3 pillars" concept, which tends to manifest as 3 silos/tabs for "logs", "metrics", and "traces" (this isn't just Signoz - it's across the industry). I'm a big believer that we're building tools to unify and centralize signals/clues in one place and give the right datapoint at the right time to the engineer. During an incident, I just think about what's the next clue I can get to root-cause an issue, not whether I'm in the logging product or the tracing product.
I found the UX very difficult to read. The monospace font, the unusually small text, the bold white and bright green text on a dark background… I found it a little more readable by changing the font to system-ui, but not by much. Please consider a more traditional style instead of leaning into the 80s terminal gimmick. This factor alone makes me want to not use it. It needs to be easy to read, not a pain to read.
Is Clickhouse the only stateful part of this stack? Would love to see compatibility with Rotel[0], a Rust implementation of the OTEL collector, so that this becomes usable for serverless runtime environments.
One key thing Datadog has is their own proprietary alternative to the OTEL collector that is much more performant.
I've also been in touch with Mike & Ray for a bit, who've told me they've added ClickHouse support recently, which makes the story even better :)
We're excited to test our ClickHouse integration with ClickStack, as we believe OTel and ClickHouse make for a powerful observability stack. Our open-source Rust OpenTelemetry collector is designed for high performance in resource-constrained environments. We'd love for you to check it out!
The dashboards and their creation are intuitive. Creating alerts and the like from Airflow logs is easy using their DSL. Connecting and sending notifications to things like Slack just works™.
So this is how we justify the Datadog costs: all the engineering time it saves (engineers are still expensive; AI hasn't replaced us yet) and how quickly we can move from raw logs and metrics to useful insights.
Beyond raw performance and cost-effectiveness, which are quite important at scale, we work a lot on making sure the application layer itself is intuitive to use. You can always play around with what ours looks like at play.hyperdx.io :)
While I do like the stack we have, it is a lot of components to run and configure. Don’t think we have ever had any issues once it was up and running.
Does anyone have any thoughts about how this compares? We don't have a huge amount of data: one month of metrics is about 200GB, and logs aren't a whole lot more - less than a TB, I think, for two weeks.
How would I self-host this in k8s? Would I deploy a ClickHouse cluster using the Altinity operator and then connect it using the HyperDX local mode, or what is the recommended approach to self-host ClickStack?
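For concreteness, here's roughly what I have in mind for the ClickHouse half (entirely hypothetical - the name, sizing, and storage are placeholders), with HyperDX then pointed at the resulting service:

```yaml
# Hypothetical Altinity operator resource for the ClickHouse half of the stack.
# Name, sizing, and storage are made-up placeholders.
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "clickstack"
spec:
  defaults:
    templates:
      dataVolumeClaimTemplate: data
  configuration:
    clusters:
      - name: "observability"
        layout:
          shardsCount: 1
          replicasCount: 2        # small HA setup
  templates:
    volumeClaimTemplates:
      - name: data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 500Gi
```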
You see an explosion in offerings, and then eventually it's whittled down to a handful of survivors.
So they switch to Prometheus and Grafana and now have to manage a Prometheus cluster. Far cheaper, but far more annoying.
It doesn't help that most software is typically ancient, spits out heaps of stack traces and wall-of-text output, doesn't use structured logging, and generally doesn't let itself be monitored easily.
So yeah, getting meaningful insights from a highly available observability stack will take some serious time and resources, and I can understand smaller companies just handing it over to a third party so they can get on with their core business (AKA easy billing).
We have this on the medium-term roadmap: investigating proper support for a compatibility layer such as ferret, or, more likely, just using ClickHouse itself as the operational data store.
I've seen single traces over 100KB of absolute pure randomness encoded as base64... Because! Oh and also, we have to pay for the service, so it looks important.
Sure, they tell you it is super helpful for debugging issues, but in a VERY large proportion of cases it is 1) WAY too much, and 2) never used anyway. And most of the time, what's interesting is the last 10 minutes of the debug version; you don't need a "service" for that.
/me gets down off his horse :-)
Otherwise, yes, you can authenticate against the other versions with an email/password (the email doesn't really do anything in the open source distribution - it's just a user identifier, but we keep it there to be consistent).
Because right now, without the message on HN here, I wouldn't know what "open source observability stack" meant: the webpage does not explain what HyperDX is, nor does it provide a link to it or its code. I was expecting the whole "Open Source Datadog" thing to be a ClickStack repo inside the ClickHouse GitHub, which is not found anywhere.
But other than that, congrats! I have long wondered why no one has built anything on top of ClickHouse to compete with Datadog / New Relic.
The ClickHouse DB opened up an ocean of open source "scalable" web analytics that wasn't previously available or possible. I am hoping we see this change happen again for observability platforms as well.
We started building SigNoz as an open-source alternative to Datadog/New Relic four years back, OpenTelemetry-native from day 1. We have shipped some good features on top of OpenTelemetry, and because of OTel's semantic conventions & our query builder, you can correlate any telemetry across signals.
That being said - as you've mentioned, so many different "store tons of data" apps have been enabled by ClickHouse. Observability is at a point where it's in the same category: ClickHouse can store a ton of data, OTel can help you collect/process it, and now we just need that analytics user-experience layer to present it to the engineers who need an intuitive way to dive into it all.