66 points by jbleonesio 4 days ago | 6 comments
  • jmakov 3 days ago
    10GB, 1TB, 100TB? Memory mapping or does it need to fit into memory (RAM, VRAM?)? Is streaming supported - can I point to a 100TB dataset and cruise through it? 1 parquet file or parquet dataset? What about Delta lake? Are outliers drawn or are you doing some sort of sampling/smoothing? Also would be great to have some comparison to similar tools in this space e.g. https://github.com/finos/perspective and HvPlot+Datashader.
    • jbleonesio 3 days ago
      Data needs to fit in RAM and graphics in VRAM. Let's say 100GB or more if you filter some rows during import. Data is ingested into an in-house database designed to refresh the ever-changing selection of rows as quickly as possible to support a true investigation. You can load as many parquet files as you want in one go, provided they have the same structure. Any outlier in any visual representation will be drawn, as this is a requirement for detecting weak signals and anomalies.

      Comparisons with the tools you mentioned would indeed be interesting; writing a blog post would be a good idea, I guess! I wrote a comparison with ELK here: https://squey.org/domains/cybersecurity/pentesteracademy-mac...

  • macros 3 days ago
    Neat tool.

    Couldn't find anything in the docs on mapping file sources to resource needs on the host; how much is too much data to dump into the tool on a single workstation?

    • jbleonesio 3 days ago
      Thanks!

      It depends on the number of rows/columns and the types of the values, but the application displays a dialog asking whether you want to stop the import before completion when it detects that resources are close to being exhausted.

      The software was specifically developed to handle as much data as possible while remaining responsive, so the workstation's resources will likely be the bottleneck here.

      On my 32GB development machine, I can easily load tens of millions of rows with tens of columns.
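      As a back-of-envelope check (my own arithmetic, not a figure from the docs), a dense columnar load at an assumed 8 bytes per numeric cell works out to:

```python
# Rough RAM footprint of a dense columnar dataset.
# 8 bytes/cell is an assumption (64-bit values); strings cost more.
def estimate_ram_gb(rows, cols, bytes_per_cell=8):
    return rows * cols * bytes_per_cell / 1024**3

# Tens of millions of rows with tens of columns, as in the comment:
print(round(estimate_ram_gb(50_000_000, 20), 1))  # ~7.5 GB before any overhead
```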

  • JacobiX 4 days ago
    Impressive project; judging by the commits and features, it's clear that significant effort has been poured into this :) Unfortunately, there's no specific MacOS installation method provided; unsure whether it's buildable from source?
    • jbleonesio 4 days ago
      Thanks for your feedback. Unfortunately there is currently only a Linux build (which also happens to run under Windows thanks to WSL2) because there are a lot of dependencies[1] to build. Any help implementing a MacOS build would of course be warmly welcomed :)

      In the meantime, you can deploy the software from the AWS Marketplace[2] and use it through your web browser, but note that this is a paid on-demand product.

      [1]: https://gitlab.com/squey/squey/-/tree/main/buildstream/eleme...

      [2]: https://aws.amazon.com/marketplace/pp/prodview-l363lrih42bhm

      • JacobiX 4 days ago
        Thank you, I'll look into it more closely. Fortunately it builds using CMake/Clang and cross-platform libs, so it might be possible to port it to MacOS after some tweaks.
        • jbleonesio 3 days ago
          We are using BuildStream to export a Flatpak application, but the build system is indeed CMake, with both Clang and GCC compiling the project without warnings.

          Feel free to open an issue[1] on the project repository to further discuss a MacOS port :)

          [1]: https://gitlab.com/squey/squey/-/issues/new

  • bbor 4 days ago
    Very cool, and it’s already on version five! I’m impressed. Only one question for now, since I don’t yet have experience with these specific data viz techniques:

    Skew-ey? Skoo-ey? Squee?

    • jbleonesio 3 days ago
      Version five indeed, because it already has quite a bit of history as a formerly proprietary product.

      We pronounce it "Skwey" (like in "query"), but you can really pronounce it as you wish since it's not even an existing word x)

  • jmakov 4 days ago
    Would be interesting to see how this compares to hvplot+datashader
  • Iwan-Zotow 3 days ago
    Is it comparable to ParaView?
    • jbleonesio 3 days ago
      While Squey does not claim to be as versatile as ParaView (it is not designed to visualize 3D mesh data, for example), it is focused on conducting iterative analyses over massive columnar datasets to improve one's understanding of the data and to find weak signals and anomalies through parallel coordinates, data series, and scatter plots.
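      At its core, a parallel-coordinates view min-max scales each column to a shared vertical range and then draws one polyline per row; a minimal sketch of that scaling step (a generic illustration, not Squey's GPU code):

```python
def minmax_scale(columns):
    """Scale each column to [0, 1] so all parallel axes share one range."""
    scaled = {}
    for name, values in columns.items():
        lo, hi = min(values), max(values)
        span = hi - lo or 1  # constant columns map to 0 instead of dividing by zero
        scaled[name] = [(v - lo) / span for v in values]
    return scaled
```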