66 points by jbleonesio 4 days ago | 6 comments
  • jmakov 3 days ago
    10GB, 1TB, 100TB? Memory mapping or does it need to fit into memory (RAM, VRAM?)? Is streaming supported - can I point to a 100TB dataset and cruise through it? 1 parquet file or parquet dataset? What about Delta lake? Are outliers drawn or are you doing some sort of sampling/smoothing? Also would be great to have some comparison to similar tools in this space e.g. https://github.com/finos/perspective and HvPlot+Datashader.
    • jbleonesio 3 days ago
      Data needs to fit in RAM and graphics in VRAM. Let's say 100GB or more if you filter some rows during import. Data is ingested into an in-house database designed to refresh the ever-changing selection of rows as quickly as possible to support a true investigation. You can load as many parquet files as you want in one go, provided they have the same structure. Any outlier in any visual representation will be drawn, as this is a requirement for detecting weak signals and anomalies.

      Comparisons with the tools you mentioned would indeed be interesting; writing a blog post would be a good idea, I guess! I wrote a comparison with ELK here: https://squey.org/domains/cybersecurity/pentesteracademy-mac...

  • macros 3 days ago
    Neat tool.

    Couldn't find anything in the docs on mapping file sources to resource needs on the host; how much is too much data to dump into the tool on a single workstation?

    • jbleonesio 3 days ago
      Thanks!

      It depends on the number of rows/columns and the types of the values, but the application displays a dialog asking whether you want to stop the import before completion when it detects that resources are close to being exhausted.

      The software was specifically developed to handle as much data as possible while remaining responsive, so the workstation's resources will likely be the bottleneck here.

      On my 32GB development machine, I can easily load tens of millions of rows with tens of columns.
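      As a back-of-envelope check (my own arithmetic, not a figure from the docs), a dense columnar load at an assumed 8 bytes per numeric cell works out to:

```python
# Rough RAM footprint of a dense columnar dataset.
# 8 bytes/cell is an assumption (64-bit values); strings cost more.
def estimate_ram_gb(rows, cols, bytes_per_cell=8):
    return rows * cols * bytes_per_cell / 1024**3

# Tens of millions of rows with tens of columns, as in the comment:
print(round(estimate_ram_gb(50_000_000, 20), 1))  # ~7.5 GB before any overhead
```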

  • JacobiX 4 days ago
    Impressive project; judging by the commits and features, it's clear that significant effort has been poured into this :) Unfortunately, there's no specific MacOS installation method provided; unsure whether it's buildable from source?
    • jbleonesio 4 days ago
      Thanks for your feedback. Unfortunately there is currently only a Linux build (which also happens to run under Windows thanks to WSL2) because there are a lot of dependencies[1] to build. Any help implementing a MacOS build would of course be warmly welcomed :)

      In the meantime, you can deploy the software from the AWS Marketplace[2] and use it through your web browser, but note that this is a paid on-demand product.

      [1]: https://gitlab.com/squey/squey/-/tree/main/buildstream/eleme...

      [2]: https://aws.amazon.com/marketplace/pp/prodview-l363lrih42bhm

      • JacobiX 4 days ago
        Thank you, I'll look into it more closely. Fortunately it builds using CMake/Clang and cross-platform libs, so it might be possible to port it to MacOS after some tweaks.
        • jbleonesio 3 days ago
          We are using BuildStream to export a Flatpak application, but the build system is indeed CMake, with both Clang and GCC compiling the project without warnings.

          Feel free to open an issue[1] on the project repository to further discuss a MacOS port :)

          [1]: https://gitlab.com/squey/squey/-/issues/new

  • bbor 4 days ago
    Very cool, and it’s already on version five! I’m impressed. Only one question for now, since I don’t yet have experience with these specific data viz techniques:

    Skew-ey? Skoo-ey? Squee?

    • jbleonesio 3 days ago
      Version five indeed, because it already has quite a bit of history as a formerly proprietary product.

      We pronounce it "Skwey" (like in "query"), but you can really pronounce it as you wish since it's not even an existing word x)

  • jmakov 4 days ago
    Would be interesting to see how this compares to hvplot+datashader
  • Iwan-Zotow 3 days ago
    Is it comparable to ParaView?
    • jbleonesio 3 days ago
      While Squey does not claim to be as versatile as ParaView (it is not designed to visualize 3D mesh data, for example), it is focused on conducting iterative analyses over massive columnar datasets to improve one's understanding of the data and to find weak signals and anomalies through parallel coordinates, data series, and scatter plots.
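      At its core, a parallel-coordinates view min-max scales each column to a shared vertical range and then draws one polyline per row; a minimal sketch of that scaling step (a generic illustration, not Squey's GPU code):

```python
def minmax_scale(columns):
    """Scale each column to [0, 1] so all parallel axes share one range."""
    scaled = {}
    for name, values in columns.items():
        lo, hi = min(values), max(values)
        span = hi - lo or 1  # constant columns map to 0 instead of dividing by zero
        scaled[name] = [(v - lo) / span for v in values]
    return scaled
```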