29 points by AlpinDale 15 hours ago | 5 comments
  • adrian_b 2 hours ago
    The claim of being 7x faster than rsync is very dubious. I would like to know the test conditions for such a result.

    I use every day rsync over SSH, and even between 7 to 10 years old computers it reaches the maximum link speed over 2.5 Gb/s Ethernet.

    So in order to need something faster than rsync, and to be able to test it, one must use at least 10 Gb/s Ethernet, where I do not know how good your CPU must be to reach link speed.

    For 7x faster, one would need at least 25 Gb/s Ethernet, and that is the worst case for rsync, assuming it is no faster on higher-speed Ethernet than what I see on cheap 2.5 Gb/s Ethernet.
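The arithmetic behind that estimate can be sketched as follows, assuming rsync saturates a 2.5 Gb/s link as described. The list of link speeds and the function name are illustrative assumptions, not something from the project under discussion.

```python
# Common Ethernet link speeds in Gb/s (illustrative list, not exhaustive).
ETHERNET_SPEEDS_GBPS = [1, 2.5, 5, 10, 25, 40, 50, 100]


def min_link_for_speedup(baseline_gbps, speedup):
    """Return (required throughput, slowest listed link that can carry it)."""
    required = baseline_gbps * speedup
    for speed in ETHERNET_SPEEDS_GBPS:
        if speed >= required:
            return required, speed
    # No listed link is fast enough.
    return required, None


required, link = min_link_for_speedup(2.5, 7)
print(f"need {required} Gb/s, smallest listed link: {link} Gb/s")
```

A 7x gain over a saturated 2.5 Gb/s link needs 17.5 Gb/s, which a 10 Gb/s link cannot carry, so 25 Gb/s is the first speed where the claim could even be observed.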

    If on a higher-speed Ethernet the link speed is not reached because an ancient CPU is too slow for AES-GCM or AES-UMAC, then using multiple connections would not improve the speed. If the speed is not limited by encryption, then changing TCP parameters, like window sizes, would probably have the same effect as using multiple connections, even when using just rsync over ssh.
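The window-size point can be illustrated with the bandwidth-delay product: a single TCP connection cannot exceed one window of data per round trip. This is a rough sketch only; the function names and the example figures are assumptions chosen for illustration, and real stacks add auto-tuning and other limits on top.

```python
def bdp_bytes(bandwidth_gbps, rtt_ms):
    """Bytes that must be in flight to keep a link of this speed full."""
    return bandwidth_gbps * 1e9 * rtt_ms / 1e3 / 8


def max_throughput_gbps(window_bytes, rtt_ms):
    """Upper bound on one connection's throughput for a given window."""
    return window_bytes * 8 * 1e3 / rtt_ms / 1e9


# Example figures (assumed): a 10 Gb/s link with a 1 ms round trip,
# and a connection stuck with a 64 KiB window on the same path.
print(bdp_bytes(10, 1))
print(max_throughput_gbps(64 * 1024, 1))
```

With N connections the aggregate window is N times larger, which is why raising the window on one connection can have a similar effect to opening several.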

    If the transfers are done over the Internet, then the speed is throttled by some ISP and it is not determined by your computers. There are some cases when a small number of connections, e.g. 2 or 3 may have a higher aggregate throughput than 1, but in most cases that I have seen the ISPs limit the aggregated throughput for the traffic that goes to 1 IP address, so if you open more connections you get the same throughput as with fewer connections.

    • i_think_so an hour ago
      > I use every day rsync over SSH, and even between 7 to 10 years old computers it reaches the maximum link speed over 2.5 Gb/s Ethernet.

      What are you rsyncing? Is it Maildirs for 5000 users? Or a multi-TB music and movie archive? The former might benefit greatly if the filesystem and its flash backing store are bottlenecked on metadata lookups, not bandwidth. The latter, not so much.

      I too would like to know the test conditions. This is probably one of those tools that is lovely for the right use case, useless for the wrong one.

      • wolttam an hour ago
        Anecdote: I have rsync’d maildirs and I recall managing a ~7x perf improvement by combining rsync with GNU parallel (trivial to fan out on each maildir)
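The fan-out described above (one rsync per maildir, several at a time) might look roughly like this in Python. It is a sketch under stated assumptions, not the commenter's actual setup: the helper names, the paths, and the `backup:` host are hypothetical, and only rsync's standard `-a` flag is used.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def rsync_command(maildir, dest):
    """Build the rsync argument list for a single maildir."""
    # -a is archive mode; the trailing slash copies the directory's contents.
    return ["rsync", "-a", f"{maildir}/", f"{dest}/{Path(maildir).name}/"]


def sync_all(maildirs, dest, jobs=8, runner=subprocess.run):
    """Run one rsync per maildir, at most `jobs` at a time; return exit codes."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = [
            pool.submit(runner, rsync_command(m, dest), check=True)
            for m in maildirs
        ]
        # result() re-raises CalledProcessError if any rsync failed.
        return [f.result().returncode for f in futures]
```

A call such as `sync_all(["/var/mail/alice", "/var/mail/bob"], "backup:/srv/mail")` would start the transfers concurrently. Threads are enough here because each worker just waits on an external rsync process.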
        • i_think_so 2 minutes ago
          Awww yeah. +1 for GNU parallel.

          When I think of those obscenely ugly scripting hacks I used to do back in the day....

          "Well, trust me, this way's easier." -- Bill Weasley

  • nurettin 4 hours ago
    Why not tar.gz and send as a single stream?
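For reference, the "single stream" approach can be sketched with Python's `tarfile` in its non-seekable streaming mode. The helper names are hypothetical; in practice `out` would be something like the stdin pipe of an ssh process, which is an assumption not shown here.

```python
import io
import tarfile


def write_tar_stream(paths, out):
    """Write `paths` as one gzip-compressed tar stream to binary writer `out`."""
    # "w|gz" is tarfile's streaming mode: it never seeks, so `out` can be a pipe.
    with tarfile.open(fileobj=out, mode="w|gz") as tar:
        for path in paths:
            tar.add(path)


def read_tar_names(stream):
    """List member names from a gzip-compressed tar stream (receiving side)."""
    with tarfile.open(fileobj=stream, mode="r|gz") as tar:
        return [member.name for member in tar]
```

Everything travels over one connection in order, so per-file round trips disappear, but the whole transfer is still limited by whatever that one stream can carry.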
    • exceptione 4 hours ago
      Because (afaik), the single-threaded ssh program is the bottleneck.
      • i_think_so 2 hours ago
        It used to be possible in OpenSSH to use -c none and skip the overhead of encryption for the transport (while retaining the protection of RSA keys for authentication). Even the deprecated blowfish-cbc was often faster than aes-ni for bulk transfers. I remember cutting off hours of wait time in backup jobs using these options.

        Sadly it appears those days are gone now. 3des is still supported, probably for some governmental environments, but it was always a slower algorithm. Unless there are undocumented hacks I think we're stuck with using proper crypto. Oh darn.

      • nurettin 2 hours ago
        It is a bottleneck for multiple files, but will it speed up with a single file? This is how we sent files for decades. Archive, transfer, unarchive. So I'm wondering what the point is.
        • i_think_so 2 hours ago
          It depends on the size of the file, of course. For copying your 90 line .bashrc, probably not noticeable in the noise. For copying an 800GB database? Um, yeah. :-)

          I see this project's main value in turning loose the power of multiple cores on a filesystem full of manifold directories, backed by flash based storage that only runs optimally at queue depth >1 (which is most of them). On spinning rust this will probably just thrash the heads.

          Hmmm. I wonder how 2 or 3 threads perform with zfs and a reasonable sized ARC?
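The queue-depth point above can be made concrete with a small sketch that keeps several file copies in flight at once. This is a generic illustration and not how the project itself works; `parallel_copy` and the default of 4 workers are assumptions.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parallel_copy(src_root, dst_root, workers=4):
    """Copy every file under src_root to dst_root, several files at a time."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    files = [p for p in src_root.rglob("*") if p.is_file()]

    def copy_one(src):
        dst = dst_root / src.relative_to(src_root)
        # exist_ok avoids a race when two workers create the same directory.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies data plus timestamps and mode bits
        return dst

    # Several outstanding copies keep more than one I/O request queued.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_one, files))
```

Setting `workers=1` gives the serial baseline, and trying 2 or 3 is a cheap way to see where a given disk or filesystem stops scaling.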