18 points by mosura 16 hours ago | 1 comment
  • Veserv 13 hours ago
    While it is nice that it is faster, ~7 Gb/core-second using an in-process "virtual network" (thus measuring only the protocol implementation itself rather than the rest of the network stack) is not exactly a fast network protocol implementation. That is ~500,000-700,000 full packets per second, or ~1.5-2 core-us/packet.
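    The arithmetic behind those numbers, assuming full-size ~1500-byte (12,000-bit) packets, can be checked with a quick sketch (illustrative only):

```go
package main

import "fmt"

func main() {
	// Quick sanity check of the numbers above, assuming full-size
	// ~1500-byte (12,000-bit) packets.
	const bitsPerSec = 7e9         // ~7 Gb/core-second
	const bitsPerPacket = 1500 * 8 // full Ethernet-size packet

	packetsPerSec := bitsPerSec / bitsPerPacket
	usPerPacket := 1e6 / packetsPerSec

	fmt.Printf("~%.0f packets/s, ~%.2f core-us/packet\n", packetsPerSec, usPerPacket)
	// ~583333 packets/s, ~1.71 core-us/packet
}
```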

    Under those same conditions, you can quite readily do ~100 Gb/core-second in software (ignoring encryption; encryption will bottleneck you at 30-50 Gb/core-second on modern chips with AES acceleration instructions) with feature parity, given proper protocol design and implementation.

    • joeturki 7 hours ago
      SCTP isn't just a UDP pipe. It's a message-oriented, congestion-controlled, reliable protocol with a bunch of other semantics.

      We measured:

      1. Association state + per-path CC/RTO: timers, RTT tracking, cwnd, etc.

      2. Selective ACKs and retransmit logic.

      3. Chunk framing + TSN sequencing.

      4. Ordered vs. unordered delivery, and fragmentation/reassembly.

      ...and much more.

      Also, our vnet-based implementation isn't just a dumb buffer: we do on-the-wire packet validation, SCTP parsing, and CRC32c validation, plus a deterministic network-conditions emulator with real-time conditions.

      Sure, you can get 100 Gb/core-second if you bypass all of that and just do huge batching.

      The blog post's claim is just that, under the same SCTP semantics and the same test harness, enabling RACK is a huge win; it's not about the absolute ceiling of in-process "virtual network" sockets :)

      • Veserv 5 hours ago
        Yes, I meant all of that when I explicitly said feature parity at 100 Gb/core-second: reliable delivery of multiple independent bytestreams (which is actually more than SCTP gives, since SCTP still suffers from head-of-line blocking due to SCTP SACKs being by TSN instead of a per-stream identifier) with a dynamic stream count (again, more than SCTP gives) over an unreliable network that may reorder or lose packets.
        • joeturki 2 hours ago
          Okay, I see your point, but our test harness isn't meant to be an absolute "max throughput" benchmark. Every packet is parsed, corrected (if needed), and validated in real time (CRC32c, on-the-wire checks, deterministic network emulation, etc.).

          If we ever want a true ceiling number, we could add a separate fast path (e.g., a dump-writer/sink that skips most validation) or validate after the run, but that's not in scope right now. Our scope was: (1) validate Pion/SCTP PRs and (2) compare performance against other branches and versions; so it's a relative benchmark under identical conditions.

          On head-of-line blocking: we have a pending RFC 8260 message interleaving (I-DATA) implementation, and we've tested with it; it helps reduce HoL blocking on the sender side (especially around fragmentation). Our benchmark tool has a flag to run with interleaving, and we tested it quite a bit. We plan to release it in Jan.