"Unsafe code Ropey uses unsafe code to help achieve some of its space and performance characteristics. Although effort has been put into keeping the unsafe code compartmentalized and making it correct, please be cautious about using Ropey in software that may face adversarial conditions.
Auditing, fuzzing, etc. of the unsafe code in Ropey is extremely welcome. If you find any unsoundness, please file an issue! Also welcome are recommendations for how to remove any of the unsafe code without introducing significant space or performance regressions, or how to compartmentalize the unsafe code even better."
All safe code in Rust is built on unsafe code. The standard library is full of unsafe code. The purpose of `unsafe` is to encourage wrapping dangerous operations behind safe interfaces. In business logic I'd question using unsafe code directly, but in a performance-critical, low-level memory-management library that's exactly where I'd expect to see it.
Yes, and this means that for me to trust that the code is memory safe I need to trust the people who develop the standard library (or validate the unsafe usage myself). Rust has a good track record and a very good review process to ensure the correctness of its "unsafe" blocks.
This library however? Do they know how to write "unsafe" blocks? I don't know! Maybe? If there were zero uses of "unsafe" in this library I would be able to adopt it without worrying about memory safety at all. In addition, I'm not that good at knowing whether an "unsafe" block is safe myself. It's not like I can review these cases myself and be confident.
(Memory safety is of course not everything, but bugs related to memory safety are much more annoying than other types of bugs.)
"Please be cautious about using Linux/macOS/Windows/Firefox/Chrome/Safari in adversarial conditions." I've never read a statement like that, even though it would be more warranted than in this case.
And even unsafe Rust is far safer than C and C++. It still provides automatic memory management by default, the thread-safety guarantees that come with ownership, and abstraction mechanisms that make it harder to commit blunders that can lead to unsafety.
In another language, like C, you can have a good structure and well-organized abstractions, but you have your "unsafe" potentially sprinkled all over.
C# has the concept of “Sequences”, which is basically a generalization of a deque, with associated classes and APIs such as ReadOnlySequence and SequenceReader to encourage reduced allocations, reuse of existing buffers/slices even for composition, etc.
Knowing the Rust community, I wouldn’t be surprised if there’s already an RFC for something like this.
In general this sort of structure is the sort of thing I'd expect to see in an external crate in Rust, not the standard library. So it's unlikely there are any RFCs, and more likely there are a few competing implementations lying around.
std does actually have a vague version of what the root comment wants: https://doc.rust-lang.org/std/io/struct.IoSlice.html and its sibling IoSliceMut (slicing, appending, inserting, etc. are out of scope for both - so not usable for rope stuff)
https://docs.rs/bytes/1.9.0/bytes/buf/trait.Buf.html
https://github.com/hyperium/hyper/blob/3817a79b213f840302d7e...
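For a flavor of what `bytes::Buf` gives you, a minimal sketch (the data and lengths are arbitrary): two non-contiguous slices viewed as one logical buffer, with the copy only happening on demand.

```rust
use bytes::Buf;

fn main() {
    // Two separate, non-contiguous byte slices...
    let head = &b"hello, "[..];
    let tail = &b"world"[..];

    // ...chained into one logical buffer without copying them together.
    let mut buf = head.chain(tail);
    assert_eq!(buf.remaining(), 12);

    // Consumers read across the seam as if it were contiguous;
    // the copy only happens here, when explicitly requested.
    let all = buf.copy_to_bytes(12);
    assert_eq!(&all[..], &b"hello, world"[..]);
}
```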
Network IO doesn't need contiguous memory, no, but each side of the duplex kind of benefits from it in its own way:
1. on receive, you can treat a contiguous received network datagram as its own little memory arena — write code that sends sliced references to the contents of the datagram to other threads to work with, where those references keep the datagram arena itself alive for as long as it's being worked with; and then drop the whole thing when the handling of the datagram is complete.
(This is somewhat akin to the Erlang approach — where the received message is a globally-shared binary; it gets passed by refcount into an actor started just for handling that request; that actor is spawned with its own preallocated memory arena; into that arena, the actor spits any temporaries related to copying/munging the slices of the shared binary, without having to grow the arena; the actor quickly finishes and dies; the arena is deallocated without ever having had to GC, and the refcount of the shared binary goes to zero — unless non-copied slices of it were async-forwarded to other actors for further processing.)
Also note that the whole premise here is zero-copy networking (as the bytes docs say: https://docs.rs/bytes/1.9.0/bytes/#bytes). The "message" being received here isn't a copy of the one from the network card, but literally the same physical wired memory the PHY sees as being part of its IO ring-buffer — just also mapped into your process's memory on (zero-copy) receive. If this data came chunked, you'd need to copy some of it to assemble those chunks into a contiguous string or data structure. But since it arrives contiguously, you can just slice it, and cast the resulting slice into whatever type you like.
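As a rough illustration of the receive-side slicing described in point 1, here's a sketch assuming the datagram has already landed in a `bytes::Bytes`; the field layout and helper functions are made up:

```rust
use bytes::Bytes;

fn handle_datagram(datagram: Bytes) {
    // Slices are refcounted views into the same allocation; nothing is copied.
    let header = datagram.slice(0..8); // made-up layout
    let payload = datagram.slice(8..);

    // Each slice keeps the backing datagram alive on its own, so it can be
    // handed to another thread/task independently.
    std::thread::spawn(move || {
        process_payload(payload); // hypothetical worker
    });

    parse_header(&header); // hypothetical
    // When the last outstanding slice is dropped, the whole datagram is freed.
}

fn parse_header(_h: &Bytes) {}
fn process_payload(_p: Bytes) {}
```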
2. on send — presuming you're doing non-blocking IO — it's nice to once again have a preallocated arena into which you can write out byte-sequences before flinging them at the kernel as [vectors of] large, contiguous DMA requests, without having to stop to allocate. (This removes the CPU as a bottleneck from IO performance — think writev(2).)
The ideal design here is that you allocate fixed-sized refcounted buffers; fill them up until the next thing you want to write doesn't fit†; and then intentionally drop the current buffer, switching your write_arena reference to point to a freshly-allocated buffer; and repeating. Each buffer then lives until all its slice-references get consumed. This forms kind of a "memory-lifetime-managed buffer-persisted message queue" — with the backing buffers of your messages living until all the messages held in them get "ACKed" [i.e. dropped by the receiving threads.]
Also, rather than having the buffers deallocate when you "use them up" — requiring you to allocate the next time you need a buffer — you can instead have the buffer's destructor release the memory it's holding into a buffer pool; and then have your next-buffer-please logic pull from that pool in preference to allocating. But then you'll want a higher-level "writable stream that is actually a mempool + current write_arena reference" type. (Hey, that's BufMut!)
† And at that point, when the next message doesn't fit, you do not split the message. That violates the whole premise of vectorizing the writes. Instead, you leave some of the buffer unused, and push the large message into a fresh buffer, so that the message will still correspond to a single vectorized-write element / io_uring call / DMA request / etc. If the message is so large it won't fit in your default buffer size, you allocate a buffer just for that one message, or better yet, you utilize a special second pool of larger fixed-size buffers. "Jumbo" buffers, so to speak.
(Get it yet? Networking hardware is also doing exactly what I'm describing here to pack and unpack your packets into frames. For a NIC or switch, the buffers are the [bodies of the] frames; a jumbo buffer is an Ethernet jumbo frame; and so on.)
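To make the send-side pattern from point 2 concrete, here is a very rough sketch using `bytes` (the buffer size and names are illustrative; the pooling/recycling layer is omitted):

```rust
use bytes::{Bytes, BytesMut};

const BUF_SIZE: usize = 64 * 1024; // assumed default buffer size

struct WriteArena {
    current: BytesMut,
}

impl WriteArena {
    fn new() -> Self {
        Self { current: BytesMut::with_capacity(BUF_SIZE) }
    }

    /// Copy one message into the current buffer and hand back a refcounted
    /// slice of it. If it doesn't fit, start a fresh buffer rather than
    /// splitting the message (see the footnote above).
    fn push(&mut self, msg: &[u8]) -> Bytes {
        if msg.len() > self.current.capacity() - self.current.len() {
            // Leave the remainder of the old buffer unused; its memory is
            // freed once every slice handed out from it has been dropped.
            // Oversized messages get a right-sized "jumbo" allocation.
            self.current = BytesMut::with_capacity(BUF_SIZE.max(msg.len()));
        }
        self.current.extend_from_slice(msg);
        // split() detaches the written bytes; freeze() makes them shareable.
        self.current.split().freeze()
    }
}
```

The `Bytes` handles it returns are what you'd queue up for a vectored write; each backing buffer lives exactly until every slice cut from it has been dropped.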
I'm not sure if your comment was meant to be condescending, but it really does come across as that. I'm very well versed in this domain.
Having a per-request/connection arena isn't the only option. What I have seen/use, which is still zero copy (as far as IO zero copy can be in Rust without resorting to bytemuck/blittable types), is to have a pool of buffers of a specific length - typically page-sized by default and definitely page-aligned. These buffers can come from a single large contiguous allocation. If you run out of space in a buffer you grab a new/reused one from the pool, add it to your vec of buffers, and carry on. At the end of the story you would use vectored IO to submit all of them at once - all the way down to the NIC and everything.
This approach is more widespread mainly due to historical reasons: it's really easy to fragment a 32-bit address space, so allocating jumbo buffers simply wasn't an option if you didn't want your server OOMing with 1GB of available (but non-contiguous) memory.
https://man7.org/linux/man-pages/man3/iovec.3type.html
https://learn.microsoft.com/en-us/windows/win32/api/ws2def/n...
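For the vectored-submit step described above, a minimal std-only sketch (blocking IO for brevity; a real implementation would loop on partial writes and likely use io_uring or similar):

```rust
use std::io::{IoSlice, Write};
use std::net::TcpStream;

fn submit_all(stream: &mut TcpStream, buffers: &[Vec<u8>]) -> std::io::Result<usize> {
    // One IoSlice per buffer: the kernel receives them as a single
    // writev(2)-style submission, with no copy into an intermediate
    // contiguous buffer on our side.
    let iovecs: Vec<IoSlice<'_>> = buffers.iter().map(|b| IoSlice::new(b)).collect();
    stream.write_vectored(&iovecs)
}
```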
Apologies, I wasn't really responding to you directly; I was just taking the opportunity to write an educational-blog-post-as-comment aimed at the average HN reader (who has likely never considered what an Ethernet frame even is, or how a device that uses what are essentially DSPs does TDM packet scheduling) — with your comment being the parent because it's the necessary prerequisite reading to motivate the lesson.
> Having a per-request/connection arena isn't the only option. What I have seen/use, which is still zero copy (as far as IO zero copy can be in Rust without resorting to bytemuck/blittable types), is to have a pool of buffers of a specific length - typically page-sized by default and definitely page-aligned. These buffers can come from a single large contiguous allocation. If you run out of space in a buffer you grab a new/reused one from the pool, add it to your vec of buffers, and carry on. At the end of the story you would use vectored IO to submit all of them at once - all the way down to the NIC and everything.
I think you're focusing too much on the word "arena" here, because AFAICT we're both describing the same concept.
In your model (closer to the one used in actual switching), there's a single global buffer pool that all concurrent requests lease from; in my model, there's global heap memory, and then a per-thread/actor/buf-object elastic buffer pool that allocates from the global heap every once in a while, but otherwise reuses buffers internally.
I would say that your model is probably the one used in most zero-copy networking frameworks like DPDK, while my model is probably the one used in most language runtimes — especially managed + garbage-collected runtimes, where contending over a global language-exposed pool can be more expensive than "allocating" (especially when the runtime has its own buffer pool and "allocation" rarely hits the kernel).
But both models are essentially the same from the perspective of someone using the buffer ADT and trying to understand why it's designed the way it is, what it gets them, etc. :)
> it's really easy to fragment 32bit address space, so allocating jumbo buffers simply wasn't an option if you didn't want your server OOMing with 1GB of available (but non-contiguous) memory.
Maybe you're imagining something else here, but when I say "jumbo buffer", I don't mean custom buffers allocated on demand and right-sized to hold one message; rather, I'm speaking of something very closely resembling actual jumbo frames — i.e. another pre-allocated pool containing a smaller number of larger, fixed-size MTU-slot buffers.
With this kind of jumbo-buffer pool, when your messages get big, you switch over from filling regular buffers to filling jumbo buffers — which holds off message fragmentation, but also means new messages go "out the door" a bit slower, maybe "platoon" a bit and potentially overwhelm the recipient with each burst, etc. (which is why you don't just use the larger buffer pool as the only pool).
But if your messages can be bigger than your set jumbo-buffer size, then there's nowhere to go from there; you still need to have a way to split messages across frames.
(Luckily, in the case of `bytes`, splitting a message across frames just means the message now needs multiple iovec-list entries to submit, rather than implying a framing protocol / L2 message encoding with a continuation marker / sequence ID / etc.)
As far as I know that is not possible: there's always a copy.
`bytes` can give you "ring-buffer-like" one-copy kernel-socket receive by e.g. using the Buf as the target for scheduling io_uring read/recv into.
Also, RDMA is technically networking! (Though I think all the Rust RDMA libraries already provide ADTs that work like Buf/BufMut, rather than just saying "here's some network-shared memory, build your own ADT on top.")
> before flinging them at the kernel as [vectors of] large, contiguous DMA requests, without having to stop to allocate
So I had assumed you were talking about kernel networking elsewhere as well.
BTW, on the kernel send path, there is again a copy, contiguous or not, regardless of what API you use.
When using kernel networking I don't think contiguity matters as you suggest, as there is always a copy. Furthermore, "contiguous" in userspace doesn't correspond to contiguous in physical address space, so in any case the hardware is often just going to see a userspace buffer as a series of discontiguous pages anyway: that's what happens with direct IO disk writes, which _are_ zero copy (huge pages help).
It is not a tool for composing disparate pieces into one (while avoiding copies)
I'm afraid I might not have that much free time again for a long time, but maybe when I do, somebody will have solved the regex issue for me...
A `BufRead + Seek` need not be backed by memory, though, except in the midst of being read. (A buffered normal file implements `BufRead + Seek`, for example.)
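For instance, plain std (the file path is arbitrary): a `BufReader<File>` implements both traits while only ever holding the current buffered window in memory.

```rust
use std::fs::File;
use std::io::{BufRead, BufReader, Seek, SeekFrom};

fn main() -> std::io::Result<()> {
    // BufReader<File> implements both BufRead and Seek; only the current
    // buffered window lives in memory, not the whole file.
    let mut reader = BufReader::new(File::open("Cargo.toml")?);

    let mut first_line = String::new();
    reader.read_line(&mut first_line)?;

    // Rewind; this just discards the buffered window.
    reader.seek(SeekFrom::Start(0))?;
    Ok(())
}
```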
I feel like either Iterator, or in some rare case of requiring generic indexing, Index, is more important than "it is composed of some number of linked memory allocations"?
A ReadOnlySequence seems to imply a linked-list of memory sections though; I'm not sure a good rope is going to be able to non-trivially interface with that, since the rope is a tree; walking the nodes in sequence is possible, but it's a tree walk, and something like ReadOnlySequenceSegment::Next() is then a bit tricky. (You could gather the set of nodes into an array ahead of time, but now merely turning it into that is O(nodes) which is sad.)
(And while it might be tempting to say "have the leaf nodes be a LL", I don't think you want to, as it means that inserts need to adjust those links, and I think you would rather have mutations produce a cheaply made but entirely new tree, which I don't think permits a LL of the leaves. You want this to make undo/redo cheap: it's just "go back to the last rope", and then all the ropes share the underlying character data that's not changing rope to rope. The rope in the OP seems to support this: "Cloning ropes is extremely cheap. Rope clones share data,")
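That cheap-clone behavior is visible directly in Ropey's public API. A small sketch (text and indices are arbitrary):

```rust
use ropey::Rope;

fn main() {
    let mut rope = Rope::from_str("Hello, world!\n");

    // Snapshot for undo: a cheap clone that shares the underlying chunks.
    let snapshot = rope.clone();

    // Edit the working copy; only the touched nodes are rebuilt.
    rope.insert(7, "cruel ");
    assert_eq!(rope.to_string(), "Hello, cruel world!\n");

    // The snapshot is untouched; "undo" is just switching back to it.
    assert_eq!(snapshot.to_string(), "Hello, world!\n");
}
```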
In the announcement post, they mention that work on Xi is considered "on hold" rather than strictly discontinued: https://raphlinus.github.io/rust/gui/2022/05/07/ui-architect...
"I want to build an editor, but first I must solve rendering 2D graphics purely on the GPU, invent a parallelizable path solver, and code a human perception-based color value manipulation library."
I think we're at five or six levels of yaks by now.
(xi -> xilem -> masonry -> vello -> peniko -> color)
You can see the current projects (13 active) on https://linebender.org , and several members post interesting check-ins in https://xi.zulipchat.com/
To be fair, the original author of Xi ('raphlinus) has been working on GPU-side 2D rendering much longer than on Xi.
[0] https://github.com/helix-editor/helix
[1] https://github.com/helix-editor/helix/blob/master/docs/archi...
The extra keypress for switching between "i" (inserting text), "ESC" (moving cursor), "i"... would drive me insane (just not used to it, but used to very fast, friction-free typing/editing).
I'll grant that Esc is further away, but it can be remapped.
I'm seriously impressed by the level of quality out of the box
(I couldn't find a mention of this in the README, design.md, or examples.)
In Emacs buffers, the concepts include text properties, overlays, and markers.
For example, if you delete some text from the Ropey data structure, does Ropey have facilities to update the associated non-character data (such as deleting all or part of one or more chunks of the non-character data, and/or updating positional information)? Or do you have to do that separately outside of Ropey?
I was a little confused, because the lede sentence was "Ropey is a utf8 text rope for Rust, designed to be the backing text-buffer for applications such as text editors."
Pretty much all text editors are expected to implement decorations and references, somehow, and some popular text buffer APIs support those.
From the wrapper's point of view, there's no difference between character and non-character data, and the whole buffer can be modeled as a collection of indices mapping ranges of the document to different kinds of data.
One of those indices could be a rope (mapping document ranges to character data, for the document text). Other kinds of indices could also be used. The important thing is that all edits go through the wrapper so that all the relevant indices get updated.
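A hand-wavy sketch of such a wrapper (all names are hypothetical; real marker/overlay adjustment is much more involved and is only stubbed here as "drop anything the deletion overlaps"):

```rust
use ropey::Rope;
use std::ops::Range;

// Hypothetical non-character data: a decoration attached to a document range.
struct Decoration {
    range: Range<usize>, // char indices into the document
    kind: String,        // e.g. "comment-face", "error-underline"
}

struct Buffer {
    text: Rope,
    decorations: Vec<Decoration>,
}

impl Buffer {
    // All edits go through the wrapper so every index stays consistent.
    fn delete(&mut self, range: Range<usize>) {
        let removed = range.end - range.start;
        self.text.remove(range.clone());

        self.decorations.retain_mut(|d| {
            if d.range.end <= range.start {
                true // entirely before the deletion: untouched
            } else if d.range.start >= range.end {
                // entirely after the deletion: shift left
                d.range = d.range.start - removed..d.range.end - removed;
                true
            } else {
                false // overlaps the deletion: dropped (real editors clip instead)
            }
        });
    }
}
```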
https://github.com/swiftlang/swift-foundation/tree/main/Sour...
https://github.com/apple/swift-collections/tree/main/Sources...
What would be some good use-cases for using Ropey with Emacs? Maybe re-formatting/beautifying huge json files or something like that?
I didn't have time yet to explore the project more closely, but it looks very interesting.
Kudos to the author.
I have always thought that a text editor using rrb-trees would probably be the easiest option that would ensure decent performance for small files, and good performance for large files while also being great for random access or linear search.
> On the other hand, Ropey is not good at:
> Handling texts that are larger than available memory. Ropey is an in-memory data structure.
Also, this is a rust library, not an editor application.
(Or, I mean, can you shard the files / store the files more efficiently?)
That seems to make it of dubious use, not really suitable for a well-engineered text editor.
The fact that it's UTF-8 only is also a serious problem since files can contain arbitrary byte sequences.
I think it's mostly due to multiple buffers showing the same content, as opposed to this Ropey library directly.
Also, I have to wonder when this fad of loudly announcing when something is written in Rust will finally pass. Maybe software written in other languages should loudly announce it in every second sentence? To me at least it's become as self-aggrandizingly cringe as "Sent from my iPhone" at the end of every email...