"Unsafe code Ropey uses unsafe code to help achieve some of its space and performance characteristics. Although effort has been put into keeping the unsafe code compartmentalized and making it correct, please be cautious about using Ropey in software that may face adversarial conditions.
Auditing, fuzzing, etc. of the unsafe code in Ropey is extremely welcome. If you find any unsoundness, please file an issue! Also welcome are recommendations for how to remove any of the unsafe code without introducing significant space or performance regressions, or how to compartmentalize the unsafe code even better."
All safe code in Rust is built on unsafe code. The standard library is full of unsafe code. The purpose of `unsafe` is to encourage wrapping dangerous operations behind safe interfaces. In business logic I'd question using unsafe code directly, but in a performance-critical, low-level memory-management library that's exactly where I'd expect to see it.
Yes, and this means that for me to trust that the code is memory safe I need to trust the people who develop the standard library (or validate the unsafe usage myself). Rust has a good track record and a very good review process to ensure the correctness of its "unsafe" blocks.
This library however? Do they know how to write "unsafe" blocks? I don't know! Maybe? If there were zero uses of "unsafe" in this library I would be able to adopt it without worrying about memory safety at all. In addition, I'm not that good at knowing whether an "unsafe" block is safe myself. It's not like I can review these cases myself and be confident.
(Memory safety is of course not everything, but bugs related to memory safety are much more annoying than other types of bugs.)
"Please be cautious about using Linux/macOS/Windows/Firefox/Chrome/Safari in adversarial conditions." I've never read a statement like that, even though it would be more warranted than in this case.
And even unsafe Rust is far safer than C and C++. It still provides automatic memory management by default, the thread-safety guarantees that come with ownership, and abstraction mechanisms that make it harder to commit blunders that can lead to unsafety.
In another language, like C, you can have a good structure and well-organized abstractions, but you have your "unsafe" potentially sprinkled all over.
C# has the concept of “Sequences”, which is basically a generalization of a deque, with associated classes and APIs such as ReadOnlySequence and SequenceReader to encourage reduced allocations, reuse of existing buffers/slices even for composition, etc.
Knowing the Rust community, I wouldn’t be surprised if there’s already an RFC for something like this.
In general this sort of structure is the sort of thing I'd expect to see in an external crate in Rust, not the standard library. So it's unlikely there are any RFCs, and more likely there are a few competing implementations lying around.
std does actually have a vague version of what the root comment wants: https://doc.rust-lang.org/std/io/struct.IoSlice.html and its sibling IoSliceMut (slicing, appending, inserting, etc. are out of scope for both - so not usable for rope stuff)
https://docs.rs/bytes/1.9.0/bytes/buf/trait.Buf.html
https://github.com/hyperium/hyper/blob/3817a79b213f840302d7e...
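For a flavor of what `bytes::Buf` gives you, a minimal sketch (the data and lengths are arbitrary): two non-contiguous slices viewed as one logical buffer, with the copy only happening on demand.

```rust
use bytes::Buf;

fn main() {
    // Two separate, non-contiguous byte slices...
    let head = &b"hello, "[..];
    let tail = &b"world"[..];

    // ...chained into one logical buffer without copying them together.
    let mut buf = head.chain(tail);
    assert_eq!(buf.remaining(), 12);

    // Consumers read across the seam as if it were contiguous;
    // the copy only happens here, when explicitly requested.
    let all = buf.copy_to_bytes(12);
    assert_eq!(&all[..], &b"hello, world"[..]);
}
```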
Network IO doesn't need contiguous memory, no, but each side of the duplex kind of benefits from it in its own way:
1. on receive, you can treat a contiguous received network datagram as its own little memory arena — write code that sends sliced references to the contents of the datagram to other threads to work with, where those references keep the datagram arena itself alive for as long as it's being worked with; and then drop the whole thing when the handling of the datagram is complete.
(This is somewhat akin to the Erlang approach — where the received message is a globally-shared binary; it gets passed by refcount into an actor started just for handling that request; that actor is spawned with its own preallocated memory arena; into that arena, the actor spits any temporaries related to copying/munging the slices of the shared binary, without having to grow the arena; the actor quickly finishes and dies; the arena is deallocated without ever having had to GC, and the refcount of the shared binary goes to zero — unless non-copied slices of it were async-forwarded to other actors for further processing.)
Also note that the whole premise here is zero-copy networking (as the bytes docs say: https://docs.rs/bytes/1.9.0/bytes/#bytes). The "message" being received here isn't a copy of the one from the network card, but literally the same physical wired memory the PHY sees as being part of its IO ring-buffer — just also mapped into your process's memory on (zero-copy) receive. If this data came chunked, you'd need to copy some of it to assemble those chunks into a contiguous string or data structure. But since it arrives contiguously, you can just slice it, and cast the resulting slice into whatever type you like.
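As a rough illustration of the receive-side slicing described in point 1, here's a sketch assuming the datagram has already landed in a `bytes::Bytes`; the field layout and helper functions are made up:

```rust
use bytes::Bytes;

fn handle_datagram(datagram: Bytes) {
    // Slices are refcounted views into the same allocation; nothing is copied.
    let header = datagram.slice(0..8); // made-up layout
    let payload = datagram.slice(8..);

    // Each slice keeps the backing datagram alive on its own, so it can be
    // handed to another thread/task independently.
    std::thread::spawn(move || {
        process_payload(payload); // hypothetical worker
    });

    parse_header(&header); // hypothetical
    // When the last outstanding slice is dropped, the whole datagram is freed.
}

fn parse_header(_h: &Bytes) {}
fn process_payload(_p: Bytes) {}
```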
2. on send — presuming you're doing non-blocking IO — it's nice to once again have a preallocated arena into which you can write out byte-sequences before flinging them at the kernel as [vectors of] large, contiguous DMA requests, without having to stop to allocate. (This removes the CPU as a bottleneck from IO performance — think writev(2).)
The ideal design here is that you allocate fixed-sized refcounted buffers; fill them up until the next thing you want to write doesn't fit†; and then intentionally drop the current buffer, switching your write_arena reference to point to a freshly-allocated buffer; and repeating. Each buffer then lives until all its slice-references get consumed. This forms kind of a "memory-lifetime-managed buffer-persisted message queue" — with the backing buffers of your messages living until all the messages held in them get "ACKed" [i.e. dropped by the receiving threads.]
Also, rather than having the buffers deallocate when you "use them up" — requiring you to allocate the next time you need a buffer — you can instead have the buffer's destructor release the memory it's holding into a buffer pool; and then have your next-buffer-please logic pull from that pool in preference to allocating. But then you'll want a higher-level "writable stream that is actually a mempool + current write_arena reference" type. (Hey, that's BufMut!)
† And at that point, when the next message doesn't fit, you do not split the message. That violates the whole premise of vectorizing the writes. Instead, you leave some of the buffer unused, and push the large message into a fresh buffer, so that the message will still correspond to a single vectorized-write element / io_uring call / DMA request / etc. If the message is so large it won't fit in your default buffer size, you allocate a buffer just for that one message, or better yet, you utilize a special second pool of larger fixed-size buffers. "Jumbo" buffers, so to speak.
(Get it yet? Networking hardware is also doing exactly what I'm describing here to pack and unpack your packets into frames. For a NIC or switch, the buffers are the [bodies of the] frames; a jumbo buffer is an Ethernet jumbo frame; and so on.)
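To make the send-side pattern from point 2 concrete, here is a very rough sketch using `bytes` (the buffer size and names are illustrative; the pooling/recycling layer is omitted):

```rust
use bytes::{Bytes, BytesMut};

const BUF_SIZE: usize = 64 * 1024; // assumed default buffer size

struct WriteArena {
    current: BytesMut,
}

impl WriteArena {
    fn new() -> Self {
        Self { current: BytesMut::with_capacity(BUF_SIZE) }
    }

    /// Copy one message into the current buffer and hand back a refcounted
    /// slice of it. If it doesn't fit, start a fresh buffer rather than
    /// splitting the message (see the footnote above).
    fn push(&mut self, msg: &[u8]) -> Bytes {
        if msg.len() > self.current.capacity() - self.current.len() {
            // Leave the remainder of the old buffer unused; its memory is
            // freed once every slice handed out from it has been dropped.
            // Oversized messages get a right-sized "jumbo" allocation.
            self.current = BytesMut::with_capacity(BUF_SIZE.max(msg.len()));
        }
        self.current.extend_from_slice(msg);
        // split() detaches the written bytes; freeze() makes them shareable.
        self.current.split().freeze()
    }
}
```

The `Bytes` handles it returns are what you'd queue up for a vectored write; each backing buffer lives exactly until every slice cut from it has been dropped.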
I'm not sure if your comment was meant to be condescending, but it really does come across as that. I'm very well versed in this domain.
Having a per-request/connection arena isn't the only option. What I have seen/use, which is still zero copy (as far as IO zero copy can be in Rust without resorting to bytemuck/blittable types), is to have a pool of buffers of a specific length - typically page-sized by default and definitely page-aligned. These buffers can come from a single large contiguous allocation. If you run out of space in a buffer you grab a new/reused one from the pool, add it to your vec of buffers, and carry on. At the end of the story you would use vectored IO to submit all of them at once - all the way down to the NIC and everything.
This approach is more widespread mainly due to historical reasons: it's really easy to fragment a 32-bit address space, so allocating jumbo buffers simply wasn't an option if you didn't want your server OOMing with 1GB of available (but non-contiguous) memory.
https://man7.org/linux/man-pages/man3/iovec.3type.html
https://learn.microsoft.com/en-us/windows/win32/api/ws2def/n...
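For the vectored-submit step described above, a minimal std-only sketch (blocking IO for brevity; a real implementation would loop on partial writes and likely use io_uring or similar):

```rust
use std::io::{IoSlice, Write};
use std::net::TcpStream;

fn submit_all(stream: &mut TcpStream, buffers: &[Vec<u8>]) -> std::io::Result<usize> {
    // One IoSlice per buffer: the kernel receives them as a single
    // writev(2)-style submission, with no copy into an intermediate
    // contiguous buffer on our side.
    let iovecs: Vec<IoSlice<'_>> = buffers.iter().map(|b| IoSlice::new(b)).collect();
    stream.write_vectored(&iovecs)
}
```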
Apologies, I wasn't really responding to you directly; I was just taking the opportunity to write an educational-blog-post-as-comment aimed at the average HN reader (who has likely never considered what an Ethernet frame even is, or how a device that uses what are essentially DSPs does TDM packet scheduling) — with your comment being the parent because it's the necessary prerequisite reading to motivate the lesson.
> Having a per-request/connection arena isn't the only option. What I have seen/use, which is still zero copy (as far as IO zero copy can be in Rust without resorting to bytemuck/blittable types), is to have a pool of buffers of a specific length - typically page-sized by default and definitely page-aligned. These buffers can come from a single large contiguous allocation. If you run out of space in a buffer you grab a new/reused one from the pool, add it to your vec of buffers, and carry on. At the end of the story you would use vectored IO to submit all of them at once - all the way down to the NIC and everything.
I think you're focusing too much on the word "arena" here, because AFAICT we're both describing the same concept.
In your model (closer to the one used in actual switching), there's a single global buffer pool that all concurrent requests lease from; in my model, there's global heap memory, and then a per-thread/actor/buf-object elastic buffer pool that allocates from the global heap every once in a while, but otherwise reuses buffers internally.
I would say that your model is probably the one used in most zero-copy networking frameworks like DPDK, while my model is probably the one used in most language runtimes — especially managed + garbage-collected runtimes, where contending over a global language-exposed pool can be more expensive than "allocating" (especially when the runtime has its own buffer pool and "allocation" rarely hits the kernel).
But both models are essentially the same from the perspective of someone using the buffer ADT and trying to understand why it's designed the way it is, what it gets them, etc. :)
> it's really easy to fragment 32bit address space, so allocating jumbo buffers simply wasn't an option if you didn't want your server OOMing with 1GB of available (but non-contiguous) memory.
Maybe you're imagining something else here, but when I say "jumbo buffer", I don't mean custom buffers allocated on demand and right-sized to hold one message; rather, I'm speaking of something very closely resembling actual jumbo frames — i.e. another pre-allocated pool containing a smaller number of larger, fixed-size MTU-slot buffers.
With this kind of jumbo-buffer pool, when your messages get big, you switch over from filling regular buffers to filling jumbo buffers — which holds off message fragmentation, but also means new messages go "out the door" a bit slower, maybe "platoon" a bit and potentially overwhelm the recipient with each burst, etc. (which is why you don't just use the larger buffer pool as the only pool).
But if your messages can be bigger than your set jumbo-buffer size, then there's nowhere to go from there; you still need to have a way to split messages across frames.
(Luckily, in the case of `bytes`, splitting a message across frames just means the message now needs multiple iovec-list entries to submit, rather than implying a framing protocol / L2 message encoding with a continuation marker / sequence ID / etc.)
As far as I know that is not possible: there's always a copy.
`bytes` can give you "ring-buffer-like" one-copy kernel-socket receive by e.g. using the Buf as the target for scheduling io_uring read/recv into.
Also, RDMA is technically networking! (Though I think all the Rust RDMA libraries already provide ADTs that work like Buf/BufMut, rather than just saying "here's some network-shared memory, build your own ADT on top.")
> before flinging them at the kernel as [vectors of] large, contiguous DMA requests, without having to stop to allocate
So I had assumed you were talking about kernel networking elsewhere as well.
BTW, on the kernel send path, there is again a copy, contiguous or not, regardless of what API you use.
When using kernel networking I don't think contiguity matters as you suggest, as there is always a copy. Furthermore, "contiguous" in userspace doesn't correspond to contiguous in physical address space, so in any case the hardware is often just going to see a userspace buffer as a series of discontiguous pages anyway: that's what happens with direct IO disk writes, which _are_ zero copy (huge pages help).
It is not a tool for composing disparate pieces into one (while avoiding copies)
I'm afraid I might not have that much free time again for a long time, but maybe when I do, somebody will have solved the regex issue for me...
A `BufRead + Seek` need not be backed by memory, though, except in the midst of being read. (A buffered normal file implements `BufRead + Seek`, for example.)
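For instance, plain std (the file path is arbitrary): a `BufReader<File>` implements both traits while only ever holding the current buffered window in memory.

```rust
use std::fs::File;
use std::io::{BufRead, BufReader, Seek, SeekFrom};

fn main() -> std::io::Result<()> {
    // BufReader<File> implements both BufRead and Seek; only the current
    // buffered window lives in memory, not the whole file.
    let mut reader = BufReader::new(File::open("Cargo.toml")?);

    let mut first_line = String::new();
    reader.read_line(&mut first_line)?;

    // Rewind; this just discards the buffered window.
    reader.seek(SeekFrom::Start(0))?;
    Ok(())
}
```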
I feel like either Iterator, or in some rare case of requiring generic indexing, Index, is more important than "it is composed of some number of linked memory allocations"?
A ReadOnlySequence seems to imply a linked-list of memory sections though; I'm not sure a good rope is going to be able to non-trivially interface with that, since the rope is a tree; walking the nodes in sequence is possible, but it's a tree walk, and something like ReadOnlySequenceSegment::Next() is then a bit tricky. (You could gather the set of nodes into an array ahead of time, but now merely turning it into that is O(nodes) which is sad.)
(And while it might be tempting to say "have the leaf nodes be a LL", I don't think you want to, as it means that inserts need to adjust those links, and I think you would rather have mutations produce a cheaply made but entirely new tree, which I don't think permits a LL of the leaves. You want this to make undo/redo cheap: it's just "go back to the last rope", and then all the ropes share the underlying character data that's not changing rope to rope. The rope in the OP seems to support this: "Cloning ropes is extremely cheap. Rope clones share data,")
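That cheap-clone behavior is visible directly in Ropey's public API. A small sketch (text and indices are arbitrary):

```rust
use ropey::Rope;

fn main() {
    let mut rope = Rope::from_str("Hello, world!\n");

    // Snapshot for undo: a cheap clone that shares the underlying chunks.
    let snapshot = rope.clone();

    // Edit the working copy; only the touched nodes are rebuilt.
    rope.insert(7, "cruel ");
    assert_eq!(rope.to_string(), "Hello, cruel world!\n");

    // The snapshot is untouched; "undo" is just switching back to it.
    assert_eq!(snapshot.to_string(), "Hello, world!\n");
}
```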
In the announcement post, they mention that work on Xi is considered "on hold" rather than strictly discontinued: https://raphlinus.github.io/rust/gui/2022/05/07/ui-architect...
"I want to build an editor, but first I must solve rendering 2D graphics purely on the GPU, invent a parallelizable path solver, and code a human perception-based color value manipulation library."
I think we're at five or six levels of yaks by now.
(xi -> xilem -> masonry -> vello -> peniko -> color)
You can see the current projects (13 active) on https://linebender.org , and several members post interesting check-ins in https://xi.zulipchat.com/
To be fair, the original author of Xi ('raphlinus) has been working on GPU-side 2D rendering much longer than on Xi.
[0] https://github.com/helix-editor/helix
[1] https://github.com/helix-editor/helix/blob/master/docs/archi...
The extra keypress for switching between "i" (inserting text), "ESC" (moving cursor), "i"... would drive me insane (just not used to it, but used to very fast, friction-free typing/editing).
I'll grant that Esc is further away, but it can be remapped.
I'm seriously impressed by the level of quality out of the box
(I couldn't find a mention of this in the README, design.md, or examples.)
In Emacs buffers, the concepts include text properties, overlays, and markers.
For example, if you delete some text from the Ropey data structure, does Ropey have facilities to update the associated non-character data (such as deleting all or part of one or more chunks of the non-character data, and/or updating positional information)? Or do you have to do that separately outside of Ropey?
I was a little confused, because the lede sentence was "Ropey is a utf8 text rope for Rust, designed to be the backing text-buffer for applications such as text editors."
Pretty much all text editors are expected to implement decorations and references, somehow, and some popular text buffer APIs support those.
From the wrapper's point of view, there's no difference between character and non-character data, and the whole buffer can be modeled as a collection of indices mapping ranges of the document to different kinds of data.
One of those indices could be a rope (mapping document ranges to character data, for the document text). Other kinds of indices could also be used. The important thing is that all edits go through the wrapper so that all the relevant indices get updated.
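A hand-wavy sketch of such a wrapper (all names are hypothetical; real marker/overlay adjustment is much more involved and is only stubbed here as "drop anything the deletion overlaps"):

```rust
use ropey::Rope;
use std::ops::Range;

// Hypothetical non-character data: a decoration attached to a document range.
struct Decoration {
    range: Range<usize>, // char indices into the document
    kind: String,        // e.g. "comment-face", "error-underline"
}

struct Buffer {
    text: Rope,
    decorations: Vec<Decoration>,
}

impl Buffer {
    // All edits go through the wrapper so every index stays consistent.
    fn delete(&mut self, range: Range<usize>) {
        let removed = range.end - range.start;
        self.text.remove(range.clone());

        self.decorations.retain_mut(|d| {
            if d.range.end <= range.start {
                true // entirely before the deletion: untouched
            } else if d.range.start >= range.end {
                // entirely after the deletion: shift left
                d.range = d.range.start - removed..d.range.end - removed;
                true
            } else {
                false // overlaps the deletion: dropped (real editors clip instead)
            }
        });
    }
}
```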
https://github.com/swiftlang/swift-foundation/tree/main/Sour...
https://github.com/apple/swift-collections/tree/main/Sources...
What would be some good use-cases for using Ropey with Emacs? Maybe re-formatting/beautifying huge json files or something like that?
I didn't have time yet to explore the project more closely, but it looks very interesting.
Kudos to the author.
I have always thought that a text editor using rrb-trees would probably be the easiest option that would ensure decent performance for small files, and good performance for large files while also being great for random access or linear search.
> On the other hand, Ropey is not good at:
> Handling texts that are larger than available memory. Ropey is an in-memory data structure.
Also, this is a rust library, not an editor application.
(Or, I mean, can you shard the files / store the files more efficiently?)
That seems to make it of dubious use, not really suitable for a well-engineered text editor.
The fact that it's UTF-8 only is also a serious problem since files can contain arbitrary byte sequences.
I think it's mostly due to multiple buffers showing the same content, as opposed to this Ropey library directly.
Also, I have to wonder when this fad of loudly announcing when something is written in Rust will finally pass. Maybe software written in other languages should loudly announce it in every second sentence? To me at least it's become as self-aggrandizingly cringe as "Sent from my iPhone" at the end of every email...