The repercussions of missing an Ampersand in C++ and Rust (www.nablag.com)

80 points by nablags 5 months ago | 18 comments

Arnavion 5 months ago
Rust's behavior of moving without leaving a moved-out shell behind also simplifies the implementation of the type itself, because its dtor doesn't have to handle the special case of a moved-out shell, and the type doesn't even need to be able to represent a moved-out shell.
For example, a moved-out-from tree in C++ could represent this by having its inner root pointer be nullptr, and then its dtor would have to check for the root being nullptr, and all its member fns would have the danger of UB (nullptr dereference) if the caller called them on a moved-out shell. But the Rust version could use a non-nullable pointer type (Box), and its dtor and member fns would be guaranteed to act on a valid pointer.
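A minimal C++ sketch of the moved-out shell described above. The `Tree` and `Node` types and their member names are hypothetical, chosen only for illustration; the point is that a nullable root makes the shell representable, so every accessor has to account for it.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical single-node "tree"; all names here are illustrative only.
struct Node {
    int value;
};

class Tree {
public:
    explicit Tree(int v) : root_(new Node{v}) {}

    // Defaulted moves: std::unique_ptr leaves the source holding nullptr.
    Tree(Tree&&) noexcept = default;
    Tree& operator=(Tree&&) noexcept = default;

    // Every accessor has to decide what to do with a moved-from shell.
    bool is_shell() const { return root_ == nullptr; }
    int root_value_or(int fallback) const {
        return root_ ? root_->value : fallback;
    }

private:
    std::unique_ptr<Node> root_;  // nullable, so the shell state is representable
};

// Moves `t` into a local and reports whether the source became a shell.
bool source_is_shell_after_move(Tree t) {
    Tree taken = std::move(t);
    return t.is_shell() && !taken.is_shell();
}
```

In Rust the equivalent `Box<Node>` field can never be null, and the compiler rejects any use of the source after the move, so no `is_shell` check is needed there.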
- nixpulvis 5 months ago
  This was one of the most unsatisfying things about learning C++ move semantics. They only kinda move the thing, leaving this shell behind is a nightmare.
  - flohofwoe 5 months ago
    C++ doesn't have ownership baked into the language like Rust does, and "move semantics" is all about ownership (under the hood it's just a plain old shallow copy both in C++ and Rust). Making the moved from object inaccessible like in Rust would have required static ownership tracking which I guess the C++ committee was afraid to commit to (and once you have that, you're basically halfway to Rust, including the downside of a more restrictive programming model).
    motorest 5 months ago
    > Making the moved from object inaccessible like in Rust would have required static ownership tracking which I guess the C++ committee was afraid to commit to (...)
    I'm not sure the "afraid to commit to" is a valid interpretation. The requirements that the C++ standard specifies for moved-from objects turn that hypothetical issue into a non-problem. In C++, if you move an object then after the move the object must be left in a valid state. That's it. This means the object can be safely destroyed.
    You are also free to implement whatever semantics your moved-from object has. If you want your moved-from object to throw an exception, you are free to implement that. If instead you want to ensure your moved-from object can be reused, you are also free to do so. If you want to support zombie objects then nothing prevents you from going that path. It's up to you. The only thing the standard specifies is that once the lifetime of that object ends, it can be safely destroyed. That sounds both obvious and elegant, don't you agree?
    Dylan16807 5 months ago
    You'd have to mark some functions as deleting their arguments. But I wouldn't really call that ownership. And it shouldn't restrict the language: If the compiler can't solve it statically then it can set a flag or null and check it before calling the destructor. Instead of a guard being built into every destructor use.
  - motorest 5 months ago
    > This was one of the most unsatisfying things about learning C++ move semantics. They only kinda move the thing, leaving this shell behind is a nightmare.
    I don't know what nightmares you have. The only requirement that C++ specifies for moved-from objects is that they remain valid. Meaning, they can be safely destroyed.
    You can go way out of your way and reuse an object that was just moved, but that's a decision you somehow made, and you have the responsibility of adding your reinitialization or even move logic to get that object back in shape. That is hardly something that sneaks up on you.
  - DLoupe 5 months ago
    Since I use move semantics all the time, this is for me the most frustrating thing about C++ full stop. I really wish they'd fix this instead of adding all those compile-time features.
    motorest 5 months ago
    > Since I use move semantics all the time (...)
    Everyone who ever uses C++ uses move semantics all the time, including move elision. It's not an obscure feature.
    > (...) this is for me the most frustrating thing about C++ full stop.
    I've been using C++ for years and I have no idea what you could be possibly referring to. The hardest aspect of move semantics is basically the rule of 5. From that point, when you write a class you have the responsibility to specify how you want your class to be moved and what you want your moved-from object to look like, provided that you ensure you leave it in a valid state.
    That's it.
    What exactly do you believe needs fixing?
    catlifeonmars 5 months ago
    How would you fix this in C++?
    DLoupe 5 months ago
    By adding syntax and semantics for destructible moves, meaning the moved object is removed from its scope (without calling its destructor.)
    motorest 5 months ago
    I've worked with C++ for a number of years, with a few codebases that were >1M LoC. Never did I stumble upon a situation where an object was moved and an existing symbol became a problem. I wonder what you are doing to get yourself in that situation.
    DLoupe 5 months ago
    > I wonder what you are doing to get yourself in that situation.
    The problem with the current move semantics is that, compared to e.g. Rust: 1) the compiler generates unnecessary code and 2) instead of just implementing class T you must implement a kind of optional<T>.
    Which means, that after all those years of using smart pointers I find myself ditching them in favor of plain pointers like we did in the 90's.
    catlifeonmars 5 months ago
    When you say you must, do you mean that it’s best practice, or that this is UB or similar?
    motorest 5 months ago
    > When you say you must, do you mean that it’s best practice, or that this is UB or similar?
    I'm not OP, but the only requirement that C++ imposes on moved-from objects is that they remain valid objects. Meaning, they can be safely destroyed or reused by reassigning or even moving other objects into them. I have no idea what OP could be possibly referring to.
    motorest 5 months ago
    > The problem with the current move semantics is that, compared to e.g. Rust: 1) the compiler generates unnecessary code and 2) instead of just implementing class T you must implement a kind of optional<T>.
    I don't know what you mean by "compiler generates unnecessary code" or why you see that as a problem. I also have no idea what you mean by "a kind of optional". The only requirement on moved-from objects is that they must be left in a valid state. Why do you see that as a problem?
    DLoupe 5 months ago
    The compiler generates code for calling the destructor after the object was moved. This was problem #1.
    Regarding #2, take Resource Acquisition Is Initialization (RAII) as an example - in RAII, the existence of an object implies the existence of a resource. Now, if you want to be able to move, the object becomes "either the resource exists or it was moved out". As someone else noted in the comments, this affects not only the destructor. Methods cannot assume the existence of the resource, they have to check it first. Kind of like optional<MyResource>.
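    A small sketch of both points, assuming a hypothetical `Handle` type with an `owns` flag (neither name comes from the comment above): the destructor still runs for the moved-from object, and the flag is the extra optional-like state the class has to carry.

    ```cpp
    #include <cassert>
    #include <utility>

    int g_dtor_calls = 0;  // counts destructor invocations for the demo

    // Hypothetical RAII handle; `owns` is the "was I moved from?" state.
    struct Handle {
        bool owns = true;
        Handle() = default;
        Handle(Handle&& other) noexcept : owns(other.owns) { other.owns = false; }
        ~Handle() {
            ++g_dtor_calls;
            // A real handle would release its resource here only if `owns` is true.
        }
    };

    // One resource, one move: how many destructor calls does that cost?
    int dtor_calls_for_one_move() {
        g_dtor_calls = 0;
        {
            Handle a;
            Handle b(std::move(a));  // a is now a shell, but still gets destroyed
        }
        return g_dtor_calls;
    }
    ```

    The function returns 2: one destructor call for the real owner and one for the shell.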
  - tialaramex 5 months ago
    When I looked into the history of the C++ move (which after all didn't even exist in C++ 98 when the language was first standardized) I discovered that in fact they knew nobody wants this semantic. The proposal paper doesn't even try to hide that what programmers want is the destructive move (the thing Rust has) but it argues that was too hard to do with the existing C++ design so...
    The more unfortunate, perhaps disingenuous part is that the proposal paper tries to pretend you can make the destructive move later if you need it once you've got their C++ move.
    But actually what they're proposing is that "move + create" + "destroy" = "move". So that's extra work; it's not the same thing at all, and sure enough, in the real world this means extra work from compilers, from programmers, and sometimes (if it isn't removed by the optimiser) from the runtime program.
    reactordev 5 months ago
    C++ is riddled with “good enough” without completeness. Resulting in more bandaids to the language to fix stuff they half implemented in the first place.
    aw1621107 5 months ago
    > When I looked into the history of the C++ move (which after all didn't even exist in C++ 98 when the language was first standardized) I discovered that in fact they knew nobody wants this semantic. The proposal paper doesn't even try to hide that what programmers want is the destructive move (the thing Rust has) but it argues that was too hard to do with the existing C++ design so...
    > The more unfortunate, perhaps disingenuous part is that the proposal paper tries to pretend you can make the destructive move later if you need it once you've got their C++ move.
    For reference, I think N1377 is the original move proposal [0]. Quoting from that:
    > Alternative move designs
    > Destructive move semantics
    > There is significant desire among C++ programmers for what we call destructive move semantics. This is similar to that outlined above, but the source object is left destructed instead of in a valid constructed state. The biggest advantage of a destructive move constructor is that one can program such an operation for a class that does not have a valid resourceless state. For example, the simple string class that always holds at least a one character buffer could have a destructive move constructor. One simply transfers the pointer to the data buffer to the new object and declares the source destructed. This has an initial appeal both in simplicity and efficiency. The simplicity appeal is short lived however.
    > When dealing with class hierarchies, destructive move semantics becomes problematic. If you move the base first, then the source has a constructed derived part and a destructed base part. If you move the derived part first then the target has a constructed derived part and a not-yet-constructed base part. Neither option seems viable. Several solutions to this dilemma have been explored.
    <snip>
    > In the end, we simply gave up on this as too much pain for not enough gain. However the current proposal does not prohibit destructive move semantics in the future. It could be done in addition to the non-destructive move semantics outlined in this proposal should someone wish to carry that torch.
    [0]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2002/n13...
    binary132 5 months ago
    Now that would be a cool first proposal and implementation. I wonder if there’s any prior art in C++ yet.
    aw1621107 5 months ago
    If there is any prior art I'm not aware of it. The problems described in the part I snipped out around how destructive moves would work with class hierarchies sound thorny, for what it's worth.
  - triknomeister 5 months ago
    Destructive vs non-destructive move.
- Someone 5 months ago
  > For example, a moved-out-from tree in C++ could represent this by having its inner root pointer be nullptr, and then its dtor would have to check for the root being nullptr,
  delete null is fine in C++ [1], so, assuming root either is a C++ object or a C type without members that point to data that also must be freed, its destructor can do delete root. And those assumptions would hold in ‘normal’ C++ code.
  [1] https://en.cppreference.com/w/cpp/language/delete.html: “If ptr is a null pointer value, no destructors are called, and the deallocation function may or may not be called (it's unspecified), but the default deallocation functions are guaranteed to do nothing when passed a null pointer.”
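  A minimal sketch of that point, assuming hypothetical `Owner` and `Payload` types (not from the comment): the destructor calls `delete` unconditionally, and the moved-from object, whose pointer was set to `nullptr`, is destroyed without any explicit check.

  ```cpp
  #include <cassert>
  #include <utility>

  int g_live_payloads = 0;  // number of Payload objects currently alive

  struct Payload {
      Payload() { ++g_live_payloads; }
      ~Payload() { --g_live_payloads; }
  };

  // Hypothetical owning wrapper around a raw pointer.
  class Owner {
  public:
      Owner() : p_(new Payload) {}
      Owner(Owner&& other) noexcept : p_(other.p_) { other.p_ = nullptr; }
      ~Owner() { delete p_; }  // no null check: deleting nullptr does nothing

  private:
      Payload* p_;
  };

  // Creates one Owner, moves it, lets both objects die, reports live payloads.
  int live_payloads_after_scope() {
      {
          Owner a;
          Owner b(std::move(a));  // a.p_ is now nullptr
      }  // both destructors run; the single Payload is freed exactly once
      return g_live_payloads;
  }
  ```

  This only covers the destructor; member functions that dereference the pointer would still need to consider the null case.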
- spacechild1 5 months ago
  In practice, move operations typically just leave an empty object behind. The destructor already has to deal with that. And of course you can't call certain methods on an empty object. So in practice you don't need special logic except for the move operations themselves.
  - Dylan16807 5 months ago
    > The destructor already has to deal with that.
    That's partly true, partly circular. Because moves work this way, it's harder to make a class that doesn't have empty states, so I don't design my class to avoid empty states, so the destructor has to handle them.
    motorest 5 months ago
    > That's partly true, partly circular.
    I don't think there is anything "partly" about it being true. A moved-from object is expected to remain valid and preserve class invariants. If you wrote a class whose objects fail to remain valid after being moved, you wrote bugs into your code.
    > Because moves work this way, it's harder to make a class that doesn't have empty states, so I don't design my class to avoid empty states, so the destructor has to handle them.
    You are not required to implement an empty state. You are only required to write your classes so that after moving an object it remains valid. You are free to specify what this means for your classes, and it can be anything from leaving the object as if it was default initialized to literally having a member variable such as bool moved. It's up to you. From C++'s perspective, as long as your moved-from object can be safely destroyed then it's all good. Anything else is the behavior you chose to have, and bugs you introduced.
    mort96 5 months ago
    It's not like it's the only part of the language that mandates a default constructor though. There are plenty of situations where default-constructible types are desirable. Even simple things like having a non-default-constructible type in a map is awkward.
    masklinn 5 months ago
    > It's not like it's the only part of the language that mandates a default constructor though
    It’s… not a part of the language which mandates a default ctor in the first place.
    motorest 5 months ago
    > It’s… not a part of the language which mandates a default ctor in the first place.
    Why should it, though? Think about it. The goal of move semantics is performance, mainly avoiding copying/initializing expensive objects using a standard syntax. Why do you believe it would be a good idea to force constructors when they can very well be the reason why move should be used?
    masklinn 5 months ago
    Did you reply to the wrong comment?
    mort96 5 months ago
    It doesn't, but it does mandate that the object has some "empty state". If you have an empty state you might as well have a default constructor which initializes the object to that empty state.
    masklinn 5 months ago
    moved-from objects are not in an empty state but in an unspecified state, they are only required to be destructible, every other operation can be disallowed. That is not a useful state for default construction. Thus being movable does not imply defaulting is any sort of good idea.
    The other way around makes more sense, but even then it is not systematic, if default construction is costly (allocation, syscall, …) then you don’t want to do that for a moved-from object which will just be destroyed, which is the fate of most.
    mort96 5 months ago
    > moved-from objects are not in an empty state but in an unspecified state, they are only required to be destructible, every other operation can be disallowed. That is not a useful state for default construction. Thus being movable does not imply defaulting is any sort of good idea.
    This is only true for standard library objects. The C++ standard specifies that e.g. std::unordered_map will be in a "valid but unspecified state" when moved from. You can define your own classes to behave however you want, including defining a moved-from object to be identical to a default-constructed object.
    Regardless, you're missing the point. Even if the standard specified that every moved-from object can only be destructed and every other use was UB, you still need the moved-from object to be in some empty state. Your move constructor and move operator= need to put the moved-from object into some state where it doesn't own any resources but can be safely destructed. There's typically little reason to not make this "doesn't own any resources" state available through a default constructor.
    > The other way around makes more sense, but even then it is not systematic, if default construction is costly (allocation, syscall, …) then you don’t want to do that for a moved-from object which will just be destroyed, which is the fate of most.
    If your class can be in some "empty" state that doesn't own any resources and doesn't require syscalls to construct, you want this to be both the default constructed state and the moved-from state. Default-constructible objects end up getting default-constructed then overwritten all the time in C++, such as by the common pattern 'my_unordered_map["foo"] = MyClass(...)' which will first default-construct a value and then call its move operator=.
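    The subscript pattern mentioned above can be observed with a small sketch. `Tracked`, `Counts` and the global counter are hypothetical names introduced only to count calls; the sketch assumes nothing beyond the standard behavior of `operator[]` on `std::unordered_map`.

    ```cpp
    #include <cassert>
    #include <string>
    #include <unordered_map>

    struct Counts {
        int default_ctor = 0;
        int move_assign = 0;
    };
    Counts g_counts;  // updated by Tracked's special member functions

    // Hypothetical value type that records how it gets constructed and assigned.
    struct Tracked {
        int value = 0;
        Tracked() { ++g_counts.default_ctor; }
        explicit Tracked(int v) : value(v) {}
        Tracked(Tracked&&) = default;
        Tracked& operator=(Tracked&& other) noexcept {
            value = other.value;
            ++g_counts.move_assign;
            return *this;
        }
    };

    // Inserts through operator[] and reports what the map had to do.
    Counts insert_via_subscript() {
        g_counts = Counts{};
        std::unordered_map<std::string, Tracked> m;
        m["foo"] = Tracked(42);  // default-construct the slot, then move-assign
        return g_counts;
    }
    ```

    Here the counters come back as one default construction and one move assignment, which is why a cheap default state matters for this pattern.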
    spacechild1 5 months ago
    Please give me an example for a class that needs to handle empty state in the destructor only because of move operations. These exist, but IME they are very rare. As soon as you have a default constructor, the destructor needs to handle the case of empty state.
    ninkendo 5 months ago
    It’s not just the destructor you have to worry about, it’s all of the state accessible to callers.
    If you have any type that represents validated data, say a string wrapper which conveys (say) a valid customer address, how do you empty it out?
    You could turn it into an empty string, but now that .street() method has to return an optional value, which defeats the purpose of your type representing validated data in the first place.
    The moved-from value has to be valid after move (all of its invariants need to hold), which means you can’t express invariants unless they can survive a move.
    It is much better for the language to simply zap the moved-from value out of existence so that you don’t have to deal with any of that.
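    A rough sketch of the problem, assuming a hypothetical `ValidatedAddress` whose only invariant is a non-empty street. The explicit `clear()` in the move constructor is an assumption made so the moved-from state is deterministic; the standard only promises a valid but unspecified `std::string`.

    ```cpp
    #include <cassert>
    #include <stdexcept>
    #include <string>
    #include <utility>

    // Hypothetical validated type: construction fails unless the street is non-empty.
    class ValidatedAddress {
    public:
        explicit ValidatedAddress(std::string street) : street_(std::move(street)) {
            if (street_.empty()) throw std::invalid_argument("street must not be empty");
        }
        ValidatedAddress(ValidatedAddress&& other) noexcept
            : street_(std::move(other.street_)) {
            other.street_.clear();  // assumption: make the moved-from state explicit
        }
        const std::string& street() const { return street_; }

    private:
        std::string street_;
    };

    // The invariant the type is supposed to guarantee to its callers.
    bool invariant_holds(const ValidatedAddress& a) { return !a.street().empty(); }
    ```

    After `ValidatedAddress b(std::move(a));` the invariant still holds for `b` but no longer for `a`, even though `a` remains a live, accessible object.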
    spacechild1 5 months ago
    First, one shouldn't use a moved-from object in the first place (except for, maybe, reassigning it).
    Second, why can't the .street() method simply return an empty string in this case?
    > The moved-from value has to be valid after move (all of its invariants need to hold)
    The full quote from the C++ standard is: "Unless otherwise specified, such moved-from objects shall be placed in a valid but unspecified state." AFAIK, it only makes such requirements for standard library types, but not for user defined types. Please correct me if I'm wrong.
    ninkendo 5 months ago
    > First, one shouldn't use a moved-from object in the first place (except for, maybe, reassigning it).
    It still requires you to come up with something to do to the old value in the move constructor. What would you do in the ValidatedAddress case? Set a flag in the struct called “moved_from” and use that to throw an exception if it’s ever used? Wouldn’t it be nice if you just didn’t need to worry about it?
    > Second, why can't the .street() method simply return an empty string in this case?
    In this example I’m referring to a type that represents a “validated” address, so, one that has already passed checks to make sure the street isn’t empty, etc. (it’s the whole “parse, don’t validate” idea, although I’ve never understood why the word “parse” is used when I would’ve just called it “validate just once”.)
    It is an extremely useful concept for your type system to represent invariants in your data like this. Having to make every type contain an “empty” case, just to make the language’s move semantics work, pokes an enormous hole through this idea.
    > AFAIK, it only makes such requirements for standard library types, but not for user defined types
    It makes the requirement because the compiler is not going to stop anyone from using the moved-from value, so you have to think of something to do in the move constructor. You can pinky-swear to never use the moved-from value in your own code (and linters can help here) but the possibility still exists, so it must be solved for.
    spacechild1 5 months ago
    > Having to make every type contain an “empty” case, just to make the language’s move semantics work, pokes an enormous hole through this idea.
    Nobody says that the invariants must hold after the object has been moved-from! The only thing you need to do is make sure that the destructor can run and do the right thing.
    > You can pinky-swear to never use the moved-from value in your own code (and linters can help here) but the possibility still exists, so it must be solved for.
    Letting the program crash would be a valid solution (for your own types).
    For me the issue with C++ move semantics is not so much that you have to add special logic to your classes, but the fact that moved-from objects can be accessed in the first place. In this respect I definitely agree that destructive moves are better.
    motorest 5 months ago
    > Wouldn’t it be nice if you just didn’t need to worry about it?
    Do you worry about it? I mean, to begin with, do you purposely try to reuse objects that you explicitly moved? If you do, at the very least you can be lazy and reassign a newly constructed object right after you explicitly move its contents, but I don't see any reason that would justify such a thing.
    Can you point out what you feel is the scenario that worries you the most?
    tialaramex 5 months ago
    This means C++ is riddled with types that have unrelated "I'm empty" state inside them rather than this being relegated to a separate wrapper type. It's Tony's Billion Dollar Mistake but smeared across an entire ecosystem.
    The smart pointer std::unique_ptr<T> is an example of this, sometimes people will say it's basically a boxed T, so analogous to Rust's Box<T> but it isn't quite, it's actually equivalent to Option<Box<T>>. And if we don't want to allow None? Too bad, you can't express that in C++
    But you're right that C++ people soldier on, there aren't many C++ types where this nonsense unavoidably gets in your face. std::variant's magic valueless_by_exception is such an example and it's not at all uncommon for C++ people to just pretend it can't happen rather than take it square on.
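    To make the `Option<Box<T>>` comparison concrete, here is a short sketch. The helper names `is_none` and `take` are hypothetical; the behavior shown (a default-constructed or moved-from `std::unique_ptr` compares equal to `nullptr`) is guaranteed by the standard library.

    ```cpp
    #include <cassert>
    #include <memory>
    #include <utility>

    // Returns true if `p` currently owns nothing (the "None" case).
    bool is_none(const std::unique_ptr<int>& p) { return p == nullptr; }

    // Moves ownership out of `source`, leaving it null.
    std::unique_ptr<int> take(std::unique_ptr<int>& source) {
        return std::move(source);
    }
    ```

    A caller can therefore hold a `std::unique_ptr<int>` that is null either because it was default-constructed or because `take` was called on it, and the type gives no way to rule that out.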
    spacechild1 5 months ago
    > This means C++ is riddled with types that have unrelated "I'm empty" state
    Again, these cases are still rare. Most classes either don't require user-defined move operations, or they have some notion of emptiness or default state.
    > And if we don't want to allow None? Too bad, you can't express that in C++
    That's actually a good example! Nitpick: you can express it in C++, just not without additional logic and some overhead :)
    7jjjjjjj 5 months ago
    >you can express it in C++, just not without additional logic and some overhead :)
    How?
    spacechild1 5 months ago
    E.g. with a boolean member or by setting a bit in the pointer value.
    steveklabnik 5 months ago
    (And that difference leads to an ABI difference that makes it not a zero overhead abstraction in the way that Box is…)
    spacechild1 5 months ago
    Great point! Chandler Carruth explained this in one of his CppCon talks: https://youtu.be/rHIkrotSwcc?t=1047
    sgsjchs 5 months ago
    A socket.
    spacechild1 5 months ago
    How so? Doesn't your socket class have a default constructor and a notion of open and closed?
    sgsjchs 5 months ago
    If the moves were destructive, I'd design it to have the default constructor call `::socket` and destructor call `::close`. And there wouldn't be any kind of "closed" state. Why would I want it?
    spacechild1 5 months ago
    Your socket class would have no default constructor? And you would never want to close the socket before the object's lifetime ends? Really?
    sgsjchs 5 months ago
    In this case, I would want the address family and protocol to be statically known, so it would have a default constructor. But for example, a file might not have one, sure. As for closing before lifetime ends, why? I can just end lifetime. Wrap it in an optional if the type system can't figure it out like with a struct member.
    spacechild1 5 months ago
    > so it would have a default constructor.
    And what's the underlying value of such a default constructed socket? I assume it would be -1 resp. INVALID_SOCKET, in which case the destructor would have to deal with it.
    > Wrap it in an optional if the type system can't figure it out like with a struct member.
    So you essentially must wrap it in an optional if you want to use it as a member variable. I find this rather pointless as sockets already have a well-defined value for empty state (-1 resp. INVALID_SOCKET). By wrapping it in an optional you are just wasting up to 8 bytes.
    Sure, you can implement a socket class like that, but it's neither necessary nor idiomatic C++.
    sgsjchs 5 months ago
    > And what's the underlying value of such a default constructed socket? I assume it would be -1 resp. INVALID_SOCKET
    No, as explained, the default value would be the result of `::socket` call, i.e. a fresh OS-level socket.
    > So you essentially must wrap it in an optional if you want to use it as a member variable.
    No, you only must wrap it if you really want this closed state to exist.
    > Sure, you can implement a socket class like that, but it's neither necessary nor idiomatic C++.
    Obviously. Because the moves are not destructive. If they were, this design would be superior. And the wasted space for optional is solvable, just like for non-nullable pointers.
    spacechild1 5 months ago
    > If they were, this design would be superior.
    I see how destructive moves would slightly simplify the implementation, but what difference would it make apart from that? (Don't get me wrong, I totally think that destructive moves are a good idea in general, I just don't see the qualitative difference in this particular case.)
    > And the wasted space for optional is solvable, just like for non-nullable pointers.
    In the case of non-nullable pointers the library author knows that they can use NULL as a sentinel value and write a corresponding specialization. But what could you possibly do with an arbitrary user-defined class?
    sgsjchs 5 months ago
    > what difference would it make
    The same difference as making pointers always non-nullable and reintroducing nullability via an optional wrapper only when semantically appropriate.
    > what could you possibly do with an arbitrary user-defined class
    Just add some customization points to std::optional so that users can define which value of the class to treat as nullopt internally.
    spacechild1 5 months ago
    > The same difference as making pointers always non-nullable and reintroducing nullability via an optional wrapper only when semantically appropriate.
    Again, I don't see what this has to do with destructive moves. If you want a socket class that always refers to an open socket, you can already do that. Same for non-nullable pointer wrappers. Conversely, destructive moves don't prevent you from implementing a socket class with a close() method. These concepts are really orthogonal.
    > Just add some customization points to std::optional so that users can define which value of the class to treat as nullopt internally.
    How is this supposed to work? The very point of your socket class is that it always contains a valid socket handle. Once you introduce a sentinel value, you are back to square one. If the optional class is able to construct a socket with the sentinel value, so is the user.
    sgsjchs 5 months ago
    > Again, I don't see what this has to do with destructive moves. If you want a socket class that always refer to an open socket, you can already do that.
    Technically you can, but it's unreasonable to create an os-level socket just to put into the moved-out object where it will be immediately destroyed again. This is not an issue when the moves are destructive.
    > How is this supposed to work? The very point of your socket class is that it always contains a valid socket handle. Once you introduce a sentinel value, you are back to square one. If the optional class is able to construct a socket with the sentinel value, so is the user.
    That's not true. The sentinel value need not be exposed in the public interface of the class, it can only be accessible via the customization point of the optional.
    spacechild1 5 months ago
    > Technically you can, but it's unreasonable to create an os-level socket just to put into the moved-out object where it will be immediately destroyed again. This is not an issue when the moves are destructive.
    No, the class can use a sentinel value internally only to mark moved-from objects. That's exactly where we actually started the conversation. That's why I said that destructive moves would only somewhat simplify the move operations, but not make a qualitative difference (in this area).
    > The sentinel value need not be exposed in the public interface of the class, it can only be accessible via the customization point of the optional.
    Since the optional would need to construct an instance with the sentinel value, I thought that the "sentinel" constructor must be public. However, you might be right that one could write a template specialization that contains the template argument as a friend class. In this case you could use a private constructor. Note that the destructor still has to handle the sentinel value... But I guess this is just something you have to accept.
    sgsjchs 5 months ago
    > No, the class can use a sentinel value internally only to mark moved-from objects. That's exactly where we actually started the conversation.
    The issue is that the "moved-from" state is exposed to the user when the moves are not destructive. The author of the class has to consider behavior for every method in sentinel state, even when it's just to assert that the state isn't sentinel or "lol it's UB". And the user has to be careful not to accidentally misuse an object in sentinel state. Just like how every time you touch a nullable pointer you have to consider if it can be null and what to do in that case. As long as the sentinel state is exposed at all (via non-destructive move), there is little gain in not providing full support for it. However, with destructive moves the sentinel value either doesn't exist at all or only exists completely internally as an optimization, and all this mental overhead disappears.
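    A sketch of the sentinel approach under discussion, with stand-in functions instead of the real `::socket` and `::close` so that it runs anywhere. `fake_open`, `fake_close`, `Socket::usable`, `Tally` and `move_once` are all hypothetical names, not part of any real API.

    ```cpp
    #include <cassert>
    #include <utility>

    // Stand-ins for ::socket and ::close (assumption, to keep the sketch portable).
    int g_open_count = 0;
    int g_close_count = 0;
    int fake_open() { ++g_open_count; return 100 + g_open_count; }
    void fake_close(int /*fd*/) { ++g_close_count; }

    class Socket {
    public:
        Socket() : fd_(fake_open()) {}  // always starts out open
        Socket(Socket&& other) noexcept : fd_(other.fd_) { other.fd_ = kMovedFrom; }
        Socket(const Socket&) = delete;
        ~Socket() {
            if (fd_ != kMovedFrom) fake_close(fd_);  // the destructor must know the sentinel
        }
        // Non-destructive moves leave the sentinel state observable by callers.
        bool usable() const { return fd_ != kMovedFrom; }

    private:
        static constexpr int kMovedFrom = -1;
        int fd_;
    };

    struct Tally {
        int opened;
        int closed;
        bool source_usable;
    };

    // Opens one socket, moves it once, destroys both objects, and reports counts.
    Tally move_once() {
        g_open_count = g_close_count = 0;
        bool source_usable = true;
        {
            Socket a;
            Socket b(std::move(a));
            source_usable = a.usable();  // a is still in scope and can be queried
        }
        return Tally{g_open_count, g_close_count, source_usable};
    }
    ```

    The sentinel avoids opening a second OS-level handle for the moved-from object, but `a` stays reachable in an unusable state, which is the exposure described above.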
    spacechild1 5 months ago
    I see your point. Just a few things:
    1. This is only relevant when using such class as a local variable. Member variables are typically not moved-from.
    2. In my understanding the user has the freedom to specify what constitutes a "valid but unspecified state" and it would be perfectly ok to mandate that anything you can do with a moved-from object is to either destroy or reassign it.
    3. The problems with the state of moved-from objects from the perspective of a library author could have been prevented simply by imposing stricter requirements in the standard (e.g. every usage except destruction, and possible reassignment, shall be UB).
    4. With all the issues you've pointed out, it is still perfectly possible and reasonable to design a socket class your way (= no closed socket state) in C++, yet somehow most people seem to prefer open() and close() methods instead of modelling the state with an optional. Even in the presence of destructive moves, I don't think that one way is necessarily better than the other and it is mostly a matter of culture and personal preference.
    All that being said, I definitely agree that destructive moves are a good thing, in particular if the compiler prevents you from accidentally accessing moved-from objects (which is a mistake that is very easy to make in C++).
    sgsjchs 5 months ago
    Indeed, the "valid but unspecified state" refers only to some types defined in the standard library. It essentially means that you can only call methods which have no preconditions and don't depend on what that state is, e.g. assignment or destruction, or something like string::clear or vstring::assign if you want defined outcomes. In general each type is free to guarantee whatever the author wants about the moved from state, e.g. moved-from std::unique_ptr is always null.
    7jjjjjjj5 months ago
    With destructive moves, you can end an object's lifetime whenever you want.
    spacechild15 months ago
    How would I use such a socket class as a member variable? How do I reopen the socket?
    sgsjchs5 months ago
    Reopen by constructing and assigning a new socket.
    spacechild15 months ago
    So I essentially have to wrap it in something like std::optional. Well, that's certainly one way to write a socket class, but I'd say it's not idiomatic C++. (I have never seen a socket class being implemented like that.)
    sgsjchs5 months ago
    You don't need optional in this case, the assignment would just destroy the old socket and immediately move the new one in its place.
    spacechild15 months ago
    Well, reopening a socket implies that I have manually closed the socket, which does require an optional with your implementation.
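A minimal Rust sketch of the two positions in this sub-thread, under stated assumptions: `Socket`, its `open` function, and the shared `closed` counter are hypothetical names made up for illustration, and `Drop` merely stands in for a real close(). It tries to show that "reopening" by assignment needs no empty state, while closing manually without a replacement does need something like `Option`.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical socket with no "closed" state: a `Socket` value is always open.
struct Socket {
    port: u16,
    // Shared counter of closed sockets, only here to make the demo observable.
    closed: Rc<Cell<u32>>,
}

impl Socket {
    fn open(port: u16, closed: &Rc<Cell<u32>>) -> Socket {
        Socket { port, closed: Rc::clone(closed) }
    }
}

impl Drop for Socket {
    // Stands in for the real close(); it never has to handle a moved-from shell.
    fn drop(&mut self) {
        self.closed.set(self.closed.get() + 1);
    }
}

/// Returns (old port, new port, closes after reassignment,
/// closes after manual close, whether the slot is empty).
fn socket_demo() -> (u16, u16, u32, u32, bool) {
    let closed = Rc::new(Cell::new(0));

    // "Reopen" by assignment: the old socket is dropped, the new one moves in.
    let mut sock = Socket::open(80, &closed);
    let old_port = sock.port;
    sock = Socket::open(8080, &closed);
    let new_port = sock.port;
    let after_reassign = closed.get();

    // Closing manually *without* a replacement needs an explicit empty state.
    let mut slot: Option<Socket> = Some(sock);
    drop(slot.take());
    let after_close = closed.get();

    (old_port, new_port, after_reassign, after_close, slot.is_none())
}
```

Whether the `Option` wrapper counts as a cost or as honest modelling of the "closed" state is exactly the matter of taste being argued above.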
masklinn5 months ago
> Granted, these repercussions of these defaults also result in (in my opinion) verbose language constructs like iter, into_iter, iter_mut ↩
Note that assuming the into_iter comes from IntoIterator that’s what the for loop invokes to get an iterator from an iterable. So
```
    for lr in LoadRequests.into_iter() {
```
Is completely unnecessary verbosity,
```
    for lr in LoadRequests {
```
Will do the exact same thing. And the stdlib will generally implement the trait with the relevant semantics on shared and unique references so
```
    for lr in LoadRequests.iter_mut()
```
Can generally be written
```
    for lr in &mut LoadRequests
```
So you rarely need to invoke these methods outside of functional pipelines if you dislike them (some prefer them for clarity / readability).
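For reference, a small self-contained sketch of the three loop forms side by side. The `load_requests` vector and the `demo_loops` function are made-up names for illustration; the only thing relied on is that `Vec<T>`, `&Vec<T>` and `&mut Vec<T>` all implement `IntoIterator` in the standard library.

```rust
/// Returns the total length seen by the shared loop and the
/// strings collected by the consuming loop.
fn demo_loops() -> (usize, Vec<String>) {
    let mut load_requests = vec![String::from("a"), String::from("bc")];

    // Same as `load_requests.iter()`: yields `&String`.
    let mut total_len = 0;
    for lr in &load_requests {
        total_len += lr.len();
    }

    // Same as `load_requests.iter_mut()`: yields `&mut String`.
    for lr in &mut load_requests {
        lr.push('!');
    }

    // Same as `load_requests.into_iter()`: yields `String` and consumes the Vec.
    let mut owned = Vec::new();
    for lr in load_requests {
        owned.push(lr);
    }

    (total_len, owned)
}
```

After the last loop `load_requests` has been moved, so any further use of it is a compile error rather than a silent copy.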
- Waterluvian5 months ago
  This is where I think linters can shine as educational tools. Underline either as an error and you’ve taught someone something that’s actually quite tricky to discover on your own.
  Similar to all the times I defensively str(something) in Python to find that “oh that has __str__ called on it anyways.”
  - bayesnet5 months ago
    When I was starting out in rust, replacing my IDE’s `cargo check` invocation with pedantic clippy (which has a lint for this use of `into_iter` [0]) was very useful in learning these parts of the language.
    [0]: https://rust-lang.github.io/rust-clippy/master/index.html#ex...
kiitos5 months ago
> I was specifically inspired by a performance bug due to a typo. This mistake is the “value param” vs “reference param” where your function copies a value instead of passing it by reference because an ampersand (&) was missing ... This simple typo is easy to miss
the difference between `const Data& d` and `const Data d` isn't accurately characterized as "a typo" -- it's a semantically significant difference in intent, core to the language, critical to behavior and outcome
even if the author "forgot" to add the `&` due to a typo, that mistake should absolutely have been caught by linting, tests, CI, or code review, well before it entered the code base
so not feelin' it, sorry
- eptcyka5 months ago
  If the implications of a one char diff are so egregious that they’re considered obvious, maybe it should take less cognitive effort to spot this? CI and tooling are great, but would be far less necessary if it was more difficult to make this mistake in the first place.
  - kiitos5 months ago
    if you are programming in C++ then you have opted-in to a set of syntax and semantic properties that are ancient and well-defined and core to the language. those properties include at a very basic level exactly the sigil under discussion here.
    it is not productive or interesting to characterize this absolutely core property of the language as "a one char diff" that takes any kind of special cognitive effort to spot
  - Disposal84335 months ago
    What do you suggest? Some kind of std::const_reference<Type>? Clang-tidy is enough in addition to the reviews.
    eptcyka5 months ago
    The person is arguing that it is a massive difference, not a typo. I am saying that if that is the case, then maybe the hamming distance between correct and buggy code that both compile should be greater than 1, regardless if more tooling can help solve the problem or not.
    I specifically take issue with the framing of "it is not an issue, for we have the tools to help with this", especially where the tools are not part of a standard distribution of a toolchain and require more than minimal effort. C++ has had many warts for many decades, and the response has always been *you are just holding it wrong*: you are not running a well-covering integration test suite with sanitizers on every commit, you just need to run one more tool in the CI, just a comprehensive benchmarking suite, have more eyes looking for a single char difference in reviews.
    dminik5 months ago
    The easiest* solution would be to do what rust does. You need to use & on both sides and error out on mismatch. Eg.
    fn foo(bar: &Bar) { ... }
    foo(&baz)
    * This would be a breaking change, so a non-starter.
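A compilable version of that idea, as a rough sketch: `Bar`, its `values` field and `call_foo` are placeholder names, not anything from the article. The point is only that the borrow has to be written at the declaration and again at the call site, so dropping either `&` is a type error instead of a silent copy.

```rust
struct Bar {
    values: Vec<i32>,
}

// The callee asks for a borrow in its signature...
fn foo(bar: &Bar) -> usize {
    bar.values.len()
}

fn call_foo() -> usize {
    let baz = Bar { values: vec![1, 2, 3] };
    // ...and the caller has to spell the borrow out too.
    // Writing `foo(baz)` here is rejected with a "mismatched types" error.
    let n = foo(&baz);
    // `baz` is still owned by this function, since it was only borrowed.
    n + baz.values.len()
}
```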
    Mesopropithecus5 months ago
    I'm seeing this way too often in production code, despite linters and reviews. So we have to keep plastering over.
    qalmakka5 months ago
    The problem is not the reference, the problem is implicit copies and the horses left the barn 40 years ago, it's too late to fix that. The only thing we can do right now is deleting or marking copy constructors explicit whenever possible
- lock15 months ago
  Disclaimer: I don't have any production experience, only side projects in both C++ & Rust.
  I think the problem with `T &d` and `T d` is that these 2 declarations yield a "name" `d` that you can operate on very similarly. It's not necessarily that the reference declaration `T& d` is only 1 char away from the value declaration `T d`.
  While there is a significant semantic difference between declaring things as a value and as a reference (&), non-static member function invocation syntax is the same on both `&d` and `d`. You can't tell the difference without reading the original declaration, and the compiler will happily accept it.
  Contrast this to `T *d` or `T d`. Raw pointers require different operations on `d` (deref, -> operator, etc). You're forced to update the code if you change the declaration because the compiler will loudly complain about it.
  It shares the same problem with a type system with nullable-by-default reference type vs an explicit container of [0..1] element Option<T>. Migrating existing code to Option<>-type will cause the compiler to throw a ton of explicit errors, and it will become a breaking change if it was a public API declaration. On the other hand, you're never able to feel safe in nullable-by-default; a public API might claim it never returns `null` in the documentation, but you will never know if it's true or not only from the type signature.
  Whether it's good or bad, I guess it depends on the language designer's decision. It is certainly more of a hassle to break & fix everything when updating the declaration, but it also can be a silent footgun as well.
- Dylan168075 months ago
  It's const so you're not changing it, and you're not sneaking a pointer either. So what's the difference in intent?
- dzaima5 months ago
  Problem is it doesn't affect outcome at all unless you do mutation, and as such testing is irrelevant, but it can still significantly impact perf, and performance problems can take a while to surface; like, it may slowly grow from 0.1% of runtime to like 2%, low enough to not get noticed at all at first, and still be too low to have significant thought put into it afterwards (but still way too high from a single missing character).
  And, as you said, this is a meaningful difference in intent, so linting can't just blanket complain on every single instance of a non-&-ed argument.
  And the difference in writing down intent is the wrong direction - doing a full nested object clone should require adding code in any sane language, whereas, in C++, making code clone takes.. negative one characters.
  Whereas in Rust, the only thing that's ever implicit is a bitwise copy on objects with constant size; everything else requires either adding &-s or .clone()s, or your code won't compile.
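To make that concrete, here is a minimal sketch with invented names (`Data`, `sum_borrowed`, `sum_owned`, `clone_demo`). It assumes nothing beyond a derived `Clone` on a struct holding a `Vec`: the borrow costs one `&`, the deep copy costs a visible `.clone()`, and passing the value bare is a move, not a copy.

```rust
#[derive(Clone)]
struct Data {
    items: Vec<i32>,
}

fn sum_borrowed(d: &Data) -> i32 {
    d.items.iter().sum()
}

fn sum_owned(d: Data) -> i32 {
    d.items.iter().sum()
}

fn clone_demo() -> (i32, i32, i32) {
    let d = Data { items: vec![1, 2, 3] };

    // Borrow: no copy of the Vec's heap buffer.
    let a = sum_borrowed(&d);

    // Deep copy: has to be requested explicitly at the call site.
    let b = sum_owned(d.clone());

    // Move: hands `d` over without duplicating the buffer.
    // Using `d` after this line would not compile.
    let c = sum_owned(d);

    (a, b, c)
}
```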
- cozzyd5 months ago
  yeah, I assumed this was going to be some sort of 100 screens of template error nonsense, not an obvious mistake (that is also trivial to find while profiling)
- qalmakka5 months ago
  The fact that implicit copies are a feature doesn't mean they were a good design choice to begin with. In new code I've started making the copy constructor explicit whenever I can, for instance, just to avoid this kind of shenanigans
fauigerzigerk5 months ago
I like Rust's approach to this. It's even more important when comparing with languages that hide value/reference semantics at the call site.
I've been writing some Swift code in recent years. The most frequent source of bugs has been making incorrect assumptions on whether a parameter is a class or a struct (reference or value type). C# has the same issue.
It's just a terrible idea to make the value/reference distinction at the type level.
WalterBright5 months ago
This is why the D programming language uses the keyword `ref` rather than the ampersand. Too many overlooked misteaks with the latter.
It extends it a bit, too, with `out` meaning that the referenced argument is initialized by the function, not read.
SuperV12345 months ago
Note that taking a 'const' by-value parameter is very sensible in some cases, so it is not something that could be detected as a typo by the C++ compiler in general.
- Animats5 months ago
  Right. Copying is very fast on modern CPUs, at least up to the size of a cache line. Especially if the data being copied was just created and is in the L1 cache.
  If something is const, whether to pass it by reference or value is a decision the compiler should make. There's a size threshold, and it varies with the target hardware. It might be 2 bytes on an Arduino and 16 bytes on a machine with 128-bit arithmetic. Or even as big as a cache line. That optimization is reportedly made by the Rust compiler. It's an old optimization, first seen in Modula 1, which had strict enough semantics to make it work.
  Rust can do this because the strict affine type model prohibits aliasing. So the program can't tell if it got the original or a copy for types that are Copy. C++ does not have strong enough assurances to make that a safe optimization. "-fstrict-aliasing" enables such optimizations, but the language does not actually validate that there is no aliasing.
  If you are worried about this, you have either used a profiler to determine that there is a performance problem in a very heavily used inner loop, or you are wasting your time.
- spacechild15 months ago
  Yes. For example, if an argument fits into the size of a register, it's better to pass by value to avoid the extra indirection.
  - vitus5 months ago
    > if an argument fits into the size of a register, it's better to pass by value to avoid the extra indirection.
    Whether an argument is passed in a register or not is unfortunately much more nuanced than this: it depends on the ABI calling conventions (which vary depending on OS as well as CPU architecture). There are some examples where the argument will not be passed in a register despite being "small enough", and some examples where the argument may be split across two or more registers.
    For instance, in the x86-64 ELF ABI spec [0], the type needs to be <= 16 bytes (despite registers only being 8 bytes), and it must not have any nontrivial copy / move constructors. And, of course, only some registers are used in this way, and if those are used up, your value params will be passed on the stack regardless.
    [0] Section 3.2.3 of https://gitlab.com/x86-psABIs/x86-64-ABI
- Rubberducky13245 months ago
  clang-tidy can often detect these. If the body of the function doesn't modify the value, for example.
  But it needs to be conservative of course, in general you can't do this.
weinzierl5 months ago
With Rust executing a function for either case deploys the “optimal” version (reference or move) by default, moreover, the compiler (not the linter) will point out any improper “use after moves”.
```
    struct Data {
      // Vec does not implement the `Copy` trait
      data: Vec<i32>,
    }

    // Equivalent to "passing by const-ref" in C++
    fn BusinessLogic(d: &Data) {
      d.DoThing();
    }

    // Equivalent to "move" in C++
    fn FactoryFunction(d: Data) -> Owner {
      let owner = Owner { data: d };
      // ...
      return owner;
    }
```
Is this really true?
I believe in Rust, when you move a non-Copy type, like in this case, it is up to the compiler if it passes a reference or makes a physical copy.
In my (admittedly limited) understanding of Rust semantics calling
```
     FactoryFunction(d: Data) 
```
could physically copy d despite it being non-Copy. Is this correct?
EDIT:
Thinking about it, the example is probably watertight because d is essentially a Vec (as Ygg2 pointed out).
My point is that if you see
```
     FactoryFunction(d: Data) 
```
and all you know is that d is non-Copy you should not assume it is not physically copied on function call. At least that is my belief.
- aw16211075 months ago
  > could physically copy d despite it being non-Copy. Is this correct?
  I believe the answer is technically yes. IIRC a "move" in Rust is defined as a bitwise copy of whatever is being moved, modulo optimizations. The only difference is what you can do with the source after - for non-Copy types, the source is no longer considered accessible/usable. With Copy types, the source is still accessible/usable.
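A small sketch of that distinction, with hypothetical types `Point` (Copy) and `Name` (not Copy). It only illustrates the source-level rule about what stays usable afterwards; it says nothing about how many bytes the compiler actually shuffles around in either case.

```rust
#[derive(Clone, Copy)]
struct Point {
    x: i32,
    y: i32,
}

struct Name {
    text: String,
}

fn take_point(p: Point) -> i32 {
    p.x + p.y
}

fn take_name(n: Name) -> usize {
    n.text.len()
}

fn copy_vs_move() -> (i32, i32, usize) {
    let p = Point { x: 1, y: 2 };
    let first = take_point(p);
    // `Point` is Copy, so the source is still usable after being passed by value.
    let second = take_point(p);

    let n = Name { text: String::from("abc") };
    let len = take_name(n);
    // `Name` is not Copy: a second `take_name(n)` here fails with
    // a "use of moved value" error.

    (first, second, len)
}
```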
- tialaramex5 months ago
  Well since you're saying "physically" I guess we should talk about a concrete thing, so let's say we're compiling this for the archaic Intel Core i7 I'm writing this on.
  On that machine Data is "physically" just the Vec, which is three 64-bit values, a pointer to i32 ("physically" on this machine a virtual address), an integer length and an integer capacity, and the machine has a whole bunch of GPRs so sure, one way the compiler might implement FactoryFunction is to "physically" copy those three values into CPU registers. Maybe say RAX, RCX, RDX ?
  Actually though there's an excellent chance that this gets inlined in your program, and so FactoryFunction never really exists as a distinct function, the compiler just stamps out the appropriate stuff in line every time we "call" this function, so then there was never a "parameter" because there was never a "function".
  - weinzierl5 months ago
    True. When I wrote the comment I did not think about the Vec though.
    The point I am trying to make is more general:
    I believe that when you have a type in Rust that is not Copy it will never be implicitly copied in a way that you end up with two visible instances but it is not guaranteed that Rust never implicitly memcopies all its bytes.
    I have not tried it but what I had in mind instead of the Vec was a big struct that is not Copy. Something like:
    struct Big<const M: usize> { buf: [u8; M], }
    // Make it non-Copy.
    impl<const M: usize> Drop for Big<M> { fn drop(&mut self) {} }
    From my understanding, to know if memory is shoveled around it is not enough to know the function signature and whether the type is Copy or not. The specifics of the type matter.
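A hedged sketch of why the specifics of the type matter, reusing the `Big` struct above. The `checksum` helpers and `big_demo` are made-up names. It does not try to observe a memcpy (that depends on optimization level and inlining); it only compares how many bytes a by-value move has to account for when the buffer is stored inline versus behind a `Box`.

```rust
use std::mem::size_of;

struct Big<const M: usize> {
    buf: [u8; M],
}

// An explicit Drop impl makes the type non-Copy.
impl<const M: usize> Drop for Big<M> {
    fn drop(&mut self) {}
}

// Takes the whole struct by value: logically a move of M bytes.
fn checksum<const M: usize>(big: Big<M>) -> u64 {
    big.buf.iter().map(|&b| u64::from(b)).sum()
}

// Takes a Box by value: logically a move of one pointer.
fn checksum_boxed<const M: usize>(big: Box<Big<M>>) -> u64 {
    big.buf.iter().map(|&b| u64::from(b)).sum()
}

/// Returns (inline size, boxed handle size, inline checksum, boxed checksum).
fn big_demo() -> (usize, usize, u64, u64) {
    let inline = Big::<4096> { buf: [1; 4096] };
    let boxed = Box::new(Big::<4096> { buf: [1; 4096] });
    (
        size_of::<Big<4096>>(),
        size_of::<Box<Big<4096>>>(),
        checksum(inline),
        checksum_boxed(boxed),
    )
}
```

Both calls have the same shape at the call site and both are "moves"; only the type tells you whether the value being moved is 4096 bytes or a pointer.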
    catlifeonmars5 months ago
    Wouldn’t you need a Pin<T> to guarantee no copying? I think copy has two different meanings, depending on whether you’re talking about the underlying memory representation and the logical representation that is available to the developer.
    Obviously the distinction can matter sometimes and thus copy in the logical sense is a leaky abstraction (although in practice I notice I do not see that leakage often).
    tialaramex5 months ago
    Yes, Rust absolutely might memcpy your Big when you move it somewhere.
    I will say that programmers very often have bad instincts for when that's a bad idea. If you have a mix of abilities and can ask, try it, who in your team thinks that'll perform worse for moving M = 64 or M = 32? Don't give them hours to think about it. I would not even be surprised to find real world experienced programmers whose instinct tells them even M = 4 is a bad idea despite the fact that if we analyse it we're copying a 4 byte value rather than copying the (potentially much bigger) pointer and taking an indirection
    Edited: To fix order of last comparison
    ninkendo5 months ago
    > I will say that programmers very often have bad instincts for when that's a bad idea
    True that. memcpy is basically the literal fastest thing your processor can do, it’s trivially pipelined and can be done asynchronously.
    If the alternative is heap storage you’re almost always cooked: that heap space is far less likely to be in L1 cache, allocating it takes time and requires walking a free list, dealing with memory fragmentation, freeing it when dropped, etc.
    It’s not a bad short-hand to think of the heap as being 10-100x slower than the stack.
- Ygg25 months ago
  Can't run Godbolt on my phone for some reason, but in this case I expect compiler to ignore wrapper types and just pass Vec around.
  If you have
  Vec<i32>
  // newtype struct
  struct Data { data: Vec<i32> }
  // newtype enum in rust
  // Possibly but not 100% sure
  // enum OneVar { Data(Vec<i32>) }
  From my experiments with newtype pattern, operations implemented on data and newtype struct yielded same assembly. To be fair in my case it wasn't a Vec but a [u8; 64] and a u32.
  - tialaramex5 months ago
    The compiler isn't ignoring your new types, as you'll see if you try to pass a OneVar when the function takes a Vec but yes, Rust really likes new types whose representation is identical yet their type is different.
    My favourite as a Unix person is Option<OwnedFd>. In a way Option<OwnedFd> is the same as the classic C int file descriptor. It has the exact same representation, 32 bits of aligned integer. But Rust's type system means we know None isn't a file descriptor, whereas it's too easy for the C programmer to forget that -1 isn't a valid file descriptor. Likewise the Rust programmer can't mistakenly do arithmetic on file descriptors, if we intend to count up some file descriptors but instead sum them in C that compiles and isn't what you wanted, in Rust it won't compile.
    Ygg25 months ago
    > The compiler isn't ignoring your new types
    True, I didn't mean to imply you can just ignore types; I meant to say that the equivalent operations on a naked vs wrapped value return equivalent assembly.
    It's one of those zero cost abstractions. You can write your newtype wrapper and it will be just as if you wrote implementations by hand.
    > My favourite as a Unix person is Option<OwnedFd>.
    Yeah, but that's a bit different. Compiler won't treat any Option<T> that way out of the box. You need a NonZero type or nightly feature to get that[1].
    That relies on compiler "knowing" there are some values that will never be used.
    [1] https://www.0xatticus.com/posts/understanding_rust_niche/
    tialaramex5 months ago
    You can't make your own types with niches (in stable Rust, yet, though I am trying to change that and I think there's a chance we'll make that happen some day) except for enumerations.
    So if you make an enumeration AlertLevel with values Ominous, Creepy, Terrifying, OMFuckingGoose then Option<AlertLevel> is a single byte, Rust will assign a bit pattern for AlertLevel::Ominous and AlertLevel::Creepy and so on, but the None just gets one of the bit patterns which wasn't used for a value of AlertLevel.
    It is a bit trickier to have Color { Red, Green, Blue, Yellow } and Breed { Spaniel, Labrador, Poodle } and make a type DogOrHat where DogOrHat::Dog has a Breed but DogOrHat::Hat has a Color and yet the DogOrHat fits in a single byte. This is because Rust won't (by default) avoid clashes, so if it assigned Color::Red bit pattern 0x01 and Breed::Spaniel bit pattern 0x01 as well, it won't be able to disambiguate without a separate dog-or-hat tag, however we can arrange that the bit patterns don't overlap and then it works. [This is not guaranteed by Rust unlike the Option<OwnedFd> niche which is guaranteed by the language]
    Ygg25 months ago
    > You can't make your own types with niches in stable Rust
    You can, provided they are wrapper around NonZero types. See https://docs.rs/nonmax/latest/nonmax/
    Hence my comment before NonZero types or Rust nightly.
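The niche behaviour discussed in this sub-thread can be checked with `std::mem::size_of`. This is a sketch under assumptions: `AlertLevel` is a trimmed-down version of the enum described above and `niche_sizes` is an invented helper. The `Option<NonZeroU32>` size is documented by the standard library, while the sizes for the plain enum and for `Option<u32>` are what current rustc produces rather than a language guarantee.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

// Three variants fit in one byte, leaving many unused bit patterns (niches).
#[allow(dead_code)]
enum AlertLevel {
    Ominous,
    Creepy,
    Terrifying,
}

/// Sizes in bytes of: AlertLevel, Option<AlertLevel>,
/// NonZeroU32, Option<NonZeroU32>, Option<u32>.
fn niche_sizes() -> [usize; 5] {
    [
        size_of::<AlertLevel>(),
        // `None` reuses a bit pattern no AlertLevel variant occupies.
        size_of::<Option<AlertLevel>>(),
        size_of::<NonZeroU32>(),
        // `None` is represented by the forbidden value 0.
        size_of::<Option<NonZeroU32>>(),
        // A plain u32 has no spare bit pattern, so a separate tag is needed.
        size_of::<Option<u32>>(),
    ]
}
```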
machina_ex_deus5 months ago
I would never have this typo as I usually delete the copy constructor in heavy structures.
- colonwqbang5 months ago
  Do you ever use the C++ standard library? Most types have a copy ctor defined, also the really "heavy" ones.
- lionkor5 months ago
  this is the defensive and correct C++ approach, anyways.
  - Ygg25 months ago
    Isn't that just same old "skill issue", "No True C(++) programmer" refrain?
    If people could keep entirety of J.2 appendix in their mind at all time we would not have these issues. And if they had entirety of J appendix in mind all C code would be portable.
    Or if people just always ran -Wall -Wpedantic -Wall_for_real_this_time -fsanitize=thread,memory,address,leaks,prayers,hopes,dreams,eldritch_beings,elder_gods -fno-omit-frame-pointer
    I mean if this was all it took then C and C++ programs would be as safe as Rust. Which is not what we see in practice. And it's not like C programmers are an average web dev. It's a relatively niche and well versed community.
    lionkor5 months ago
    Yes, it is the old "skill issue" argument.
    When your language is that unsafe and difficult to hold correctly, you have to make sure that you at least try your very best.
jmull5 months ago
This isn't a C++ vs. Rust thing.
If you care about performance, you measure it. If you don't measure performance, you don't care about it.
- Ygg25 months ago
  Problem is there is a huge number of pitfalls when measuring performance.
  You have to do it correctly or you might be just measuring: when your system is pulling updates, how big is your username, the performance of the least critical thing in your app.
  And at worst you can speed up your least performing function only to yield a major slowdown to overall performance.
- tialaramex5 months ago
  That's a fair observation about performance, but I think this goes to correctness too. For some types copying them affects the program correctness, and so in C++ you're more likely to write an incorrect program as a result of this choice.
bubblebeard5 months ago
Great article. I think it raises a good point. An important aspect of modern programming languages should be to simplify the syntax, to help developers avoid mistakes.
This reminds me of arguing more than once with JS developers about the dangers of loose typing (especially in the case of JS) and getting the inevitable reply ”I just keep track of my type casting.”.
- lionkor5 months ago
  I don't think the syntax has to be simple, it just needs to be expressive
qalmakka5 months ago
The real issue is that C++ does implicit _deep_ copies by default on assignment and that you can't retrofit the language to change that. One quick, fast solution to avoid such shenanigans is to follow the one parameter `explicit` constructor rule religiously and always mark copy constructors explicit unless you know as a fact the type is trivially memcpy-able. This fixes most of the issues.
Another problem with C++ references is that they aren't really reference types, they are aliases, so they have wonky semantics and crazy nonsensical features like `const T&` doing lifetime extension
b0gb5 months ago
while doing math... would you call a missing sign a typo rather than a mistake? if so, anything can be a typo...
- 17186274405 months ago
  The difference between a typo and an error is what the author had in mind to write. A typo is a subtype of mistake.
rurban5 months ago
I guess he prefers the magic action at a distance pattern over functional and concurrency safeties. Then he should also mention it at least.
All good linters complain about const buffer data missing the ampersand btw
squirrellous5 months ago
This might be an unpopular opinion - I think const by-value parameters in C++ shouldn’t exist. Const reference and mutable values are enough for 99% cases, and the other 1% is r-value refs.
Regarding const by-value parameters, they should never appear in function declarations (without definition) since that doesn’t enforce anything. In function definitions, you can use const refs (which have lifetime extension) to achieve the same const-correctness, and const refs are better for large types.
Admittedly this further proves the point that c++ is needlessly complicated for users, and I agree with that.
- quuxplusone5 months ago
  Absolutely correct. Basically, C++ has value semantics — you pass arguments of type X like `void f(X x)`, and you return them like `X f()`, and that's good enough for a first approximation. (This is the only thing C lets you do.)
  The second refinement is that you can use `const X&` as an optimization of `X`. (Perfectly safe for parameters; somewhat treacherous for return values.) Passing by `X&` without the const, or by `const X` without the ampersand, are both typos, and you should regularly use tooling to find and fix that kind of typo.
  https://quuxplusone.github.io/blog/2019/01/03/const-is-a-con...
  And that's it, for business-logic code. If you're writing your own resource-management type, you'll need to know about `X(X&&)` and `X& operator=(X&&)`, but ordinary business-logic code never does.
  "What about `X&` for out-parameters?" Pass out-parameters by pointer. It's important and helpful to indicate their out-parameter-ness at the call-site, which is exactly what passing by pointer does. (And the pointer value itself will be passed by value, just like in C.)
  "What about return by const value, like Scott Meyers recommended 20–30 years ago?" No, don't do that. It disables the ability to move-assign or move-construct from the return value, which means it's a pessimization. Scott found this out, retracted that advice in 2009, and correctly issued the opposite advice in his 2014 book.
  https://quuxplusone.github.io/blog/2019/01/03/const-is-a-con...
  At work I use a Clang patched with "-Wqual-class-return-type" to report return-by-const-value typos — since, again, `const X getter()` is almost always a typo for `const X& getter()`.
  You can use that compiler too: https://godbolt.org/z/7177MTfb8
on_the_train5 months ago
> There are plenty of linters and tools to detect issues like this (ex: clang-tidy can scan for unnecessary value params)
Exactly, this is not an issue in any reasonable setup because static analysis catches (and fixes!) this reliably.
> but evidently these issues go unnoticed until a customer complains about it or someone actually bothers to profile the code.
No
- dvratil5 months ago
  This is my gripe with C++ - I have to have a CI pipeline that runs a job with clang-tidy (which is slow), jobs with asan, memsan and tsan, each running the entire test-suite, and ideally also one job for clang and one for gcc to catch all compiler warnings, then finally a job that produces optimized binaries.
  With Rust I have one job that runs tests and another that runs cargo build --release and I'm done...
  - on_the_train5 months ago
    That's a pretty heavy setup. Clang tidy is usually enough. And not slow when running locally on newly typed code in resharper for example.
- Dylan168075 months ago
  I think your estimate of how many C++ devs use linters is too high.
atoav5 months ago
As someone who programs both C++ and Rust, without even reading the article, my own experience with typos in those languages is:
Rust: Typo? Now it just doesn't compile anymore. Worst case is that the compiler does a bad job at explaining the error and you don't find it immediately.
C++: Typo? Good luck. Things may now be broken in such subtle and hard-to-figure-out ways that it may haunt you for the rest of your days.
But that of course depends on the nature of the typo. Now I should go and read the article.
- estebank5 months ago
  > Worst case is that the compiler does a bad job at explaining the error and you don't find it immediately.
  By the way, the project considers this a bug and accepts reports for that. In many occasions they are easy to fix. In others large refactors are needed. But being aware of the case is the necessary first step to making them better.
darig5 months ago
[dead]