77 points by todsacerdoti 3 hours ago | 8 comments
  • dang 2 hours ago
    Recent and related: Parse, Don't Validate (2019) - https://news.ycombinator.com/item?id=46960392 - Feb 2026 (172 comments)

    also:

    Parse, Don’t Validate – Some C Safety Tips - https://news.ycombinator.com/item?id=44507405 - July 2025 (73 comments)

    Parse, Don't Validate (2019) - https://news.ycombinator.com/item?id=41031585 - July 2024 (102 comments)

    Parse, don't validate (2019) - https://news.ycombinator.com/item?id=35053118 - March 2023 (219 comments)

    Parse, Don't Validate (2019) - https://news.ycombinator.com/item?id=27639890 - June 2021 (270 comments)

    Parsix: Parse Don't Validate - https://news.ycombinator.com/item?id=27166162 - May 2021 (107 comments)

    Parse, Don’t Validate - https://news.ycombinator.com/item?id=21476261 - Nov 2019 (230 comments)

    Parse, Don't Validate - https://news.ycombinator.com/item?id=21471753 - Nov 2019 (4 comments)

    (p.s. these links are just to satisfy extra-curious readers - no criticism is intended! I add this because people sometimes assume otherwise)

  • hutao an hour ago
    Note that the division-by-zero example used in this article is not the best example to demonstrate "Parse, Don't Validate," because it relies on encapsulation. The principle of "Parse, Don't Validate" is best embodied by functions that transform untrusted data into some data type which is correct by construction.

    Alexis King, the author of the original "Parse, Don't Validate" article, also published a follow-up, "Names are not type safety" [0], clarifying that the "newtype" pattern (such as hiding a nonzero integer in a wrapper type) provides weaker guarantees than correctness by construction. Her original "Parse, Don't Validate" article also includes the following caveat:

    > Use abstract datatypes to make validators “look like” parsers. Sometimes, making an illegal state truly unrepresentable is just plain impractical given the tools Haskell provides, such as ensuring an integer is in a particular range. In that case, use an abstract newtype with a smart constructor to “fake” a parser from a validator.

    So, an abstract data type that protects its inner data is really a "validator" that tries to resemble a "parser" in cases where the type system itself cannot encode the invariant.

    The article's second example, the non-empty vec, is a better example, because it encodes within the type system the invariant that one element must exist. The crux of Alexis King's article is that programs should be structured so that functions return data types designed to be correct by construction, akin to a parser transforming less-structured data into more-structured data (the sketch below makes the contrast concrete).

    [0] https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...
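
    To make the contrast concrete, here is a minimal Rust sketch (types and names are illustrative, not taken from the article): the newtype guards its invariant through encapsulation, while the non-empty structure cannot be built in an invalid state at all.

        // A validator dressed as a parser: the invariant lives in the
        // smart constructor and the private field, not in the structure.
        mod newtype {
            pub struct NonZeroI32(i32); // field is private on purpose

            impl NonZeroI32 {
                pub fn new(n: i32) -> Option<Self> {
                    if n != 0 { Some(Self(n)) } else { None }
                }
                pub fn get(&self) -> i32 {
                    self.0
                }
            }
        }

        // Correct by construction: there is no way to build a NonEmpty
        // without supplying a first element, so no runtime check is needed.
        pub struct NonEmpty<T> {
            pub head: T,      // the guaranteed element
            pub tail: Vec<T>, // possibly empty rest
        }

        impl<T> NonEmpty<T> {
            // Total, unlike Vec::first, which must return an Option.
            pub fn first(&self) -> &T {
                &self.head
            }
        }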

  • strawhatguy 2 hours ago
    The alternative is one type, with many functions that can operate on that type.

    Like how Clojure basically uses maps everywhere, and the whole standard library lets you manipulate them in various ways.

    The main problem with the many-type approach is that you end up with several similar types, all incompatible with each other.

    • fiddlerwoaroof 2 hours ago
      Yeah, there's something of a tension between the Perlis quote "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures" and Parse, don't validate.

      The way I've thought about it, though, is that it's possible to design a program well either by encoding your important invariants in your types or in your functions (especially simple functions). In dynamically typed languages like Clojure, my experience is that there's a set of design practices that have a lot of the same effects as "Parse, Don't Validate" without statically enforced types. And, ultimately, it's a question of mindset which style you prefer.

      • strawhatguy an hour ago
        There's probably a case for both. Core logic might benefit from hard types deep in the bowels of an unchanging engine.

        The real world often changes, though, and more often than not the code has to adapt, regardless of how elegantly our systems are designed.

    • packetlost 2 hours ago
      I don't really get why this is getting flagged; I've found this to be true, but more of a trade-off than a pure benefit. It's also somewhat beside the point: you always need to parse inputs from external, usually untrusted, sources.
      • doublesocket 2 hours ago
        Agree with this. Mismatched types are generally an indicator of an underlying issue with the code, not the language itself. These are areas where AI can be helpful in flagging potential problems.
    • Rygian an hour ago
      This sounds like the "stringly typed language" mockery of some languages. How is it actually different?
    • marcosdumay an hour ago
      There are more than two alternatives, since functions can operate on more than one type.
  • jaggederest 2 hours ago
    You can go even further with this in other languages, with things like dependent typing - which can assert (among other interesting properties) that, for example, something like

        get_elem_at_index(array, index)
    
    cannot ever have index outside the bounds of the array - checked statically, at compile time, and (this is the key) without knowing a priori what the length of the array is (see the Lean sketch below).

    "In Idris, a length-indexed vector is Vect n a (length n is in the type), and a valid index into length n is Fin n ('a natural number strictly less than n')."

    Similar tricks work for division that might result in inf/-inf, to prevent such expressions from typechecking, and there are subtler implications in e.g. higher-order types and functions.
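
    A tiny Lean 4 sketch of the bounds-checked indexing idea (Lean rather than Idris, but the Fin pattern is the same; names are illustrative):

        -- `List.get` demands a proof-carrying index `Fin l.length`,
        -- so an out-of-bounds access simply does not typecheck.
        def third (l : List Nat) (h : 2 < l.length) : Nat :=
          l.get ⟨2, h⟩

        #eval third [10, 20, 30] (by decide) -- 30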

    • satvikpendem 16 minutes ago
      Rust has some libraries that can do dependent typing too, based on macros. For example: https://youtube.com/watch?v=JtYyhXs4t6w

      Which refers to https://docs.rs/anodized/latest/anodized/

    • VorpalWay 2 hours ago
      How does that work? If the length of the array is read from stdin for example, it would be impossible to know it at compile time. Presumably this is limited somehow?
      • marcosdumay an hour ago
        If you check that the value is inside the range, and execute some different code if it's not, then congratulations: you now know at compile time that, on that code path, the number you read from stdin is in the right range.
      • mdm12 2 hours ago
        One option is dependent pairs, where one component of the pair (in this example) would be the length of the array, and the other component has a type which depends on that same value (such as Vector n T instead of List T); see the sketch below.

        Type-Driven Development with Idris [1] is a great introduction to dependently typed languages and covers methods such as these if you're interested (and Edwin Brady is a great teacher).

        [1] https://www.manning.com/books/type-driven-development-with-i...
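
        A minimal Lean 4 sketch of a dependent pair (Lean rather than Idris, with a length-refined list standing in for Vect; names are illustrative):

            -- The first component fixes the type of the second: the
            -- returned list is guaranteed, by type, to have length n.
            def sized (l : List Nat) : (n : Nat) × { xs : List Nat // xs.length = n } :=
              ⟨l.length, ⟨l, rfl⟩⟩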

      • dernett an hour ago
        Not sure about Idris, but in Lean `Fin n` is a struct that contains a value `i` and a proof that `i < n`. You can read in the value `n` from stdin and then you can do `if h : i < n` to have a compile-time proof `h` that you can use to construct a `Fin n` instance.
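
        A minimal Lean 4 sketch of that dance (names are illustrative):

            -- Branching on `i < n` puts the proof `h` in scope, which is
            -- exactly what the anonymous `Fin n` constructor needs.
            def checkedIndex (n i : Nat) : Option (Fin n) :=
              if h : i < n then some ⟨i, h⟩ else none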
      • ratorx 2 hours ago
        It doesn’t have to be a compile-time constant. An alternative is to prove that, when you are calling the function, the index is always less than the size of the vector (a dynamic constraint). You may be able to assert this by having a separate function on the vector that returns a constrained value (e.g. n < v.len()).
      • jaggederest 2 hours ago
        If the length is read from outside the program it's an IO operation, not a static value, so there are generally runtime checks in addition to the type system. Usually you solve this as in the article, with a constructor that checks the input - so you'd get something like "Invalid option: index = 5 must be within 0-4" when you tried to create the Fin n from the passed-in value.
    • esafak an hour ago
      I wish dependent types were more common :(
  • noitpmeder 2 hours ago
    This reminds me a bit of a recent publication by Stroustrup about using concepts in C++ to validate integer conversions automatically where necessary.

    https://www.stroustrup.com/Concept-based-GP.pdf

      {
         Number<unsigned int> ii = 0;
         Number<char> cc = '0';
         int i = 48;   // some plain int (declared here so the example is self-contained)
         ii = 2;       // OK
         ii = -2;      // throws
         cc = i;       // OK if i is within cc’s range
         cc = -17;     // OK if char is signed; otherwise throws
         cc = 1234;    // throws if a char is 8 bits
      }
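
    For comparison, Rust expresses the same checked-conversion idea with the standard TryFrom trait (a minimal sketch, not from Stroustrup's paper):

        fn main() {
            // Checked narrowing: TryFrom fails instead of silently wrapping.
            assert!(u8::try_from(1234_i32).is_err()); // like `cc = 1234` above
            assert_eq!(u8::try_from(48_i32).ok(), Some(48_u8)); // like `cc = i`
            assert!(u32::try_from(-2_i32).is_err()); // like `ii = -2`
        }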
  • fph an hour ago
    The article quickly mentions implementing addition:

    ```
    impl Add for NonZeroF32 { ... }
    impl Add<f32> for NonZeroF32 { ... }
    impl Add<NonZeroF32> for f32 { ... }
    ```

    What type would it return, though?

    • alfons_foobar 38 minutes ago
      Would have to be F32, no? I cannot think of any way to enforce "non-zero-ness" of the result without making it return an optional Result<NonZeroF32>, and at that point we are basically back to square one...
      • MaulingMonkey 11 minutes ago
        > Would have to be F32, no?

        Generally yes. `NonZeroU32::saturating_add(self, other: u32)` is able to return `NonZeroU32` though! ( https://doc.rust-lang.org/std/num/type.NonZeroU32.html#metho... )

        > I cannot think of any way to enforce "non-zero-ness" of the result without making it return an optional Result<NonZeroF32>, and at that point we are basically back to square one...

        `NonZeroU32::checked_add(self, other: u32)` basically does this, although I'll note it returns an `Option` instead of a `Result` ( https://doc.rust-lang.org/std/num/type.NonZeroU32.html#metho... ), leaving you to `.ok_or(...)` or otherwise handle the edge case to your heart's content. Niche, but occasionally what you want.
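
        A minimal sketch of how those APIs behave (using std's integer NonZero type, since a NonZeroF32 isn't in std):

            use std::num::NonZeroU32;

            fn main() {
                // Parse once at the boundary: `new` returns Option<NonZeroU32>.
                let n = NonZeroU32::new(41).expect("must be non-zero");

                // checked_add preserves the non-zero invariant; None only on overflow.
                assert_eq!(n.checked_add(1).map(NonZeroU32::get), Some(42));

                // Turn the Option into a Result if a caller wants an error type.
                let _r: Result<NonZeroU32, &str> = n.checked_add(1).ok_or("overflowed u32");

                // saturating_add can return NonZeroU32 directly: saturating
                // at u32::MAX can never produce zero.
                assert_eq!(n.saturating_add(u32::MAX).get(), u32::MAX);
            }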

  • cmovq 2 hours ago
    Dividing a float by zero is usually perfectly valid. It has predictable outputs, and for some algorithms like collision detection this property is used to remove branches.
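
    For reference, a small Rust illustration of the IEEE 754 behaviour being relied on here (the branch-free trick in e.g. ray/box slab tests leans on exactly these rules):

        fn main() {
            // IEEE 754: dividing a finite float by zero is defined, not a fault.
            assert_eq!(1.0_f32 / 0.0, f32::INFINITY);
            assert_eq!(-1.0_f32 / 0.0, f32::NEG_INFINITY);
            assert!((0.0_f32 / 0.0).is_nan()); // 0/0 is NaN, not an infinity

            // Precomputing 1.0 / dir yields ±inf for an axis-parallel ray,
            // and later min/max comparisons still produce correct bounds
            // without any special-case branch.
            let dir = 0.0_f32;
            let inv_dir = 1.0 / dir;
            assert!(inv_dir.is_infinite());
        }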
    • woodruffw 2 hours ago
      I think “has predictable outputs” is less valuable than “has expected outputs” for most workloads. Dividing by zero almost always reflects an unintended state, so proceeding with the operation means compounding the error state.

      (This isn’t to say it’s always wrong, but that having it be an error state by default seems very reasonable to me.)

  • sam0x17 2 hours ago
    btw the “quoth” crate makes it really, really easy to implement scannerless parsing in Rust for arbitrary syntax; I use it on many of my projects
    • IshKebab an hour ago
      Interesting-looking crate. You don't seem to have any examples at all, though, so I wouldn't say it makes it easy!