Don't let dicts spoil your code (roman.pt)

197 points by juniperplant 9 months ago | 31 comments

cardanome 9 months ago
This is absolute key advice.
Another way to look at it is the functional core, imperative shell pattern.
Wrapping up your dict in a value object (a dataclass or whatever the equivalent is in your language) early on means you handle the ugly stuff first. Parse, don't validate. Resist the temptation of optional fields. Is there really anything you can do if the field is null? If not, don't make it optional. Let it crash early on. Clearly define your data.
If you have put your data into neat value objects, you know what is in it. You know the types. You know all required fields are there. You will be so much happier. No checking for null throughout the code, no checking for empty strings. You can just focus on the business logic.
Seriously so much suffering can be avoided by just following this pattern.
- sevensor 9 months ago
  I like to think about this as securing the perimeter. Inside, everything is typed, static analysis constrains what can happen, and I am never surprised as long as the code type checks. Outside, data is probably garbage. All the effort goes into locking down the interface. Pydantic is ok for this, although I find it too intrusive for my taste, and I think mixing arbitrary validity predicates with structural correctness is a mistake. Still, I’d much rather walk into a codebase that uses Pydantic than one that assumes its inputs are valid, because confidently writing business logic that can assume its inputs are correct is incredibly liberating.
- aequitas 9 months ago
  > Another way to look at it is the functional core, imperative shell pattern.
  A good explanation of this is: https://www.destroyallsoftware.com/talks/boundaries
- cruffle_duffle 9 months ago
  Pydantic makes that stuff super simple too. It has all manner of data validation hooks as well as (de)serialization help.
- mcdeltat 9 months ago
  The "loosey-goosey" approach to data in coding is one of my biggest pet peeves. Some people absolutely insist on making everything as dynamic as possible, and then wonder why we end up with a buggy mess. I always found it very natural to move as much as possible into the type system, because why wouldn't I want the machine to find all my inevitable mistakes for me?
  - majewsky 9 months ago
    My personal favorite of this has got to be a particular mess of a Python API that led to me implementing "type veryFlexibleUint64": https://github.com/sapcc/limes/blob/9ea9d1f86383f8a5fe0fa1d1...
    pbronez 9 months ago
    So brutal
- Kinrany 9 months ago
  There's little to disagree with here, and yet this comment reads like a slogan soup.
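Here is a minimal sketch of the "parse, don't validate" boundary described above. It assumes Python 3.9+ and a hypothetical `GitHubRepo` value object whose fields loosely mimic GitHub's API (`name`, `stargazers_count`). It is an illustration, not the article's code. The raw dict is checked once at the edge, and everything downstream receives a typed, immutable object with no optional fields.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class GitHubRepo:
    """Hypothetical value object: every field is required and already typed."""
    name: str
    stars: int

    @classmethod
    def parse(cls, raw: dict[str, Any]) -> "GitHubRepo":
        # Fail loudly at the boundary instead of checking for None later.
        name = raw["name"]  # KeyError if the field is missing
        stars = raw["stargazers_count"]
        if not isinstance(name, str) or not name:
            raise ValueError(f"invalid name: {name!r}")
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(stars, int) or isinstance(stars, bool):
            raise ValueError(f"invalid star count: {stars!r}")
        return cls(name=name, stars=stars)


repo = GitHubRepo.parse({"name": "cpython", "stargazers_count": 60000, "private": False})
print(repo)  # GitHubRepo(name='cpython', stars=60000)
```

Extra keys such as `private` are simply dropped. A missing or malformed field fails at construction time instead of deep inside business logic.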
jimmytucson 9 months ago
Here’s an out-there take, but one I’ve held loosely for a long time and haven’t shed yet: dicts are not appropriate for what people mostly use them for, which is named access to member attributes.
dict is an implementation of a hash table. Hash tables are designed for O(1) lookup of items. To get that, they are backed by arrays that are much bigger than the number of items they store, so that items can be hashed into integer slots with few collisions. They’re meant to act like an index that contains many records, not a single record.
A single record is more like a tuple, except you want named access instead of `title = movie[0]`, `release_year = movie[1]`, etc. And Python had that in NamedTuple, but it was kinda magical and no one used it (shoutout Raymond Hettinger).
Granted, this rant is pretty much the meme with the guy explaining something to a brick wall, in that dicts are so firmly entrenched as the "record" type of choice in Python (but not so in other languages: struct, case class, etc. and JSON doesn’t just deserialize to a weak type but I digress).
- fallingsquirrel 9 months ago
  NamedTuples are great, but they let you do too much with the objects. You probably don't want users of your GitHubRepo class to be able to do things like `repo[1]` or `for foo in repo`. Dataclasses have more constrained semantics, so I reach for them by default. In my ideal world they would default to frozen=True, kw_only=True, slots=True, but even without those they're a big improvement.
- aatarax 9 months ago
  Dicts in python are for when you have a thing and you aren't sure what the keys are. Dataclasses are for when you have a thing and you're sure what the keys (attributes) are. The trouble is when you have a thing and you're sort of sure, but not entirely sure, and some things are definitely there but not everything you might be thinking of.
- jsyang00 9 months ago
  I think most modern Python codebases are using dataclasses or something like Pydantic. I think dicts are mostly seen, like the author suggests, because something you hacked up to work quickly ends up turning into actual software and it's too much work to refactor the types.
- jonathrg 9 months ago
  dicts are used internally in the language to look up class and module attributes. They are optimized for this use case. How can it be wrong to use them that way when the very fabric of the language depends on it?
  namedtuple is widely used in Python code, especially before the introduction of dataclasses.
  - jpc0 9 months ago
    A hash function will always be more expensive than a pointer lookup, especially considering a pointer lookup is still needed after the hash function.
    No matter what you do, a lookup into an array will always be quicker than a hash lookup if you don't need to do a linear search, and in a lot of cases even a linear search will be quicker.
    Struct field access in other languages is a pointer plus an offset, which to my knowledge is also true of Python classes using __slots__. There's no reason to use a dict if you know the contents of the data. Use a dataclass with slots=True, if only because no hash function runs on every lookup into the data structure.
  - sickblastoise 9 months ago
    It’s not wrong to use dicts, it’s just bad practice when you could use something like a dataclass or pydantic model instead.
    Dicts are useful for looking things up: if you have a bunch of objects that you need to access and modify, you should use a dict.
    If you are using the dict as a container like `car={"make": "honda", "color": "red"}`, you should use a proper object like a class, dataclass, or pydantic model based on whether you need validation, type safety, etc. This drastically reduces bugs and code complexity, helps others reason about your code, gives you access to better tooling etc.
  - cruffle_duffle 9 months ago
    Right? I thought pretty much all the higher level “objecty” stuff in python are dicts under the hood.
- travisjungroth 9 months ago
  I think I once heard a Clojure talk where they were referred to as big and small maps. Small ones are what you’re comparing to arrays.
  A place where dicts with hard-coded keys make sense is notebooks. The convenience is worth it and it’s unlikely to get out of hand.
- seabrookmx 9 months ago
  Subclassing NamedTuple is very ergonomic, and given they're immutable, unlike dataclasses, I often reach for them by default. I still use Pydantic when I want custom validation or when it ties into another lib like FastAPI.
  - psd1 9 months ago
    You know about `frozen=True`, right? Dataclasses can be immutable.
    It's python, so take that with a grain of salt.
Jean-Papoulos 9 months ago
>"unstructured data is problematic"
>"solution : use dataclasses"
Damn, it's almost like using an untyped language for large projects is not a great idea.
- mkesper 9 months ago
  Python is absolutely typed. By default, it's really dynamic, though.
- ktosobcy 9 months ago
  And yet we are overwhelmed by JavaScript nonsense... I get it - it's so easy to get up to speed with tiny snippets but it quickly becomes a hot mess.
  Yes, decades ago I was also fascinated by Python and its ease of doing stuff (the compiler doesn't complain that I missed something) but with time I grew fond of statically typed languages... they simply catch swaths of errors earlier...
  - kabes 9 months ago
    Are we still overwhelmed by js? I almost only see TS code these days.
    ktosobcy 9 months ago
    I had to deal with it ~1-2 years ago, so for my fairly ancient self it feels "recent", though considering that JS frameworks were popping up every couple of months and getting dropped a year later, my timeframe may be off ;)
- sosotrue 9 months ago
  [flagged]
  - jmorenoamor 9 months ago
    Dynamic languages demand self discipline, they teach you to respect runtime and think ahead of execution time.
    I've written software with both typed and untyped languages and never had problems (beyond the usual) with them.
    ktosobcy 9 months ago
    > Dynamic languages demand self discipline, they teach you to respect runtime and think ahead of execution time.
    Ah... yes... because static languages don't do that by forcing you to properly model everything. And as a bonus you can easily navigate between everything and not fear that you miss something while refactoring...
    jrjrjrjrj 9 months ago
    I would argue that dynamic languages make a compile time problem a run time problem...
    So yeah that small not hit portion of code can always be a time bomb if it does not get tested...
    consp 9 months ago
    > So yeah that small not hit portion of code can always be a time bomb if it does not get tested...
    That has nothing to do with the language, and can happen in any language.
bigstrat2003 9 months ago
For better or for worse, Python doesn't do typing well. I don't disagree that I prefer well defined types, but if that is your desire then I think Python is perhaps not the correct choice of language.
- Ey7NFZ3P0nzAe 9 months ago
  Personally, I became a huge fan of beartype: https://pypi.org/project/beartype/
  leycec, the magic dev behind it, managed to make a full Python runtime type checker with super advanced features and about 0 overhead. It's crazy.
  - skeledrew 9 months ago
    I tried using it, but beartype quickly became a pain with having to decorate things manually. Then I found typeguard which goes even further and never looked back. Instead of manually decorating each individual function, an import hook can be activated that automatically decorates any function with type annotation. Massive QoL improvement. I have it set to only activate during testing though as I'm unsure of the overhead.
    Mattwmaster58 9 months ago
    It looks like beartype supports the same sort of implicit decoration, because there's mention of an explicit API:
    >Beartype now implicitly type-checks all annotated classes, callables, and variable assignments across all submodules of all packages.
    skeledrew 9 months ago
    That definitely got implemented after I had already moved on.
    Ey7NFZ3P0nzAe 9 months ago
    IIRC beartype is several orders of magnitude faster than typeguard, so you might want to give it another try.
- nerdponx 9 months ago
  Python does typing pretty darn well now for data like API requests and responses.
  "Typed Python" does poorly (compared to e.g. Typescript) on things like overloading functions, generics, structural subtyping, et al.
- est 9 months ago
  > Python doesn't do typing well
  Golang does typing, but JSONs are PITA to handle.
  Try parsing something like `[{"a": 1, "b": "c", "d": [], "e": {}}, null, 1, "2"]` in Go.
  Types are a blessing as well as a curse.
  - Aditya_Garg 9 months ago
    That's only because your list has different types. It's a badly formed API, and if you really need to support that use case then you can use maps and reflection to handle it.
    est 9 months ago
    The problem is, programmers can't dictate what JSON should look like in the wild.
    We used to have strict typed XML. Nobody even bothered.
    a57721 9 months ago
    > The problem is, programmers can't dictate what JSON should look like in the wild.
    Not JSONs in general, but a sane API would never return something like that.
    > We used to have strict typed XML. Nobody even bothered.
    Nowadays there is OpenAPI, GraphQL, protobuf, etc. and people do bother about such things.
    mook 9 months ago
    Unfortunately, a lot of the time you need to deal with other people's APIs.
    shiroiushi 9 months ago
    >We used to have strict typed XML. Nobody even bothered.
    Yeah, because it was ugly as hell and not human-readable.
  - emmanueloga_ 9 months ago
    gjson [1] and a few other go packages offer a way to parse arbitrary JSON without requiring structs to hold them.
    re: Python. I like PyRight/PyLance for Python typing, it seems to "just work" afaict. I also like msgspec for dataclass like behavior [2].
    ---
    1: https://github.com/tidwall/gjson
    2: https://jcristharif.com/msgspec/
  - jpc0 9 months ago
    []interface{}
    But the same issue exists as in other dynamic languages: how do you know what the type is of the item you are accessing?
    If you know the array will be laid out exactly like that before you make the request, you can always create a custom parser to return a struct with those fields named for what they actually are instead of arbitrary data.
    The only valid way to parse that dynamically is to try and fail in a loop which is inefficient enough that you should stop using whatever API returns that monstrosity.
  - Turskarama 9 months ago
    And if you got that JSON back in Python, how would you do anything with it? This API is essentially useless. You can deserialise it, sure, but then what?
    est 9 months ago
    I can get parsing job easily done without mental gymnastics.
    Turskarama 9 months ago
    Right but what do you do with the parsed object? An array of random objects is used for what, exactly?
ungamedplayer 9 months ago
Can someone educate me on why dicts are uncool for the reasons explained, while Clojure (which seems to be highly recommended on HN) seems to suffer from the same issues when dealing with a map as a parameter (Ring requests, etc.)?
I know how to deal with missing values or variability in maps, and so do a lot of people... what am I missing here?
- bloppe 9 months ago
  Dicts are great when the data is uniform and dynamic, like an address book mapping names to contact info. You never assume that a key must be in there. Lookups can always fail. That's normal for this kind of use-case.
  When the data is not uniform (different keys point to differently-typed values), and not as dynamic (maybe your data model evolves over time, but certain functions always expect certain keys to be present), a dict is like a cancer. Sure, it's simple at first, but wait until the same dict gets passed around to a hundred different functions instead of properly-typed parameters. I just quit my tech job at a company that shall remain nameless, partially because the gigantic Ruby codebase I was working on had a highly advanced form of this cancer, and at that point it was impossible to remove. You were never sure if the dict you're supplying to some function had all the necessary keys for the function it would eventually invoke 50 layers down the call stack. But changing every single call-site would involve such a major refactor that everybody just kept defining their functions to accept these opaque mega-dicts. So many bugs resulted because of this. That was far from the only problem with that codebase, but it was a major recurring theme.
  I learned this lesson the hard way.
  - cornholio 9 months ago
    This should be the top answer. It's not about using dicts in their primary use case, it's about abusing them as a catch-all variadic parameter for quick prototyping and "future expansion".
  - scotty79 9 months ago
    I think the problem is that different data containers have completely different interfaces.
    If getting a field of your object had the same syntax as getting a value from a dict, you could easily replace dicts with smarter, more rigid types at any point.
    My dream is a language that has the containers share as much interface as possible so you can easily swap them out according to your needs without changing most of the code that refers to them. Like easily swap dict for BTreeMap or Redis.
    I think the closest is Scala, but it had fallen out of favor before I had a chance to get to know it.
- lispisok 9 months ago
  Maps aren't nearly as problematic in Clojure because data is immutable by default, on top of the functional paradigm where your program is basically a big composition of functions and the language is built around using maps. In Python I largely agree with the author. In Clojure I love my maps.
  Here is Rich Hickey with an extreme counter example although I would argue he's really demonstrating against getters and setters. https://www.youtube.com/watch?v=aSEQfqNYNAc
- nlitened 9 months ago
  In Clojure, maps don’t have either of the flaws highlighted in the article. They are neither opaque (they are self-describing with namespaced keys) nor mutable.
  As a result, they are very powerful and simple to use.
- orf 9 months ago
  They also work fine with JavaScript.
  The issue is that the concrete types are implicit. Depending on the language, runtime or type system expressing the type in a “better” way might be very hard or un-ergonomic.
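Python's closest standard-library counterpart to the immutable maps mentioned above is probably `types.MappingProxyType`, sketched below on that assumption. It is only a shallow, read-only view, not a persistent data structure like Clojure's maps. Anyone holding a reference to the underlying dict can still mutate it. In this sketch the dict is created inline, so no such reference exists.

```python
from types import MappingProxyType

request = MappingProxyType({"method": "GET", "uri": "/repos"})  # read-only view

try:
    request["method"] = "POST"  # type: ignore[index]
except TypeError:
    print("mapping proxies are read-only")

# "Changing" the data means building a new mapping, loosely like Clojure's assoc.
updated = MappingProxyType({**request, "method": "POST"})
print(request["method"], updated["method"])  # GET POST
```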
Waterluvian 9 months ago
I think one really nice thing about Python is duck typing. Your interfaces are rarely asking for a dict as much as they’re asking for a dict-like. It’s pretty great how often you can worry about this kind of problem at the appropriate time (now, later, never) without much pain.
There’s useful ideas in this post but I’d be careful not to throw the baby out with the bath water. Dicts are right there. There’s dict literals and dict comprehensions. Reach for more specific dict-likes when it really matters.
- turnsout 9 months ago
  Duck typing is so fragile… Once you have implementations that are depending on your naming or property structure, you can’t update the model without breaking them all.
  If you use a real type, you never have to worry about this.
  - pistoleer 9 months ago
    You would still have to update everything if you rename a field in a struct, what do you mean you never have to worry?
    dwattttt 9 months ago
    If you use type checking, the breakage occurs when you introduce the change: the author of the change is the one who can figure out what it means if 'foo' is no longer being passed into this function.
    If you're duck typing, you find this out in the best case when your unit tests exercise it, and in the worst case by a support call when that 1/1000 error handling path finally gets exercised in production.
    pistoleer 9 months ago
    I agree with that, in the context of dynamically typed languages.
    Slowly but surely, new languages are starting to develop with static duck typing. Implicit interfaces if you will.
    jcgl 9 months ago
    Which languages are doing this? This is something I’ve been wishing for.
    trealira 9 months ago
    > static duck typing
    What do you mean by this? Macros? C++ templates?
    turnsout 9 months ago
    Exactly… with strong typing, you can do the refactor automatically, because the IDE knows everywhere that symbol is used. (For codebases in your control—for third party users, you can indicate that something has been deprecated or renamed via a warning or other language feature)
  - zmgsabst 9 months ago
    And now inserting every middleware is an exercise in retyping the system, rather than piggybacking on the parameter dict.
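Here is a small sketch of the "dict-like" duck typing described above. Annotating with `collections.abc.Mapping` instead of `dict` lets any read-only mapping through, and a type checker can still verify the calls. `greeting` and `EnvLike` are hypothetical names. The sketch assumes Python 3.9+ so that the ABCs can be subscripted.

```python
from collections.abc import Iterator, Mapping


def greeting(person: Mapping[str, str]) -> str:
    # Accepts any mapping, not just dict: only lookup is needed here.
    return f"Hello, {person['name']}"


class EnvLike(Mapping[str, str]):
    """Hypothetical dict-like: Mapping only requires __getitem__, __iter__, __len__."""

    def __init__(self, pairs: list[tuple[str, str]]) -> None:
        self._pairs = pairs

    def __getitem__(self, key: str) -> str:
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

    def __iter__(self) -> Iterator[str]:
        return (k for k, _ in self._pairs)

    def __len__(self) -> int:
        return len(self._pairs)


print(greeting({"name": "Ada"}))
print(greeting(EnvLike([("name", "Grace")])))
```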
cle 9 months ago
Dicts can be a problem, but this particular example isn't a great one. Take this diagram from the article:
```
  External API <--dict--> Ser/De <--model--> Business Logic
```
Life's all great until "External API" adds a field that your model doesn't know about, it gets dropped when you deserialize it, and then when you send it back (or around somewhere else) it's missing a field.
There's config for this in Pydantic, but it's not the default, and isn't for most ser/de frameworks (TypeScript is a notable exception here).
Closed enums have a similar tradeoff.
- mjr00 9 months ago
  If external API adds a new field but your software already worked, you didn't need it in the first place, so why should it matter?
  Dropping unknown/unused fields makes sense in 99% of cases.
  - buzer 9 months ago
    Unfortunately some APIs assume that they will get all the fields as part of an update. If a field doesn't exist in the input they receive, they will drop the original value during the update.
    _ZeD_ 9 months ago
    Yet, again, most of the libraries already deal with extra fields, e.g. for Pydantic: https://docs.pydantic.dev/latest/concepts/models/#extra-fiel...
    vouwfietsman 9 months ago
    I don't deal with external APIs often, but this is a development nightmare. You can't just magically let data flow through your system without knowing about it, because this is not how programming works. Your API has a contract and your code is written to support that contract. If the contract changes, it should either be a very consciously decided breaking change that is versioned somehow, or an unversioned non-breaking change. Apparently whatever data is added like this is completely meaningless to your program, so why should you be in charge of passing it back to the API?
    Changing your API and assuming everything just keeps working is a nonsense cowboy attitude to software compatibility, even if some frameworks bend over backwards to support it through magic that's hidden from the developer. Furthermore, many programming languages are simply incapable of doing this, and this approach to APIs is immediately restricting those languages from use.
    Finally, transforming objects to an internal domain model is really the cornerstone of a lot of recent well-thought-out programming discipline, and this API design is throwing that in the garbage. It's explicitly asking you to mess up your service architecture, spreading bad architecture like a virus to all systems that interact with the API.
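The round-tripping problem can be handled without giving up a typed model: keep unrecognised keys on the side. Pydantic has configuration for this, as noted above. The sketch below does the same thing by hand with a standard-library dataclass. `Issue`, its fields, and the `from_api`/`to_api` methods are hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class Issue:
    # Hypothetical model: the fields our code actually uses.
    id: int
    title: str
    # Everything else the API sent, kept so it can be sent back unchanged.
    extra: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_api(cls, raw: dict[str, Any]) -> "Issue":
        known = {f.name for f in fields(cls)} - {"extra"}
        return cls(
            **{k: raw[k] for k in known},  # KeyError if a known field is missing
            extra={k: v for k, v in raw.items() if k not in known},
        )

    def to_api(self) -> dict[str, Any]:
        data = {f.name: getattr(self, f.name) for f in fields(self) if f.name != "extra"}
        return {**self.extra, **data}  # model fields win over stale extras


payload = {"id": 7, "title": "Bug", "labels": ["p1"]}  # "labels" is unknown to the model
issue = Issue.from_api(payload)
issue.title = "Bug (triaged)"
print(issue.to_api())
```

The replies disagree on whether unknown fields should be carried along at all. The sketch only shows that it is possible.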
Garlef 9 months ago
I don't think dicts themselves are the problem.
In typescript using plain JS objects is very straightforward. Of course you have to validate the schema at your system boundaries. But you'll have to do this either way.
So: if this works very well in TS, it can't be dicts themselves; it must be the way they integrate into, and are handled in, Python.
This leads me to the conclusion that arguments presented in the article might be the wrong ones.
(But I still think the conclusion the article arrives at is okay. I just don't think the article makes a strong case for whether to prefer dataclasses or typed dicts.)
- soulchild77 9 months ago
  This. I think types really make the difference here. You can get very far with just plain old JS objects as long as you've got strong types in place.
hcarvalhoalves 9 months ago
Debatable. Here's a counter-point:
https://www.youtube.com/watch?v=aSEQfqNYNAc
But ok, it's less bad in Python since objects are dicts anyway and you don't need getters.
fhdsgbbcaA 9 months ago
Seems like the issue is less using dicts than not treating external APIs as input that needs to be sanitized.
- physicsguy 9 months ago
  The code in the examples doesn't even check the API response code, let alone the structure of the response.
- pmarreck 9 months ago
  Agreed. If you sanitize/allowlist API data you should not have issues with dicts.
  - imron 9 months ago
    You'll have issues if you ever rename things in the dict.
    Linting tools will pick up on every instance where you forgot to rename the fields of a class, but won't do the same for dicts.
    FreakLegion 9 months ago
    TypedDicts solve the linting problem, but refactoring tools haven't caught up (unlike e.g. ForwardRef type annotations, which are strings but can be transformed alongside type literals).
    tomjakubowski 9 months ago
    Is there any advantage to using a TypedDict for a record over a dataclass?
    FreakLegion 9 months ago
    TypedDicts "aren't real" in the sense that they're a compile-time feature, so you're getting typing without any deserialization cost beyond the original JSON. Dataclasses and Pydantic models are slow to construct, so that's not nothing.
    This of course means TypedDicts don't give you run-time validation. For that, and for full-blown custom types in general, I tend to favor msgspec Structs: https://jcristharif.com/msgspec/benchmarks.html#json-seriali....
    orf 9 months ago
    > Dataclasses and Pydantic models are slow to construct
    Citation needed? Pydantic is really quite fast, and you can pass raw JSON responses into it.
    It may be slower (depending on the validators or structure), but I’d expect it to be comparably fast to the stdlib JSON module.
    newaccountman2 9 months ago
    > Citation needed? Pydantic is really quite fast
    Pydantic v1 was slow enough for them to write a lot of the core logic in Rust for Pydantic v2, and for the previous sloth to have been an argument people launched against it if you look back at threads on here and Reddit comparing it to other libraries.
    FreakLegion 9 months ago
    Pydantic's JSON parsing is faster than the built-in module, on par with orjson, but creating model instances and run-time type checking net out to be much slower. I linked msgspec's benchmarks in the previous post.
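To illustrate the "TypedDicts aren't real" point: only static tools check a `TypedDict` annotation. At run time the value stays a plain `dict`, and nothing verifies its shape. `Repo` and `load_repo` are hypothetical names in this sketch.

```python
import json
from typing import TypedDict


class Repo(TypedDict):
    name: str
    stars: int


def load_repo(text: str) -> Repo:
    # No conversion or validation happens here: a type checker trusts this
    # annotation, and at run time the result is an ordinary dict.
    return json.loads(text)


repo = load_repo('{"name": "cpython"}')  # "stars" is missing
print(type(repo) is dict)  # True
print(repo.get("stars"))   # None: nothing checked the shape at run time
```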

cschneid 9 months ago

I generally support this. When dealing with API endpoints especially, I like to wrap them in a class along the lines of the one below. I also like having nested data structures as their own class sometimes. Depends on complexity & need, of course.

    class GetThingResult
      def initialize(json)
        @json = json
      end
    
      # single thing
      def thing_id
        @json.dig('wrapper', 'metadata', 'id')
      end
    
      # multiple things
      def history
        @json['history'].map { |h| ThingHistory.new(h) }
      end
      # ... two dozen more things
    end
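Below is a rough Python equivalent of the Ruby wrapper above, using read-only properties. `ThingHistory` and its `event` field are hypothetical stand-ins, since the original snippet does not show them. The key path mirrors the `dig` call.

```python
from typing import Any, Optional


class ThingHistory:
    """Hypothetical nested wrapper; the Ruby snippet references it without showing it."""

    def __init__(self, data: dict[str, Any]) -> None:
        self._data = data

    @property
    def event(self) -> Optional[str]:
        return self._data.get("event")  # hypothetical field name


class GetThingResult:
    def __init__(self, data: dict[str, Any]) -> None:
        self._data = data

    # single thing
    @property
    def thing_id(self) -> Any:
        # Walk the nested keys and return None if any level is missing,
        # roughly what the Ruby code gets from Hash#dig.
        node: Any = self._data
        for key in ("wrapper", "metadata", "id"):
            if not isinstance(node, dict):
                return None
            node = node.get(key)
        return node

    # multiple things
    @property
    def history(self) -> list[ThingHistory]:
        # Unlike the Ruby version, a missing "history" key yields [] instead of an error.
        return [ThingHistory(h) for h in self._data.get("history", [])]


result = GetThingResult({"wrapper": {"metadata": {"id": 42}}, "history": [{"event": "created"}]})
print(result.thing_id)  # 42
print([h.event for h in result.history])  # ['created']
```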

Attummm 9 months ago
Python has made its rise as an antithesis to Java thinking. Classes used to be seen by some in the community as an anti-pattern. [0] The coding style used to focus on "Pythonic-ness," which meant using Python's expressiveness to write code in such a way that type information could be inferred without explicitly stating the type.
Most developers will carry their previous language paradigms into their new ones. But if types, DDD (Domain-Driven Design), and classes are what you're looking for, then Python isn't the best fit. Python doesn't have compiler features that work well with those paradigms, such as dead code removal/tree shaking. However, starting out with dictionaries and then moving over to dataclasses is a great strategy.[1] As a small note, it's kind of ironic that Go, a statically typed language, adopted inferred typing with its `:=` operator, while there is now a movement in Python to write `foo: str = "bar"`.
[0] https://youtu.be/o9pEzgHorH0?si=pv0QQyM-iBrHuXUN
[1] https://docs.python.org/3/library/dataclasses.html
CraigJPerry 9 months ago
This has merit in some cases but let me try to make a counterpoint.
You lose the algebra of dicts, and it’s a rich algebra to lose, since in Python it’s not just all the basic obvious stuff but also powerful things like dict comprehensions and ordering guarantees (3.7+ only).
You tightly couple to a definition - in the simple GitHubRepository example this is unlikely to be problematic. In the real world, coupling like this[1] to objects trying to capture domain data with dynamic structures is regularly the stuff of nightmares.
The over-arching problem with the approach given is that it puts code above data. You take what could be a schema, inert data about inert data, and instead use code. But it might also be an interesting case to consider as a slippery slope - if you can put code concerns above data concerns then maybe soon you will see cases where code concerns rank higher than the users of your software?
[1] - by coupling like this I mean the “parse don’t validate” school of thought which says as soon as you get a blob of data from an external source, be it a file, a database or in this case a remote service, you immediately tie yourself to a rocket ship whose journey can see you explosively grow the number of types to accurately capture the information needed for every use case of the data. You could move this parsing operation to be local to the use case of the data (much better) rather than have it here at the entry point of the data to the system but often times (although not always) we can arrive at a simpler solution if we are clever enough to express it in a style that can easily be understood by a newbie to programming. That often means relying on the common algebra of core types rather than introducing your own types.
- zmgsabst 9 months ago
  You also make a nightmare of dynamically adding middleware — which can piggyback on a generic dict and have no meaningful way to insert themselves into your type maze.
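Here are a few examples of the dict "algebra" the comment refers to. They assume Python 3.9+ for the `|` merge operator. The option names are made up for illustration.

```python
defaults = {"timeout": 30, "retries": 3}
overrides = {"retries": 5, "verbose": True}

merged = defaults | overrides  # Python 3.9+: the right-hand side wins on conflicts
print(merged)  # {'timeout': 30, 'retries': 5, 'verbose': True}

# Comprehensions filter or transform in one expression (bool is excluded explicitly,
# because it is a subclass of int).
numeric = {k: v for k, v in merged.items() if isinstance(v, int) and not isinstance(v, bool)}
print(numeric)  # {'timeout': 30, 'retries': 5}

# Insertion order is guaranteed since Python 3.7; updated keys keep their position.
print(list(merged))  # ['timeout', 'retries', 'verbose']
```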
cranium 9 months ago
Python dataclasses are a good start for internal use. They are just a bit of a pain to serialize/deserialize natively. When it comes to that, I prefer to use Pydantic objects and have all the goodies, at the cost of some complexity.
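This sketch shows the asymmetry described above, with hypothetical `Owner` and `Repo` dataclasses. `dataclasses.asdict` serializes nested dataclasses recursively. The standard library has no built-in inverse, so nested objects must be rebuilt by hand or with a library such as Pydantic.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Owner:
    login: str


@dataclass
class Repo:
    name: str
    owner: Owner


repo = Repo(name="cpython", owner=Owner(login="python"))

# Serializing is easy: asdict() recurses into nested dataclasses.
text = json.dumps(asdict(repo))
print(text)  # {"name": "cpython", "owner": {"login": "python"}}

# Deserializing is the tedious direction: Repo(**data) would leave `owner`
# as a plain dict, so each nested type has to be rebuilt explicitly.
data = json.loads(text)
restored = Repo(name=data["name"], owner=Owner(**data["owner"]))
print(restored == repo)  # True
```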
xenoxcs 9 months ago
I'm a big fan of using Protobuf for the third-party API validation task. After some slightly finicky initial schema definition (helped by things like json-to-proto.github.io), I can be sure the data I'm consuming from an external API is strongly typed, and the functions included in Protobuf which convert JSON to a Proto message instance blow up by default if there's an unexpected field in the API data they're consuming.
I use it to parse and validate incoming webhook data in my Python AWS Lambda functions, then re-use the protobuf types when I later ship the webhook data to our Flutter-based frontend. Adding extensions to the protobuf fields gives me a nice, structured way to add flags and metadata to different fields in the webhook message. For example, I can add table & column names to the protobuf message fields, and have them automatically be populated from the DB with some simple helper functions. Avoids me needing to write many lines of code that look like:
```
MyProtoClass.field1 = DB.table.column1.val
MyProtoClass.field2 = DB.table.column2.val
```
wruza 9 months ago
Knew it was python before the first line of code. Python lacks ceremony-free data syntax, that’s why people use dicts. Dataclasses have to be named, initialized and imported, which is tedious. Much easier to just foo({name, age}) and let typings match, but python doesn’t have that. Lack of “POPO” is a design mistake.
pmarreck 9 months ago
Less important in Elixir (where they are "maps") due to the immutable nature of them as well as the Struct type which is a structured map.
- mikhmha 9 months ago
  Yup! I find Elixir makes it really intuitive to know when to represent a collection as a map and when to use a list of tuples. And it's easy to transform between the two when needed.
- nesarkvechnep 9 months ago
  Yes, usually my APIs in Elixir receive their arguments as a well-typed map, not stringly keyed, and transform them to structs which the core business logic expects.
Barrin92 9 months ago
It's a bit of an odd article because the second part kind of shows why dicts aren't a problem. You basically just need to apply the most old school of OO doctrines: "recipients of messages are responsible for how they interpret them", and that's exactly what the author advocates when he talks about treating dict data akin to data over the wire, which is correct.
If you're programming correctly and take encapsulation seriously, then whatever shape incoming data in a dict has isn't something you should take an issue with. You just need to check whether what you care about is in it (or not) and handle that within your own context appropriately.
Rich Hickey once gave a talk along these lines about maps in Clojure, and I think he made the analogy of the DHL truck stopping at your door. You don't care what every package in the truck is; you just care whether your package is in there. If some other data changes, which data always does, that's not your concern; you should be decoupled from it. It's equivalent to how we program networked applications. There are no global semantics or guarantees on the state of data, and there can't be, because the world isn't in sync or static; there is no global state. There's actually another Hickey-ism along the lines of "program on the inside the same way you program on the outside". Dicts are cool, just make sure that you're always responsible for what you do with one.
- alfons_foobar9 months ago
  I assume you're basically referring to this quote from the article?
  "Ignore fields coming from the API if you don’t need them. Keep only those that you use."
  IMO this addresses only one part of the problem, namely "sanitize your inputs". But if you follow this, and therefore end up with a dict whose keys are known and always the same, using something "struct-like" (dataclasses, attrs, pydantic, ...) is just SO much more ergonomic :)
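A sketch of the "sanitize, then make it struct-like" step this reply describes, using a frozen `dataclass`. The payload shape and the `Repo` / `repo_from_api` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Repo:
    """Only the fields this program actually uses (hypothetical example)."""
    name: str
    stars: int


def repo_from_api(payload: Mapping[str, Any]) -> Repo:
    # Keep only the fields we use, convert them to the declared types,
    # and drop everything else in the payload.
    return Repo(name=str(payload["name"]), stars=int(payload["stars"]))


repo = repo_from_api({"name": "demo", "stars": "42", "owner": {"login": "someone"}})
print(repo.stars + 1)  # 43: `stars` is an int from here on
```

After the boundary, a typo such as `repo.stras` is flagged by a type checker and raises `AttributeError` at runtime, instead of turning into a silent `.get()` miss.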
scotty799 months ago
> Ignore fields coming from the API if you don’t need them. Keep only those that you use.
This is great if you know what you need from the start. If you only find out what you need after passing your data through multiple layers and modules of your system, then you need to backtrack through all your code to the place of creation.
If you have immutable data structures, then you also have to backtrack through every place where new structures are built from previous ones, so that your additional data gets carried through all of them.
So if your data travels through, let's say, 3 immutable types to reach the place you are working on, then even if you know exactly where the new field you need originates, you need to alter 3 types and the 3 places where data is read from one type and crammed into another.
If you have a dict that you fill with all you got from the api there's zero work involved with getting the new piece of information that you thought you didn't need but you actually do. It's just there.
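A small sketch of the cost this comment describes, with hypothetical layer types `ApiUser`, `DomainUser`, and `ViewUser`. It only illustrates the argument; it is not a recommendation either way.

```python
from dataclasses import dataclass
from typing import Any


# Typed path: each layer has its own frozen type and a conversion step.
@dataclass(frozen=True)
class ApiUser:
    id: int
    name: str


@dataclass(frozen=True)
class DomainUser:
    id: int
    name: str


@dataclass(frozen=True)
class ViewUser:
    name: str


def parse(raw: dict[str, Any]) -> ApiUser:       # conversion point 1
    return ApiUser(id=raw["id"], name=raw["name"])


def to_domain(u: ApiUser) -> DomainUser:         # conversion point 2
    return DomainUser(id=u.id, name=u.name)


def to_view(u: DomainUser) -> ViewUser:          # conversion point 3
    return ViewUser(name=u.name)


# Needing `email` in ViewUser later means editing all three types
# and all three conversion points above.


# Dict path: the whole payload travels along, so a late-discovered field is already there.
def view_from_raw(raw: dict[str, Any]) -> dict[str, Any]:
    return {"name": raw["name"], "email": raw.get("email")}
```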
karmakurtisaani9 months ago
I've cleaned up code where input parameters came in a dict form. Absolute shit show.
- The only way to figure out which parameters are even possible was to search through the code for the uses of the dict.
- Default values were decided on the spot all over the place (input.getOrDefault(..)).
- Parameter names had to be typed out each time, so you'd better spell them correctly.
- Getting a concise overview of how the input is handled (sanitized) was practically impossible.
0/10 design decision, would not recommend.
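One way to avoid each problem in the list above, sketched in Python under the assumption of a hypothetical `ExportParams` parameter set: every accepted parameter and its default is declared once, and unknown (e.g. misspelled) keys are rejected at the boundary. The Python counterpart of `getOrDefault` scattered around the code would be `params.get("batch_size", 100)`.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class ExportParams:
    """Every accepted parameter and its default, declared in one place (hypothetical)."""
    path: str
    batch_size: int = 100
    overwrite: bool = False

    @classmethod
    def from_dict(cls, raw: Mapping[str, Any]) -> "ExportParams":
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            # A misspelled key fails here instead of silently falling back to a default.
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        # A missing required field (here `path`) raises TypeError from __init__.
        return cls(**raw)


params = ExportParams.from_dict({"path": "out.csv", "batch_size": 500})
```

Note that dataclasses do not check value types at runtime; that part is left to a static type checker or an explicit conversion.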
pansa29 months ago
> convert [dicts] immediately to data structures providing semantics [...] You can simplify your work by employing a library that makes “better classes” for you
Python seems to have many different kinds of "better classes": the article mentions `dataclass` and `TypedDict`, and AFAIK there are also two different kinds of named tuple (`collections.namedtuple` and `typing.NamedTuple`).
What are the advantages of these "better classes" over traditional classes? How would you choose which of the four (or more?) kinds to use?
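A rough side-by-side of the four standard-library options named above, as a sketch; the comments summarize standard-library behavior and are not guidance from the article. The `Point*` names are illustrative.

```python
import collections
from dataclasses import dataclass
from typing import NamedTuple, TypedDict


class PointTD(TypedDict):
    # A plain dict at runtime; keys and value types exist only for type checkers.
    x: int
    y: int


# An immutable tuple subclass with named fields, but no type annotations.
PointNT = collections.namedtuple("PointNT", ["x", "y"])


class PointTNT(NamedTuple):
    # Same runtime behavior as collections.namedtuple, plus annotations.
    x: int
    y: int


@dataclass
class PointDC:
    # A regular class with generated __init__, __repr__, __eq__; mutable unless frozen=True.
    x: int
    y: int


td: PointTD = {"x": 1, "y": 2}
print(type(td) is dict)                # True
print(PointTNT(1, 2) == (1, 2))        # True: named tuples are still tuples
print(PointDC(1, 2) == PointDC(1, 2))  # True: field-wise equality is generated
```

One common rule of thumb: `TypedDict` when the data must stay a dict (e.g. JSON you pass through), `typing.NamedTuple` when tuple behavior such as unpacking is wanted, and `dataclass` otherwise. Compared with a hand-written class, the main gain is not having to write `__init__`, `__repr__`, and `__eq__` yourself.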
- pansa29 months ago
  To me, the proliferation of "better classes" implies there's a problem with Python's built-in classes - but what's wrong? Are they just too flexible and/or too verbose? Or actually deficient in some way?
  - zmgsabst9 months ago
    People enjoy the flexibility and many Python systems rely on duck-typing via dicts, etc.
    So people are trying to force Python to be something it isn’t, in adherence to their ideology, but it fails to gain consensus because there’s a sizable cohort that uses Python because it isn’t those things.
    So we get repeated implementations, from each ideologically motivated group.
QuadrupleA9 months ago
Hard disagree on most of this. The immutability dogma for one (changing data is "the worst felony you can commit to your data"). Computing IS manipulation and transformation of data. The contortions people go through to try and sidestep that seem delusional.
Plus all this 1995-era OOP and domain-driven-design crap, "business logic" and data layers and all this other architectural rigidity and usually-needless complexity, layers of boilerplate (and then tools to automate the generation of that), etc.
If your function takes a dict and is called from many different places, document the dict format in the function comment. Or yes, create a dataclass if it saves more trouble than its extra boilerplate, code, and maintenance cost. But take it case by case and aim for simplicity. Most of the time when I call out to an API in Python, I process its JSON/dict response right after the call, using maybe 10% of the data returned. That's so much cleaner and simpler than writing a whole Data Object Layer, to be used by my API Interface Layer, to talk to my Business Logic layer, etc.
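A sketch of the lighter-weight approach this comment prefers: document the expected shape in the docstring and pull out the few needed fields right after decoding. The weather-API response shape and `summarize_weather` are hypothetical.

```python
import json
from typing import Any


def summarize_weather(response_body: str) -> str:
    """Summarize a (hypothetical) weather API response.

    Expected JSON shape, listing only the keys used here; everything else is ignored:
        {"location": {"name": str}, "current": {"temp_c": float}}
    """
    data: dict[str, Any] = json.loads(response_body)
    # Extract the few fields we need immediately, then let the rest go.
    name = data["location"]["name"]
    temp = data["current"]["temp_c"]
    return f"{name}: {temp:.1f} degrees C"
```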
newaccountman29 months ago
I work with people who don't see a problem with this and believe using random dicts in a variety of places is a valid way to write Python code.
For these kinds of people, no amount of rational evidence or argument is going to convince them this is bad. They practically make an identity out of eschewing anything that seems too orderly or too designed.
(Luckily, at work, most of us on our team like `Pydantic` and also (some of us more than others) type-checking, so these people are dragged along)
est9 months ago
dicts are OK, because at least they do have a `key` and it does mean something.
un-annotated tuples and too many func params are cancer.
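A short sketch of the two alternatives implied here, using a hypothetical `User` record: a `typing.NamedTuple` in place of a bare tuple, and keyword-only parameters (everything after `*`) so call sites must name each argument.

```python
from typing import NamedTuple


# Un-annotated: the caller has to remember what each position means.
def make_user_tuple() -> tuple:
    return ("ada", 36, True)


name, age, active = make_user_tuple()


class User(NamedTuple):
    name: str
    age: int
    active: bool


def make_user() -> User:
    return User(name="ada", age=36, active=True)


user = make_user()
print(user.age)  # 36


# Keyword-only parameters: create_account("ada", 36) raises TypeError.
def create_account(*, name: str, age: int, active: bool = True) -> User:
    return User(name=name, age=age, active=active)


account = create_account(name="ada", age=36)
```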
- stonethrowaway9 months ago
  No no,
  Un-annotated tuples and too many func params are OK, because at least they are pushed and popped from the stack.
  Calls and rets without a prologue and epilogue on the other hand…
  - est9 months ago
    > from the stack
    Or many, many stacks you can't comprehend nor amend.
    I'd dare to add a new `key` to a dict. Can you modify a func call or a tuple with confidence?
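A sketch of the point about extending data: a consumer that reads a dict key keeps working when a producer adds a key, while positional tuple unpacking breaks when a tuple grows. A `NamedTuple` field appended with a default leaves attribute access intact, although `name, age = user` style unpacking would still break. All names are hypothetical.

```python
from typing import NamedTuple


def greet_from_dict(user: dict) -> str:
    # Reads only the key it needs, so extra keys are harmless.
    return f"hello {user['name']}"


def greet_from_tuple(user: tuple) -> str:
    # Positional unpacking hard-codes the tuple's length.
    name, _age = user
    return f"hello {name}"


class User(NamedTuple):
    name: str
    age: int
    email: str = ""  # field added later, with a default


def greet_named(user: User) -> str:
    return f"hello {user.name}"  # attribute access is unaffected by the new field


print(greet_from_dict({"name": "ada", "age": 36, "email": "a@x"}))  # hello ada
try:
    greet_from_tuple(("ada", 36, "a@x"))  # the tuple grew by one element
except ValueError as exc:
    print("tuple consumer broke:", exc)
```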
- ramraj079 months ago
  Who does this still??
  - directevolve9 months ago
    In bioinformatics, one of our main dataflow platforms, Nextflow, is built with unnamed tuples in mind. Implementing the ability to conveniently pass data with HashMaps instead of unnamed tuples was a huge boost to usability for me.
    - dijksterhuis9 months ago
      i really want to go on a rant about the general state and historical choices regarding data formats and data structures in bioinformatics, plus all the wheel reinvention.
      but i’m also trying to move on and do things differently today.
      let’s just say the situation is displeasing and leave it at that.
secretsatan9 months ago
I largely moved away from dictionaries when switching to Swift. These days we usually only use them when interfacing with legacy code. For the JSON example in the article, Swift has the Codable protocol, which let me clean the old NSJSONSerialization code out of all my client code for our back end.
greatgib9 months ago
I don't agree with the predicate, but I have to admit that the rest of the article is well written to list the different ways to give types to dicts when it is needed.
jimberlage9 months ago
If you want an example of a language where the exact opposite advice is taken at all times (with all the pitfalls described in this blog post), give Clojure a whirl.
thebeardisred9 months ago
FYI, posted in 2020, updated in 2021.
leoh9 months ago
Big structs as params in Rust have similar issues
- saintfire9 months ago
  In what way? They're not opaque or mutable (by default).
  They can be unwieldy but they do define a pretty strongly typed API.
klyrs9 months ago
Lists and sets suffer the same drawbacks. If the advice is to not use any of the batteries included in the language, why are we using Python?
If you want an immutable mapping, why not use an enum?
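A sketch of what "use an enum" looks like in Python, with a hypothetical `HttpStatus` enum: members form a fixed name-to-value mapping, support lookup in both directions, and cannot be reassigned. This suits mappings known when the code is written, not data arriving at runtime.

```python
from enum import Enum


class HttpStatus(Enum):
    OK = 200
    NOT_FOUND = 404


print(HttpStatus["OK"].value)  # 200: look up by name
print(HttpStatus(404).name)    # NOT_FOUND: reverse lookup by value
try:
    HttpStatus.OK = 201        # members cannot be reassigned
except AttributeError as exc:
    print("immutable:", exc)
```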
- o11c9 months ago
  This isn't arguing against them in general, but against the unfortunate JavaScript-esque abandonment of specified semantics.
  In particular, whenever anyone thinks that "deep clone vs shallow clone" is a meaningful distinction, that means their types are utterly void of meaning.
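To make the distinction concrete, a small Python sketch with a hypothetical nested `config` dict: a shallow copy shares inner objects with the original, a deep copy does not. The code only shows what the distinction is; whether it should ever matter is the commenter's point.

```python
import copy

config = {"name": "svc", "ports": [80, 443]}

shallow = copy.copy(config)
deep = copy.deepcopy(config)

config["ports"].append(8080)  # mutate the nested list in the original

print(shallow["ports"])  # [80, 443, 8080]: the shallow copy shares the inner list
print(deep["ports"])     # [80, 443]: the deep copy has its own list
```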
gotoeleven9 months ago
Personally I find it is often helpful to keep Dicts in a BigBag, i.e.:
BigBag<Dict>
- likeclockwork9 months ago
  That's good eating.