This seems REALLY bad for reliability? I guess the idea is that it's better to have things not respond to requests than to lose data, but the outcome described in the article is pretty nasty.
It seems like the solution they arrived at was to "fix" this at the filesystem level by making fsync no longer deliver reliability, which seems like a pretty clumsy solution. I'm surprised they didn't find some way to make etcd more tolerant of slow storage. I'd be wary of turning off filesystem level reliability at the risk of later running postgres or something on the same system and experiencing data loss when what I wanted was just for kubernetes or whatever to stop falling over.
And also in prod, etcd recommends you run with SSDs to minimize variance of fsync/write latencies