I find that my tests become too low-level as I build up component by component. This hinders large-scale refactoring, because part of my planning goes into avoiding the extra effort of rewriting the tests against a new interface.
That refactoring can also make some of the tests unnecessary, so it felt like I was doing extra work for a small benefit which wasn't worthwhile at that stage.
I also found that red/green TDD's focus on "confirm that the tests fail before implementing the code to make them pass" (quoting the link) makes me think less about writing tests which aren't expected to fail - tests which, if they do fail, indicate serious design problems.
As an example, I once evaluated a software package which was fully developed under TDD. It was a web app which, among other things, allowed arbitrary anonymous users to download files with a URL like example.com/download?filename=xyz.txt
There were no tests for arbitrary path traversal, and when I tried it out with something like filename=../../config.ini I got access to the server's config file.
Now, it wasn't quite that bad. They required that the filename end in one of a handful of extensions, like ".pdf". Thing is, the developers didn't check for NUL characters, and their server was written in Java, which passed the string directly to the filesystem API - which, in true C style, expects NUL-terminated strings. My actual filename was more like "../../config.ini\0.pdf", with the NUL appropriately encoded in the URL parameter. The Java code checked that the extension was allowed, then passed the name to the filesystem call, which interpreted it as "../../config.ini". That gave me access to the system configuration - including poorly hashed admin passwords with almost no preimage resistance, which I was able to break after a couple of hours of thinking about the algorithm.
The explicit NUL test is needed as a security test in Java. In Python it's a different class of error as Python's filesystem APIs raise a ValueError if the string contains a NUL.
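To make the failure mode concrete, here's a small Python sketch. The payload string is from my example above; extension_allowed is a hypothetical reconstruction of the vulnerable check pattern, not the package's actual code:

    # Hypothetical reconstruction of the vulnerable check described above.
    payload = "../../config.ini\0.pdf"

    def extension_allowed(filename, allowed=(".pdf",)):
        # Checks the full string, which does end in ".pdf" - so it passes.
        return filename.endswith(allowed)

    assert extension_allowed(payload)                    # the check is fooled
    assert payload.split("\0")[0] == "../../config.ini"  # what C-style code sees

    # Python's filesystem APIs reject embedded NULs with a ValueError
    # before ever touching the filesystem:
    try:
        open(payload)
    except ValueError:
        print("rejected")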
I don't at all mean that good and useful software can't be written with TDD, nor that TDD is useless. Rather, Red/Green TDD as a development practice appears to de-emphasize certain types of essential testing which don't fit the red-green-refactor paradigm, and which instead require a larger development methodology outside of TDD.
As for me personally, I'm strongly influenced by rapid prototyping - "spike and stabilize", I believe it's called - where the code goes through possibly several iterations before the API and implementation stabilize to the point where the benefit of automated tests outweighs the overhead of writing them.
And these tests include tests which should pass, but which check boundary conditions, unexpected input, and the like.
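For instance, a set of should-pass tests for the download example might look like the following. is_safe_download_name is a hypothetical validator I made up for illustration, not the package's actual code:

    import unittest

    def is_safe_download_name(name):
        # Hypothetical validator: reject NULs, path separators, and
        # parent references; require an allowed extension.
        return ("\0" not in name
                and "/" not in name and "\\" not in name
                and ".." not in name
                and name.endswith(".pdf"))

    class TestDownloadName(unittest.TestCase):
        # These tests are expected to pass; a failure here signals a
        # serious design problem rather than an unimplemented feature.
        def test_rejects_traversal(self):
            self.assertFalse(is_safe_download_name("../../config.ini"))

        def test_rejects_nul_smuggling(self):
            self.assertFalse(is_safe_download_name("../../config.ini\0.pdf"))

        def test_accepts_plain_name(self):
            self.assertTrue(is_safe_download_name("report.pdf"))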
To say nothing of choosing the right way to hash passwords, which doesn't easily fit into any test-based framework. :)
As to the specific tests in the linked-to piece, I think the tests are far from adequate. Consider the following test from the ChatGPT solution:
    def test_ignore_in_fenced_code_block(self):
        md = textwrap.dedent("""
            # Real
            ```python
            # Not a header
            ## Also not
            ```
            ## Real too
            """).lstrip("\n")
        self.assertEqual(extract_headers(md),
                         [(1, "Real"), (2, "Real too")])
That's the only test for fenced code blocks. The relevant code is:

    _FENCE_RE = re.compile(r"^[ \t]{0,3}(?P<fence>`{3,}|~{3,})(?P<info>.*)$")
    ...
    m_fence = _FENCE_RE.match(line)
    if m_fence:
        fence = m_fence.group("fence")
        char = fence[0]
        if not in_fence:
            in_fence = True
            fence_char = char
            fence_len = len(fence)
        else:
            if char == fence_char and len(fence) >= fence_len:
                in_fence = False
                fence_char = None
                fence_len = 0
        i += 1
        continue
    if in_fence or is_blockquote(line):
        i += 1
        continue
You can see the code requires a fence to start with at least 3 backtick or ~ characters, and that the closing fence must match the opening one. However, there's no test for mismatched fence characters, nor a test for ~ fences, nor a test for mismatched fence lengths.

Regular expressions are really tricky to test correctly. Each term should be interpreted as a branch, and therefore tested - like a test for "``" as not-a-fence, and tests for leading whitespace, like "\t \t~~~~~~" as a fence.
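To illustrate, here are the branch-level checks I'd expect, using the _FENCE_RE from the quoted code:

    import re

    # The fence regex from the quoted code.
    _FENCE_RE = re.compile(r"^[ \t]{0,3}(?P<fence>`{3,}|~{3,})(?P<info>.*)$")

    assert _FENCE_RE.match("``") is None               # two backticks: not a fence
    assert _FENCE_RE.match("```") is not None          # three backticks: a fence
    assert _FENCE_RE.match("~~~~") is not None         # tilde fences also match
    assert _FENCE_RE.match("    ```") is None          # four spaces of indent: no match
    assert _FENCE_RE.match("\t \t~~~~~~") is not None  # leading tabs are accepted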
For that matter, are leading tabs really allowed? https://github.github.com/gfm/#fenced-code-block says "indented no more than three spaces", "A space is U+0020" and "in contexts where whitespace helps to define block structure, tabs behave as if they were replaced by spaces with a tab stop of 4 characters".
Also, the _FENCE_RE can drop the "(?P<info>.*)$" as "info" isn't used, and the ".*" is guaranteed to match up to the end of the line, making the "$" redundant.
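A quick sketch checking that dropping the group changes neither which lines match nor the captured fence (the second regex is my proposed simplification, not code from the piece):

    import re

    _FENCE_RE = re.compile(r"^[ \t]{0,3}(?P<fence>`{3,}|~{3,})(?P<info>.*)$")
    _SIMPLER = re.compile(r"^[ \t]{0,3}(?P<fence>`{3,}|~{3,})")

    for line in ["```python", "~~~", "  ``` info", "``", "    ```", "\t~~~x"]:
        a = _FENCE_RE.match(line)
        b = _SIMPLER.match(line)
        assert (a is None) == (b is None)       # same lines match
        if a is not None:
            assert a.group("fence") == b.group("fence")  # same fence captured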
In the TDD view, there's no reason to add that group in the first place, so why is it there?
My point isn't that the code is right or wrong (though I do think the support for leading tabs is invalid). Rather, I'm pointing out that the code is incompletely tested - not what I would expect from Red/Green TDD, because there are code paths which no test exercises.