First reason, LLMs are modeled from what humans have been doing, and the have been writing software that way recently so it's easier to mimick that to get straight to results. This reason might fade away in the future.
Second reason, something related to impedance (mis)match, a signal processing notion (when the interface between two media is not well-suited, it is difficult to have a signal pass through).
Going through intermediate levels makes a structured workflow where each steps follows the previous one "cheaply". On the contrary, straight generating something many layers away requires juggling with all the levels at once, hence more costly. So "cheaply" above both means "better use of a LLM context" but also use regular tools where they are good instead of paying the high price (hardware+computation+environment) of doing it via LLM.
Interestingly, AIs are used to generate sample-level audio and some video, which may look like it contradicts the point. Still they are costly (especially video).
Runtime also matters; you can’t run assembly on the web.
Security mechanisms can also preclude assembly.
Etc.
FWIW, your question stopped short before the bottom turtle in the stack. Below assembly is machine code. So your question could rather be, why not emit machine code. Assembly is made for humans because we can understand it, but machine code is not really tractable for humans to engage with in a meaningful way.
Any 'public' (rate limited) web API (using CURL) from current AI inferences services?