So, as happened last week, if I'm interviewing for an Elixir dev, I'm going to be interested in your knowledge of the BEAM and how its features can be used to solve common architectural problems.
Testing a candidate's ability to "steer" agents seems like testing their ability to recall the Java API or recite SOLID by heart.
> 2. How are you currently preventing these "prompt-only" developers from slipping through your own interview loops?
We don't ask LeetCode anymore. We keep the usual systems design interview, in which AI isn't needed (or at least we don't allow it, because in this kind of interview we're more interested in seeing how the candidate thinks and so on).
We have a new stage in our job interview, though: generic Q/A about the fundamentals of software engineering/computer science. Again, we no longer care how candidates produce code. We care about what they know and what they don't know: what's the scope of their knowledge, and when do they need to rely on AI to come up with an answer? Silly (non-real) example: "Can you write a program that detects if another program halts?". The people we want are the ones who would say something about the Halting Problem, but also perhaps be practical and ask more questions about such a program's requirements.
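(As a sketch of the "practical" half of that answer, in the Elixir spirit of the thread: you can't decide halting in general, but you can bound execution with a timeout. Module and function names below are my own illustration, not anything from an actual interview.)

```elixir
defmodule HaltCheck do
  # Runs `fun` in a separate process and reports :halted if it finishes
  # within `timeout_ms`, or :maybe_loops if it doesn't. A heuristic, not
  # a decision procedure -- the Halting Problem rules out the general case.
  # (Crash handling is ignored here for brevity.)
  def probe(fun, timeout_ms \\ 1_000) do
    task = Task.async(fun)

    # Standard idiom from the Task docs: yield until the timeout, then
    # kill the task if it's still running.
    case Task.yield(task, timeout_ms) || Task.shutdown(task, :brutal_kill) do
      {:ok, _result} -> :halted
      _ -> :maybe_loops
    end
  end
end

# HaltCheck.probe(fn -> Enum.sum(1..1_000) end)
#=> :halted
# HaltCheck.probe(fn -> [1] |> Stream.cycle() |> Enum.sum() end)
#=> :maybe_loops
```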
You get the point: we look for people with a good breadth of knowledge, who can communicate well and know their shit. Whether they can use tool x or y (including LLMs) is a given for such people.
I should definitely clarify my use of the word "steering": I completely agree that testing prompt engineering is just the new API memorization, which is useless.
By "steering", I mean putting the candidate in a situation where the AI generates a plausible but architecturally flawed solution, and seeing if they have the fundamental knowledge to spot the BS, understand the scope of the problem, and fix it.
Basically, an automated way to test the exact critical thinking you mentioned.
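To make that concrete, here's the kind of bait I have in mind, in Elixir since that's the role upthread (a hypothetical snippet; the module name and scenario are invented): the AI hands back a GenServer that compiles and works in a demo, but serializes every caller behind one slow process.

```elixir
# Plausible-but-flawed AI output: all requests funnel through a single
# GenServer, and the slow work happens inside handle_call, so every
# caller queues behind a 2-second bottleneck.
defmodule RateFetcher do
  use GenServer

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
  end

  def get_rate(currency), do: GenServer.call(__MODULE__, {:rate, currency})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:rate, currency}, _from, state) do
    # The flaw: a blocking external lookup inside handle_call stalls the
    # whole mailbox. A candidate who knows the BEAM should push this out
    # to per-request processes/Tasks, or cache results in ETS.
    {:reply, slow_external_lookup(currency), state}
  end

  defp slow_external_lookup(_currency) do
    Process.sleep(2_000) # stand-in for an HTTP call to a rate API
    1.08
  end
end
```

A candidate with real BEAM knowledge spots the bottleneck immediately; someone who can only prompt tends to accept it because it "works".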
I love your approach of dropping LeetCode for fundamentals Q/A and systems design. But out of curiosity, how do you scale that at the top of the funnel? Deep, manual 1-on-1 assessments give the best signal by far, but doesn't that burn a massive amount of your senior engineers' time?