HackerRank open sourced its ATS. My resume scored 90/100. Oh wait 74. No – 88 (danunparsed.com)

235 pointsby sambellll5 hours ago18 comments

dvt2 hours ago
An alarming number of people don't understand that LLMs work via purely stochastic processes, so I'm happy to see in-depth pieces like this. I'm looking for a job and maybe this is why it's so hard to get a callback these days: resumes are just dumped in some LLM black hole and no one really knows how it works. The author says:
> temperature 0.1 — low, supposedly nudging the model toward deterministic outputs
This is not correct (and is briefly touched on later in the piece, when he sets temperature to 0). Temperature is not some kind of "deterministic" switch; it reshapes the sampling distribution, which becomes more "spiky" but is still very much a distribution.
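To make the "spikier, but still a distribution" point concrete, here is a minimal sketch of temperature-scaled softmax. The logits are made-up values for illustration and are not taken from the article or from any real model; `softmax_with_temperature` is a hypothetical helper name.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        # The formula divides by T, so T = 0 has to be special-cased elsewhere.
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical logits, for illustration only
for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 5) for p in probs])
```

At temperature 0.1 almost all of the mass sits on the first token, but the other tokens keep a small non-zero probability, so sampling can still pick them.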
- spwa44 minutes ago
  > An alarming number of people don't understand that LLMs work via purely stochastic processes ...
  I've been studying AI for 20 years. What really needs to be added to this statement is:
  "An alarming number of people don't understand that LLMs work via purely stochastic processes - and so does human thinking. People do NOT arrive at the same conclusion if merely the weather's different. Worse: with human thinking not only do most people not think this is real, a subset of people will actively fight the idea. Of course, depending on the weather"
- aesthesiaan hour ago
  A distribution with all probability mass on one outcome is deterministic, so in principle, setting temperature to 0 _should_ result in deterministic outputs. There are a few reasons it might not, but I don't think any of these apply when running a local model like the author did.
  - 31707036 minutes ago
    > so in principle, setting temperature to 0 _should_ result in deterministic outputs
    It is a common misconception, but it is not true even in principle. If I have 2 or more logits which are equal to the maximum of my logits, I will sample uniformly at random from them in the limit, even at temperature zero. Sampling from softmax([1, 0, 1]) is still stochastic at temperature 0, because the limit is to sample uniformly from the first or the last element.
    Anyway: "GPUs don't do deterministic matrix multiplications" is the biggest source of randomness in LLMs. Floating-point addition is not associative, and GPUs accumulate the sums in matrix multiplications in arbitrary order, and this has a huge impact on the logits coming out of the neural network.
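The summation-order effect can be seen without a GPU. The following is a small sketch in plain Python floats (IEEE 754 doubles); it only illustrates why the order of additions matters and does not model any particular GPU kernel.

```python
# Floating-point addition is not associative: regrouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False
print(left, right)

# With mixed magnitudes the difference can be as large as a whole term.
big_first = (1e17 + 1.0) - 1e17   # the 1.0 is lost when added to 1e17
small_last = (1e17 - 1e17) + 1.0  # the 1.0 survives
print(big_first, small_last)      # 0.0 1.0
```

A reduction that adds the same terms in a different order from run to run can therefore produce slightly different logits, and a near-tie between two tokens can flip.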
    EvgeniyZh23 minutes ago
    You don't have to sample uniformly. You could take the lowest index of all maxima. But yeah, the main source of randomness is non-deterministic matmul, and temperature does nothing with it
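The two tie-breaking rules discussed here can be sketched as follows. Both function names are hypothetical, and real samplers may handle ties differently.

```python
import random

def greedy_lowest_index(logits):
    """Deterministic: return the first index that attains the maximum."""
    # max() returns the first maximal element it encounters.
    return max(range(len(logits)), key=lambda i: logits[i])

def greedy_uniform_ties(logits, rng):
    """Stochastic on ties: pick uniformly among all indices at the maximum."""
    top = max(logits)
    candidates = [i for i, x in enumerate(logits) if x == top]
    return rng.choice(candidates)

logits = [1.0, 0.0, 1.0]  # the tied example from the parent comment
rng = random.Random(0)    # seeded only so the sketch is repeatable

print({greedy_lowest_index(logits) for _ in range(100)})       # {0}
print({greedy_uniform_ties(logits, rng) for _ in range(100)})  # subset of {0, 2}
```

Neither rule helps if the logits themselves differ between runs, which is the matmul point above.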
  - easygenesan hour ago
    There are. If the kernels are nondeterministic (e.g. timing issues), there are minor changes between runs on a single system, even with greedy decoding enabled (typically what temperature=0 achieves).
  - valzaman hour ago
    I mean the easiest explanation would be that the model harness doesn't always take the most likely token but does top-k sampling or similar. Higher temperature just means that probabilities get more and more equalized, boosting the chance that an unlikely token gets picked. But even with temp 0 you could have 0.8 T1, 0.19 T2, ... and sometimes sample T2.
    aesthesiaan hour ago
    No, this can't happen at temperature 0. The formula defining temperature-adjusted softmax isn't strictly defined at 0, but taking the limit (in the case where all logits are distinct) results in probability 1 being placed on the largest logit. Samplers will typically special case temperature 0 and pick the most likely token at each step.
    dvt44 minutes ago
    This is a very authoritative answer that should be more nuanced and caveated as implementation-dependent. In some cases, repetition penalties take precedence over sampling; top_k and top_p can also be handled before or after the temperature step. In other cases, `0` is turned into like 1e-10 or some super tiny float value (which can drift if you do any arithmetic with it). Routing, quantization, etc. can also have an effect on sampling. And yes, in some cases, setting temperature to 0 can mean "pure greedy decoding" which makes the decoder about as deterministic as it can get.
  - IshKebab44 minutes ago
    Setting the temperature to 0 should give deterministic results but that's not any better - it's just hiding the huge variance by only taking one sample.
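As a rough illustration of what a single sample hides, the three scores quoted in the post title can be summarized like this. The pass threshold is a hypothetical value chosen for the sketch and does not come from the article.

```python
import statistics

scores = [90, 74, 88]  # the three scores quoted in the post title
PASS_THRESHOLD = 85    # hypothetical cutoff, not from the article

mean = statistics.mean(scores)
spread = max(scores) - min(scores)
stdev = statistics.stdev(scores)  # sample standard deviation
pass_rate = sum(s >= PASS_THRESHOLD for s in scores) / len(scores)

print(f"mean={mean}, spread={spread}, stdev={stdev:.1f}, pass_rate={pass_rate:.2f}")
```

A deterministic run would report just one of these numbers and give no hint of the 16-point spread.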
- bluechair2 hours ago
  Willing to be corrected but I believe this type of automated resume filtering is illegal. Not saying it never happens but my understanding is it is not typical.
  - thayne2 hours ago
    I would expect that to depend on jurisdiction.
    I don't know for sure, but I would be surprised if it was illegal in my particular US state. You might be able to argue the AI has inherent biases that introduce illegal discrimination in the hiring process, but my understanding is that winning a case like that would be very difficult, especially since most employers are very cagey about their hiring process and why they made a decision.
  - small_scombrus2 hours ago
    They don't need to actually filter/blackhole to have virtually the same effect.
    Show someone a list of resumes with an "applicant score*" and they'll naturally ignore the ones with a low ranking
    *scores are generated with AI, mistakes may be made, use only as a guide and verify results
  - ivan_gammelan hour ago
    In situations where you get hundreds of applications for one open position (the real market now), whatever reduces your pool to a size a human can handle works. You can preserve some diversity metrics in the process. This particular filtering is rather primitive, but an LLM as a first filter can definitely do the job. You may spend less on tokens than your HR's hourly rate, and it will be fairer than just dumping 50% of unread CVs in the trash.
    36954868489282610 minutes ago
    Great until someone realises you’ve filtered out minority groups from the application process (most developers are men so maybe the LLM decided they’re the best fit, but you’ll never know exactly why it screwed you over) and you suddenly have an expensive lawsuit
  - dgellow30 minutes ago
    Illegal where?
- make325 minutes ago
  A spikier distribution is exactly what makes it closer to deterministic. That's not the point, though. Even with greedy (deterministic) decoding, it is still a black box that reacts to its inputs in unpredictable ways. Switching one word around might lead to different scores, for example.
Aurornisan hour ago
> The default model is gemma3:4b
That’s a tiny model. No LLM is going to be a perfect and repeatable judge, but a tiny 4B model is like plugging an RNG into this system.
This whole exercise feels like someone vibe coded an ATS and got it to the point where the tests were passing because they decided they should have an open source ATS project.
- danpalmer14 minutes ago
  This sort of model is fine for small problems, when used in the right way. I think there's probably a version of Resume analysis that would work well with this model, but "hey clanker, what projects has this person done" is not the way. You need extraction, cleanup, probably OCR to compare and further clean up, multiple analysis passes per signal with LLMs, judges, etc. None of that needs to be large models, you'll get marginally better performance, but there's very little context, these models will perform well when used correctly.
ryukoposting2 hours ago
At this point we might as well adopt that joke where you blindly throw away half the resumes because you don't want to hire unlucky people.
gs17an hour ago
I'm a little confused, is this an ATS system that anyone actually uses? If not, I'm not sure how it's better than just asking ChatGPT to score your resume out of 100. Why would you want to optimize your resume for a system no one is using to score it?
- 40four10 minutes ago
  “I'm a little confused, is this an ATS system that anyone actually uses?”
  You read my mind. If the answer is “no”, then we can ignore this.
- petesergeant16 minutes ago
  (Almost) everyone’s using some kind of ATS, every ATS is adding AI auto-ranking (and has been trying to for 15 years), and almost all HR people feel like they have too many obviously bad CVs to read. Whether or not someone is using this ATS specifically, if you submit several CVs to several places, your CV is going into at least one magical 8-ball.
jerrythegerbil2 hours ago
> I fail 65% of the time. Same exact resume, different luck.
As someone who’s run hiring pipelines for technical roles in the past few years, that’s actually a fantastic number. I objectively hate saying that, but it’s true.
35% chance of elevating a technical individual to the next stage with no effort? I’ve seen as many as 100+ applicants an hour even when including a domain specific screener question. That’s 35 “screened” applicants in an hour. Were valid candidates screened out? Yes. Do you still have a candidate pool 35x larger than you need? Unfortunately, also yes.
The volume of applicants is SO HIGH that your chances of getting moved to the next stage are actually markedly worse if AI isn’t involved. If you didn’t apply immediately (using an AI bot) there are 50+ people ahead of you, and an exhausted technical leader if they ever make it to your resume.
Referral bonuses exist for a reason.
- ludicrousdisplaa minute ago
  So the logical solution is for candidates to submit multiple applications with slight variations to their contact info, "John Schmidt", "John J. Schmidt", "John J. J. Schmidt", "John Jacob J. Schmidt", "J. J. Jingleheimer Schmidt", etc.
- kyralis2 hours ago
  Is it? Or is it a 65% chance of a resume getting ignored before a single human sees it, reducing your pipeline's likelihood of catching qualified candidates by the same?
  Gates that reduce resume flow-through are only useful if their reduction is correlated with quality. Otherwise they're just dragging out your hiring process or unnecessarily causing you to ultimately lower your hiring bars.
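A small simulation can make the "only useful if correlated with quality" argument concrete. Everything except the 35% pass rate is an assumption made for the sketch: the 10% share of qualified applicants, the pool size, and the pass probabilities of the hypothetical correlated gate.

```python
import random

def screen(candidates, pass_prob_good, pass_prob_bad, rng):
    """Keep each candidate with a probability that depends on their quality."""
    kept = []
    for is_good in candidates:
        p = pass_prob_good if is_good else pass_prob_bad
        if rng.random() < p:
            kept.append(is_good)
    return kept

def good_fraction(candidates):
    """Share of qualified candidates in a list of booleans."""
    return sum(candidates) / len(candidates)

rng = random.Random(42)  # seeded only so the sketch is repeatable
pool = [rng.random() < 0.10 for _ in range(100_000)]  # assumed 10% qualified

# Gate uncorrelated with quality: everyone passes with probability 0.35.
random_gate = screen(pool, 0.35, 0.35, rng)
# Hypothetical correlated gate with roughly the same overall pass rate.
informative_gate = screen(pool, 0.70, 0.31, rng)

print("pool:            ", round(good_fraction(pool), 3))
print("random gate:     ", round(good_fraction(random_gate), 3))
print("informative gate:", round(good_fraction(informative_gate), 3))
```

Under these assumptions the random gate shrinks the pile but leaves the share of qualified candidates essentially unchanged, while the correlated gate roughly doubles it at the same pass rate.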
  - jerrythegerbil2 hours ago
    > Gates that reduce resume flow-through are only useful if their reduction is correlated with quality.
    The volume makes it infeasible to review everyone for quality, even at an hour scale. The conclusion and solution is inevitable, though I wish it were different. 35% is actually really good if you’re not coming in through a referral.
    The current reality is <1% and the person reviewing you is exhausted.
    sevenzeroan hour ago
    What an inhumane way of looking at this. Hiring is deeply flawed, you know it, and yet you keep job postings open for weeks/months in case "the one" magically appears on your doorstep instead of just interviewing 10-20 people and picking one...
    Corpo bullshittery at its finest.
    Brian_K_Whitean hour ago
    This reasoning isn't.
  - aesthesiaan hour ago
    So the question is: is the score given by this system correlated with candidate quality? I don't think this post gives enough data to know.
  - bagels2 hours ago
    The goal for the interviewer is to have a much higher ratio of good/bad candidates after the first screening. This means the more costly time you spend on the second step has a better return.
- PufPufPuf33 minutes ago
  In that case, I have a pre-screening system to sell you. Through state of the art technology, it only lets through the best* 1% of applications.
  *According to our proprietary, undisclosed, non-deterministic metric, which may or may not be Math.random
- spike02130 minutes ago
  there have got to be better ways to optimize pipelines. maybe set a limit on the number of applications for a role based on how many you/your team can reliably go through. if more are needed then open the role for another wave of applications.
- lowbloodsugaran hour ago
  Except the bit about ranking a decades-long S3 engineer lower than an intern with a GitHub repo.
- dvt2 hours ago
  [dead]
makeavishan hour ago
Hiring and job search has been so hard and AI has amplified the existing problems instead of solving any.
- sevenzeroan hour ago
  Wdym, can't you just litter your applications with buzzwords and other bs to automatically get a high score in these systems?
brikym24 minutes ago
So that's where the Windows XP file copy dialog author now works.
rkuska2 hours ago
This reminds me of my former CTO. He would take a bunch of CVs and randomly throw some of them in a bin. He didn’t want to work with “unlucky” people.
- hahahaa2 hours ago
  The problem is with this system he only worked with unlucky people.
- psalaun2 hours ago
  I thought this was only an old urban legend; some people actually use this technique? Especially in a trade supposed to be led by people trained in sciences?
steve_j_choi2 hours ago
This could be a good way to self-evaluate one's current position from the company's point of view. You could tweak the prompts and guidelines to match what you expect the company to use and see how you score.
- hahahaa2 hours ago
  I sort of hope we land on 2 agents, one working for the candidate and one for the employer, doing a screening round. Salary compatibility could be negotiated by a 3rd-party bot that knows both parties' ranges and what would be needed at each end of the range, and figures out yes/no on whether it's worth going ahead. Such a time saver.
neyaan hour ago
I wonder how is this even legal? The only useful job the HR departments are ever required to do - they decide to automate it? Aside from being a daycare for adults, what exactly does HR accomplish? It's clearly NOT on the side of employees, but this seems like they're clearly NOT on the side of employers, either.
While resumes are being filtered left and right, they just make TikToks on the company's dime [1]. What a sad state of affairs.
[1] https://www.youtube.com/shorts/wSug80Vg5JU
dc3k2 hours ago
Disregarding the fact that this thing is completely broken, its grading rubric is ridiculous to begin with (as was mentioned in the article itself, but I must reiterate how completely stupid this is):
> 35 points for open source contributions
> 30 for personal projects
I don't contribute to open source or have personal projects because I don't spend my free time doing what I do 40 hours a week to make a living. My 15 years of work experience is worth a maximum of 25%, so any company using this idiotic system would pass on me immediately. Open source and personal projects are fine, but in no sane world are they worth 65% of a resume's score.
- adrianN2 hours ago
  They are selecting for people who are fine working in their free time. If you contribute to open source you are more likely to contribute to the company on weekends. If instead you have other hobbies or a family that takes up non-work hours you are more likely to drop your pen after forty hours.
  - matheusmoreiraan hour ago
    Maybe they're selecting for intrinsic motivation. People who enjoy programming to the point they do it for fun, not just because it pays.
    Free software work doesn't imply we work for free. We work on our projects, the stuff that we actually enjoy working on. Nobody is going to work on corporate products without adequate compensation.
    lukan41 minutes ago
    "Nobody is going to work on corporate products without adequate compensation."
    I guess there sadly are many nobodies who do this to hope to become somebody.
    matheusmoreira39 minutes ago
    If the open source work is part of a hiring pipeline, sure. Contributing to some repository and having it serve as a resume that gets you hired is also a form of compensation. If the work is also enjoyable, then it's a win either way.
  - emj2 hours ago
    You might have numbers on that but after working in a place with a strict no more than 40 hour policy my view is that people overwork for many reasons. Being an open source enthusiast is not one of them.
  - stevesimmonsan hour ago
    I'm not sure that follows. I stopped making open source contributions when I switched from mature companies to startups.
    Now all my "non-work" time is spent on startup work. And none of that is visible via GitHub.
cyberax2 hours ago
Ah... The AI learned the old HR trick: take 50% of resumes and throw them out without looking. Rationale: "we don't need unlucky losers".
quink2 hours ago
"A computer can never be held accountable, therefore a computer must never make a management decision."
yieldcrvan hour ago
this will get patched, as in I'll optimize my resume for this, and so will so many other people that any edge disintegrates
glouwbug2 hours ago
I guess at least HR doesn’t have to read 1,000 resumes. Heck, to be frank, could they make sense of the first 10 resumes?