I've seen other systems like this calibrate far more quickly by assigning a sort of score and confidence behind the scenes. Confidence starts out low and increases over time - correct/incorrect answers rapidly adjust score at the beginning, then things settle down.
In practice this means you get a sequence of increasingly uncommon words initially, until you get one wrong, then you drop back to something easier until you start getting things right again, and eventually circle around words at your level.
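Here is a minimal JavaScript sketch of the score-and-confidence idea. It assumes each word has a hypothetical numeric difficulty on the same scale as the score. It uses an Elo/Rasch-style logistic update whose step size shrinks as answers accumulate, so early answers move the score a lot and later ones only nudge it. This is one possible scheme, not what any particular site does.

```javascript
// Score-plus-confidence sketch: an Elo/Rasch-style logistic update whose step
// size shrinks as more answers come in. Word difficulties and all constants
// are hypothetical; difficulty and score live on the same arbitrary scale.
function createEstimator({ initialStep = 1.0, minStep = 0.1 } = {}) {
  let score = 0;    // current ability estimate
  let answered = 0; // answers seen so far; more answers = more confidence

  // Large steps while confidence is low, settling down towards minStep.
  const step = () => Math.max(minStep, initialStep / Math.sqrt(answered + 1));

  // Probability of a correct answer given the current score.
  const pCorrect = (difficulty) => 1 / (1 + Math.exp(-(score - difficulty)));

  return {
    score: () => score,
    step,
    // Surprising outcomes (a wrong answer on an "easy" word) move the score more.
    update(difficulty, correct) {
      score += step() * ((correct ? 1 : 0) - pCorrect(difficulty));
      answered += 1;
      return score;
    },
  };
}

// Simulated user who knows every word easier than 2.0 and nothing harder.
// Each new word is pitched at the current estimate, so the score climbs,
// overshoots, drops back and then circles around 2.0.
const est = createEstimator();
for (let i = 0; i < 200; i++) {
  const difficulty = est.score();
  est.update(difficulty, difficulty < 2.0);
}
console.log(est.score().toFixed(2));
```

In the simulation the score runs up, overshoots after the first miss, drops back, and ends up oscillating within a few hundredths of the simulated user's level. A real test would need difficulties derived from word frequency or past responses.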
Also - too many clicks per word. It's low stakes, just let me click the definition once and I'll live if I misclick (or add an undo button).
This, and accept that people will have incorrect input and build it into the confidence. Even the smartest person in the world sometimes makes clerical errors, or has the wrong neuron fire at the wrong moment.
Zenzizenzizenzic for example.
Oh come on! Like you really knew what "Hippopotomonstrosesquippedaliophobia" is?
They’re also too far away. I’m on a laptop and I have to keep moving the cursor up and down just to confirm. Give each option a letter or number and let me press it to choose the answer¹.
¹ There is (was?) some service for forms which does that and it works quite well. I think it was Typeform, but I just opened the website to check and—of course—it’s now just plastered with mentions of AI so I lost interest in verifying.
I'm guessing it's testing our susceptibility to machine-generated compliments
What is?
> I'm guessing it's testing our susceptibility to machine-generated compliments
I fail to see the point. For one, the compliments aren’t particularly good or interesting; for another, I didn’t even read them (I just went back to check after your comment), I simply clicked when seeing green.
well the point would be to see how susceptible you are to that. They're figuring out where your cost vs reward tipping point is.
Anyway, if they were running metrics on that they just became useless because I automated responding to it a bunch of times.
I got tired after 8 words, looked at how many I'm supposed to know and gave up.
It'd be improved with statistical analysis; just progressively get harder and try to guess. If you wanted to gamify it, you could update the stats after each answer.
I would suggest there is a bias in this test towards reading. More than a couple are words I know but rarely see in print. But maybe I'm too much of a fan of British TV, so I hear many of their words without seeing them written down.
I suggest skipping the submit button and just showing it's correct when pressing and moving on after a sec or so. Having to click on submit twice really breaks the flow.
Also in all the words I tried I noticed out of the 4 options one is the correct one, another is the opposite of the correct one, and the other 2 are random stuff. You can basically skip any option whose antonym isn't present as well.
A tangent: writing distractors for multiple-choice questions is hard. Of the exams I know (excluding those whose nature precludes it, such as ones based on calculation or rote memorization), the only one that does this brutally well is LEK (the Polish medical graduate exam). It's nigh impossible for someone outside the field to vibe-guess it at better than random chance.
I don't understand how they rank words though, some extremely common words like xenophobia were ranked as high as much more obscure ones.
xylo- = wood; -logy = study
Indeed from M-W: "a branch of dendrology dealing with the gross and the minute structure of wood"
In case of online quiz you can have a "competition" between distractors:
1. start by having much more distractors than needed and pick randomly
2. for each measure the probability of it getting clicked (clicks/times it's shown)
3. show the most frequently clicked distractors more often
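The three steps above can be sketched as a small class. The candidate definitions (for "Lethargic"), the starting prior of one click in two showings, and the simulated user are all hypothetical. Weighting by observed click rate is a simple heuristic, closer to a bandit than to any formal item-analysis method.

```javascript
// Distractor "competition": every candidate wrong answer keeps a click count and
// a shown count, and the ones that fool people get shown more often.
// The candidate definitions and the 1-in-2 starting prior are hypothetical.
class DistractorPool {
  constructor(candidates) {
    // Step 1: many candidates, equal starting weights, so early picks are random.
    this.stats = new Map(candidates.map((text) => [text, { shown: 2, clicks: 1 }]));
  }

  // Step 2: estimated probability that this distractor is clicked when shown.
  clickRate(text) {
    const { shown, clicks } = this.stats.get(text);
    return clicks / shown;
  }

  // Step 3: draw `count` distinct distractors, weighted by click rate.
  pick(count, rand = Math.random) {
    const remaining = [...this.stats.keys()];
    const chosen = [];
    while (chosen.length < count && remaining.length > 0) {
      const weights = remaining.map((text) => this.clickRate(text));
      let r = rand() * weights.reduce((a, b) => a + b, 0);
      let i = 0;
      while (i < remaining.length - 1 && r >= weights[i]) r -= weights[i++];
      chosen.push(remaining.splice(i, 1)[0]);
    }
    return chosen;
  }

  // Record one showing: which distractors were on screen and which was clicked
  // (null if the user picked the correct answer).
  record(shownDistractors, clicked) {
    for (const text of shownDistractors) {
      const s = this.stats.get(text);
      s.shown += 1;
      if (text === clicked) s.clicks += 1;
    }
  }
}

// Hypothetical distractors for "Lethargic". The simulated user always falls for
// "Full of energy" when it is shown and otherwise answers correctly.
const pool = new DistractorPool([
  "Full of energy", "Fond of music", "Easily angered", "Made of wood", "Quick to forgive",
]);
for (let round = 0; round < 500; round++) {
  const shown = pool.pick(3);
  pool.record(shown, shown.includes("Full of energy") ? "Full of energy" : null);
}
console.log(pool.clickRate("Full of energy").toFixed(2), pool.clickRate("Made of wood").toFixed(2));
```

Every distractor starts from the same prior and never reaches a click rate of exactly zero, so weak candidates are demoted but still shown occasionally. A distractor that was unlucky early on can therefore recover.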
Having an answer counted as incorrect, just because I've accidentally touched the screen of the phone? I would absolutely hate that.
You are correct. I tested that hypothesis about a dozen times and it seems that if you always pick the longest you’ll get it right somewhere in the high 70s to mid 80s. For anyone interested in testing for themselves, open the website to the first question then run this in the console (not going to spend time optimising it, it works well enough for the purpose):
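```javascript
// Always pick the longest of the first four buttons (assumed to be the answer options),
// then click the last button on the page twice: once for "check", once for "continue".
let loopCount = 0
const loop = setInterval(() => {
  const options = Array.from(document.querySelectorAll("button")).slice(0, 4)
  options.reduce((long, curr) => curr.textContent.length > long.textContent.length ? curr : long).click()
  setTimeout(() => Array.from(document.querySelectorAll("button")).at(-1).click(), 100) // check
  setTimeout(() => Array.from(document.querySelectorAll("button")).at(-1).click(), 200) // continue
  loopCount++
  if (loopCount === 100) clearInterval(loop) // stop after 100 questions
}, 500)
```
Core Basics 19/20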
Intermediate 17/20
Advanced 19/20
Expert 14/20
Grandmaster 12/20
I guess it's not too bad for a non-native speaker.
Minor feedback:
1. The correct answer for "Lethargic" is "Affected by lethargy". I think definitions should not use words that share a common root with the defined word, because:
a. it makes guessing too easy
b. it basically becomes a circular definition which is meaningless
2. Options almost always include 1 correct answer, 1 direct opposite and 2 completely random. Once you learn to recognise it, you can easily rule out 2 random options and have a 50/50 guess.
My shorter OED contains 163,000 words (compared to the 600,000 words of the longer).
According to this site I know 71,000 words... Let's test that against the OED. I should have about a 43% chance of knowing a word picked at random.
In my totally scientific test (ha) I chose 50 words at random from the OED and discovered I knew 29 of them, for a score of 58%, which is more than two sigma from 43%, thus disproving the hypothesis.
I forgot what that was now, but it was a fun experiment.
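The two-sigma claim is easy to check with the normal approximation to the binomial. This assumes the 50 words were drawn independently and uniformly from the dictionary. The inputs are the commenter's own numbers; note that 71,000 / 163,000 is about 43.6%.

```javascript
// Normal approximation to the binomial, using the commenter's numbers:
// 71,000 words claimed by the site out of 163,000 in the shorter OED,
// and 29 known words in a random sample of 50.
function binomialZ(successes, trials, p) {
  const mean = trials * p;                    // expected number known
  const sd = Math.sqrt(trials * p * (1 - p)); // binomial standard deviation
  return (successes - mean) / sd;
}

const p = 71000 / 163000;
const z = binomialZ(29, 50, p);
console.log(`p = ${p.toFixed(3)}, expected ${(50 * p).toFixed(1)}/50, z = ${z.toFixed(2)}`);
```

z comes out just above 2, so the claim holds, though only barely for a 50-word sample.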
Your method of sampling could be improved further, unfortunately at the expense of ease of use. If the dictionary were sorted according to difficulty, then you could use stratified sampling.
I comment on the related aspects here.
If you force me to guess, then I'm going to guess. Not only does that give me a 25% chance of getting it right at random, but as others have pointed out, it is very hard to make a multiple choice question that isn't guessable by an astute enough test taker. I think I knew 80 - 85 of those words, but I scored 97, because those questions were very guessable.
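The point about guessing can be quantified with the classic correction for guessing. It assumes that on every word you don't know, you guess with a fixed success rate `chance`, independently of anything else. That rate is 0.25 for a blind pick among four options and 0.5 once the two unrelated options are ruled out. The 97/100 score and the 80-85 known words come from the comment above; 82.5 is just the midpoint.

```javascript
// Correction for guessing: if you genuinely know a fraction `known` of the words
// and guess on the rest with success rate `chance`, you expect to score
//   score = known + (1 - known) * chance
// Both functions just rearrange that one equation.
function knownFromScore(score, chance) {
  return (score - chance) / (1 - chance);
}

function impliedGuessRate(score, known) {
  return (score - known) / (1 - known);
}

// Blind guessing among 4 options, and 50/50 after ruling out the 2 unrelated ones.
console.log(knownFromScore(0.97, 0.25).toFixed(2)); // 0.96
console.log(knownFromScore(0.97, 0.5).toFixed(2));  // 0.94
// Guess success rate needed to turn ~82.5% known into a 97% score.
console.log(impliedGuessRate(0.97, 0.825).toFixed(2)); // 0.83
```

Even treating every unknown word as a clean 50/50, a 97 implies about 94 words known, well above 80-85. Getting from roughly 82.5 known words to 97 correct requires guesses that succeed about 83% of the time. That is what "very guessable" looks like in numbers.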
Also, reiterating everyone else's comments with respect to the UX needing fewer clicks, and also the definitions not being exact or precise in many cases.
But then below it said "you are a man of few words".
I take it the latter is just because I've only done the test once? But it's mixed messaging on first attempt I think.
However, most native speakers have an active vocabulary between 15,000 and 35,000 words.
We must be geniuses, lol.
That's always going to be smaller than the set of words for which a person can choose the correct definition out of four options.
Fun fact: according to a quick count by AI using web search, the previous sentence contains 21 words of Germanic origin, 2 of Latin origin, 2 of Greek origin and 1 of French origin. Also, the etymology of the word Germanic is Latin, while that of the word French is Germanic.
A lot of the more common and simpler words are Germanic, as is the grammar (e.g. compound words like cupboard).
At some point the word becomes both. Sourced from its mother language and maybe even still meaning the same thing in both, but no less an English word than any other at this point.
Latin isn't really any sort of parent to Old English afaik, even though the Romans ran Britain for a while.
I think bang-interro just didn't sound as nice and that's probably why it is called an interrobang.
Also add a keyboard focus state on the continue button.
I'm curious how the difficulty is chosen, because "obfuscate" was included in the hardest difficulty but I would not consider that to be a difficult word.
Also I found that some of the definitions were not completely correct.
Same strategies apply for guessing the unknown, especially with a modicum (it was on the test!) of Latin knowledge.
Strange that pretty much everyone here is getting 70k estimates (93/100 for me).
Feels a bit high at least for me as a non-native speaker.
I got 2 words I knew wrong, and guessed about 5 unknown words correctly. Those were bizarre repetitive words I've never seen before.
I remember doing a similar test from a reputable university about 10-15 years ago, also in an app format, and only got an estimate of about 30k.
I'm not sure exactly how you did this, but I think you asked an LLM to come up with the wrong options. Two things to consider:
1. While the LLM can generate good options, they won't always be hard to guess. I wonder if instead you could have the LLM generate very close words (or skip using an LLM entirely) and use those as the options.
2. If you do generate options with an LLM, be mindful of its inability to shuffle things around: the correct answer was overwhelmingly the first or second option in the list. You should ask the model to give the options in a fixed order (say, the true meaning first, then in decreasing order of plausibility), then shuffle them yourself so that the correct answer has a 25% chance of landing in each position (A, B, C or D).
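Point 2 amounts to letting the model produce the options in a fixed order and then shuffling in code. A Fisher-Yates shuffle is the standard unbiased way to do that. The option texts below are hypothetical, and `Math.random` is fine for a quiz but not for anything security-sensitive.

```javascript
// Fisher-Yates shuffle: an unbiased reordering done in code, so the correct
// answer's position no longer depends on the order the LLM emitted the options in.
function shuffle(items, rand = Math.random) {
  const a = items.slice(); // don't mutate the caller's array
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1)); // 0 <= j <= i
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Hypothetical options in the fixed order requested from the model:
// true meaning first, then decreasing plausibility.
const options = ["Sluggish and drowsy", "Restless and irritable", "Easily angered", "Full of energy"];
const shown = shuffle(options);
const answerLetter = "ABCD"[shown.indexOf(options[0])];
console.log(shown, "correct:", answerLetter);
```

Tracking the correct answer by value (`options[0]`) rather than by index means the shuffle can't lose it, whatever order comes back.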
1. Frame each option with one key (1,2,3,4). User press 2, select the second option
2. Let the user change options if they want until they press Enter. Enter submits the answer.
3. Once submitted, another Enter brings the next one
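The three-step flow above fits in a tiny state machine. This sketch keeps it as a pure function of (state, key) so it can be tested outside a browser. The state shape, the phase names and the `render` function in the wiring comment are hypothetical. `KeyboardEvent.key` reports the digit keys as "1" to "4" and the return key as "Enter".

```javascript
// Keyboard flow as a pure state reducer; state shape and phase names are hypothetical.
function reduceKey(state, key) {
  if (state.phase === "answering") {
    // 1. Keys 1-4 select an option; pressing another number changes the choice.
    if (/^[1-4]$/.test(key)) return { ...state, selected: Number(key) - 1 };
    // 2. Enter submits, but only once something is selected.
    if (key === "Enter" && state.selected !== null) return { ...state, phase: "submitted" };
  } else if (state.phase === "submitted" && key === "Enter") {
    // 3. A second Enter brings up the next question.
    return { phase: "answering", question: state.question + 1, selected: null };
  }
  return state; // every other key is ignored
}

// In a page this might be wired up roughly like so (render() is hypothetical):
//   let state = { phase: "answering", question: 0, selected: null };
//   document.addEventListener("keydown", (e) => { state = reduceKey(state, e.key); render(state); });

let s = { phase: "answering", question: 0, selected: null };
for (const key of ["2", "4", "Enter", "Enter"]) s = reduceKey(s, key);
console.log(s); // { phase: 'answering', question: 1, selected: null }
```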
I do concur that a refined collection of incorrect proposed responses which includes selections among terms with semantic proximity, conflated synonyms and plausible morphology could refine the accuracy of evaluations; and if the test was intended to bestow authentic assessments of lexicographical capability this would in all probability become an efficacious approach, but as a simply presentable quiz for folks with sesquipedalian proclivities I was not unduly discomfited by anything moreso than the extraneous clicks leading to and following the display of dichotomous determinations.
I'd say I know 10 000 words tops.
I wonder if the test is calibrated to the fact that some answers are just well guessed? I am not a native English speaker, but I speak 3 languages overall and have basic notions in Latin, and I have to admit it helped a lot in "deciphering" a few words that I didn't know at all. And in at least 2 cases I just guessed correctly.
It would have paired well with an exposition of vanilla Monte Carlo and the benefits of stratified sampling.
Although stratified sampling is good, one can do better in this case by using adaptive sampling, where one uses a runtime (Bayesian) estimate of vocabulary to maximize information gain per question: preferentially sample from those strata where the current stratum-specific estimate has higher variance.
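Here is a rough sketch of that adaptive scheme, using the five bands and sizes the site lists. It keeps a Beta posterior for the fraction of each band you know. The next question goes to whichever band has the largest size² × posterior variance, which is that band's share of the total estimate's variance. The Beta(1, 1) prior, the greedy rule and the simulated user are assumptions. The rule is a simple variance-based stand-in for a full information-gain criterion.

```javascript
// Adaptive allocation across difficulty bands with a Beta(1, 1) prior per band.
function makeBands() {
  return [
    { name: "Core Basics", size: 3000 },
    { name: "Intermediate", size: 7000 },
    { name: "Advanced", size: 10000 },
    { name: "Expert", size: 25000 },
    { name: "The Obscure", size: 40000 },
  ].map((b) => ({ ...b, asked: 0, correct: 0 }));
}

// Mean and variance of the Beta(correct + 1, wrong + 1) posterior.
function posterior(band) {
  const a = band.correct + 1;
  const b = band.asked - band.correct + 1;
  const n = a + b;
  return { mean: a / n, variance: (a * b) / (n * n * (n + 1)) };
}

// Greedy rule: ask where size^2 x posterior variance is largest.
function nextBand(bands) {
  const weight = (band) => band.size ** 2 * posterior(band).variance;
  return bands.reduce((best, band) => (weight(band) > weight(best) ? band : best));
}

// Point estimate of the total: sum of band size x posterior mean.
function estimateTotal(bands) {
  return bands.reduce((sum, band) => sum + band.size * posterior(band).mean, 0);
}

// Deterministic simulated user who knows 20/19/17/18/12 of every 20 words per band.
const knownPer20 = [20, 19, 17, 18, 12];
const bands = makeBands();
for (let q = 0; q < 100; q++) {
  const band = nextBand(bands);
  const correct = band.asked % 20 < knownPer20[bands.indexOf(band)];
  band.asked += 1;
  if (correct) band.correct += 1;
}
console.log(bands.map((b) => `${b.name}: ${b.correct}/${b.asked}`).join(", "));
console.log(Math.round(estimateTotal(bands)));
```

With these sizes the small Core Basics band gets few or no questions. Even total uncertainty about 3,000 words matters less to the total than a little uncertainty about 40,000.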
But to be honest many that might catch out a native speaker are just the Spanish/French/Latin word, so it was too easy in a way.
It's annoying that you need to click 3 times per question, and the buttons are in 2 different places.
Maybe it would be better to just let me click the answer I want and then instantly show me the next question?
Also who is Sandi?
No offence meant to anyone, but the whole exercise feels very QI: superficial 'understanding' of a large range of things (for example words) without much of a connection between them.
“You are a person of few words, or perhaps just a mysterious one. Quite intriguing.”
This sounds more like a cute assessment of only getting two words right. And what do you mean “new words”? It wasn’t until eighty-odd words in that I actually got a word I didn’t know and had to guess by ruling out multiple-choice options.
Got 64,650: 20/19/17/18/12 (the intermediate one was a dumb mistake)
Some definitions were not great and alternatives a little silly at times but on the whole seemed pretty accurate.
Also, it probably needs to be calibrated, as 96/100 was projected to 77k words; what would the estimate be for 100/100?
Might I suggest adaptive difficulty? After getting 10, 15, 20 correct in a row it should scale up the difficulty immediately, rather than waiting for 100 in the basic level 1...
I’m not sure how you’d gauge what knowing each word would indicate.
Also adequate options that sound plausible.
I suppose the words must be weighted, because other people in the thread with more correct words got a not much higher estimate.
From the website with just one more click - like one more wafer thin mint.
<snip> According to the Oxford English Dictionary (Second Edition), there are approximately 171,476 words in current use.
However, most native speakers have an active vocabulary between 15,000 and 35,000 words.
The Algorithm
We use Stratified Sampling. Instead of testing random words, we divide the language into 5 distinct difficulty bands based on frequency of use:
1. Core Basics: ~3,000 words
2. Intermediate: ~7,000 words
3. Advanced: ~10,000 words
4. Expert: ~25,000 words
5. The Obscure: ~40,000+ words
Calculation
"If you answer 2 out of 3 'Intermediate' questions correctly, we estimate you know roughly 66% of the 7,000 words in that band."
Total Score = Σ (Accuracy in Band × Band Size) </snip>
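The quoted formula is easy to reproduce. This sketch assumes 20 questions per band, since the per-band scores quoted in this thread are out of 20, and treats the "~40,000+" band as exactly 40,000 words. Under those assumptions it matches totals reported by commenters: 20/19/17/18/12 gives 64,650 and 20/20/19/18/18 gives 78,000. It also caps a perfect 100/100 at 85,000, the sum of the band sizes.

```javascript
// The site's stated formula: Total Score = Σ (Accuracy in Band × Band Size).
// Assumes 20 questions per band and treats "~40,000+" as exactly 40,000.
const BAND_SIZES = {
  "Core Basics": 3000,
  "Intermediate": 7000,
  "Advanced": 10000,
  "Expert": 25000,
  "The Obscure": 40000,
};

function estimateVocabulary(correctPerBand, questionsPerBand = 20) {
  return Object.values(BAND_SIZES).reduce(
    (total, size, i) => total + (correctPerBand[i] * size) / questionsPerBand,
    0,
  );
}

console.log(estimateVocabulary([20, 19, 17, 18, 12])); // 64650
console.log(estimateVocabulary([20, 20, 19, 18, 18])); // 78000
console.log(estimateVocabulary([20, 20, 20, 20, 20])); // 85000, the maximum
```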
Admittedly I had to guess several. It’s kind of an etymological deduction and estimation game at times.
From what I can tell they actually have a bit more robust science behind their algorithm (and far fewer questions to answer).
The two tests give me widely different results, probably because the sampled words aren't perfectly representative and so the results should have huge error bars to account for this sampling error.
One suggestion would be more convincing decoy choices, some were pretty silly. But I have no idea how they come up with them.
Are accoutrement and ziggurat really English words? Accoutrement is even pronounced as French!
As you say, the line is very very blurry.
Japanese loanwords really tickle my humour; バイト "Baito" : a casual, part-time, non-serious job. From the German "Arbeit" which is serious, macro-level employment or exertion.
Anything up to expert was obvious
quixotic, scrooge, shangri-la, Uncle Tom, gargantuan, kafkaesque, blurb, milquetoast
and words like cyberspace were first used in fiction
once real people use them, they stop being fictional words
And it didn't even tell me at the end how many words I know!
There is a similar variant of such a test where you just go down a list of words of increasing obscurity, ticking the ones you are familiar with. If you do this once or twice, you can get a fairly good estimate of the actual number of words you know.
Probably not too bad for a person whose native language is not English.
Fun!
My score: 78,000 words, 20/20/19/18/18.
"Verbose," for instance, is defined as "Using more words than are needed."
That's not exactly wrong, but it's kind of misleading. "Verbose" explicitly means using a large pile of words, drowning the reader in far more words than are strictly necessary.
"More words than are needed" could be as limited as "used a three-word construction in a sentence where it could have been one."
There are many more like this.
Please, I beg all of you - don't use LLMs to generate linguistic slop that claims to be linguistic education.
I weep for the world that is to come.
Level 0: Core Basics Abundant, Baffle, Candid, Dwell, Emerge, Frugal, Generic, Hinder, Impartial, Jovial, Knack, Lucid, Meager, Naive, Obsolete, Peculiar, Quench, Refute, Seldom, Tedious, Unique, Valid, Wary, Yearn, Zeal, Adequate, Barren, Coarse, Diligent, Esteem, Fickle, Gloom, Hoax, Ignite, Jolt, Keen, Linger, Mend, Numb, Omit, Pledge, Quota, Rural, Soothe, Toxic, Urge, Vow, Witty, Yield.
Level 1: Intermediate Acumen, Benevolent, Complacent, Dilapidated, Eloquent, Fabricate, Gregarious, Hypothetical, Imminent, Juxtapose, Lethargic, Meticulous, Nostalgia, Oblivious, Pragmatic, Reiterate, Scrutinize, Tentative, Ubiquitous, Verbose, Wane, Aesthetic, Bolster, Candor, Defer, Elicit, Furtive, Glut, Heed, Impeccable, Lament, Modicum, Notorious, Opulent, Plausible, Resilient, Stagnant, Trivial, Viable, Zenith.
Level 2: Advanced Alleviate, Breviary, Cacophony, Deferential, Ephemeral, Fastidious, Garrulous, Harangue, Iconoclast, Juggernaut, Laconic, Magnanimous, Nefarious, Obsequious, Paradigm, Recalcitrant, Sanguine, Taciturn, Ubiquity, Vacillate, Winsome, Zephyr, Abase, Banal, Capricious, Debilitate, Ebullient, Facetious, Gaikwar, Hackneyed, Idiosyncrasy, Jargon, Kindle, Labyrinth, Maverick, Narcissism, Ostracize, Palliate, Quagmire, Rancorous, Sagacity, Tantamount.
Level 3: Expert Abstemious, Bellicose, Chicanery, Deleterious, Enervate, Fatuous, Gauche, Hegemony, Inculcate, Jejune, Kowtow, Lugubrious, Mawkish, Nonsectarian, Obdurate, Pernicious, Quotidian, Recapitulate, Supercilious, Tempestuous, Unctuous, Vehement, Winnow, Xenophobe, Ziggurat, Acquiesce, Bombastic, Circumlocution, Desultory, Equinox, Fiduciary, Gerrymandering, Hubris, Incognito, Kinetic, Loquacious, Metamorphosis, Nihilism, Orthography, Precipitous, Quasar, Reparation, Soliloquy.
Level 4: Grandmaster (The Obscure) Accoutrement, Brobdingnagian, Crepuscular, Defenestrate, Equanimity, Flibbertigibbet, Grandiloquent, Hippopotomonstrosesquippedaliophobia, Ineffable, Jingoism, Kerfuffle, Logorrhea, Mellifluous, Obfuscate, Panacea, Quixotic, Rococo, Sesquipedalian, Tergiversate, Ultracrepidarian, Vicissitude, Weltschmerz, Xeric, Yclept, Zeitgeist, Absquatulate, Bumbershoot, Callipygian, Dord, Ergophobia, Fartlek, Gobbledygook, Houghmagandy, Interrobang, Kakistocracy, Lollygag, Mumpsimus, Nudiustertian, Omphaloskepsis, Pogonotrophy, Quire, Ratoon, Snollygoster, Tittynope, Ucalegon, Vagitus, Widdershins, Xylopolist, Yarborough, Zenzizenzizenzic.
* Correct word
* Opposite definition
* Another word's definition
* Opposite of that word's definition
Which massively reduces the difficulty
I mean, select the word, then press check, then press continue.
It could be one single click and move to the next, show me my last result at the same time you ask me for the next one.
Then I was doing poorly in grandmaster, until I realized you can ace grandmaster by just picking the longest explanation every time.
Vibe coders need to be forced to spend one day learning basic CSS before they're allowed to use an LLM to make a website; the internet would be a lot more pleasant as we move forward with slopification. It doesn't have to be sloppy, and it doesn't take all that much studying to at least be able to steer an LLM in the right direction to make something look nice. At this point everything is just the same 3 colors and a centered flex column with weird spacing.
3 clicks per question is what gives it away. And the little compliments. And that it's 100 questions.
English is not my native language. I get my vocabulary from browsing the Internet. There is no way I know that many words.
I use the language to understand, not to get an effect.