I used to write switch/if blocks for:
• 0 rows → “No results”
• 1 row → “1 result”
• n rows → “{n} results”
Which seems trivial in English, but gets messy once you support languages with multiple plural categories.
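For reference, the kind of hand-written branching described above might look like the sketch below. The function name `result_label` is made up for illustration, and the logic only covers English.

```python
def result_label(n: int) -> str:
    """English-only pluralization done with plain branching.

    Hypothetical helper: every new language with more plural
    categories would need its own extra branches here.
    """
    if n == 0:
        return "No results"
    if n == 1:
        return "1 result"
    return f"{n} results"
```

This is fine for one language, but the branch structure itself is English-specific, which is the part that gets messy once other locales are added.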
I wasn’t really aware of how nuanced plural rules are until I dug into ICU. The syntax looked intimidating at first, but it actually removes a lot of branching from application code.
I’ve been using an online ICU message editor (https://intlpull.com/tools/icu-message-editor) to experiment with plural/select cases and different locales, which helped me understand edge cases much faster than reading the spec alone.
(Fluent informed much of the design of MessageFormat 2.)
I18n / l10n is full of things like this: important details that couldn’t be more boring or fiddly to implement.
If alternatives don't start with a very strong case for why gettext wasn't a good option, that's already a good indicator of not-invented-here syndrome.
IMHO pluralization is a prime example, with an API that only cleanly handles the English case, requires the developer to be aware of translation gotchas, and has honestly confusing documentation and format. Compare that to MessageFormat's pluralization example (https://github.com/unicode-org/message-format-wg/blob/main/s...) which is very easy to understand and fully in the translator's hands.
That’s not true at all? Gettext is functionally limited to source code being English (or alike). It handles all translation languages just fine, and competently so.
What it doesn’t have is MessageFormat’s gender selectors (useful) or formatting (arguably not really useful: it strays from translations to locales and is better solved with placeholders and locale-aware formatting code).
> fully in the translator's hands.
That is a problem that gettext doesn’t suffer from. You can’t reasonably expect translators to write correct DSL expressions.
Seems like to get it right for every use case / language, you would need functions to translate phrases, so switch statements may be a valid solution. The number of text elements needed for pagination, CRUD operations and similar UI elements should be finite :)
Let's take your example. In English, counting files looks like this:
You have {file_count, plural,
=0 {no files}
one {1 file}
other {# files}
}
In Polish, there are several possible variants depending on the count:
Masz 1 plik
Masz 2,3,4 pliki
Masz 5-21 plików
Masz 22-24 pliki
Masz 25-31 plików
Your Polish translators would write:
Masz {file_count, plural,
one {# plik}
few {# pliki}
other {# plików}
}
The library (and your translators) know that in Polish, the `few` variant kicks in when `i%10 = 2..4 && i%100 != 12..14`, etc. I think the library just knows these rules for each language as part of the standard. Mozilla says that it was an explicit design goal to put "variant selection logic in the hands of localizers rather than developers". The point is that it's supported, it simplifies developer logic, and your translators know how to work with it.
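To make the selection step concrete, here is a minimal Python sketch of what a library does with the Polish message. It assumes integer counts only and hard-codes the Polish rule as I understand it from CLDR (integers fall into `one`, `few` or `many`; `other` is for fractions). The names `polish_plural_category` and `format_files` are hypothetical, not any library's API, and the fallback to `other` when a category is missing mirrors how ICU-style formatters behave.

```python
def polish_plural_category(n: int) -> str:
    """Plural category for a non-negative integer in Polish.

    Assumption: integers only. Fractions (CLDR 'other') are not modeled.
    """
    if n == 1:
        return "one"
    # few: i%10 = 2..4 and i%100 != 12..14
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


def format_files(n: int, variants: dict[str, str]) -> str:
    """Pick the variant for n, falling back to 'other' if it is missing."""
    category = polish_plural_category(n)
    pattern = variants.get(category, variants["other"])
    # '#' stands for the formatted number, as in the ICU message.
    return "Masz " + pattern.replace("#", str(n))


# The translator-supplied variants from the message above.
VARIANTS = {"one": "# plik", "few": "# pliki", "other": "# plików"}
```

With these variants, counts such as 5 or 12 resolve to `many`, find no such key, and fall back to the `other` text, which is why the three-variant message still renders the expected strings for whole numbers.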
See https://www.unicode.org/cldr/charts/48/supplemental/language...
(Apologies if I got the above translation strings wrong, I don't speak Polish. Just working from the GNU gettext example.)
AFAIK, unlike gettext, MessageFormat doesn't allow you to specify a formula for the plural forms as part of the localization data, so the variant selection logic ended up in the hands of library developers rather than localizers or application developers.
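For contrast, a sketch of the gettext approach, where the formula travels with the catalog. The C expression in the comment is the Polish `Plural-Forms` formula as given in the GNU gettext manual; it is transcribed into Python by hand here rather than parsed, and `plural_index`, `MSGSTR` and `ngettext_pl` are illustrative names, not part of the `gettext` module.

```python
# Catalog header (localization data, not application code):
#   Plural-Forms: nplurals=3;
#     plural=n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;


def plural_index(n: int) -> int:
    """Hand transcription of the C formula above into Python."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


# msgstr[0], msgstr[1], msgstr[2] as a translator would fill them in.
MSGSTR = ["%d plik", "%d pliki", "%d plików"]


def ngettext_pl(n: int) -> str:
    """Select the msgstr by index and substitute the count."""
    return MSGSTR[plural_index(n)] % n
```

The difference from the category-based approach is that the translations are addressed by a numeric index computed from a formula in the catalog, instead of by named categories whose rules live in the library.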
And the standard does get updated occasionally, which can also lead to bugs with localization data written against another version of the standard: https://github.com/cakephp/cakephp/issues/18740
I wonder why it hasn't been adopted more widely.
It seems the last edit of the page was in 2019, so I'm not sure how up to date it is.
I imagine that I probably wasn't the only one driven away by that (and I gave it many attempts!).
.input {$var :number maximumFractionDigits=0}
.local $var2 = {$var :number maximumFractionDigits=2}
.match $var2
0 {{The selector can apply a different function to {$var} for the purposes of selection}}
* {{A placeholder in a pattern can apply a different function to {$var :number maximumFractionDigits=3}}}
Oof, that's a programming language already. And new syntax to be inevitably iterated on. I feel like we have too many of those already, from Python f-strings to template engines. I hope it at least stays small: no nesting, no plugins, no looping, no operators, no side effects or calls to external functions (see Log4J).
However, ideally / in most cases it isn't.
Some languages have more variations. E.g. Czech, Slovene and Russian have 1, 2-4 and 5 as different cases.
Personally I think the syntax is too brittle. It looks too much like TeX code and it has the Lisp-like problem of lines ending with too many } braces.
I would separate it into two cases: simple strings with just simple interpolation, and then a fuller markup language, more like a simplified XML.
There is more example code at https://github.com/unicode-org/message-format-wg/blob/main/d...
.local $hasCase = {$userName :ns:hasCase}
.match $hasCase
vocative {{Hello, {$userName :ns:person case=vocative}!}}
accusative {{Please welcome {$userName :ns:person case=accusative}!}}
* {{Hello!}}
But if anyone can find a good compromise, it's the Unicode team. That being said, your project looks very cool!
https://github.com/Frizlab/XibLoc/blob/e85a5179bdd93e0174731...
Does anyone have reason for more optimism?
I don't see what's infeasible about it. It doesn't seem too different from .po files (gettext catalogs) meshed with hooks for post-processing, as you would see in e.g. Handlebars, both of which have individually found great adoption.
GP based his opinion on the assumption that this spec is new and that no implementations of it exist.
I mean they have hieroglyphs, some of which have plurals: https://www.unicode.org/charts/nameslist/n_13000.html
What I can say is that it's a well-maintained format but also kinda hard to learn.
What is the equivalent of xgettext.pl, the file extension for the main catalog file `.po`, the __ function?
How does gender work (small example)? How does layering pt_BR on pt_PT work?
What is a compelling reason to switch?