But, more importantly, the number of cycles that would be needed to text-wrap most websites is effectively zero. Most websites simply are not typesetting the volumes of text that would be needed for this to be a concern.
Happy to be shown I'm flat wrong on that. What sites are you envisioning this will take a lot of time for?
I've measured this, and no, it's not. What you're missing is the complexities of typesetting Unicode and OpenType, where GSUB/GPOS tables, bidi, ruby text, etc. combine to make typesetting quite complex and expensive. HarfBuzz is 290,000 lines of code for a reason. Typesetting Latin-only text in Times New Roman is quick, sure, but that doesn't cut it nowadays.
I would wager you can find scenarios where it is a large number. My question is whether there are sites people actually use where that happens?
Taken seriously, these would all be reasons not to do many of the things Unicode does. And yet here we are.
That all said, if you have measurements, please share. Happy to be proven wrong.
- handling soft (shy) hyphens/hyphenation when splitting long words -- working out where a word can be hyphenated so it stays readable, and then how that affects the available space, takes time to compute if justified text is to avoid large blocks of whitespace, especially around long words;
- handling text effects like making the first letter large (a drop cap), or the first word(s) larger, as is done in various books;
- reflow due to any changes resulting from text rendering/formatting (e.g. if applying kerning or hyphenation results in/does not result in text wrapping);
- the impact of things like kerning and ligatures (e.g. ffi) on text width -- including determining whether a word can be split in the middle of one of them or not;
- combining characters, emoji (with zero-width joiners), Unicode flag character pairs (including determining valid pairs to decide if/where a split is allowed), etc. -- see the sketch after this list;
- mixed direction text (left to right and right to left) handling and positioning;
- the Ruby text mentioned above (e.g. https://www.w3.org/International/articles/ruby/styling.en.ht...) -- dealing with wrapping of both the main text and the annotation text above/below it, either of which could happen;
- for Chinese/Japanese/Korean, ensuring that characters within a word don't have extra space, as those languages don't use spacing to delimit words;
- other things affecting line height such as sub/super script text (mathematical equations, chemical symbols, references, etc.).
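To make the splitting point concrete, here is a small sketch using the standard `Intl.Segmenter` API (shipping in modern browsers and Node). A line breaker has to treat grapheme clusters, not UTF-16 code units, as its atoms, or it will happily cut a flag pair or a ZWJ emoji sequence in half:

```ts
// Grapheme segmentation via the standard Intl.Segmenter API.
// A line breaker must never split inside one of these clusters.
const seg = new Intl.Segmenter("en", { granularity: "grapheme" });

// Flag pair (two regional indicators), ZWJ family sequence, e + combining acute.
const text = "🇨🇦 👨‍👩‍👧 e\u0301";
const clusters = [...seg.segment(text)].map(s => s.segment);

console.log(clusters.length); // 5 clusters (three visible "characters" plus two spaces)
console.log(text.length);     // 16 UTF-16 code units -- naive indexing splits clusters apart
```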
Like, I am largely aware that it is a hard problem. So is rendering text at all, in general. The added difficulty of justifying text is still not something I expect to impact the vast majority of sites. If you are willing to break your content into multiple pages, I hazard it isn't a significant chunk of time for most content.
Are there edge cases? Absolutely! Most of these are isolated in impact, though. And I do think a focus on whole content optimization is clouding a lot of people's view here. You are not doing yourself any favor by optimizing a full book every time a chapter changes.
There is also the idea that you have to find the absolute best answer for justifying text. Why? One that is low enough on a penalty score is just fine. Akin to the difficulty in finding all tilings of a board, versus just finding a single tiling for a board. Or a single solution to the N-Queens question, versus all solutions. If you just want a single solution, you don't need the raw compute necessary to get them all.
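As a toy illustration of that satisficing idea (entirely my own sketch, with made-up names and an arbitrary threshold, not any engine's actual code): run the cheap greedy breaker, score the result, and only pay for an expensive optimizer when the score is bad:

```ts
// Toy "good enough" line breaking. Widths are abstract units;
// nothing here is taken from a real engine.

// Greedy first-fit: pack words onto a line until the next one won't fit.
function greedyBreak(words: number[], space: number, max: number): number[][] {
  const lines: number[][] = [[]];
  let used = 0;
  for (const w of words) {
    const line = lines[lines.length - 1];
    const extra = line.length === 0 ? w : space + w;
    if (used + extra > max && line.length > 0) {
      lines.push([w]); // start a new line with this word
      used = w;
    } else {
      line.push(w);
      used += extra;
    }
  }
  return lines;
}

// Penalty: squared leftover space on every line but the last.
function penalty(lines: number[][], space: number, max: number): number {
  return lines.slice(0, -1).reduce((sum, line) => {
    const used = line.reduce((a, b) => a + b, 0) + space * (line.length - 1);
    return sum + (max - used) ** 2;
  }, 0);
}

const GOOD_ENOUGH = 2000; // arbitrary threshold, for the sake of the sketch

const lines = greedyBreak([50, 30, 80, 40, 60, 20, 90], 10, 200);
if (penalty(lines, 10, 200) > GOOD_ENOUGH) {
  // Only now would you pay for a full Knuth-Plass-style optimizing pass.
}
```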
I can hardly open any website without some anti-bot check burning my CPU to the ground for half a minute or so (if it doesn't manage to entirely crash my Firefox in the process, like Cloudflare's). I'd rather wait 0.2s for text wrapping than that, that's for sure. :)
Try zooming in and out with text-wrap: pretty vs text-wrap: wrap
Are there ereaders that have to typeset the entire text of what you are reading? What is the advantage of making the task harder?
Still, I should acknowledge you provided an example. It surprises me.
As often as the font size changes.
Even the few times I do change text size on my e-readers are largely mistakes. Having gestures to resize is frustrating, in the extreme.
When I'm home, I read books.
Can't speak to Apple Books, but at least Pages.app (and iWork in general) use a separate text engine from TextKit, focused on higher fidelity at the cost of performance -- optical kerning, etc. (Terminal.app also does not use TextKit.)
OpenStep used Display Postscript and was written in Objective-C; WebKit is written in C++.
Rendering text on the web is a different animal altogether.
Again, I won't claim it is absolutely free. It is almost certainly negligible compared to the processing power involved in any of the other things we are talking about.
* for most webpages. Of course you can come up with giant ebooks or other lengthy content for which this will be more challenging.
And is most of the effort in LaTeX in paragraph layout? Most of the slowness there is in mathematics typesetting, I would think.
Is that the reason the Microsoft Word team tells themselves as well?
We have multi-core, multi-gigahertz CPUs these days: there aren't cycles to spare to do this?
Zooming in a bit, Word also does not kern fonts as well as LaTeX, so it might be missing some logic there that would trickle down into more beautiful word spacing and text flow.
Yes, there can be edge cases where optimizing one section causes another section to be resized. My gut is that that is the exception, not the norm. Moreover, most of the resizes this will lead to will largely result in a "card" being moved up or down in such a way that the contents of that card do not need to be re-optimized.
Yes, you could make this even harder by optimizing what the width of a section should be and flowing another section around it. But how many sites actually do something like that?
To be clear, just because I would hazard that, does not mean I'm right. Would love to see some benchmarks.
A lot of the time folks are sitting and thinking and things are idle. Perhaps Word could 'reflow' text in the background (at least the parts that are off screen)? Also, maybe the saved .docx could have hints so that on loading things don't have to be recalculated?
https://www.youtube.com/watch?v=kzdugwr4Fgk
The Kindle Text Problem - Computerphile
Typesetting all of wikipedia? Probably measurable. Typesetting a single article of wikipedia? Probably not. And I'd wager most sites would be even easier than wikipedia.
"Line Breaking", xxyxyz.org
https://web.archive.org/web/20171021044009/http://xxyxyz.org...
This was on a 4.5 GHz quad-core CPU. Single-threaded performance of today's top CPUs is only 2-3x faster, but many gamers now have 144Hz+ displays.
What you're complaining about is that the site is not optimized for your reading enjoyment. The site is probably quite well optimized, but your reading enjoyment was not one of the optimizer's priorities. I think we agree about how annoying that is and how prevalent, so the news that new typographical features are coming seems to me like good news for those of us who would appreciate more sites that prioritize us the readers over other possible optimization strategies.
And to be clear, most sites flat out don't need to be optimized. Laying out the content of a single site's page is not something that needs a ton of effort put into it. At least, not a ton in comparison to the power of most machines, nowadays.
This is why, if I open up GMail with the inspector tab open, I see upwards of 500 requests in less than 10 seconds. All to load my inbox, whose contents are almost certainly smaller than the 5 megs that get transferred. And I'd assume GMail has to be one of the more optimized sites out there.
Now, to your point, I do think a lot of the discussion around web technologies is akin to low level assembly discussions. The markup and script layout of most sites is optimized for development of the site and the creation of the content far more than it is for display. That we have moved to "webpack" tricks to optimize rendering speaks to that.
You're kidding, right? There are a ton of non-trivial edge cases that have to be considered: break points, hyphenation, other Latin-based languages, etc.
From a Google engineer's paper describing the challenges: https://docs.google.com/document/d/1jJFD8nAUuiUX6ArFZQqQo8yT...
> Performance Considerations
>
> While the `text-wrap: pretty` property is an opt-in to accept slower line breaking, it shouldn’t be too slow, or web developers can’t use them due to their performance restrictions.
>
> The pinpoint result when it is enabled for all web_tests is in this CL.
>
> Complexity
>
> The score-based algorithm has different characteristics from the bisection algorithm. The bisection algorithm is O(n * log w) where n is the number of lines and w is the sum of spaces at the right end. The score-based algorithm is O(n! + n) where n is the number of break opportunities, so it will be slower if there are many break opportunities, such as when hyphenation is enabled.
>
> Also, computing break opportunities are not cheap; it was one of LayoutNG's optimizations to minimize the number of computing break opportunities. The score-based algorithm will lose the benefit.
>
> Last 4 Lines
>
> Because computing all break opportunities is expensive, and computing the score is O(n!) for the number of break opportunities, the number of break opportunities is critical for the performance. To minimize the performance impact, the implementation caches 4 lines ahead of the layout.
>
> Before laying out a line, compute line breaking of 4 lines ahead of the layout.
>
> If it finds the end of the block or a forced break, compute the score and optimize line breaks.
>
> Otherwise layout the first line from the greedy line breaking results, and repeat this for the next line.
>
> The line breaking results are cached, and used if the optimizer decided not to apply, to minimize the performance impact.
>
> Currently, it applies to the last 4 lines of each paragraph, where “paragraph” is content in an inline formatting context separated by forced breaks.
>
> The Length of the Last Line
>
> Because the benefit of the score-based line breaking is most visible when the last line of the paragraph is short, a performance optimization is to kick the optimizer in only when the last line is shorter than a ratio of the available width.
>
> Currently, it applies only when the last line is equal to or less than ⅓ of the available width.
>
> Checking if the last line has only a single word
>
> Checking if the last line has only a single word (i.e. no break opportunities) requires running the break iterator, but only once.
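In other words (a hypothetical restatement of that gate in TypeScript, with my own names, not Chromium's code), the expensive pass is guarded by a cheap check on the greedy result:

```ts
// Hypothetical restatement of the heuristic quoted above; not Chromium code.
// Lay out greedily first; only invoke the score-based optimizer when the
// greedy layout leaves a short last line.
function shouldRunScoreBasedOptimizer(
  lastLineWidth: number,
  availableWidth: number,
): boolean {
  return lastLineWidth <= availableWidth / 3; // the 1/3 gate from the design doc
}
```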
They are still not a sizeable number in comparison to the number of devs who have enabled the different text-wrap options, most of whom gave no thought to a setting that did not appreciably slow things down at all.
The article says that what Chrome does is only support the “no super short lines” bit.
So while you won’t end up with one word on its own line at the end of a paragraph, it’s not trying to prevent rivers of text or keep a relatively clean ragged right or anything else.
That’s allowed by spec, but it’s not quite the same thing.
See http://widespacer.blogspot.com/2014/01/two-spaces-old-typist... for many details.
But text on a printed page is set for a fixed layout, and that's where the web really differs.
Surprisingly, at the high end the less-automated line composer is used a lot more. It requires more work, but human decisions lead to the best results if done properly.
So in my experience e-readers have had great layout engines.
So unless the e-reader's engine already has good rules, there will be no real change until the manufacturer does what it should have done already.
Granted, I probably view CSS with far more disdain than I should.
Don't get me wrong, there are smart people working on CSS. That we decided, as an industry, to treat layout as a Rube Goldberg machine of interacting rules on a growing number of elements is not something you can easily overcome.
Notice how the open quotation marks hang into the left margin. There's been some recent work in CSS to make this automatic, but that's newer than this book and support is spotty. MB made it happen with (IIRC) a custom filter inside the Pollen setup he made for this book. Wild. And beautiful.
But they say:
> We are not yet making adjustments to prevent rivers, although we’d love to in the future.
And indeed, I couldn't even begin to guess how to define a metric for rivers, which can occur at different angles, with different variation, interrupted to various degrees... I'm curious: has anybody invented a clever metric that actually works? Or does it basically require some level of neural-network pattern recognition that is way too expensive to calculate for 1,000 variations of each paragraph?
The intent here is that the document author is informed that their text contains rivers, and responds by tweaking the wording until they land on something that doesn't have them.
Of course, for a browser engine this is a complete nonstarter; a useful feature for dealing with rivers would require not just detecting them but automatically removing them, without changing the text content. I'm not aware of any existing software that does this, but I've found one proposed list of exploratory directions that could make a decent starting point for anyone who wanted to build this: https://tex.stackexchange.com/a/736578
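For what it's worth, one naive metric I can imagine (purely a sketch of my own, not from the linked thread): model each inter-word gap as a horizontal interval and count how often gaps on consecutive lines overlap; long chains of overlaps are river candidates. Angled or interrupted rivers defeat it, which is part of why the problem is hard:

```ts
// Naive river score: count vertically stacked, overlapping word gaps.
// Purely illustrative; angled or interrupted rivers are not caught.
type Gap = { left: number; right: number }; // x-extent of one space on a line

function riverScore(lines: Gap[][]): number {
  let score = 0;
  for (let i = 0; i + 1 < lines.length; i++) {
    for (const g of lines[i]) {
      // Does some gap on the next line overlap this one horizontally?
      if (lines[i + 1].some(h => h.left < g.right && g.left < h.right)) {
        score += 1; // each stacked pair of gaps extends a potential river
      }
    }
  }
  return score;
}
```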
And, yes, there are some checks done at the line level that could lead to a paragraph getting reworked. Ending with a single word is an easy example. That is still something you can evaluate at the line level easily.
There's no problem with paragraph-level optimizations inherently. Reducing raggedness is paragraph-level and that's comparatively easy. The problem is the metric in the first place.
Or, maybe not? I'll note that the vast majority of "rivers" I've seen in texts coincide heavily with punctuation. Even the example in this article has 5 of its 8 lines using a comma to form the river, with the other lines having what seems to be obviously stretched space between words to fill out the line. Maybe enumerating the different reasons for a space would be enough?
Granted, this also calls out how dependent you almost certainly are on the font being used, as well.
https://www.loc.gov/resource/gdcwdl.wdl_03042/?sp=5&r=-0.122...
Why did they even design it like this in the first place? This seems counter to much of what browsers have been doing recently, like making select customizable, the browser Interop and Baseline projects, web-platform-tests, etc. I would rather move away from these kinds of features in favor of more explicit ones. I understand that this isn't a serious issue and is unlikely to cause bugs compared to other interop issues that are true deviations from the spec. It just seems counterintuitive to do this, though.
There's no "correct" way to typeset a document, there wouldn't even be a consensus among typesetters on what the implementation specifics look like. Rather than turn the spec committee into a decades-long ecumenical council of typographers they just left the specifics up to each individual "shop" as it always has been. Except now instead of printers it's the browser vendors needing to make the final call.
They can add multiple typesetting properties and allow the developer to decide which one to use. Besides, letting each browser decide what the "best" line break looks like doesn't solve the problem of there not being a definitive answer to that question. Even here, I don't think the Chrome developers have a vastly different opinion on what a good line break looks like. It's possible they didn't like the performance implications of WebKit's version or had some other tangential reason, although the blog says performance is not an issue.
Having worked with passionate (aka opinionated) typographers, that phrasing earned a well-deserved chuckle. Leaving implementation choices up to each browser was certainly the only way to get it into CSS. Hopefully various implementations will evolve over time and coalesce into a fairly consistent baseline.
The whole point of CSS is/was to standardize presentation across browsers.
CSS was created to standardize how to deal with presentation, but that doesn't mean every website should look exactly the same on every device or in every browser. The era of attempting to do that is over.
text-wrap: pretty is a great example of progressive enhancement [1]: it's a great way to add some polish to a website but if the user's device doesn't support it, they can still access all of the content on the site and be none the wiser.
If you read the CSS specifications, browser makers are, in some cases, allowed to use platform-specific heuristics to determine whether or not to apply certain features. Downloading web fonts works like this: browsers fall back to system fonts if a web font doesn't download within 3 seconds.
It makes sense that text-wrap: pretty should be one of those. If your smartphone is low on power and the signal isn't that great, you can forgo expertly wrapped text and elegant hyphenation in order to view the webpage as quickly as possible.
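And because it degrades so cleanly, feature-testing is trivial when you do want to branch on it, using the standard `CSS.supports()` API (or `@supports` in plain CSS):

```ts
// Progressive enhancement check: CSS.supports() is a standard API.
// Unsupported browsers simply keep default greedy wrapping; nothing is lost.
if (typeof CSS !== "undefined" && CSS.supports("text-wrap", "pretty")) {
  document.body.classList.add("has-pretty-wrap");
}
```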
For every device, I agree, but that was never the goal of CSS. It is meant to respond to the device's constraints, such as screen dimensions and device type (desktop, mobile, print), using e.g. media queries. In every browser, though, I do think they should try to accomplish the same thing. Even if the exact algorithms used are different, the intended result should be agreed upon.
That was the point. Maybe Gen Z changed its meaning now, but that was the main premise. There was even the Acid3 test and similar stuff.
>The pretty value is intended for body text, where the last line is expected to be a bit shorter than the average line.[0]
which seems to mainly be about avoiding short last lines. That is from a note, though. The actual definition says the value "specifies the UA should bias for better layout over speed, and is expected to consider multiple lines, when making break decisions", which is broader. But the intent is clearly spelled out in the note. This is also how Chrome described the feature, as mentioned in the article. But it does say that the effect may change in the future:
>The feature does a little more than just ensure paragraphs don't end with a single word, it also adjusts hyphenation if consecutive hyphenated lines appear at the end of a paragraph or adjusts previous lines to make room. It will also appropriately adjust for text justification. text-wrap: pretty is for generally better line wrapping and text breaking, currently focused on orphans. In the future, text-wrap: pretty may offer more improvements.[1]
The design doc linked in [1] says this about it:
>The `text-wrap: pretty` is the property to minimize typographic orphans without such side effects.
>There are other possible advantages for paragraph-level line breaking, such as minimizing rivers. The csswg/#672 describes such other possible advantages. But the initial implementation focuses on typographic orphans, as it’s the most visible benefit, and to minimize the performance impacts.
>Because paragraph-level algorithms are slow, there are multiple variants to mitigate the performance impacts.[2]
The new draft[3] changed it to the current definition. What's also interesting from that new draft is this new note:
>The necessary computations may be expensive, especially when applied to large amounts of text. Authors are encouraged to assess the impact on performance when using text-wrap-style: pretty, and possibly use it selectively where it matters most.
which seems to go against what was written in the WebKit blog. If developers start using this value everywhere, expecting it to be fast, that effectively stops future implementations from using a slower but better algorithm (assuming one exists).
[0] https://www.w3.org/TR/css-text-4/#propdef-text-wrap-style
[1] https://developer.chrome.com/blog/css-text-wrap-pretty
[2] https://docs.google.com/document/d/1jJFD8nAUuiUX6ArFZQqQo8yT...
Strange English:
> It's far text
> this text has short a lot of words all in a row
Not relevant to the subject, unless you want to consider improving line breaks by rearranging words.
* https://en.wikipedia.org/wiki/Knuth–Plass_line-breaking_algo...
It seems like a pretty straightforward machine learning regression problem: given a sequence of word widths, find a line length which, when applied to the sequence, satisfies the constraint of being “pretty”.
Using a model would allow computing the answer in constant time.
The actual problem is also more complex than fixed word widths, due to hyphenation and justification. From what I recall, Knuth's paper (IIRC there are two, and the second one is the one to read) on TeX's layout gives a good overview of the problem and an algorithm that's still state of the art. I think the Typst people might have a blog post about their modern take on it, but I'm not sure.
If you like this and are interested in something closer to TeX, why not the TeX algorithm itself!? There's this lovely package that works wonderfully on the web https://github.com/robertknight/tex-linebreak?tab=readme-ov-... And if you want to play around with a live example to check the performance, I use it on my site's about page: https://lynkmi.com/about
Been on a big LaTeX buzz recently and even added support for it in comments there too!
Here's how you do it in advanced layout systems like CSS: https://stackoverflow.com/questions/6205109/justify-text-to-...
In Chrome (can't test Safari), `text-wrap: pretty` has a much milder effect.
Should one use both together in the main text of your average blog? I checked; they do appear to make individual changes.
> Should one use both together in the main text of your average blog
Optimize for legibility; the properties are compatible.
According to caniuse.com, Chrome has had support for this since September 2023. Maybe I'm dumb, but what's so "unprecedented" about this?
Comparatively:
> WebKit is not the first browser engine to implement, but we are the first browser to use it to evaluate and adjust the entire paragraph. And we are the first browser to use it to improve rag.
This enough digging for you?
Go on. Vote down without offering a counter again. Take my fake points from me. You know it'll make you feel big. :)
----
[1] In true Apple style, people did it before, they polished the act a bit, and took it as theirs!
> we want you to be able to use this CSS to make your text easier to read and softer on the eyes, to provide your users with better readability and accessibility.
How did that happen?
Chromium is the only browser engine whose stable channel currently supports text-wrap: pretty. In this post, WebKit is announcing not only that they've implemented it (though not yet in a stable channel), but that they've done so using an algorithm that's better than Chromium's. Their algorithm adjusts for various things that Chromium's currently does not.
> Mozilla's standards position on this feature is positive
>
> https://github.com/mozilla/standards-positions/issues/993
I realize I misread "One solution is text-wrap: pretty" as "Our solution is text-wrap: pretty". Combined with the fact that this was on the WebKit blog.
Thanks.
Applying `text-wrap: pretty` solves that!
You'll often see multiple listed, like `-apple-system, "SF Pro Text", Helvetica, sans-serif` in this case. It tries to use them from left to right.
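If you're curious which family in a stack actually resolves on your machine, a classic (if imperfect) heuristic is to compare canvas text measurements against a generic fallback; a width that differs suggests the named font is present. A rough sketch, using the family names from above as placeholders:

```ts
// Rough availability probe: a named font is "present" if text measures
// differently with it than with the generic fallback alone.
// Heuristic only; metric-compatible fonts can fool it.
function fontAvailable(family: string): boolean {
  const ctx = document.createElement("canvas").getContext("2d")!;
  const sample = "mmmmmmmmmmlli";
  ctx.font = "32px monospace";
  const base = ctx.measureText(sample).width;
  ctx.font = `32px "${family}", monospace`;
  return ctx.measureText(sample).width !== base;
}

console.log(["SF Pro Text", "Helvetica"].filter(fontAvailable));
```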
In case you come across other website fonts you like, you can use https://fontofweb.com to get their names.
Disclaimer: I'm the creator