Your aversion appears
to be psychological.
It seems to me like
you have trouble examining
things by the sum of
their parts and
semantic line breaks
agitate this.
You’re free to
the render “misformatted”
text in the format that it’s
intended to be viewed.
And I take it that
physical literature
is a burden for you
to bear.
My condolences.
> Start each sentence on a new line. Make lines short, and break lines at natural places, such as after commas and semicolons, rather than randomly. Since most people change documents by rewriting phrases and adding, deleting and rearranging sentences, these precautions simplify any editing you have to do later.
I've been doing this for some time now and it indeed makes editing text in vim easy. Just `dd` and `p`. And as the article mentions, diffing is much cleaner.
There's a caveat, however: if you comment out a line (to keep the thought, but see how the text would work without it), some Markdown parsers will interpret it as an empty line, and thus create a new paragraph. It then becomes necessary to remove one of the semantic newlines, which looks rather messy.
> A semantic line break SHOULD occur after an […] em dash (—).
I agree with this, however it means that no existing markup language supports semantic line breaks, because every last one of them just turns the break into a space—and em dashes are, in most locales, not to be surrounded by a space. Consequently, you’ll end up with a stray space if you do this.
My irritation at being unable to break after an em dash (which I want to do quite frequently) was one of the things that headed me down the path of designing my own lightweight markup language (LML), to fix this and other problems I observe with existing LMLs. I’ve been using it for all my personal writing for something like four years now (though a fair bit has changed since then), and I expect to finally have a functioning parser before the end of this year.
One of the other fun complications of this kind of line break in source code is languages that don’t have a word divider—inserting a space at all is incorrect in them.
> any remaining segment break is either transformed into a space (U+0020) or removed depending on the context before and after the break. The rules for this operation are UA-defined in this level.
My LML currently turns segment breaks into a space unless the line ends with an en or em dash, unless there’s a colon or a space before that. I haven’t got anything in place for languages with no word separator yet, but it is unusually well-suited to such languages.
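For illustration only, the rule described above (a segment break becomes a space, unless the line ends with an en or em dash that is not preceded by a colon or a space) could be sketched in Python roughly like this. The function names are made up for the example and are not taken from the LML itself, and keeping the space when a line consists of nothing but a dash is an assumption.

```python
# Hypothetical sketch of the segment-break rule; names are illustrative only.
DASHES = ("\u2013", "\u2014")  # en dash, em dash


def suppress_space(line: str) -> bool:
    """Return True if the break after `line` should NOT become a space."""
    if not line.endswith(DASHES):
        return False
    before = line[-2:-1]  # character preceding the dash, "" if there is none
    # Assumption: a colon, a space, or nothing before the dash keeps the space.
    return before not in ("", ":", " ")


def join_paragraph(lines: list[str]) -> str:
    """Join the source lines of one paragraph into a single rendered line."""
    result = ""
    for index, line in enumerate(lines):
        result += line
        is_last = index == len(lines) - 1
        if not is_last and not suppress_space(line):
            result += " "
    return result


sample = [
    "The break\u2014",
    "vanishes here,",
    "but a spaced dash \u2014",
    "keeps its space.",
]
joined = join_paragraph(sample)
```

Here the first break is swallowed because the em dash directly follows a letter, while the spaced dash on the third line still gets its trailing space.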
I don’t like reStructuredText’s backslash behaviour, because it means two completely different things. Or arguably three. Normally it means to interpret the next character literally, but if it’s followed by whitespace (typically space or newline) it instead removes that next character. Except… not entirely in the case of newline, because it’s character-level markup, and at the end of a block it just does nothing. In
a\
b
you might expect to get “a b” or an error, but actually you get a single-item definition list with term “a” and definition “b”, just the same as if you had omitted the backslash.
A far more logical meaning of a trailing backslash is to escape the newline, meaning, in HTML terms, insert <br>. That’s what I chose in my LML, and I later learned CommonMark chose that too.
In hindsight “escape” was a poor choice of word, but I did explain it and you omitted that from your quote: “meaning, in HTML terms, insert <br>”. And that’s not what reStructuredText does. Rather, at the end of a line, backslash acts like a line continuation character (… that only works in certain circumstances), a behaviour commonly found in programming languages inside at least string literals, but such languages aren’t using backslash as “escape the next character”, but rather they have a fixed set of escape sequences like \n or \uXXXX.
> em dashes are, in most locales, not to be surrounded by a space
This is definitely not the case for at least French and Russian, which means markup renderers now have to guess text language or force authors to declare such in some metadata header. And it gets even more complicated with inclusion of block quotes in different languages.
It’s not hard and doesn’t need language awareness; I described how to detect it: if there’s no space before an end-of-line em dash, suppress the segment-break-replacing space.
There seem to be some locales or styles that use asymmetric spacing. From the Zen of Python—note different spacing based on context and position within the sentence:
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
[...]
Namespaces are one honking great idea -- let's do more of those!
Unicode has U+200B ZERO WIDTH SPACE for that purpose. In HTML and hence Markdown you can also use `<wbr>`. If you’re using a custom setup anyway, you can have it be inserted automatically by regex replacement, as a pre-rendering step.
I think you’ve misunderstood something? This is about suppressing the turning of a segment break into a space, not about line break opportunities.
> Unicode has U+200B ZERO WIDTH SPACE for that purpose.
ZWSP is not at all “for that purpose”. If you mean this:
A—​
B
Well, I am mildly surprised to find that no extra space is added in Gecko or Blink. But in WebKit, a space is still added; for this is part of the “UA-defined” bit I quoted.
And if you’re willing to do preprocessing, you can just merge the lines, that’d actually work.
> In HTML and hence Markdown you can also use `<wbr>`.
Indeed, I skimmed a bit and misread “unable to break” to mean that you wanted a line-break opportunity but the renderer didn’t allow for it when a letter is directly following an em dash. But it’s the other way around, you want a line break in the source after an em dash to not translate into a space in the rendering. This would likewise be possible to handle by regex replacement as a pre-rendering step.
More generally, I see markup languages and the details of how they are rendered as largely orthogonal. You don’t necessarily need to invent a different markup language in order to adjust the rendering.
> More generally, I see markup languages and the details of how they are rendered as largely orthogonal. You don’t necessarily need to invent a different markup language in order to adjust the rendering.
There’s not much to a markup language beyond how it’s rendered. If you don’t ever want to render it to something other than plain text, just write plain text however you desire. The reason for choosing a particular markup language is to express intended semantics (for plain-text and rendered use), and to render it. The semantics aspect is legitimate, so I won’t say the language and rendering are identical or parallel, but they’re definitely nothing like orthogonal. If you’re using a CommonMark pipeline, any preprocessing you do means you’re not actually writing in CommonMark, but an incompatible variant of it. You may well deem it worthwhile, but it’s no longer the same markup language.
I'm a dyed-in-the-wool, responsive, readable, internationalizable, accessible, standards-based, enshyenist:
Instead of using an unbreakable em dash to rigidly and unbreakably connect two phrases by their last and first words, I prefer using an en dash, followed by a shy hyphen, and then another en dash, to elegantly hyphenate words connected by em dashes when they don't fit on the line. ;)
Few fonts will render this nicely; the dashes are unlikely to join. Also if it does break at the soft hyphen, you’ve got an extraneous hyphen added on the first line.
If I were doing that, I’d probably use a zero-width space instead of a soft hyphen. Same break opportunity, removes the extraneous undesirable hyphen if it breaks, but introduces a new word boundary so that wordwise selection can now split your wonky dash. Therefore I suggest <span style=user-select:all>–​–</span> because if you’re going to do something ridiculous you might as well embrace the ridiculosity.
The main reason I use semantic line breaks, not explicitly mentioned in this article, is that it minimizes reformatting when editing. Only the subclause being edited is reformatted, while the rest of the paragraph remains as-is. This also minimizes the changes in line-oriented diffs.
While one could rely on automated line-wrapping instead of using hard line breaks that require reformatting, it isn’t usefully available in all environments, in particular for indented paragraphs and when having elements like ASCII art or code that shouldn’t be word-wrapped, and it makes plain-text diffs larger than necessary when whole paragraphs are on a single source line.
Wouldn't putting a single word per line minimize reformatting and changes in line-diffs further?
From an opposite angle, I imagine you do not use semantic line breaks in ASCII art or code that shouldn't be word-wrapped, so I imagine why not push for that too?
Mostly to point out that this is a micro-optimization for little demonstrable benefit.
TBH most of the time I find markdown's collapsing of whitespace annoying - if you want a 'visual' line-break you have to add an unnatural double space at the end of the preceding line. And even this is renderer dependent, I don't think it is part of the spec (?) so some renderers don't respect it (and IIRC the GitHub comments renderer doesn't need it, i.e. doesn't do semantic line breaks)
Another pet hate is text editors which auto-convert double space into ". " - I find this even cropping up in IDEs now, so you try to add an end of line comment "...] # here" and it turns into "...]. # here". Awful
Fun. That shuffling of its location in Ventura reminds me of how Windows has shifted some things over time, sometimes more than once or twice. The dialog where you can stop Ctrl+Shift from switching keyboard layout has been in at least eight different places over time <https://superuser.com/q/109066>, possibly more.
> That’s just a bad syntax choice on Gruber’s part.
I believe Gruber was inspired by how people wrote emphasis in plain text emails and other text documents. Most MUAs at the time would treat trailing whitespace as a hard rather than soft line break. This is from my--now aging--memory, and I can't find a source to corroborate. I do recall, though, there were clients that didn't do it well (ahem, Outlook), and would break plain text formatting of deeply-nested quoted text. (Don't even get me started on how Outlook single-handedly changed culture from bottom posting to top posting).
I also wonder: why conceal bits of information from readers, when they could possibly benefit from them the same way editors and writers do? Admittedly, the outcome then seems like poetry, but … why not?
To give it a shot on that page, a simple way to see these breaks is to run
document.body.insertAdjacentHTML
( 'afterend'
,`<style>p, li { white-space: pre-line; }</style>`
)
in devtools console. (Using `pre-wrap` instead of `pre-line` is also interesting: it indents "wrapped" lines by the source code indent, which gives it even more clarity.)
(By the way, HN comments
also
preserve
line
breaks
in
the
source output, but unless revealed by some extra style, they are usually not presented on the surface.)
Here's a userscript to apply it automatically, and show all the extra whitespaces people put in their comments. (There aren't many.)
// ==UserScript==
// @name HN comments show whitespace
// @description Changes HackerNews CSS to show comments with original whitespace
// @match https://news.ycombinator.com/*
// @run-at document-body
// @grant none
// ==/UserScript==
// HN inserts a newline after <pre> so formatted code blocks have a whole newline after them
// but we can remove that extra space with negative margin
const HN_noformat_CSS_rule = `
div.commtext.c00 {
white-space: pre-wrap !important;
}
div.commtext.c00 > pre {
margin-bottom: -.25em !important;
}
`;
let myStyleHolder = document.createElement('style');
myStyleHolder.textContent = HN_noformat_CSS_rule;
document.head.appendChild(myStyleHolder);
The problem is that this makes having line breaks that are not paragraph breaks in the output much more awkward and I think those are much more important than line breaks that are only there in the source.
This is especially true for Markdown which is supposed to be a pretty rendering of conventions that were already common in text only communication so it's weird when explicitly entered line breaks are ignored in the output.
The significant majority of markup languages essentially treat a single line break as a space. HTML, Markdown, et cetera. In lightweight markup languages, you normally need a blank line (i.e. two line breaks) to signify a paragraph break.
GitHub issues and discussions are an outlier in treating them as hard single line breaks (which are not paragraph breaks).
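As a rough sketch of that common behaviour (not any particular parser's actual implementation), here is a small Python function that splits source text into paragraphs at blank lines and turns every remaining single line break into a space. The function name is hypothetical, and real renderers handle many more cases (lists, code blocks, hard breaks).

```python
import re


def render_paragraphs(source: str) -> list[str]:
    """Split on blank lines, then collapse single line breaks into spaces."""
    # One or more blank (or whitespace-only) lines separate paragraphs.
    blocks = re.split(r"\n\s*\n", source.strip())
    # Inside a paragraph, each line break is treated like a space.
    return [
        " ".join(line.strip() for line in block.splitlines())
        for block in blocks
    ]


source = (
    "First sentence.\n"
    "Second sentence,\n"
    "with a clause.\n"
    "\n"
    "New paragraph."
)
paragraphs = render_paragraphs(source)
```

With this input the three semantically broken lines end up as one paragraph, and only the blank line starts a new one.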
Most plain-text communication used to use line wrapping, often not supporting lines above, say, 100 characters.
Just like typeset prose uses wrapping, because your paper isn’t infinitely wide.
Good thing about Markdown is that the lack of a proper spec means you can pick one you like (when possible). Pandoc for instance treats input Markdown line-breaks in a sane way, allowing semantic breaks to not affect the output.
this seems to consider "text being read after formatting" and "text being read before formatting" as different things.
Which I guess, if you're the sole author of the text might be true.
But in my experience most text that gets rendered is also read and edited by multiple people in its source form, so why wouldn't you want to make source just as easy to read?
I think it's about optimizing for different types of reading. When you're reading the final text, you're reading to absorb the content. When you're reading the source text, you're reading to find edits you want to make. Using more line breaks is a way of making the document easier to scan if you're familiar with the "shape" of it.
I can confirm that while this technique feels odd at first, it does prove very useful when you need to edit or shorten a "wall of text", or express something very concisely. Somehow, the relevant/irrelevant parts of a sentence or thought stand out better this way.
IIRC, I even wrote a simple Vimscript function back in the day to join a "ventilated" section back into a traditional, coherent sentence.
Hi buddy you might be colorblind: git diff uses green for added things, not blue. That includes --color-words.
You might've also changed your color theme and forgotten. You can pipe through less (or something else that doesn't understand the control codes) and look for the escape sequences: ^[[31m is red, ^[[32m is green, and ^[[34m would be blue (although ^[[36m is kind of a light blue/cyan). That tells you whether you've reconfigured your terminal or git's default colors; if you haven't done either, you might want to get your color vision checked!
I’ve often thought this would be useful for version control and change review, since it allows diffs to be a lot less noisy. I’m imagining how much easier it would be to review a PR with significant README edits if the file was already structured with semantic line breaks.
I’ve previously had the above thought and applied it to the end of sentences, but the idea of introducing them at the level of semantic thought had not occurred to me. But if this is where we’re going I’d start to wish for indentation possibilities. I do this frequently with SQL statements, introducing both line breaks and indentations to provide a visual structure that mimics the semantic structure of clauses and the details they contain.
Indeed. Edits show up as a -/+ on just the sentence or clause that has changed. Contrast with hard-wrapped text, where a single word change towards the beginning of a paragraph can cause the entire paragraph to be replaced in the diff view, as things reflow.
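A quick way to see this effect is to diff the same one-word edit in both layouts. The Python sketch below uses only the standard library (difflib and textwrap); the sample sentences, the inserted word, and the 40-column wrap width are arbitrary choices for the demonstration, and the helper name is made up.

```python
import difflib
import textwrap


def changed_line_count(old: list[str], new: list[str]) -> tuple[int, int]:
    """Return (lines removed, lines added) between two versions."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    removed = added = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed += i2 - i1
            added += j2 - j1
    return removed, added


# Source with semantic line breaks: one sentence or clause per line.
old_semantic = [
    "All human beings are born free and equal in dignity and rights.",
    "They are endowed with reason and conscience",
    "and should act towards one another in a spirit of brotherhood.",
]
# The edit: a single word inserted near the start of the paragraph.
new_semantic = [old_semantic[0].replace("born free", "born entirely free")]
new_semantic += old_semantic[1:]

# The same two versions, hard-wrapped at a fixed column instead.
old_wrapped = textwrap.wrap(" ".join(old_semantic), width=40)
new_wrapped = textwrap.wrap(" ".join(new_semantic), width=40)

semantic_change = changed_line_count(old_semantic, new_semantic)
wrapped_change = changed_line_count(old_wrapped, new_wrapped)
print("semantic breaks:", semantic_change)
print("hard-wrapped:   ", wrapped_change)
```

With semantic breaks the edit touches exactly one line; in the hard-wrapped version the inserted word pushes text onto following lines, so several lines show up as changed even though only one word differs.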
There is a very good technical argument for NOT using "semantic" line breaks when editing markup source code, especially of the "hardwrap" variety, and that is the ability to easily diff two versions of the same document, e.g. when comparing latex git commits.
Anything that rearranges the sentence for the sake of maintaining justification completely prevents any meaningful diff from taking place.
And ideally your editor should support both hard and soft wrapping, so that aesthetics of wrapping shouldn't be a big issue.
I made a command-line tool [0] powered by Transformer models that breaks lines in a text file at semantic boundaries. It supports multiple file types including LaTeX, Markdown, and plain text, with automatic file type detection.
If your editor auto reflows the text, that will conflict with this, by erasing line breaks you inserted.
This is imposing an 80-character line length limit. With a line length limit, I want an editor to reflow my text so I don't have to do the line length limit manually.
Reflowing the current line won't work. If I delete a word, I need the next line's text to be flowed up to join with the current line. And that might require reflowing of subsequent lines as well.
Reflowing the current sentence won't work, because this style involves inserting semantic line breaks in certain places within a single sentence.
Reformatting a user selected portion is going to take more effort on my part. Currently my editor reflows the entire file every time the file is saved; I don't have to think about which part to reflow.
> Without any line breaks at all, this paragraph appears in source as a long, continuous line of text
Of course it doesn't because
> (which may be automatically wrapped at a fixed column length, depending on your editor settings):
Indeed, are you short on apps that support this ancient text formatting feature?
> Adding a line break after each sentence makes it easier to understand the shape and structure of the source text
Nope again, visually you've just wasted my device's width or overestimated my smartphone's width and I get exactly the same issue you've just complained about: a single sentence that doesn't fit.
Semantically, what you're looking for already exists and is called a paragraph. A sentence has a different meaning, which you break by line breaking after every single one. It kills the structure, not "makes it easier to understand the shape and structure of the source text" (also, bullet points exist)
PS By the way, why deprive readers of extra clarity offered by this formatting?
> We can further clarify the source text by adding a line break after the clause “with reason and conscience”. This helps to distinguish between the “and” used as a coordinating conjunction between “reason and conscience” and the “and” used as a subordinating conjunction with the clause
I think you might be misunderstanding. The semantic line breaks described here are not shown to readers. They are visible only to the person writing/editing the text, as a tool for their own use. If you aren't someone who finds a tool like this useful for your own writing, then no worries! Nobody has been harmed by this existing but not being used. It has no effect on the result.
While I never knew there was a name for this, I naturally do something very similar when writing, keeping thoughts separated by at least a line or two, even if I imagine they'll be in the same paragraph in the end result, just so I have a visual sense of where my different thoughts are and how long they are.
GP brought the point up that addresses this: "why deprive readers of extra clarity offered by this formatting?"
So if this is something that's valuable when reading or editing material, why not extend that to the final, rendered output?
To me, this smells of micro-optimization that's not well thought through: where are the boundaries between being able to edit vs being able to read? If we make every word be on a line by itself, you can use remove-line command in your editor, and diffs will automatically become word-diffs, and it would encourage writers to limit sentences to clearer sentences by fitting them into one ~50 line/word screen. By using double newlines, you can still keep "semantic" newlines too. Wouldn't that be appealing? "No" is what I would say.
I’ve always done similar. My initial writing is a disconnected jumble of sentence parts, sentences, more fleshed out paragraphs, etc just to get ideas out and they later get organized into something cohesive.
> are not shown to readers.
Sure they are, though the spec hides some readers behind other names like "editors, and other collaborators"
But also, have you never read the plain text / source of some markdown/other markup language written by someone else? Readme.md in its raw form?
And the spec explicitly applies to plain text, so it's self-contradictory as "the final rendered output" of plain text is... itself.
I like your extension of the term “readers” but I don’t think that’s the intended use for this matter. And if it were, would it be safe to assume that editors and other collaborators would consent to this standard?
> But also, have you never read the plain text / source of some markdown/other markup language written by someone else? Readme.md in its raw form?
That’s beside the point because the spec states "A semantic line break must not alter the final rendered output of the document.”
And I think you’re misinterpreting what “plain text” refers to here. Not .txt files exclusively, but the markup languages mentioned as well that are...plain text. The final rendered output of these kinds of documents is not the documents themselves.
The expectation is that the source of whatever flavor of plain text is not the final output.
If this practice offends you, don’t use it. This is a specification suggesting a practice for you* to use.
How have you been able to manage with hard-wrapped text elsewhere?
> And if it were, would it be safe to assume that editors and other collaborators would consent to this standard?
Easy no, only some of them in some instances. There is no uniformity at such a scale / variety of collaboration.
> That’s beside the point because the spec states
It's not, and I've addressed this in the very next semantic line! And you've also ignored the very point in your quoted line as well. Editing "Readme.md in its raw form" with the extra line breaks is still bad regardless of the final rendered output.
> Not .txt files exclusively
I don't need exclusivity, complementarity still works. And again, final output doesn't save you
> If this practice offends you, don’t use it.
If the criticism offends you, practice in the shadows and don't publish the raw misformatted specs/docs!
> How have you been able to manage with hard-wrapped text elsewhere?
Sometimes by batch-replacing those extra newlines in a text editor, sometimes by abandoning reading because the text reflow is too broken, sometimes just by plowing through while cursing the cavemen that force their habits onto the readers with different devices.
I imagine this type of formatting caused you to sneak in a typo or two like "free to the render" — if you had it as a free-flowing sentence, it'd be easier to catch.
The point is that if these formatting snippets are useful for reading, we should extend this to all the readers too. If they are not, we should be mindful not to micro-optimize lest we confuse others and ourselves.
I'm actually somewhat torn on this.
In my blog, I do this in my poems, such as: https://alejo.ch/39l — I don't expect this to be controversial, makes sense for poems, right?
However, I'm also experimenting with rendering my prose with the same type of breaks, like https://alejo.ch/3gb or https://alejo.ch/3g9
My guess, reading this thread, is that most people would tell me that they find the breaks annoying and would rather read my prose without the breaks? Hmm. Would love to hear some feedback.
It's pretty annoying on a phone screen where there are additional line breaks: I actually lose the flow when I need to skip lines.
Maybe it'll be better on a larger screen, but due to more frequent line breaks, I would advise you to use a serif font.
I’m clearly biased. But I enjoyed the format and it's risks like this that make me more likely to read the content where under different circumstances I may not even care about your professional experiences. The smaller font size mitigates any issues due to the line breaks because I can still see all the text.
Although I can get how someone else would feel that the text is too small and if the size was a conscious decision on your part to accommodate the line breaks they would hold you to blame further.
I think you’ve got a nice personal website overall. Even down to the drafts that lead to 404 errors; a nice touch even if unintentional.
Everybody wants personality to return to the Web again until they’ve got to deal with personalities.
You win some, you lose some.
Welp. You caught me. And I appreciate your point.
I still might batch process all my notes to break at words up to 72 characters anyway! But I’ll be mindful to wear safety goggles and proofread before grinding axes in public.
And I’m leaving the typo above in situ so as not to mislead you that I’m less prone to error under normal circumstances. I’m a WIP.
Not at all my point to "catch" you, just to demonstrate that some of the claimed benefits are not really there.
We all make typos regardless of the editing format: for some cases it might be better to go with these "semantic line breaks", for others not so much.
While it's an interesting thought experiment, it should be obvious that the claims are not objectively true.
Perhaps a full-on scientific study is in order? :)
I think this proposal is aimed at folks who won't do things like view plain-text documents on a smartphone, which I think is a reasonable assumption/tradeoff. I think bringing up viewing plaintext on your phone is... a misplaced optimization.
Regardless, if your point is that actually interacting with such a document would be annoying, I agree. I think anything that's giving me jagged paragraphs instead of nicely wrapped text would drive me crazy (though I'm a lover of hard wrapped plaintext). I know one day I'll end up "tidying the formatting" of a markdown doc formatted in this way and end up getting linked to this proposal.
> proposal is aimed at folks who won't do
What's the mechanism that will exclude all the writers/editors/collaborators/readers who are not part of the aimed at group? Which reason helps here?
Prior art on writing line oriented prose comes from one B. Kernighan, no less! Via this blog post:
https://rhodesmill.org/brandon/2012/one-sentence-per-line/
> Start each sentence on a new line. Make lines short, and break lines at natural places, such as after commas and semicolons, rather than randomly. Since most people change documents by rewriting phrases and adding, deleting and rearranging sentences, these precautions simplify any editing you have to do later.
— Brian Kernighan, 1974
This is how all the Unix documents were written.
See e.g. https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/doc/cacm/p...
Related HN Thread: https://news.ycombinator.com/item?id=4642395
I've been doing this for some time now and it indeed makes editing text in vim easy. Just `dd` and `p`. And as the article mentions, diffing is much cleaner.
There's a caveat however, if you comment a line (to keep the thought, but see how it would work without) some Markdown parsers will interpret it as an empty line, and thus create a new paragraph. It then becomes necessary to remove one of the semantic newlines, which looks rather messy.
> A semantic line break SHOULD occur after an […] em dash (—).
I agree with this, however it means that no existing markup language supports semantic line breaks, because every last one of them just turns the break into a space—and em dashes are, in most locales, not to be surrounded by a space. Consequently, you’ll end up with a stray space if you do this.
My irritation at being unable to break after an em dash (which I want to do quite frequently) was one of the things that headed me down the path of designing my own lightweight markup language (LML), to fix this and other problems I observe with existing LMLs. I’ve been using it for all my personal writing for something like four years now (though a fair bit has changed since then), and I expect to finally have a functioning parser before the end of this year.
One of the other fun complications of this kind of line break in source code is languages that don’t have a word divider—inserting a space at all is incorrect in them.
CSS presently just leaves such decisions UA-defined <https://drafts.csswg.org/css-text-4/#line-break-transform>:
> any remaining segment break is either transformed into a space (U+0020) or removed depending on the context before and after the break. The rules for this operation are UA-defined in this level.
My LML currently turns segment breaks into a space unless the line ends with an en or em dash, unless there’s a colon or a space before that. I haven’t got anything in place for languages with no word separator yet, but it is unusually well-suited to such languages.
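For anyone who wants to play with that rule, here is a minimal Python sketch of one reading of it: a segment break becomes a space, unless the line ends with an en or em dash that is not itself preceded by a space or a colon. The function name `join_segment_breaks` is made up for illustration, and this is not the parser mentioned above, just an approximation of the described behaviour for languages that do use a word separator.

```python
# Hypothetical helper: an approximation of the rule described in the comment
# above, not the author's actual implementation.
DASHES = ("\u2013", "\u2014")  # en dash, em dash


def join_segment_breaks(paragraph: str) -> str:
    """Join the source lines of one paragraph into a single rendered line."""
    lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
    if not lines:
        return ""
    out = lines[0]
    for line in lines[1:]:
        ends_with_dash = out.endswith(DASHES)
        before_dash = out[-2:-1]  # character before the trailing dash, "" if none
        if ends_with_dash and before_dash not in (" ", ":"):
            out += line  # tight dash: suppress the space
        else:
            out += " " + line  # ordinary break, or spaced dash: insert a space
    return out
```

With this, a line ending in a tight em dash is glued to the next line, while a line ending in a spaced dash (the French or Russian style mentioned elsewhere in the thread) still gets its space, without any language detection.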
More folks should define their own lightweight markup languages! It’s fun and makes your writing and notes feel more like your own.
I created a convention for defining sub-notes (with frontmatter) in a Markdown note and have found it really helpful over the past few years.
I used to do this with RST, though a backslash is needed at the end of the line to escape the newline.
I don’t like reStructuredText’s backslash behaviour, because it means two completely different things. Or arguably three. Normally it means to interpret the next character literally, but if it’s followed by whitespace (typically space or newline) it instead removes that next character. Except… not entirely in the case of newline, because it’s character-level markup, and at the end of a block it just does nothing. In
you might expect to get “a b” or an error, but actually you get a single-item definition list with term “a” and definition “b”, just the same as if you had omitted the backslash.
A far more logical meaning of a trailing backslash is to escape the newline, meaning, in HTML terms, insert <br>. That’s what I chose in my LML, and I later learned CommonMark chose that too.
> meaning of a trailing backslash is to escape the newline
That's what it does in this example. Don't have to use other cases, and don't believe I did.
In hindsight “escape” was a poor choice of word, but I did explain it and you omitted that from your quote: “meaning, in HTML terms, insert <br>”. And that’s not what reStructuredText does. Rather, at the end of a line, backslash acts like a line continuation character (… that only works in certain circumstances), a behaviour commonly found in programming languages inside at least string literals, but such languages aren’t using backslash as “escape the next character”, but rather they have a fixed set of escape sequences like \n or \uXXXX.
> em dashes are, in most locales, not to be surrounded by a space
This is definitely not the case for at least French and Russian, which means markup renderers now have to guess text language or force authors to declare such in some metadata header. And it gets even more complicated with inclusion of block quotes in different languages.
It’s not hard and doesn’t need language awareness; I described how to detect it: if there’s no space before an end-of-line em dash, suppress the segment-break-replacing space.
There seem to be some locales or styles that use asymmetric spacing. From the Zen of Python—note different spacing based on context and position within the sentence:
You have missed a joke: https://bugs.python.org/issue3364.
Unicode has U+200B ZERO WIDTH SPACE for that purpose. In HTML and hence Markdown you can also use `<wbr>`. If you’re using a custom setup anyway, you can have it be inserted automatically by regex replacement, as a pre-rendering step.
I think you’ve misunderstood something? This is about suppressing the turning of a segment break into a space, not about line break opportunities.
> Unicode has U+200B ZERO WIDTH SPACE for that purpose.
ZWSP is not at all “for that purpose”. If you mean this:
Well, I am mildly surprised to find that no extra space is added in Gecko or Blink. But in WebKit, a space is still added; for this is part of the “UA-defined” bit I quoted.
And if you’re willing to do preprocessing, you can just merge the lines, that’d actually work.
> In HTML and hence Markdown you can also use `<wbr>`.
I fail to see how <wbr> is relevant.
Indeed, I skimmed a bit and misread “unable to break” to mean that you wanted a line-break opportunity but the renderer didn’t allow for it when a letter is directly following an em dash. But it’s the other way around, you want a line break in the source after an em dash to not translate into a space in the rendering. This would likewise be possible to handle by regex replacement as a pre-rendering step.
More generally, I see markup languages and the details of how they are rendered as largely orthogonal. You don’t necessarily need to invent a different markup language in order to adjust the rendering.
> More generally, I see markup languages and the details of how they are rendered as largely orthogonal. You don’t necessarily need to invent a different markup language in order to adjust the rendering.
There’s not much to a markup language beyond how it’s rendered. If you don’t ever want to render it to something other than plain text, just write plain text however you desire. The reason for choosing a particular markup language is to express intended semantics (for plain-text and rendered use), and to render it. The semantics aspect is legitimate, so I won’t say the language and rendering are identical or parallel, but they’re definitely nothing like orthogonal. If you’re using a CommonMark pipeline, any preprocessing you do means you’re not actually writing in CommonMark, but an incompatible variant of it. You may well deem it worthwhile, but it’s no longer the same markup language.
I'm a dyed-in-the-wool, responsive, readable, internationalizable, accessible, standards-based, enshyenist:
Instead of using an unbreakable em dash to rigidly and unbreakably connect two phrases by their last and first words, I prefer using an en dash, followed by a shy hyphen, and then another en dash, to elegantly hyphenate words connected by em dashes when they don't fit on the line. ;)
–­–
Few fonts will render this nicely; the dashes are unlikely to join. Also if it does break at the soft hyphen, you’ve got an extraneous hyphen added on the first line.
If I were doing that, I’d probably use a zero-width space instead of a soft hyphen. Same break opportunity, removes the extraneous undesirable hyphen if it breaks, but introduces a new word boundary so that wordwise selection can now split your wonky dash. Therefore I suggest <span style=user-select:all>–​–</span> because if you’re going to do something ridiculous you might as well embrace the ridiculosity.
The main reason I use semantic line breaks, not explicitly mentioned in this article, is that it minimizes reformatting when editing. Only the subclause being edited is reformatted, while the rest of the paragraph remains as-is. This also minimizes the changes in line-oriented diffs.
While one could rely on automated line-wrapping instead of using hard line breaks that require reformatting, it isn’t usefully available in all environments, in particular for indented paragraphs and when having elements like ASCII art or code that shouldn’t be word-wrapped, and it makes plain-text diffs larger than necessary when whole paragraphs are on a single source line.
Wouldn't putting a single word per line minimize reformatting and changes in line-diffs further?
From an opposite angle, I imagine you do not use semantic line breaks in ASCII art or code that shouldn't be word-wrapped, so I imagine why not push for that too?
Mostly to point out that this is a micro-optimization for little demonstrable benefit.
I don't get it.
TBH most of the time I find markdown's collapsing of whitespace annoying - if you want a 'visual' line-break you have to add unnatural double space at the end of preceding line. And even this is renderer dependent, I don't think it is part of the spec (?) so some renderers don't respect it (and IIRC GitHub comments renderer doesn't need it, i.e. doesn't do semantic line breaks)
Another pet hate is text editors which auto-convert double space into ". " - I find this even cropping up in IDEs now, so you try to add an end of line comment "...] # here" and it turns into "...]. # here". Awful
> if you want a 'visual' line-break you have to add unnatural double space at the end of preceding line.
That’s just a bad syntax choice on Gruber’s part. CommonMark adds trailing backslash as an alternative, so that will work in most places these days.
> And even this is renderer dependent, I don't think is part of the spec (?)
Yes it is. Quoting https://daringfireball.net/projects/markdown/syntax: “When you do want to insert a <br /> break tag using Markdown, you end a line with two or more spaces, then type return.”
> IIRC GitHub comments renderer doesn't need it
Yes, GitHub decided on a wilful violation of Markdown for issues and discussions.
> text editors which auto-convert double space into ". "
I have seen that as a feature on Android keyboards, but I would be very much surprised to find it in non-keyboard software.
> > text editors which auto-convert double space into ". "
> I have seen that as a feature on Android keyboards, but I would be very much surprised to find it in non-keyboard software.
It just happened to me in VS Code! Not even a Copilot thing
[a few moments later...]
It turns out to be a macOS system setting, defaulted to on, that is polluting everywhere
https://github.com/AdamMaras/vscode-overtype/issues/9#issuec...
Fun. That shuffling of its location in Ventura reminds me of how Windows has shifted some things over time, sometimes more than once or twice. The dialog where you can stop Ctrl+Shift from switching keyboard layout has been in at least eight different places over time <https://superuser.com/q/109066>, possibly more.
> That’s just a bad syntax choice on Gruber’s part.
I believe Gruber was inspired by how people wrote emphasis in plain text emails and other text documents. Most MUAs at the time would treat trailing whitespace as a hard rather than soft line break. This is from my--now aging--memory, and I can't find a source to corroborate. I do recall, though, there were clients that didn't do it well (ahem, Outlook), and would break plain text formatting of deeply-nested quoted text. (Don't even get me started on how Outlook single-handedly changed culture from bottom posting to top posting).
> (Don't even get me started on how Outlook single-handedly changed culture from bottom posting to top posting).
Or how it single-handedly kept HTML for email frozen with an incomplete and buggy implementation of HTML 3.2 from 1997…
> are not shown to readers.
I also wonder, why conceal bits of information from readers, when they could possibly benefit from them the same way editors and writers do? Admittedly, the outcome then seems like poetry, but … why not?
To give it a shot on that page, a simple way to see these breaks is to run
in the devtools console. (Using `pre-wrap` instead of `pre-line` is also interesting: it indents "wrapped" lines by the source code indent, which gives it even more clarity.)
(By the way, HN comments also preserve line breaks in the source output, but unless revealed by some extra style, they are usually not presented on the surface.)
Here's a userscript to apply it automatically, and show all the extra whitespaces people put in their comments. (There aren't many.)
The problem is that this makes having line breaks that are not paragraph breaks in the output much more awkward and I think those are much more important than line breaks that are only there in the source.
This is especially true for Markdown which is supposed to be a pretty rendering of conventions that were already common in text only communication so it's weird when explicitly entered line breaks are ignored in the output.
The significant majority of markup languages essentially treat a single line break as a space. HTML, Markdown, et cetera. In lightweight markup languages, you normally need a blank line (i.e. two line breaks) to signify a paragraph break.
GitHub issues and discussions are an outlier in treating them as hard single line breaks (which are not paragraph breaks).
Most plain-text communication used to use line wrapping, often not supporting lines above, say, 100 characters.
Just like typeset prose uses wrapping, because your paper isn’t infinitely wide.
Good thing about Markdown is that the lack of a proper spec means you can pick one you like (when possible). Pandoc for instance treats input Markdown line-breaks in a sane way, allowing semantic breaks to not affect the output.
this seems to consider "text being read after formatting" and "text being read before formatting" as different things.
Which I guess, if you're the sole author of the text might be true.
But in my experience most text that gets rendered is also read and edited by multiple people in its source form, so why wouldn't you want to make source just as easy to read?
I think it's about optimizing for different types of reading. When you're reading the final text, you're reading to absorb the content. When you're reading the source text, you're reading to find edits you want to make. Using more line breaks is a way of making the document easier to scan if you're familiar with the "shape" of it.
This seems related to "ventilated prose", as invented by Buckminster Fuller: https://vanemden.wordpress.com/2009/01/01/ventilated-prose/
I can confirm that while this technique feels odd at first, it does prove very useful when you need to edit or shorten a "wall of text", or express something very concisely. Somehow, the relevant/irrelevant parts of a sentence or thought stand out better this way.
IIRC, I even wrote a simple Vimscript function back in the day to join a "ventilated" section back into a traditional, coherent sentence.
The article mentions the git diffing command `git diff --word-diff`, which is cool, but I find an even better version to be:
which shows words removed in red, and words added in blue. The output produced is similar to `latexdiff` in case you're familiar.
Hi buddy you might be colorblind: git diff uses green for added things, not blue. That includes --color-words.
You might've also changed your color theme and forgotten. You can pipe through less (or something else that doesn't understand the control code) and look for ^[[31m is red; ^[[32m is green; ^[[34m would be blue (although ^[[36m is kindof a light blue/cyan) to tell if you've reconfigured your terminal, or you've reconfigured git's default colors, but if you haven't done either you might want to get your color-vision checked!
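If squinting at raw escape codes in a pager is not appealing, the same check can be scripted. Below is a small Python sketch that lists which of the four colours named above appear in a captured chunk of terminal output. The function name `colours_used` is invented for this example, and it only understands the basic SGR codes 31, 32, 34 and 36; 256-colour or truecolour sequences would need more careful parsing.

```python
import re

# Only the basic foreground codes mentioned in the comment above.
SGR_COLOURS = {31: "red", 32: "green", 34: "blue", 36: "cyan"}
# ESC [ <numbers separated by ;> m
SGR_PATTERN = re.compile(r"\x1b\[([0-9;]*)m")


def colours_used(raw: str) -> set[str]:
    """Return the names of the known colour codes found in raw terminal output."""
    found = set()
    for match in SGR_PATTERN.finditer(raw):
        for part in match.group(1).split(";"):
            if part.isdigit() and int(part) in SGR_COLOURS:
                found.add(SGR_COLOURS[int(part)])
    return found
```

One way to use it would be to capture the output of `git diff --color=always --color-words` into a string and pass that in: seeing "blue" in the result would point at a git colour setting rather than at the terminal theme, since the theme only changes how a code is drawn, not which code git emits.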
Oh yes, I found this in my `~/.gitconfig`:
I guess I changed to match the red-blue color convention I'm used to from latexdiff.
I’ve often thought this would be useful for version control and change review, since it allows diffs to be a lot less noisy. I’m imagining how much easier it would be to review a PR with significant README edits if the file was already structured with semantic line breaks.
I’ve previously had the above thought and applied it to the end of sentences, but the idea of introducing them at the level of semantic thought had not occurred to me. But if this is where we’re going I’d start to wish for indentation possibilities. I do this frequently with SQL statements, introducing both line breaks and indentations to provide a visual structure that mimics the semantic structure of clauses and the details they contain.
Indeed. Edits show up as a -/+ on just the sentence or clause that has changed. Contrast with hard-wrapped text, where a single word change towards the beginning of a paragraph can cause the entire paragraph to be replaced in the diff view, as things reflow.
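To make that contrast concrete, here is a rough Python sketch using only the standard library. It inserts one word near the start of a paragraph and counts how many lines a line-oriented diff would flag, once with semantic line breaks and once with the same text hard-wrapped at 40 columns. The helper `changed_lines` and the 40-column width are arbitrary choices for the demo, and `difflib` stands in for whatever diff tool you actually use.

```python
import difflib
import textwrap


def changed_lines(before: list[str], after: list[str]) -> int:
    """Count the lines a line-oriented diff would mark as removed or added."""
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            total += (i2 - i1) + (j2 - j1)
    return total


semantic_before = [
    "All human beings are born free and equal in dignity and rights.",
    "They are endowed with reason and conscience",
    "and should act towards one another in a spirit of brotherhood.",
]
# The edit: insert a single word into the first sentence.
semantic_after = [semantic_before[0].replace("born free", "born completely free")]
semantic_after += semantic_before[1:]

# The same two versions, hard-wrapped at 40 columns instead.
wrapped_before = textwrap.wrap(" ".join(semantic_before), width=40)
wrapped_after = textwrap.wrap(" ".join(semantic_after), width=40)

semantic_changes = changed_lines(semantic_before, semantic_after)
wrapped_changes = changed_lines(wrapped_before, wrapped_after)
print("semantic:", semantic_changes, "hard-wrapped:", wrapped_changes)
```

In the semantic version only the edited sentence is touched (one line removed, one added), while in the hard-wrapped version the inserted word pushes text onto the following lines and the reflow ripples through the rest of the paragraph.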
There is a very good technical argument for NOT using "semantic" line breaks when editing markup source code, especially of the "hardwrap" variety, and that is the ability to easily diff two versions of the same document, e.g. when comparing latex git commits.
Anything that reorganises the sentence for the sake of maintaining justification completely prevents any meaningful diff.
And ideally your editor should support both hard and soft wrapping, so that aesthetics of wrapping shouldn't be a big issue.
And I say this as a fan of hardwrapping text.
I think you’ve got things back to front. Semantic line breaks improve diffing.
I made a command-line tool [0] powered by Transformer models that breaks lines in a text file at semantic boundaries. It supports multiple file types including LaTeX, Markdown, and plain text, with automatic file type detection.
[0]: https://github.com/admk/sembr
If your editor auto reflows the text, that will conflict with this, by erasing line breaks you inserted.
This is imposing an 80-character line length limit. With a line length limit, I want an editor to reflow my text so I don't have to do the line length limit manually.
Many editors allow reformatting a user-selected portion or the current line, and some even the current sentence, with a simple keyboard shortcut.
Reflowing the current line won't work. If I delete a word, I need the next line's text to be flowed up to join with the current line. And that might require reflowing of subsequent lines as well.
Reflowing the current sentence won't work, because this style involves inserting semantic line breaks in certain places within a single sentence.
Reformatting a user selected portion is going to take more effort on my part. Currently my editor reflows the entire file every time the file is saved; I don't have to think about which part to reflow.
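The conflict is easy to reproduce outside an editor. This is only a sketch: Python's `textwrap.fill` plays the role of a greedy "reflow on save" command here, which is an assumption about how such a feature typically behaves rather than a description of any particular editor.

```python
import textwrap


def reflow(paragraph: str, width: int = 80) -> str:
    """Greedy reflow: discard existing line breaks, then re-wrap at `width`."""
    return textwrap.fill(" ".join(paragraph.split()), width=width)


source = (
    "All human beings are born free and equal in dignity and rights.\n"
    "They are endowed with reason and conscience\n"
    "and should act towards one another in a spirit of brotherhood."
)
reflowed = reflow(source)
print(reflowed)
```

The words survive but the breaks after "rights." and "conscience" do not, so a file maintained with semantic line breaks would need the automatic reflow switched off, or replaced with something that only splits lines that are too long.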
Wonder if any linters know about this convention.
i thought this was for ruby and javascript and this would be really cool.
automated formatting including newlines, would be great.