Re: [Groff] Permissible characters for hyphenation


From: Steffen Nurpmeso
Subject: Re: [Groff] Permissible characters for hyphenation
Date: Mon, 30 May 2016 17:09:14 +0200
User-agent: s-nail v14.8.8-237-g587085a

John Gardner <address@hidden> wrote:
 |On 30 May 2016 at 23:20, Steffen Nurpmeso <address@hidden> wrote:
 |> John Gardner <address@hidden> wrote:
 |>|> I have been convinced that soft hyphen is a control character and
 |>|> not something visual,
 |>|
 |>|Almost correct.
 |>|
 |>|Soft hyphens *do* describe potential breaking points, but they only
 |>|become visible when surrounding text is broken.

 |> Yes.  For display purposes however i think U+00AD can't be used
 |> directly, but will be replaced by the renderer to either nothing,
 |> if no wrap is to be applied at the character position, or
 |> something appropriate, like ASCII hyphen-minus or some extended

 |>|Web authors were encouraged to use the more semantic and reliable <wbr/>
 |>|element <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/wbr>
 |>|instead.
 |>
 |> I am, for one, sure that the HTML standard committee will someday
 |> manage to add markup for shitty baby napkins.  The palms and
 |> beaches of their happenings seem to promote this direction. ^.^
 |
 |I, uh, think something might've been lost in translation. :|

So.  That made me search the web and i found:

  On UTF-8 encoded pages, <wbr> behaves like the U+200B ZERO-WIDTH
  SPACE code point. In particular, it behaves like a Unicode bidi
  BN code point, meaning it has no effect on bidi-ordering: <div
  dir=rtl>123,<wbr>456</div> displays, when not broken on two
  lines, 123,456 and not 456,123.

  For the same reason, the <wbr> element does not introduce
  a hyphen at the line break point. To make a hyphen appear only
  at the end of a line, use the soft hyphen character entity
  (&shy;) instead.

  This element [.] was officially defined in HTML5.
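
To make the difference between the two concrete, here is a minimal
sketch in Python.  It is not taken from any of the quoted pages:
the function name break_lines and the greedy strategy are made up
for illustration, and no browser (or groff) is claimed to work
this way.  It only assumes what the quote says: a break at U+00AD
shows a hyphen, a break at U+200B (what <wbr> is said to behave
like) shows nothing, and an unused opportunity is invisible.

```python
SHY = "\u00ad"   # SOFT HYPHEN, what &shy; stands for
ZWSP = "\u200b"  # ZERO WIDTH SPACE, what <wbr> is said to behave like


def break_lines(text, width):
    """Greedy line breaker (illustrative only).

    Breaks are allowed at spaces, SHY and ZWSP.  For simplicity the
    hyphen added at a SHY break is not counted against the width.
    """
    # Split into (chunk, marker) pairs; marker is the break
    # opportunity that follows the chunk ("" after the last one).
    pieces, chunk = [], ""
    for ch in text:
        if ch in (SHY, ZWSP, " "):
            pieces.append((chunk, ch))
            chunk = ""
        else:
            chunk += ch
    pieces.append((chunk, ""))

    lines, line, pending = [], "", ""
    for chunk, marker in pieces:
        # What is shown if we do NOT break at the pending opportunity:
        # a space stays a space, SHY and ZWSP show nothing.
        glue = " " if pending == " " else ""
        if line and len(line) + len(glue) + len(chunk) > width:
            # Break here: only the soft hyphen becomes visible.
            lines.append(line + ("-" if pending == SHY else ""))
            line = chunk
        else:
            line += glue + chunk
        pending = marker
    lines.append(line)
    return lines
```

With a narrow width, "hyphen" + SHY + "ation" comes out as
"hyphen-" and "ation", whereas "123," + ZWSP + "456" comes out as
"123," and "456" with no hyphen; with a wide enough width both
stay on one line and neither marker is shown.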

My opinion: HTML was derived from SGML as a strict abstraction of
content and form(atting).
But afaik HTML has required any conforming application to support
Unicode for quite a long time, so why duplicate the
behaviour?  Is it because of «explicit is better»?  So, then.
Fine.  I also would use <span> above, but the bigger the choice,
the harder it is to choose (www.dict.cc).

Years ago i read Korpela's rant on this topic, but Markus Kuhn
also has something nice to say:

  The original HTML 2 specification [6] by Tim Berners-Lee et al.,
  still wisely leaves the semantics of SOFT HYPHEN untouched with
  the remark

    NOTE - Use of the non-breaking space and soft hyphen indicator
    characters is discouraged because support for them is not
    widely deployed.

  Unfortunately by HTML 4 [7], this had mutated into a complete
  reinterpretation of the purpose of the SOFT HYPHEN, compared to
  how it had been used over the past decade in output devices.
  What was originally a graphical character had turned into an
  invisible marker for a hyphenation opportunity:
  [.]
  This HTML 4 reinterpretation is essentially the semantics that
  Unicode then adopted as well.

May the majority be with you.

--steffen


