Bug 22073 - charmaps/UTF-8: wcwidth of U+00AD (soft hyphen): 0 or 1 ?
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: localedata
Version: 2.26
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL: https://www.cs.tut.fi/~jkorpela/shy.html
Keywords:
Depends on: 21750
Reported: 2017-09-03 20:42 UTC by Mike Frysinger
Modified: 2017-09-15 08:23 UTC (History)
fweimer: security-


Description Mike Frysinger 2017-09-03 20:42:22 UTC
+++ This bug was initially created as a clone of Bug #21750 +++

I’ve compared the new autogenerated column width from localedata/unicode-gen/utf8_gen.py with the results of the classical wcwidth() implementation from xterm (adjusted to Unicode 10.0.0) and found a few divergences (and bugs on my (MirBSD, which uses something based on xterm’s data system-wide) side, which I fixed).

U+00AD is forced to width 1 in xterm, autodetected as combining in glibc

Rationale for forcing it to 1 is likely that U+0000‥U+00FF are latin1, which, when displayed as 8bit on terminals, had no combining characters at all.

Change Request to glibc: force U+00AD to width 1.

more background discussion with different standards can be found here:
  https://www.cs.tut.fi/~jkorpela/shy.html
Comment 1 Mike Frysinger 2017-09-03 21:32:11 UTC
more discussion:
  https://github.com/jquast/wcwidth/issues/8
Comment 2 Troy Korjuslommi 2017-09-04 08:10:34 UTC
I reached a totally different conclusion from reading those links and
thinking about what wcwidth() should return for SHY.

When writing a curses/terminfo (terminal) application, one goes through
input and determines the width of text by iterating through the input
characters. If a word contains multiple U+00AD characters, at the end of
the line or not, the total width of the word ends up wrong if wcwidth is
set to 1. Therefore wcwidth(U+00AD) should return 0.
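
The width calculation described above can be sketched as follows. This is only an illustration, not code from any curses application: the helper name `word_width` and the `shy_width` parameter are made up here, and every character other than U+00AD is assumed to occupy exactly one column.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN

def word_width(word, shy_width):
    """Sum per-character column widths, as a simple terminal application might.

    Simplifying assumption: every character except SHY is one column wide
    (no wide or combining characters are handled here).
    """
    return sum(shy_width if ch == SHY else 1 for ch in word)

# "theosophy" with three hyphenation hints embedded.
word = "the" + SHY + "o" + SHY + "so" + SHY + "phy"

print(word_width(word, shy_width=0))  # 9
print(word_width(word, shy_width=1))  # 12
```

With a width of 0 the sum matches the nine visible letters; with a width of 1 each embedded SHY adds a column. Which of the two is "wrong" depends on what the terminal actually does with the character, which is the point argued in the rest of this thread.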

Also, using a SHY (U+00AD) character as a rendering hint seems to make
sense, since if a word is broken up with SHY characters, then a SHY
aware application can determine where to break the word, adding a
visible hyphen only at that position. A SHY non-aware application can
just ignore the SHY.

The Korpela article shed light on the confusion standard writers have
had with the issue. It seems clear to me that their intention has been
to add a character which can be used as a hint for breaking words
according to hyphenation rules. The imprecise wording used for
describing the solution has led to the current confusion. We should get
past the semantics of the standards' phrases and focus on the intent,
which is to allow authors to add hyphenation hints to text. 


Troy




Comment 3 Mike FABIAN 2017-09-04 08:16:22 UTC
Currently, in glibc master, we have set the width of U+00AD to 1.
Comment 4 Thorsten Glaser 2017-09-04 16:48:44 UTC
> When writing a curses/terminfo (terminal) application, one goes through
> input and determines the width of text by iterating through the input
> characters. If a word contains multiple U+00AD characters, at the end of
> the line or not, the total width of the word ends up wrong if wcwidth is
> set to 1. Therefore wcwidth(U+00AD) should return 0.

In your reading, everything but the conclusion is* correct.

*) if the application uses the soft hyphen char as soft hyphen


Basically, if the application decides U+00AD is expanded into a hyphen,
it must send a hyphen, NOT U+00AD, to the terminal, and if not, it must
send no character to the terminal.
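
A minimal sketch of that rule, under the assumption that the application has already chosen which soft hyphen (if any) becomes the line break. The function `render_word` and its `break_index` parameter are hypothetical names used only for this illustration.

```python
SHY = "\u00ad"   # U+00AD SOFT HYPHEN
HYPHEN = "-"     # U+002D HYPHEN-MINUS

def render_word(word, break_index=None):
    """Return the text chunks to send to the terminal for one word.

    break_index selects which soft hyphen (0-based) becomes a visible
    hyphen followed by a line break; None means the word is not broken.
    U+00AD itself is never part of the output.
    """
    parts = word.split(SHY)
    if break_index is None:
        return ["".join(parts)]
    head = "".join(parts[: break_index + 1]) + HYPHEN
    tail = "".join(parts[break_index + 1 :])
    return [head, tail]

word = "the" + SHY + "o" + SHY + "so" + SHY + "phy"
print(render_word(word))                 # ['theosophy']
print(render_word(word, break_index=1))  # ['theo-', 'sophy']
```

Because the output never contains U+00AD, an application written this way does not depend on the value wcwidth() reports for it.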

The reason here is that wcwidth() is the _width of the character ON THE
TERMINAL_ and not for use within the application. No terminal will break
on the soft hyphen, they’ll all break only at the last column in the
line; therefore, wcwidth of U+00AD *must* be 1.

Further reasons: compatibility with previous wcwidth implementations,
and that the first 256 chars are supposed to be latin1 which had a
wcwidth of 1 for all non-control characters (20‥7E, A0‥FF).
Comment 5 Mike Frysinger 2017-09-05 15:01:08 UTC
i don't think we have a choice here.  if the rest of the world is converging on the unicode standard view of the world, and it says 0, then we should do that as well.  trying to "take a stand" here won't help as long as the unicode consortium doesn't change, and i think they've settled the matter in their eyes.  if you want to deliberate the topic further, it'd probably be better spent doing so on their lists.

the unicode FAQ includes this entry [1] (which the korpela page called out):
Q: Unicode now treats the SOFT HYPHEN as format control (Cf) character when formerly it was a punctuation character (Pd). Doesn't this break ISO 8859-1 compatibility?
A: No. The ISO 8859-1 standard defines the SOFT HYPHEN as "[a] graphic character that is imaged by a graphic symbol identical with, or similar to, that representing hyphen" (section 6.3.3), but does not specify details of how or when it is to be displayed, nor other details of its semantics. The soft hyphen has had a long history of legacy implementation in two or more incompatible ways.
Unicode clarifies the semantics of this character for Unicode implementations, but this does not affect its usage in ISO 8859-1 implementations. Processes that convert back and forth may need to pay attention to semantic differences between the standards, just as for any other character.
In a terminal emulation environment, particularly in ISO-8859-1 contexts, one could display the soft hyphen as a hyphen in all circumstances. The change in semantics of the Unicode character does not require that implementations of terminal emulators in other environments, such as ISO 8859-1, make any change in their current behavior.

[1] http://www.unicode.org/faq/casemap_charprop.html#18

i think that answers the question here: in our UTF-8 charmaps, we should mark U+00AD as 0, but in our ISO 8859-1 (and other applicable legacy) charmaps, we should mark it as 1.
Comment 6 Thorsten Glaser 2017-09-06 14:38:06 UTC
Unicode does NOT define the column width of a char in the terminal. This shows in all those mailing list threads, in which they basically assume all fonts to be proportional.

wcwidth() however basically *is* the column width of a char in the terminal in a fixed-width cell layout.

The cōnsēnsus seems to be to ask _users_ to avoid using U+00AD because of the two different histories in interpretation, and to use something else for the separate purposes. That leaves us needing a definition for this char *should* it still appear anywhere.

I’m arguing for 1 because:

• 0 is for combining characters and NUL only
• the “possible soft hyphen” reading of U+00AD is not a combining character
• compatibility with previous/older/other wcwidth() implementations, most importantly

The 0 fraction should not be at a loss here because:

• The char should be avoided already *anyway*
• Terminal emulators never implement wrapping at a “possible soft hyphen”, only at the end of the line
• Unicode data is still available elsewhere, this bugreport is precisely about wcwidth() which only “almost” aligns with the various Unicode datas (yes, I know, wrong plural, but I can’t think of anything better to express what I mean, right now)
Comment 7 Thorsten Glaser 2017-09-06 14:41:18 UTC
(In reply to Mike Frysinger from comment #5)

> i think that answers the question here: in our UTF-8 charmaps, we should
> mark U+00AD as 0, but in our ISO 8859-1 (and other applicable legacy)
> charmaps, we should mark it as 1.

That could get ugly, assume you have an application displaying latin1 data on a UTF-8 terminal (GNU screen comes to mind, or luit from XFree86®). Those map 0xAD to U+00AD not U+002D…

Given that mfabian, as localedata maintainer of sorts, has already accepted the change, does this really still need to be discussed? (The copyright form arrived last night btw, I’m sending it back to the FSF ASAP.)
Comment 8 Mike Frysinger 2017-09-07 07:14:21 UTC
(In reply to Thorsten Glaser from comment #6)

i'm aware wcwidth isn't explicitly defined by Unicode standards, but that doesn't mean they completely ignore it.  they discuss terminal emulators multiple times (including the SHY FAQ), and it's why things like EastAsianWidth.txt exist in the first place.  it's also pretty clear what the current Unicode standard is wrt their intentions to this codepoint.

> • 0 is for combining characters and NUL only

that is incorrect.  you mishandle Prepended_Concatenation_Mark (see bug 22070), and ignore Format Character (Cf) characters which are all 0 (or you're incorrectly claiming that Cf's are not combining characters).  and Cf is what U+00AD is classified as.

> • the “possible soft hyphen” reading of U+00AD is not a combining character

except that it is.  if Unicode wanted it to be an explicit hyphen, they would have kept its class as Pd (punctuation character), not changed it to Cf (format control).  they also wouldn't have described it explicitly as:
Soft Hyphen. Despite its name, U+00AD soft hyphen is not a hyphen, but rather an
invisible format character used to indicate optional intraword breaks.
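
The properties being argued about can be inspected with Python's standard `unicodedata` module. This is only a quick look at the raw character data and does not by itself decide what wcwidth() should return; the exact Unicode version consulted depends on the Python build.

```python
import unicodedata

SHY = "\u00ad"

# Character name and general category of U+00AD.
print(unicodedata.name(SHY))       # SOFT HYPHEN
print(unicodedata.category(SHY))   # Cf (format control)

# Canonical combining class; 0 means "not a combining mark" in that sense.
print(unicodedata.combining(SHY))  # 0

# For comparison, the ordinary hyphen-minus is dash punctuation.
print(unicodedata.category("-"))   # Pd
```

Note that the general category (Cf) and the canonical combining class (0) are separate properties, which is part of why the two sides of this discussion use "combining" differently.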

> • compatibility with previous/older/other wcwidth() implementations, most
> importantly

appealing to historical wcwidth behavior isn't a great argument.  ones written to older Unicode standards are def wrong across many codepoints (emoji much?), and as i already mentioned, implementations converge on the latest Unicode releases.  all of which say this should be 0.

> • The char should be avoided already *anyway*
> • Terminal emulators never implement wrapping at a “possible soft hyphen”,
> only at the end of the line

then by your own argument, having it follow the Unicode standard is a non-issue

(In reply to Thorsten Glaser from comment #7)

if your terminal and the target application disagree about encoding then you've already lost.  everything above 0x7F will be wrong (0x80 != U+0080 or 0xc2 0x80).
Comment 9 Thorsten Glaser 2017-09-07 09:57:50 UTC
(In reply to Mike Frysinger from comment #8)

> > • 0 is for combining characters and NUL only
> 
> that is incorrect.  you mishandle Prepended_Concatenation_Mark (see bug
> 22070), and ignore Format Character (Cf) characters which are all 0 (or
> you're incorrectly claiming that Cf's are not combining characters).  and

OK, sorry about that. But xterm handles even those as such, basically
it combines the glyph for it (could be blank or just the dotted square)
over the preceding character, as they have no meaning for a terminal.


> > • compatibility with previous/older/other wcwidth() implementations, most
> > importantly
> 
> appealing to historical wcwidth behavior isn't a great argument.  ones

But this is more important than you make it sound.

> written to older Unicode standards

Sure, which is why I updated it to use the current Unicode data
as base, but there are a few cases which were specifically handled
explicitly different right from the start, and, with the changes
I described, mfabian’s code in glibc and mine in MirBSD come to
the same result modulo implementation differences.

(I also handle Prepended_Concatenation_Mark in MirBSD now in the
way you requested in bz#22070, so compatibility goes both ways.
My focus was on updating mgk25’s code in a compatible way, as to
not introduce any regressions; changes from later Unicode changes
are welcome, as are initial oversights such as this one (if it
existed back then), but as I said, U+00AD was special-handled
right from the beginning.)

> > • The char should be avoided already *anyway*
> > • Terminal emulators never implement wrapping at a “possible soft hyphen”,
> > only at the end of the line
> 
> then by your own argument, having it follow the Unicode standard is a

There is no Unicode standard for wcwidth().

> non-issue

It’s not, because with 0, applications displaying a simple charmap
for the first page (i.e. latin1) fail on X'AD'.

> if your terminal and the target application disagree about encoding then
> you've already lost.  everything above 0x7F will be wrong (0x80 != U+0080 or
> 0xc2 0x80).

You did not understand what I wrote.

Tools like GNU screen and XFree86® luit can convert between the encodings,
so they’d convert a \xA0 from the program (meaning 0xA0 in latin1) to
U+00A0 internally and then to \xC2\xA0 in UTF-8 for the screen, and back.

The *definition* of these mappings maps 0xAD from latin1 to U+00AD, not to
U+002D. (Changing _this_ would also be unwise as there’d be no way to type
latin1 0xAD any more.)
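
The mapping described above can be checked with Python's built-in codecs. This is a sketch of the byte-level conversion only, not of how GNU screen or luit are implemented; the helper name `latin1_to_utf8` is made up for the example.

```python
def latin1_to_utf8(data):
    """Re-encode ISO 8859-1 bytes as UTF-8 via their Unicode code points."""
    return data.decode("latin-1").encode("utf-8")

# 0xAD in latin1 is U+00AD, which is C2 AD in UTF-8 (not U+002D).
print(b"\xad".decode("latin-1") == "\u00ad")  # True
print(latin1_to_utf8(b"\xad"))                # b'\xc2\xad'

# The same holds for the no-break space, 0xA0.
print(latin1_to_utf8(b"\xa0"))                # b'\xc2\xa0'

# The round trip back to latin1 restores the original byte.
print(latin1_to_utf8(b"\xad").decode("utf-8").encode("latin-1"))  # b'\xad'
```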

Therefore, wcwidth(U+00AD) should stay at 1.

PS: Discussing this is really straining for me, and English is only my third non-programming language, so please read anything weird as I mean it, not as I formulated it.
Comment 10 Egmont Koblinger 2017-09-07 10:20:18 UTC
(In reply to Thorsten Glaser from comment #6)

> • The char should be avoided already *anyway*

Just wondering, isn't perhaps iswprint(0xAD) = 0, wcwidth(0xAD) = -1 also a sensible solution worth considering?
Comment 11 Egmont Koblinger 2017-09-07 16:50:05 UTC
To clarify my previous comment:

If compatibility is a concern then let's go with 1, I'm absolutely fine with that.

If compatibility is not so much of a concern, it feels to me that -1 is a more reasonable choice than 0.

Basically, out of the three possibilities 0 is the one I find the least reasonable. As for the other two, my gut feeling tells me to go with the backwards compatible 1, however, you guys have way better arguments pro or con than gut feeling so I cannot join that discussion.
Comment 12 Egmont Koblinger 2017-09-07 16:55:45 UTC
To further clarify:

By "compatibility" I meant compatibility with existing legacy apps.

Compatibility with (or let's rather say: proper implementation of) the recent Unicode standard, if we're okay with dropping backwards compatibility, is where I feel -1 might perhaps be the best choice.

Either-or, I can't really see 0 being justified.
Comment 13 Mike FABIAN 2017-09-07 20:19:48 UTC
(In reply to Egmont Koblinger from comment #11)
> To clarify my previous comment:
> 
> If compatibility is a concern then let's go with 1, I'm absolutely fine with
> that.
> 
> If compatibility is not so much of a concern, it feels to me that -1 is a more
> reasonable choice than 0.


> Basically, out of the three possibilities 0 is the one I find the least
> reasonable. As for the other two, my gut feeling tells me to go with the
> backwards compatible 1, however, you guys have way better arguments pro or
> con than gut feeling so I cannot join that discussion.

From the man page of wcwidth:

     The wcwidth() function returns  the number of columns needed
     to represent the wide character c.  If c is a printable wide
     character,  the value  is at  least 0.   If c  is null  wide
     character  (L'\0'),  the  value  is  0.   Otherwise,  -1  is
     returned.

The soft hyphen is printable (it is in the section “print” of LC_CTYPE
in localedata/locales/i18n), therefore the value returned by wcwidth()
is at least 0.  So -1 is not possible for the soft hyphen.
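
The contract quoted from the man page can be modelled in a few lines. This is a toy model, not glibc's implementation: `toy_wcwidth`, the `printable` set and the `widths` table are all hypothetical stand-ins for the locale data.

```python
SHY = "\u00ad"

def toy_wcwidth(ch, printable, widths, default=1):
    """Toy model of the man-page contract for wcwidth().

    printable: set of characters the locale treats as printable.
    widths:    explicit column widths; other printable chars get `default`.
    """
    if ch == "\0":
        return 0          # null wide character
    if ch not in printable:
        return -1         # not printable
    return widths.get(ch, default)

# While SHY is in the printable set, the result is at least 0.
print(toy_wcwidth(SHY, printable={"a", SHY}, widths={SHY: 1}))  # 1
print(toy_wcwidth(SHY, printable={"a", SHY}, widths={SHY: 0}))  # 0

# -1 is only reachable if SHY is also dropped from the printable set.
print(toy_wcwidth(SHY, printable={"a"}, widths={SHY: 1}))       # -1
```

In this model the choice between 0 and 1 is a matter of the width table alone, whereas -1 would require changing the character class data as well.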
Comment 14 Egmont Koblinger 2017-09-07 20:25:41 UTC
My recommendation to consider was to make wcwidth return -1 and, in the mean time, mark it as non-printable. As far as I understand, properly written apps should never print this character, right?
Comment 15 Egmont Koblinger 2017-09-07 20:47:44 UTC
Don't get me wrong... I'm not saying that this is the solution we should go with. I'm not arguing that -1 is the best choice. I just wanted to make sure that this possibility is also considered.

If the final decision is 1, I'm absolutely fine with that.

If the final decision is 0, I wouldn't be that happy because then I think -1 is a better choice, however, I'd still accept that decision.

I just wanted to give a heads up about a third possibility that probably wouldn't have been considered otherwise. The rest is up to you guys. Thanks for listening to me! :)
Comment 16 Mike FABIAN 2017-09-08 08:25:22 UTC
(In reply to Egmont Koblinger from comment #14)
> My recommendation to consider was to make wcwidth return -1 and, in the mean
> time, mark it as non-printable. As far as I understand, properly written
> apps should never print this character, right?

https://www.cs.tut.fi/~jkorpela/shy.html quotes ISO 8859-1 standard as:

>  The ISO 8859-1 standard defines, in section 6.3.3, both the graphic
>  presentation and the usage of soft hyphen, as follows:
> 
>     A graphic character that is imaged by a graphic symbol identical
>     with, or similar to, that representing hyphen, for use when a line
>     break has been established within a word.

So according to this, it should be printable.
Comment 17 Troy Korjuslommi 2017-09-11 11:06:10 UTC
I would like to point out that wcwidth of 1 for SHY would mean that
applications which haven't taken soft hyphens into consideration, as
they are rare in actual input, will display words with SHY in them very
awkwardly. Namely, as "the-os-o-phy" or "the os o phy." The actual
display will of course depend on the font in use. It can resemble a
hyphen or a space. Applications which are SHY aware, will of course
handle it separately, either breaking the word and adding a hyphen or
ignoring it.

I might add that I speculate that the reason SHY is so rarely used is
because of these kinds of disagreements over its display.

I don't see any disagreement over the intent of the SHY, so why not make
life easier for writers (who could then start including SHY in text) and
programmers (who would then find it worthwhile to write special handlers
for SHY)?
Comment 18 Thorsten Glaser 2017-09-11 15:44:46 UTC
(In reply to Troy Korjuslommi from comment #17)

> I don't see any disagreement over the intent of the SHY, so why not make
> the lives of writers (who could then start including SHY in text) and

That is done even when wcwidth(U+00AD) == 1, because the application would
never send U+00AD to the tty but always either no character or one of the
other hyphen-ish codepoints.

In fact, if a font renders U+00AD different from U+002D, an SHY-aware
application might even PREFER it to have wcwidth 1 because then it COULD
send U+00AD to the tty *in the places where it expands to a hyphenation*
(and just omit it where not).

For GUI applications, wcwidth() is of no meaning anyway.

> awkwardly. Namely, as "the-os-o-phy" or "the os o phy." The actual

“the-os-o-phy” (“theo-sophy” is how I’d split it, though) is common for
the soft hyphenation point editing mode of word processors.
Comment 19 Troy Korjuslommi 2017-09-14 12:51:38 UTC
I was referring to non-SHY-aware apps. When iterating through input in
curses code, one needs wcwidth() for at least two reasons. One is to
calculate space needed to display a word, and the other is to determine
the position of the cursor (only applicable when input contains 2 column
wide characters). If SHY has a wcwidth other than 0, the non-SHY-aware
applications will calculate the width incorrectly.

A non-SHY-aware application could easily add the U+00AD to the terminal,
and thus possibly cause cursor movement, and maybe even character
rendering, to occur.

An author who cares about grammar would actually hyphenate theosophy as
"the-o-so-phy." That was kind of my point, that words with more than two
syllables have two or more hyphens. And that hyphenation is a rule based
system, non-obvious and hard to guess, which is why SHY can be a useful
tool.
Comment 20 Egmont Koblinger 2017-09-14 13:04:44 UTC
(In reply to Troy Korjuslommi from comment #19)

> A non-SHY-aware application could easily add the U+00AD to the terminal,
> and thus possibly cause cursor movement, and maybe even character
> rendering, to occur.

There's two sides to this story: apps and terminal emulators. You seem to care about apps here, and forgot that altering wcwidth might have an effect on terminal emulators' behavior as well.

If all parties respect wcwidth() then either 0 or 1 is okay. In case of 0 the terminal emulator will not print anything nor advance the cursor, in accordance with what the app expects. In case of 1 the outcome again will be correct.
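
The agreement condition can be sketched by comparing the column count an application expects with the one the terminal produces. The names `columns` and `cursor_drift` are made up for this illustration, and all characters other than U+00AD are assumed to be one column wide.

```python
SHY = "\u00ad"

def columns(text, shy_width):
    """Columns used by text, assuming every non-SHY character is width 1."""
    return sum(shy_width if ch == SHY else 1 for ch in text)

def cursor_drift(text, app_shy_width, term_shy_width):
    """Difference between where the terminal puts the cursor and where
    the application believes it is, after printing text unmodified."""
    return columns(text, term_shy_width) - columns(text, app_shy_width)

text = "co" + SHY + "op"

# Both sides agree: no drift, whichever value they agree on.
print(cursor_drift(text, app_shy_width=0, term_shy_width=0))  # 0
print(cursor_drift(text, app_shy_width=1, term_shy_width=1))  # 0

# The sides disagree: the cursor ends up one column off per SHY.
print(cursor_drift(text, app_shy_width=0, term_shy_width=1))  # 1
print(cursor_drift(text, app_shy_width=1, term_shy_width=0))  # -1
```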

The point is to foresee the impact on apps as well as terminal emulators (and their combinations) that use hardcoded values rather than wcwidth. For example, I don't know if xterm always uses its built-in table, or only in certain cases; nor whether its author is open to adjusting the table to follow what gets decided in this bugreport. There's also vte's (gnome-terminal's) issue of using glib's method instead, but I can most likely change that if really needed.

(Plus, again, let's not forget about the case of ssh'ing between different systems, potentially either one not even glibc-based.)
Comment 21 Thorsten Glaser 2017-09-14 16:34:07 UTC
(In reply to Egmont Koblinger from comment #20)

> (In reply to Troy Korjuslommi from comment #19)
> 
> > A non-SHY-aware application could easily add the U+00AD to the terminal,
> > and thus possibly cause cursor movement, and maybe even character
> > rendering, to occur.

Yes, that would be correct. The terminal is, in your terminology, *also*
a non-SHY-aware application.

> > wide characters). If SHY is wcwidth other than 0, the non-SHY-aware
> > applications will calculate the width incorrectly.

No, actually, if wcwidth is anything other than *1* they will calculate
it incorrectly, because, to a terminal, the character will always have
a constant width. (If wcwidth were 0 and an SHY-aware application were
to send U+00AD to the terminal in the place where a break DOES occur,
the terminal could NOT emit a space-using glyph otherwise.)

> There's two sides to this story: apps and terminal emulators. You seem to
> care about apps here, and forgot that altering wcwidth might have an effect
> on terminal emulators' behavior as well.
> 
> If all parties respect wcwidth() then either 0 or 1 is okay. In case of 0

Indeed, both use wcwidth() and thus have to agree.

> (Plus, again, let's not forget about the case of ssh'ing between different
> systems, potentially either one not even glibc-based.)

One more point in favour of letting it stay at 1 to stay compatible with
everyone else in the world including previous releases.

> to follow what gets decided in this bugreport. There's also vte's
> (gnome-terminal's) issue of using glib's method instead, but I can most
> likely change that if really needed.

Either that, or add special handling of a couple of characters to vte…
it’ll likely handle stuff like direction changes or so already if it’s
not just a dumb terminal like xterm, so there’s bound to be a correct
place for it.
Comment 22 Egmont Koblinger 2017-09-14 18:25:47 UTC
(In reply to Thorsten Glaser from comment #21)

> Yes, that would be correct. The terminal is, in your terminology, *also*
> a non-SHY-aware application.

I'd rather not define this concept for terminals. They cannot make a choice in the sense client apps can.

Also for conciseness I'd reserve the word "app" or "application" for the client app that's running inside the terminal emulator, and not the terminal emulator itself in this discussion.

> No, actually, if wcwidth is anything other than *1* they will calculate
> it incorrectly, because, to a terminal, the character will always have
> a constant width. (If wcwidth were 0 and an SHY-aware application were
> to send U+00AD to the terminal in the place where a break DOES occur,
> the terminal could NOT emit a space-using glyph otherwise.)

Nope. SHY-aware apps by definition never send SHY to the terminal, they either send a regular hyphen U+2D or nothing at all, that's what makes them SHY-aware. (Especially since in several fonts the glyph of SHY is empty, it looks like a space.) If an app ever sends a SHY to the terminal emulator, it is SHY-unaware.

Hence for SHY-aware apps, wcwidth() of SHY is irrelevant.

For SHY-unaware ones it's important that what the application thinks will happen matches with what really happens in the terminal emulator. Both the application and the terminal emulator may or may not rely on wcwidth(), or the app may even rely on wcwidth() of a remote system.

> One more point in favour of letting it stay at 1 to stay compatible with
> everyone else in the world including previous releases.

I'm not arguing against 1 at all. In fact, my gut feeling tells me to go with 1 rather than 0. I just wouldn't want 0 being ditched with invalid arguments.

> Either that, or add special handling of a couple of characters to vte…
> it’ll likely handle stuff like direction changes or so already if it’s
> not just a dumb terminal like xterm, so there’s bound to be a correct
> place for it.

There's no BiDi in VTE, anyway, I wouldn't want to pollute this bugreport with this.
Comment 23 Thorsten Glaser 2017-09-14 19:03:51 UTC
> Nope. SHY-aware apps by definition never send SHY to the terminal, they either
> send a regular hyphen U+2D or nothing at all, that's what makes them SHY-aware.
> (Especially since in several fonts the glyph of SHY is empty, it looks like a
> space.) If an app ever sends a SHY to the terminal emulator, it is SHY-unaware.
>
> Hence for SHY-aware apps, wcwidth() of SHY is irrelevant.

OK, granted, if that is the sense, you are, of course, correct.
(But that also means that, if it’s irrelevant for them, which,
again, if they send U+002D to the terminal instead, it is, then
all the more reason to stick to 1.)
Comment 24 Mike FABIAN 2017-09-15 08:22:28 UTC
(In reply to Thorsten Glaser from comment #23)
> > Nope. SHY-aware apps by definition never send SHY to the terminal, they either
> > send a regular hyphen U+2D or nothing at all, that's what makes them SHY-aware.
> > (Especially since in several fonts the glyph of SHY is empty, it looks like a
> > space.) If an app ever sends a SHY to the terminal emulator, it is SHY-unaware.
> >
> > Hence for SHY-aware apps, wcwidth() of SHY is irrelevant.
> 
> OK, granted, if that is the sense, you are, of course, correct.
> (But that also means that, if it’s irrelevant for them, which,
> again, if they send U+002D to the terminal instead, it is, then
> all the more reason to stick to 1.)

Yes, that is really a good reason to stick to 1.
Comment 25 Mike FABIAN 2017-09-15 08:23:56 UTC
(In reply to Mike FABIAN from comment #24)
> (In reply to Thorsten Glaser from comment #23)
> > > Nope. SHY-aware apps by definition never send SHY to the terminal, they either
> > > send a regular hyphen U+2D or nothing at all, that's what makes them SHY-aware.
> > > (Especially since in several fonts the glyph of SHY is empty, it looks like a
> > > space.) If an app ever sends a SHY to the terminal emulator, it is SHY-unaware.
> > >
> > > Hence for SHY-aware apps, wcwidth() of SHY is irrelevant.
> > 
> > OK, granted, if that is the sense, you are, of course, correct.
> > (But that also means that, if it’s irrelevant for them, which,
> > again, if they send U+002D to the terminal instead, it is, then
> > all the more reason to stick to 1.)
> 
> Yes, that is really a good reason to stick to 1.

So it looks like we have reached some agreement that width 1 is OK
for the soft hyphen and I can close this bug as FIXED, right?

Closing as FIXED.