all messages for Emacs-related lists mirrored at yhetil.org
* bug#64420: string-width of … is 2 in CJK environments
@ 2023-07-02 12:57 Dmitry Gutov
  2023-07-02 13:10 ` Eli Zaretskii
                   ` (3 more replies)
  0 siblings, 4 replies; 41+ messages in thread
From: Dmitry Gutov @ 2023-07-02 12:57 UTC (permalink / raw)
  To: 64420

Hi! This was reported to company-mode 
(https://github.com/company-mode/company-mode/issues/1388), as a 
scenario that makes the overlay-based completion popup misrender because 
the columns are not computed right when that char is present.

To repro:

   (set-language-environment "Chinese-BIG5")
   (string-width "…") ;; => 2

In the default language environment its width is reported to be 1.

This doesn't seem to make sense because it's rendered one column wide 
either way.
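
The comparison above can be scripted. A minimal sketch, assuming a helper named my/ellipsis-width-in (a hypothetical name, not part of Emacs or company-mode); it switches the language environment, measures the string, and switches back:

```elisp
;; Hypothetical helper, for illustration only.
(defun my/ellipsis-width-in (lang-env)
  "Return (string-width \"…\") while LANG-ENV is the language environment.
The previous language environment is restored afterwards."
  (let ((saved current-language-environment))
    (unwind-protect
        (progn
          (set-language-environment lang-env)
          (string-width "…"))
      (set-language-environment saved))))

;; Compare the two environments mentioned above.
(list (my/ellipsis-width-in "English")
      (my/ellipsis-width-in "Chinese-BIG5"))
```

The result depends on the Emacs version and its width tables, so no expected value is given here.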






* bug#64420: string-width of … is 2 in CJK environments
  2023-07-02 12:57 bug#64420: string-width of … is 2 in CJK environments Dmitry Gutov
@ 2023-07-02 13:10 ` Eli Zaretskii
  2023-07-02 13:20   ` Dmitry Gutov
  2023-07-14  4:45 ` SUNG TAE KIM
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 41+ messages in thread
From: Eli Zaretskii @ 2023-07-02 13:10 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 64420

> Date: Sun, 2 Jul 2023 15:57:07 +0300
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> Hi! This was reported to company-mode 
> (https://github.com/company-mode/company-mode/issues/1388), as a 
> scenario that makes the overlay-based completion popup misrender because 
> the columns are not computed right when that char is present.
> 
> To repro:
> 
>    (set-language-environment "Chinese-BIG5")
>    (string-width "…") ;; => 2
> 
> In the default language environment its width is reported to be 1.
> 
> This doesn't seem to make sense because it's rendered one column wide 
> either way.

On GUI frames Lisp programs that need to know the actual width of some
string should use string-pixel-width, not string-width.  The latter is
basically only for TTY frames.

   (progn
     (set-language-environment "Chinese-BIG5")
     (ceiling (/ (string-pixel-width "…")
                 (float (default-font-width))))) ;; => 1






* bug#64420: string-width of … is 2 in CJK environments
  2023-07-02 13:10 ` Eli Zaretskii
@ 2023-07-02 13:20   ` Dmitry Gutov
  2023-07-02 13:43     ` Eli Zaretskii
  0 siblings, 1 reply; 41+ messages in thread
From: Dmitry Gutov @ 2023-07-02 13:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 64420

On 02/07/2023 16:10, Eli Zaretskii wrote:
>> Date: Sun, 2 Jul 2023 15:57:07 +0300
>> From: Dmitry Gutov <dmitry@gutov.dev>
>>
>> Hi! This was reported to company-mode
>> (https://github.com/company-mode/company-mode/issues/1388), as a
>> scenario that makes the overlay-based completion popup misrender because
>> the columns are not computed right when that char is present.
>>
>> To repro:
>>
>>     (set-language-environment "Chinese-BIG5")
>>     (string-width "…") ;; => 2
>>
>> In the default language environment its width is reported to be 1.
>>
>> This doesn't seem to make sense because it's rendered one column wide
>> either way.
> 
> On GUI frames Lisp programs that need to know the actual width of some
> string should use string-pixel-width, not string-width.  The latter is
> basically only for TTY frames.
> 
>     (progn
>       (set-language-environment "Chinese-BIG5")
>       (ceiling (/ (string-pixel-width "…")
>                   (float (default-font-width))))) ;; => 1

Thank you.

Is there some inherent reason why string-width differs from the result 
of the above expression, and especially only does that on CJK?

Since the overlay-based popup is used on both GUI and Terminal frames, 
are you suggesting I define my own string-width like this?

(defun company--string-width (str)
   (if (display-graphic-p)
       (ceiling (/ (string-pixel-width str)
                   (float (default-font-width))))
     (string-width str)))







* bug#64420: string-width of … is 2 in CJK environments
  2023-07-02 13:20   ` Dmitry Gutov
@ 2023-07-02 13:43     ` Eli Zaretskii
  2023-07-07  2:13       ` Dmitry Gutov
  2023-07-11  2:23       ` Dmitry Gutov
  0 siblings, 2 replies; 41+ messages in thread
From: Eli Zaretskii @ 2023-07-02 13:43 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 64420

> Date: Sun, 2 Jul 2023 16:20:25 +0300
> Cc: 64420@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> > On GUI frames Lisp programs that need to know the actual width of some
> > string should use string-pixel-width, not string-width.  The latter is
> > basically only for TTY frames.
> > 
> >     (progn
> >       (set-language-environment "Chinese-BIG5")
> >       (ceiling (/ (string-pixel-width "…")
> >                   (float (default-font-width))))) ;; => 1
> 
> Thank you.
> 
> Is there some inherent reason why string-width differs from the result 
> of the above expression

Because string-width doesn't consult the actual metrics of the font.
It uses a char-table that we set "by hand".

> and especially only does that on CJK?

In CJK locales, most characters are double-width because those locales
use fonts where the glyphs are wider.  Or at least this is the theory.
string-pixel-width is free from these assumptions because it actually
measures the font glyphs.

> Since the overlay-based popup is used on both GUI and Terminal frames, 
> are you suggesting I define my own string-width like this?
> 
> (defun company--string-width (str)
>    (if (display-graphic-p)
>        (ceiling (/ (string-pixel-width str)
>                    (float (default-font-width))))
>      (string-width str)))

Yes, definitely.  (Actually, display-multi-font-p is better than
display-graphic-p, but in practice they will return the same value.)
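
A sketch of the same function using display-multi-font-p, as suggested above. The name my/string-columns is hypothetical, and the fboundp guard is an added assumption so that the function still loads on versions that lack string-pixel-width:

```elisp
;; Hypothetical name; a variant of the function quoted above.
(defun my/string-columns (str)
  "Return the width of STR in units of the default font's character width.
Falls back to `string-width' on text terminals, or when
`string-pixel-width' is unavailable."
  (if (and (display-multi-font-p)
           (fboundp 'string-pixel-width))
      ;; Measure the real glyphs and round up to whole columns.
      (ceiling (/ (string-pixel-width str)
                  (float (default-font-width))))
    ;; Table lookup only; no font metrics are consulted.
    (string-width str)))
```

Rounding up with ceiling means a glyph slightly wider than one cell is counted as two columns, which may or may not be what a popup renderer wants.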






* bug#64420: string-width of … is 2 in CJK environments
  2023-07-02 13:43     ` Eli Zaretskii
@ 2023-07-07  2:13       ` Dmitry Gutov
  2023-07-07  6:29         ` Eli Zaretskii
  2023-07-11  2:23       ` Dmitry Gutov
  1 sibling, 1 reply; 41+ messages in thread
From: Dmitry Gutov @ 2023-07-07  2:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 64420

Hi Eli,

On 02/07/2023 16:43, Eli Zaretskii wrote:
>> Is there some inherent reason why string-width differs from the result
>> of the above expression
> Because string-width doesn't consult the actual metrics of the font.
> It uses a char-table that we set "by hand".

Would it be appropriate to fix the entry for … in that table either way? 
Or does that not match the principle with which those entries are done?

>> and especially only does that on CJK?
> In CJK locales, most characters are double-width because those locales
> use fonts where the glyphs are wider.  Or at least this is the theory.
> string-pixel-width is free from these assumptions because it actually
> measures the font glyphs.

I'm guessing it's somewhat slower because of that too, but that doesn't 
seem like a problem so far.

>> Since the overlay-based popup is used on both GUI and Terminal frames,
>> are you suggesting I define my own string-width like this?
>>
>> (defun company--string-width (str)
>>     (if (display-graphic-p)
>>         (ceiling (/ (string-pixel-width str)
>>                     (float (default-font-width))))
>>       (string-width str)))
> Yes, definitely.  (Actually, display-multi-font-p is better than
> display-graphic-p, but in practice they will return the same value.)

Could you suggest a similar alternative to move-to-column? It's not 100% 
necessary, but we also have a piece of code where we take a width-aware 
substring from a buffer. And that logic uses 'move-to-column', which 
also has a problem with … in "Chinese-BIG5".






* bug#64420: string-width of … is 2 in CJK environments
  2023-07-07  2:13       ` Dmitry Gutov
@ 2023-07-07  6:29         ` Eli Zaretskii
  2023-07-11  2:13           ` Dmitry Gutov
  0 siblings, 1 reply; 41+ messages in thread
From: Eli Zaretskii @ 2023-07-07  6:29 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 64420

> Date: Fri, 7 Jul 2023 05:13:50 +0300
> Cc: 64420@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> On 02/07/2023 16:43, Eli Zaretskii wrote:
> >> Is there some inherent reason why string-width differs from the result
> >> of the above expression
> > Because string-width doesn't consult the actual metrics of the font.
> > It uses a char-table that we set "by hand".
> 
> Would it be appropriate to fix the entry for … in that table either way? 

"Fix" in what way?  In most language-environments we get

  (char-width ?…) => 1

What's wrong with that?

> Or does that not match the principle with which those entries are done?

Sorry, I don't understand the question: what principle are you talking
about?

> >> and especially only does that on CJK?
> > In CJK locales, most characters are double-width because those locales
> > use fonts where the glyphs are wider.  Or at least this is the theory.
> > string-pixel-width is free from these assumptions because it actually
> > measures the font glyphs.
> 
> I'm guessing it's somewhat slower because of that too

It isn't.  The entries in char-width-table are set up when you switch
to the language-environment which requires that; see, for example,
lisp/language/chinese.el where we call set-language-info-alist for any
Chinese-* language-environment.

> >> (defun company--string-width (str)
> >>     (if (display-graphic-p)
> >>         (ceiling (/ (string-pixel-width str)
> >>                     (float (default-font-width))))
> >>       (string-width str)))
> > Yes, definitely.  (Actually, display-multi-font-p is better than
> > display-graphic-p, but in practice they will return the same value.)
> 
> Could you suggest a similar alternative to move-to-column?

Try this:

   (vertical-motion (cons (/ (float PIXELS) (default-font-width)) 0))

where PIXELS is the X coordinate in pixel units.  That is, make the
LINES argument of vertical-motion be a cons cell with its cdr zero and
its car the required horizontal position, a float, in units of the
frame's canonical character width.  vertical-motion works internally
in pixels when considering horizontal coordinates.

Caveat: vertical-motion uses _visual_ columns, relative to the
displayed portion of the line, so it differs from move-to-column when
the line is a continuation line, or is truncated on display, or the
window is hscrolled.
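
Wrapped as a function, the expression above might look like the following sketch. The name my/move-to-pixel-x is hypothetical, and the caveat about visual columns applies unchanged:

```elisp
;; Hypothetical helper wrapping the expression above.
(defun my/move-to-pixel-x (pixels)
  "Move point within the current screen line to the X position PIXELS.
PIXELS is converted to a float count of canonical character widths,
which is the unit `vertical-motion' expects in the car of its argument."
  (vertical-motion (cons (/ (float pixels) (default-font-width)) 0)))
```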






* bug#64420: string-width of … is 2 in CJK environments
  2023-07-07  6:29         ` Eli Zaretskii
@ 2023-07-11  2:13           ` Dmitry Gutov
  2023-07-11 11:41             ` Eli Zaretskii
  0 siblings, 1 reply; 41+ messages in thread
From: Dmitry Gutov @ 2023-07-11  2:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 64420

On 07/07/2023 09:29, Eli Zaretskii wrote:
>> Date: Fri, 7 Jul 2023 05:13:50 +0300
>> Cc: 64420@debbugs.gnu.org
>> From: Dmitry Gutov <dmitry@gutov.dev>
>>
>> On 02/07/2023 16:43, Eli Zaretskii wrote:
>>>> Is there some inherent reason why string-width differs from the result
>>>> of the above expression
>>> Because string-width doesn't consult the actual metrics of the font.
>>> It uses a char-table that we set "by hand".
>>
>> Would it be appropriate to fix the entry for … in that table either way?
> 
> "Fix" in what way?  In most language-environments we get
> 
>    (char-width ?…) => 1
> 
> What's wrong with that?

It returns 2 in Chinese-BIG5. While the actual metrics of the char don't 
change.

>> Or does that not match the principle with which those entries are done?
> 
> Sorry, I don't understand the question: what principle are you talking
> about?

The principles by which we fill in the said char-table which we fill "by 
hand". E.g. which characters to include, and which to leave with 
"automatic" metrics.

>>>> and especially only does that on CJK?
>>> In CJK locales, most characters are double-width because those locales
>>> use fonts where the glyphs are wider.  Or at least this is the theory.
>>> string-pixel-width is free from these assumptions because it actually
>>> measures the font glyphs.
>>
>> I'm guessing it's somewhat slower because of that too
> 
> It isn't.  The entries in char-width-table are set up when you switch
> to the language-environment which requires that; see, for example,
> lisp/language/chinese.el where we call set-language-info-alist for any
> Chinese-* language-environment.

What I meant is, string-pixel-width must be slower than string-width 
because it uses a temp buffer and actual measurements, whereas the 
latter function only does a table lookup, more or less (N times).

>>>> (defun company--string-width (str)
>>>>      (if (display-graphic-p)
>>>>          (ceiling (/ (string-pixel-width str)
>>>>                      (float (default-font-width))))
>>>>        (string-width str)))
>>> Yes, definitely.  (Actually, display-multi-font-p is better than
>>> display-graphic-p, but in practice they will return the same value.)
>>
>> Could you suggest a similar alternative to move-to-column?
> 
> Try this:
> 
>     (vertical-motion (cons (/ (float PIXELS) (default-font-width)) 0))

Thank you. I just used the column values I was already working with. I'm 
trying whole-pixelwise addressing in the next version, but the better 
precision seems to necessitate a whole new approach, using 
string-pixel-width and the space :width display spec. Seems to be 
working okay too, in my brief testing.






* bug#64420: string-width of … is 2 in CJK environments
  2023-07-02 13:43     ` Eli Zaretskii
  2023-07-07  2:13       ` Dmitry Gutov
@ 2023-07-11  2:23       ` Dmitry Gutov
  2023-07-11 11:48         ` Eli Zaretskii
  1 sibling, 1 reply; 41+ messages in thread
From: Dmitry Gutov @ 2023-07-11  2:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 64420

On 02/07/2023 16:43, Eli Zaretskii wrote:
>> Since the overlay-based popup is used on both GUI and Terminal frames,
>> are you suggesting I define my own string-width like this?
>>
>> (defun company--string-width (str)
>>     (if (display-graphic-p)
>>         (ceiling (/ (string-pixel-width str)
>>                     (float (default-font-width))))
>>       (string-width str)))
> Yes, definitely.  (Actually, display-multi-font-p is better than
> display-graphic-p, but in practice they will return the same value.)

Regarding this approach, though: it seems to fail in my terminal Emacs.

Meaning, when I'm testing the feature using 'emacs -nw' (inside e.g. 
gnome-terminal), both (string-pixel-width "…") and (string-width "…") 
return 2. Whereas the character on display looks 1-character wide even 
there.

More than that, moving the cursor close to that character with C-f or 
C-b creates odd effects like the cursor jumping one position to the 
left, or a char being rendered twice at a certain position on the same 
line to the right of it (after I move the cursor there past the … char), 
in my case it's an opening paren. Nothing like that happens on the lines 
without this char, or after I switch the language env back to "English".

That happens in Emacs 29.






* bug#64420: string-width of … is 2 in CJK environments
  2023-07-11  2:13           ` Dmitry Gutov
@ 2023-07-11 11:41             ` Eli Zaretskii
  0 siblings, 0 replies; 41+ messages in thread
From: Eli Zaretskii @ 2023-07-11 11:41 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 64420

> Date: Tue, 11 Jul 2023 05:13:57 +0300
> Cc: 64420@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> >> Would it be appropriate to fix the entry for … in that table either way?
> > 
> > "Fix" in what way?  In most language-environments we get
> > 
> >    (char-width ?…) => 1
> > 
> > What's wrong with that?
> 
> It returns 2 in Chinese-BIG5. While the actual metrics of the char don't 
> change.

I explained why this happens and why Emacs works that way.  If
something in my explanation is unclear, please ask more specific
questions.

> >> Or does that not match the principle with which those entries are done?
> > 
> > Sorry, I don't understand the question: what principle are you talking
> > about?
> 
> The principles by which we fill in the said char-table which we fill "by 
> hand". E.g. which characters to include, and which to leave with 
> "automatic" metrics.

We fill the table by hand, but the data is synchronized with the
Unicode Standard, and is reviewed each time we import a new Unicode
version.  The tweaking of the char-width tables in CJK locales is due
to the issue I explained in my previous message:

> >>> In CJK locales, most characters are double-width because those locales
> >>> use fonts where the glyphs are wider.  Or at least this is the theory.
> >>> string-pixel-width is free from these assumptions because it actually
> >>> measures the font glyphs.

> What I meant is, string-pixel-width must be slower than string-width 
> because it uses a temp buffer and actual measurements, whereas the 
> latter function only does a table lookup, more or less (N times).

It is slower, yes, but much more accurate.  TANSTAAFL.
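
The cost difference can be measured locally. A rough sketch, assuming a hypothetical helper my/compare-width-timings built on benchmark-run; the numbers depend on the machine, the frame type and the string, so none are quoted here:

```elisp
(require 'benchmark)

;; Hypothetical helper: time N calls of each function on STR.
(defun my/compare-width-timings (str n)
  "Return a list (TABLE-SECONDS PIXEL-SECONDS) for N calls on STR.
PIXEL-SECONDS is nil when `string-pixel-width' is unavailable."
  (list (car (benchmark-run n (string-width str)))
        (and (fboundp 'string-pixel-width)
             (car (benchmark-run n (string-pixel-width str))))))

;; Example call: (my/compare-width-timings "foo…bar" 1000)
```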






* bug#64420: string-width of … is 2 in CJK environments
  2023-07-11  2:23       ` Dmitry Gutov
@ 2023-07-11 11:48         ` Eli Zaretskii
  2023-07-11 18:13           ` Dmitry Gutov
  0 siblings, 1 reply; 41+ messages in thread
From: Eli Zaretskii @ 2023-07-11 11:48 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 64420

> Date: Tue, 11 Jul 2023 05:23:03 +0300
> Cc: 64420@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> On 02/07/2023 16:43, Eli Zaretskii wrote:
> >> Since the overlay-based popup is used on both GUI and Terminal frames,
> >> are you suggesting I define my own string-width like this?
> >>
> >> (defun company--string-width (str)
> >>     (if (display-graphic-p)
> >>         (ceiling (/ (string-pixel-width str)
> >>                     (float (default-font-width))))
> >>       (string-width str)))
> > Yes, definitely.  (Actually, display-multi-font-p is better than
> > display-graphic-p, but in practice they will return the same value.)
> 
> Regarding this approach, though: it seems to fail in my terminal Emacs.

string-pixel-width is useless on TTY frames, because Emacs cannot
access the metrics of the characters on those frames.  In those cases
string-pixel-width falls back to use char-width, and you get the same
result.

> Meaning, when I'm testing the feature using 'emacs -nw' (inside e.g. 
> gnome-terminal), both (string-pixel-width "…") and (string-width "…") 
> return 2. Whereas the character on display looks 1-character wide even 
> there.

Once again, the assumption behind this "feature" of the CJK
language-environments is that whoever uses those environments has the
terminal emulators configured to use fonts where "…" and its ilk have
double size.  Of course, if you just switch language-environment on a
system that is otherwise configured for non-CJK locale, the terminal
emulator fonts will not magically change, and you get what you see.

> More than that, moving the cursor close to that character with C-f or 
> C-b creates odd effects like the cursor jumping one position to the 
> left, or a char being rendered twice at a certain position on the same 
> line to the right of it (after I move the cursor there past the … char), 

Yes, because we lie to the display engine about the character width.

If you worry that something in your package might not work well for
some users due to this issue, how about giving them a user-level
option to change the char-width of this character to 1?
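
A minimal sketch of such an override, assuming hypothetical names (my/narrow-ellipsis and my/apply-narrow-ellipsis are not part of Emacs or company-mode). It only touches the entry for "…" in char-width-table:

```elisp
;; Hypothetical option and helper, for illustration only.
(defvar my/narrow-ellipsis t
  "When non-nil, `my/apply-narrow-ellipsis' makes \"…\" one column wide.")

(defun my/apply-narrow-ellipsis ()
  "Set the `char-width-table' entry for \"…\" to 1 if enabled."
  (when my/narrow-ellipsis
    (set-char-table-range char-width-table ?… 1)))

;; Assumption: switching to a CJK language environment installs its own
;; width table again, so the override is re-applied after each switch.
(add-hook 'set-language-environment-hook #'my/apply-narrow-ellipsis)
```

The override has to run after the language environment is selected; calling my/apply-narrow-ellipsis by hand once the environment is set has the same effect as the hook.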






* bug#64420: string-width of … is 2 in CJK environments
  2023-07-11 11:48         ` Eli Zaretskii
@ 2023-07-11 18:13           ` Dmitry Gutov
  2023-07-11 18:45             ` Eli Zaretskii
  0 siblings, 1 reply; 41+ messages in thread
From: Dmitry Gutov @ 2023-07-11 18:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 64420

On 11/07/2023 14:48, Eli Zaretskii wrote:
>> Date: Tue, 11 Jul 2023 05:23:03 +0300
>> Cc: 64420@debbugs.gnu.org
>> From: Dmitry Gutov <dmitry@gutov.dev>
>>
>> On 02/07/2023 16:43, Eli Zaretskii wrote:
>>>> Since the overlay-based popup is used on both GUI and Terminal frames,
>>>> are you suggesting I define my own string-width like this?
>>>>
>>>> (defun company--string-width (str)
>>>>      (if (display-graphic-p)
>>>>          (ceiling (/ (string-pixel-width str)
>>>>                      (float (default-font-width))))
>>>>        (string-width str)))
>>> Yes, definitely.  (Actually, display-multi-font-p is better than
>>> display-graphic-p, but in practice they will return the same value.)
>>
>> Regarding this approach, though: it seems to fail in my terminal Emacs.
> 
> string-pixel-width is useless on TTY frames, because Emacs cannot
> access the metrics of the characters on those frames.  In those cases
> string-pixel-width falls back to use char-width, and you get the same
> result.

I guess that's the best we can do. This seems to work okay with most 
double-width characters, as long as the reported metrics match what 
happens on display.

And according to your explanation, we could probably drop the 
display-graphic-p check since both branches result in the same value on 
terminal (right?).

>> Meaning, when I'm testing the feature using 'emacs -nw' (inside e.g.
>> gnome-terminal), both (string-pixel-width "…") and (string-width "…")
>> return 2. Whereas the character on display looks 1-character wide even
>> there.
> 
> Once again, the assumption behind this "feature" of the CJK
> language-environments is that whoever uses those environments has the
> terminal emulators configured to use fonts where "…" and its ilk have
> double size.  Of course, if you just switch language-environment on a
> system that is otherwise configured for non-CJK locale, the terminal
> emulator fonts will not magically change, and you get what you see.

Does "…" actually have double width in some of their fonts?

This report stems from an issue opened on Github for company-mode (see 
the first message) from somebody who as I understand hails from one of 
those countries (I haven't clarified exactly), and they apparently have 
to work with the "Chinese-BIG5" language environment.

Are you saying that they misconfigured their system somehow, e.g. that 
Chinese-BIG5 is expected to be used with a certain set of default system 
fonts which have "…" at double width?

>> More than that, moving the cursor close to that character with C-f or
>> C-b creates odd effects like the cursor jumping one position to the
>> left, or a char being rendered twice at a certain position on the same
>> line to the right of it (after I move the cursor there past the … char),
> 
> Yes, because we lie to the display engine about the character width.
> 
> If you worry that something in your package might not work well for
> some users due to this issue, how about giving them a user-level
> option to change the char-width of this character to 1?

It's been suggested that we alter char-width-table dynamically too, as 
one option. I was just hoping to clarify that we don't carry an 
erroneous entry for this particular character.

If we did, it would be an easier solution for me to direct the users to 
the fix in Emacs 29/30, and delay the rollout of the new popup rendering 
feature a little bit. It will need a fair bit of testing period given 
the nature of the change.

Further, string-pixel-width and buffer-text-pixel-size have only been 
added in Emacs 29. Any chance you know some replacement I could use to 
backport the functionality to work in Emacs 25 or 26? 
buffer-text-pixel-size is defined in C.






* bug#64420: string-width of … is 2 in CJK environments
  2023-07-11 18:13           ` Dmitry Gutov
@ 2023-07-11 18:45             ` Eli Zaretskii
  2023-07-12  1:17               ` Dmitry Gutov
  0 siblings, 1 reply; 41+ messages in thread
From: Eli Zaretskii @ 2023-07-11 18:45 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 64420

> Date: Tue, 11 Jul 2023 21:13:26 +0300
> Cc: 64420@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> And according to your explanation, we could probably drop the 
> display-graphic-p check since both branches result in the same value on 
> terminal (right?).

You could drop it, yes.  But then string-width is faster, so maybe you
should keep it.

> > Once again, the assumption behind this "feature" of the CJK
> > language-environments is that whoever uses those environments has the
> > terminal emulators configured to use fonts where "…" and its ilk have
> > double size.  Of course, if you just switch language-environment on a
> > system that is otherwise configured for non-CJK locale, the terminal
> > emulator fonts will not magically change, and you get what you see.
> 
> Does "…" actually have double width in some of their fonts?

That's the assumption, yes.  (And not only this one character, you can
see which characters we assume have the same width in the function I
pointed out earlier in this thread, which we run when the
language-environment is switched to something CJK.)  It was definitely
correct at some point in the past, but the big question is whether it
is still correct.  I don't know who can tell us that nowadays.

> This report stems from an issue opened on Github for company-mode (see 
> the first message) from somebody who as I understand hails from one of 
> those countries (I haven't clarified exactly), and they apparently have 
> to work with the "Chinese-BIG5" language environment.
> 
> Are you saying that they misconfigured their system somehow, e.g. that 
> Chinese-BIG5 is expected to be used with a certain set of default system 
> fonts which have "…" at double width?

Either their systems are misconfigured, or the assumption about the
width of those characters is no longer true, at least not in a vast
enough majority of cases.  If we cannot get definitive answers, maybe
we should have an optional feature that disables the redefinition of
char-width for characters that Unicode does not define as "wide", and
then see whether someone still needs such tweaking of char-width.

> > If you worry that something in your package might not work well for
> > some users due to this issue, how about giving them a user-level
> > option to change the char-width of this character to 1?
> 
> It's been suggested that we alter char-width-table dynamically too, as 
> one option. I was just hoping to clarify that we don't carry an 
> erroneous entry for this particular character.

Whether it's "erroneous" or not depends on what fonts are actually
used.  char-width-table cannot know that, so we are guessing there.

> If we did, it would be an easier solution for me to direct the users to 
> the fix in Emacs 29/30, and delay the rollout of the new popup rendering 
> feature a little bit. It will need a fair bit of testing period given 
> the nature of the change.

We will not change the width in Emacs 29: that is too much for a
release branch, definitely at this point in the release cycle.  For
Emacs 30, if we want to change this, I'd rather do it as described
above, leaving the "fire escape" to get back the old behavior.  It
would be nice to hear from as many CJK users as possible which
characters in the widely used fonts are really double-width -- this
will help in the decision what exactly to change in
use-cjk-char-width-table.

> Further, string-pixel-width and buffer-text-pixel-size have only been 
> added in Emacs 29. Any chance you know some replacement I could use to 
> backport the functionality to work in Emacs 25 or 26? 
> buffer-text-pixel-size is defined in C.

You could use window-text-pixel-size instead.






* bug#64420: string-width of … is 2 in CJK environments
  2023-07-11 18:45             ` Eli Zaretskii
@ 2023-07-12  1:17               ` Dmitry Gutov
  2023-07-12 19:54                 ` Dmitry Gutov
  2023-07-12 21:11                 ` Yuan Fu
  0 siblings, 2 replies; 41+ messages in thread
From: Dmitry Gutov @ 2023-07-12  1:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 64420

On 11/07/2023 21:45, Eli Zaretskii wrote:

>>> Once again, the assumption behind this "feature" of the CJK
>>> language-environments is that whoever uses those environments has the
>>> terminal emulators configured to use fonts where "…" and its ilk have
>>> double size.  Of course, if you just switch language-environment on a
>>> system that is otherwise configured for non-CJK locale, the terminal
>>> emulator fonts will not magically change, and you get what you see.
>>
>> Does "…" actually have double width in some of their fonts?
> 
> That's the assumption, yes.  (And not only this one character, you can
> see which characters we assume have the same width in the function I
> pointed out earlier in this thread, which we run when the
> language-environment is switched to something CJK.)  It was definitely
> correct at some point in the past, but the big question is whether it
> is still correct.  I don't know who can tell us that nowadays.

Whole ranges of characters, I see.

>> This report stems from an issue opened on Github for company-mode (see
>> the first message) from somebody who as I understand hails from one of
>> those countries (I haven't clarified exactly), and they apparently have
>> to work with the "Chinese-BIG5" language environment.
>>
>> Are you saying that they misconfigured their system somehow, e.g. that
>> Chinese-BIG5 is expected to be used with a certain set of default system
>> fonts which have "…" at double width?
> 
> Either their systems are misconfigured, or the assumption about the
> width of those characters is no longer true, at least not in a vast
> enough majority of cases.  If we cannot get definitive answers, maybe
> we should have an optional feature that disables the redefinition of
> char-width for characters that Unicode does not define as "wide", and
> then see whether someone still needs such tweaking of char-width.
> 
>>> If you worry that something in your package might not work well for
>>> some users due to this issue, how about giving them a user-level
>>> option to change the char-width of this character to 1?
>>
>> It's been suggested that we alter char-width-table dynamically too, as
>> one option. I was just hoping to clarify that we don't carry an
>> erroneous entry for this particular character.
> 
> Whether it's "erroneous" or not depends on what fonts are actually
> used.  char-width-table cannot know that, so we are guessing there.
> 
>> If we did, it would be an easier solution for me to direct the users to
>> the fix in Emacs 29/30, and delay the rollout of the new popup rendering
>> feature a little bit. It will need a fair bit of testing period given
>> the nature of the change.
> 
> We will not change the width in Emacs 29: that is too much for a
> release branch, definitely at this point in the release cycle.  For
> Emacs 30, if we want to change this, I'd rather do it as described
> above, leaving the "fire escape" to get back the old behavior.  It
> would be nice to hear from as many CJK users as possible which
> characters in the widely used fonts are really double-width -- this
> will help in the decision what exactly to change in
> use-cjk-char-width-table.

All right. I'll try to get more info from the issue reporter, at least.

>> Further, string-pixel-width and buffer-text-pixel-size have only been
>> added in Emacs 29. Any chance you know some replacement I could use to
>> backport the functionality to work in Emacs 25 or 26?
>> buffer-text-pixel-size is defined in C.
> 
> You could use window-text-pixel-size instead.

Either I'm doing something wrong, or this function's behavior was 
different in Emacs 28. There had been some changes to it during Emacs 
29's dev cycle, but I'm not sure which one would have that effect.

Anyway, with this definition:

(defun pixel-width (string)
   (if (zerop (length string))
       0
     ;; Keeping a work buffer around is more efficient than creating a
     ;; new temporary buffer.
     (with-current-buffer (get-buffer-create " *string-pixel-width*")
       ;; `display-line-numbers-mode' is enabled in internal buffers
       ;; that breaks width calculation, so need to disable (bug#59311)
       (when (bound-and-true-p display-line-numbers-mode)
         (display-line-numbers-mode -1))
       (delete-region (point-min) (point-max))
       (insert string)
       (save-window-excursion
         (set-window-buffer nil (current-buffer))
         (car
          (window-text-pixel-size nil nil nil t))))))

In Emacs 29, (pixel-width "abc") returns 54 here (on a 4K screen).

But no matter what I do, it returns 0 in my Emacs 28.2 (from official 
tarball).

To get some more info: if I remove the 'car' call, the value that 
window-text-pixel-size returns is (54 . 36) in Emacs 29 and (0 . 108) in 
Emacs 28.2.





^ permalink raw reply	[flat|nested] 41+ messages in thread

* bug#64420: string-width of … is 2 in CJK environments
  2023-07-12  1:17               ` Dmitry Gutov
@ 2023-07-12 19:54                 ` Dmitry Gutov
  2023-07-12 21:11                 ` Yuan Fu
  1 sibling, 0 replies; 41+ messages in thread
From: Dmitry Gutov @ 2023-07-12 19:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 64420

Found the problem:

On 12/07/2023 04:17, Dmitry Gutov wrote:
>       (window-text-pixel-size nil nil nil t))))))

Looks like commit 61c254cafc9caa3b added the special meaning for the 
value t for arguments X-LIMIT and Y-LIMIT. In the previous versions, 
I guess, it meant the same as 0. They also didn't accept 
most-positive-fixnum, but worked okay with some lower integer values.
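
For anyone wanting the same backport, here is a minimal sketch of the fallback, assuming that an explicit integer X-LIMIT is honored by the older releases as described above. The function name and the limit of 100000 are placeholders chosen for illustration; neither comes from Emacs itself, and the value may need tuning.

```elisp
(defun my-string-pixel-width (string)
  "Return the pixel width of STRING as displayed in the selected window.
Hypothetical helper: the name and the X-LIMIT value are assumptions."
  (if (zerop (length string))
      0
    ;; Reuse one work buffer instead of creating a temporary one each time.
    (with-current-buffer (get-buffer-create " *my-string-pixel-width*")
      ;; Line numbers would be counted in the width, so turn them off.
      (when (bound-and-true-p display-line-numbers-mode)
        (display-line-numbers-mode -1))
      (delete-region (point-min) (point-max))
      (insert string)
      (save-window-excursion
        (set-window-buffer nil (current-buffer))
        ;; Pass an integer X-LIMIT rather than t, whose special meaning
        ;; only exists in newer versions.
        (car (window-text-pixel-size nil nil nil 100000))))))
```

On a GUI frame, (my-string-pixel-width "abc") should then give a font-dependent pixel count rather than 0.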





^ permalink raw reply	[flat|nested] 41+ messages in thread

* bug#64420: string-width of … is 2 in CJK environments
  2023-07-12  1:17               ` Dmitry Gutov
  2023-07-12 19:54                 ` Dmitry Gutov
@ 2023-07-12 21:11                 ` Yuan Fu
  2023-07-13  5:23                   ` Eli Zaretskii
  1 sibling, 1 reply; 41+ messages in thread
From: Yuan Fu @ 2023-07-12 21:11 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, 64420



> On Jul 11, 2023, at 6:17 PM, Dmitry Gutov <dmitry@gutov.dev> wrote:
> 
> On 11/07/2023 21:45, Eli Zaretskii wrote:
> 
>>>> Once again, the assumption behind this "feature" of the CJK
>>>> language-environments is that whoever uses those environments has the
>>>> terminal emulators configured to use fonts where "…" and its ilk have
>>>> double size.  Of course, if you just switch language-environment on a
>>>> system that is otherwise configured for non-CJK locale, the terminal
>>>> emulator fonts will not magically change, and you get what you see.
>>> 
>>> Does "…" actually have double width in some of their fonts?
>> That's the assumption, yes.  (And not only this one character, you can
>> see which characters we assume have the same width in the function I
>> pointed out earlier in this thread, which we run when the
>> language-environment is switched to something CJK.)  It was definitely
>> correct at some point in the past, but the big question is whether it
>> is still correct.  I don't know who can tell us that nowadays.
> 
> Whole ranges of characters, I see.

Here’s what I know: In a CJK “context”, “…” is supposed to be one ideograph wide (like all CJK punctuation), ie, width=2.

However, it’s not as simple as “they used the wrong font”, because both Latin and CJK use the same Unicode code point for “…”, but expect different glyphs. In publication, this is solved by manually marking the text with style or font, so the software uses the desired glyph. Terminals and editors don’t have this luxury.

BTW it’s not just ellipses, CJK and Latin share the same code points for quotes, em dash and middle dot while expecting different glyphs for them.

Since most terminals and editors (especially terminals) query an ASCII/Latin font before falling back to CJK fonts, I expect most of them to show the Latin glyph for “…” (width=1) most of the time.

So practically, it would be correct most of the time if we assume the following code points have a width of 1, regardless of locale:

– HORIZONTAL ELLIPSIS …
– LEFT/RIGHT DOUBLE QUOTATION MARK “”
– LEFT/RIGHT SINGLE QUOTATION MARK ‘’
– EM DASH —
– MIDDLE DOT ·

But obviously if someone configures their terminal or editor to use a CJK font first, these characters MIGHT have width = 2. I said MIGHT because there are plenty of CJK fonts that use the 1-width Latin glyph for these characters by default.

It might be helpful to have a wrapper string-width that considers heuristics like this, while string-width goes strictly by Unicode and locale.
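
A rough sketch of what such a wrapper could look like, assuming it only needs to override the five kinds of punctuation listed above. The names are hypothetical, and unlike string-width this version sums char-width per character, so it ignores compositions and display properties.

```elisp
(defconst my-narrow-punctuation
  '(#x2026                ; HORIZONTAL ELLIPSIS
    #x201C #x201D         ; LEFT/RIGHT DOUBLE QUOTATION MARK
    #x2018 #x2019         ; LEFT/RIGHT SINGLE QUOTATION MARK
    #x2014                ; EM DASH
    #x00B7)               ; MIDDLE DOT
  "Code points assumed to be one column wide regardless of locale.")

(defun my-string-width-heuristic (string)
  "Return the width of STRING, counting `my-narrow-punctuation' as 1.
All other characters are measured with `char-width'."
  (let ((width 0))
    (dolist (ch (string-to-list string) width)
      (setq width (+ width
                     (if (memq ch my-narrow-punctuation)
                         1
                       (char-width ch)))))))
```

With this, the ellipsis counts as one column even after switching to a CJK language environment, while ideographs keep whatever char-width reports for them.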

Source:
https://www.w3.org/TR/clreq/#table_of_non-bracket_indication_punctuation_marks

Yuan




^ permalink raw reply	[flat|nested] 41+ messages in thread

* bug#64420: string-width of … is 2 in CJK environments
  2023-07-12 21:11                 ` Yuan Fu
@ 2023-07-13  5:23                   ` Eli Zaretskii
  2023-07-27  1:52                     ` Dmitry Gutov
  0 siblings, 1 reply; 41+ messages in thread
From: Eli Zaretskii @ 2023-07-13  5:23 UTC (permalink / raw)
  To: Yuan Fu; +Cc: dmitry, 64420

> From: Yuan Fu <casouri@gmail.com>
> Date: Wed, 12 Jul 2023 14:11:14 -0700
> Cc: Eli Zaretskii <eliz@gnu.org>,
>  64420@debbugs.gnu.org
> 
> Here’s what I know: In a CJK “context”, “…” is supposed to be one ideograph wide (like all CJK punctuation), ie, width=2.
> 
> However, it’s not as simple as “they used the wrong font”, because both Latin and CJK use the same Unicode code point for “…”, but expect different glyphs. In publication, this is solved by manually marking the text with style or font, so the software uses the desired glyph. Terminals and editors don’t have this luxury.
> 
> BTW it’s not just ellipses, CJK and Latin share the same code points for quotes, em dash and middle dot while expecting different glyphs for them.
> 
> Since most terminals and editors (especially terminals) query an ASCII/Latin font before falling back to CJK fonts, I expect most of them to show the Latin glyph for “…” (width=1) most of the time.
> 
> So practically, it would be correct most of the time if we assume the following code points have a width of 1, regardless of locale:
> 
> – HORIZONTAL ELLIPSIS …
> – LEFT/RIGHT DOUBLE QUOTATION MARK “”
> – LEFT/RIGHT SINGLE QUOTATION MARK ‘’
> – EM DASH —
> – MIDDLE DOT ·
> 
> But obviously if someone configures their terminal or editor to use a CJK font first, these characters MIGHT have width = 2. I said MIGHT because there are plenty of CJK fonts that use the 1-width Latin glyph for these characters by default.
> 
> It might be helpful to have a wrapper string-width that considers heuristics like this, while string-width goes strictly by Unicode and locale.

Thanks.  My conclusion from the above is a bit different: we should
introduce a user option to modify the behavior of
use-cjk-char-width-table, such that users who have fonts where these
characters are not double-width could have the width of these
characters left at their Unicode values.





^ permalink raw reply	[flat|nested] 41+ messages in thread

* bug#64420: string-width of … is 2 in CJK environments
  2023-07-02 12:57 bug#64420: string-width of … is 2 in CJK environments Dmitry Gutov
  2023-07-02 13:10 ` Eli Zaretskii
@ 2023-07-14  4:45 ` SUNG TAE KIM
  2023-07-14  6:58   ` Eli Zaretskii
  2023-07-14  9:21 ` SUNG TAE KIM
  2023-07-16 16:59 ` SUNG TAE KIM
  3 siblings, 1 reply; 41+ messages in thread
From: SUNG TAE KIM @ 2023-07-14  4:45 UTC (permalink / raw)
  To: 64420

[-- Attachment #1: Type: text/plain, Size: 2579 bytes --]

Hi, I'm the reporter of the company-mode issue
(https://github.com/company-mode/company-mode/issues/1388). The project
owner of the company package suggested that I comment here on the matter
of char-width-table. So, here are my thoughts.

Many characters are marked as A (Ambiguous) width in the file
https://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt, which is
part of the Unicode 15.0.0 Character Database. The characters inside the
General Punctuation block (U+2000..U+206F) are marked as either N (Neutral)
or A (Ambiguous) width, and the ellipsis character (U+2026) is marked as A.
There is also a recommendation on how to render ambiguous-width characters
in the Unicode 15.0.0 East Asian Width Technical Report
(http://www.unicode.org/reports/tr11/).

Quotes from the TR.

> 5 Recommendations
>
> When processing or displaying data
>
>  • Ambiguous characters behave like wide or narrow characters depending
>    on the context (language tag, script identification, associated font,
>    source of data, or explicit markup; all can provide the context). If the
>    context cannot be established reliably, they should be treated as narrow
>    characters by default.

My understanding of the report's treatment of ambiguous width is that
context is paramount, and that narrow is the recommended default when the
context cannot be established.
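
To check a given code point against EastAsianWidth.txt, something like the following sketch could parse its data lines. It assumes each line starts with a code point or a START..END range, followed by a semicolon and the width class, with optional whitespace around the semicolon. The function name is hypothetical.

```elisp
(defun my-parse-eaw-line (line)
  "Parse one LINE of EastAsianWidth.txt.
Return (START END CLASS), or nil for comments and blank lines.
The accepted line layout is an assumption about the file format."
  (when (string-match
         "\\`\\([0-9A-F]+\\)\\(?:\\.\\.\\([0-9A-F]+\\)\\)?[ \t]*;[ \t]*\\([A-Za-z]+\\)"
         line)
    (let* ((start (string-to-number (match-string 1 line) 16))
           ;; A single code point is treated as a one-element range.
           (end (if (match-string 2 line)
                    (string-to-number (match-string 2 line) 16)
                  start)))
      (list start end (match-string 3 line)))))
```

Collecting the entries whose class is "A" would give the full list of ambiguous ranges under discussion.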

What about in practice? I've tested how a few ambiguous-width characters
render in terminals on several OSes.

macOS Mojave - builtin, kitty, iterm2
  Rendered as narrow character regardless of locale/font setting.

Windows 11 - old and new terminal
  Rendered as narrow character regardless of locale/font setting.

Ubuntu 20 - gnome-terminal
  User can set the width of ambiguous characters either narrow(default) or
wide through compatibility option.

I'm surprised gnome-terminal has this option. However, it seems incomplete
because when I try to delete an ambiguous-width character rendered as a
wide one, the terminal messes up its cursor position, whereas deleting a
wide character works fine.

So, I think the proper default width value of the ambiguous width
characters is narrow and there must be options for setting width for those
ambiguous width characters, but such change of default value might cause
breakage in the emacs packages which rely on the CJK language environment.

All in all, I think providing comprehensive options to change the width of
those ambiguous width characters will be desirable.

[-- Attachment #2: Type: text/html, Size: 2891 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* bug#64420: string-width of … is 2 in CJK environments
  2023-07-14  4:45 ` SUNG TAE KIM
@ 2023-07-14  6:58   ` Eli Zaretskii
  2023-07-16 11:51     ` Eli Zaretskii
  0 siblings, 1 reply; 41+ messages in thread
From: Eli Zaretskii @ 2023-07-14  6:58 UTC (permalink / raw)
  To: SUNG TAE KIM; +Cc: 64420

> From: SUNG TAE KIM <itaemu@gmail.com>
> Date: Fri, 14 Jul 2023 13:45:58 +0900
> 
> So, I think the proper default width value of the ambiguous width characters is narrow and there must
> be options for setting width for those ambiguous width characters, but such change of default value
> might cause breakage in the emacs packages which rely on the CJK language environment. 
> 
> All in all, I think providing comprehensive options to change the width of those ambiguous width
> characters will be desirable. 

Thanks, those are also my conclusions, as described here:

  https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64420#50

By default, Emacs already treats the ellipsis as a Narrow character,
and our current idea of "context" is the value of
language-environment, when the font information is not available.
Since Emacs doesn't currently support language tags or any other
feature which would allow the language to change on a per-buffer or
per-text region basis, the best we can do to allow finer-tuned width
of these characters is some kind of user customization, which assumes
that users know better which fonts are used by Emacs and by terminal
emulators they use for the Emacs TTY frames.





^ permalink raw reply	[flat|nested] 41+ messages in thread

* bug#64420: string-width of … is 2 in CJK environments
  2023-07-02 12:57 bug#64420: string-width of … is 2 in CJK environments Dmitry Gutov
  2023-07-02 13:10 ` Eli Zaretskii
  2023-07-14  4:45 ` SUNG TAE KIM
@ 2023-07-14  9:21 ` SUNG TAE KIM
  2023-07-14 11:04   ` Eli Zaretskii
  2023-07-16 16:59 ` SUNG TAE KIM
  3 siblings, 1 reply; 41+ messages in thread
From: SUNG TAE KIM @ 2023-07-14  9:21 UTC (permalink / raw)
  To: eliz; +Cc: 64420

[-- Attachment #1: Type: text/plain, Size: 1095 bytes --]

> By default, Emacs already treats the ellipsis as a Narrow character, and
> our current idea of "context" is the value of language-environment, when
> the font information is not available.

I'll try to clarify my opinion a bit more.

By "default" I meant the default in the CJK language environment: the
default width of the ambiguous characters in a CJK environment should be
narrow. Current Emacs changes the width of ambiguous characters to wide if
the user activates a CJK environment. The Unicode Standard recommends
treating them as narrow when the context is unclear, but Emacs changes the
width to wide even though it can't know what font is currently used. For
that reason, I don't think this behavior aligns well with the Unicode
Standard. Furthermore, in my limited experience, most contemporary terminal
implementations default to narrow for those characters in a CJK
environment. However, considering the Emacs package ecosystem, the current
Emacs behavior is OK as long as there's an easy option for changing such
values.

I hope this makes sense.

[-- Attachment #2: Type: text/html, Size: 1186 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* bug#64420: string-width of … is 2 in CJK environments
  2023-07-14  9:21 ` SUNG TAE KIM
@ 2023-07-14 11:04   ` Eli Zaretskii
  2023-07-14 20:11     ` Yuan Fu
  0 siblings, 1 reply; 41+ messages in thread
From: Eli Zaretskii @ 2023-07-14 11:04 UTC (permalink / raw)
  To: SUNG TAE KIM; +Cc: 64420

> From: SUNG TAE KIM <itaemu@gmail.com>
> Date: Fri, 14 Jul 2023 18:21:42 +0900
> Cc: 64420@debbugs.gnu.org
> 
> What I meant by default was default in the CJK language environment and the default width of the
> ambiguous characters in CJK environment should be narrow. Current emacs changes the width of
> ambiguous characters to wide if the user activates the CJK environment. The unicode standard
> recommendation is set the width narrow at unclear circumstances but emacs changes the width to
> wide even if it can't know what font is currently used. For that reason, I don't think such behavior is
> aligned well with the unicode standard.

We don't blindly follow the Unicode Standard.  We seriously consider
its recommendations, and then do whatever we think is best for our
users.

> Furthermore, The majority of the default width of those
> characters in the CJK environment is narrow on contemporary implementation of the terminals from
> my limited experience. However, Considering the emacs package ecosystem, current emacs
> behavior is ok as long as there's an easy option for changing such values.

It is not yet clear to me whether handling these characters as narrow
by default in CJK language-environments is TRT.  But adding an option
to do so is a first step in that direction, if indeed this is the
right direction: we can in the future make this optional behavior be
the default, if we arrive at the conclusion that most users configure
their fonts and their terminal emulators such that these characters
have the narrow width.





^ permalink raw reply	[flat|nested] 41+ messages in thread

* bug#64420: string-width of … is 2 in CJK environments
  2023-07-14 11:04   ` Eli Zaretskii
@ 2023-07-14 20:11     ` Yuan Fu
  0 siblings, 0 replies; 41+ messages in thread
From: Yuan Fu @ 2023-07-14 20:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: SUNG TAE KIM, 64420



> On Jul 14, 2023, at 4:04 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: SUNG TAE KIM <itaemu@gmail.com>
>> Date: Fri, 14 Jul 2023 18:21:42 +0900
>> Cc: 64420@debbugs.gnu.org
>> 
>> What I meant by default was default in the CJK language environment and the default width of the
>> ambiguous characters in CJK environment should be narrow. Current emacs changes the width of
>> ambiguous characters to wide if the user activates the CJK environment. The unicode standard
>> recommendation is set the width narrow at unclear circumstances but emacs changes the width to
>> wide even if it can't know what font is currently used. For that reason, I don't think such behavior is
>> aligned well with the unicode standard.
> 
> We don't blindly follow the Unicode Standard.  We seriously consider
> its recommendations, and then do whatever we think is best for our
> users.
> 
>> Furthermore, The majority of the default width of those
>> characters in the CJK environment is narrow on contemporary implementation of the terminals from
>> my limited experience. However, Considering the emacs package ecosystem, current emacs
>> behavior is ok as long as there's an easy option for changing such values.
> 
> It is not yet clear to me whether handling these characters as narrow
> by default in CJK language-environments is TRT.  But adding an option
> to do so is a first step in that direction, if indeed this is the
> right direction: we can in the future make this optional behavior be
> the default, if we arrive at the conclusion that most users configure
> their fonts and their terminal emulators such that these characters
> have the narrow width.

I tend to agree with Sung Tae, but this sounds like a reasonable compromise to me.

Yuan




^ permalink raw reply	[flat|nested] 41+ messages in thread

* bug#64420: string-width of … is 2 in CJK environments
  2023-07-14  6:58   ` Eli Zaretskii
@ 2023-07-16 11:51     ` Eli Zaretskii
  0 siblings, 0 replies; 41+ messages in thread
From: Eli Zaretskii @ 2023-07-16 11:51 UTC (permalink / raw)
  To: itaemu, casouri; +Cc: 64420

> Cc: 64420@debbugs.gnu.org
> Date: Fri, 14 Jul 2023 09:58:42 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> 
> By default, Emacs already treats the ellipsis as a Narrow character,
> and our current idea of "context" is the value of
> language-environment, when the font information is not available.
> Since Emacs doesn't currently support language tags or any other
> feature which would allow the language to change on a per-buffer or
> per-text region basis, the best we can do to allow finer-tuned width
> of these characters is some kind of user customization, which assumes
> that users know better which fonts are used by Emacs and by terminal
> emulators they use for the Emacs TTY frames.

Would someone please go over the characters whose width is marked as
"ambiguous" ("A") in Unicode's EastAsianWidth.txt file, and tell which
ones of them we should make single-column, when the above-mentioned
user option tells us to default to "narrow"?  I think all those up to
codepoint #x324F should be treated like that, but maybe I decided
wrong?

TIA





^ permalink raw reply	[flat|nested] 41+ messages in thread

* bug#64420: string-width of … is 2 in CJK environments
  2023-07-02 12:57 bug#64420: string-width of … is 2 in CJK environments Dmitry Gutov
                   ` (2 preceding siblings ...)
  2023-07-14  9:21 ` SUNG TAE KIM
@ 2023-07-16 16:59 ` SUNG TAE KIM
  2023-07-16 17:15   ` Eli Zaretskii
  3 siblings, 1 reply; 41+ messages in thread
From: SUNG TAE KIM @ 2023-07-16 16:59 UTC (permalink / raw)
  To: eliz, casouri; +Cc: 64420

I see no issue in changing the default width of ambiguous characters to
narrow, except for the variation selector blocks (FE00..FE0F,
E0100..E01EF) and the private-use blocks (E000..F8FF, F0000..FFFFD,
100000..10FFFD): the characters in the former blocks are not
standalone[1], and the characters in the latter blocks are reserved for
third parties. Everything else seems to be standalone characters.

[1] https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block)





^ permalink raw reply	[flat|nested] 41+ messages in thread

* bug#64420: string-width of … is 2 in CJK environments
  2023-07-16 16:59 ` SUNG TAE KIM
@ 2023-07-16 17:15   ` Eli Zaretskii
  2023-08-05 15:01     ` Eli Zaretskii
  0 siblings, 1 reply; 41+ messages in thread
From: Eli Zaretskii @ 2023-07-16 17:15 UTC (permalink / raw)
  To: SUNG TAE KIM; +Cc: casouri, 64420

> From: SUNG TAE KIM <itaemu@gmail.com>
> Date: Mon, 17 Jul 2023 01:59:15 +0900
> Cc: 64420@debbugs.gnu.org
> 
> I see no issue in changing default width of ambiguous characters to
> narrow except variation selector blocks(FE00..FE0F, E0100..E01EF) and
> private-use blocks(E000..F8FF, F0000..FFFFD, 100000..10FFFD) because
> the characters in the former blocks are not standalone[1] and the
> characters of the latter blocks are reserved for 3rd-party and
> everything else seems standalone characters.
> 
> [1] https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block)

Thanks.

I said I intend to end at #x324F because use-cjk-char-width-table
doesn't touch ambiguous characters with higher codepoints, so they are
already narrow in Emacs, and we don't need to "fix" them.
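
As a quick sanity check of that reasoning, the sketch below confirms that every block proposed for exclusion starts above the #x324F cutoff, so stopping there leaves them untouched. The variable and function names are made up for illustration; the ranges are the ones quoted above.

```elisp
(defconst my-excluded-ambiguous-ranges
  '((#xFE00 . #xFE0F) (#xE0100 . #xE01EF)        ; variation selectors
    (#xE000 . #xF8FF) (#xF0000 . #xFFFFD)        ; private use
    (#x100000 . #x10FFFD))                       ; private use
  "Ranges suggested for exclusion, as (START . END) pairs.")

(defun my-ranges-above-cutoff-p (ranges cutoff)
  "Return t if every range in RANGES starts above CUTOFF."
  (let ((all-above t))
    (dolist (range ranges all-above)
      (unless (> (car range) cutoff)
        (setq all-above nil)))))

;; Evaluates to t: the lowest start, #xE000, is already above #x324F.
(my-ranges-above-cutoff-p my-excluded-ambiguous-ranges #x324F)
```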





^ permalink raw reply	[flat|nested] 41+ messages in thread

* bug#64420: string-width of … is 2 in CJK environments
  2023-07-13  5:23                   ` Eli Zaretskii
@ 2023-07-27  1:52                     ` Dmitry Gutov
  0 siblings, 0 replies; 41+ messages in thread
From: Dmitry Gutov @ 2023-07-27  1:52 UTC (permalink / raw)
  To: Eli Zaretskii, Yuan Fu; +Cc: 64420

On 13/07/2023 08:23, Eli Zaretskii wrote:
>> From: Yuan Fu<casouri@gmail.com>
>> Date: Wed, 12 Jul 2023 14:11:14 -0700
>> Cc: Eli Zaretskii<eliz@gnu.org>,
>>   64420@debbugs.gnu.org
>>
>> Here’s what I know: In a CJK “context”, “…” is supposed to be one ideograph wide (like all CJK punctuation), ie, width=2.
>>
>> However, it’s not as simple as “they used the wrong font”, because both Latin and CJK use the same Unicode code point for “…”, but expect different glyphs. In publication, this is solved by manually marking the text with style or font, so the software uses the desired glyph. Terminals and editors don’t have this luxury.
>>
>> BTW it’s not just ellipses, CJK and Latin share the same code points for quotes, em dash and middle dot while expecting different glyphs for them.
>>
>> Since most terminals and editors (especially terminals) query an ASCII/Latin font before falling back to CJK fonts, I expect most of them to show the Latin glyph for “…” (width=1) most of the time.
>>
>> So practically, it would be correct most of the time if we assume the following code points have a width of 1, regardless of locale:
>>
>> – HORIZONTAL ELLIPSIS …
>> – LEFT/RIGHT DOUBLE QUOTATION MARK “”
>> – LEFT/RIGHT SINGLE QUOTATION MARK ‘’
>> – EM DASH —
>> – MIDDLE DOT ·
>>
>> But obviously if someone configures their terminal or editor to use a CJK font first, these characters MIGHT have width = 2. I said MIGHT because there are plenty of CJK fonts that use the 1-width Latin glyph for these characters by default.
>>
>> It might be helpful to have a wrapper string-width that considers heuristics like this, while string-width goes strictly by Unicode and locale.
> Thanks.  My conclusion from the above is a bit different: we should
> introduce a user option to modify the behavior of
> use-cjk-char-width-table, such that users who have fonts where these
> characters are not double-width could have the width of these
> characters left at their Unicode values.

We could add an option, and then go with the default value which 
corresponds to whatever seems the common opinion here.

Anyway, it doesn't seem like anybody else in this discussion is better 
equipped to choose that user option's name, or write the rest of the patch.





^ permalink raw reply	[flat|nested] 41+ messages in thread

* bug#64420: string-width of … is 2 in CJK environments
  2023-07-16 17:15   ` Eli Zaretskii
@ 2023-08-05 15:01     ` Eli Zaretskii
  2023-08-10 21:58       ` Yuan Fu
  2023-08-11 23:52       ` Dmitry Gutov
  0 siblings, 2 replies; 41+ messages in thread
From: Eli Zaretskii @ 2023-08-05 15:01 UTC (permalink / raw)
  To: dmitry; +Cc: itaemu, casouri, 64420

> Cc: casouri@gmail.com, 64420@debbugs.gnu.org
> Date: Sun, 16 Jul 2023 20:15:30 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> 
> > From: SUNG TAE KIM <itaemu@gmail.com>
> > Date: Mon, 17 Jul 2023 01:59:15 +0900
> > Cc: 64420@debbugs.gnu.org
> > 
> > I see no issue in changing default width of ambiguous characters to
> > narrow except variation selector blocks(FE00..FE0F, E0100..E01EF) and
> > private-use blocks(E000..F8FF, F0000..FFFFD, 100000..10FFFD) because
> > the characters in the former blocks are not standalone[1] and the
> > characters of the latter blocks are reserved for 3rd-party and
> > everything else seems standalone characters.
> > 
> > [1] https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block)
> 
> Thanks.
> 
> I said I intend to end at #x324F because use-cjk-char-width-table
> doesn't touch ambiguous characters with higher codepoints, so they are
> already narrow in Emacs, and we don't need to "fix" them.

OK, this is now installed on master.  We have a new user option named
cjk-ambiguous-chars-are-wide; its default is t, but if set to nil, the
characters proclaimed by Unicode as "ambiguous" will have char-width
of 1, not 2.  Note that this option should be set either via 'setopt'
or the Customize interface, not via 'setq'.

Let me know how well this works for you.
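
A minimal way to try it, as a sketch: this assumes a build from master that already has the option, and it reuses the Chinese-BIG5 recipe from the original report.

```elisp
;; Assumption: this Emacs build already defines the new option.
;; Use `setopt' (or Customize), not `setq', as noted above.
(setopt cjk-ambiguous-chars-are-wide nil)

;; Switch to a CJK language environment, as in the original report.
(set-language-environment "Chinese-BIG5")

;; With the option set to nil this should no longer report 2.
(string-width "…")
```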





^ permalink raw reply	[flat|nested] 41+ messages in thread

* bug#64420: string-width of … is 2 in CJK environments
  2023-08-05 15:01     ` Eli Zaretskii
@ 2023-08-10 21:58       ` Yuan Fu
  2023-08-11  5:53         ` Eli Zaretskii
  2023-08-11 23:52       ` Dmitry Gutov
  1 sibling, 1 reply; 41+ messages in thread
From: Yuan Fu @ 2023-08-10 21:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: SUNG TAE KIM, Dmitry Gutov, 64420



> On Aug 5, 2023, at 8:01 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> Cc: casouri@gmail.com, 64420@debbugs.gnu.org
>> Date: Sun, 16 Jul 2023 20:15:30 +0300
>> From: Eli Zaretskii <eliz@gnu.org>
>> 
>>> From: SUNG TAE KIM <itaemu@gmail.com>
>>> Date: Mon, 17 Jul 2023 01:59:15 +0900
>>> Cc: 64420@debbugs.gnu.org
>>> 
>>> I see no issue in changing default width of ambiguous characters to
>>> narrow except variation selector blocks(FE00..FE0F, E0100..E01EF) and
>>> private-use blocks(E000..F8FF, F0000..FFFFD, 100000..10FFFD) because
>>> the characters in the former blocks are not standalone[1] and the
>>> characters of the latter blocks are reserved for 3rd-party and
>>> everything else seems standalone characters.
>>> 
>>> [1] https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block)
>> 
>> Thanks.
>> 
>> I said I intend to end at #x324F because use-cjk-char-width-table
>> doesn't touch ambiguous characters with higher codepoints, so they are
>> already narrow in Emacs, and we don't need to "fix" them.
> 
> OK, this is now installed on master.  We have a new user option named
> cjk-ambiguous-chars-are-wide; its default is t, but if set to nil, the
> characters proclaimed by Unicode as "ambiguous" will have char-width
> of 1, not 2.  Note that this option should be set either via 'setopt'
> or the Customize interface, not via 'setq'.
> 
> Let me know how well this works for you.

Thanks! I can’t tell you how well it works tho since I don’t use company :-)

Yuan




^ permalink raw reply	[flat|nested] 41+ messages in thread

* bug#64420: string-width of … is 2 in CJK environments
  2023-08-10 21:58       ` Yuan Fu
@ 2023-08-11  5:53         ` Eli Zaretskii
  2023-08-11 18:07           ` Yuan Fu
  0 siblings, 1 reply; 41+ messages in thread
From: Eli Zaretskii @ 2023-08-11  5:53 UTC (permalink / raw)
  To: Yuan Fu; +Cc: itaemu, dmitry, 64420

> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 10 Aug 2023 14:58:37 -0700
> Cc: Dmitry Gutov <dmitry@gutov.dev>,
>  SUNG TAE KIM <itaemu@gmail.com>,
>  64420@debbugs.gnu.org
> 
> > OK, this is now installed on master.  We have a new user option named
> > cjk-ambiguous-chars-are-wide; its default is t, but if set to nil, the
> > characters proclaimed by Unicode as "ambiguous" will have char-width
> > of 1, not 2.  Note that this option should be set either via 'setopt'
> > or the Customize interface, not via 'setq'.
> > 
> > Let me know how well this works for you.
> 
> Thanks! I can’t tell you how well it works tho since I don’t use company :-)

You don't need company to see if this works well for you.  Just use
string-width or even char-width with some problematic characters (you
can find the list of them in characters.el, search for "ambiguous"),
and compare the results when this new variable is nil and non-nil.
I'm interested to know how many people need the variable to be non-nil
(its default) to have the width match the fonts they use in Emacs,
both in GUI and in TTY frames, since there's the claim that no one
needs those characters be considered full-width nowadays.  If that
claim is correct, we should consider changing the default value of
this variable in Emacs 30.
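
For comparing the two settings, a small helper along these lines may save some typing. It is only a sketch and the function name is made up. It lists what char-width currently reports for each character of a sample string.

```elisp
(defun my-char-width-report (string)
  "Return a list of (CHARACTER . WIDTH) pairs for STRING.
CHARACTER is a one-character string; WIDTH comes from `char-width'."
  (mapcar (lambda (ch)
            (cons (char-to-string ch) (char-width ch)))
          (string-to-list string)))

;; Evaluate once with the variable non-nil and once with it nil,
;; in a CJK language environment, and compare the two lists.
(my-char-width-report "…“”‘’—·")
```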

TIA





^ permalink raw reply	[flat|nested] 41+ messages in thread

* bug#64420: string-width of … is 2 in CJK environments
  2023-08-11  5:53         ` Eli Zaretskii
@ 2023-08-11 18:07           ` Yuan Fu
  2023-08-11 18:36             ` Eli Zaretskii
                               ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Yuan Fu @ 2023-08-11 18:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: itaemu, dmitry, 64420

[-- Attachment #1: Type: text/plain, Size: 2763 bytes --]



> On Aug 10, 2023, at 10:53 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 10 Aug 2023 14:58:37 -0700
>> Cc: Dmitry Gutov <dmitry@gutov.dev>,
>> SUNG TAE KIM <itaemu@gmail.com>,
>> 64420@debbugs.gnu.org
>> 
>>> OK, this is now installed on master.  We have a new user option named
>>> cjk-ambiguous-chars-are-wide; its default is t, but if set to nil, the
>>> characters proclaimed by Unicode as "ambiguous" will have char-width
>>> of 1, not 2.  Note that this option should be set either via 'setopt'
>>> or the Customize interface, not via 'setq'.
>>> 
>>> Let me know how well this works for you.
>> 
>> Thanks! I can’t tell you how well it works tho since I don’t use company :-)
> 
> You don't need company to see if this works well for you.  Just use
> string-width or even char-width with some problematic characters (you
> can find the list of them in characters.el, search for "ambiguous"),
> and compare the results when this new variable is nil and non-nil.
> I'm interested to know how many people need the variable to be non-nil
> (its default) to have the width match the fonts they use in Emacs,
> both in GUI and in TTY frames, since there's the claim that no one
> needs those characters be considered full-width nowadays.  If that
> claim is correct, we should consider changing the default value of
> this variable in Emacs 30.

On my machine, all the ambiguous characters have width of 1, even with the default value of cjk-ambiguous-chars-are-wide (I use utf8_en locale). That’s expected.

I tried printing all the ambiguous characters, I attached a screenshot of them (the first line is a line of CJK characters for reference). (Scrrenshot-1.png, screenshot-2.png)

On terminal, I saw an interesting option, “Ambiguous characters are double-width” (terminal-setting.png), which is the same as cjk-ambiguous-chars-are-wide. If I turn it on all the ambiguous characters are indeed displayed in double-width. (terminal-narrow.png, terminal-wide.png)

On GUI display, the latter half of the ambiguous characters are definitely wider than one char, but they aren’t quite 2 chars wide either. But I guess it doesn’t matter too much since one should use pixel size on GUI anyway.

In the terminal, at least iTerm2 displays ambiguous characters as single-width by default, (I assume) regardless of locale. And it displays a warning when you try to turn on the “Ambiguous characters are double-width” option [1].

Yuan

[1] "You probably don't want to turn this on. It will confuse interactive programs. You might want it if you work mostly with East Asian text combined with legacy or mathematical character sets. Are you sure you want this?"


[-- Attachment #2: terminal-wide.png --]
[-- Type: image/png, Size: 193958 bytes --]

[-- Attachment #3: terminal-narrow.png --]
[-- Type: image/png, Size: 165356 bytes --]

[-- Attachment #4: terminal-setting.png --]
[-- Type: image/png, Size: 243101 bytes --]

[-- Attachment #5: screenshot-2.png --]
[-- Type: image/png, Size: 141983 bytes --]

[-- Attachment #6: screenshot-1.png --]
[-- Type: image/png, Size: 186883 bytes --]

[-- Attachment #7: amiguous-width.txt --]
[-- Type: text/plain, Size: 2355 bytes --]

中文中国中文中国中文中国中文中国
¡¤§¨ª­®°±²³´¶·¸¹º¼½¾
¿ÆÐ×Øàáæèéêìíðòó÷øùú
üþāđēěĦħīıIJijĸĿŀŁłńňʼn
ŊŋōŒœŦŧūǎǐǒǔǖǘǚǜɑɡ˄ˇ
ˉˊˋˍː˘˙˚˛˝˟̀́̂̃̄̅̆̇̈
̛̖̗̘̙̜̉̊̋̌̍̎̏̐̑̒̓̔̕̚
̡̢̧̨̝̞̟̠̣̤̥̦̩̪̫̬̭̮̯̰
̴̵̶̷̸̱̲̳̹̺̻̼̽̾̿̀́͂̓̈́
͇͈͉͍͎͆͊͋͌ͅ͏͓͔͕͖͐͑͒͗͘
͙͚͛ͣͤͥͦͧͨͩͪͫͬ͜͟͢͝͞͠͡
ͭͮͯΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡ
ΣΤΥΦΧΨΩαβγδεζηθικλμν
ξοπρστυφχψωЁАБВГДЕЖЗ
ИЙКЛМНОПРСТУФХЦЧШЩЪЫ
ЬЭЮЯабвгдежзийклмноп
рстуфхцчшщъыьэюяё‐–—
―‖‘’“”†‡•․‥…‧‰′″‵‾⁴ⁿ
₁₂₃₄€℃℅℉ℓ№℡™ΩÅ⅓⅔⅛⅜⅝⅞
ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩⅪⅫⅰⅱⅲⅳⅴⅵⅶⅷ
ⅸⅹ↉←↑→↓↔↕↖↗↘↙↸↹⇒⇔⇧∀∂
∃∇∈∋∏∑∕√∝∞∟∠∣∥∧∨∩∪∫∬
∮∴∵∶∷∼∽≈≌≒≠≡≤≥≦≧≪≫≮≯
⊂⊃⊆⊇⊕⊙⊥⊿⌒①②③④⑤⑥⑦⑧⑨⑩⑪
⑫⑬⑭⑮⑯⑰⑱⑲⑳⑴⑵⑶⑷⑸⑹⑺⑻⑼⑽⑾
⑿⒀⒁⒂⒃⒄⒅⒆⒇⒈⒉⒊⒋⒌⒍⒎⒏⒐⒑⒒
⒓⒔⒕⒖⒗⒘⒙⒚⒛⒜⒝⒞⒟⒠⒡⒢⒣⒤⒥⒦
⒧⒨⒩⒪⒫⒬⒭⒮⒯⒰⒱⒲⒳⒴⒵ⒶⒷⒸⒹⒺ
ⒻⒼⒽⒾⒿⓀⓁⓂⓃⓄⓅⓆⓇⓈⓉⓊⓋⓌⓍⓎ
Ⓩⓐⓑⓒⓓⓔⓕⓖⓗⓘⓙⓚⓛⓜⓝⓞⓟⓠⓡⓢ
ⓣⓤⓥⓦⓧⓨⓩ⓫⓬⓭⓮⓯⓰⓱⓲⓳⓴⓵⓶⓷
⓸⓹⓺⓻⓼⓽⓾⓿─━│┃┄┅┆┇┈┉┊┋
┌┍┎┏┐┑┒┓└┕┖┗┘┙┚┛├┝┞┟
┠┡┢┣┤┥┦┧┨┩┪┫┬┭┮┯┰┱┲┳
┴┵┶┷┸┹┺┻┼┽┾┿╀╁╂╃╄╅╆╇
╈╉╊╋═║╒╓╔╕╖╗╘╙╚╛╜╝╞╟
╠╡╢╣╤╥╦╧╨╩╪╫╬╭╮╯╰╱╲╳
▀▁▂▃▄▅▆▇█▉▊▋▌▍▎▏▒▓▔▕
■□▣▤▥▦▧▨▩▲△▶▷▼▽◀◁◆◇◈
◎●◐◑◢◣◤◥◯★☆☎☏☜☞♀♂♠♡♣
♤♥♧♨♩♪♬♭♯⚞⚟⚿⛆⛇⛈⛉⛊⛋⛌⛍
⛏⛐⛑⛒⛓⛕⛖⛗⛘⛙⛚⛛⛜⛝⛞⛟⛠⛡⛣⛨
⛩⛫⛬⛭⛮⛯⛰⛱⛴⛶⛷⛸⛹⛻⛼⛾⛿✽❶❷
❸❹❺❻❼❽❾❿⭖⭗⭘⭙㉈㉉㉊㉋㉌㉍㉎㉏


* bug#64420: string-width of … is 2 in CJK environments
  2023-08-11 18:07           ` Yuan Fu
@ 2023-08-11 18:36             ` Eli Zaretskii
  2023-08-12 20:18               ` Yuan Fu
  2023-08-11 22:34             ` Dmitry Gutov
  2023-08-13  0:22             ` Dmitry Gutov
  2 siblings, 1 reply; 41+ messages in thread
From: Eli Zaretskii @ 2023-08-11 18:36 UTC (permalink / raw)
  To: Yuan Fu; +Cc: itaemu, dmitry, 64420

> From: Yuan Fu <casouri@gmail.com>
> Date: Fri, 11 Aug 2023 11:07:26 -0700
> Cc: dmitry@gutov.dev,
>  itaemu@gmail.com,
>  64420@debbugs.gnu.org
> 
> On terminal, I saw an interesting option, “Ambiguous characters are double-width” (terminal-setting.png), which is the same as cjk-ambiguous-chars-are-wide. If I turn it on all the ambiguous characters are indeed displayed in double-width. (terminal-narrow.png, terminal-wide.png)

And in that case, you need to set cjk-ambiguous-chars-are-wide non-nil
to have Emacs display those characters correctly?  Or does that option
have no effect on the correctness of the Emacs display on that
terminal?

> On GUI display, the later-half of the ambiguous characters are definitely wider than one char, but they aren’t quite 2 chars wide either. But I guess it doesn’t matter too much since one should use pixel size on GUI anyway.

Is the actual width closer to 1 or to 2?

Thanks.






* bug#64420: string-width of … is 2 in CJK environments
  2023-08-11 18:07           ` Yuan Fu
  2023-08-11 18:36             ` Eli Zaretskii
@ 2023-08-11 22:34             ` Dmitry Gutov
  2023-08-13  0:22             ` Dmitry Gutov
  2 siblings, 0 replies; 41+ messages in thread
From: Dmitry Gutov @ 2023-08-11 22:34 UTC (permalink / raw)
  To: Yuan Fu, Eli Zaretskii; +Cc: itaemu, 64420

On 11/08/2023 21:07, Yuan Fu wrote:
> On my machine, all the ambiguous characters have width of 1, even with the default value of cjk-ambiguous-chars-are-wide (I use utf8_en locale).

What if you start the test with

(set-language-environment "Chinese-BIG5")

?
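
A sketch of such a test, assuming a build that already has the new
cjk-ambiguous-chars-are-wide option.  The choice of "…" as the probe
character is only illustrative, and no particular result is claimed
here:

```elisp
;; Sketch only: compare the reported widths of one "ambiguous"
;; character with the option enabled and disabled.
;; Assumes an Emacs that defines `cjk-ambiguous-chars-are-wide'.
(set-language-environment "Chinese-BIG5")
(dolist (wide '(t nil))
  ;; `setopt' runs the option's :set function, which plain `setq'
  ;; would skip.
  (setopt cjk-ambiguous-chars-are-wide wide)
  (message "wide=%S  char-width=%d  string-width=%d"
           wide (char-width ?…) (string-width "…")))
```

Evaluating this changes the language environment and leaves the option
set to nil, so it is probably best tried in a scratch session started
with "emacs -Q".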






* bug#64420: string-width of … is 2 in CJK environments
  2023-08-05 15:01     ` Eli Zaretskii
  2023-08-10 21:58       ` Yuan Fu
@ 2023-08-11 23:52       ` Dmitry Gutov
  2023-08-12  5:50         ` Eli Zaretskii
  1 sibling, 1 reply; 41+ messages in thread
From: Dmitry Gutov @ 2023-08-11 23:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: itaemu, casouri, 64420

On 05/08/2023 18:01, Eli Zaretskii wrote:
> OK, this is now installed on master.  We have a new user option named
> cjk-ambiguous-chars-are-wide; its default is t, but if set to nil, the
> characters proclaimed by Unicode as "ambiguous" will have char-width
> of 1, not 2.  Note that this option should be set either via 'setopt'
> or the Customize interface, not via 'setq'.
> 
> Let me know how well this works for you.

Seems to work fine, thank you.

With the caveat that, in the terminal, if I switch to Chinese-BIG5 and 
visit a file with ambiguous characters like … (which triggers some bugs 
with display and navigation around those chars), (setopt 
cjk-ambiguous-chars-are-wide nil) doesn't fix those bugs -- I have to 
kill and re-visit the buffer for them to go away. But maybe that's expected.

In GUI everything's fine, the 'setopt' call makes things better right away.






* bug#64420: string-width of … is 2 in CJK environments
  2023-08-11 23:52       ` Dmitry Gutov
@ 2023-08-12  5:50         ` Eli Zaretskii
  2023-08-12 16:40           ` Dmitry Gutov
  0 siblings, 1 reply; 41+ messages in thread
From: Eli Zaretskii @ 2023-08-12  5:50 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: itaemu, casouri, 64420

> Date: Sat, 12 Aug 2023 02:52:29 +0300
> Cc: itaemu@gmail.com, casouri@gmail.com, 64420@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> On 05/08/2023 18:01, Eli Zaretskii wrote:
> > OK, this is now installed on master.  We have a new user option named
> > cjk-ambiguous-chars-are-wide; its default is t, but if set to nil, the
> > characters proclaimed by Unicode as "ambiguous" will have char-width
> > of 1, not 2.  Note that this option should be set either via 'setopt'
> > or the Customize interface, not via 'setq'.
> > 
> > Let me know how well this works for you.
> 
> Seems to work fine, thank you.
> 
> With the caveat that, in the terminal, if I switch to Chinese-BIG5 and 
> visit a file with ambiguous characters like … (which triggers some bugs 
> with display and navigation around those chars), (setopt 
> cjk-ambiguous-chars-are-wide nil) doesn't fix those bugs -- I have to 
> kill and re-visit the buffer for them to go away. But maybe that's expected.

Does "M-x redraw-display RET" solve the problem after setting the
variable?






* bug#64420: string-width of … is 2 in CJK environments
  2023-08-12  5:50         ` Eli Zaretskii
@ 2023-08-12 16:40           ` Dmitry Gutov
  2023-08-12 17:09             ` Eli Zaretskii
  0 siblings, 1 reply; 41+ messages in thread
From: Dmitry Gutov @ 2023-08-12 16:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: itaemu, casouri, 64420

On 12/08/2023 08:50, Eli Zaretskii wrote:
>> Date: Sat, 12 Aug 2023 02:52:29 +0300
>> Cc: itaemu@gmail.com, casouri@gmail.com, 64420@debbugs.gnu.org
>> From: Dmitry Gutov <dmitry@gutov.dev>
>>
>> On 05/08/2023 18:01, Eli Zaretskii wrote:
>>> OK, this is now installed on master.  We have a new user option named
>>> cjk-ambiguous-chars-are-wide; its default is t, but if set to nil, the
>>> characters proclaimed by Unicode as "ambiguous" will have char-width
>>> of 1, not 2.  Note that this option should be set either via 'setopt'
>>> or the Customize interface, not via 'setq'.
>>>
>>> Let me know how well this works for you.
>> Seems to work fine, thank you.
>>
>> With the caveat that, in the terminal, if I switch to Chinese-BIG5 and
>> visit a file with ambiguous characters like … (which triggers some bugs
>> with display and navigation around those chars), (setopt
>> cjk-ambiguous-chars-are-wide nil) doesn't fix those bugs -- I have to
>> kill and re-visit the buffer for them to go away. But maybe that's expected.
> Does "M-x redraw-display RET" solve the problem after setting the
> variable?

Looks like it does, yes.
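
For reference, the sequence that helped on the TTY frame can be
sketched as follows, assuming a build from current master that has the
option:

```elisp
;; Sketch of the workaround for TTY frames discussed above.
;; Assumes an Emacs that defines `cjk-ambiguous-chars-are-wide'.
(setopt cjk-ambiguous-chars-are-wide nil)
;; Clear and redraw all visible frames, so that text already on
;; screen is laid out again with the new widths.
(redraw-display)
```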






* bug#64420: string-width of … is 2 in CJK environments
  2023-08-12 16:40           ` Dmitry Gutov
@ 2023-08-12 17:09             ` Eli Zaretskii
  0 siblings, 0 replies; 41+ messages in thread
From: Eli Zaretskii @ 2023-08-12 17:09 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: itaemu, casouri, 64420

> Date: Sat, 12 Aug 2023 19:40:01 +0300
> Cc: itaemu@gmail.com, casouri@gmail.com, 64420@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> >> With the caveat that, in the terminal, if I switch to Chinese-BIG5 and
> >> visit a file with ambiguous characters like … (which triggers some bugs
> >> with display and navigation around those chars), (setopt
> >> cjk-ambiguous-chars-are-wide nil) doesn't fix those bugs -- I have to
> >> kill and re-visit the buffer for them to go away. But maybe that's expected.
> > Does "M-x redraw-display RET" solve the problem after setting the
> > variable?
> 
> Looks like it does, yes.

I guess the :set function should trigger that or something.  Let me
think about that.






* bug#64420: string-width of … is 2 in CJK environments
  2023-08-11 18:36             ` Eli Zaretskii
@ 2023-08-12 20:18               ` Yuan Fu
  0 siblings, 0 replies; 41+ messages in thread
From: Yuan Fu @ 2023-08-12 20:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: SUNG TAE KIM, Dmitry Gutov, 64420



> On Aug 11, 2023, at 11:36 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Fri, 11 Aug 2023 11:07:26 -0700
>> Cc: dmitry@gutov.dev,
>> itaemu@gmail.com,
>> 64420@debbugs.gnu.org
>> 
>> On terminal, I saw an interesting option, “Ambiguous characters are double-width” (terminal-setting.png), which is the same as cjk-ambiguous-chars-are-wide. If I turn it on all the ambiguous characters are indeed displayed in double-width. (terminal-narrow.png, terminal-wide.png)
> 
> And in that case, you need to set cjk-ambiguous-chars-are-wide non-nil
> to have Emacs display those characters correctly?  Or does that option
> have no effect on the correctness of the Emacs display on that
> terminal?

The value of cjk-ambiguous-chars-are-wide has no effect on the display of those characters in the terminal, at least in the terminal I use (iTerm2). Only the terminal option has an effect.

The screenshots I took are actually from cat, not Emacs. I tried with Emacs and found out that the terminal and Emacs must agree on the width of those characters, otherwise the cursor movement is broken (perhaps that’s not surprising to you). The cursor movement works if either a) I turn on “Ambiguous characters are double-width” in the terminal and evaluate (set-language-environment "Chinese-BIG5") in Emacs, or b) I turn off “Ambiguous characters are double-width” (which is off by default) and use the default locale (utf8_enUS).

> 
>> On GUI display, the later-half of the ambiguous characters are definitely wider than one char, but they aren’t quite 2 chars wide either. But I guess it doesn’t matter too much since one should use pixel size on GUI anyway.
> 
> Is the actual width closer to 1 or to 2?
> 

I’d say 2.

Yuan







* bug#64420: string-width of … is 2 in CJK environments
  2023-08-11 18:07           ` Yuan Fu
  2023-08-11 18:36             ` Eli Zaretskii
  2023-08-11 22:34             ` Dmitry Gutov
@ 2023-08-13  0:22             ` Dmitry Gutov
  2023-08-13  5:24               ` Eli Zaretskii
  2 siblings, 1 reply; 41+ messages in thread
From: Dmitry Gutov @ 2023-08-13  0:22 UTC (permalink / raw)
  To: Yuan Fu, Eli Zaretskii; +Cc: itaemu, 64420

On 11/08/2023 21:07, Yuan Fu wrote:
> On GUI display, the later-half of the ambiguous characters are definitely wider than one char, but they aren’t quite 2 chars wide either. But I guess it doesn’t matter too much since one should use pixel size on GUI anyway.

BTW, I think most double-width characters on GUI are less than 2 
characters wide?

So the point here would be that some "ambiguous" ones are still wider 
than 1, I guess.






* bug#64420: string-width of … is 2 in CJK environments
  2023-08-13  0:22             ` Dmitry Gutov
@ 2023-08-13  5:24               ` Eli Zaretskii
  2023-08-13 10:48                 ` Dmitry Gutov
  0 siblings, 1 reply; 41+ messages in thread
From: Eli Zaretskii @ 2023-08-13  5:24 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: itaemu, casouri, 64420

> Date: Sun, 13 Aug 2023 03:22:41 +0300
> Cc: itaemu@gmail.com, 64420@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> On 11/08/2023 21:07, Yuan Fu wrote:
> > On GUI display, the later-half of the ambiguous characters are definitely wider than one char, but they aren’t quite 2 chars wide either. But I guess it doesn’t matter too much since one should use pixel size on GUI anyway.
> 
> BTW, I think most double-width characters on GUI are less than 2 
> characters wide?
> 
> So the point here would be that some "ambiguous" ones are still wider 
> than 1, I guess.

According to Yuan, at least in his environment those characters have a
width that is closer to 2 than to 1.  In which case using 2 would
produce better alignment.  Of course, using string-pixel-width will
produce an even better alignment.
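
A minimal sketch of measuring in pixels instead of columns, assuming
Emacs 29 or later for string-pixel-width; my-string-width-in-columns is
a hypothetical helper name used only for illustration, not an existing
function:

```elisp
(require 'subr-x)  ; `string-pixel-width' is defined here in Emacs 29+

(defun my-string-width-in-columns (string)
  "Return the width of STRING as a multiple of the default character width.
Hypothetical helper for illustration.  On GUI frames the result can
be fractional, for example for a glyph that is wider than one column
but narrower than two."
  (/ (float (string-pixel-width string))
     (default-font-width)))

;; Example call; the value depends on the fonts in use.
(my-string-width-in-columns "…")
```

A Lisp program that aligns text on GUI frames could compare this value
with what string-width returns for the same string to see how far the
column-based estimate is off.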






* bug#64420: string-width of … is 2 in CJK environments
  2023-08-13  5:24               ` Eli Zaretskii
@ 2023-08-13 10:48                 ` Dmitry Gutov
  2023-08-13 12:01                   ` Eli Zaretskii
  0 siblings, 1 reply; 41+ messages in thread
From: Dmitry Gutov @ 2023-08-13 10:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: itaemu, casouri, 64420

On 13/08/2023 08:24, Eli Zaretskii wrote:
>> Date: Sun, 13 Aug 2023 03:22:41 +0300
>> Cc: itaemu@gmail.com, 64420@debbugs.gnu.org
>> From: Dmitry Gutov <dmitry@gutov.dev>
>>
>> On 11/08/2023 21:07, Yuan Fu wrote:
>>> On GUI display, the later-half of the ambiguous characters are definitely wider than one char, but they aren’t quite 2 chars wide either. But I guess it doesn’t matter too much since one should use pixel size on GUI anyway.
>> BTW, I think most double-width characters on GUI are less than 2
>> characters wide?
>>
>> So the point here would be that some "ambiguous" ones are still wider
>> than 1, I guess.
> According to Yuan, at least in his environment those characters have a
> width that is closer to 2 than to 1.  In which case using 2 would
> produce better alignment.  Of course, using string-pixel-width will
> produce an even better alignment.

In GUI, that is. But if they are displayed with width 1 in the terminal, we 
had better make string-width return 1 for them too.

That might be slightly worse for certain applications (like popup in 
company), but at least the basic rendering and navigation bugs in 
terminal will be fixed this way. And the new popup rendering for company 
(using string-width and spacing instructions) is close to being ready 
anyway.






* bug#64420: string-width of … is 2 in CJK environments
  2023-08-13 10:48                 ` Dmitry Gutov
@ 2023-08-13 12:01                   ` Eli Zaretskii
  2023-08-13 12:53                     ` Dmitry Gutov
  0 siblings, 1 reply; 41+ messages in thread
From: Eli Zaretskii @ 2023-08-13 12:01 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: itaemu, casouri, 64420

> Date: Sun, 13 Aug 2023 13:48:42 +0300
> Cc: casouri@gmail.com, itaemu@gmail.com, 64420@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> >> So the point here would be that some "ambiguous" ones are still wider
> >> than 1, I guess.
> > According to Yuan, at least in his environment those characters have a
> > width that is closer to 2 than to 1.  In which case using 2 would
> > produce better alignment.  Of course, using string-pixel-width will
> > produce an even better alignment.
> 
> In GUI, that is. But if they are displayed with width 1 in terminal, we 
> better make string-width return 1 for them too.

Yes.  But it turns out that how wide these characters are on TTY
frames depends on the terminal emulator and its own options regarding
those characters.  So some users will want the value 1 and others will
want the value 2, depending on which terminals they use and what
options of those terminals they like best.

The important part is that Emacs's notion of the character width
is consistent with that of the terminal.

> That might be slightly worse for certain applications (like popup in 
> company), but at least the basic rendering and navigation bugs in 
> terminal will be fixed this way. And the new popup rendering for company 
> (using string-width and spacing instructions) is close to being ready 
> anyway.

Yes, sure.  There's no doubt on my side that this option is useful;
I'm just trying to collect data that would allow us to decide on the
best default value, that's all.






* bug#64420: string-width of … is 2 in CJK environments
  2023-08-13 12:01                   ` Eli Zaretskii
@ 2023-08-13 12:53                     ` Dmitry Gutov
  0 siblings, 0 replies; 41+ messages in thread
From: Dmitry Gutov @ 2023-08-13 12:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: itaemu, casouri, 64420

On 13/08/2023 15:01, Eli Zaretskii wrote:
>> Date: Sun, 13 Aug 2023 13:48:42 +0300
>> Cc: casouri@gmail.com, itaemu@gmail.com, 64420@debbugs.gnu.org
>> From: Dmitry Gutov <dmitry@gutov.dev>
>>
>>>> So the point here would be that some "ambiguous" ones are still wider
>>>> than 1, I guess.
>>> According to Yuan, at least in his environment those characters have a
>>> width that is closer to 2 than to 1.  In which case using 2 would
>>> produce better alignment.  Of course, using string-pixel-width will
>>> produce an even better alignment.
>>
>> In GUI, that is. But if they are displayed with width 1 in terminal, we
>> better make string-width return 1 for them too.
> 
> Yes.  But it turns out that how wide these characters are on TTY
> frames depends on the terminal emulator and its own options regarding
> those characters.  So some users will want the value 1 and others will
> want the value 2, depending on which terminals they use and what
> options of those terminals they like best.

That's where having the option that we just added will be beneficial. As 
opposed to, say, changing the behavior outright.

> The important part is that the Emacs's notion of the character width
> is consistent with that of the terminal.
> 
>> That might be slightly worse for certain applications (like popup in
>> company), but at least the basic rendering and navigation bugs in
>> terminal will be fixed this way. And the new popup rendering for company
>> (using string-width and spacing instructions) is close to being ready
>> anyway.
> 
> Yes, sure.  There's no doubt on my side that this option is useful;
> I'm just trying to collect data that would allow us to decide on the
> best default value, that's all.

Yuan seems to be saying that iTerm2, at least, defaults to showing the 
ambiguous chars at width 1 and issues a warning when the user tries to 
change that option.

gnome-terminal also has that default. I just checked on my machine 
(Ubuntu from 2022), and the description in this bug report from 2015 
also says that: https://bugzilla.gnome.org/show_bug.cgi?id=749414, so 
the default is not new.

Others are welcome to report their experience.

I've found a couple of conflicting reports regarding Microsoft Terminal, 
someone could test that too.






