* bug#64420: string-width of … is 2 in CJK environments @ 2023-07-02 12:57 Dmitry Gutov 2023-07-02 13:10 ` Eli Zaretskii ` (3 more replies) 0 siblings, 4 replies; 41+ messages in thread From: Dmitry Gutov @ 2023-07-02 12:57 UTC (permalink / raw) To: 64420 Hi! This was reported to company-mode (https://github.com/company-mode/company-mode/issues/1388), as a scenario that makes the overlay-based completion popup misrender because the columns are not computed right when that char is present. To repro: (set-language-environment "Chinese-BIG5") (string-width "…") ;; => 2 In the default language environment its width is reported to be 1. This doesn't seem to make sense because it's rendered one column wide either way. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-07-02 12:57 bug#64420: string-width of … is 2 in CJK environments Dmitry Gutov @ 2023-07-02 13:10 ` Eli Zaretskii 2023-07-02 13:20 ` Dmitry Gutov 2023-07-14 4:45 ` SUNG TAE KIM ` (2 subsequent siblings) 3 siblings, 1 reply; 41+ messages in thread From: Eli Zaretskii @ 2023-07-02 13:10 UTC (permalink / raw) To: Dmitry Gutov; +Cc: 64420 > Date: Sun, 2 Jul 2023 15:57:07 +0300 > From: Dmitry Gutov <dmitry@gutov.dev> > > Hi! This was reported to company-mode > (https://github.com/company-mode/company-mode/issues/1388), as a > scenario that makes the overlay-based completion popup misrender because > the columns are not computed right when that char is present. > > To repro: > > (set-language-environment "Chinese-BIG5") > (string-width "…") ;; => 2 > > In the default language environment its width is reported to be 1. > > This doesn't seem to make sense because it's rendered one column wide > either way. On GUI frames Lisp programs that need to know the actual width of some string should use string-pixel-width, not string-width. The latter is basically only for TTY frames. (progn (set-language-environment "Chinese-BIG5") (ceiling (/ (string-pixel-width "…") (float (default-font-width))))) ;; => 1 ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-07-02 13:10 ` Eli Zaretskii @ 2023-07-02 13:20 ` Dmitry Gutov 2023-07-02 13:43 ` Eli Zaretskii 0 siblings, 1 reply; 41+ messages in thread From: Dmitry Gutov @ 2023-07-02 13:20 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 64420 On 02/07/2023 16:10, Eli Zaretskii wrote: >> Date: Sun, 2 Jul 2023 15:57:07 +0300 >> From: Dmitry Gutov <dmitry@gutov.dev> >> >> Hi! This was reported to company-mode >> (https://github.com/company-mode/company-mode/issues/1388), as a >> scenario that makes the overlay-based completion popup misrender because >> the columns are not computed right when that char is present. >> >> To repro: >> >> (set-language-environment "Chinese-BIG5") >> (string-width "…") ;; => 2 >> >> In the default language environment its width is reported to be 1. >> >> This doesn't seem to make sense because it's rendered one column wide >> either way. > > On GUI frames Lisp programs that need to know the actual width of some > string should use string-pixel-width, not string-width. The latter is > basically only for TTY frames. > > (progn > (set-language-environment "Chinese-BIG5") > (ceiling (/ (string-pixel-width "…") > (float (default-font-width))))) ;; => 1 Thank you. Is there some inherent reason why string-width differs from the result of the above expression, and especially only does that on CJK? Since the overlay-based popup is used on both GUI and Terminal frames, are you suggesting I define my own string-width like this? (defun company--string-width (str) (if (display-graphic-p) (ceiling (/ (string-pixel-width str) (float (default-font-width)))) (string-width str))) ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-07-02 13:20 ` Dmitry Gutov @ 2023-07-02 13:43 ` Eli Zaretskii 2023-07-07 2:13 ` Dmitry Gutov 2023-07-11 2:23 ` Dmitry Gutov 0 siblings, 2 replies; 41+ messages in thread From: Eli Zaretskii @ 2023-07-02 13:43 UTC (permalink / raw) To: Dmitry Gutov; +Cc: 64420 > Date: Sun, 2 Jul 2023 16:20:25 +0300 > Cc: 64420@debbugs.gnu.org > From: Dmitry Gutov <dmitry@gutov.dev> > > > On GUI frames Lisp programs that need to know the actual width of some > > string should use string-pixel-width, not string-width. The latter is > > basically only for TTY frames. > > > > (progn > > (set-language-environment "Chinese-BIG5") > > (ceiling (/ (string-pixel-width "…") > > (float (default-font-width))))) ;; => 1 > > Thank you. > > Is there some inherent reason why string-width differs from the result > of the above expression Because string-width doesn't consult the actual metrics of the font. It uses a char-table that we set "by hand". > and especially only does that on CJK? In CJK locales, most characters are double-width because those locales use fonts where the glyphs are wider. Or at least this is the theory. string-pixel-width is free from these assumptions because it actually measures the font glyphs. > Since the overlay-based popup is used on both GUI and Terminal frames, > are you suggesting I define my own string-width like this? > > (defun company--string-width (str) > (if (display-graphic-p) > (ceiling (/ (string-pixel-width str) > (float (default-font-width)))) > (string-width str))) Yes, definitely. (Actually, display-multi-font-p is better than display-graphic-p, but in practice they will return the same value.) ^ permalink raw reply [flat|nested] 41+ messages in thread
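The point that string-width reads a hand-set char-table rather than font metrics can be illustrated with a minimal sketch that is not from the thread. It assumes the ellipsis is U+2026 and that let-binding a copy of char-width-table is enough for string-width to see the change; the function name is hypothetical.

```elisp
;; Sketch: `string-width' follows `char-width-table', not the font.
;; The table is copied and let-bound, so the session's real table
;; is left untouched.  `my-ellipsis-width-with-table-entry' is a
;; hypothetical name used only for this illustration.
(defun my-ellipsis-width-with-table-entry (width)
  "Return (string-width \"…\") when the table entry for U+2026 is WIDTH."
  (let ((char-width-table (copy-sequence char-width-table)))
    (set-char-table-range char-width-table #x2026 width)
    (string-width "…")))

;; Compare the two results; no font is consulted in either call.
(list (my-ellipsis-width-with-table-entry 1)
      (my-ellipsis-width-with-table-entry 2))
```

The same glyph is drawn in both cases, which is why the reported width and the rendered width can disagree.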
* bug#64420: string-width of … is 2 in CJK environments 2023-07-02 13:43 ` Eli Zaretskii @ 2023-07-07 2:13 ` Dmitry Gutov 2023-07-07 6:29 ` Eli Zaretskii 2023-07-11 2:23 ` Dmitry Gutov 1 sibling, 1 reply; 41+ messages in thread From: Dmitry Gutov @ 2023-07-07 2:13 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 64420 Hi Eli, On 02/07/2023 16:43, Eli Zaretskii wrote: >> Is there some inherent reason why string-width differs from the result >> of the above expression > Because string-width doesn't consult the actual metrics of the font. > It uses a char-table that we set "by hand". Would it be appropriate to fix the entry for … in that table either way? Or does that not match the principle with which those entries are done? >> and especially only does that on CJK? > In CJK locales, most characters are double-width because those locales > use fonts where the glyphs are wider. Or at least this is the theory. > string-pixel-width is free from these assumptions because it actually > measures the font glyphs. I'm guessing it's somewhat slower because of that too, but that doesn't seem like a problem so far. >> Since the overlay-based popup is used on both GUI and Terminal frames, >> are you suggesting I define my own string-width like this? >> >> (defun company--string-width (str) >> (if (display-graphic-p) >> (ceiling (/ (string-pixel-width str) >> (float (default-font-width)))) >> (string-width str))) > Yes, definitely. (Actually, display-multi-font-p is better than > display-graphic-p, but in practice they will return the same value.) Could you suggest a similar alternative to move-to-column? It's not 100% necessary, but we also have a piece of code where we take a width-aware substring from a buffer. And that logic uses 'move-to-column', which also has a problem with … in "Chinese-BIG5". ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-07-07 2:13 ` Dmitry Gutov @ 2023-07-07 6:29 ` Eli Zaretskii 2023-07-11 2:13 ` Dmitry Gutov 0 siblings, 1 reply; 41+ messages in thread From: Eli Zaretskii @ 2023-07-07 6:29 UTC (permalink / raw) To: Dmitry Gutov; +Cc: 64420 > Date: Fri, 7 Jul 2023 05:13:50 +0300 > Cc: 64420@debbugs.gnu.org > From: Dmitry Gutov <dmitry@gutov.dev> > > On 02/07/2023 16:43, Eli Zaretskii wrote: > >> Is there some inherent reason why string-width differs from the result > >> of the above expression > > Because string-width doesn't consult the actual metrics of the font. > > It uses a char-table that we set "by hand". > > Would it be appropriate to fix the entry for … in that table either way? "Fix" in what way? In most language-environments we get (char-width ?…) => 1 What's wrong with that? > Or does that not match the principle with which those entries are done? Sorry, I don't understand the question: what principle are you talking about? > >> and especially only does that on CJK? > > In CJK locales, most characters are double-width because those locales > > use fonts where the glyphs are wider. Or at least this is the theory. > > string-pixel-width is free from these assumptions because it actually > > measures the font glyphs. > > I'm guessing it's somewhat slower because of that too It isn't. The entries in char-width-table are set up when you switch to the language-environment which requires that; see, for example, lisp/language/chinese.el where we call set-language-info-alist for any Chinese-* language-environment. > >> (defun company--string-width (str) > >> (if (display-graphic-p) > >> (ceiling (/ (string-pixel-width str) > >> (float (default-font-width)))) > >> (string-width str))) > > Yes, definitely. (Actually, display-multi-font-p is better than > > display-graphic-p, but in practice they will return the same value.) > > Could you suggest a similar alternative to move-to-column? 
Try this:

  (vertical-motion (cons (/ (float PIXELS) (default-font-width)) 0))

where PIXELS is the X coordinate in pixel units. That is, make the LINES argument of vertical-motion be a cons cell with its cdr zero and its car the required horizontal position, a float, in units of the frame's canonical character width. vertical-motion works internally in pixels when considering horizontal coordinates.

Caveat: vertical-motion uses _visual_ columns, relative to the displayed portion of the line, so it differs from move-to-column when the line is a continuation line, or is truncated on display, or the window is hscrolled.

^ permalink raw reply [flat|nested] 41+ messages in thread
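The vertical-motion form above could be wrapped as follows. This is only a sketch: the function name is hypothetical, and the caveat about visual columns applies unchanged.

```elisp
;; Sketch of a pixel-based counterpart to `move-to-column', built on
;; the form suggested in the message above.
;; `my-move-to-pixel-x' is a hypothetical name.
(defun my-move-to-pixel-x (pixels)
  "Move point within the current screen line to about PIXELS from its start.
PIXELS is a horizontal offset in pixels.  The return value is
whatever `vertical-motion' returns."
  (vertical-motion
   ;; car: horizontal position as a float, in units of the frame's
   ;; canonical character width; cdr: 0, i.e. stay on this screen line.
   (cons (/ (float pixels) (default-font-width)) 0)))
```

To approximate column N with this helper, one would pass (* N (default-font-width)) as PIXELS.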
* bug#64420: string-width of … is 2 in CJK environments 2023-07-07 6:29 ` Eli Zaretskii @ 2023-07-11 2:13 ` Dmitry Gutov 2023-07-11 11:41 ` Eli Zaretskii 0 siblings, 1 reply; 41+ messages in thread From: Dmitry Gutov @ 2023-07-11 2:13 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 64420 On 07/07/2023 09:29, Eli Zaretskii wrote: >> Date: Fri, 7 Jul 2023 05:13:50 +0300 >> Cc: 64420@debbugs.gnu.org >> From: Dmitry Gutov <dmitry@gutov.dev> >> >> On 02/07/2023 16:43, Eli Zaretskii wrote: >>>> Is there some inherent reason why string-width differs from the result >>>> of the above expression >>> Because string-width doesn't consult the actual metrics of the font. >>> It uses a char-table that we set "by hand". >> >> Would it be appropriate to fix the entry for … in that table either way? > > "Fix" in what way? In most language-environments we get > > (char-width ?…) => 1 > > What's wrong with that? It returns 2 in Chinese-BIG5. While the actual metrics of the char don't change. >> Or does that not match the principle with which those entries are done? > > Sorry, I don't understand the question: what principle are you talking > about? The principles by which we fill in the said char-table which we fill "by hand". E.g. which characters to include, and which to leave with "automatic" metrics. >>>> and especially only does that on CJK? >>> In CJK locales, most characters are double-width because those locales >>> use fonts where the glyphs are wider. Or at least this is the theory. >>> string-pixel-width is free from these assumptions because it actually >>> measures the font glyphs. >> >> I'm guessing it's somewhat slower because of that too > > It isn't. The entries in char-width-table are set up when you switch > to the language-environment which requires that; see, for example, > lisp/language/chinese.el where we call set-language-info-alist for any > Chinese-* language-environment. 
What I meant is, string-pixel-width must be slower than string-width because it uses a temp buffer and actual measurements, whereas the latter function only does a table lookup, more or less (N times).

>>>> (defun company--string-width (str)
>>>>   (if (display-graphic-p)
>>>>       (ceiling (/ (string-pixel-width str)
>>>>                   (float (default-font-width))))
>>>>     (string-width str)))
>>> Yes, definitely. (Actually, display-multi-font-p is better than
>>> display-graphic-p, but in practice they will return the same value.)
>>
>> Could you suggest a similar alternative to move-to-column?
>
> Try this:
>
>   (vertical-motion (cons (/ (float PIXELS) (default-font-width)) 0))

Thank you. I just used the column values I was already working with.

I'm trying whole-pixelwise addressing in the next version, but the better precision seems to necessitate a whole new approach, using string-pixel-width and the space :width display spec. Seems to be working okay too, in my brief testing.

^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments
From: Eli Zaretskii @ 2023-07-11 11:41 UTC
To: Dmitry Gutov; +Cc: 64420

> Date: Tue, 11 Jul 2023 05:13:57 +0300
> Cc: 64420@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
>
> >> Would it be appropriate to fix the entry for … in that table either way?
> >
> > "Fix" in what way? In most language-environments we get
> >
> >   (char-width ?…) => 1
> >
> > What's wrong with that?
>
> It returns 2 in Chinese-BIG5. While the actual metrics of the char don't
> change.

I explained why this happens and why Emacs works that way. If something in my explanation is unclear, please ask more specific questions.

> >> Or does that not match the principle with which those entries are done?
> >
> > Sorry, I don't understand the question: what principle are you talking
> > about?
>
> The principles by which we fill in the said char-table which we fill "by
> hand". E.g. which characters to include, and which to leave with
> "automatic" metrics.

We fill the table by hand, but the data is synchronized with the Unicode Standard, and is reviewed each time we import a new Unicode version. The tweaking of the char-width tables in CJK locales is due to the issue I explained in my previous message:

> >>> In CJK locales, most characters are double-width because those locales
> >>> use fonts where the glyphs are wider. Or at least this is the theory.
> >>> string-pixel-width is free from these assumptions because it actually
> >>> measures the font glyphs.

> What I meant is, string-pixel-width must be slower than string-width
> because it uses a temp buffer and actual measurements, whereas the
> latter function only does a table lookup, more or less (N times).

It is slower, yes, but much more accurate. TANSTAAFL.

^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-07-02 13:43 ` Eli Zaretskii 2023-07-07 2:13 ` Dmitry Gutov @ 2023-07-11 2:23 ` Dmitry Gutov 2023-07-11 11:48 ` Eli Zaretskii 1 sibling, 1 reply; 41+ messages in thread From: Dmitry Gutov @ 2023-07-11 2:23 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 64420 On 02/07/2023 16:43, Eli Zaretskii wrote: >> Since the overlay-based popup is used on both GUI and Terminal frames, >> are you suggesting I define my own string-width like this? >> >> (defun company--string-width (str) >> (if (display-graphic-p) >> (ceiling (/ (string-pixel-width str) >> (float (default-font-width)))) >> (string-width str))) > Yes, definitely. (Actually, display-multi-font-p is better than > display-graphic-p, but in practice they will return the same value.) Regarding this approach, though: it seems to fail in my terminal Emacs. Meaning, when I'm testing the feature using 'emacs -nw' (inside e.g. gnome-terminal), both (string-pixel-width "…") and (string-width "…") return 2. Whereas the character on display looks 1-character wide even there. More than that, moving the cursor close to that character with C-f or C-b creates odd effects like the cursor jumping one position to the left, or a char being rendered twice at a certain position on the same line to the right of it (after I move the cursor there past the … char), in my case it's an opening paren. Nothing like that happens on the lines without this char, or after I switch the language env back to "English". That happens in Emacs 29. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-07-11 2:23 ` Dmitry Gutov @ 2023-07-11 11:48 ` Eli Zaretskii 2023-07-11 18:13 ` Dmitry Gutov 0 siblings, 1 reply; 41+ messages in thread From: Eli Zaretskii @ 2023-07-11 11:48 UTC (permalink / raw) To: Dmitry Gutov; +Cc: 64420 > Date: Tue, 11 Jul 2023 05:23:03 +0300 > Cc: 64420@debbugs.gnu.org > From: Dmitry Gutov <dmitry@gutov.dev> > > On 02/07/2023 16:43, Eli Zaretskii wrote: > >> Since the overlay-based popup is used on both GUI and Terminal frames, > >> are you suggesting I define my own string-width like this? > >> > >> (defun company--string-width (str) > >> (if (display-graphic-p) > >> (ceiling (/ (string-pixel-width str) > >> (float (default-font-width)))) > >> (string-width str))) > > Yes, definitely. (Actually, display-multi-font-p is better than > > display-graphic-p, but in practice they will return the same value.) > > Regarding this approach, though: it seems to fail in my terminal Emacs. string-pixel-width is useless on TTY frames, because Emacs cannot access the metrics of the characters on those frames. In those cases string-pixel-width falls back to use char-width, and you get the same result. > Meaning, when I'm testing the feature using 'emacs -nw' (inside e.g. > gnome-terminal), both (string-pixel-width "…") and (string-width "…") > return 2. Whereas the character on display looks 1-character wide even > there. Once again, the assumption behind this "feature" of the CJK language-environments is that whoever uses those environments has the terminal emulators configured to use fonts where "…" and its ilk have double size. Of course, if you just switch language-environment on a system that is otherwise configured for non-CJK locale, the terminal emulator fonts will not magically change, and you get what you see. 
> More than that, moving the cursor close to that character with C-f or > C-b creates odd effects like the cursor jumping one position to the > left, or a char being rendered twice at a certain position on the same > line to the right of it (after I move the cursor there past the … char), Yes, because we lie to the display engine about the character width. If you worry that something in your package might not work well for some users due to this issue, how about giving them a user-level option to change the char-width of this character to 1? ^ permalink raw reply [flat|nested] 41+ messages in thread
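The user-level option suggested above might look roughly like this. It is a sketch under stated assumptions: the option and function names are hypothetical, and it assumes set-language-environment-hook runs after the CJK environment has rebuilt char-width-table, so that the override is reapplied on each switch.

```elisp
;; Sketch of a user option that forces selected characters to width 1.
;; `my-cjk-narrow-chars' and `my-cjk-apply-narrow-chars' are
;; hypothetical names, not part of Emacs or company-mode.
(defcustom my-cjk-narrow-chars '(#x2026)
  "Characters whose `char-width-table' entry should be forced to 1.
The default lists only U+2026 HORIZONTAL ELLIPSIS."
  :type '(repeat character)
  :group 'convenience)

(defun my-cjk-apply-narrow-chars ()
  "Set the `char-width-table' entry of each listed character to 1."
  (dolist (ch my-cjk-narrow-chars)
    (set-char-table-range char-width-table ch 1)))

;; Assumption: the hook runs after the language environment has set up
;; its own width table, so the override must be reapplied each time.
(add-hook 'set-language-environment-hook #'my-cjk-apply-narrow-chars)
```

Users whose fonts really do draw these characters double-width would leave the list empty.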
* bug#64420: string-width of … is 2 in CJK environments 2023-07-11 11:48 ` Eli Zaretskii @ 2023-07-11 18:13 ` Dmitry Gutov 2023-07-11 18:45 ` Eli Zaretskii 0 siblings, 1 reply; 41+ messages in thread From: Dmitry Gutov @ 2023-07-11 18:13 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 64420 On 11/07/2023 14:48, Eli Zaretskii wrote: >> Date: Tue, 11 Jul 2023 05:23:03 +0300 >> Cc: 64420@debbugs.gnu.org >> From: Dmitry Gutov <dmitry@gutov.dev> >> >> On 02/07/2023 16:43, Eli Zaretskii wrote: >>>> Since the overlay-based popup is used on both GUI and Terminal frames, >>>> are you suggesting I define my own string-width like this? >>>> >>>> (defun company--string-width (str) >>>> (if (display-graphic-p) >>>> (ceiling (/ (string-pixel-width str) >>>> (float (default-font-width)))) >>>> (string-width str))) >>> Yes, definitely. (Actually, display-multi-font-p is better than >>> display-graphic-p, but in practice they will return the same value.) >> >> Regarding this approach, though: it seems to fail in my terminal Emacs. > > string-pixel-width is useless on TTY frames, because Emacs cannot > access the metrics of the characters on those frames. In those cases > string-pixel-width falls back to use char-width, and you get the same > result. I guess that's the best we can do. This seems to work okay with most double-width characters, as long as the reported metrics match what happens on display. And according to your explanation, we could probably drop the display-graphic-p check since both branches result in the same value on terminal (right?). >> Meaning, when I'm testing the feature using 'emacs -nw' (inside e.g. >> gnome-terminal), both (string-pixel-width "…") and (string-width "…") >> return 2. Whereas the character on display looks 1-character wide even >> there. 
> > Once again, the assumption behind this "feature" of the CJK > language-environments is that whoever uses those environments has the > terminal emulators configured to use fonts where "…" and its ilk have > double size. Of course, if you just switch language-environment on a > system that is otherwise configured for non-CJK locale, the terminal > emulator fonts will not magically change, and you get what you see. Does "…" actually have double width in some of their fonts? This report stems from an issue opened on Github for company-mode (see the first message) from somebody who as I understand hails from one of those countries (I haven't clarified exactly), and they apparently have to work with the "Chinese-BIG5" language environment. Are you saying that they misconfigured their system somehow, e.g. that Chinese-BIG5 is expected to be used with a certain set of default system fonts which have "…" at double width? >> More than that, moving the cursor close to that character with C-f or >> C-b creates odd effects like the cursor jumping one position to the >> left, or a char being rendered twice at a certain position on the same >> line to the right of it (after I move the cursor there past the … char), > > Yes, because we lie to the display engine about the character width. > > If you worry that something in your package might not work well for > some users due to this issue, how about giving them a user-level > option to change the char-width of this character to 1? It's been suggested that we alter char-width-table dynamically too, as one option. I was just hoping to clarify that we don't carry an erroneous entry for this particular character. If we did, it would be an easier solution for me to direct the users to the fix in Emacs 29/30, and delay the rollout of the new popup rendering feature a little bit. It will need a fair bit of testing period given the nature of the change. 
Further, string-pixel-width and buffer-text-pixel-size have only been added in Emacs 29. Any chance you know some replacement I could use to backport the functionality to work in Emacs 25 or 26? buffer-text-pixel-size is defined in C. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-07-11 18:13 ` Dmitry Gutov @ 2023-07-11 18:45 ` Eli Zaretskii 2023-07-12 1:17 ` Dmitry Gutov 0 siblings, 1 reply; 41+ messages in thread From: Eli Zaretskii @ 2023-07-11 18:45 UTC (permalink / raw) To: Dmitry Gutov; +Cc: 64420 > Date: Tue, 11 Jul 2023 21:13:26 +0300 > Cc: 64420@debbugs.gnu.org > From: Dmitry Gutov <dmitry@gutov.dev> > > And according to your explanation, we could probably drop the > display-graphic-p check since both branches result in the same value on > terminal (right?). You could drop it, yes. But then string-width is faster, so maybe you should keep it. > > Once again, the assumption behind this "feature" of the CJK > > language-environments is that whoever uses those environments has the > > terminal emulators configured to use fonts where "…" and its ilk have > > double size. Of course, if you just switch language-environment on a > > system that is otherwise configured for non-CJK locale, the terminal > > emulator fonts will not magically change, and you get what you see. > > Does "…" actually have double width in some of their fonts? That's the assumption, yes. (And not only this one character, you can see which characters we assume have the same width in the function I pointed out earlier in this thread, which we run when the language-environment is switched to something CJK.) It was definitely correct at some point in the past, but the big question is whether it is still correct. I don't know who can tell us that nowadays. > This report stems from an issue opened on Github for company-mode (see > the first message) from somebody who as I understand hails from one of > those countries (I haven't clarified exactly), and they apparently have > to work with the "Chinese-BIG5" language environment. > > Are you saying that they misconfigured their system somehow, e.g. 
that > Chinese-BIG5 is expected to be used with a certain set of default system > fonts which have "…" at double width? Either their systems are misconfigured, or the assumption about the width of those characters is no longer true, at least not in a vast enough majority of cases. If we cannot get definitive answers, maybe we should have an optional feature that disables the redefinition of char-width for characters that Unicode does not define as "wide", and then see whether someone still needs such tweaking of char-width. > > If you worry that something in your package might not work well for > > some users due to this issue, how about giving them a user-level > > option to change the char-width of this character to 1? > > It's been suggested that we alter char-width-table dynamically too, as > one option. I was just hoping to clarify that we don't carry an > erroneous entry for this particular character. Whether it's "erroneous" or not depends on what fonts are actually used. char-width-table cannot know that, so we are guessing there. > If we did, it would be an easier solution for me to direct the users to > the fix in Emacs 29/30, and delay the rollout of the new popup rendering > feature a little bit. It will need a fair bit of testing period given > the nature of the change. We will not change the width in Emacs 29: that is too much for a release branch, definitely at this point in the release cycle. For Emacs 30, if we want to change this, I'd rather do it as described above, leaving the "fire escape" to get back the old behavior. It would be nice to hear from as many CJK users as possible which characters in the widely used fonts are really double-width -- this will help in the decision what exactly to change in use-cjk-char-width-table. > Further, string-pixel-width and buffer-text-pixel-size have only been > added in Emacs 29. Any chance you know some replacement I could use to > backport the functionality to work in Emacs 25 or 26? 
> buffer-text-pixel-size is defined in C. You could use window-text-pixel-size instead. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-07-11 18:45 ` Eli Zaretskii @ 2023-07-12 1:17 ` Dmitry Gutov 2023-07-12 19:54 ` Dmitry Gutov 2023-07-12 21:11 ` Yuan Fu 0 siblings, 2 replies; 41+ messages in thread From: Dmitry Gutov @ 2023-07-12 1:17 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 64420 On 11/07/2023 21:45, Eli Zaretskii wrote: >>> Once again, the assumption behind this "feature" of the CJK >>> language-environments is that whoever uses those environments has the >>> terminal emulators configured to use fonts where "…" and its ilk have >>> double size. Of course, if you just switch language-environment on a >>> system that is otherwise configured for non-CJK locale, the terminal >>> emulator fonts will not magically change, and you get what you see. >> >> Does "…" actually have double width in some of their fonts? > > That's the assumption, yes. (And not only this one character, you can > see which characters we assume have the same width in the function I > pointed out earlier in this thread, which we run when the > language-environment is switched to something CJK.) It was definitely > correct at some point in the past, but the big question is whether it > is still correct. I don't know who can tell us that nowadays. Whole ranges of characters, I see. >> This report stems from an issue opened on Github for company-mode (see >> the first message) from somebody who as I understand hails from one of >> those countries (I haven't clarified exactly), and they apparently have >> to work with the "Chinese-BIG5" language environment. >> >> Are you saying that they misconfigured their system somehow, e.g. that >> Chinese-BIG5 is expected to be used with a certain set of default system >> fonts which have "…" at double width? > > Either their systems are misconfigured, or the assumption about the > width of those characters is no longer true, at least not in a vast > enough majority of cases. 
If we cannot get definitive answers, maybe > we should have an optional feature that disables the redefinition of > char-width for characters that Unicode does not define as "wide", and > then see whether someone still needs such tweaking of char-width. > >>> If you worry that something in your package might not work well for >>> some users due to this issue, how about giving them a user-level >>> option to change the char-width of this character to 1? >> >> It's been suggested that we alter char-width-table dynamically too, as >> one option. I was just hoping to clarify that we don't carry an >> erroneous entry for this particular character. > > Whether it's "erroneous" or not depends on what fonts are actually > used. char-width-table cannot know that, so we are guessing there. > >> If we did, it would be an easier solution for me to direct the users to >> the fix in Emacs 29/30, and delay the rollout of the new popup rendering >> feature a little bit. It will need a fair bit of testing period given >> the nature of the change. > > We will not change the width in Emacs 29: that is too much for a > release branch, definitely at this point in the release cycle. For > Emacs 30, if we want to change this, I'd rather do it as described > above, leaving the "fire escape" to get back the old behavior. It > would be nice to hear from as many CJK users as possible which > characters in the widely used fonts are really double-width -- this > will help in the decision what exactly to change in > use-cjk-char-width-table. All right. I'll try to get more info from the issue reporter, at least. >> Further, string-pixel-width and buffer-text-pixel-size have only been >> added in Emacs 29. Any chance you know some replacement I could use to >> backport the functionality to work in Emacs 25 or 26? >> buffer-text-pixel-size is defined in C. > > You could use window-text-pixel-size instead. Either I'm doing something wrong, or this function's behavior was different in Emacs 28. 
There had been some changes to it during Emacs 29's dev cycle, but I'm not sure which one would have that effect.

Anyway, with this definition:

  (defun pixel-width (string)
    (if (zerop (length string))
        0
      ;; Keeping a work buffer around is more efficient than creating a
      ;; new temporary buffer.
      (with-current-buffer (get-buffer-create " *string-pixel-width*")
        ;; `display-line-numbers-mode' is enabled in internal buffers
        ;; that breaks width calculation, so need to disable (bug#59311)
        (when (bound-and-true-p display-line-numbers-mode)
          (display-line-numbers-mode -1))
        (delete-region (point-min) (point-max))
        (insert string)
        (save-window-excursion
          (set-window-buffer nil (current-buffer))
          (car (window-text-pixel-size nil nil nil t))))))

In Emacs 29, (pixel-width "abc") returns 54 here (on a 4K screen). But no matter what I do, it returns 0 in my Emacs 28.2 (from official tarball).

To get some more info: if I remove the 'car' call, the value that window-text-pixel-size returns is (54 . 36) in Emacs 29 and (0 . 108) in Emacs 28.2.

^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-07-12 1:17 ` Dmitry Gutov @ 2023-07-12 19:54 ` Dmitry Gutov 2023-07-12 21:11 ` Yuan Fu 1 sibling, 0 replies; 41+ messages in thread From: Dmitry Gutov @ 2023-07-12 19:54 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 64420 Found the problem: On 12/07/2023 04:17, Dmitry Gutov wrote: > (window-text-pixel-size nil nil nil t)))))) Looks like commit 61c254cafc9caa3b added the special meaning for the value t for arguments X-LIMIT and Y-LIMIT. In the previous versions, I guess, it meant the same as 0. They also didn't accept most-positive-fixnum, but worked okay with some lower integer values. ^ permalink raw reply [flat|nested] 41+ messages in thread
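Given that finding, a variant of the definition that avoids t for X-LIMIT might look like the sketch below. The function name, the buffer name, and the limit 100000 are illustrative assumptions, not taken from the thread; the message above only says that "some lower integer values" worked, so the limit may need adjusting on a given Emacs version.

```elisp
;; Sketch only: `my-string-pixel-width' and the limit 100000 are
;; hypothetical choices made for this example.
(defun my-string-pixel-width (string)
  "Return the pixel width of STRING in the selected window, or 0 if empty."
  (if (zerop (length string))
      0
    ;; Reuse one work buffer instead of creating a temporary one each time.
    (with-current-buffer (get-buffer-create " *my-string-pixel-width*")
      (when (bound-and-true-p display-line-numbers-mode)
        (display-line-numbers-mode -1))
      (delete-region (point-min) (point-max))
      (insert string)
      (save-window-excursion
        (set-window-buffer nil (current-buffer))
        ;; X-LIMIT is a large finite integer rather than t, because t
        ;; reportedly gained its special meaning only in Emacs 29.
        (car (window-text-pixel-size nil nil nil 100000))))))
```

The result still depends on the fonts and the window in use, so no particular return value should be expected across machines.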
* bug#64420: string-width of … is 2 in CJK environments 2023-07-12 1:17 ` Dmitry Gutov 2023-07-12 19:54 ` Dmitry Gutov @ 2023-07-12 21:11 ` Yuan Fu 2023-07-13 5:23 ` Eli Zaretskii 1 sibling, 1 reply; 41+ messages in thread From: Yuan Fu @ 2023-07-12 21:11 UTC (permalink / raw) To: Dmitry Gutov; +Cc: Eli Zaretskii, 64420 > On Jul 11, 2023, at 6:17 PM, Dmitry Gutov <dmitry@gutov.dev> wrote: > > On 11/07/2023 21:45, Eli Zaretskii wrote: > >>>> Once again, the assumption behind this "feature" of the CJK >>>> language-environments is that whoever uses those environments has the >>>> terminal emulators configured to use fonts where "…" and its ilk have >>>> double size. Of course, if you just switch language-environment on a >>>> system that is otherwise configured for non-CJK locale, the terminal >>>> emulator fonts will not magically change, and you get what you see. >>> >>> Does "…" actually have double width in some of their fonts? >> That's the assumption, yes. (And not only this one character, you can >> see which characters we assume have the same width in the function I >> pointed out earlier in this thread, which we run when the >> language-environment is switched to something CJK.) It was definitely >> correct at some point in the past, but the big question is whether it >> is still correct. I don't know who can tell us that nowadays. > > Whole ranges of characters, I see. Here’s what I know: In a CJK “context”, “…” is supposed to be one ideograph wide (like all CJK punctuation), ie, width=2. However, it’s not as simple as “they used the wrong font”, because both Latin and CJK use the same Unicode code point for “…”, but expect different glyphs. In publication, this is solved by manually marking the text with style or font, so the software uses the desired glyph. Terminals and editors don’t have this luxury. BTW it’s not just ellipses, CJK and Latin shares the same code points for quotes, em dash and middle dot while expecting different glyphs for them. 
Since most terminals and editors (especially terminals) query ASCII/Latin fonts before falling back to CJK fonts, I expect most terminals and editors to show the Latin glyph for “…” (width=1) most of the time.

So practically, it would be correct most of the time if we assume the following code points have a width of 1, regardless of locale:

– HORIZONTAL ELLIPSIS …
– LEFT/RIGHT DOUBLE QUOTATION MARK “”
– LEFT/RIGHT SINGLE QUOTATION MARK ‘’
– EM DASH —
– MIDDLE DOT ·

But obviously if someone configures their terminal or editor to use a CJK font first, these characters MIGHT have width = 2. I said MIGHT because there are plenty of CJK fonts that use the 1-width Latin glyph for these characters by default.

It might be helpful to have a wrapper around string-width that considers heuristics like this, while string-width goes strictly by Unicode and locale.

Source: https://www.w3.org/TR/clreq/#table_of_non-bracket_indication_punctuation_marks

Yuan

^ permalink raw reply [flat|nested] 41+ messages in thread
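The wrapper suggested in the message above could be sketched roughly as follows. The names `my-narrow-punctuation' and `my-heuristic-string-width' are hypothetical, and the sketch assumes that none of the listed characters takes part in a composition or carries a display property; it is an illustration of the idea, not something proposed in the thread.

```elisp
;; Hypothetical names; the code points are the ones listed in the message.
(defconst my-narrow-punctuation
  '(#x2026          ; HORIZONTAL ELLIPSIS
    #x201C #x201D   ; LEFT/RIGHT DOUBLE QUOTATION MARK
    #x2018 #x2019   ; LEFT/RIGHT SINGLE QUOTATION MARK
    #x2014          ; EM DASH
    #x00B7)         ; MIDDLE DOT
  "Characters counted as one column regardless of language environment.")

(defun my-heuristic-string-width (string)
  "Like `string-width', but count `my-narrow-punctuation' as 1 column each."
  (let ((width (string-width string)))
    ;; Start from what `string-width' reports, then take back the extra
    ;; column for every listed character that is currently 2 columns wide.
    (dolist (ch (append string nil) width)
      (when (memq ch my-narrow-punctuation)
        (setq width (- width (1- (char-width ch))))))))
```

Starting from `string-width' and subtracting keeps its handling of everything else (tabs, wide ideographs) unchanged, which seemed safer than summing `char-width' by hand.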
* bug#64420: string-width of … is 2 in CJK environments 2023-07-12 21:11 ` Yuan Fu @ 2023-07-13 5:23 ` Eli Zaretskii 2023-07-27 1:52 ` Dmitry Gutov 0 siblings, 1 reply; 41+ messages in thread From: Eli Zaretskii @ 2023-07-13 5:23 UTC (permalink / raw) To: Yuan Fu; +Cc: dmitry, 64420 > From: Yuan Fu <casouri@gmail.com> > Date: Wed, 12 Jul 2023 14:11:14 -0700 > Cc: Eli Zaretskii <eliz@gnu.org>, > 64420@debbugs.gnu.org > > Here’s what I know: In a CJK “context”, “…” is supposed to be one ideograph wide (like all CJK punctuation), ie, width=2. > > However, it’s not as simple as “they used the wrong font”, because both Latin and CJK use the same Unicode code point for “…”, but expect different glyphs. In publication, this is solved by manually marking the text with style or font, so the software uses the desired glyph. Terminals and editors don’t have this luxury. > > BTW it’s not just ellipses, CJK and Latin shares the same code points for quotes, em dash and middle dot while expecting different glyphs for them. > > Since most terminal and editor (especially terminal) quires ASCII/Latin font before falling back to CJK fonts, I expect most terminal and editor to show the Latin glyph for “…” (width=1) most of the time. > > So practically, it would be correct most of the time if we assume the following code points have a width of 1, regardless of locale: > > – HORIZONTAL ELLIPSIS … > – LEFT/RIGHT DOUBLE QUOTATION MARK “” > – LEFT/RIGHT SINGLE QUOTATION MARK ‘’ > – EM DASH — > – MIDDLE DOT · > > But obviously if someone configures their terminal or editor to use CJK font first, these characters MIGHT have width = 2. I said MIGHT because there are plenty CJK fonts that uses the 1-width Latin glyph for these characters by default. > > It might be helpful to have a wrapper string-width that considers heuristics like this, while string-width goes strictly by Unicode and locale. Thanks. 
My conclusion from the above is a bit different: we should introduce a user option to modify the behavior of use-cjk-char-width-table, such that users who have fonts where these characters are not double-width could have the width of these characters left at their Unicode values. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-07-13 5:23 ` Eli Zaretskii @ 2023-07-27 1:52 ` Dmitry Gutov 0 siblings, 0 replies; 41+ messages in thread From: Dmitry Gutov @ 2023-07-27 1:52 UTC (permalink / raw) To: Eli Zaretskii, Yuan Fu; +Cc: 64420 On 13/07/2023 08:23, Eli Zaretskii wrote: >> From: Yuan Fu<casouri@gmail.com> >> Date: Wed, 12 Jul 2023 14:11:14 -0700 >> Cc: Eli Zaretskii<eliz@gnu.org>, >> 64420@debbugs.gnu.org >> >> Here’s what I know: In a CJK “context”, “…” is supposed to be one ideograph wide (like all CJK punctuation), ie, width=2. >> >> However, it’s not as simple as “they used the wrong font”, because both Latin and CJK use the same Unicode code point for “…”, but expect different glyphs. In publication, this is solved by manually marking the text with style or font, so the software uses the desired glyph. Terminals and editors don’t have this luxury. >> >> BTW it’s not just ellipses, CJK and Latin shares the same code points for quotes, em dash and middle dot while expecting different glyphs for them. >> >> Since most terminal and editor (especially terminal) quires ASCII/Latin font before falling back to CJK fonts, I expect most terminal and editor to show the Latin glyph for “…” (width=1) most of the time. >> >> So practically, it would be correct most of the time if we assume the following code points have a width of 1, regardless of locale: >> >> – HORIZONTAL ELLIPSIS … >> – LEFT/RIGHT DOUBLE QUOTATION MARK “” >> – LEFT/RIGHT SINGLE QUOTATION MARK ‘’ >> – EM DASH — >> – MIDDLE DOT · >> >> But obviously if someone configures their terminal or editor to use CJK font first, these characters MIGHT have width = 2. I said MIGHT because there are plenty CJK fonts that uses the 1-width Latin glyph for these characters by default. >> >> It might be helpful to have a wrapper string-width that considers heuristics like this, while string-width goes strictly by Unicode and locale. > Thanks. 
My conclusion from the above is a bit different: we should > introduce a user option to modify the behavior of > use-cjk-char-width-table, such that users who have fonts where these > characters are not double-width could have the width of these > characters left at their Unicode values. We could add an option, and then go with the default value which corresponds to whatever seems the common opinion here. Anyway, it doesn't seem like anybody else in this discussion is better equipped to choose that user option's name, or write the rest of the patch. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-07-02 12:57 bug#64420: string-width of … is 2 in CJK environments Dmitry Gutov 2023-07-02 13:10 ` Eli Zaretskii @ 2023-07-14 4:45 ` SUNG TAE KIM 2023-07-14 6:58 ` Eli Zaretskii 2023-07-14 9:21 ` SUNG TAE KIM 2023-07-16 16:59 ` SUNG TAE KIM 3 siblings, 1 reply; 41+ messages in thread From: SUNG TAE KIM @ 2023-07-14 4:45 UTC (permalink / raw) To: 64420 [-- Attachment #1: Type: text/plain, Size: 2579 bytes --] Hi, I'm the issue(https://github.com/company-mode/company-mode/issues/1388) reporter of emacs company package. I've been suggested to comment by the project owner of the company package on the matter of character-width-table. So, here's my thoughts. There's many characters marked as A(ambiguous) width in the file ( https://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt) which is one of the Unicode 15.0.0 Character Database. The characters inside the general punctuation block (U+2000..U+206F) are marked as either N(Narrow) or A(Ambiguous) width and the ellipsis character(U+2026) is marked as A. Also there's a suggestion for rendering the ambiguous width unicode character for Non-East Asian character in the Unicode 15.0.0 East Asian Width Technical Report(http://www.unicode.org/reports/tr11/). Quotes from the TR. > 5 Recommendations > > When processing or displaying data > > • Ambiguous characters behave like wide or narrow characters depending on the context (language tag, script identification, associated font, source of data, or explicit markup; all can provide the context). If the context cannot be established reliably, they should be treated as narrow characters by default. My understanding of the report about the treatment of the ambiguous width is that the context is paramount and the recommendation of the default is narrow for the non-East Asian characters. How about in practice? I've tested the rendering of a few ambiguous width characters on some OSes - terminal. 
macOS Mojave - builtin, kitty, iterm2
Rendered as narrow character regardless of locale/font setting.

Windows 11 - old and new terminal
Rendered as narrow character regardless of locale/font setting.

Ubuntu 20 - gnome-terminal
User can set the width of ambiguous characters to either narrow (default) or wide through a compatibility option.

I'm surprised gnome-terminal has this option. However, it seems incomplete because when I try to delete an ambiguous width character rendered as a wide one, the terminal messes up its cursor position whereas deleting a wide character works fine.

So, I think the proper default width value of the ambiguous width characters is narrow and there must be options for setting the width of those ambiguous width characters, but such a change of default value might cause breakage in the emacs packages which rely on the CJK language environment.

All in all, I think providing comprehensive options to change the width of those ambiguous width characters will be desirable.

[-- Attachment #2: Type: text/html, Size: 2891 bytes --] ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-07-14 4:45 ` SUNG TAE KIM @ 2023-07-14 6:58 ` Eli Zaretskii 2023-07-16 11:51 ` Eli Zaretskii 0 siblings, 1 reply; 41+ messages in thread From: Eli Zaretskii @ 2023-07-14 6:58 UTC (permalink / raw) To: SUNG TAE KIM; +Cc: 64420 > From: SUNG TAE KIM <itaemu@gmail.com> > Date: Fri, 14 Jul 2023 13:45:58 +0900 > > So, I think the proper default width value of the ambiguous width characters is narrow and there must > be options for setting width for those ambiguous width characters, but such change of default value > might cause breakage in the emacs packages which rely on the CJK language environment. > > All in all, I think providing comprehensive options to change the width of those ambiguous width > characters will be desirable. Thanks, those are also my conclusions, as described here: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64420#50 By default, Emacs already treats the ellipsis as a Narrow character, and our current idea of "context" is the value of language-environment, when the font information is not available. Since Emacs doesn't currently support language tags or any other feature which would allow the language to change on a per-buffer or per-text region basis, the best we can do to allow finer-tuned width of these characters is some kind of user customization, which assumes that users know better which fonts are used by Emacs and by terminal emulators they use for the Emacs TTY frames. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-07-14 6:58 ` Eli Zaretskii @ 2023-07-16 11:51 ` Eli Zaretskii 0 siblings, 0 replies; 41+ messages in thread From: Eli Zaretskii @ 2023-07-16 11:51 UTC (permalink / raw) To: itaemu, casouri; +Cc: 64420 > Cc: 64420@debbugs.gnu.org > Date: Fri, 14 Jul 2023 09:58:42 +0300 > From: Eli Zaretskii <eliz@gnu.org> > > By default, Emacs already treats the ellipsis as a Narrow character, > and our current idea of "context" is the value of > language-environment, when the font information is not available. > Since Emacs doesn't currently support language tags or any other > feature which would allow the language to change on a per-buffer or > per-text region basis, the best we can do to allow finer-tuned width > of these characters is some kind of user customization, which assumes > that users know better which fonts are used by Emacs and by terminal > emulators they use for the Emacs TTY frames. Would someone please go over the characters whose width is marked as "ambiguous" ("A") in Unicode's EastAsianWidth.txt file, and tell which ones of them we should make single-column, when the above mentioned user options tells us to default to "narrow"? I think all those up to codepoint #x324F should be treated like that, but maybe I decided wrong? TIA ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-07-02 12:57 bug#64420: string-width of … is 2 in CJK environments Dmitry Gutov 2023-07-02 13:10 ` Eli Zaretskii 2023-07-14 4:45 ` SUNG TAE KIM @ 2023-07-14 9:21 ` SUNG TAE KIM 2023-07-14 11:04 ` Eli Zaretskii 2023-07-16 16:59 ` SUNG TAE KIM 3 siblings, 1 reply; 41+ messages in thread From: SUNG TAE KIM @ 2023-07-14 9:21 UTC (permalink / raw) To: eliz; +Cc: 64420 [-- Attachment #1: Type: text/plain, Size: 1095 bytes --] > By default, Emacs already treats the ellipsis as a Narrow character, and our current idea of "context" is the value of language-environment, when the font information is not available. I'll try to clarify my opinion a bit more. What I meant by default was the default in the CJK language environment: the default width of the ambiguous characters in the CJK environment should be narrow. Current Emacs changes the width of ambiguous characters to wide if the user activates the CJK environment. The Unicode standard's recommendation is to set the width to narrow in unclear circumstances, but Emacs changes the width to wide even if it can't know what font is currently used. For that reason, I don't think such behavior is aligned well with the Unicode standard. Furthermore, in my limited experience, the default width of those characters in the CJK environment is narrow in the majority of contemporary terminal implementations. However, considering the Emacs package ecosystem, the current Emacs behavior is OK as long as there's an easy option for changing such values. I hope this makes sense. [-- Attachment #2: Type: text/html, Size: 1186 bytes --] ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-07-14 9:21 ` SUNG TAE KIM @ 2023-07-14 11:04 ` Eli Zaretskii 2023-07-14 20:11 ` Yuan Fu 0 siblings, 1 reply; 41+ messages in thread From: Eli Zaretskii @ 2023-07-14 11:04 UTC (permalink / raw) To: SUNG TAE KIM; +Cc: 64420 > From: SUNG TAE KIM <itaemu@gmail.com> > Date: Fri, 14 Jul 2023 18:21:42 +0900 > Cc: 64420@debbugs.gnu.org > > What I meant by default was default in the CJK language environment and the default width of the > ambiguous characters in CJK environment should be narrow. Current emacs changes the width of > ambiguous characters to wide if the user activates the CJK environment. The unicode standard > recommendation is set the width narrow at unclear circumstances but emacs changes the width to > wide even if it can't know what font is currently used. For that reason, I don't think such behavior is > aligned well with the unicode standard. We don't blindly follow the Unicode Standard. We seriously consider its recommendations, and then do whatever we think is best for our users. > Furthermore, The majority of the default width of those > characters in the CJK environment is narrow on contemporary implementation of the terminals from > my limited experience. However, Considering the emacs package ecosystem, current emacs > behavior is ok as long as there's an easy option for changing such values. It is not yet clear to me whether handling these characters as narrow by default in CJK language-environments is TRT. But adding an option to do so is a first step in that direction, if indeed this is the right direction: we can in the future make this optional behavior be the default, if we arrive at the conclusion that most users configure their fonts and their terminal emulators such that these characters have the narrow width. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-07-14 11:04 ` Eli Zaretskii @ 2023-07-14 20:11 ` Yuan Fu 0 siblings, 0 replies; 41+ messages in thread From: Yuan Fu @ 2023-07-14 20:11 UTC (permalink / raw) To: Eli Zaretskii; +Cc: SUNG TAE KIM, 64420 > On Jul 14, 2023, at 4:04 AM, Eli Zaretskii <eliz@gnu.org> wrote: > >> From: SUNG TAE KIM <itaemu@gmail.com> >> Date: Fri, 14 Jul 2023 18:21:42 +0900 >> Cc: 64420@debbugs.gnu.org >> >> What I meant by default was default in the CJK language environment and the default width of the >> ambiguous characters in CJK environment should be narrow. Current emacs changes the width of >> ambiguous characters to wide if the user activates the CJK environment. The unicode standard >> recommendation is set the width narrow at unclear circumstances but emacs changes the width to >> wide even if it can't know what font is currently used. For that reason, I don't think such behavior is >> aligned well with the unicode standard. > > We don't blindly follow the Unicode Standard. We seriously consider > its recommendations, and then do whatever we think is best for our > users. > >> Furthermore, The majority of the default width of those >> characters in the CJK environment is narrow on contemporary implementation of the terminals from >> my limited experience. However, Considering the emacs package ecosystem, current emacs >> behavior is ok as long as there's an easy option for changing such values. > > It is not yet clear to me whether handling these characters as narrow > by default in CJK language-environments is TRT. But adding an option > to do so is a first step in that direction, if indeed this is the > right direction: we can in the future make this optional behavior be > the default, if we arrive at the conclusion that most users configure > their fonts and their terminal emulators such that these characters > have the narrow width. I tend to agree with Sung Tae, but this sounds like a reasonable compromise to me. 
Yuan ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-07-02 12:57 bug#64420: string-width of … is 2 in CJK environments Dmitry Gutov ` (2 preceding siblings ...) 2023-07-14 9:21 ` SUNG TAE KIM @ 2023-07-16 16:59 ` SUNG TAE KIM 2023-07-16 17:15 ` Eli Zaretskii 3 siblings, 1 reply; 41+ messages in thread From: SUNG TAE KIM @ 2023-07-16 16:59 UTC (permalink / raw) To: eliz, casouri; +Cc: 64420 I see no issue in changing default width of ambiguous characters to narrow except variation selector blocks(FE00..FE0F, E0100..E01EF) and private-use blocks(E000..F8FF, F0000..FFFFD, 100000..10FFFD) because the characters in the former blocks are not standalone[1] and the characters of the latter blocks are reserved for 3rd-party and everything else seems standalone characters. [1] https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block) ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-07-16 16:59 ` SUNG TAE KIM @ 2023-07-16 17:15 ` Eli Zaretskii 2023-08-05 15:01 ` Eli Zaretskii 0 siblings, 1 reply; 41+ messages in thread From: Eli Zaretskii @ 2023-07-16 17:15 UTC (permalink / raw) To: SUNG TAE KIM; +Cc: casouri, 64420 > From: SUNG TAE KIM <itaemu@gmail.com> > Date: Mon, 17 Jul 2023 01:59:15 +0900 > Cc: 64420@debbugs.gnu.org > > I see no issue in changing default width of ambiguous characters to > narrow except variation selector blocks(FE00..FE0F, E0100..E01EF) and > private-use blocks(E000..F8FF, F0000..FFFFD, 100000..10FFFD) because > the characters in the former blocks are not standalone[1] and the > characters of the latter blocks are reserved for 3rd-party and > everything else seems standalone characters. > > [1] https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block) Thanks. I said I intend to end at #x324F because use-cjk-char-width-table doesn't touch ambiguous characters with higher codepoints, so they are already narrow in Emacs, and we don't need to "fix" them. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-07-16 17:15 ` Eli Zaretskii @ 2023-08-05 15:01 ` Eli Zaretskii 2023-08-10 21:58 ` Yuan Fu 2023-08-11 23:52 ` Dmitry Gutov 0 siblings, 2 replies; 41+ messages in thread From: Eli Zaretskii @ 2023-08-05 15:01 UTC (permalink / raw) To: dmitry; +Cc: itaemu, casouri, 64420 > Cc: casouri@gmail.com, 64420@debbugs.gnu.org > Date: Sun, 16 Jul 2023 20:15:30 +0300 > From: Eli Zaretskii <eliz@gnu.org> > > > From: SUNG TAE KIM <itaemu@gmail.com> > > Date: Mon, 17 Jul 2023 01:59:15 +0900 > > Cc: 64420@debbugs.gnu.org > > > > I see no issue in changing default width of ambiguous characters to > > narrow except variation selector blocks(FE00..FE0F, E0100..E01EF) and > > private-use blocks(E000..F8FF, F0000..FFFFD, 100000..10FFFD) because > > the characters in the former blocks are not standalone[1] and the > > characters of the latter blocks are reserved for 3rd-party and > > everything else seems standalone characters. > > > > [1] https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block) > > Thanks. > > I said I intend to end at #x324F because use-cjk-char-width-table > doesn't touch ambiguous characters with higher codepoints, so they are > already narrow in Emacs, and we don't need to "fix" them. OK, this is now installed on master. We have a new user option named cjk-ambiguous-chars-are-wide; its default is t, but if set to nil, the characters proclaimed by Unicode as "ambiguous" will have char-width of 1, not 2. Note that this option should be set either via 'setopt' or the Customize interface, not via 'setq'. Let me know how well this works for you. ^ permalink raw reply [flat|nested] 41+ messages in thread
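For anyone wanting to try the option announced above, an init-file fragment along these lines should be enough. The `boundp' guard is an addition for this example, so the same init file can still be loaded on an Emacs that predates the option.

```elisp
;; Use `setopt' (or the Customize interface) rather than `setq', as the
;; announcement asks; the guard skips the setting on older Emacs versions.
(when (boundp 'cjk-ambiguous-chars-are-wide)
  (setopt cjk-ambiguous-chars-are-wide nil))
```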
* bug#64420: string-width of … is 2 in CJK environments 2023-08-05 15:01 ` Eli Zaretskii @ 2023-08-10 21:58 ` Yuan Fu 2023-08-11 5:53 ` Eli Zaretskii 2023-08-11 23:52 ` Dmitry Gutov 1 sibling, 1 reply; 41+ messages in thread From: Yuan Fu @ 2023-08-10 21:58 UTC (permalink / raw) To: Eli Zaretskii; +Cc: SUNG TAE KIM, Dmitry Gutov, 64420 > On Aug 5, 2023, at 8:01 AM, Eli Zaretskii <eliz@gnu.org> wrote: > >> Cc: casouri@gmail.com, 64420@debbugs.gnu.org >> Date: Sun, 16 Jul 2023 20:15:30 +0300 >> From: Eli Zaretskii <eliz@gnu.org> >> >>> From: SUNG TAE KIM <itaemu@gmail.com> >>> Date: Mon, 17 Jul 2023 01:59:15 +0900 >>> Cc: 64420@debbugs.gnu.org >>> >>> I see no issue in changing default width of ambiguous characters to >>> narrow except variation selector blocks(FE00..FE0F, E0100..E01EF) and >>> private-use blocks(E000..F8FF, F0000..FFFFD, 100000..10FFFD) because >>> the characters in the former blocks are not standalone[1] and the >>> characters of the latter blocks are reserved for 3rd-party and >>> everything else seems standalone characters. >>> >>> [1] https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block) >> >> Thanks. >> >> I said I intend to end at #x324F because use-cjk-char-width-table >> doesn't touch ambiguous characters with higher codepoints, so they are >> already narrow in Emacs, and we don't need to "fix" them. > > OK, this is now installed on master. We have a new user option named > cjk-ambiguous-chars-are-wide; its default is t, but if set to nil, the > characters proclaimed by Unicode as "ambiguous" will have char-width > of 1, not 2. Note that this option should be set either via 'setopt' > or the Customize interface, not via 'setq'. > > Let me know how well this works for you. Thanks! I can’t tell you how well it works tho since I don’t use company :-) Yuan ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-08-10 21:58 ` Yuan Fu @ 2023-08-11 5:53 ` Eli Zaretskii 2023-08-11 18:07 ` Yuan Fu 0 siblings, 1 reply; 41+ messages in thread From: Eli Zaretskii @ 2023-08-11 5:53 UTC (permalink / raw) To: Yuan Fu; +Cc: itaemu, dmitry, 64420 > From: Yuan Fu <casouri@gmail.com> > Date: Thu, 10 Aug 2023 14:58:37 -0700 > Cc: Dmitry Gutov <dmitry@gutov.dev>, > SUNG TAE KIM <itaemu@gmail.com>, > 64420@debbugs.gnu.org > > > OK, this is now installed on master. We have a new user option named > > cjk-ambiguous-chars-are-wide; its default is t, but if set to nil, the > > characters proclaimed by Unicode as "ambiguous" will have char-width > > of 1, not 2. Note that this option should be set either via 'setopt' > > or the Customize interface, not via 'setq'. > > > > Let me know how well this works for you. > > Thanks! I can’t tell you how well it works tho since I don’t use company :-) You don't need company to see if this works well for you. Just use string-width or even char-width with some problematic characters (you can find the list of them in characters.el, search for "ambiguous"), and compare the results when this new variable is nil and non-nil. I'm interested to know how many people need the variable to be non-nil (its default) to have the width match the fonts they use in Emacs, both in GUI and in TTY frames, since there's the claim that no one needs those characters be considered full-width nowadays. If that claim is correct, we should consider changing the default value of this variable in Emacs 30. TIA ^ permalink raw reply [flat|nested] 41+ messages in thread
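The comparison described above can be scripted along these lines. This is only a sketch: `my-ambiguous-char-widths' is a hypothetical helper, it assumes an Emacs build that has cjk-ambiguous-chars-are-wide, and the option presumably matters only after a CJK language environment has been selected.

```elisp
(require 'cl-lib)

;; Hypothetical helper, not part of Emacs.
(defun my-ambiguous-char-widths (chars)
  "Return a list of (CHAR WIDTH-WHEN-WIDE WIDTH-WHEN-NARROW) for CHARS.
Return nil if `cjk-ambiguous-chars-are-wide' is not available."
  (when (boundp 'cjk-ambiguous-chars-are-wide)
    (let ((saved (default-value 'cjk-ambiguous-chars-are-wide))
          wide narrow)
      (unwind-protect
          (progn
            (setopt cjk-ambiguous-chars-are-wide t)
            (setq wide (mapcar #'char-width chars))
            (setopt cjk-ambiguous-chars-are-wide nil)
            (setq narrow (mapcar #'char-width chars)))
        ;; Put back whatever value was in effect before the comparison.
        (setopt cjk-ambiguous-chars-are-wide saved))
      (cl-mapcar #'list chars wide narrow))))
```

A possible way to use it is to evaluate (set-language-environment "Chinese-BIG5") first and then call (my-ambiguous-char-widths '(#x2026 #x2014 #x00B7)), comparing the two reported columns with how the characters actually look in the frame or terminal.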
* bug#64420: string-width of … is 2 in CJK environments 2023-08-11 5:53 ` Eli Zaretskii @ 2023-08-11 18:07 ` Yuan Fu 2023-08-11 18:36 ` Eli Zaretskii ` (2 more replies) 0 siblings, 3 replies; 41+ messages in thread From: Yuan Fu @ 2023-08-11 18:07 UTC (permalink / raw) To: Eli Zaretskii; +Cc: itaemu, dmitry, 64420 [-- Attachment #1: Type: text/plain, Size: 2763 bytes --] > On Aug 10, 2023, at 10:53 PM, Eli Zaretskii <eliz@gnu.org> wrote: > >> From: Yuan Fu <casouri@gmail.com> >> Date: Thu, 10 Aug 2023 14:58:37 -0700 >> Cc: Dmitry Gutov <dmitry@gutov.dev>, >> SUNG TAE KIM <itaemu@gmail.com>, >> 64420@debbugs.gnu.org >> >>> OK, this is now installed on master. We have a new user option named >>> cjk-ambiguous-chars-are-wide; its default is t, but if set to nil, the >>> characters proclaimed by Unicode as "ambiguous" will have char-width >>> of 1, not 2. Note that this option should be set either via 'setopt' >>> or the Customize interface, not via 'setq'. >>> >>> Let me know how well this works for you. >> >> Thanks! I can’t tell you how well it works tho since I don’t use company :-) > > You don't need company to see if this works well for you. Just use > string-width or even char-width with some problematic characters (you > can find the list of them in characters.el, search for "ambiguous"), > and compare the results when this new variable is nil and non-nil. > I'm interested to know how many people need the variable to be non-nil > (its default) to have the width match the fonts they use in Emacs, > both in GUI and in TTY frames, since there's the claim that no one > needs those characters be considered full-width nowadays. If that > claim is correct, we should consider changing the default value of > this variable in Emacs 30. On my machine, all the ambiguous characters have width of 1, even with the default value of cjk-ambiguous-chars-are-wide (I use utf8_en locale). That’s expected. 
I tried printing all the ambiguous characters, and I attached screenshots of them (the first line is a line of CJK characters for reference). (screenshot-1.png, screenshot-2.png)

On terminal, I saw an interesting option, “Ambiguous characters are double-width” (terminal-setting.png), which is the same as cjk-ambiguous-chars-are-wide. If I turn it on all the ambiguous characters are indeed displayed in double-width. (terminal-narrow.png, terminal-wide.png)

On GUI display, the latter half of the ambiguous characters are definitely wider than one char, but they aren’t quite 2 chars wide either. But I guess it doesn’t matter too much since one should use pixel size on GUI anyway.

On terminal, at least iterm2 displays ambiguous characters as single-width by default, (I assume) regardless of locale. And it displays a warning when you try to turn on the “Ambiguous characters are double-width” option [1].

Yuan

[1] "You probably don't want to turn this on. It will confuse interactive programs. You might want it if you work mostly with East Asian text combined with legacy or mathematical character sets. Are you sure you want this?"
[-- Attachment #2: terminal-wide.png --] [-- Type: image/png, Size: 193958 bytes --] [-- Attachment #3: terminal-narrow.png --] [-- Type: image/png, Size: 165356 bytes --] [-- Attachment #4: terminal-setting.png --] [-- Type: image/png, Size: 243101 bytes --] [-- Attachment #5: screenshot-2.png --] [-- Type: image/png, Size: 141983 bytes --] [-- Attachment #6: screenshot-1.png --] [-- Type: image/png, Size: 186883 bytes --] [-- Attachment #7: amiguous-width.txt --] [-- Type: text/plain, Size: 2355 bytes --] [attachment text: a reference line of CJK characters followed by the ambiguous-width characters] ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-08-11 18:07 ` Yuan Fu @ 2023-08-11 18:36 ` Eli Zaretskii 2023-08-12 20:18 ` Yuan Fu 2023-08-11 22:34 ` Dmitry Gutov 2023-08-13 0:22 ` Dmitry Gutov 2 siblings, 1 reply; 41+ messages in thread From: Eli Zaretskii @ 2023-08-11 18:36 UTC (permalink / raw) To: Yuan Fu; +Cc: itaemu, dmitry, 64420 > From: Yuan Fu <casouri@gmail.com> > Date: Fri, 11 Aug 2023 11:07:26 -0700 > Cc: dmitry@gutov.dev, > itaemu@gmail.com, > 64420@debbugs.gnu.org > > On terminal, I saw an interesting option, “Ambiguous characters are double-width” (terminal-setting.png), which is the same as cjk-ambiguous-chars-are-wide. If I turn it on all the ambiguous characters are indeed displayed in double-width. (terminal-narrow.png, terminal-wide.png) And in that case, you need to set cjk-ambiguous-chars-are-wide non-nil to have Emacs display those characters correctly? Or does that option have no effect on the correctness of the Emacs display on that terminal? > On GUI display, the later-half of the ambiguous characters are definitely wider than one char, but they aren’t quite 2 chars wide either. But I guess it doesn’t matter too much since one should use pixel size on GUI anyway. Is the actual width closer to 1 or to 2? Thanks. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-08-11 18:36 ` Eli Zaretskii @ 2023-08-12 20:18 ` Yuan Fu 0 siblings, 0 replies; 41+ messages in thread From: Yuan Fu @ 2023-08-12 20:18 UTC (permalink / raw) To: Eli Zaretskii; +Cc: SUNG TAE KIM, Dmitry Gutov, 64420 > On Aug 11, 2023, at 11:36 AM, Eli Zaretskii <eliz@gnu.org> wrote: > >> From: Yuan Fu <casouri@gmail.com> >> Date: Fri, 11 Aug 2023 11:07:26 -0700 >> Cc: dmitry@gutov.dev, >> itaemu@gmail.com, >> 64420@debbugs.gnu.org >> >> On terminal, I saw an interesting option, “Ambiguous characters are double-width” (terminal-setting.png), which is the same as cjk-ambiguous-chars-are-wide. If I turn it on all the ambiguous characters are indeed displayed in double-width. (terminal-narrow.png, terminal-wide.png) > > And in that case, you need to set cjk-ambiguous-chars-are-wide non-nil > to have Emacs display those characters correctly? Or does that option > have no effect on the correctness of the Emacs display on that > terminal? The value of cjk-ambiguous-chars-are-wide has no effect on the display of those characters in the terminal, at least in the terminal I use (iTerm2). Only the terminal option has an effect. The screenshots I took are actually from cat, not Emacs. I tried with Emacs and found out that the terminal and Emacs must agree on the width of those characters, otherwise the cursor movement is broken (perhaps that’s not surprising to you). The cursor movement works if either a) I turn on “Ambiguous characters are double-width” in the terminal and (set-language-environment "Chinese-BIG5") in Emacs, or b) I turn off “Ambiguous characters are double-width” (which is off by default) and use default locale (utf8_enUS). > >> On GUI display, the later-half of the ambiguous characters are definitely wider than one char, but they aren’t quite 2 chars wide either. But I guess it doesn’t matter too much since one should use pixel size on GUI anyway. > > Is the actual width closer to 1 or to 2? 
> I’d say 2. Yuan ^ permalink raw reply [flat|nested] 41+ messages in thread
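The pairing described in the message above (the terminal and Emacs must agree on the width of ambiguous characters) can be sketched as configuration. This is only an illustration, assuming an Emacs build that already has the cjk-ambiguous-chars-are-wide option discussed in this thread. Case (b) below is a variation that is an assumption of this sketch: it keeps the CJK language environment and relies on the new option, rather than switching back to the default locale as in the message. Which case applies depends on the terminal's own setting, which this snippet does not detect.

```elisp
;; Sketch only: evaluate ONE of the two cases, matching your terminal.

;; Case (a): the terminal draws ambiguous-width characters double-width
;; (e.g. iTerm2 with "Ambiguous characters are double-width" turned on).
;; In a CJK language environment Emacs counts them as 2 columns, which
;; is what the option's default value (t) keeps in place.
(set-language-environment "Chinese-BIG5")
(setopt cjk-ambiguous-chars-are-wide t)

;; Case (b): the terminal draws them single-width (the iTerm2 default).
;; Assumption of this sketch: stay in the CJK language environment and
;; ask Emacs to count ambiguous characters as 1 column instead.
;; (set-language-environment "Chinese-BIG5")
;; (setopt cjk-ambiguous-chars-are-wide nil)

;; Ask Emacs what it currently believes; the result should match the
;; number of cells the terminal actually uses for this character.
(string-width "…")
```

If cursor movement around such characters still looks wrong after choosing a case, the two sides most likely disagree, and the terminal setting is the first thing to re-check.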
* bug#64420: string-width of … is 2 in CJK environments 2023-08-11 18:07 ` Yuan Fu 2023-08-11 18:36 ` Eli Zaretskii @ 2023-08-11 22:34 ` Dmitry Gutov 2023-08-13 0:22 ` Dmitry Gutov 2 siblings, 0 replies; 41+ messages in thread From: Dmitry Gutov @ 2023-08-11 22:34 UTC (permalink / raw) To: Yuan Fu, Eli Zaretskii; +Cc: itaemu, 64420 On 11/08/2023 21:07, Yuan Fu wrote: > On my machine, all the ambiguous characters have width of 1, even with the default value of cjk-ambiguous-chars-are-wide (I use utf8_en locale). What if you start the test with (set-language-environment "Chinese-BIG5") ? ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-08-11 18:07 ` Yuan Fu 2023-08-11 18:36 ` Eli Zaretskii 2023-08-11 22:34 ` Dmitry Gutov @ 2023-08-13 0:22 ` Dmitry Gutov 2023-08-13 5:24 ` Eli Zaretskii 2 siblings, 1 reply; 41+ messages in thread From: Dmitry Gutov @ 2023-08-13 0:22 UTC (permalink / raw) To: Yuan Fu, Eli Zaretskii; +Cc: itaemu, 64420 On 11/08/2023 21:07, Yuan Fu wrote: > On GUI display, the later-half of the ambiguous characters are definitely wider than one char, but they aren’t quite 2 chars wide either. But I guess it doesn’t matter too much since one should use pixel size on GUI anyway. BTW, I think most double-width characters on GUI are less than 2 characters wide? So the point here would be that some "ambiguous" ones are still wider than 1, I guess. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-08-13 0:22 ` Dmitry Gutov @ 2023-08-13 5:24 ` Eli Zaretskii 2023-08-13 10:48 ` Dmitry Gutov 0 siblings, 1 reply; 41+ messages in thread From: Eli Zaretskii @ 2023-08-13 5:24 UTC (permalink / raw) To: Dmitry Gutov; +Cc: itaemu, casouri, 64420 > Date: Sun, 13 Aug 2023 03:22:41 +0300 > Cc: itaemu@gmail.com, 64420@debbugs.gnu.org > From: Dmitry Gutov <dmitry@gutov.dev> > > On 11/08/2023 21:07, Yuan Fu wrote: > > On GUI display, the later-half of the ambiguous characters are definitely wider than one char, but they aren’t quite 2 chars wide either. But I guess it doesn’t matter too much since one should use pixel size on GUI anyway. > > BTW, I think most double-width characters on GUI are less than 2 > characters wide? > > So the point here would be that some "ambiguous" ones are still wider > than 1, I guess. According to Yuan, at least in his environment those characters have a width that is closer to 2 than to 1. In which case using 2 would produce better alignment. Of course, using string-pixel-width will produce an even better alignment. ^ permalink raw reply [flat|nested] 41+ messages in thread
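The advice above (string-pixel-width on GUI frames, string-width on TTY frames) could be wrapped in a small helper. This is a minimal sketch, not code from Emacs or company-mode: the name my/display-columns is hypothetical, and the rounding follows the (ceiling (/ ... (float (default-font-width)))) expression given earlier in this thread.

```elisp
(require 'subr-x) ; provides `string-pixel-width' in Emacs 29

(defun my/display-columns (string)
  "Return an estimate of how many columns STRING occupies on display.
Hypothetical helper for illustration.  On graphical frames the pixel
width is measured and rounded up to whole default-font columns; on
TTY frames `string-width' is used, since no pixel metrics exist there."
  (if (display-graphic-p)
      (ceiling (/ (string-pixel-width string)
                  (float (default-font-width))))
    (string-width string)))

;; Example call; the result depends on the frame type, the fonts in
;; use, and (on TTY frames) the language environment.
(my/display-columns "…")
```

Rounding up is a design choice for popup-style layouts: it reserves a whole column for a glyph that is slightly wider than one column, at the cost of a little extra padding.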
* bug#64420: string-width of … is 2 in CJK environments 2023-08-13 5:24 ` Eli Zaretskii @ 2023-08-13 10:48 ` Dmitry Gutov 2023-08-13 12:01 ` Eli Zaretskii 0 siblings, 1 reply; 41+ messages in thread From: Dmitry Gutov @ 2023-08-13 10:48 UTC (permalink / raw) To: Eli Zaretskii; +Cc: itaemu, casouri, 64420 On 13/08/2023 08:24, Eli Zaretskii wrote: >> Date: Sun, 13 Aug 2023 03:22:41 +0300 >> Cc: itaemu@gmail.com, 64420@debbugs.gnu.org >> From: Dmitry Gutov <dmitry@gutov.dev> >> >> On 11/08/2023 21:07, Yuan Fu wrote: >>> On GUI display, the later-half of the ambiguous characters are definitely wider than one char, but they aren’t quite 2 chars wide either. But I guess it doesn’t matter too much since one should use pixel size on GUI anyway. >> BTW, I think most double-width characters on GUI are less than 2 >> characters wide? >> >> So the point here would be that some "ambiguous" ones are still wider >> than 1, I guess. > According to Yuan, at least in his environment those characters have a > width that is closer to 2 than to 1. In which case using 2 would > produce better alignment. Of course, using string-pixel-width will > produce an even better alignment. In GUI, that is. But if they are displayed with width 1 in the terminal, we'd better make string-width return 1 for them too. That might be slightly worse for certain applications (like the popup in company), but at least the basic rendering and navigation bugs in the terminal will be fixed this way. And the new popup rendering for company (using string-width and spacing instructions) is close to being ready anyway. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-08-13 10:48 ` Dmitry Gutov @ 2023-08-13 12:01 ` Eli Zaretskii 2023-08-13 12:53 ` Dmitry Gutov 0 siblings, 1 reply; 41+ messages in thread From: Eli Zaretskii @ 2023-08-13 12:01 UTC (permalink / raw) To: Dmitry Gutov; +Cc: itaemu, casouri, 64420 > Date: Sun, 13 Aug 2023 13:48:42 +0300 > Cc: casouri@gmail.com, itaemu@gmail.com, 64420@debbugs.gnu.org > From: Dmitry Gutov <dmitry@gutov.dev> > > >> So the point here would be that some "ambiguous" ones are still wider > >> than 1, I guess. > > According to Yuan, at least in his environment those characters have a > > width that is closer to 2 than to 1. In which case using 2 would > > produce better alignment. Of course, using string-pixel-width will > > produce an even better alignment. > > In GUI, that is. But if they are displayed with width 1 in terminal, we > better make string-width return 1 for them too. Yes. But it turns out that how wide these characters are on TTY frames depends on the terminal emulator and its own options regarding those characters. So some users will want the value 1 and others will want the value 2, depending on which terminals they use and what options of those terminals they like best. The important part is that Emacs's notion of the character width is consistent with that of the terminal. > That might be slightly worse for certain applications (like popup in > company), but at least the basic rendering and navigation bugs in > terminal will be fixed this way. And the new popup rendering for company > (using string-width and spacing instructions) is close to being ready > anyway. Yes, sure. There's no doubt on my side that this option is useful; I'm just trying to collect data that would allow us to decide on the best default value, that's all. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-08-13 12:01 ` Eli Zaretskii @ 2023-08-13 12:53 ` Dmitry Gutov 0 siblings, 0 replies; 41+ messages in thread From: Dmitry Gutov @ 2023-08-13 12:53 UTC (permalink / raw) To: Eli Zaretskii; +Cc: itaemu, casouri, 64420 On 13/08/2023 15:01, Eli Zaretskii wrote: >> Date: Sun, 13 Aug 2023 13:48:42 +0300 >> Cc: casouri@gmail.com, itaemu@gmail.com, 64420@debbugs.gnu.org >> From: Dmitry Gutov <dmitry@gutov.dev> >> >>>> So the point here would be that some "ambiguous" ones are still wider >>>> than 1, I guess. >>> According to Yuan, at least in his environment those characters have a >>> width that is closer to 2 than to 1. In which case using 2 would >>> produce better alignment. Of course, using string-pixel-width will >>> produce an even better alignment. >> >> In GUI, that is. But if they are displayed with width 1 in terminal, we >> better make string-width return 1 for them too. > > Yes. But it turns out that how wide these characters are on TTY > frames depends on the terminal emulator and its own options regarding > those characters. So some users will want the value 1 and others will > want the value 2, depending on which terminals they use and what > options of those terminals they like best. That's where having the option that we just added will be beneficial. As opposed to, say, changing the behavior outright. > The important part is that the Emacs's notion of the character width > is consistent with that of the terminal. > >> That might be slightly worse for certain applications (like popup in >> company), but at least the basic rendering and navigation bugs in >> terminal will be fixed this way. And the new popup rendering for company >> (using string-width and spacing instructions) is close to being ready >> anyway. > > Yes, sure. There's no doubt on my side that this option is useful; > I'm just trying to collect data that would allow us to decide on the > best default value, that's all. 
Yuan seems to be saying that iTerm2, at least, defaults to showing the ambiguous chars at width 1 and issues a warning when the user tries to change that option. gnome-terminal also has that default. I just checked on my machine (Ubuntu from 2022), and the description in this bug report from 2015 also says that: https://bugzilla.gnome.org/show_bug.cgi?id=749414, so the default is not new. Others are welcome to report their experience. I've found a couple of conflicting reports regarding Microsoft Terminal; someone could test that too. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-08-05 15:01 ` Eli Zaretskii 2023-08-10 21:58 ` Yuan Fu @ 2023-08-11 23:52 ` Dmitry Gutov 2023-08-12 5:50 ` Eli Zaretskii 1 sibling, 1 reply; 41+ messages in thread From: Dmitry Gutov @ 2023-08-11 23:52 UTC (permalink / raw) To: Eli Zaretskii; +Cc: itaemu, casouri, 64420 On 05/08/2023 18:01, Eli Zaretskii wrote: > OK, this is now installed on master. We have a new user option named > cjk-ambiguous-chars-are-wide; its default is t, but if set to nil, the > characters proclaimed by Unicode as "ambiguous" will have char-width > of 1, not 2. Note that this option should be set either via 'setopt' > or the Customize interface, not via 'setq'. > > Let me know how well this works for you. Seems to work fine, thank you. With the caveat that, in the terminal, if I switch to Chinese-BIG5 and visit a file with ambiguous characters like … (which triggers some bugs with display and navigation around those chars), (setopt cjk-ambiguous-chars-are-wide nil) doesn't fix those bugs -- I have to kill and re-visit the buffer for them to go away. But maybe that's expected. In GUI everything's fine, the 'setopt' call makes things better right away. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-08-11 23:52 ` Dmitry Gutov @ 2023-08-12 5:50 ` Eli Zaretskii 2023-08-12 16:40 ` Dmitry Gutov 0 siblings, 1 reply; 41+ messages in thread From: Eli Zaretskii @ 2023-08-12 5:50 UTC (permalink / raw) To: Dmitry Gutov; +Cc: itaemu, casouri, 64420 > Date: Sat, 12 Aug 2023 02:52:29 +0300 > Cc: itaemu@gmail.com, casouri@gmail.com, 64420@debbugs.gnu.org > From: Dmitry Gutov <dmitry@gutov.dev> > > On 05/08/2023 18:01, Eli Zaretskii wrote: > > OK, this is now installed on master. We have a new user option named > > cjk-ambiguous-chars-are-wide; its default is t, but if set to nil, the > > characters proclaimed by Unicode as "ambiguous" will have char-width > > of 1, not 2. Note that this option should be set either via 'setopt' > > or the Customize interface, not via 'setq'. > > > > Let me know how well this works for you. > > Seems to work fine, thank you. > > With the caveat that, in the terminal, if I switch to Chinese-BIG5 and > visit a file with ambiguous characters like … (which triggers some bugs > with display and navigation around those chars), (setopt > cjk-ambiguous-chars-are-wide nil) doesn't fix those bugs -- I have to > kill and re-visit the buffer for them to go away. But maybe that's expected. Does "M-x redraw-display RET" solve the problem after setting the variable? ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-08-12 5:50 ` Eli Zaretskii @ 2023-08-12 16:40 ` Dmitry Gutov 2023-08-12 17:09 ` Eli Zaretskii 0 siblings, 1 reply; 41+ messages in thread From: Dmitry Gutov @ 2023-08-12 16:40 UTC (permalink / raw) To: Eli Zaretskii; +Cc: itaemu, casouri, 64420 On 12/08/2023 08:50, Eli Zaretskii wrote: >> Date: Sat, 12 Aug 2023 02:52:29 +0300 >> Cc: itaemu@gmail.com, casouri@gmail.com, 64420@debbugs.gnu.org >> From: Dmitry Gutov <dmitry@gutov.dev> >> >> On 05/08/2023 18:01, Eli Zaretskii wrote: >>> OK, this is now installed on master. We have a new user option named >>> cjk-ambiguous-chars-are-wide; its default is t, but if set to nil, the >>> characters proclaimed by Unicode as "ambiguous" will have char-width >>> of 1, not 2. Note that this option should be set either via 'setopt' >>> or the Customize interface, not via 'setq'. >>> >>> Let me know how well this works for you. >> Seems to work fine, thank you. >> >> With the caveat that, in the terminal, if I switch to Chinese-BIG5 and >> visit a file with ambiguous characters like … (which triggers some bugs >> with display and navigation around those chars), (setopt >> cjk-ambiguous-chars-are-wide nil) doesn't fix those bugs -- I have to >> kill and re-visit the buffer for them to go away. But maybe that's expected. > Does "M-x redraw-display RET" solve the problem after setting the > variable? Looks like it does, yes. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#64420: string-width of … is 2 in CJK environments 2023-08-12 16:40 ` Dmitry Gutov @ 2023-08-12 17:09 ` Eli Zaretskii 0 siblings, 0 replies; 41+ messages in thread From: Eli Zaretskii @ 2023-08-12 17:09 UTC (permalink / raw) To: Dmitry Gutov; +Cc: itaemu, casouri, 64420 > Date: Sat, 12 Aug 2023 19:40:01 +0300 > Cc: itaemu@gmail.com, casouri@gmail.com, 64420@debbugs.gnu.org > From: Dmitry Gutov <dmitry@gutov.dev> > > >> With the caveat that, in the terminal, if I switch to Chinese-BIG5 and > >> visit a file with ambiguous characters like … (which triggers some bugs > >> with display and navigation around those chars), (setopt > >> cjk-ambiguous-chars-are-wide nil) doesn't fix those bugs -- I have to > >> kill and re-visit the buffer for them to go away. But maybe that's expected. > > Does "M-x redraw-display RET" solve the problem after setting the > > variable? > > Looks like it does, yes. I guess the :set function should trigger that or something. Let me think about that. ^ permalink raw reply [flat|nested] 41+ messages in thread
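The workaround confirmed in the exchange above can be written as two forms. This is a sketch assuming an Emacs build that has the cjk-ambiguous-chars-are-wide option as installed on master at the time of this thread; if the option's :set function is later changed to redraw by itself, as suggested above, the explicit redraw becomes unnecessary but remains harmless.

```elisp
;; Use `setopt' (or the Customize interface) rather than `setq', so
;; that the option's :set function runs and the new width takes effect.
(setopt cjk-ambiguous-chars-are-wide nil)

;; On TTY frames, buffers that are already displayed may keep showing
;; the stale layout until the display is redrawn.  This is the Lisp
;; equivalent of "M-x redraw-display RET".
(redraw-display)
```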
end of thread, other threads:[~2023-08-13 12:53 UTC | newest]

Thread overview: 41+ messages
2023-07-02 12:57 bug#64420: string-width of … is 2 in CJK environments Dmitry Gutov
2023-07-02 13:10 ` Eli Zaretskii
2023-07-02 13:20 ` Dmitry Gutov
2023-07-02 13:43 ` Eli Zaretskii
2023-07-07 2:13 ` Dmitry Gutov
2023-07-07 6:29 ` Eli Zaretskii
2023-07-11 2:13 ` Dmitry Gutov
2023-07-11 11:41 ` Eli Zaretskii
2023-07-11 2:23 ` Dmitry Gutov
2023-07-11 11:48 ` Eli Zaretskii
2023-07-11 18:13 ` Dmitry Gutov
2023-07-11 18:45 ` Eli Zaretskii
2023-07-12 1:17 ` Dmitry Gutov
2023-07-12 19:54 ` Dmitry Gutov
2023-07-12 21:11 ` Yuan Fu
2023-07-13 5:23 ` Eli Zaretskii
2023-07-27 1:52 ` Dmitry Gutov
2023-07-14 4:45 ` SUNG TAE KIM
2023-07-14 6:58 ` Eli Zaretskii
2023-07-16 11:51 ` Eli Zaretskii
2023-07-14 9:21 ` SUNG TAE KIM
2023-07-14 11:04 ` Eli Zaretskii
2023-07-14 20:11 ` Yuan Fu
2023-07-16 16:59 ` SUNG TAE KIM
2023-07-16 17:15 ` Eli Zaretskii
2023-08-05 15:01 ` Eli Zaretskii
2023-08-10 21:58 ` Yuan Fu
2023-08-11 5:53 ` Eli Zaretskii
2023-08-11 18:07 ` Yuan Fu
2023-08-11 18:36 ` Eli Zaretskii
2023-08-12 20:18 ` Yuan Fu
2023-08-11 22:34 ` Dmitry Gutov
2023-08-13 0:22 ` Dmitry Gutov
2023-08-13 5:24 ` Eli Zaretskii
2023-08-13 10:48 ` Dmitry Gutov
2023-08-13 12:01 ` Eli Zaretskii
2023-08-13 12:53 ` Dmitry Gutov
2023-08-11 23:52 ` Dmitry Gutov
2023-08-12 5:50 ` Eli Zaretskii
2023-08-12 16:40 ` Dmitry Gutov
2023-08-12 17:09 ` Eli Zaretskii