unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#63029: [BUG?] format inconsistency in deciding string widths on different locales
@ 2023-04-23 10:23 Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-04-23 11:06 ` Ihor Radchenko
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-04-23 10:23 UTC (permalink / raw)
  To: 63029

Hello,

I don't yet know whether this is a bug in Emacs.  Here are the
observed results; note the Unicode ellipsis character:

--8<---------------cut here---------------start------------->8---
$ for locale in {en_US,fr_FR,de_DE,zh_CN,ja_JA}.UTF-8; do
    printf '%s\t' "$locale"
    LANG="$locale" src/emacs -Q -batch \
                   -eval '(message "%S" (format "%-5.5s" "1234…"))'
done
--8<---------------cut here---------------end--------------->8---

This results in the following output:

--8<---------------cut here---------------start------------->8---
en_US.UTF-8	"1234…"
fr_FR.UTF-8	"1234…"
de_DE.UTF-8	"1234…"
zh_CN.UTF-8	"1234 "
ja_JA.UTF-8	"1234 "
--8<---------------cut here---------------end--------------->8---

Notice that in zh_CN and ja_JA, we have a space instead of the expected
ellipsis character.


If this is expected behavior, how do we know how "wide" the `format'
function thinks any given character is?  In other words, why _does_ it
think "…" should be two characters wide?  And how do we, the elisp users,
get this information?  I tried to dive into the C code for
`styled_format', but got lost.  Thanks.

----------

Reproduced on this in-source build:

In GNU Emacs 30.0.50 (build 2, x86_64-pc-linux-gnu, GTK+ Version
 3.24.37, cairo version 1.17.8) of 2023-04-23 built on fw.net.yu
Repository revision: 3badd2358d5f0af71887ee1cc9d39c2f312b6888
Repository branch: master
System Description: Arch Linux

Configured using:
 'configure --sysconfdir=/etc --prefix=/usr --localstatedir=/var
 --with-cairo --with-harfbuzz --with-libsystemd --with-modules
 --with-pgtk --with-native-compilation CFLAGS=-Og'

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSYSTEMD LIBXML2 MODULES NATIVE_COMP NOTIFY INOTIFY
PDUMPER PGTK PNG RSVG SECCOMP SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP XIM GTK3 ZLIB

-- 
Best,


RY

[Please note that this mail might go to spam due to some
misconfiguration in my mail server -- still investigating.]






* bug#63029: [BUG?] format inconsistency in deciding string widths on different locales
  2023-04-23 10:23 bug#63029: [BUG?] format inconsistency in deciding string widths on different locales Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-04-23 11:06 ` Ihor Radchenko
  2023-04-23 11:08 ` Ihor Radchenko
  2023-04-23 14:19 ` Eli Zaretskii
  2 siblings, 0 replies; 7+ messages in thread
From: Ihor Radchenko @ 2023-04-23 11:06 UTC (permalink / raw)
  To: Ruijie Yu; +Cc: 63029

Ruijie Yu via "Bug reports for GNU Emacs, the Swiss army knife of text
editors" <bug-gnu-emacs@gnu.org> writes:

>                    -eval '(message "%S" (format "%-5.5s" "1234…"))'
> ...
> en_US.UTF-8	"1234…"
> fr_FR.UTF-8	"1234…"
> de_DE.UTF-8	"1234…"
> zh_CN.UTF-8	"1234 "
> ja_JA.UTF-8	"1234 "

Context: https://orgmode.org/list/sdv7cu4ugk2.fsf@netyu.xyz

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>






* bug#63029: [BUG?] format inconsistency in deciding string widths on different locales
  2023-04-23 10:23 bug#63029: [BUG?] format inconsistency in deciding string widths on different locales Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-04-23 11:06 ` Ihor Radchenko
@ 2023-04-23 11:08 ` Ihor Radchenko
  2023-04-23 14:19 ` Eli Zaretskii
  2 siblings, 0 replies; 7+ messages in thread
From: Ihor Radchenko @ 2023-04-23 11:08 UTC (permalink / raw)
  To: Ruijie Yu; +Cc: 63029

Ruijie Yu via "Bug reports for GNU Emacs, the Swiss army knife of text
editors" <bug-gnu-emacs@gnu.org> writes:

> en_US.UTF-8	"1234…"
> fr_FR.UTF-8	"1234…"
> de_DE.UTF-8	"1234…"
> zh_CN.UTF-8	"1234 "
> ja_JA.UTF-8	"1234 "

I can reproduce on the latest master, Emacs 28, Emacs 27, and Emacs 26.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>






* bug#63029: [BUG?] format inconsistency in deciding string widths on different locales
  2023-04-23 10:23 bug#63029: [BUG?] format inconsistency in deciding string widths on different locales Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-04-23 11:06 ` Ihor Radchenko
  2023-04-23 11:08 ` Ihor Radchenko
@ 2023-04-23 14:19 ` Eli Zaretskii
  2023-04-23 14:23   ` Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2023-04-23 14:19 UTC (permalink / raw)
  To: Ruijie Yu; +Cc: 63029

> Date: Sun, 23 Apr 2023 18:23:02 +0800
> From:  Ruijie Yu via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> I don't yet know whether this is a bug in Emacs.  Here are the
> observed results; note the Unicode ellipsis character:
> 
> --8<---------------cut here---------------start------------->8---
> $ for locale in {en_US,fr_FR,de_DE,zh_CN,ja_JA}.UTF-8; do
>     printf "$locale\t"
>     LANG="$locale" src/emacs -Q -batch \
>                    -eval '(message "%S" (format "%-5.5s" "1234…"))'
> done
> --8<---------------cut here---------------end--------------->8---
> 
> This results in the following output:
> 
> --8<---------------cut here---------------start------------->8---
> en_US.UTF-8	"1234…"
> fr_FR.UTF-8	"1234…"
> de_DE.UTF-8	"1234…"
> zh_CN.UTF-8	"1234 "
> ja_JA.UTF-8	"1234 "
> --8<---------------cut here---------------end--------------->8---
> 
> Notice that in zh_CN and ja_JA, we have a space instead of the expected
> ellipsis character.
> 
> 
> If this is expected behavior, how do we know how "wide" the `format'
> function thinks any given character is?  In other words, why _does_ it
> think "…" should be two characters wide?

This is a kludgey feature: in CJK locales some characters are always
considered double-width.  See code in characters.el that begins with a
comment around line 1140.  The function use-cjk-char-width-table
defined there is invoked (via the setup-function of the language
environment) when the language environment in Emacs is set to one of
those CJK locales.

The reason for this is that in CJK fonts these characters are supposed
to be rendered using full-width glyphs.

See also bug#54138 and
https://lists.gnu.org/archive/html/emacs-devel/2022-02/msg00917.html.

> And how do we, the elisp users, get this information?

I don't understand this question.  Please elaborate: what information
do you want to get, besides the width of the characters (which is
accessible via char-width-table)?
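
A minimal sketch of how the widths discussed above can be inspected
from Lisp, assuming a session whose language environment matches the
locale under test.  `char-width' and `string-width' are standard
functions; their results depend on the current `char-width-table', so
no outputs are shown.  Going by the report above, the ellipsis would be
expected to count as one column in the en_US, fr_FR and de_DE runs and
as two columns in the zh_CN and ja_JA runs.

```elisp
;; Columns that Emacs currently assigns to the ellipsis character.
(char-width ?…)

;; The raw entry in the table that `char-width' consults.
(aref char-width-table ?…)

;; Total columns that the test string occupies.
(string-width "1234…")
```

With a precision of 5, "%-5.5s" keeps only as many leading characters
as fit in 5 columns.  If the string measures 6 columns, the ellipsis no
longer fits, so `format' keeps "1234" and pads it with one space, which
would explain the "1234 " output in the CJK locales.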






* bug#63029: [BUG?] format inconsistency in deciding string widths on different locales
  2023-04-23 14:19 ` Eli Zaretskii
@ 2023-04-23 14:23   ` Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-04-23 14:32     ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-04-23 14:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 63029


Eli Zaretskii <eliz@gnu.org> writes:

>> If this is expected behavior, how do we know how "wide" the `format'
>> function thinks any given character is?  In other words, why _does_ it
>> think "…" should be two characters wide?
>
> This is a kludgey feature: in CJK locales some characters are always
> considered double-width.  See code in characters.el that begins with a
> comment around line 1140.  The function use-cjk-char-width-table
> defined there is invoked (via the setup-function of the language
> environment) when the language environment in Emacs is set to one of
> those CJK locales.
>
> The reason for this is that in CJK fonts these characters are supposed
> to be rendered using full-width glyphs.
>
> See also bug#54138 and
> https://lists.gnu.org/archive/html/emacs-devel/2022-02/msg00917.html.

Thanks for the link.  I have found the answer in your response there.

>> And how do we, the elisp users, get this information?
>
> I don't understand this question.  Please elaborate: what information
> do you want to get, besides the width of the characters (which is
> accessible via char-width-table)?

Your mention of `char-width-table' here and of `char-width' in the linked
thread answered my question precisely.  I was looking for `char-width'
without knowing its name.  Thanks.

-- 
Best,


RY







* bug#63029: [BUG?] format inconsistency in deciding string widths on different locales
  2023-04-23 14:23   ` Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-04-23 14:32     ` Eli Zaretskii
  2023-04-23 14:38       ` Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2023-04-23 14:32 UTC (permalink / raw)
  To: Ruijie Yu; +Cc: 63029

> From: Ruijie Yu <ruijie@netyu.xyz>
> Cc: 63029@debbugs.gnu.org
> Date: Sun, 23 Apr 2023 22:23:16 +0800
> 
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > See also bug#54138 and
> > https://lists.gnu.org/archive/html/emacs-devel/2022-02/msg00917.html.
> 
> Thanks for the link.  I have found the answer in your response there.
> 
> >> And how do we, the elisp users, get this information?
> >
> > I don't understand this question.  Please elaborate: what information
> > do you want to get, besides the width of the characters (which is
> > accessible via char-width-table)?
> 
> Your mention of `char-width-table' here and of `char-width' in the linked
> thread answered my question precisely.  I was looking for `char-width'
> without knowing its name.  Thanks.

OK, so can we close this issue?

Btw, the recommended method of computing the width of a string is via
string-pixel-width.
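
A small sketch comparing the two measurements, assuming Emacs 29 or
later, where `string-pixel-width' is available.  Its result is in
pixels rather than columns and depends on the font and frame in use,
so no output is shown.

```elisp
(require 'subr-x)  ; harmless if `string-pixel-width' is already loaded

;; Width in pixels, measured by the display code for the current frame.
(string-pixel-width "1234…")

;; Width in columns, derived from `char-width-table'.
(string-width "1234…")
```

The pixel measurement reflects the glyphs actually used, whereas the
column count reflects only the table, which is why the two can disagree
for characters such as the ellipsis.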






* bug#63029: [BUG?] format inconsistency in deciding string widths on different locales
  2023-04-23 14:32     ` Eli Zaretskii
@ 2023-04-23 14:38       ` Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 7+ messages in thread
From: Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-04-23 14:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 63029-done


Eli Zaretskii <eliz@gnu.org> writes:

> OK, so can we close this issue?

We can -- done.

> Btw, the recommended method of computing the width of a string is via
> string-pixel-width.

Will take a look at this function.  Thanks.

-- 
Best,


RY





