* format width specifier and locale
From: Joost Kremers @ 2022-02-23 20:02 UTC
To: emacs-devel
Hi list,
While looking into a bug report for a package of mine,[1] I found that part of
the problem is caused by a strange interaction between `format` and the system's
locale. It looks like an Emacs bug, but I'd like to ask here before I submit a
report.
The following two `format` calls produce identical-length strings when issued in
an Emacs with `current-language-environment` set to "English" (or, in my case
"UTF-8"), but when the language environment is set to "Chinese-GBK", the second
call produces a string that is one character shorter:
(format (concat "%50s") "Boutet de Monvel's calculus and groupoids I")
(format (concat "%50s") "Boutet de Monvel’s calculus and groupoids I")
The only difference between the two strings is the apostrophe: in the first
string it's an ASCII apostrophe; in the second it's a RIGHT SINGLE QUOTATION
MARK (U+2019).
The effect can be seen in this screenshot:
[-- Attachment: Screenshot from 2022-02-23 21.19.14.png (image/png) --]
The font used is DejaVu Sans Mono. I switched language environments with
`M-x set-language-environment`. The second string, produced in the Chinese
language environment, has one fewer space of padding.
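One way to see what is going on is to compare each string's character count with its display width. A minimal sketch, assuming that `format` pads `%50s` to a width measured in display columns (which matches the behavior described above); `my-format-padding-report` is a hypothetical name introduced here for illustration, not part of Emacs or Ebib:

```elisp
;; Hypothetical helper: compare a string's character count with its
;; display width, and measure what "%50s" actually produces.
(defun my-format-padding-report (string)
  "Return (LENGTH WIDTH PADDED-LENGTH) for STRING.
LENGTH is the number of characters, WIDTH the number of display
columns according to `string-width', and PADDED-LENGTH the number
of characters in (format \"%50s\" STRING)."
  (let ((padded (format "%50s" string)))
    (list (length string)          ; characters
          (string-width string)    ; display columns
          (length padded))))       ; characters after padding

(my-format-padding-report "Boutet de Monvel's calculus and groupoids I")
(my-format-padding-report "Boutet de Monvel\u2019s calculus and groupoids I")
```

In an English or UTF-8 language environment both calls should return (43 43 50). If U+2019 were counted as two columns, the second call would instead return (43 44 49): `format` adds only 6 spaces, which is the one-space shortfall visible in the screenshot.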
So is there a reason this is happening, or is it indeed a bug?
TIA
Joost
Footnotes:
[1] <https://github.com/joostkremers/ebib/issues/243>.
--
Joost Kremers
Life has its moments
* Re: format width specifier and locale
From: Eli Zaretskii @ 2022-02-24 6:40 UTC
To: Joost Kremers; Cc: emacs-devel
> From: Joost Kremers <joostkremers@fastmail.fm>
> Date: Wed, 23 Feb 2022 21:02:31 +0100
>
> While looking into a bug report for a package of mine,[1] it turned out that part
> of the bug is caused by a strange interaction between `format` and the system's
> locale that looks like a bug, but I'd like to ask here first before I submit a
> bug report.
>
> The following two `format` calls produce identical-length strings when issued in
> an Emacs with `current-language-environment` set to "English" (or, in my case
> "UTF-8"), but when the language environment is set to "Chinese-GBK", the second
> call produces a string that is one character shorter:
>
> (format (concat "%50s") "Boutet de Monvel's calculus and groupoids I")
> (format (concat "%50s") "Boutet de Monvel’s calculus and groupoids I")
>
> The difference between the two strings is the apostrophe character: in the first
> string, it's an ASCII apostrophe, in the second string it's a RIGHT SINGLE
> QUOTATION MARK.
If you play with the value of text-quoting-style, does that affect the
results in any way?
* Re: format width specifier and locale
From: Eli Zaretskii @ 2022-02-24 7:16 UTC
To: joostkremers; Cc: emacs-devel
> Date: Thu, 24 Feb 2022 08:40:23 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: emacs-devel@gnu.org
>
> > (format (concat "%50s") "Boutet de Monvel's calculus and groupoids I")
> > (format (concat "%50s") "Boutet de Monvel’s calculus and groupoids I")
> >
> > The difference between the two strings is the apostrophe character: in the first
> > string, it's an ASCII apostrophe, in the second string it's a RIGHT SINGLE
> > QUOTATION MARK.
>
> If you play with the value of text-quoting-style, does that affect the
> results in any way?
Also, in which Emacs version(s) do you see this?
* Re: format width specifier and locale
From: Eli Zaretskii @ 2022-02-24 7:54 UTC
To: joostkremers; Cc: emacs-devel
> Date: Thu, 24 Feb 2022 09:16:23 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: emacs-devel@gnu.org
>
> > Date: Thu, 24 Feb 2022 08:40:23 +0200
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: emacs-devel@gnu.org
> >
> > > (format (concat "%50s") "Boutet de Monvel's calculus and groupoids I")
> > > (format (concat "%50s") "Boutet de Monvel’s calculus and groupoids I")
> > >
> > > The difference between the two strings is the apostrophe character: in the first
> > > string, it's an ASCII apostrophe, in the second string it's a RIGHT SINGLE
> > > QUOTATION MARK.
> >
> > If you play with the value of text-quoting-style, does that affect the
> > results in any way?
>
> Also, in which Emacs version(s) do you see this?
Please ignore these questions. I see the reason for that behavior:
in the Chinese-GBK language environment, Emacs thinks that the U+2019
character is a double-width character:
emacs -Q
C-x RET l Chinese-GBK RET
M-: (char-width #x2019) RET
This yields 2, not 1.
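To check which characters of a given string are affected in the current environment, a small diagnostic can walk the string and report every character that `char-width` treats as wider than one column. This is a sketch under the assumption that `char-width` is the relevant measure; `my-wide-chars` is a hypothetical name invented here:

```elisp
;; Hypothetical diagnostic: list the characters of a string that the
;; current language environment treats as wider than one column.
(defun my-wide-chars (string)
  "Return a list of (CHAR . WIDTH) for each character of STRING wider than 1.
The widths come from `char-width', so the result depends on the
current `char-width-table' and hence on the language environment."
  (let (result)
    (dolist (char (string-to-list string) (nreverse result))
      (let ((width (char-width char)))
        (when (> width 1)
          (push (cons char width) result))))))

;; After C-x RET l Chinese-GBK RET, per the report above, this would
;; include (8217 . 2), i.e. U+2019; in an English environment it is nil.
(my-wide-chars "Boutet de Monvel\u2019s calculus and groupoids I")
```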
I'm investigating why we override the default width of U+2019 in this
language environment (see the function use-cjk-char-width-table in
characters.el and lisp/language/chinese.el, which uses it). Stay tuned
(and maybe file a bug report, so this doesn't get forgotten by any
chance).
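Until the width question is settled, a package could sidestep it locally by let-binding a modified copy of `char-width-table` around its `format` calls. This is only a workaround sketch, not the fix being investigated here; `my-format-narrow-quotes` is a hypothetical name, and forcing U+2018..U+201D to one column is an assumption that suits a Latin-script display like the one in the screenshot:

```elisp
;; Hypothetical workaround, not an Emacs API: format with the curly
;; quotation marks counted as one column, whatever the language
;; environment.  Only a let-bound copy of `char-width-table' is
;; modified, so the global table is left as it was.
(defun my-format-narrow-quotes (fmt &rest args)
  "Like `format', but treat U+2018..U+201D as one column wide."
  (let ((char-width-table (copy-sequence char-width-table)))
    ;; U+2018..U+201D: the single and double curly quotation marks.
    (set-char-table-range char-width-table '(#x2018 . #x201D) 1)
    (apply #'format fmt args)))

(length (my-format-narrow-quotes
         "%50s" "Boutet de Monvel\u2019s calculus and groupoids I"))
```

Because the binding is dynamic, the override lasts only for the duration of the call; other code, including the display engine, keeps seeing the language environment's widths.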
Thanks.
* Re: format width specifier and locale
From: Joost @ 2022-02-24 8:37 UTC
To: emacs-devel
On Thu, 24 Feb 2022, at 08:54, Eli Zaretskii wrote:
> (and maybe file a bug report, so this doesn't get forgotten by any
> chance).
I filed a bug report.
--
Joost Kremers
Life has its moments