unofficial mirror of emacs-devel@gnu.org 
* format width specifier and locale
@ 2022-02-23 20:02 Joost Kremers
  2022-02-24  6:40 ` Eli Zaretskii
  0 siblings, 1 reply; 5+ messages in thread
From: Joost Kremers @ 2022-02-23 20:02 UTC
  To: emacs-devel

Hi list,

While looking into a bug report for a package of mine,[1] it turned out that part
of the bug is caused by a strange interaction between `format` and the system's
locale that looks like a bug, but I'd like to ask here first before I submit a
bug report.

The following two `format` calls produce identical-length strings when issued in
an Emacs with `current-language-environment` set to "English" (or, in my case,
"UTF-8"), but when the language environment is set to "Chinese-GBK", the second
call produces a string that is one character shorter:

    (format (concat "%50s") "Boutet de Monvel's calculus and groupoids I")
    (format (concat "%50s") "Boutet de Monvel’s calculus and groupoids I") 

The difference between the two strings is the apostrophe character: in the first
string, it's an ASCII apostrophe, in the second string it's a RIGHT SINGLE
QUOTATION MARK.

The effect can be seen in this screen shot:


[-- Attachment #2: Screenshot from 2022-02-23 21.19.14.png --]
[-- Type: image/png, Size: 57261 bytes --]

The font used is DejaVu Sans Mono. I switched language environment with `M-x
set-language-environment`. The second string produced in the Chinese language
environment has one space less.

So is there a reason this is happening, or is it indeed a bug?

TIA

Joost




Footnotes:
[1]  <https://github.com/joostkremers/ebib/issues/243>.

-- 
Joost Kremers
Life has its moments


* Re: format width specifier and locale
  2022-02-23 20:02 format width specifier and locale Joost Kremers
@ 2022-02-24  6:40 ` Eli Zaretskii
  2022-02-24  7:16   ` Eli Zaretskii
  0 siblings, 1 reply; 5+ messages in thread
From: Eli Zaretskii @ 2022-02-24  6:40 UTC
  To: Joost Kremers; +Cc: emacs-devel

> From: Joost Kremers <joostkremers@fastmail.fm>
> Date: Wed, 23 Feb 2022 21:02:31 +0100
> 
> While looking into a bug report for a package of mine,[1] it turned out that part
> of the bug is caused by a strange interaction between `format` and the system's
> locale that looks like a bug, but I'd like to ask here first before I submit a
> bug report.
> 
> The following two `format` calls produce identical-length strings when issued in
> an Emacs with `current-language-environment` set to "English" (or, in my case
> "UTF-8"), but when the language environment is set to "Chinese-GBK", the second
> call produces a string that is one character shorter:
> 
>     (format (concat "%50s") "Boutet de Monvel's calculus and groupoids I")
>     (format (concat "%50s") "Boutet de Monvel’s calculus and groupoids I") 
> 
> The difference between the two strings is the apostrophe character: in the first
> string, it's an ASCII apostrophe, in the second string it's a RIGHT SINGLE
> QUOTATION MARK.

If you play with the value of text-quoting-style, does that affect the
results in any way?




* Re: format width specifier and locale
  2022-02-24  6:40 ` Eli Zaretskii
@ 2022-02-24  7:16   ` Eli Zaretskii
  2022-02-24  7:54     ` Eli Zaretskii
  0 siblings, 1 reply; 5+ messages in thread
From: Eli Zaretskii @ 2022-02-24  7:16 UTC
  To: joostkremers; +Cc: emacs-devel

> Date: Thu, 24 Feb 2022 08:40:23 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: emacs-devel@gnu.org
> 
> >     (format (concat "%50s") "Boutet de Monvel's calculus and groupoids I")
> >     (format (concat "%50s") "Boutet de Monvel’s calculus and groupoids I") 
> > 
> > The difference between the two strings is the apostrophe character: in the first
> > string, it's an ASCII apostrophe, in the second string it's a RIGHT SINGLE
> > QUOTATION MARK.
> 
> If you play with the value of text-quoting-style, does that affect the
> results in any way?

Also, in which Emacs version(s) do you see this?




* Re: format width specifier and locale
  2022-02-24  7:16   ` Eli Zaretskii
@ 2022-02-24  7:54     ` Eli Zaretskii
  2022-02-24  8:37       ` Joost
  0 siblings, 1 reply; 5+ messages in thread
From: Eli Zaretskii @ 2022-02-24  7:54 UTC
  To: joostkremers; +Cc: emacs-devel

> Date: Thu, 24 Feb 2022 09:16:23 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: emacs-devel@gnu.org
> 
> > Date: Thu, 24 Feb 2022 08:40:23 +0200
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: emacs-devel@gnu.org
> > 
> > >     (format (concat "%50s") "Boutet de Monvel's calculus and groupoids I")
> > >     (format (concat "%50s") "Boutet de Monvel’s calculus and groupoids I") 
> > > 
> > > The difference between the two strings is the apostrophe character: in the first
> > > string, it's an ASCII apostrophe, in the second string it's a RIGHT SINGLE
> > > QUOTATION MARK.
> > 
> > If you play with the value of text-quoting-style, does that affect the
> > results in any way?
> 
> Also, in which Emacs version(s) do you see this?

Please ignore these questions.  I see the reason for that behavior:
in the Chinese-GBK language environment, Emacs thinks that the U+2019
character is a double-width character:

  emacs -Q
  C-x RET l Chinese-GBK RET
  M-: (char-width #x2019) RET

This yields 2, not 1.
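
That would explain the shorter string if `format` computes the padding
for a width specifier from display columns rather than from the number
of characters, which is what the behavior above suggests.  A minimal
sketch to check this follows; `my/width-report` is a hypothetical
helper, not part of Emacs, and it relies only on `format`, `length`
and `string-width`:

```elisp
;; Hypothetical helper for illustration only; not part of Emacs.
(defun my/width-report (string field-width)
  "Pad STRING on the left to FIELD-WIDTH with `format' and measure it.
Return a list (CHARS COLUMNS PADDED-CHARS PADDED-COLUMNS), where the
CHARS values come from `length' and the COLUMNS values from
`string-width'."
  (let ((padded (format (format "%%%ds" field-width) string)))
    (list (length string)
          (string-width string)
          (length padded)
          (string-width padded))))

;; Evaluate once per language environment and compare the results.
;; No expected values are given here, since they depend on what
;; `char-width' returns for U+2019 in the environment in effect.
(my/width-report "Boutet de Monvel’s calculus and groupoids I" 50)
```

If the padding is indeed column-based, PADDED-COLUMNS should stay at 50
in both environments, while PADDED-CHARS should drop by one wherever
(char-width #x2019) returns 2.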

I'm investigating why we override the default width of U+2019 in this
language-environment (see the function use-cjk-char-width-table in
characters.el and lisp/language/chinese.el that uses it).  Stay tuned
(and maybe file a bug report, so this doesn't get forgotten by any
chance).

Thanks.




* Re: format width specifier and locale
  2022-02-24  7:54     ` Eli Zaretskii
@ 2022-02-24  8:37       ` Joost
  0 siblings, 0 replies; 5+ messages in thread
From: Joost @ 2022-02-24  8:37 UTC
  To: emacs-devel



On Thu, 24 Feb 2022, at 08:54, Eli Zaretskii wrote:
> (and maybe file a bug report, so this doesn't get forgotten by any
> chance).

I filed a bug report.

-- 
Joost Kremers
Life has its moments



