From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#63029: [BUG?] format inconsistency in deciding string widths on different locales
Date: Sun, 23 Apr 2023 17:19:10 +0300
Message-ID: <83bkjeznoh.fsf@gnu.org>
To: Ruijie Yu
Cc: 63029@debbugs.gnu.org

> Date: Sun, 23 Apr 2023 18:23:02 +0800
> From: Ruijie Yu via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors"
>
> I don't quite know yet whether this is a bug in Emacs.  Here are the
> observed results, and note the unicode character:
>
> --8<---------------cut here---------------start------------->8---
> $ for locale in {en_US,fr_FR,de_DE,zh_CN,ja_JA}.UTF-8; do
>     printf "$locale\t"
>     LANG="$locale" src/emacs -Q -batch \
>       -eval '(message "%S" (format "%-5.5s" "1234…"))'
>   done
> --8<---------------cut here---------------end--------------->8---
>
> This results in the following output:
>
> --8<---------------cut here---------------start------------->8---
> en_US.UTF-8     "1234…"
> fr_FR.UTF-8     "1234…"
> de_DE.UTF-8     "1234…"
> zh_CN.UTF-8     "1234 "
> ja_JA.UTF-8     "1234 "
> --8<---------------cut here---------------end--------------->8---
>
> Notice that in zh_CN and ja_JA, we have a space instead of the expected
> ellipsis character.
>
>
> If this is expected behavior, how do we know how "wide" the `format'
> function thinks any given character is?  In other words, why _does_ it
> think "…" should be two-character wide?

This is a kludgey feature: in CJK locales some characters are always
considered double-width.  See the code in characters.el that begins
with a comment around line 1140.  The function
use-cjk-char-width-table defined there is invoked (via the
setup-function of the language environment) when the language
environment in Emacs is set to one of those CJK locales.

The reason for this is that in CJK fonts these characters are supposed
to be rendered using full-width glyphs.

See also bug#54138 and
https://lists.gnu.org/archive/html/emacs-devel/2022-02/msg00917.html.

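To see the effect described above without changing LANG, something
along the following lines might help.  It is only a sketch: it assumes
that "Japanese" is an acceptable stand-in for the ja_JA locale in the
report, and it is meant for a throwaway session (for example
`emacs -Q -batch'), since set-language-environment changes global
state.

```elisp
;; Sketch only: compare the width Emacs assigns to "…" before and after
;; switching to a CJK language environment.  The choice of "Japanese"
;; is an assumption made for illustration.
(let ((before (char-width ?…)))
  ;; Selecting the language environment runs its setup-function.
  (set-language-environment "Japanese")
  (message "width of … before: %d, after: %d; formatted: %S"
           before
           (char-width ?…)
           (format "%-5.5s" "1234…")))
```

If the "after" width is 2, the ellipsis no longer fits in the single
column left over after "1234", which would be consistent with the
space padding in the zh_CN and ja_JA results.
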
> And how do we, the elisp users, get this information?

I don't understand this question.  Please elaborate: what information
do you want to get, besides the width of the characters (which is
accessible via char-width-table)?
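
For reference, a minimal sketch of querying that width from Lisp.  The
helper name my-char-width-info is hypothetical and not part of Emacs;
char-width, string-width and char-width-table are the existing
interfaces.  The values returned depend on the current language
environment.

```elisp
;; Sketch only: `my-char-width-info' is a hypothetical helper name.
(defun my-char-width-info (char)
  "Return a list (WIDTH TABLE-ENTRY) describing CHAR.
WIDTH is what `char-width' reports; TABLE-ENTRY is the raw value
stored for CHAR in `char-width-table'."
  (list (char-width char)
        (aref char-width-table char)))

;; Width information for the ellipsis from the report.
(my-char-width-info ?…)

;; Number of columns the whole string is considered to occupy.
(string-width "1234…")
```

Evaluating these forms once in a session started under a non-CJK
locale and once under a CJK one should show where the two results in
the report diverge.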