From: Yuan Fu
Newsgroups: gmane.emacs.bugs
Subject: bug#64420: string-width of … is 2 in CJK environments
Date: Wed, 12 Jul 2023 14:11:14 -0700
Message-ID: <62DC36F6-875D-4BFD-8AE8-9F3E1B606FF5@gmail.com>
In-Reply-To: <2f3f0d49-84fe-0fd3-09be-e4379343e72c@gutov.dev>
To: Dmitry Gutov
Cc: Eli Zaretskii, 64420@debbugs.gnu.org

> On Jul 11, 2023, at 6:17 PM, Dmitry Gutov wrote:
>
> On 11/07/2023 21:45, Eli Zaretskii wrote:
>
>>>> Once again, the assumption behind this "feature" of the CJK
>>>> language-environments is that whoever uses those environments has the
>>>> terminal emulators configured to use fonts where "…" and its ilk have
>>>> double size. Of course, if you just switch language-environment on a
>>>> system that is otherwise configured for non-CJK locale, the terminal
>>>> emulator fonts will not magically change, and you get what you see.
>>>
>>> Does "…" actually have double width in some of their fonts?
>> That's the assumption, yes. (And not only this one character, you can
>> see which characters we assume have the same width in the function I
>> pointed out earlier in this thread, which we run when the
>> language-environment is switched to something CJK.) It was definitely
>> correct at some point in the past, but the big question is whether it
>> is still correct. I don't know who can tell us that nowadays.
>
> Whole ranges of characters, I see.

Here’s what I know: in a CJK “context”, “…” is supposed to be one ideograph wide (like all CJK punctuation), i.e., width=2.

However, it’s not as simple as “they used the wrong font”, because Latin and CJK text use the same Unicode code point for “…” but expect different glyphs. In publishing, this is solved by manually marking the text with a style or font, so the software picks the desired glyph. Terminals and editors don’t have this luxury.

BTW, it’s not just the ellipsis: CJK and Latin share the same code points for quotes, the em dash, and the middle dot, while expecting different glyphs for them.

Since most terminals and editors (especially terminals) query an ASCII/Latin font before falling back to CJK fonts, I expect most of them to show the Latin glyph for “…” (width=1) most of the time.

So in practice, it would be correct most of the time to assume the following code points have a width of 1, regardless of locale:

- HORIZONTAL ELLIPSIS … (U+2026)
- LEFT/RIGHT DOUBLE QUOTATION MARK “” (U+201C, U+201D)
- LEFT/RIGHT SINGLE QUOTATION MARK ‘’ (U+2018, U+2019)
- EM DASH — (U+2014)
- MIDDLE DOT · (U+00B7)

But obviously, if someone configures their terminal or editor to use a CJK font first, these characters MIGHT have width=2. I say MIGHT because there are plenty of CJK fonts that use the 1-width Latin glyph for these characters by default.

It might be helpful to have a wrapper around string-width that applies heuristics like this, while string-width itself goes strictly by Unicode and locale.

Source: https://www.w3.org/TR/clreq/#table_of_non-bracket_indication_punctuation_marks

Yuan
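
To make the wrapper idea concrete, here is a minimal sketch in Python, for illustration only. It is not Emacs Lisp and does not reflect how Emacs computes string-width. The names `strict_width`, `heuristic_width`, and `NARROW_PUNCTUATION` are hypothetical, and the "strict" function only approximates a CJK environment by counting East Asian Width "ambiguous" characters as two columns.

```python
import unicodedata

# Code points from the list in the message, assumed (as a heuristic) to be
# rendered with a one-column Latin glyph: ellipsis, curly quotes, em dash,
# and middle dot.
NARROW_PUNCTUATION = frozenset("\u2026\u201c\u201d\u2018\u2019\u2014\u00b7")


def strict_width(text: str, cjk: bool = False) -> int:
    """Approximate column width from Unicode East Asian Width alone.

    With cjk=True, "ambiguous" (A) characters count as two columns,
    loosely mirroring the assumption a CJK language environment makes.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks occupy no column of their own
        eaw = unicodedata.east_asian_width(ch)
        if eaw in ("W", "F") or (cjk and eaw == "A"):
            width += 2
        else:
            width += 1
    return width


def heuristic_width(text: str, cjk: bool = False) -> int:
    """Like strict_width, but always count NARROW_PUNCTUATION as one column."""
    return sum(
        1 if ch in NARROW_PUNCTUATION else strict_width(ch, cjk)
        for ch in text
    )


# The strict measure widens the ellipsis under cjk=True; the wrapper does not.
print(strict_width("a\u2026b", cjk=True))
print(heuristic_width("a\u2026b", cjk=True))
```

Keeping the override in a separate wrapper leaves the strict function untouched, so callers who configure a CJK font first can still ask for the locale-driven answer.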