From: Dmitry Gutov
Newsgroups: gmane.emacs.bugs
Subject: bug#64420: string-width of … is 2 in CJK environments
Date: Thu, 27 Jul 2023 04:52:57 +0300
Message-ID: <58f0e549-6c76-5063-55ec-addc126e9abc@gutov.dev>
In-Reply-To: <83ilao5qqo.fsf@gnu.org>
To: Eli Zaretskii, Yuan Fu
Cc: 64420@debbugs.gnu.org

On 13/07/2023 08:23, Eli Zaretskii wrote:
>> From: Yuan Fu
>> Date: Wed, 12 Jul 2023 14:11:14 -0700
>> Cc: Eli Zaretskii,
>>  64420@debbugs.gnu.org
>>
>> Here’s what I know: In a CJK “context”, “…” is supposed to be one
>> ideograph wide (like all CJK punctuation), i.e., width=2.
>>
>> However, it’s not as simple as “they used the wrong font”, because
>> both Latin and CJK use the same Unicode code point for “…”, but
>> expect different glyphs. In publication, this is solved by manually
>> marking the text with style or font, so the software uses the
>> desired glyph. Terminals and editors don’t have this luxury.
>>
>> BTW it’s not just ellipses: CJK and Latin share the same code points
>> for quotes, em dash and middle dot while expecting different glyphs
>> for them.
>>
>> Since most terminals and editors (especially terminals) query an
>> ASCII/Latin font before falling back to CJK fonts, I expect most
>> terminals and editors to show the Latin glyph for “…” (width=1) most
>> of the time.
>>
>> So practically, it would be correct most of the time if we assume
>> the following code points have a width of 1, regardless of locale:
>>
>> – HORIZONTAL ELLIPSIS …
>> – LEFT/RIGHT DOUBLE QUOTATION MARK “”
>> – LEFT/RIGHT SINGLE QUOTATION MARK ‘’
>> – EM DASH —
>> – MIDDLE DOT ·
>>
>> But obviously if someone configures their terminal or editor to use
>> a CJK font first, these characters MIGHT have width = 2. I said
>> MIGHT because there are plenty of CJK fonts that use the 1-width
>> Latin glyph for these characters by default.
>>
>> It might be helpful to have a wrapper around string-width that
>> considers heuristics like this, while string-width goes strictly by
>> Unicode and locale.
>
> Thanks.  My conclusion from the above is a bit different: we should
> introduce a user option to modify the behavior of
> use-cjk-char-width-table, such that users who have fonts where these
> characters are not double-width could have the width of these
> characters left at their Unicode values.

We could add an option, and then go with the default value which
corresponds to whatever seems the common opinion here.

Anyway, it doesn't seem like anybody else in this discussion is better
equipped to choose that user option's name, or write the rest of the
patch.
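
For illustration only, here is a rough sketch of the heuristic discussed above, written in Python rather than Emacs Lisp. It is not how Emacs computes string-width, and it says nothing about what the eventual patch or option would look like. The names NARROW_OVERRIDES, char_width and string_width are made up for this example, and the `cjk` flag is only a stand-in for "a CJK width table is in effect". Combining characters are ignored for simplicity and count as width 1 here.

```python
import unicodedata

# Code points listed in the quoted message. The sketch treats them as
# width 1 even in a CJK context (hypothetical default, an assumption).
NARROW_OVERRIDES = frozenset("\u2026\u201c\u201d\u2018\u2019\u2014\u00b7")


def char_width(ch, cjk=False, narrow_overrides=NARROW_OVERRIDES):
    """Return the assumed column width (1 or 2) of a single character."""
    if ch in narrow_overrides:
        return 1
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        # Wide and Fullwidth characters always take two columns.
        return 2
    if eaw == "A" and cjk:
        # Ambiguous-width characters are double-width only in a CJK context.
        return 2
    return 1


def string_width(s, cjk=False, narrow_overrides=NARROW_OVERRIDES):
    """Sum the per-character widths; no combining-character handling."""
    return sum(char_width(ch, cjk, narrow_overrides) for ch in s)


if __name__ == "__main__":
    print(string_width("a\u2026b", cjk=True))
    print(string_width("a\u2026b", cjk=True, narrow_overrides=frozenset()))
```

Passing an empty `narrow_overrides` set corresponds to the strict "Unicode and locale" behavior, where the ellipsis counts as two columns in a CJK context. The default set corresponds to the "Latin glyph most of the time" assumption.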