From: Daphne Preston-Kendal
Newsgroups: gmane.emacs.bugs
Subject: bug#48192: forward-word and friends have inconsistent behaviour with Unicode and ASCII punctuation
Date: Mon, 3 May 2021 16:37:51 +0200
Message-ID: <6D537AD9-6B73-42C6-BA7D-D10071135E66@nonceword.org>
To: 48192@debbugs.gnu.org

forward-word, backward-word etc. have inconsistent behaviour when applied to text containing ASCII straight quotation marks vs. Unicode quotation marks. The word don't with a straight quote (U+0027) counts as a single word, and forward-word and backward-word will move over the whole thing. Meanwhile, don’t with a curly quote (U+2019) counts as two words, and the cursor will stop at ‘don’ and ‘t’ separately. (Fundamental mode, Emacs 27.2.)

This also means count-words/count-words-region give surprising results when applied to text containing Unicode curly apostrophes, since they work by counting the number of times the cursor can move forward-word-strictly between given start and end points. (Since it uses forward-word-strictly and not forward-word, the problem can’t be solved by customizing find-word-boundary-function-table.)

The Right Thing in my view would be for Emacs to use the Unicode TR29 word boundary rules to work out where to put the cursor when forward-word and backward-word are invoked. They handle punctuation characters correctly, and the rules are not too complicated. However, how this would interact with the existing find-word-boundary-function-table customization method, I don’t know. CLDR makes customizations of the rules for specific (human) languages; perhaps they could be ported into Emacs somehow.

As a temporary workaround to get correct-ish word counts for my documents, I’ve hacked up a function that uses how-many instead of forward-word to count the number of words in a region.
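
The idea of counting pattern matches rather than cursor movements can be illustrated outside Emacs. The following is a minimal sketch in Python, not the Emacs Lisp function mentioned above (which is not shown in this message). The names ASCII_ONLY_WORD, APOSTROPHE_WORD and count_words are hypothetical, and the patterns only approximate the TR29 treatment of apostrophes between letters; they do not implement the full word boundary rules.

```python
import re

# Mirrors the reported inconsistency: only the straight apostrophe
# (U+0027) is allowed inside a word, so a curly apostrophe splits it.
ASCII_ONLY_WORD = re.compile(r"[^\W_]+(?:'[^\W_]+)*")

# Apostrophe-aware pattern (an assumption, loosely following TR29):
# U+0027 or U+2019 stays inside a word only when it has a letter or
# digit on both sides.
APOSTROPHE_WORD = re.compile(r"[^\W_]+(?:['\u2019][^\W_]+)*")


def count_words(text, pattern=APOSTROPHE_WORD):
    """Count the non-overlapping matches of pattern in text."""
    return sum(1 for _ in pattern.finditer(text))


if __name__ == "__main__":
    for sample in ("don't", "don\u2019t"):
        print(repr(sample),
              count_words(sample, ASCII_ONLY_WORD),
              count_words(sample))
```

With the ASCII-only pattern the straight-quote spelling counts as one word and the curly-quote spelling as two; with the apostrophe-aware pattern both count as one.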