From: Stefan Kangas
Newsgroups: gmane.emacs.bugs
Subject: bug#29871: 25.3; ZWJ word-boundaries in regexps
Date: Sun, 29 Sep 2019 01:28:02 +0200
References: <87k1x8f0qr.fsf@nagas.meson.org>
In-Reply-To: <87k1x8f0qr.fsf@nagas.meson.org>
To: Eli Zaretskii
Cc: Mark Shoulson, 29871@debbugs.gnu.org

tags 29871 + notabug
close 29871
quit

Eli Zaretskii writes:

>> From: "Mark Shoulson"
>> Date: Wed, 27 Dec 2017 14:07:40 -0500
>>
>> According to http://unicode.org/reports/tr29/#Word_Boundaries rule WB4,
>> it would seem that a ZWJ character (U+200D ZERO WIDTH JOINER) between
>> two "word" characters should not constitute a word boundary.  And yet:
>>
>> (string-match "\\<" "foo\u200Dfbar" 1)
>>
>> evaluates to 4 (the 1 is to skip the word-beginning at the start of the
>> string).  Or you can search for "\\b" or "\\>" and get 3.  Either way,
>> indicative of a word-break at the ZWJ character.  Is this correct?
>
> Emacs considers a change of script as a word break, and U+200D's
> script is 'symbol', which is different from 'latin', the script of the
> ASCII characters.

According to the above explanation, this behaviour is expected.  I'm
therefore closing this as notabug.

Best regards,
Stefan Kangas
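
For reference, the script difference described in the quoted explanation can be inspected with a short sketch like the one below. It assumes that the standard `char-script-table' variable is what the word-boundary check consults, which this thread does not state explicitly. The values in the comments are the ones reported in the messages above, not freshly verified output.

```elisp
;; Look up the script Emacs assigns to each character.
(aref char-script-table ?o)      ; an ASCII letter; `latin' per the explanation above
(aref char-script-table #x200D)  ; U+200D ZERO WIDTH JOINER; `symbol' per the explanation above

;; The search from the original report.  The start index 1 skips the
;; word beginning at the start of the string.
(string-match "\\<" "foo\u200Dfbar" 1)  ; reported in the bug to evaluate to 4
```

Each form can be evaluated on its own in `M-x ielm' or with `C-x C-e' in the *scratch* buffer.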