From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Philipp Stephani Newsgroups: gmane.emacs.bugs Subject: bug#25366: 26.0.50; [:blank:] character class should match all Unicode horizontal whitespace Date: Fri, 06 Jan 2017 19:21:05 +0000 Message-ID: References: <838tqpecaq.fsf@gnu.org> <83bmvkcjez.fsf@gnu.org> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: multipart/alternative; boundary=001a1142d768fd46f0054571eb0f X-Trace: blaine.gmane.org 1483730548 6446 195.159.176.226 (6 Jan 2017 19:22:28 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Fri, 6 Jan 2017 19:22:28 +0000 (UTC) Cc: 25366-done@debbugs.gnu.org To: Eli Zaretskii Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Fri Jan 06 20:22:23 2017 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cPa5c-00082r-99 for geb-bug-gnu-emacs@m.gmane.org; Fri, 06 Jan 2017 20:22:12 +0100 Original-Received: from localhost ([::1]:54620 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cPa5e-0003WQ-Of for geb-bug-gnu-emacs@m.gmane.org; Fri, 06 Jan 2017 14:22:14 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:38557) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cPa5W-0003Vf-Ps for bug-gnu-emacs@gnu.org; Fri, 06 Jan 2017 14:22:08 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cPa5S-0003Jn-OI for bug-gnu-emacs@gnu.org; Fri, 06 Jan 2017 14:22:06 -0500 Original-Received: from debbugs.gnu.org ([208.118.235.43]:58247) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1cPa5S-0003Jh-Kt for bug-gnu-emacs@gnu.org; Fri, 06 Jan 2017 14:22:02 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 
1cPa5S-0001Er-DA for bug-gnu-emacs@gnu.org; Fri, 06 Jan 2017 14:22:02 -0500 Resent-From: Philipp Stephani Original-Sender: "Debbugs-submit" Resent-To: bug-gnu-emacs@gnu.org Resent-Date: Fri, 06 Jan 2017 19:22:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: cc-closed 25366 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: confirmed Mail-Followup-To: 25366@debbugs.gnu.org, p.stephani2@gmail.com, p.stephani2@gmail.com Original-Received: via spool by 25366-done@debbugs.gnu.org id=D25366.14837304834681 (code D ref 25366); Fri, 06 Jan 2017 19:22:02 +0000 Original-Received: (at 25366-done) by debbugs.gnu.org; 6 Jan 2017 19:21:23 +0000 Original-Received: from localhost ([127.0.0.1]:45413 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cPa4p-0001DR-At for submit@debbugs.gnu.org; Fri, 06 Jan 2017 14:21:23 -0500 Original-Received: from mail-oi0-f54.google.com ([209.85.218.54]:32920) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cPa4o-0001DD-6z for 25366-done@debbugs.gnu.org; Fri, 06 Jan 2017 14:21:22 -0500 Original-Received: by mail-oi0-f54.google.com with SMTP id 128so431472243oig.0 for <25366-done@debbugs.gnu.org>; Fri, 06 Jan 2017 11:21:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=gTlP+i3pcqgLDzVnKWcodyQ9nfeD8DtBUJXR6925rQg=; b=S6SpmJ0J8aR503TIbbFN5NgYHD4uHsu21ERf2oWto5Dn1ew8YURcaEfm5/KMnUdpcw CcrkHDlFRiDrEe20nnIm6WsDsx3Ooy5O6XUp0n91G4x6U6TcgH+zeKECkM5tVo3mjjow lXzWhudfg+YOH/CkCsJSPXCOgDc2CKJdkRKLBiyizSP5PA29Y4i2D8GBdfTIEzQzEhdi esGhhAX0nIV0FWaPEkE1QYYCxPUJMou6ikKgNGMaFYARjfOukc+6bpcb8SLkPXIc3G9M Zn6UpOtK+CE8qTd5hxcmY/XCOTrX7tDZwxysfYEbqWQsTL8Bm5LGR7640jiB2NgZlTf0 iuLw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; 
bh=gTlP+i3pcqgLDzVnKWcodyQ9nfeD8DtBUJXR6925rQg=; b=MlE0wjUut+vmDz4sXn8pbibqEaxZO5H8jwA2L2kXWP7jI+/ZQa5xETDccoZ99kozhx l86msfXz6bcTn08/6iLSJu3tdeVc5BjhGm5C/tPlZJa9P1japoSA+MsZmSttlEnZDZN3 ByKK7N2g1iFQcEGrwYzMrg5PeomAFXNTfXQkToiPNKVQpnRr05QYOK/u3EVi6UUHYn32 YUYmkdOYXPIqp/106SW3U/BTSiPBK6lJuskejwwhSdC418oIU7OAT0UEFJnLef0DVWZG nNbdLjfMoov3OYuIrB4p3WWXDHfgLCNKc6Hz30fAa855s1m5QeUxCfMUgrlAvGX7wNqn Td+g== X-Gm-Message-State: AIkVDXJVpv4F1WV5r6qFxDioNECFwn+N6RlRwsfF/9xGCgJrRuECEgY8pGT3l/9dUhqs3ALezRIDEn6LUsPgdQ== X-Received: by 10.157.40.121 with SMTP id h54mr1612460otd.179.1483730476417; Fri, 06 Jan 2017 11:21:16 -0800 (PST) In-Reply-To: X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 208.118.235.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.org gmane.emacs.bugs:127856 Archived-At: --001a1142d768fd46f0054571eb0f Content-Type: text/plain; charset=UTF-8 Philipp Stephani schrieb am Fr., 6. Jan. 2017 um 20:10 Uhr: > Eli Zaretskii schrieb am Fr., 6. Jan. 2017 um 16:11 Uhr: > > > From: Philipp Stephani > > Date: Fri, 06 Jan 2017 15:00:22 +0000 > > Cc: 25366@debbugs.gnu.org > > > > > http://www.unicode.org/reports/tr18/tr18-19.html#Compatibility_Properties > > > > Patches to that effect are welcome. > > > > Here's a patch. > > Thanks. A few minor comments below. > > > +/* Return true if C is a horizontal whitespace character, as defined > > + by http://www.unicode.org/reports/tr18/tr18-19.html#blank. */ > > +bool > > +blankp (int c) > > +{ > > + if (c == '\t') > > + return true; > > Why does this test explicitly only for a TAB? What about SPC, for > example? 
> > > Because TAB is the only character that is blank, but doesn't have the > general category Zs. > I've now also included space and added a comment. The risk that the > general category of space will ever be changed seems very small. > > > > > --- a/doc/lispref/searching.texi > > +++ b/doc/lispref/searching.texi > > @@ -553,7 +553,10 @@ Char Classes > > (@pxref{Character Properties}) indicates they are alphabetic > > characters. > > @item [:blank:] > > -This matches space and tab only. > > +This matches horizontal whitespace, as defined by Unicode Technical > > +Standard #18. In particular, it matches tabs and characters whose > > +Unicode @samp{general-category} property (@pxref{Character > > +Properties}) indicates they are spacing separators. > > Similarly here: I find the lack of reference to a space potentially > confusing. > > > Added. > > > > > +** The regular expression character class [:blank:] now matches > > +Unicode horizontal whitespace as defined in > > +http://www.unicode.org/reports/tr18/tr18-19.html#blank. > > The reference to a particular version of UTS#18 might become obsolete > when a new version is released. So I suggest to provide a general > reference to the report and its section, not an exact URL. > > > Done. > Pushed to master as 512e9886be. --001a1142d768fd46f0054571eb0f Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable


Philipp Stephani <p.stephani2@gmail.com> wrote on Fri, 6 Jan 2017 at 20:10:
Eli Zaretskii <eliz@gnu.org> wrote on Fri, 6 Jan 2017 at 16:11:
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Fri, 06 Jan 2017 15:00:22 +0000
> Cc: 25366@debbugs.gnu.org
>
>  http://www.unicode.org/reports/tr18/tr18-19.html#Compatibility_Properties
>
>  Patches to that effect are welcome.
>
> Here's a patch.

Thanks.  A few minor comments below.

> +/* Return true if C is a horizontal whitespace character, as defined
> +   by http://www.unicode.org/reports/tr18/tr18-19.html#blank.  */
> +bool
> +blankp (int c)
> +{
> +  if (c == '\t')
> +    return true;

Why does this test explicitly only for a TAB?  What about SPC, for
example?

Because TAB is the only character that is blank, but doesn't have the general category Zs.
I've now also included space and added a comment. The risk that the general category of space will ever be changed seems very small.
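
To illustrate the point about TAB and Zs, here is a minimal, self-contained sketch of such a predicate. It is not the code that was pushed: the names `blankp_sketch`, `is_space_separator`, `zs_ranges` and `blankp_sketch_check` are made up for this example, and the hard-coded table of Zs code points is an assumption standing in for a real general-category lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Inclusive range of code points.  */
struct cp_range { int first, last; };

/* Code points with general category Zs (Space_Separator).  Hard-coded
   for illustration only; a real implementation would consult the
   Unicode character database instead of a fixed table.  */
static const struct cp_range zs_ranges[] = {
  { 0x0020, 0x0020 },  /* SPACE */
  { 0x00A0, 0x00A0 },  /* NO-BREAK SPACE */
  { 0x1680, 0x1680 },  /* OGHAM SPACE MARK */
  { 0x2000, 0x200A },  /* EN QUAD .. HAIR SPACE */
  { 0x202F, 0x202F },  /* NARROW NO-BREAK SPACE */
  { 0x205F, 0x205F },  /* MEDIUM MATHEMATICAL SPACE */
  { 0x3000, 0x3000 },  /* IDEOGRAPHIC SPACE */
};

/* Hypothetical stand-in for a general-category lookup: return true
   if C falls into one of the Zs ranges above.  */
static bool
is_space_separator (int c)
{
  for (size_t i = 0; i < sizeof zs_ranges / sizeof zs_ranges[0]; i++)
    if (c >= zs_ranges[i].first && c <= zs_ranges[i].last)
      return true;
  return false;
}

/* Return true if C is horizontal whitespace: TAB or any Zs character.  */
bool
blankp_sketch (int c)
{
  /* TAB has general category Cc, so it needs its own test.  SPC is
     already Zs; the explicit test only documents the intent.  */
  if (c == '\t' || c == ' ')
    return true;
  return is_space_separator (c);
}

/* A few spot checks of the behaviour described above.  */
void
blankp_sketch_check (void)
{
  assert (blankp_sketch ('\t'));
  assert (blankp_sketch (' '));
  assert (blankp_sketch (0x3000));   /* IDEOGRAPHIC SPACE */
  assert (!blankp_sketch ('\n'));    /* vertical, not horizontal */
  assert (!blankp_sketch (0x200B));  /* ZERO WIDTH SPACE is Cf, not Zs */
}
```

The explicit SPC test is redundant with the table lookup; it is kept only so that the function reads the same way the documentation does (space, tab, and other spacing separators).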

> --- a/doc/lispref/searching.texi
> +++ b/doc/lispref/searching.texi
> @@ -553,7 +553,10 @@ Char Classes
>  (@pxref{Character Properties}) indicates they are alphabetic
>  characters.
>  @item [:blank:]
> -This matches space and tab only.
> +This matches horizontal whitespace, as defined by Unicode Technical
> +Standard #18.  In particular, it matches tabs and characters whose
> +Unicode @samp{general-category} property (@pxref{Character
> +Properties}) indicates they are spacing separators.

Similarly here: I find the lack of reference to a space potentially
confusing.
Added.

> +** The regular expression character class [:blank:] now matches
> +Unicode horizontal whitespace as defined in
> +http://www.unicode.org/reports/tr18/tr18-19.html#blank.

The reference to a particular version of UTS#18 might become obsolete
when a new version is released.  So I suggest to provide a general
reference to the report and its section, not an exact URL.

Done.

Pushed to master as 512e9886be.
--001a1142d768fd46f0054571eb0f--