From: Lars Ingebrigtsen
To: Derick Eddington
Cc: 1877@debbugs.gnu.org
Subject: bug#1877: Request: Regular expressions that can match Unicode general categories
Date: Mon, 30 Sep 2019 09:45:15 +0200
Message-ID: <87zhimfcs4.fsf@gnus.org>
In-Reply-To: <1231792692.22467.115.camel@eep> (Derick Eddington's message of "Mon, 12 Jan 2009 12:38:12 -0800")

Derick Eddington writes:

> A new Scheme major mode I've made [1] requires regular expressions that
> can match characters by their Unicode general categories.  It seems
> Emacs regular expressions do not provide a way to do that directly (I'm
> using GNU Emacs 23.0.60.1)

(I'm going through old bug reports that unfortunately didn't get any
response at the time.)

I'm not quite sure what Unicode general categories you're referring to,
but the Emacs regexp matcher has gained a bunch of categories in the ten
years since you made the request.  Are the categories below what you
were thinking of?

‘[:print:]’
     This matches any printing character--either whitespace, or a
     graphic character matched by ‘[:graph:]’.

‘[:punct:]’
     This matches any punctuation character.  (At present, for multibyte
     characters, it matches anything that has non-word syntax.)

‘[:space:]’
     This matches any character that has whitespace syntax (*note Syntax
     Class Table::).

‘[:upper:]’
     This matches any upper-case letter, as determined by the current
     case table (*note Case Tables::).  If ‘case-fold-search’ is
     non-‘nil’, this also matches any lower-case letter.

‘[:word:]’
     This matches any character that has word syntax (*note Syntax Class
     Table::).

(etc)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no