From: Paul Eggert
Newsgroups: gmane.emacs.devel
Subject: Re: Char-folding: how can we implement matching multiple characters as a single "thing"?
Date: Mon, 30 Nov 2015 08:12:50 -0800
Organization: UCLA Computer Science Department
Message-ID: <565C7582.8030206@cs.ucla.edu>
To: emacs-devel@gnu.org

On 11/30/2015 07:54 AM, Artur Malabarba wrote:
> Does anyone have alternative ideas?

Sure, scan the pattern greedily for possible sequences, left-to-right.
In your example "fix" should expand to the regexp "\\([f][i]\\|ﬁ\\)x"
(where the "ﬁ" is the ligature character), because once the "fi" is
found, the scanner won't look for "ix" as a single character. This
should cause the regexp to grow only polynomially rather than
exponentially. The polynomial version won't match as many strings as the
exponential version, but in practice it should be good enough.
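
The greedy left-to-right scan described above could be sketched roughly as follows. This is an illustration in Python, not Emacs code: the table MULTI_CHAR_FOLDS, the helper quote_char, and the function fold_pattern are hypothetical names made up for this example, and the table holds only a few Latin ligatures. The function returns the regexp itself, so the backslashes are not doubled as they would be inside a Lisp string.

```python
# Hypothetical table: multi-character sequence -> single-character equivalent.
# Only a few ligatures are listed; a real table would be much larger.
MULTI_CHAR_FOLDS = {
    "ffi": "\uFB03",  # LATIN SMALL LIGATURE FFI
    "fi": "\uFB01",   # LATIN SMALL LIGATURE FI
    "fl": "\uFB02",   # LATIN SMALL LIGATURE FL
}

# Characters assumed to need a backslash outside brackets in an Emacs regexp.
EMACS_SPECIAL = set(".*+?[^$\\")


def quote_char(ch):
    """Backslash-escape CH if it is special in an Emacs regexp."""
    return "\\" + ch if ch in EMACS_SPECIAL else ch


def fold_pattern(pattern):
    """Expand PATTERN into an Emacs-style regexp with one greedy pass."""
    longest = max(len(seq) for seq in MULTI_CHAR_FOLDS)
    out = []
    i = 0
    while i < len(pattern):
        # Try the longest known sequence first at the current position.
        for n in range(longest, 1, -1):
            seq = pattern[i:i + n]
            if seq in MULTI_CHAR_FOLDS:
                spelled = "".join("[" + c + "]" for c in seq)
                out.append("\\(" + spelled + "\\|" + MULTI_CHAR_FOLDS[seq] + "\\)")
                # Consume the whole sequence, so no later match can overlap it.
                i += len(seq)
                break
        else:
            # No sequence starts here: emit the single character.
            out.append(quote_char(pattern[i]))
            i += 1
    return "".join(out)


print(fold_pattern("fix"))  # \([f][i]\|ﬁ\)x
```

Because each matched sequence is consumed before the scan moves on, every input character ends up in at most one alternation. With a fixed table, the output of this sketch therefore stays within a constant factor of the input length. The cost is the one already noted: a sequence that overlaps an earlier match (the "ix" in "fix", if such a character existed) is never tried.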