From: Stefan Monnier
To: Eli Zaretskii
Cc: bruce.connor.am@gmail.com, emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: Character group folding in searches
Date: Sat, 07 Feb 2015 10:02:52 -0500
In-Reply-To: <831tm2cdm3.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 07 Feb 2015 10:47:16 +0200")

>> Could it handle for example an equivalence class which includes
>> →, ⇒, ->, and => ?
> These are our application-specific equivalences, they are not in
> Unicode.

I know.

> So we will need to have a list of equivalences in addition
> to Unicode, that we will want to add to the "folding" char-table,
> either permanently or as buffer-local customizations.

To me the simplest option is to have a DFA which returns an integer
(this integer being "the equivalence class number", and which will
usually be one of the characters in the equivalence class).

Each DFA node could be a char-table.  So if all equivalence classes are
made up of single chars, the DFA collapses to just a plain-old char-table
mapping chars to the canonical element of their equivalence classes.

For 2-char elements, we'll arrange for the entry for the first char (in
the main char-table) to be not an integer but another char-table.
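The nesting described above can be sketched roughly as follows. This is only an illustration in Python, not Emacs code: plain dicts stand in for char-tables, and the names `build_table` and `fold` are hypothetical. It assumes that a first char which leads into a sub-table but is not followed by a matching second char simply folds to itself.

```python
from typing import Any, Dict, List

# A node maps a character either to an int (the equivalence class number,
# here the code point of the canonical character) or to another node.
# Assumption: dicts play the role of char-tables in this sketch.
Node = Dict[str, Any]


def build_table(classes: Dict[str, List[str]]) -> Node:
    """Build the nested table from {canonical_char: [members, ...]}."""
    root: Node = {}
    for canonical, members in classes.items():
        for member in members:
            node = root
            # Walk (or create) one sub-table per leading character.
            for ch in member[:-1]:
                nxt = node.setdefault(ch, {})
                if not isinstance(nxt, dict):
                    raise ValueError(f"{member!r} extends a shorter element")
                node = nxt
            last = member[-1]
            if isinstance(node.get(last), dict):
                raise ValueError(f"{member!r} is a prefix of a longer element")
            node[last] = ord(canonical)
    return root


def fold(text: str, table: Node) -> List[int]:
    """Replace each recognized element by its equivalence class number."""
    out: List[int] = []
    i = 0
    while i < len(text):
        node = table
        j = i
        result = None
        while j < len(text):
            entry = node.get(text[j])
            if entry is None:
                break               # no transition: not a known element
            j += 1
            if isinstance(entry, int):
                result = entry      # reached an accepting entry
                break
            node = entry            # entry is a sub-table: keep reading
        if result is None:
            out.append(ord(text[i]))  # unknown char folds to itself
            i += 1
        else:
            out.append(result)
            i = j
    return out


table = build_table({"→": ["→", "⇒", "->", "=>"]})
print(fold("a -> b", table) == fold("a ⇒ b", table))
```

With only single-char members the table never nests, which is the "plain-old char-table" case; the entries for `-` and `=` become sub-tables only because of the 2-char members.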

Being a DFA, this could easily handle complex elements (matching
arbitrary regular expressions), tho whether we'd make much use of this
particular feature is not very important.

Since some of the nodes in the DFA would likely only handle a very few
chars specially, we could later improve the representation so that those
nodes don't use up a whole char-table.


        Stefan


PS: And this same kind of "char-table extended into a DFA" could be
useful for syntax-tables in order to provide much more flexible support
for multi-character comment markers or "paren-like nested elements".