From: Drew Adams
Newsgroups: gmane.emacs.devel
Subject: RE: char equivalence classes in search - why not symmetric?
Date: Wed, 9 Sep 2015 20:12:37 -0700 (PDT)
Message-ID: <116512ec-bdec-43de-afa9-dc01a57715e8@default>
To: "Stephen J. Turnbull"
Cc: emacs-devel@gnu.org
> AFAICT, this (or similar) is the only code needed.

Sorry, I spoke too soon.

1. The following two lines are needed before evaluating the code I sent earlier. (I've attached an updated file that includes them, so you can just load or evaluate it.)

   (setq character-fold-search t)
   (load-library "character-fold")

   This is because of the way the vanilla code works at the moment. It also means that, for this testing, char folding will be on to start with.

2. The code I have is not sufficient for everything. You can use it to see the behavior for single-char entries in the char table, which include accented chars (chars with diacritics). But it does not handle multiple-char entries in the table.

   For instance, you can search for "é" (U+00E9) and get char folding, but you cannot search for "é" (U+0065 U+0301) and get char folding. The first of these is the single char named LATIN SMALL LETTER E WITH ACUTE. The second is a plain "e" followed by U+0301, the char named COMBINING ACUTE ACCENT. More work would be needed to make such combinations work too.

As I said, I'm no expert on char tables, but the attached code should give you a good idea of what is involved. At the end of the file I included some commented-out "e" chars to search for. (Use `C-u C-x =' on a char to see what it really is.)
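To make the single-char versus multiple-char distinction in point 2 concrete, here is a minimal Emacs Lisp sketch that inspects both spellings of e-acute with `get-char-code-property' and `ucs-normalize-NFC-string'. The helper name `my-describe-e-acute' is hypothetical; it is not part of character-fold.el or of the attached file. It shows that the precomposed char is one char while the other spelling is two, that Emacs records the decomposition of U+00E9 as "e" followed by U+0301, and that U+0301 has a nonzero canonical combining class, which is the property the attached code checks before folding a multi-char decomposition.

```elisp
;; Hypothetical helper, for illustration only (not part of character-fold.el).
(require 'ucs-normalize)

(defun my-describe-e-acute ()
  "Return a plist comparing the two spellings of e-acute."
  (let ((precomposed (string #xE9))       ; LATIN SMALL LETTER E WITH ACUTE
        (decomposed  (string ?e #x301)))  ; "e" + COMBINING ACUTE ACCENT
    (list
     ;; One char versus two chars.
     :precomposed-length (length precomposed)
     :decomposed-length  (length decomposed)
     ;; The two strings differ char by char...
     :string-equal (string= precomposed decomposed)
     ;; ...but are canonically equivalent under Unicode normalization (NFC).
     :nfc-equal (string= (ucs-normalize-NFC-string decomposed) precomposed)
     ;; Unicode decomposition of U+00E9, as a list of chars.
     :decomposition (get-char-code-property #xE9 'decomposition)
     ;; A nonzero canonical combining class marks a combining char.
     :ccc (get-char-code-property #x301 'canonical-combining-class)
     :name (get-char-code-property #x301 'name))))
```

Evaluating `(my-describe-e-acute)' should return (:precomposed-length 1 :decomposed-length 2 :string-equal nil :nfc-equal t :decomposition (101 769) :ccc 230 :name "COMBINING ACUTE ACCENT"). Because a char table maps one char at a time, a search string spelled the second way presents "e" and U+0301 as two separate lookups, which is presumably why that case needs the extra work mentioned above.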
[Attachment: symmetric-char-fold.el]

(setq character-fold-search t)
(load-library "character-fold")

(defun update-char-fold-table ()
  "Update the value of variable `character-fold-table'.
The new value reflects the current value of `char-fold-symmetric'."
  (setq character-fold-table
        (let* ((equiv (make-char-table 'character-fold-table))
               (table (unicode-property-table-internal 'decomposition))
               (func (char-table-extra-slot table 1)))
          ;; Ensure the table is populated.
          (map-char-table
           (lambda (i v) (when (consp i) (funcall func (car i) v table)))
           table)
          ;; Compile a list of all complex chars that each simple char should match.
          (map-char-table
           (lambda (i dec)
             (when (consp dec)
               ;; Discard a possible formatting tag.
               (when (symbolp (car dec))
                 (setq dec (cdr dec)))
               ;; Skip trivial cases like ?a decomposing to (?a).
               (unless (or (and (eq i (car dec))  (not  (cdr dec))))
                 (let ((d dec)
                       (fold-decomp t)
                       k found)
                   (while (and d (not found))
                     (setq k (pop d))
                     ;; Is k a number or letter, per unicode standard?
                     (setq found (memq (get-char-code-property k 'general-category)
                                       '(Lu Ll Lt Lm Lo Nd Nl No))))
                   (if found
                       ;; Check if the decomposition has more than one letter,
                       ;; because then we don't want the first letter to match
                       ;; the decomposition.
                       (dolist (k d)
                         (when (and fold-decomp
                                    (memq (get-char-code-property k 'general-category)
                                          '(Lu Ll Lt Lm Lo Nd Nl No)))
                           (setq fold-decomp nil)))
                     ;; If there's no number or letter on the
                     ;; decomposition, take the first character in it.
                     (setq found (car-safe dec)))
                   ;; Finally, we only fold multi-char decomposition if at
                   ;; least one of the chars is non-spacing (combining).
                   (when fold-decomp
                     (setq fold-decomp nil)
                     (dolist (k dec)
                       (when (and (not fold-decomp)
                                  (> (get-char-code-property k 'canonical-combining-class) 0))
                         (setq fold-decomp t))))
                   ;; Add i to the list of characters that k can
                   ;; represent. Also possibly add its decomposition, so we can
                   ;; match multi-char representations like (format "a%c" 769)
                   (when (and found (not (eq i k)))
                     (let ((chars (cons (char-to-string i) (aref equiv k))))
                       (aset equiv k (if fold-decomp
                                         (cons (apply #'string dec) chars)
                                       chars))))))))
           table)
          ;; Add some manual entries.
          (dolist (it '((?\" "＂" "“" "”" "”" "„" "⹂" "〞" "‟" "‟" "❞" "❝"
                         "❠" "“" "„" "〝" "〟" "🙷" "🙶" "🙸" "«" "»")
                        (?' "❟" "❛" "❜" "‘" "’" "‚" "‛" "‚" "󠀢" "❮" "❯" "‹" "›")
                        (?` "❛" "‘" "‛" "󠀢" "❮" "‹")))
            (let ((idx (car it))
                  (chars (cdr it)))
              (aset equiv idx (append chars (aref equiv idx)))))

          ;; --------8<------the only addition----------------
          (when char-fold-symmetric
            ;; Add an entry for each equivalent char.
            (let ((others  ()))
              (map-char-table
               (lambda (base v)
                 (let ((chrs  (aref equiv base)))
                   (when (consp chrs)
                     (dolist (chr  (cdr chrs))
                       (push (cons (string-to-char chr) (remove chr chrs)) others)))))
               equiv)
              (dolist (it  others)
                (let ((base   (car it))
                      (chars  (cdr it)))
                  (aset equiv base (append chars (aref equiv base)))))))
          ;; --------8<---------------------------------------

          ;; Convert the lists of characters we compiled into regexps.
          (map-char-table
           (lambda (i v) (let ((re (regexp-opt (cons (char-to-string i) v))))
                           (if (consp i)
                               (set-char-table-range equiv i re)
                             (aset equiv i re))))
           equiv)
          equiv)))

(defcustom char-fold-symmetric t
  "Non-nil means char-fold searching treats equivalent chars the same.
That is, use of any of a set of char-fold equivalent chars in a search
string finds any of them in the text being searched.

If nil then only the \"base\" or \"canonical\" char of the set matches
any of them.  The others match only themselves, even when char-folding
is turned on."
  :set (lambda (sym defs)
         (custom-set-default sym defs)
         (update-char-fold-table))
  :type 'boolean :group 'isearch)

;; ("𝚎" "𝙚" "𝘦" "𝗲" "𝖾" "𝖊" "𝕖" "𝔢" "𝓮" "𝒆" "𝑒" "𝐞" "ｅ" "㋎" "㋍" "ⓔ" "⒠"
;;  "ⅇ" "ℯ" "ₑ" "ẽ" "ẽ" "ẻ" "ẻ" "ẹ" "ẹ" "ḛ" "ḛ" "ḙ" "ḙ" "ᵉ" "ȩ" "ȩ" "ȇ" "ȇ"
;; "ȅ" "ȅ" "ě" "ě" "ę" "ę" "ė" "ė" "ĕ" "ĕ" "ē" "ē" "ë" "ë" "ê" "ê" "é" "é" "è" "è")

;; No good yet: "𝚎" "ẽ" "ẻ" "ẹ" "ḛ" "ḙ" "ȩ" "ȇ" "ȅ"
;;              "ě" "ę" "ė" "ĕ" "ē" "ë" "ê" "é" "è"
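The docstring of `char-fold-symmetric' describes the symmetric behavior: any member of a set of equivalent chars in the search string should find any of the others. As a toy illustration of that idea only, separate from the attached code (which works directly on `character-fold-table'), the following sketch takes an alist of (BASE . EQUIVALENTS) and gives every member of each set an entry listing all the other members, including the base char. The function name `my-symmetric-classes' is hypothetical, and the sketch assumes every equivalent is a single-char string; multiple-char entries such as "e" followed by U+0301 are out of scope here, as in point 2.

```elisp
;; Hypothetical toy function, for illustration only.
;; Assumes every equivalent is a single-char string.
(defun my-symmetric-classes (classes)
  "Return a symmetric version of CLASSES.
CLASSES is an alist of (BASE . EQUIVS), where BASE is a char and
EQUIVS is a list of single-char strings that BASE matches.  The
result has one entry per member of each set (BASE and each of
EQUIVS), mapping that member's char to the strings of all the
other members.  Chars that occur in several sets are not merged."
  (let ((result ()))
    (dolist (class classes)
      ;; All members of this set, as strings, base first.
      (let ((members (cons (char-to-string (car class)) (cdr class))))
        (dolist (m members)
          ;; Each member matches every other member of its set.
          (push (cons (string-to-char m) (remove m members)) result))))
    (nreverse result)))
```

For example, `(my-symmetric-classes '((?e "é" "è")))' returns ((101 "é" "è") (233 "e" "è") (232 "e" "é")), that is, entries for ?e, ?é, and ?è, so "é" in a search string would also find "e" and "è". With the attached file loaded, the `:set' function of `char-fold-symmetric' rebuilds the real table whenever the option is changed through Customize, e.g. `(customize-set-variable 'char-fold-symmetric nil)'.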