From: Juri Linkov
Newsgroups: gmane.emacs.devel
Subject: Re: On language-dependent defaults for character-folding
Date: Tue, 23 Feb 2016 02:14:55 +0200
Organization: LINKOV.NET
Message-ID: <87io1gz3i8.fsf@mail.linkov.net>
To: Eli Zaretskii
Cc: larsi@gnus.org, lokedhs@gmail.com, rms@gnu.org, emacs-devel@gnu.org
In-Reply-To: <831t84lgsa.fsf@gnu.org> (Eli Zaretskii's message of "Mon, 22 Feb 2016 20:51:49 +0200")

> But the most basic issue is that any significant development in these
> directions require to re-implement the feature on the C level, and use
> char-tables for folding, like we do with case-mapping. So until
> someone steps forward for the job, all we can do is small corrections
> to the existing implementation.
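To illustrate what a one-to-one folding table "like we do with case-mapping" can and cannot do, here is a small sketch. It is written in Python rather than in Emacs's C char-table API. The helpers `build_fold_table` and `fold` are hypothetical, and the table covers only U+0000..U+02FF for brevity. Each precomposed character whose canonical decomposition is a base letter plus combining marks is mapped to that base letter.

```python
import unicodedata

def build_fold_table(max_cp=0x2FF):
    """Hypothetical helper: map precomposed chars to their base letter.

    Only characters whose canonical (NFD) decomposition is a base
    character followed exclusively by combining marks are included;
    all other characters implicitly fold to themselves.
    """
    table = {}
    for cp in range(max_cp + 1):
        ch = chr(cp)
        nfd = unicodedata.normalize("NFD", ch)
        if len(nfd) > 1 and all(unicodedata.combining(m) for m in nfd[1:]):
            table[ch] = nfd[0]
    return table

FOLD = build_fold_table()

def fold(s):
    # One lookup per character, in the spirit of a case-mapping table.
    return "".join(FOLD.get(ch, ch) for ch in s)

print(fold("caf\u00e9"))   # precomposed e-acute folds to plain "e"
print(fold("cafe\u0301"))  # decomposed e + U+0301: the combining mark survives
print(fold("\u00f8"))      # o-slash has no decomposition, so it is not folded
```

The second and third cases show the limits of a purely per-character table. A decomposed sequence still keeps its separate combining mark, and a letter such as ø, which has no Unicode decomposition, gets no mapping at all unless it is added by hand.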
Do I understand correctly that what essentially needs to be done on the C level is to extend char-tables with character insertions and deletions? In other words, in addition to canonical equivalence mappings (like those used for the existing case-mappings), char-tables would also need to support matching multi-character additions (like combining accents in the search string) and deletions (like combining accents from the search string that are missing in the searched text)?

> For example, the default state of character-folding might depend on
> the locale's language -- we could turn it off by default for languages
> whose users expressed dissatisfaction with the feature. We could also
> augment the regular expressions created for folding the search string
> by filtering out variants that users of a particular language don't
> want. If people think these ideas will make more users happy, we can
> work on that.

It seems two user variables are necessary for customization:

1. inclusive folding groups, which by default would include pairs such as o - ø and l - ł in addition to the Unicode decomposition-based rules, and which would let users add more rules;

2. exclusive folding groups, to exclude locale/language-dependent rules from the default mappings above, e.g. removing n - ñ for the "es" locale.
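The sketch below shows how the two proposed variables might combine when building the folding regexp for a search string. It is in Python, not Emacs Lisp, and it is not the actual char-fold implementation. The names `INCLUSIVE_GROUPS`, `EXCLUSIVE_GROUPS`, `decomposition_variants` and `folding_regexp` are all hypothetical, and decomposition-derived variants are again limited to U+0000..U+02FF. For each search character, the sketch collects the character itself, its decomposition-based variants and any inclusive-group additions. It then drops the pairs that the current language excludes.

```python
import re
import unicodedata

# Hypothetical user options mirroring the two proposed variables.
INCLUSIVE_GROUPS = {"o": {"\u00f8"}, "l": {"\u0142"}}   # o - ø, l - ł
EXCLUSIVE_GROUPS = {"es": {("n", "\u00f1")}}            # drop n - ñ for "es"

def decomposition_variants(base, max_cp=0x2FF):
    """Chars whose NFD form is `base` followed only by combining marks."""
    out = set()
    for cp in range(max_cp + 1):
        ch = chr(cp)
        nfd = unicodedata.normalize("NFD", ch)
        if (len(nfd) > 1 and nfd[0] == base
                and all(unicodedata.combining(m) for m in nfd[1:])):
            out.add(ch)
    return out

def folding_regexp(search, lang=None):
    """Build a regexp where each search char matches its allowed variants."""
    excluded = EXCLUSIVE_GROUPS.get(lang, set())
    parts = []
    for ch in search:
        variants = ({ch} | decomposition_variants(ch)
                    | INCLUSIVE_GROUPS.get(ch, set()))
        # Remove language-specific exclusions, but always keep `ch` itself.
        variants = {v for v in variants if v == ch or (ch, v) not in excluded}
        if len(variants) == 1:
            parts.append(re.escape(ch))
        else:
            parts.append("[" + "".join(re.escape(v) for v in sorted(variants)) + "]")
    return "".join(parts)

print(bool(re.fullmatch(folding_regexp("nino"), "ni\u00f1o")))             # default: ñ folds
print(bool(re.fullmatch(folding_regexp("nino", lang="es"), "ni\u00f1o")))  # "es": excluded
```

Like the current regexp-based approach, this handles only single-character variants. A decomposed sequence such as n + U+0303 in the searched text would still not match "n". Supporting that needs the insertion/deletion matching asked about above.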