From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: On language-dependent defaults for character-folding
Date: Mon, 29 Feb 2016 18:27:06 +0200
Message-ID: <83k2ln8oth.fsf@gnu.org>
In-reply-to: <87oab0fk8d.fsf@mail.linkov.net> (message from Juri Linkov on Mon, 29 Feb 2016 02:22:02 +0200)
To: Juri Linkov
Cc: larsi@gnus.org, lokedhs@gmail.com, rms@gnu.org, emacs-devel@gnu.org

> From: Juri Linkov
> Cc: larsi@gnus.org, lokedhs@gmail.com, rms@gnu.org, emacs-devel@gnu.org
> Date: Mon, 29 Feb 2016 02:22:02 +0200
>
> > What I envisioned is a single variable that holds a list of folding
> > sub-features.  Examples include ignoring diacritics, matching
> > ligatures and their decompositions, "controversial" foldings that
> > users of specific languages might not want, etc.  The default value
> > will hold all of the sub-features; users that don't want some of them
> > will be able to remove them from the list, which will affect the
> > mapping at search time.  We could also have a setting that means "DTRT
> > for my locale", which will remove the sub-features inappropriate for
> > the locale's language.  Stuff like that.
>
> Like (defcustom char-fold-defaults '(ignore-diacritics match-ligatures ...?

Yes.

> Not sure if such terms are self-descriptive.
> At least plain pairs like
> '((o ø) (l ł) ...) should be enough to customize at the base character level,
> and later we might consider grouping such pairs into a more high-level
> features like ‘spanish-diacritics’, ‘swedish-diacritics’, etc.

Such grouping is what I had in mind.  I don't expect users to remember
these characters by heart.

> > I think we need to create another uni-*.el file which defines a
> > decomposition char-table populated from decomps.txt.
>
> The name of the currently used Unicode character property is “decomposition”.
> What would be a good name for the property from decomps.txt?  “decomposition2”?

I'm not good at naming stuff, but how about collating-decomposition or
decomposition-for-collation?
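
To make the grouping idea discussed above a bit more concrete, here is a
rough sketch of how a list of named sub-features could drive the folding.
It is written in Python rather than Emacs Lisp purely for illustration, and
everything in it is an assumption: the feature names, the PAIR_GROUPS table,
and the helper functions are hypothetical and are not part of char-fold or
any existing Emacs API.  The only real facts relied upon are that characters
like "é" have a canonical decomposition while "ø" and "ł" do not, which is
why the latter need explicit pairs.

```python
import unicodedata

# Hypothetical named groups of (base, variant) pairs.  The names and
# contents are illustrative only; a real table would be much larger.
PAIR_GROUPS = {
    "stroke-letters": [("o", "ø"), ("l", "ł")],
}

# Hypothetical default: every sub-feature enabled.
DEFAULT_FEATURES = ["ignore-diacritics", "stroke-letters"]


def without(features, name):
    """Return a copy of FEATURES with the sub-feature NAME removed."""
    return [f for f in features if f != name]


def fold(text, features):
    """Map TEXT to its folded form under the enabled sub-features."""
    if "ignore-diacritics" in features:
        # Canonically decompose, then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(c for c in decomposed if not unicodedata.combining(c))
    for name in features:
        # Groups without explicit pairs (e.g. ignore-diacritics) are skipped.
        for base, variant in PAIR_GROUPS.get(name, ()):
            text = text.replace(variant, base)
    return text


def matches(query, candidate, features):
    """True if QUERY and CANDIDATE are equal after folding."""
    return fold(query, features) == fold(candidate, features)
```

With the default list, matches("o", "ø", DEFAULT_FEATURES) is true; a user
who removes the group via without(DEFAULT_FEATURES, "stroke-letters") no
longer gets that match, while "o" still matches "ö" through the diacritics
sub-feature.  A "DTRT for my locale" setting could then be just a function
from the locale's language to the set of groups to remove.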