From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: On language-dependent defaults for character-folding
Date: Tue, 23 Feb 2016 20:14:55 +0200
Message-ID: <83k2lvi99c.fsf@gnu.org>
To: rms@gnu.org
Cc: larsi@gnus.org, lokedhs@gmail.com, emacs-devel@gnu.org
In-reply-to: (message from Richard Stallman on Tue, 23 Feb 2016 12:43:56 -0500)

> From: Richard Stallman
> CC: larsi@gnus.org, lokedhs@gmail.com, emacs-devel@gnu.org
> Date: Tue, 23 Feb 2016 12:43:56 -0500
>
> That is interesting.  It means we need several levels of folding:
>
> * Different appearances of the same letter+decorations:
>   as a single code point, or as a composition.
>
> * Identical-looking distinct code points (Latin a and Cyrillic a).

This one is a very specialized feature needed only in some marginal
use cases (like looking for the so-called "confusables" -- characters
that look the same and could be used for deception, e.g. in URLs).

> * The same letter with different decorations (o and ö in English).
>
> * Equivalent letters (ö and ø in Swedish).

Not just letters -- sequences of characters.  For example, å vs aa in
Danish, or ﬃ vs ffi.

> Is there any need, ever, to disable the first level?

One could imagine a use case where you want to find only precomposed
characters, not their decomposed equivalents.  But it should be rare
indeed.

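To make the levels above concrete, here is a minimal sketch using Python's standard unicodedata module.  It is not how Emacs implements character folding; the function names are hypothetical, and mapping each level onto a Unicode normalization form is an assumption made for illustration.

```python
import unicodedata

def fold_canonical(text):
    # First level: a precomposed code point and its decomposed
    # sequence normalize to the same string under NFD.
    return unicodedata.normalize("NFD", text)

def fold_decorations(text):
    # "Same letter with different decorations": decompose, then drop
    # the combining marks, so "ö" folds to plain "o".
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def fold_compatibility(text):
    # Sequences of characters: NFKD expands compatibility characters,
    # e.g. the single-code-point ligature U+FB03 becomes "ffi".
    return unicodedata.normalize("NFKD", text)

precomposed = "\u00f6"   # ö as a single code point
decomposed = "o\u0308"   # o followed by COMBINING DIAERESIS

# Without folding, the two spellings of ö are different strings.
print(precomposed == decomposed)
# With the first level enabled, they compare equal.
print(fold_canonical(precomposed) == fold_canonical(decomposed))
```

Note that ø (U+00F8) has no Unicode decomposition, so none of these functions relates it to ö or to o; an equivalence like the Swedish one has to come from a separate rule.  Likewise, searching for precomposed characters only simply means skipping fold_canonical and comparing the raw strings.
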
> The other levels are language-specific, and the user might want to
> enable or disable them.

Not all of them are language-specific.  Some are valid in any
language.
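
One way to picture the split between rules valid in any language and language-specific ones is a pair of lookup tables.  The sketch below is purely illustrative: the table names, the language codes, and the choice of which rule goes in which table are assumptions, with the sample equivalences taken from the examples in this thread.

```python
import unicodedata

# Hypothetical table of rules applied regardless of language.
UNIVERSAL_RULES = {
    "\ufb03": "ffi",          # the ffi ligature vs the letters f, f, i
}

# Hypothetical per-language tables, applied only on request.
LANGUAGE_RULES = {
    "sv": {"\u00f8": "\u00f6"},   # ø treated as equivalent to ö
    "da": {"\u00e5": "aa"},       # å treated as equivalent to aa
}

def fold(text, lang=None):
    # Canonical equivalence first, so that decomposed input is
    # matched by the precomposed keys in the tables.
    text = unicodedata.normalize("NFC", text)
    rules = dict(UNIVERSAL_RULES)
    rules.update(LANGUAGE_RULES.get(lang, {}))
    for source, target in rules.items():
        text = text.replace(source, target)
    return text

def matches(needle, haystack, lang=None):
    # Fold both sides the same way, then do a plain substring search.
    return fold(needle, lang) in fold(haystack, lang)

print(matches("office", "o\ufb03ce"))        # universal rule
print(matches("aarhus", "\u00e5rhus", "da"))  # Danish rule enabled
print(matches("aarhus", "\u00e5rhus"))        # Danish rule not enabled
```

With a layout like this, the user toggles only the per-language table, while the universal rules stay on by default.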