From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: On language-dependent defaults for character-folding
Date: Tue, 23 Feb 2016 18:56:36 +0200
Message-ID: <83ziuricvv.fsf@gnu.org>
References: <CAAdUY-KRpbjDJ6h=QOsWBpOJyJ-GP1ia70YyjwYsNe5i1S=mXg@mail.gmail.com>
	<83d1s17npz.fsf@gnu.org> <87oablfpn3.fsf@mail.linkov.net>
	<834mdd6llx.fsf@gnu.org>
	<7fbb8bc7-9a97-4bad-a103-a6690a35241d@default>
	<834mdc5w6o.fsf@gnu.org> <m2ziuxltit.fsf@newartisans.com>
	<838u2hu6aq.fsf@gnu.org> <871t899tde.fsf@gnus.org>
	<83y4ahru04.fsf@gnu.org>
	<CADtN0W+B=JZ_LKis9opETfr5q8K=rC+Xt6jGijMC3GwiGbF2RA@mail.gmail.com>
	<83fuwproyf.fsf@gnu.org>
	<CADtN0W+2CjROLMnuC8N3X3TrwvsZOmidviFjM_-AF0DKN-Wvsg@mail.gmail.com>
	<837fi0sz29.fsf@gnu.org>
	<CADtN0W+93LH5d3=joVj2xe40rramMOcURKw7QKdv_OefYCm8Ug@mail.gmail.com>
	<83egc8qzjh.fsf@gnu.org>
	<CADtN0WL-rX5xzw75P=qLEYFYzLWkuCuntE+gf2BAhn981_jWBg@mail.gmail.com>
	<87egc7evu3.fsf@gnus.org> <83io1jpt4u.fsf@gnu.org>
	<87povqhj25.fsf@gnus.org>
	<CADtN0W+qyRZFwDR+MtLxBdayLbzajwbS1_ykufSg1OQLU8yY8w@mail.gmail.com>
	<87egc68opx.fsf@Rainer.invalid>
Reply-To: Eli Zaretskii <eliz@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Trace: ger.gmane.org 1456246619 23564 80.91.229.3 (23 Feb 2016 16:56:59 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Tue, 23 Feb 2016 16:56:59 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: Achim Gratz <Stromeko@nexgo.de>
In-reply-to: <87egc68opx.fsf@Rainer.invalid> (message from Achim Gratz on Sun, 
	21 Feb 2016 09:14:18 +0100)
Xref: news.gmane.org gmane.emacs.devel:200545
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/200545>

> From: Achim Gratz <Stromeko@nexgo.de>
> Date: Sun, 21 Feb 2016 09:14:18 +0100
> 
> Elias Mårtenson writes:
> > Because under the Unicode decomposition rules, ø is not decomposable. I
> > can't explain why that is the case (probably because there is no reason to
> > have a combining /. After all, the only languages that use ø are languages
> > that use it as a character of its own).
> 
> AFAIK, for combining characters to be composable/decomposable the glyphs
> must not overlap.  This is the same issue as with the Polish »ł« to the
> best of my knowledge.

The definitive answer is here, for those interested:

  http://www.unicode.org/mail-arch/unicode-ml/y2016-m02/0106.html

> In other words, Unicode composition/decomposition rules tell you more
> about the glyph construction than they do about useful strategies to
> search for multiple characters.

That conclusion is too radical, IMO.  You will see in the above
message that the criterion you describe was just a means for the UTC
to draw a line somewhere, i.e. it was an ad-hoc rule more than
anything else.

> The idea of using the base character of the canonical decomposition
> in the search might still yield a useful shortcut in most cases, but
> I'm not sure it is correct in all languages even when that
> decomposition exists and, as the examples show, there are cases
> where the non-decomposed character has to be treated specially.

Language-specific tailoring is indeed needed for best results, but the
language-independent decompositions have their place.  E.g., among the
data files of the Unicode Collation Algorithm (UCA) you will see a file
named decomps.txt that is basically a list of decompositions from
UnicodeData.txt with additions specifically for collation, searching,
and matching (including ł, btw).  This tells me that the decomposition
data in UnicodeData.txt is a good basis for these features; it is not
just about glyph construction.