From: "Roland Winkler"
Newsgroups: gmane.emacs.devel
Subject: Re: strip accents and sorting [was: BibTeX issues]
Date: Fri, 30 Aug 2019 11:27:33 -0500
Message-ID: <20085.68375.750044.23913@gargle.gargle.HOWL>
References: <87mufv2e9s.fsf@uni-bielefeld.de> <87ftllji9u.fsf@gnu.org> <83tva1b02r.fsf@gnu.org> <17902.3833.825923.23911@gargle.gargle.HOWL>
To: martin rudalics
Cc: Eli Zaretskii, emacs-devel@gnu.org

On Thu Aug 29 2019 martin rudalics wrote:
> > But (string-lessp "ä-umlaut" "ö-combine") gives nil
>
> But (string-collate-lessp "ä-umlaut" "ö-combine") gives t

...not for me, which is likely due to my locale

  LC_COLLATE=C

I could use instead, say, LC_COLLATE=en_US.utf8. Then the above call
of string-collate-lessp yields t. But this also implies case folding
and ignoring dots in directory listings, which is not what I want.
In other words, these locales have too many features bundled together.

Maybe the feature sets of the different locales are documented
*somewhere* in a neat way, and there is a locale with a feature set
that does exactly what I want. But to the best of my knowledge this
documentation resides outside Emacs, so things get rather complicated
when the locale affects an Emacs session in important or possibly
subtle ways.

> so it should be fairly easy to fix `sort-lines' and friends
> accordingly.

In that sense I am not sure I would like to see `sort-lines' and
friends be fixed "accordingly". If at all, I'd vote for a user option,
which I would likely use to disable such things.

On the other hand, as Eli pointed out in his reply about accented
characters being represented via a single character as compared to
using combining characters

> The Unicode Standard mandates that they be handled identically,
> including in searching and sorting. We don't yet implement that
> 100%, but see char-fold.el for a partial (and not very efficient)
> implementation during search.

So I would assume that the locale should not matter at all in the
context of Unicode combining characters. (Or there should be a way to
control exactly this aspect of Unicode combining characters with no
additional (mis)features bundled with it.)

I understand that it is a different matter how accented characters are
sorted relative to each other and also relative to unaccented
characters. So it can make a lot of sense to have different locales
for that aspect.

Maybe I am missing something here. (And I have not yet looked in more
detail at char-fold.el mentioned by Eli, which could be a better way
to go within the Emacs world.)

Roland
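
For illustration, here is one way the "combining characters only"
behaviour asked for above might be approximated without touching
LC_COLLATE: normalize both strings to NFC with ucs-normalize.el, then
compare by code point as `string-lessp' does. This is a minimal sketch
and not something proposed in the thread; the helper name
`my-nfc-string-lessp' is hypothetical. It makes the precomposed and the
combining spelling of a character equivalent, but it does not give a
linguistically meaningful order of accented relative to unaccented
characters.

```elisp
;; Sketch only: compare after canonical (NFC) normalization so that
;; precomposed and combining-character spellings are treated alike.
;; No locale is consulted here.
(require 'ucs-normalize)

(defun my-nfc-string-lessp (s1 s2)
  "Return non-nil if S1 is less than S2 after NFC normalization.
Hypothetical helper; the order is by code point, as in `string-lessp'."
  (string-lessp (ucs-normalize-NFC-string s1)
                (ucs-normalize-NFC-string s2)))

(let ((precomposed "\u00e4-umlaut")     ; a-umlaut as U+00E4
      (combining   "o\u0308-combine"))  ; o followed by combining U+0308
  (list
   ;; Plain comparison: U+00E4 sorts after the ASCII letter o.
   (string-lessp precomposed combining)
   ;; After NFC the second string starts with U+00F6, which sorts
   ;; after U+00E4.
   (my-nfc-string-lessp precomposed combining)))
;; => (nil t)
```

Separately, `string-collate-lessp' accepts an optional LOCALE argument,
so a call such as (string-collate-lessp s1 s2 "en_US.UTF-8") can request
a specific collation for one comparison while LC_COLLATE=C stays in
effect elsewhere. Whether that locale is available, and what it yields,
depends on the system.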