From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: strip accents and sorting [was: BibTeX issues]
Date: Fri, 30 Aug 2019 20:51:32 +0300
Message-ID: <838sraa6e3.fsf@gnu.org>
References: <87mufv2e9s.fsf@uni-bielefeld.de> <87ftllji9u.fsf@gnu.org> <83tva1b02r.fsf@gnu.org> <17902.3833.825923.23911@gargle.gargle.HOWL> <20085.68375.750044.23913@gargle.gargle.HOWL>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: rudalics@gmx.at, emacs-devel@gnu.org
To: "Roland Winkler"
In-reply-to: 
 <20085.68375.750044.23913@gargle.gargle.HOWL> (winkler@gnu.org)
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:239708

> Date: Fri, 30 Aug 2019 11:27:33 -0500
> From: "Roland Winkler"
> Cc: Eli Zaretskii,
>  emacs-devel@gnu.org
>
> > > But (string-collate-lessp "ä-umlaut" "ö-combine") gives t
> > ...not for me, which is likely due to my locale LC_COLLATE=C
>
> I could use instead, say, LC_COLLATE=en_US.utf8.  Then the above
> call of string-collate-lessp yields t.  But this also implies case
> folding and ignoring dots in directory listings, which is not what I
> want.  In other words, these locales have too many features bundled
> together.

You could set LC_COLLATE=en_US.utf8 inside Emacs, or even bind it
around the call to string-collate-lessp.  I think we support that on
GNU/Linux.

> > The Unicode Standard mandates that they be handled identically,
> > including in searching and sorting.  We don't yet implement that
> > 100%, but see char-fold.el for a partial (and not very efficient)
> > implementation during search.
>
> So I would assume that the locale should not matter at all in the
> context of unicode combining characters.

Not entirely true, as some aspects of this equivalence can be
locale-dependent.  See UAX#10 (http://www.unicode.org/reports/tr10/)
for more about that.
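
To make the two points above concrete, here is a rough sketch, not a
definitive recipe.  It assumes that "binding the locale around the
call" can be done with the optional LOCALE argument of
string-collate-lessp, and that a locale named "en_US.UTF-8" is
installed; locale names are system-dependent, and an unknown name may
signal an error.  The my- names are hypothetical and exist only for
this illustration.

```elisp
;; Sketch only.  Assumptions: the locale "en_US.UTF-8" exists on this
;; system, and all `my-' names are hypothetical helpers.
(require 'ucs-normalize)

(defvar my-precomposed "\u00e4"
  "Sample string: a-umlaut as a single precomposed character.")

(defvar my-combining "a\u0308"
  "Sample string: `a' followed by U+0308 COMBINING DIAERESIS.")

;; The two spellings are canonically equivalent, yet a plain
;; codepoint comparison treats them as different strings.
(string= my-precomposed my-combining)            ; => nil

;; Normalizing both sides to NFC removes that difference.
(string= (ucs-normalize-NFC-string my-precomposed)
         (ucs-normalize-NFC-string my-combining)) ; => t

(defun my-collate-lessp (s1 s2 &optional locale)
  "Return non-nil if S1 collates before S2 in LOCALE.
Both strings are NFC-normalized first.  LOCALE defaults to
\"en_US.UTF-8\", which is an assumption about the host system.
The global LC_COLLATE setting of the session is left untouched."
  (string-collate-lessp (ucs-normalize-NFC-string s1)
                        (ucs-normalize-NFC-string s2)
                        (or locale "en_US.UTF-8")))
```

Calling (my-collate-lessp "ä-umlaut" "ö-combine") then asks for the
en_US ordering for that one comparison only, so directory listings
and other users of the session locale keep LC_COLLATE=C.  Whether the
result matches the full UAX#10 behavior depends on the C library's
locale data.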