From: "Roland Winkler"
Newsgroups: gmane.emacs.devel
Subject: Re: strip accents and sorting [was: BibTeX issues]
Date: Fri, 30 Aug 2019 11:29:29 -0500
Message-ID: <20201.44841.652990.23913@gargle.gargle.HOWL>
References: <87mufv2e9s.fsf@uni-bielefeld.de> <87ftllji9u.fsf@gnu.org> <83tva1b02r.fsf@gnu.org> <17902.3833.825923.23911@gargle.gargle.HOWL> <83lfvcbg5u.fsf@gnu.org>
In-Reply-To: <83lfvcbg5u.fsf@gnu.org>
To: Eli Zaretskii
Cc: emacs-devel@gnu.org

On Thu Aug 29 2019 Eli Zaretskii wrote:
> > From: "Roland Winkler"
> > Now, one solution would be to simply strip off the combining
> > characters by decomposing the characters.  Or is there a possibility
> > to teach a sorting algorithm that the first letter of ä-combine is
> > "the same" as the first letter of ä-umlaut and all this should
> > appear near a-plain instead of past o-plain?
>
> Both should be possible.  To entirely strip the combining accents, you
> can use ucs-normalize, and then filter out all characters whose
> canonical combining class is non-zero.

Thanks, I need to look at this more carefully.
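
For illustration only, here is a minimal sketch of the approach described in the quoted reply: decompose the string, then drop every character whose canonical combining class is non-zero. It is written in Python with the standard unicodedata module, not with Emacs's ucs-normalize, so it shows the idea rather than the Emacs API. The names strip_accents and sort_key are hypothetical helpers introduced for this example.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with all combining marks removed (hypothetical helper)."""
    # NFD splits precomposed characters such as U+00E4 into a base
    # letter followed by its combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns the canonical combining class;
    # it is 0 for base characters and non-zero for combining marks.
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


def sort_key(text: str):
    """Sort by the accent-free form; fall back to the raw string for ties."""
    return (strip_accents(text).casefold(), text)


# "a" + U+0308 (combining diaeresis) versus precomposed U+00E4.
names = ["o-plain", "a\u0308-combine", "\u00e4-umlaut", "a-plain"]
ordered = sorted(names, key=sort_key)
print(ordered)
```

With this key, both spellings of a-with-diaeresis sort among the entries starting with plain "a" and ahead of "o-plain". Language-specific collation rules (for example, treating the umlaut as "ae") are not covered by this sketch.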