From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: string-collate-lessp in elisp
Date: Sun, 21 Jul 2024 09:35:56 +0300
Message-ID: <86ed7nckvn.fsf@gnu.org>
To: help-gnu-emacs@gnu.org

> From: Madhu
> Date: Sun, 21 Jul 2024 11:52:53 +0530
>
> I can sort lines with "accents" by calling STRING-COLLATE-LESSP with an
> optional "en_US.UTF-8" LOCALE parameter. But would it be possible to
> implement the string comparison function in emacs directly using the
> unicode data that emacs already has?

Emacs doesn't import the Unicode collation data, only the codepoints.
string-lessp and friends are based on that, and they compare strings
by codepoints only.  compare-strings does the same, but it also uses
the case-conversion tables (which Emacs does have, partly from
Unicode, partly from its own code).

The collation data is very large, and in addition depends (in minor,
but significant, ways) on the language and country.

> If someone has a pointer to the collation rules that have to be
> implemented and maybe prior work, I'd appreciate it. -- Thanks, Madhu

The rules are described in this Unicode Technical Standard (UTS#10):

  https://www.unicode.org/reports/tr10/
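
To make the difference between codepoint comparison and locale collation concrete, here is a minimal sketch. The helper name my-sort-words is hypothetical, not part of Emacs, and the locale-aware call assumes that the system actually provides the "en_US.UTF-8" locale; where it does not, string-collate-lessp may signal an error or behave like string-lessp.

```elisp
;; Codepoint comparison: "é" (U+00E9) sorts after "z" (U+007A).
(string-lessp "zebra" "école")                        ; => t

;; compare-strings with a non-nil IGNORE-CASE argument folds case
;; through the case tables, but otherwise still compares codepoints.
(compare-strings "Apple" nil nil "apple" nil nil t)   ; => t (equal)
(compare-strings "zebra" nil nil "école" nil nil t)   ; => a negative number

(defun my-sort-words (words &optional locale)
  "Return a sorted copy of the list WORDS.
Hypothetical helper for illustration only.  With LOCALE (a string
such as \"en_US.UTF-8\"), compare with `string-collate-lessp';
otherwise compare by codepoint with `string-lessp'."
  (sort (copy-sequence words)
        (if locale
            (lambda (a b) (string-collate-lessp a b locale))
          #'string-lessp)))

;; Codepoint order puts the accented word last.
(my-sort-words '("zebra" "école" "apple"))
;; => ("apple" "zebra" "école")

;; Locale collation; the result depends on the system's locale data.
(my-sort-words '("zebra" "école" "apple") "en_US.UTF-8")
```

On a system with that locale installed, the last call would typically place "école" between "apple" and "zebra", which is the behavior that an implementation based on the UTS#10 rules would have to reproduce inside Emacs.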