From: Yuri Khan
Newsgroups: gmane.emacs.devel
Subject: Re: One more string functions change
Date: Sun, 29 Jun 2014 00:26:19 +0700
To: Dmitry Antipov
Cc: Eli Zaretskii, Paul Eggert, Emacs developers

On Sat, Jun 28, 2014 at 11:21 PM, Dmitry Antipov wrote:

> What's wrong with case tables? If we're talking about Unicode only,
> is it enough/possible/desirable to have just one (huge) case table
> for all supported characters?

It’s not generally possible, because in Turkic locales there is this
funny couple of letters, i and dotless ı. They uppercase into dotted İ
and I, respectively. This makes uppercase a function dependent on the
locale.
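
To make the locale dependence concrete, here is a minimal sketch in Python. The helper `upcase` and the `TURKIC_LANGS` set are hypothetical names invented for this illustration; they are not part of Emacs or of the Python standard library, whose `str.upper` is locale-independent.

```python
# Hypothetical helper illustrating locale-dependent uppercasing.
# Assumption: "tr" (Turkish) and "az" (Azerbaijani) stand in for
# "Turkic locales"; a real implementation would consult locale data.
TURKIC_LANGS = {"tr", "az"}


def upcase(text: str, lang: str = "") -> str:
    """Uppercase text, applying the Turkic dotted/dotless i rule if asked."""
    if lang in TURKIC_LANGS:
        # In Turkic locales, dotted i uppercases to dotted İ (U+0130).
        # Dotless ı (U+0131) already uppercases to I by default.
        text = text.replace("i", "\u0130")
    return text.upper()


print(upcase("istanbul"))        # default mapping
print(upcase("istanbul", "tr"))  # Turkic mapping
```

The same input produces two different results depending on `lang`, which is why a single global case table cannot cover every locale.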

Further, comparing strings case-insensitively by downcasing is wrong,
because of this funny German letter ß (sharp s, eszett), and these
funny Greek letters σ (sigma) and ς (final sigma).

Straße is case-insensitively equivalent to STRASSE, but they downcase
to straße and strasse, respectively.

Both sigma σ and final sigma ς are case-insensitively equivalent to
Capital Sigma Σ, but small letters downcase to themselves and Capital
Sigma downcases to σ.

The right, Unicode-compliant way to compare strings case-insensitively
involves a mapping called case folding, which is similar to
downcasing, but subtly different. For example, it expands ß into ss,
and normalizes final sigma to normal sigma, and does many other
expansions. Case-folded strings are largely not usable for human
consumption but only for case-insensitive comparison.

Details can be found in the Unicode Standard, section 5.18.
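
The difference between downcasing and case folding can be checked with a short Python sketch. Python's built-in `str.casefold` applies Unicode case folding; the two comparison helpers below are hypothetical names used only for this illustration, and the sketch does not address Unicode normalization.

```python
def equal_by_downcase(a: str, b: str) -> bool:
    """Naive comparison: downcase both sides (wrong in general)."""
    return a.lower() == b.lower()


def equal_by_casefold(a: str, b: str) -> bool:
    """Comparison via Unicode case folding."""
    return a.casefold() == b.casefold()


pairs = [
    ("Stra\u00dfe", "STRASSE"),  # Straße vs STRASSE
    ("\u03c2", "\u03a3"),        # final sigma ς vs capital sigma Σ
    ("\u03c3", "\u03c2"),        # sigma σ vs final sigma ς
]

for a, b in pairs:
    print(a, b, equal_by_downcase(a, b), equal_by_casefold(a, b))
```

For each of these pairs the downcase comparison reports a mismatch while the case-folded comparison reports a match. The folded strings themselves (for example `strasse`) are meant as comparison keys, not as text to show to users.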