From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michal Nazarewicz
Newsgroups: gmane.emacs.bugs
Subject: bug#24603: [RFC 08/18] Support casing characters which map into multiple code points
Date: Sun, 29 Jan 2017 00:48:02 +0100
Organization: http://mina86.com/
References: <1475543441-10493-1-git-send-email-mina86@mina86.com> <1475543441-10493-8-git-send-email-mina86@mina86.com> <838tu4o977.fsf@gnu.org> <837f9kk3en.fsf@gnu.org>
In-Reply-To: <837f9kk3en.fsf@gnu.org>
To: Eli Zaretskii
Cc: 24603@debbugs.gnu.org

On Fri, Oct 07 2016, Eli Zaretskii wrote:
> The way we deal with such augmentations is by having most of the data
> auto-generated, and some of it maintained manually.  One example is
> the current characters.el and charscript.el it loads.  Can we use a
> similar approach in this case?  Experience shows that maintaining
> everything manually is error-prone and a huge maintenance head-ache in
> the long run, what with a new version of the Unicode Standard
> available at least once a year.

The majority is handled automatically in both cases.  My approach is
that rules which are conditional, and those not included in Unicode, are
manually maintained as C code.

In practice, if the Lisp data changes, the C code that handles it would
have to change as well.  For example, if Unicode adds rules for Dutch
‘ij’¹, it would be done by adding an ‘After_Uppercased_I’ condition, but
for that rule to work it’s not enough to include it in the Lisp data; it
has to be coded in C as well.

¹ ‘ij’ at the beginning of a word should be capitalised as ‘IJ’, not ‘Ij’.

There’s also the case of ‘More_Above’:

    0049; 0069 0307; 0049; 0049; lt More_Above; # LATIN CAPITAL LETTER I

The rule means that ‘I’ followed by a sequence of combining characters
that includes an accent above should be lower-cased as ‘i’ followed by
COMBINING DOT ABOVE (U+0307) and then those same combining characters².

The way the SpecialCasing rules are structured would want us to scan the
string from the point where we encountered I to look for any combining
characters, and indeed this is how some libraries implement it.  The
problem in Emacs is that casefiddle.c needs to work on strings as well
as buffers, which are different data structures.  As a result, scanning
ahead would need two different code paths.

So instead, the way I implemented it is by flipping a bit in
casing_context so that case_character_impl knows to handle combining
characters correctly.
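
The context-bit idea can be illustrated with a minimal sketch.  This is not the code from the patch: `struct lt_context`, `toy_combining_class`, `lt_downcase_char` and `lt_downcase` are hypothetical names, the combining-class table covers only the accents SpecialCasing.txt lists for Lithuanian, and plain arrays of code points stand in for Emacs strings and buffers.  The point it tries to show is that the caser never looks ahead; it only remembers that it has just lower-cased an I, J or Į.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the patch's casing_context:
   one bit remembers that the previous base letter was an I, J or
   I-with-ogonek that has just been lower-cased under Lithuanian rules.  */
struct lt_context
{
  bool pending_dot;
};

/* Canonical combining classes used by this sketch.  */
enum { CCC_BASE = 0, CCC_ATTACHED_BELOW = 202, CCC_ABOVE = 230 };

/* Toy combining-class lookup (an assumption of this sketch); a real
   implementation would query the Unicode character database.  */
int
toy_combining_class (uint32_t c)
{
  switch (c)
    {
    case 0x0300: /* combining grave */
    case 0x0301: /* combining acute */
    case 0x0303: /* combining tilde */
    case 0x0307: /* combining dot above */
      return CCC_ABOVE;
    case 0x0328: /* combining ogonek */
      return CCC_ATTACHED_BELOW;
    default:
      return CCC_BASE;
    }
}

/* Lower-case one code point.  Writes one or two code points to OUT
   and returns how many were written.  */
size_t
lt_downcase_char (struct lt_context *ctx, uint32_t c, uint32_t *out)
{
  size_t n = 0;
  assert (ctx != NULL && out != NULL);

  if (ctx->pending_dot)
    {
      int ccc = toy_combining_class (c);
      if (ccc == CCC_ABOVE)
        out[n++] = 0x0307;      /* More_Above matched: keep the dot.  */
      if (ccc == CCC_ABOVE || ccc == CCC_BASE)
        ctx->pending_dot = false;
      /* Any other combining mark: keep waiting for an accent above.  */
    }

  switch (c)
    {
    case 0x0049: c = 0x0069; ctx->pending_dot = true; break; /* I */
    case 0x004A: c = 0x006A; ctx->pending_dot = true; break; /* J */
    case 0x012E: c = 0x012F; ctx->pending_dot = true; break; /* I ogonek */
    default:
      if (c >= 0x0041 && c <= 0x005A)
        c += 0x20;              /* ASCII only, to keep the sketch short.  */
      break;
    }
  out[n++] = c;
  return n;
}

/* Lower-case LEN code points from IN into OUT, which must have room
   for 2 * LEN code points.  Returns the output length.  */
size_t
lt_downcase (const uint32_t *in, size_t len, uint32_t *out)
{
  struct lt_context ctx = { false };
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    n += lt_downcase_char (&ctx, in[i], out + n);
  return n;
}
```

In this sketch the dot is emitted immediately before the first accent above rather than right after the ‘i’.  When another mark such as an ogonek sits in between, that order differs from the literal rule output but is canonically equivalent to it.  Because only one code point is consumed per call, the same routine could in principle be driven by a string walker and a buffer walker alike, which is the property the paragraph above is after.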

² Without the addition of the combining dot above (U+0307), the tittle
(the dot above ‘i’) would disappear when rendering because of the accent
above, and that’s apparently not how Lithuanian is supposed to work.

So, yeah…  Of course, I’m a bit biased by virtue of having the code
already written and not wanting to rewrite it (which would probably take
me another few months, *sighs*), but with the conditional casing rules
I’m honestly not convinced at the moment that trying to keep them in
Lisp data would be better.

Attached below is a new version of 08/18 with the unconditional casing
rules moved from C code to a uniprop char table (I haven’t updated the
commit message yet).  Compared to the previous version it’s a bit more C
code, but a 200-line AWK script is replaced by around 50 lines of Elisp,
so overall the patch is shorter.  This also fixes the issues with undo
and cursor positioning that I’ve mentioned before.

Both versions are available on GitHub:
- Elisp version: git://github.com/mina86/emacs.git master-el
- C version: git://github.com/mina86/emacs.git master

-- 
Best regards
ミハウ “𝓶𝓲𝓷𝓪86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»

>From bbcf826071b158438a03ab3c9fea92528b915bc8 Mon Sep 17 00:00:00 2001
From: Michal Nazarewicz
Date: Wed, 5 Oct 2016 00:06:01 +0200
Subject: [PATCH 08/19] Support casing characters which map into multiple code points
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Implement unconditional special casing rules defined in Unicode standard.
Among other things, they deal with cases when a single code point is
replaced by multiple ones because simple character does not exist (e.g.
ﬁ ligature turning into FI) or is not commonly used (e.g. ß turning
into SS).

* admin/unidata/SpecialCasing.txt: New data file pulled from Unicode
standard distribution.
* admin/unidata/README: Mention SpecialCasing.txt.

* src/make-special-casing.awk: New script to generate special-casing.h
file from the SpecialCasing.txt data file.

* src/casefiddle.c: Include special-casing.h so special casing rules are
available and can be used in the translation unit.
(struct casing_str_buf): New structure for representing short strings.
It’s used to compactly encode special casing rules.
(case_character_impl): New function which can handle one-to-many
character mappings.
(case_character, case_single_character): Wrappers for the above
function.  The former may map one character to multiple code points
while the latter does what the former used to do (i.e. handles
one-to-one mappings only).
(do_casify_integer, do_casify_unibyte_string,
do_casify_unibyte_region): Use case_single_character.
(do_casify_multibyte_string, do_casify_multibyte_region): Support new
features of case_character.
(do_casify_region): Update after do_casify_multibyte_string changes.
(upcase, capitalize, upcase-initials): Update documentation to mention
limitations when working on characters.

* test/src/casefiddle-tests.el (casefiddle-tests-casing): Update test
cases which are now passing.

* test/lisp/char-fold-tests.el (char-fold--ascii-upcase,
char-fold--ascii-downcase): New functions which behave like old ‘upcase’
and ‘downcase’.
(char-fold--test-match-exactly): Use the new functions.  This is needed
because otherwise ﬁ and similar characters are turned into their
multi-character representation.

* doc/lispref/strings.texi: Describe issue with casing characters versus
strings.
---
 admin/unidata/README            |   4 +
 admin/unidata/SpecialCasing.txt | 281 ++++++++++++++++++++++++++++++++++++
 admin/unidata/unidata-gen.el    |  40 ++++++
 doc/lispref/strings.texi        |  23 +++
 etc/NEWS                        |  16 ++-
 src/casefiddle.c                | 305 +++++++++++++++++++++++++++++-----------
 test/lisp/char-fold-tests.el    |  12 +-
 test/src/casefiddle-tests.el    |   9 +-
 8 files changed, 591 insertions(+), 99 deletions(-)
 create mode 100644 admin/unidata/SpecialCasing.txt

diff --git a/admin/unidata/README b/admin/unidata/README
index 534670ce6db..06a66663a72 100644
--- a/admin/unidata/README
+++ b/admin/unidata/README
@@ -24,3 +24,7 @@ http://www.unicode.org/Public/8.0.0/ucd/Blocks.txt
 NormalizationTest.txt
 http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt
 2016-07-16
+
+SpecialCasing.txt
+http://unicode.org/Public/UNIDATA/SpecialCasing.txt
+2016-03-03
diff --git a/admin/unidata/SpecialCasing.txt b/admin/unidata/SpecialCasing.txt
new file mode 100644
index 00000000000..b23fa7f7680
--- /dev/null
+++ b/admin/unidata/SpecialCasing.txt
@@ -0,0 +1,281 @@
+# SpecialCasing-9.0.0.txt
+# Date: 2016-03-02, 18:55:13 GMT
+# © 2016 Unicode®, Inc.
+# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
+# For terms of use, see http://www.unicode.org/terms_of_use.html
+#
+# Unicode Character Database
+#   For documentation, see http://www.unicode.org/reports/tr44/
+#
+# Special Casing
+#
+# This file is a supplement to the UnicodeData.txt file. It does not define any
+# properties, but rather provides additional information about the casing of
+# Unicode characters, for situations when casing incurs a change in string length
+# or is dependent on context or locale. For compatibility, the UnicodeData.txt
+# file only contains simple case mappings for characters where they are one-to-one
+# and independent of context and language. The data in this file, combined with
+# the simple case mappings in UnicodeData.txt, defines the full case mappings
+# Lowercase_Mapping (lc), Titlecase_Mapping (tc), and Uppercase_Mapping (uc).
+#
+# Note that the preferred mechanism for defining tailored casing operations is
+# the Unicode Common Locale Data Repository (CLDR). For more information, see the
+# discussion of case mappings and case algorithms in the Unicode Standard.
+#
+# All code points not listed in this file that do not have a simple case mappings
+# in UnicodeData.txt map to themselves.
+# ================================================================================
+# Format
+# ================================================================================
+# The entries in this file are in the following machine-readable format:
+#
+# <code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>
+#
+# <code>, <lower>, <title>, and <upper> provide the respective full case mappings
+# of <code>, expressed as character values in hex. If there is more than one character,
+# they are separated by spaces. Other than as used to separate elements, spaces are
+# to be ignored.
+#
+# The <condition_list> is optional. Where present, it consists of one or more language IDs
+# or casing contexts, separated by spaces. In these conditions:
+# - A condition list overrides the normal behavior if all of the listed conditions are true.
+# - The casing context is always the context of the characters in the original string,
+#   NOT in the resulting string.
+# - Case distinctions in the condition list are not significant.
+# - Conditions preceded by "Not_" represent the negation of the condition.
+# The condition list is not represented in the UCD as a formal property.
+#
+# A language ID is defined by BCP 47, with '-' and '_' treated equivalently.
+#
+# A casing context for a character is defined by Section 3.13 Default Case Algorithms
+# of The Unicode Standard.
+#
+# Parsers of this file must be prepared to deal with future additions to this format:
+#  * Additional contexts
+#  * Additional fields
+# ================================================================================
+
+# ================================================================================
+# Unconditional mappings
+# ================================================================================
+
+# The German es-zed is special--the normal mapping is to SS.
+# Note: the titlecase should never occur in practice. It is equal to titlecase(uppercase(<es-zed>))
+
+00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S
+
+# Preserve canonical equivalence for I with dot. Turkic is handled below.
+
+0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE
+
+# Ligatures
+
+FB00; FB00; 0046 0066; 0046 0046; # LATIN SMALL LIGATURE FF
+FB01; FB01; 0046 0069; 0046 0049; # LATIN SMALL LIGATURE FI
+FB02; FB02; 0046 006C; 0046 004C; # LATIN SMALL LIGATURE FL
+FB03; FB03; 0046 0066 0069; 0046 0046 0049; # LATIN SMALL LIGATURE FFI
+FB04; FB04; 0046 0066 006C; 0046 0046 004C; # LATIN SMALL LIGATURE FFL
+FB05; FB05; 0053 0074; 0053 0054; # LATIN SMALL LIGATURE LONG S T
+FB06; FB06; 0053 0074; 0053 0054; # LATIN SMALL LIGATURE ST
+
+0587; 0587; 0535 0582; 0535 0552; # ARMENIAN SMALL LIGATURE ECH YIWN
+FB13; FB13; 0544 0576; 0544 0546; # ARMENIAN SMALL LIGATURE MEN NOW
+FB14; FB14; 0544 0565; 0544 0535; # ARMENIAN SMALL LIGATURE MEN ECH
+FB15; FB15; 0544 056B; 0544 053B; # ARMENIAN SMALL LIGATURE MEN INI
+FB16; FB16; 054E 0576; 054E 0546; # ARMENIAN SMALL LIGATURE VEW NOW
+FB17; FB17; 0544 056D; 0544 053D; # ARMENIAN SMALL LIGATURE MEN XEH
+
+# No corresponding uppercase precomposed character
+
+0149; 0149; 02BC 004E; 02BC 004E; # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
+0390; 0390; 0399 0308 0301; 0399 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
+03B0; 03B0; 03A5 0308 0301; 03A5 0308 0301; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
+01F0; 01F0; 004A 030C; 004A 030C; # LATIN SMALL LETTER J WITH CARON
+1E96; 1E96; 0048 0331; 0048 0331; # LATIN SMALL LETTER H WITH LINE BELOW
+1E97; 1E97; 0054 0308; 0054 0308; # LATIN SMALL LETTER T WITH DIAERESIS
+1E98; 1E98; 0057 030A; 0057 030A; # LATIN SMALL LETTER W WITH RING ABOVE
+1E99; 1E99; 0059 030A; 0059 030A; # LATIN SMALL LETTER Y WITH RING ABOVE
+1E9A; 1E9A; 0041 02BE; 0041 02BE; # LATIN SMALL LETTER A WITH RIGHT HALF RING
+1F50; 1F50; 03A5 0313; 03A5 0313; # GREEK SMALL LETTER UPSILON WITH PSILI
+1F52; 1F52; 03A5 0313 0300; 03A5 0313 0300; # GREEK SMALL LETTER UPSILON WITH PSILI AND VARIA
+1F54; 1F54; 03A5 0313 0301; 03A5 0313 0301; # GREEK SMALL LETTER UPSILON WITH PSILI AND OXIA
+1F56; 1F56; 03A5 0313 0342; 03A5 0313 0342; # GREEK SMALL LETTER UPSILON WITH PSILI AND PERISPOMENI
+1FB6; 1FB6; 0391 0342; 0391 0342; # GREEK SMALL LETTER ALPHA WITH PERISPOMENI
+1FC6; 1FC6; 0397 0342; 0397 0342; # GREEK SMALL LETTER ETA WITH PERISPOMENI
+1FD2; 1FD2; 0399 0308 0300; 0399 0308 0300; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND VARIA
+1FD3; 1FD3; 0399 0308 0301; 0399 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
+1FD6; 1FD6; 0399 0342; 0399 0342; # GREEK SMALL LETTER IOTA WITH PERISPOMENI
+1FD7; 1FD7; 0399 0308 0342; 0399 0308 0342; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND PERISPOMENI
+1FE2; 1FE2; 03A5 0308 0300; 03A5 0308 0300; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND VARIA
+1FE3; 1FE3; 03A5 0308 0301; 03A5 0308 0301; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA
+1FE4; 1FE4; 03A1 0313; 03A1 0313; # GREEK SMALL LETTER RHO WITH PSILI
+1FE6; 1FE6; 03A5 0342; 03A5 0342; # GREEK SMALL LETTER UPSILON WITH PERISPOMENI
+1FE7; 1FE7; 03A5 0308 0342; 03A5 0308 0342; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND PERISPOMENI
+1FF6; 1FF6; 03A9 0342; 03A9 0342; # GREEK SMALL LETTER OMEGA WITH PERISPOMENI
+
+# IMPORTANT-when iota-subscript (0345) is uppercased or titlecased,
+#  the result will be incorrect unless the iota-subscript is moved to the end
+#  of any sequence of combining marks. Otherwise, the accents will go on the capital iota.
+#  This process can be achieved by first transforming the text to NFC before casing.
+#  E.g. <alpha><iota_subscript><acute> is uppercased to <ALPHA><acute><IOTA>
+
+# The following cases are already in the UnicodeData.txt file, so are only commented here.
+
+# 0345; 0345; 0345; 0399; # COMBINING GREEK YPOGEGRAMMENI
+
+# All letters with YPOGEGRAMMENI (iota-subscript) or PROSGEGRAMMENI (iota adscript)
+# have special uppercases.
+# Note: characters with PROSGEGRAMMENI are actually titlecase, not uppercase!
+
+1F80; 1F80; 1F88; 1F08 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI
+1F81; 1F81; 1F89; 1F09 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND YPOGEGRAMMENI
+1F82; 1F82; 1F8A; 1F0A 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND VARIA AND YPOGEGRAMMENI
+1F83; 1F83; 1F8B; 1F0B 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND VARIA AND YPOGEGRAMMENI
+1F84; 1F84; 1F8C; 1F0C 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA AND YPOGEGRAMMENI
+1F85; 1F85; 1F8D; 1F0D 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND OXIA AND YPOGEGRAMMENI
+1F86; 1F86; 1F8E; 1F0E 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI
+1F87; 1F87; 1F8F; 1F0F 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI
+1F88; 1F80; 1F88; 1F08 0399; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI
+1F89; 1F81; 1F89; 1F09 0399; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PROSGEGRAMMENI
+1F8A; 1F82; 1F8A; 1F0A 0399; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA AND PROSGEGRAMMENI
+1F8B; 1F83; 1F8B; 1F0B 0399; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA AND PROSGEGRAMMENI
+1F8C; 1F84; 1F8C; 1F0C 0399; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI
+1F8D; 1F85; 1F8D; 1F0D 0399; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA AND PROSGEGRAMMENI
+1F8E; 1F86; 1F8E; 1F0E 0399; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI
+1F8F; 1F87; 1F8F; 1F0F 0399; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI
+1F90; 1F90; 1F98; 1F28 0399; # GREEK SMALL LETTER ETA WITH PSILI AND YPOGEGRAMMENI
+1F91; 1F91; 1F99; 1F29 0399; # GREEK SMALL LETTER ETA WITH DASIA AND YPOGEGRAMMENI
+1F92; 1F92; 1F9A; 1F2A 0399; # GREEK SMALL LETTER ETA WITH PSILI AND VARIA AND YPOGEGRAMMENI
+1F93; 1F93; 1F9B; 1F2B 0399; # GREEK SMALL LETTER ETA WITH DASIA AND VARIA AND YPOGEGRAMMENI
+1F94; 1F94; 1F9C; 1F2C 0399; # GREEK SMALL LETTER ETA WITH PSILI AND OXIA AND YPOGEGRAMMENI
+1F95; 1F95; 1F9D; 1F2D 0399; # GREEK SMALL LETTER ETA WITH DASIA AND OXIA AND YPOGEGRAMMENI
+1F96; 1F96; 1F9E; 1F2E 0399; # GREEK SMALL LETTER ETA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI
+1F97; 1F97; 1F9F; 1F2F 0399; # GREEK SMALL LETTER ETA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI
+1F98; 1F90; 1F98; 1F28 0399; # GREEK CAPITAL LETTER ETA WITH PSILI AND PROSGEGRAMMENI
+1F99; 1F91; 1F99; 1F29 0399; # GREEK CAPITAL LETTER ETA WITH DASIA AND PROSGEGRAMMENI
+1F9A; 1F92; 1F9A; 1F2A 0399; # GREEK CAPITAL LETTER ETA WITH PSILI AND VARIA AND PROSGEGRAMMENI
+1F9B; 1F93; 1F9B; 1F2B 0399; # GREEK CAPITAL LETTER ETA WITH DASIA AND VARIA AND PROSGEGRAMMENI
+1F9C; 1F94; 1F9C; 1F2C 0399; # GREEK CAPITAL LETTER ETA WITH PSILI AND OXIA AND PROSGEGRAMMENI
+1F9D; 1F95; 1F9D; 1F2D 0399; # GREEK CAPITAL LETTER ETA WITH DASIA AND OXIA AND PROSGEGRAMMENI
+1F9E; 1F96; 1F9E; 1F2E 0399; # GREEK CAPITAL LETTER ETA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI
+1F9F; 1F97; 1F9F; 1F2F 0399; # GREEK CAPITAL LETTER ETA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI
+1FA0; 1FA0; 1FA8; 1F68 0399; # GREEK SMALL LETTER OMEGA WITH PSILI AND YPOGEGRAMMENI
+1FA1; 1FA1; 1FA9; 1F69 0399; # GREEK SMALL LETTER OMEGA WITH DASIA AND YPOGEGRAMMENI
+1FA2; 1FA2; 1FAA; 1F6A 0399; # GREEK SMALL LETTER OMEGA WITH PSILI AND VARIA AND YPOGEGRAMMENI
+1FA3; 1FA3; 1FAB; 1F6B 0399; # GREEK SMALL LETTER OMEGA WITH DASIA AND VARIA AND YPOGEGRAMMENI
+1FA4; 1FA4; 1FAC; 1F6C 0399; # GREEK SMALL LETTER OMEGA WITH PSILI AND OXIA AND YPOGEGRAMMENI
+1FA5; 1FA5; 1FAD; 1F6D 0399; # GREEK SMALL LETTER OMEGA WITH DASIA AND OXIA AND YPOGEGRAMMENI
+1FA6; 1FA6; 1FAE; 1F6E 0399; # GREEK SMALL LETTER OMEGA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI
+1FA7; 1FA7; 1FAF; 1F6F 0399; # GREEK SMALL LETTER OMEGA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI
+1FA8; 1FA0; 1FA8; 1F68 0399; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND PROSGEGRAMMENI
+1FA9; 1FA1; 1FA9; 1F69 0399; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND PROSGEGRAMMENI
+1FAA; 1FA2; 1FAA; 1F6A 0399; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND VARIA AND PROSGEGRAMMENI
+1FAB; 1FA3; 1FAB; 1F6B 0399; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND VARIA AND PROSGEGRAMMENI
+1FAC; 1FA4; 1FAC; 1F6C 0399; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND OXIA AND PROSGEGRAMMENI
+1FAD; 1FA5; 1FAD; 1F6D 0399; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND OXIA AND PROSGEGRAMMENI
+1FAE; 1FA6; 1FAE; 1F6E 0399; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI
+1FAF; 1FA7; 1FAF; 1F6F 0399; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI
+1FB3; 1FB3; 1FBC; 0391 0399; # GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI
+1FBC; 1FB3; 1FBC; 0391 0399; # GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI
+1FC3; 1FC3; 1FCC; 0397 0399; # GREEK SMALL LETTER ETA WITH YPOGEGRAMMENI
+1FCC; 1FC3; 1FCC; 0397 0399; # GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI
+1FF3; 1FF3; 1FFC; 03A9 0399; # GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI
+1FFC; 1FF3; 1FFC; 03A9 0399; # GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI
+
+# Some characters with YPOGEGRAMMENI also have no corresponding titlecases
+
+1FB2; 1FB2; 1FBA 0345; 1FBA 0399; # GREEK SMALL LETTER ALPHA WITH VARIA AND YPOGEGRAMMENI
+1FB4; 1FB4; 0386 0345; 0386 0399; # GREEK SMALL LETTER ALPHA WITH OXIA AND YPOGEGRAMMENI
+1FC2; 1FC2; 1FCA 0345; 1FCA 0399; # GREEK SMALL LETTER ETA WITH VARIA AND YPOGEGRAMMENI
+1FC4; 1FC4; 0389 0345; 0389 0399; # GREEK SMALL LETTER ETA WITH OXIA AND YPOGEGRAMMENI
+1FF2; 1FF2; 1FFA 0345; 1FFA 0399; # GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI
+1FF4; 1FF4; 038F 0345; 038F 0399; # GREEK SMALL LETTER OMEGA WITH OXIA AND YPOGEGRAMMENI
+
+1FB7; 1FB7; 0391 0342 0345; 0391 0342 0399; # GREEK SMALL LETTER ALPHA WITH PERISPOMENI AND YPOGEGRAMMENI
+1FC7; 1FC7; 0397 0342 0345; 0397 0342 0399; # GREEK SMALL LETTER ETA WITH PERISPOMENI AND YPOGEGRAMMENI
+1FF7; 1FF7; 03A9 0342 0345; 03A9 0342 0399; # GREEK SMALL LETTER OMEGA WITH PERISPOMENI AND YPOGEGRAMMENI
+
+# ================================================================================
+# Conditional Mappings
+# The remainder of this file provides conditional casing data used to produce
+# full case mappings.
+# ================================================================================
+# Language-Insensitive Mappings
+# These are characters whose full case mappings do not depend on language, but do
+# depend on context (which characters come before or after). For more information
+# see the header of this file and the Unicode Standard.
+# ================================================================================
+
+# Special case for final form of sigma
+
+03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA
+
+# Note: the following cases for non-final are already in the UnicodeData.txt file.
+
+# 03A3; 03C3; 03A3; 03A3; # GREEK CAPITAL LETTER SIGMA
+# 03C3; 03C3; 03A3; 03A3; # GREEK SMALL LETTER SIGMA
+# 03C2; 03C2; 03A3; 03A3; # GREEK SMALL LETTER FINAL SIGMA
+
+# Note: the following cases are not included, since they would case-fold in lowercasing
+
+# 03C3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK SMALL LETTER SIGMA
+# 03C2; 03C3; 03A3; 03A3; Not_Final_Sigma; # GREEK SMALL LETTER FINAL SIGMA
+
+# ================================================================================
+# Language-Sensitive Mappings
+# These are characters whose full case mappings depend on language and perhaps also
+# context (which characters come before or after). For more information
+# see the header of this file and the Unicode Standard.
+# ================================================================================
+
+# Lithuanian
+
+# Lithuanian retains the dot in a lowercase i when followed by accents.
+
+# Remove DOT ABOVE after "i" with upper or titlecase
+
+0307; 0307; ; ; lt After_Soft_Dotted; # COMBINING DOT ABOVE
+
+# Introduce an explicit dot above when lowercasing capital I's and J's
+# whenever there are more accents above.
+# (of the accents used in Lithuanian: grave, acute, tilde above, and ogonek)
+
+0049; 0069 0307; 0049; 0049; lt More_Above; # LATIN CAPITAL LETTER I
+004A; 006A 0307; 004A; 004A; lt More_Above; # LATIN CAPITAL LETTER J
+012E; 012F 0307; 012E; 012E; lt More_Above; # LATIN CAPITAL LETTER I WITH OGONEK
+00CC; 0069 0307 0300; 00CC; 00CC; lt; # LATIN CAPITAL LETTER I WITH GRAVE
+00CD; 0069 0307 0301; 00CD; 00CD; lt; # LATIN CAPITAL LETTER I WITH ACUTE
+0128; 0069 0307 0303; 0128; 0128; lt; # LATIN CAPITAL LETTER I WITH TILDE
+
+# ================================================================================
+
+# Turkish and Azeri
+
+# I and i-dotless; I-dot and i are case pairs in Turkish and Azeri
+# The following rules handle those cases.
+
+0130; 0069; 0130; 0130; tr; # LATIN CAPITAL LETTER I WITH DOT ABOVE
+0130; 0069; 0130; 0130; az; # LATIN CAPITAL LETTER I WITH DOT ABOVE
+
+# When lowercasing, remove dot_above in the sequence I + dot_above, which will turn into i.
+# This matches the behavior of the canonically equivalent I-dot_above
+
+0307; ; 0307; 0307; tr After_I; # COMBINING DOT ABOVE
+0307; ; 0307; 0307; az After_I; # COMBINING DOT ABOVE
+
+# When lowercasing, unless an I is before a dot_above, it turns into a dotless i.
+
+0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I
+0049; 0131; 0049; 0049; az Not_Before_Dot; # LATIN CAPITAL LETTER I
+
+# When uppercasing, i turns into a dotted capital I
+
+0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I
+0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I
+
+# Note: the following case is already in the UnicodeData.txt file.
+ +# 0131; 0131; 0049; 0049; tr; # LATIN SMALL LETTER DOTLESS I + +# EOF + diff --git a/admin/unidata/unidata-gen.el b/admin/unidata/unidata-gen.el index 3c5119a8a3d..5575f0e745a 100644 --- a/admin/unidata/unidata-gen.el +++ b/admin/unidata/unidata-gen.el @@ -268,6 +268,20 @@ unidata-prop-alist The value nil means that the actual property value of a character is the character itself." string) + (special-casing + nil unidata-gen-table-special-casing "uni-special-casing.el" + "Unicode special casing mapping. + +Property value is nil or a three-element list of strings or characters. E= ach +element denotes what the character maps into when upper-casing, lower-casing = or +title-casing respectively. String is used when the mapping is into an emp= ty +string or more than one character. + +The value nil means that no special casing rules exist for the character a= nd +`uppercase', `lowercase' or `titlecase' property needs to be consulted. + +The mapping includes only unconditional casing rules defined by Unicode." + nil) (mirroring unidata-gen-mirroring-list unidata-gen-table-character "uni-mirrored.= el" "Unicode bidi-mirroring characters. @@ -1084,6 +1098,32 @@ unidata-gen-table-decomposition =20 =20 + +(defun unidata-gen-table-special-casing (prop &rest ignore) + (let ((table (make-char-table 'char-code-property-table))) + (set-char-table-extra-slot table 0 prop) + (with-temp-buffer + (insert-file-contents (expand-file-name "SpecialCasing.txt" unidata-= dir)) + (goto-char (point-min)) + (while (not (eobp)) + (unless (or (eq (char-after) ?\n) (eq (char-after) ?#)) ;empty lin= e or comment + (let ((line (split-string + (buffer-substring (point) (progn (end-of-line) (poi= nt))) + ";" ""))) + ;; Ignore entries with conditions, i.e. those with six values.
+ (when (=3D (length line) 5) + (let ((ch (string-to-number (pop line) 16)) lo tc up) + (dolist (var '(lo tc up)) + (let ((v (mapcar (lambda (num) (string-to-number num 16)) + (split-string (pop line))))) + (set var (if (or (null v) (cdr v)) (apply 'string v) (= car v))))) + ;; Order must match order of case_action enum fields defin= ed in + ;; src/casefiddle.c + (set-char-table-range table ch (list up lo tc)))))) + (forward-line))) + table)) + + (defun unidata-describe-general-category (val) (cdr (assq val '((nil . "Uknown") diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi index cf47db4a814..ba1cf2606ce 100644 --- a/doc/lispref/strings.texi +++ b/doc/lispref/strings.texi @@ -1166,6 +1166,29 @@ Case Conversion @end example @end defun =20 + Note that case conversion is not a one-to-one mapping and the length +of the result may differ from the length of the argument (including +being shorter). Furthermore, because passing a character forces +return type to be a character, functions are unable to perform proper +substitution and result may differ compared to treating +a one-character string. For example: + +@example +@group +(upcase "=EF=AC=81") ; note: single character, ligature "fi" + @result{} "FI" +@end group +@group +(upcase ?=EF=AC=81) + @result{} 64257 ; i.e. ?=EF=AC=81 +@end group +@end example + + To avoid this, a character must first be converted into a string, +using @code{string} function, before being passed to one of the casing +functions. Of course, no assumptions on the length of the result may +be made. + @xref{Text Comparison}, for functions that compare strings; some of them ignore case differences, or can optionally ignore case differences. =20 diff --git a/etc/NEWS b/etc/NEWS index 03790cac53f..bac396ecc18 100644 --- a/etc/NEWS +++ b/etc/NEWS @@ -325,13 +325,17 @@ same as in modes where the character is not whitespac= e. 
Instead of only checking the modification time, Emacs now also checks the file's actual content before prompting the user. =20 -** Title case characters are properly cased (from and into). -'upcase', 'upcase-region' et al. convert title case characters (such -as =C7=B2) into their upper case form (such as =C7=B1). +** Various casing improvements. =20 -Similarly, 'capitalize', 'upcase-initials' et al. make use of -title-case forms of initial characters (correctly producing for example -=C7=85ungla instead of incorrect =C7=84ungla). +*** 'upcase', 'upcase-region' et al. convert title case characters +(such as =C7=B2) into their upper case form (such as =C7=B1). + +*** 'capitalize', 'upcase-initials' et al. make use of title-case forms +of initial characters (correctly producing for example =C7=85ungla instead +of incorrect =C7=84ungla). + +*** Characters which turn into multiple ones when cased are correctly hand= led. +For example, =EF=AC=81 ligature is converted to FI when upper cased. =20 * Changes in Specialized Modes and Packages in Emacs 26.1 diff --git a/src/casefiddle.c b/src/casefiddle.c index c09d0609367..8a03eaabeaf 100644 --- a/src/casefiddle.c +++ b/src/casefiddle.c @@ -29,6 +29,7 @@ along with GNU Emacs. If not, see <http://www.gnu.org/li= censes/>. */ #include "composite.h" #include "keymap.h" =20 +/* Order must match order in unidata-gen-table-special-casing. */ enum case_action {CASE_UP, CASE_DOWN, CASE_CAPITALIZE, CASE_CAPITALIZE_UP}; =20 /* State for casing individual characters. */ @@ -37,6 +38,9 @@ struct casing_context { implies flag being CASE_CAPITALIZE or CASE_CAPITALIZE_UP (but the rev= erse is not true). */ Lisp_Object titlecase_char_table; + /* The special-casing Unicode properties case table with unconditional s= pecial + casing rules defined by Unicode. */ + Lisp_Object specialcase_char_table; /* User-requested action. */ enum case_action flag; /* If true, function operates on a buffer as opposed to a string or char= acter. 
@@ -61,6 +65,8 @@ prepare_casing_context (struct casing_context *ctx, ctx->titlecase_char_table =3D (int)flag >=3D (int)CASE_CAPITALIZE ? uniprop_table (intern_c_string ("titlecase")) : Qnil; + ctx->specialcase_char_table =3D + uniprop_table (intern_c_string ("special-casing")); =20 /* If the case table is flagged as modified, rescan it. */ if (NILP (XCHAR_TABLE (BVAR (current_buffer, downcase_table))->extras[1]= )) @@ -70,25 +76,117 @@ prepare_casing_context (struct casing_context *ctx, SETUP_BUFFER_SYNTAX_TABLE (); /* For syntax_prefix_flag_p. */ } =20 -/* Based on CTX, case character CH accordingly. Update CTX as necessary. - Return cased character. */ +struct casing_str_buf { + unsigned char data[MAX_MULTIBYTE_LENGTH > 6 ? MAX_MULTIBYTE_LENGTH : 6]; + unsigned char len_chars; + unsigned char len_bytes; +}; + +/* Based on CTX, case character CH. If BUF is NULL, return cased characte= r. + Otherwise, if BUF is non-NULL, save result in it and return whether the + character has been changed. + + Since meaning of return value depends on arguments, it=E2=80=99s more c= onvenient to + use case_single_character or case_character instead. */ static int -case_character (struct casing_context *ctx, int ch) +case_character_impl (struct casing_str_buf *buf, + struct casing_context *ctx, int ch) { + enum case_action flag; Lisp_Object prop; + bool was_inword; + int cased; + + /* Update inword state */ + was_inword =3D ctx->inword; + if ((int) ctx->flag >=3D (int) CASE_CAPITALIZE) + ctx->inword =3D SYNTAX (ch) =3D=3D Sword && + (!ctx->inbuffer || was_inword || !syntax_prefix_flag_p (ch)); + + /* Normalise flag so it=E2=80=99s one of CASE_UP, CASE_DOWN or CASE_CAPITALIZE. */ + if (!was_inword) + flag =3D ctx->flag =3D=3D CASE_UP ? CASE_UP : CASE_CAPITALIZE; + else if (ctx->flag !=3D CASE_CAPITALIZE_UP) + flag =3D CASE_DOWN; + else + { + cased =3D ch; + goto done; + } =20 - if (ctx->inword) - ch =3D ctx->flag =3D=3D CASE_CAPITALIZE_UP ? 
ch : downcase (ch); + /* Look through the special casing entries. */ + if (buf && !NILP(ctx->specialcase_char_table)) + { + prop =3D CHAR_TABLE_REF(ctx->specialcase_char_table, ch); + switch (flag) { + case CASE_CAPITALIZE: + case CASE_CAPITALIZE_UP: + if (!CONSP(prop)) + break; + prop =3D XCDR(prop); + /* FALL THROUGH */ + case CASE_DOWN: + if (!CONSP(prop)) + break; + prop =3D XCDR(prop); + /* FALL THROUGH */ + default: + if (!CONSP(prop)) + break; + prop =3D XCAR(prop); + if (INTEGERP(prop)) { + cased =3D XINT(prop); + if (0 <=3D cased && cased <=3D MAX_CHAR) + goto done; + } else if (STRINGP(prop)) { + struct Lisp_String *str =3D XSTRING(prop); + if (STRING_BYTES(str) <=3D sizeof buf->data) { + buf->len_chars =3D str->size; + buf->len_bytes =3D STRING_BYTES(str); + memcpy(buf->data, str->data, buf->len_bytes); + return 1; + } + } + } + } + + /* Handle simple, one-to-one case. */ + if (flag =3D=3D CASE_DOWN) + cased =3D downcase (ch); else if (!NILP (ctx->titlecase_char_table) && CHARACTERP (prop =3D CHAR_TABLE_REF (ctx->titlecase_char_table, ch))) - ch =3D XFASTINT (prop); + cased =3D XFASTINT (prop); else - ch =3D upcase(ch); + cased =3D upcase(ch); + + /* And we=E2=80=99re done. */ + done: + if (!buf) + return cased; + buf->len_chars =3D 1; + buf->len_bytes =3D CHAR_STRING (cased, buf->data); + return cased !=3D ch; +} =20 - if ((int) ctx->flag >=3D (int) CASE_CAPITALIZE) - ctx->inword =3D SYNTAX (ch) =3D=3D Sword && - (!ctx->inbuffer || ctx->inword || !syntax_prefix_flag_p (ch)); - return ch; +/* Based on CTX, case character CH accordingly. Update CTX as necessary. + Return cased character. + + Special casing rules (such as upcase(=EF=AC=81) =3D FI) are not handled= . For + characters whose casing results in multiple code points, the character = is + returned unchanged. */ +static inline int +case_single_character (struct casing_context *ctx, int ch) +{ + return case_character_impl (NULL, ctx, ch); +} + +/* Save in BUF result of casing character CH. 
Return whether casing chang= ed the + character. This is like case_single_character but also handles one-to-= many + casing rules. */ +static inline bool +case_character (struct casing_str_buf *buf, struct casing_context *ctx, in= t ch) +{ + return case_character_impl (buf, ctx, ch); } static Lisp_Object @@ -115,7 +213,7 @@ do_casify_integer (struct casing_context *ctx, Lisp_Obj= ect obj) !NILP (BVAR (current_buffer, enable_multibyte_characters))); if (! multibyte) MAKE_CHAR_MULTIBYTE (ch); - cased =3D case_character (ctx, ch); + cased =3D case_single_character (ctx, ch); if (cased =3D=3D ch) return obj; =20 @@ -128,25 +226,34 @@ do_casify_integer (struct casing_context *ctx, Lisp_O= bject obj) static Lisp_Object do_casify_multibyte_string (struct casing_context *ctx, Lisp_Object obj) { - ptrdiff_t i, i_byte, size =3D SCHARS (obj); - int len, ch, cased; + /* We assume data is the first member of casing_str_buf structure so tha= t if + we cast a (char *) into (struct casing_str_buf *) the representation = of the + character is at the beginning of the buffer. This is why we don=E2= =80=99t need + separate struct casing_str_buf object but rather write directly to o.= */ + typedef char static_assertion[offsetof(struct casing_str_buf, data) ? 
-1= : 1]; + + ptrdiff_t size =3D SCHARS (obj), n; + int ch; USE_SAFE_ALLOCA; - ptrdiff_t o_size; - if (INT_MULTIPLY_WRAPV (size, MAX_MULTIBYTE_LENGTH, &o_size)) - o_size =3D PTRDIFF_MAX; - unsigned char *dst =3D SAFE_ALLOCA (o_size); + if (INT_MULTIPLY_WRAPV (size, MAX_MULTIBYTE_LENGTH, &n) || + INT_ADD_WRAPV (n, sizeof(struct casing_str_buf), &n)) + n =3D PTRDIFF_MAX; + unsigned char *const dst =3D SAFE_ALLOCA (n), *const dst_end =3D dst + n; unsigned char *o =3D dst; =20 - for (i =3D i_byte =3D 0; i < size; i++, i_byte +=3D len) + const unsigned char *src =3D SDATA (obj); + + for (n =3D 0; size; --size) { - if (o_size - MAX_MULTIBYTE_LENGTH < o - dst) + if (dst_end - o < sizeof(struct casing_str_buf)) string_overflow (); - ch =3D STRING_CHAR_AND_LENGTH (SDATA (obj) + i_byte, len); - cased =3D case_character (ctx, ch); - o +=3D CHAR_STRING (cased, o); + ch =3D STRING_CHAR_ADVANCE (src); + case_character ((void *)o, ctx, ch); + n +=3D ((struct casing_str_buf *)o)->len_chars; + o +=3D ((struct casing_str_buf *)o)->len_bytes; } - eassert (o - dst <=3D o_size); - obj =3D make_multibyte_string ((char *) dst, size, o - dst); + eassert (o <=3D dst_end); + obj =3D make_multibyte_string ((char *) dst, n, o - dst); SAFE_FREE (); return obj; } @@ -162,7 +269,7 @@ do_casify_unibyte_string (struct casing_context *ctx, L= isp_Object obj) { ch =3D SREF (obj, i); MAKE_CHAR_MULTIBYTE (ch); - cased =3D case_character (ctx, ch); + cased =3D case_single_character (ctx, ch); if (ch =3D=3D cased) continue; MAKE_CHAR_UNIBYTE (cased); @@ -194,7 +301,9 @@ casify_object (enum case_action flag, Lisp_Object obj) DEFUN ("upcase", Fupcase, Supcase, 1, 1, 0, doc: /* Convert argument to upper case and return that. The argument may be a character or string. The result has the same type. -The argument object is not altered--the value is a copy. +The argument object is not altered--the value is a copy. If argument +is a character, characters which map to multiple code points when +cased, e.g. 
=EF=AC=81, are returned unchanged. See also `capitalize', `downcase' and `upcase-initials'. */) (Lisp_Object obj) { @@ -215,7 +324,9 @@ DEFUN ("capitalize", Fcapitalize, Scapitalize, 1, 1, 0, This means that each word's first character is upper case (more precisely, if available, title case) and the rest is lower case. The argument may be a character or string. The result has the same type. -The argument object is not altered--the value is a copy. */) +The argument object is not altered--the value is a copy. If argument +is a character, characters which map to multiple code points when +cased, e.g. =EF=AC=81, are returned unchanged. */) (Lisp_Object obj) { return casify_object (CASE_CAPITALIZE, obj); @@ -228,21 +339,28 @@ DEFUN ("upcase-initials", Fupcase_initials, Supcase_i= nitials, 1, 1, 0, (More precisely, if available, initial of each word is converted to title-case). Do not change the other letters of each word. The argument may be a character or string. The result has the same type. -The argument object is not altered--the value is a copy. */) +The argument object is not altered--the value is a copy. If argument +is a character, characters which map to multiple code points when +cased, e.g. =EF=AC=81, are returned unchanged. */) (Lisp_Object obj) { return casify_object (CASE_CAPITALIZE_UP, obj); } -/* Based on CTX, case region in a unibyte buffer from POS to *ENDP. Return - first position that has changed and save last position in *ENDP. If no - characters were changed, return -1 and *ENDP is unspecified. */ +/* Based on CTX, case region in a unibyte buffer from *STARTP to *ENDP. + + Save first and last positions that have changed in *STARTP and *ENDP + respectively. If no characters were changed, save -1 to *STARTP and le= ave + *ENDP unspecified. + + Always return 0. This is so that interface of this function is the sam= e as + do_casify_multibyte_region. 
*/ static ptrdiff_t do_casify_unibyte_region (struct casing_context *ctx, - ptrdiff_t pos, ptrdiff_t *endp) + ptrdiff_t *startp, ptrdiff_t *endp) { ptrdiff_t first =3D -1, last =3D -1; /* Position of first and last chan= ges. */ - ptrdiff_t end =3D *endp; + ptrdiff_t pos =3D *startp, end =3D *endp; int ch, cased; =20 for (; pos < end; ++pos) @@ -250,11 +368,11 @@ do_casify_unibyte_region (struct casing_context *ctx, ch =3D FETCH_BYTE (pos); MAKE_CHAR_MULTIBYTE (ch); =20 - cased =3D case_character (ctx, ch); + cased =3D case_single_character (ctx, ch); if (cased =3D=3D ch) continue; =20 - last =3D pos; + last =3D pos + 1; if (first < 0) first =3D pos; =20 @@ -262,88 +380,107 @@ do_casify_unibyte_region (struct casing_context *ctx, FETCH_BYTE (pos) =3D cased; } =20 - *endp =3D last + 1; - return first; + *startp =3D first; + *endp =3D last; + return 0; } =20 -/* Based on CTX, case region in a multibyte buffer from POS to *ENDP. Ret= urn - first position that has changed and save last position in *ENDP. If no - characters were changed, return -1 and *ENDP is unspecified. */ +/* Based on CTX, case region in a multibyte buffer from *STARTP to *ENDP. + + Return number of added characters (may be negative if more characters w= ere + deleted than inserted), save first and last positions that have changed = in + *STARTP and *ENDP respectively. If no characters were changed, return = 0, + save -1 to *STARTP and leave *ENDP unspecified. */ static ptrdiff_t do_casify_multibyte_region (struct casing_context *ctx, - ptrdiff_t pos, ptrdiff_t *endp) + ptrdiff_t *startp, ptrdiff_t *endp) { ptrdiff_t first =3D -1, last =3D -1; /* Position of first and last chan= ges. 
*/ - ptrdiff_t pos_byte =3D CHAR_TO_BYTE (pos), end =3D *endp; - ptrdiff_t opoint =3D PT; + ptrdiff_t pos =3D *startp, pos_byte =3D CHAR_TO_BYTE (pos), size =3D *en= dp - pos; + ptrdiff_t opoint =3D PT, added =3D 0; + struct casing_str_buf buf; int ch, cased, len; =20 - while (pos < end) + for (; size; --size) { ch =3D STRING_CHAR_AND_LENGTH (BYTE_POS_ADDR (pos_byte), len); - cased =3D case_character (ctx, ch); - if (cased !=3D ch) + if (!case_character (&buf, ctx, ch)) + { + pos_byte +=3D len; + ++pos; + continue; + } + + last =3D pos + buf.len_chars; + if (first < 0) + first =3D pos; + + if (buf.len_chars =3D=3D 1 && buf.len_bytes =3D=3D len) + memcpy (BYTE_POS_ADDR (pos_byte), buf.data, len); + else { - last =3D pos; - if (first < 0) - first =3D pos; - - if (ASCII_CHAR_P (cased) && ASCII_CHAR_P (ch)) - FETCH_BYTE (pos_byte) =3D cased; - else - { - unsigned char str[MAX_MULTIBYTE_LENGTH]; - int totlen =3D CHAR_STRING (cased, str); - if (len =3D=3D totlen) - memcpy (BYTE_POS_ADDR (pos_byte), str, len); - else - /* Replace one character with the other(s), keeping text - properties the same. */ - replace_range_2 (pos, pos_byte, pos + 1, pos_byte + len, - (char *) str, 9, totlen, 0); - len =3D totlen; - } + /* Replace one character with the other(s), keeping text + properties the same. */ + replace_range_2 (pos, pos_byte, pos + 1, pos_byte + len, + (const char *) buf.data, buf.len_chars, + buf.len_bytes, + 0); + added +=3D (ptrdiff_t) buf.len_chars - 1; + if (opoint > pos) + opoint +=3D (ptrdiff_t) buf.len_chars - 1; } - pos++; - pos_byte +=3D len; + + pos_byte +=3D buf.len_bytes; + pos +=3D buf.len_chars; } =20 if (PT !=3D opoint) TEMP_SET_PT_BOTH (opoint, CHAR_TO_BYTE (opoint)); =20 + *startp =3D first; *endp =3D last; - return first; + return added; } =20 -/* flag is CASE_UP, CASE_DOWN or CASE_CAPITALIZE or CASE_CAPITALIZE_UP. - b and e specify range of buffer to operate on. */ -static void +/* flag is CASE_UP, CASE_DOWN or CASE_CAPITALIZE or CASE_CAPITALIZE_UP. 
b= and + e specify range of buffer to operate on. Return character position of = the + end of the region after changes. */ +static ptrdiff_t casify_region (enum case_action flag, Lisp_Object b, Lisp_Object e) { + ptrdiff_t start, end, orig_end, added; struct casing_context ctx; - ptrdiff_t start, end; - - if (EQ (b, e)) - /* Not modifying because nothing marked */ - return; =20 validate_region (&b, &e); start =3D XFASTINT (b); end =3D XFASTINT (e); + if (start =3D=3D end) + /* Not modifying because nothing marked */ + return end; modify_text (start, end); - record_change (start, end - start); prepare_casing_context (&ctx, flag, true); =20 + orig_end =3D end; + record_delete (start, make_buffer_string (start, end, true), false); if (NILP (BVAR (current_buffer, enable_multibyte_characters))) - start =3D do_casify_unibyte_region (&ctx, start, &end); + { + record_insert (start, end - start); + added =3D do_casify_unibyte_region (&ctx, &start, &end); + } else - start =3D do_casify_multibyte_region (&ctx, start, &end); + { + ptrdiff_t len =3D end - start, ostart =3D start; + added =3D do_casify_multibyte_region (&ctx, &start, &end); + record_insert (ostart, len + added); + } =20 if (start >=3D 0) { - signal_after_change (start, end + 1 - start, end + 1 - start); - update_compositions (start, end + 1, CHECK_ALL); + signal_after_change (start, end - start - added, end - start); + update_compositions (start, end, CHECK_ALL); } + + return orig_end + added; } =20 DEFUN ("upcase-region", Fupcase_region, Supcase_region, 2, 3, @@ -435,9 +572,7 @@ casify_word (enum case_action flag, Lisp_Object arg) ptrdiff_t farend =3D scan_words (PT, XINT (arg)); if (!farend) farend =3D XINT (arg) <=3D 0 ? 
BEGV : ZV; - ptrdiff_t newpoint =3D max (PT, farend); - casify_region (flag, make_number (PT), make_number (farend)); - SET_PT (newpoint); + SET_PT (casify_region (flag, make_number (PT), make_number (farend))); return Qnil; } =20 diff --git a/test/lisp/char-fold-tests.el b/test/lisp/char-fold-tests.el index d86c731b6e3..00bc3c83d05 100644 --- a/test/lisp/char-fold-tests.el +++ b/test/lisp/char-fold-tests.el @@ -54,6 +54,14 @@ char-fold--test-search-with-contents (concat w1 "\s\n\s\t\f\t\n\r\t" w2) (concat w1 (make-string 10 ?\s) w2))))) =20 +(defun char-fold--ascii-upcase (string) + "Like `upcase' but acts on ASCII characters only." + (replace-regexp-in-string "[a-z]+" 'upcase string)) + +(defun char-fold--ascii-downcase (string) + "Like `downcase' but acts on ASCII characters only." + (replace-regexp-in-string "[a-z]+" 'downcase string)) + (defun char-fold--test-match-exactly (string &rest strings-to-match) (let ((re (concat "\\`" (char-fold-to-regexp string) "\\'"))) (dolist (it strings-to-match) @@ -61,8 +69,8 @@ char-fold--test-match-exactly ;; Case folding (let ((case-fold-search t)) (dolist (it strings-to-match) - (should (string-match (upcase re) (downcase it))) - (should (string-match (downcase re) (upcase it))))))) + (should (string-match (char-fold--ascii-upcase re) (downcase it))) + (should (string-match (char-fold--ascii-downcase re) (upcase it)))= )))) =20 (ert-deftest char-fold--test-some-defaults () (dolist (it '(("ffl" . "=EF=AC=84") ("ffi" . 
"=EF=AC=83") diff --git a/test/src/casefiddle-tests.el b/test/src/casefiddle-tests.el index d7fe55f97d7..e347ed7b875 100644 --- a/test/src/casefiddle-tests.el +++ b/test/src/casefiddle-tests.el @@ -188,16 +188,13 @@ casefiddle-tests--test-casing ("=C7=84UNGLA" "=C7=84UNGLA" "=C7=86ungla" "=C7=85ungla" "=C7=85UN= GLA") ("=C7=85ungla" "=C7=84UNGLA" "=C7=86ungla" "=C7=85ungla" "=C7=85un= gla") ("=C7=86ungla" "=C7=84UNGLA" "=C7=86ungla" "=C7=85ungla" "=C7=85un= gla") + ("de=EF=AC=81ne" "DEFINE" "de=EF=AC=81ne" "De=EF=AC=81ne" "De=EF= =AC=81ne") + ("=EF=AC=81sh" "FISH" "=EF=AC=81sh" "Fish" "Fish") + ("Stra=C3=9Fe" "STRASSE" "stra=C3=9Fe" "Stra=C3=9Fe" "Stra=C3=9Fe") ;; FIXME: Everything below is broken at the moment. Here=E2=80=99= s what ;; should happen: - ;;("de=EF=AC=81ne" "DEFINE" "de=EF=AC=81ne" "De=EF=AC=81ne" "De=EF= =AC=81ne") - ;;("=EF=AC=81sh" "FIsh" "=EF=AC=81sh" "Fish" "Fish") - ;;("Stra=C3=9Fe" "STRASSE" "stra=C3=9Fe" "Stra=C3=9Fe" "Stra=C3=9F= e") ;;("=CE=8C=CE=A3=CE=9F=CE=A3" "=CE=8C=CE=A3=CE=9F=CE=A3" "=CF=8C= =CF=83=CE=BF=CF=82" "=CE=8C=CF=83=CE=BF=CF=82" "=CE=8C=CF=83=CE=BF=CF=82") ;; And here=E2=80=99s what is actually happening: - ("de=EF=AC=81ne" "DE=EF=AC=81NE" "de=EF=AC=81ne" "De=EF=AC=81ne" "= De=EF=AC=81ne") - ("=EF=AC=81sh" "=EF=AC=81SH" "=EF=AC=81sh" "=EF=AC=81sh" "=EF=AC= =81sh") - ("Stra=C3=9Fe" "STRA=C3=9FE" "stra=C3=9Fe" "Stra=C3=9Fe" "Stra=C3= =9Fe") ("=CE=8C=CE=A3=CE=9F=CE=A3" "=CE=8C=CE=A3=CE=9F=CE=A3" "=CF=8C=CF= =83=CE=BF=CF=83" "=CE=8C=CF=83=CE=BF=CF=83" "=CE=8C=CE=A3=CE=9F=CE=A3") =20 ("=CF=8C=CF=83=CE=BF=CF=82" "=CE=8C=CE=A3=CE=9F=CE=A3" "=CF=8C=CF= =83=CE=BF=CF=82" "=CE=8C=CF=83=CE=BF=CF=82" "=CE=8C=CF=83=CE=BF=CF=82")))))) --=20 2.11.0.483.g087da7b7c-goog