From: Michal Nazarewicz
Newsgroups: gmane.emacs.bugs
Subject: bug#24603: [PATCHv6 5/6] Support casing characters which map into multiple code points (bug#24603)
Date: Mon, 03 Apr 2017 11:01:40 +0200
Organization: http://mina86.com/
References: <20170309215150.9562-1-mina86@mina86.com> <20170321012709.19402-1-mina86@mina86.com> <20170321012709.19402-6-mina86@mina86.com> <831stp70tk.fsf@gnu.org>
To: Eli Zaretskii
Cc: schwab@linux-m68k.org, 24603@debbugs.gnu.org

On Wed, Mar 22 2017, Eli Zaretskii wrote:
>> From: Michal Nazarewicz
>> Date: Tue, 21 Mar 2017 02:27:08 +0100
>>
>> Implement unconditional special casing rules defined in Unicode standard.
>
> Thanks. A few comments below.

Diff with fixes attached. The rest of the patchset stays unchanged.
I figured that posting just the fixes is most readable (rather than
sending the full patch again).

Unless there are more comments I’ll push the commits in a couple of days.

>> diff --git a/admin/unidata/unidata-gen.el b/admin/unidata/unidata-gen.el
>> index 3c5119a8a3d..32b05eacce6 100644
>> --- a/admin/unidata/unidata-gen.el
>> +++ b/admin/unidata/unidata-gen.el
>> @@ -268,6 +268,33 @@ unidata-prop-alist
>>  The value nil means that the actual property value of a character
>>  is the character itself."
>>       string)
>> +    (special-uppercase
>> +     2 unidata-gen-table-special-casing "uni-special-uppercase.el"
>> +     "Unicode unconditional special casing mapping.
>> +
>> +Property value is nil, denoting no special rules, or a string, denoting
>> +characters maps into given sequence of characters.
>
> Something is wrong with the last sentence. (This problem repeats in
> other similar sentences in the patch.)
>
>> +The mapping includes only unconditional casing rules defined by Unicode."
>
> This begs for clarification: what is meant by "unconditional casing"?
> I think a sentence or two of explanation are due.

@@ -272,28 +272,37 @@ unidata-prop-alist
      2 unidata-gen-table-special-casing "uni-special-uppercase.el"
      "Unicode unconditional special casing mapping.
 
-Property value is nil, denoting no special rules, or a string, denoting
-characters maps into given sequence of characters. The string may be empty.
+Property value is (possibly empty) string or nil. The value nil denotes that
+`uppercase' property should be consulted instead. A string denotes what
+sequence of characters given character maps into.
 
-The mapping includes only unconditional casing rules defined by Unicode."
+This mapping includes language- and context-independent special casing rules
+defined by Unicode only. It also does not include association which would
+duplicate information from `uppercase' property."
      nil)
     (special-lowercase
      0 unidata-gen-table-special-casing "uni-special-lowercase.el"
      "Unicode unconditional special casing mapping.
 
-Property value is nil, denoting no special rules, or a string, denoting
-characters maps into given sequence of characters. The string may be empty.
+Property value is (possibly empty) string or nil. The value nil denotes that
+`lowercase' property should be consulted instead. A string denotes what
+sequence of characters given character maps into.
 
-The mapping includes only unconditional casing rules defined by Unicode."
+This mapping includes language- and context-independent special casing rules
+defined by Unicode only. It also does not include association which would
+duplicate information from `lowercase' property."
      nil)
     (special-titlecase
      1 unidata-gen-table-special-casing "uni-special-titlecase.el"
      "Unicode unconditional special casing mapping.
 
-Property value is nil, denoting no special rules, or a string, denoting
-characters maps into given sequence of characters. The string may be empty.
+Property value is (possibly empty) string or nil. The value nil denotes that
+`titlecase' property should be consulted instead. A string denotes what
+sequence of characters given character maps into.
 
-The mapping includes only unconditional casing rules defined by Unicode."
+This mapping includes language- and context-independent special casing rules
+defined by Unicode only. It also does not include association which would
+duplicate information from `titlecase' property."
      nil)
     (mirroring
      unidata-gen-mirroring-list unidata-gen-table-character "uni-mirrored.el"

>> +@item special-uppercase
>> +Corresponds to Unicode unconditional special upper-casing rules. The value
>
> Likewise here: the "unconditional" part should be explained.
>
>> +is @code{"SS"}. For unassigned codepoints, the value is @code{nil}
>> +which means @code{uppercase} property needs to be consulted instead.
>
> When you say "unassigned codepoints", do you mean codepoints that
> don't have characters defined for them in Unicode?
> Because that's the usual meaning of this term in the context of
> Unicode. If you mean something else, please use some other term.
> (I think you mean something else, since properties of unassigned
> codepoints are not really interesting for Lisp programmers.)
>
>> +mapping for @code{U+0130} (@sc{latin capital letter i with dot above})
>> +the value is @code{"i\u0307"}. For unassigned codepoints, the value is
>
> Instead of using "i\u0307", in the hope that the reader will
> understand it's a string made of 2 characters, I would say that
> explicitly.

@@ -621,26 +621,27 @@ Character Properties
 is @code{nil}, which means the character itself.
 
 @item special-uppercase
-Corresponds to Unicode unconditional special upper-casing rules. The value
-of this property is a string (which may be empty). For example
-mapping for @code{U+00DF} (@sc{latin smpall letter sharp s}) the value
-is @code{"SS"}. For unassigned codepoints, the value is @code{nil}
+Corresponds to Unicode language- and context-independent special upper-casing
+rules. The value of this property is a string (which may be empty). For
+example mapping for @code{U+00DF} (@sc{latin small letter sharp s}) is
+@code{"SS"}. For characters with no special mapping, the value is @code{nil}
 which means @code{uppercase} property needs to be consulted instead.
 
 @item special-lowercase
-Corresponds to Unicode unconditional special lower-casing rules. The
-value of this property is a string (which may be empty). For example
-mapping for @code{U+0130} (@sc{latin capital letter i with dot above})
-the value is @code{"i\u0307"}. For unassigned codepoints, the value is
-@code{nil} which means @code{lowercase} property needs to be consulted
-instead.
+Corresponds to Unicode language- and context-independent special lower-casing
+rules. The value of this property is a string (which may be empty). For
+example mapping for @code{U+0130} (@sc{latin capital letter i with dot above})
+the value is @code{"i\u0307"} (i.e. 2-character string consisting of @sc{latin
+small letter i} followed by @sc{combining dot above}). For characters with no
+special mapping, the value is @code{nil} which means @code{lowercase} property
+needs to be consulted instead.
 
 @item special-titlecase
-Corresponds to Unicode unconditional special title-casing rules. The
-value of this property is a string (which may be empty). For example
-mapping for @code{U+FB01} (@sc{latin small ligature fi}) the value is
-@code{"Fi"}. For unassigned codepoints, the value is @code{nil} which
-means @code{titlecase} property needs to be consulted instead.
+Corresponds to Unicode unconditional special title-casing rules. The value of
+this property is a string (which may be empty). For example mapping for
+@code{U+FB01} (@sc{latin small ligature fi}) the value is @code{"Fi"}. For
+characters with no special mapping, the value is @code{nil} which means
+@code{titlecase} property needs to be consulted instead.
 @end table
 
 @defun get-char-code-property char propname

>>  DEFUN ("upcase", Fupcase, Supcase, 1, 1, 0,
>>         doc: /* Convert argument to upper case and return that.
>>  The argument may be a character or string. The result has the same type.
>> -The argument object is not altered--the value is a copy.
>> +The argument object is not altered--the value is a copy. If argument
>> +is a character, characters which map to multiple code points when
>> +cased, e.g. ﬁ, are returned unchanged.
>>  See also `capitalize', `downcase' and `upcase-initials'. */)
>
> Using non-ASCII characters here requires adding a 'coding' cookie to
> the file's first line. (C sources are not by default decoded as
> UTF-8, unlike Lisp files.)

@@ -1,3 +1,4 @@
+/* -*- coding: utf-8 -*- */
 /* GNU Emacs case conversion functions.
 
 Copyright (C) 1985, 1994, 1997-1999, 2001-2017 Free Software Foundation,

-- 
Best regards
ミハウ “𝓶𝓲𝓷𝓪86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»
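
For illustration only (this is not part of the patch and not how casefiddle.c is written), the lookup order described in the revised docstrings can be sketched in Python: consult the special-casing property first, and fall back to the simple one-to-one mapping when the property is nil. The tables and the names upcase_string and upcase_char are hypothetical. Returning a character argument unchanged whenever its special mapping is not exactly one code point is an assumption drawn from the `upcase' docstring quoted in this message.

```python
import string

# Hypothetical miniature tables for illustration only; in the patch the
# real data is generated by admin/unidata/unidata-gen.el.
SIMPLE_UPPERCASE = dict(zip(string.ascii_lowercase, string.ascii_uppercase))
SPECIAL_UPPERCASE = {
    "\u00df": "SS",  # LATIN SMALL LETTER SHARP S maps to two characters
}


def upcase_string(text):
    """Upcase TEXT; one character may expand into several."""
    pieces = []
    for ch in text:
        special = SPECIAL_UPPERCASE.get(ch)  # None stands in for nil
        if special is not None:
            pieces.append(special)  # possibly empty or multi-character
        else:
            # No special rule: consult the simple mapping instead.
            pieces.append(SIMPLE_UPPERCASE.get(ch, ch))
    return "".join(pieces)


def upcase_char(ch):
    """Upcase one character; the result is always one character."""
    special = SPECIAL_UPPERCASE.get(ch)
    if special is None:
        return SIMPLE_UPPERCASE.get(ch, ch)
    # Assumption: a mapping that is not exactly one code point cannot be
    # returned as a character, so the argument comes back unchanged.
    return special if len(special) == 1 else ch


print(upcase_string("stra\u00dfe"))       # STRASSE
print(upcase_char("\u00df") == "\u00df")  # True
```

Keeping the special table separate from the simple one mirrors the docstring's note that the special properties do not duplicate information already present in `uppercase', `lowercase' and `titlecase'.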