From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#24603: [PATCHv6 5/6] Support casing characters which map into multiple code points (bug#24603)
Date: Wed, 22 Mar 2017 18:06:47 +0200
Message-ID: <831stp70tk.fsf@gnu.org>
References: <20170309215150.9562-1-mina86@mina86.com> <20170321012709.19402-1-mina86@mina86.com> <20170321012709.19402-6-mina86@mina86.com>
In-reply-to: <20170321012709.19402-6-mina86@mina86.com> (message from Michal Nazarewicz on Tue, 21 Mar 2017 02:27:08 +0100)
To: Michal Nazarewicz
Cc: schwab@linux-m68k.org, 24603@debbugs.gnu.org

> From: Michal Nazarewicz
> Date: Tue, 21 Mar 2017 02:27:08 +0100
>
> Implement unconditional special casing rules defined in Unicode standard.

Thanks. A few comments below.

> diff --git a/admin/unidata/unidata-gen.el b/admin/unidata/unidata-gen.el
> index 3c5119a8a3d..32b05eacce6 100644
> --- a/admin/unidata/unidata-gen.el
> +++ b/admin/unidata/unidata-gen.el
> @@ -268,6 +268,33 @@ unidata-prop-alist
> The value nil means that the actual property value of a character
> is the character itself."
> string)
> + (special-uppercase
> + 2 unidata-gen-table-special-casing "uni-special-uppercase.el"
> + "Unicode unconditional special casing mapping.
> +
> +Property value is nil, denoting no special rules, or a string, denoting
> +characters maps into given sequence of characters.

Something is wrong with the last sentence. (This problem repeats in
other similar sentences in the patch.)

> +The mapping includes only unconditional casing rules defined by Unicode."

This begs for clarification: what is meant by "unconditional casing"?
I think a sentence or two of explanation are due.

> +@item special-uppercase
> +Corresponds to Unicode unconditional special upper-casing rules. The value

Likewise here: the "unconditional" part should be explained.

> +is @code{"SS"}. For unassigned codepoints, the value is @code{nil}
> +which means @code{uppercase} property needs to be consulted instead.

When you say "unassigned codepoints", do you mean codepoints that
don't have characters defined for them in Unicode? Because that's the
usual meaning of this term in the context of Unicode. If you mean
something else, please use some other term. (I think you mean
something else, since properties of unassigned codepoints are not
really interesting for Lisp programmers.)

> +mapping for @code{U+0130} (@sc{latin capital letter i with dot above})
> +the value is @code{"i\u0307"}. For unassigned codepoints, the value is

Instead of using "i\u0307", in the hope that the reader will
understand it's a string made of 2 characters, I would say that
explicitly.

> DEFUN ("upcase", Fupcase, Supcase, 1, 1, 0,
> doc: /* Convert argument to upper case and return that.
> The argument may be a character or string. The result has the same type.
> -The argument object is not altered--the value is a copy.
> +The argument object is not altered--the value is a copy. If argument
> +is a character, characters which map to multiple code points when
> +cased, e.g. ﬁ, are returned unchanged.
> See also `capitalize', `downcase' and `upcase-initials'. */)

Using non-ASCII characters here requires adding a 'coding' cookie to
the file's first line. (C sources are not by default decoded as
UTF-8, unlike Lisp files.)
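
For readers unfamiliar with the terminology in the comments above, here is a small sketch of what Unicode's unconditional special casing looks like in practice. It is written in Python rather than Emacs Lisp and relies on Python's built-in str.upper() and str.lower() applying Unicode's full case mappings. It says nothing about how the patch itself implements them, and the variable names are made up for the illustration.

```python
# Illustration only: this uses Python's built-in case conversion,
# not the Emacs code under review.

sharp_s = "\u00df"      # LATIN SMALL LETTER SHARP S
fi_ligature = "\ufb01"  # LATIN SMALL LIGATURE FI
dotted_i = "\u0130"     # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Unconditional special casing: one character maps to a sequence of
# several code points, regardless of language or surrounding text.
upper_sharp_s = sharp_s.upper()    # two characters: "S", "S"
upper_fi = fi_ligature.upper()     # two characters: "F", "I"
lower_dotted_i = dotted_i.lower()  # "i" followed by U+0307

# Conditional rules depend on language or context and are not applied
# here. The Turkish mapping of "I" to dotless "ı" is one such rule, so
# the default lowercase of "I" stays "i".
lower_plain_i = "I".lower()

# Spell out the code points to make the two-character result visible.
print([f"U+{ord(c):04X}" for c in lower_dotted_i])  # ['U+0069', 'U+0307']
```

The last line is the point made above about "i\u0307": the lowercase of U+0130 is a string of two characters, U+0069 followed by the combining U+0307, which is easy to miss when only the escaped form is shown.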