From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#17130: 24.4.50; Deficient Unicode case folding
Date: Sat, 29 Mar 2014 17:45:47 +0300
Message-ID: <838urtdpwk.fsf@gnu.org>
In-reply-to: <87eh1lcdaj.fsf@nbtrap.com>
To: Nathan Trapuzzano
Cc: 17130@debbugs.gnu.org

> From: Nathan Trapuzzano
> Cc: 17130@debbugs.gnu.org
> Date: Sat, 29 Mar 2014 10:03:32 -0400
>
> Eli Zaretskii writes:
>
> >> Reading through the manual section on case tables, it seems that this
> >> could be supported via the extra "canonicalize" slot:
> >>
> >>     CANONICALIZE
> >>     The canonicalize table maps all of a set of case-related
> >>     characters into a particular member of that set.
> >
> > Not efficiently, no.  E.g., how will you find ς from σ, using this
> > method?
>
> σ, ς, and Σ would all have σ in the CANONICALIZE slot, since they all
> fold to σ.

So you would need to search all characters to find those which have σ
in the CANONICALIZE slot -- not very efficient, to say the least.

IOW, what you suggest will provide a one-way mapping, whereas we need
a two-way mapping.

> > Besides, don't we also need to know that ς can only be present at the
> > end of a word?
>
> Don't think so.  AFAIK, Unicode says nothing about ordering except when
> it comes to combining characters.  But even it did prescribe such a
> rule, I don't think it would have anything to do with case folding.

Who said this is only about case folding?  Emacs should use this data
for up-casing and down-casing as well, for example, so that M-l
downcases Σ to ς, not σ, when it is at the end of the word.  Wouldn't
users of Greek expect that?

> >> If this isn't already used for Unicode case folding, what _is_ it used
> >> for?
> >
> > It is used for case-insensitive regexp matching, see search.c.
>
> Right, but what I'm asking is: if Emacs doesn't do Unicode case folding,
> what is the purpose of the CANONICALIZE slot except as a kind of
> placeholder that gets autofilled?

Whenever you need the canonical equivalent of a character, such as in
case-insensitive search, you need that slot.

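
To make the one-way versus two-way mapping point concrete, here is a
minimal sketch in Python (not Emacs Lisp, and not how Emacs stores its
case tables).  The CANONICALIZE dictionary, the EQUIVALENTS index, and
both function names are hypothetical, invented only for illustration.
It contrasts scanning every entry to find the characters that share a
canonical form with looking them up in a precomputed reverse index.

```python
from collections import defaultdict

# Hypothetical one-way table: every character maps to one canonical member.
CANONICALIZE = {"Σ": "σ", "σ": "σ", "ς": "σ"}


def equivalents_by_scan(ch):
    """Find all characters sharing ch's canonical form by scanning the table.

    The cost grows with the size of the whole table, which is the
    inefficiency described above.
    """
    target = CANONICALIZE.get(ch, ch)
    return {c for c, canon in CANONICALIZE.items() if canon == target}


# Hypothetical reverse index, built once: canonical form -> all its members.
EQUIVALENTS = defaultdict(set)
for c, canon in CANONICALIZE.items():
    EQUIVALENTS[canon].add(c)


def equivalents_by_index(ch):
    """Same answer as the scan, but via a direct lookup in the reverse index."""
    target = CANONICALIZE.get(ch, ch)
    return set(EQUIVALENTS.get(target, set()))
```

Both functions return the same set for σ; the difference is only in
how much of the table each one has to visit.
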
> Are there other kinds of case folding--other than traditional
> upper/lower and Unicode--that I'm not aware of?

There's "title case", of course.  There are also characters whose case
pair is not a single character, but several, like the upper-case
variant of ß in German.  Basically, any character not marked "C" in
the Unicode CaseFolding.txt is special in some way.

> I understand that Emacs autofills the CANONICALIZE slot from
> the other slots, but only when the CANONICALIZE slot is not already set
> to non-nil.  What if the CANONICALIZE slot on ς were set to σ?  I think
> that's all that would have to happen for the Unicode folding to work.
> It seems the machinery is already in place.

For this case, maybe (and even it doesn't handle Σ correctly, I think,
when downcased at the end of the word).  For other cases, not
necessarily.

Personally, I think we need an additional slot for what you want, and
code to use it.
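
As an illustration of the special cases mentioned above, the following
short Python sketch uses only Python's built-in string methods.  It
shows Unicode behaviour as Python implements it and says nothing about
Emacs internals; the helper name describe is hypothetical.

```python
def describe(ch):
    """Return the upper-case, lower-case, and case-folded forms of a string."""
    return {"upper": ch.upper(), "lower": ch.lower(), "fold": ch.casefold()}


# ß has no single-character upper-case pair: upper-casing and case
# folding each produce two characters.
eszett = describe("ß")

# Both lower-case sigmas and the capital sigma fold to the same character.
sigma_folds = {s.casefold() for s in ("Σ", "σ", "ς")}

# Lower-casing is context-sensitive for Σ: at the end of a word Python
# produces the final form, while an isolated Σ gives the ordinary one.
word_lower = "ΟΔΟΣ".lower()
lone_lower = "Σ".lower()
```

A per-character table with one slot per direction cannot express
either the two-character result for ß or the position-dependent result
for Σ, which is why these cases need extra data and code to use it.
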