From mboxrd@z Thu Jan 1 00:00:00 1970
From: Miles Bader
Newsgroups: gmane.emacs.devel,gmane.emacs.pretest.bugs
Subject: Re: case-table functions clobbering extra slots
Date: Wed, 2 Feb 2005 09:33:15 +0900
References: <200501270316.MAA20608@etlken.m17n.org> <200501280007.JAA24856@etlken.m17n.org> <200501290053.JAA00216@etlken.m17n.org> <200501310021.JAA09264@etlken.m17n.org> <200502012320.IAA18638@etlken.m17n.org>
In-Reply-To: <200502012320.IAA18638@etlken.m17n.org>
Reply-To: snogglethorpe@gmail.com, miles@gnu.org
Original-To: Kenichi Handa
Cc: emacs-pretest-bug@gnu.org, fx@gnu.org, rms@gnu.org, Emacs Devel
Xref: main.gmane.org gmane.emacs.devel:32736 gmane.emacs.pretest.bugs:5730

I suppose one question is: should support for such weird cases be expected to work only in specific language environments (e.g., Turkish), or generally?

For the purpose of supporting non-reversible mappings, it seems like the Unicode suggestion would work -- a case mapping could have a flag meaning "non-reversible", and if the up/down-casing code sees such a flag, it would save a text property on the result saying what the original character was.

So the "dotted-uppercase-I to i" mapping could have a "non-reversible" flag, and the downcasing code would notice this when changing it to a normal "i", and put an `uppercase' text property on the result character.
Then if the user subsequently did an upcase, the upcasing code could notice the `uppercase' property and properly change the normal "i" to a dotted-uppercase-I.

The same thing would work in the reverse direction for German eszett (upcasing it would change it to "SS" and give the result a `lowercase' property containing the eszett, and presumably some indication that the two S characters should be merged).

The case of up/downcasing from scratch, where there's no text property attached, is obviously language-specific for characters which have a one-to-many mapping. It seems like this could be accomplished using a language-environment-specific hook that gets called on _words_ (from this thread, I get the idea that position within a word is significant) which are noted to be potentially problematic.

For efficiency, you probably don't want to call the hook on every word, so in the up/down-case character tables, there could be a "suspicious" flag (since it's usually only a few characters and they're language-specific, maybe this should be an alist or something similarly sparse?). The code would just do up/downcasing as normal, except that if a character had the "suspicious" flag set, it would call the hook on the whole word containing it instead, and skip ahead to the next word. In the Turkish case, there'd be a "suspicious" flag for the normal ASCII "i" character.

As for the interaction of these two mechanisms, I suppose a character should _not_ be considered "suspicious" if it has an appropriate `uppercase' or `lowercase' property, which would mean the hook would only get called on new words.

The word-hook is probably unnecessary even for most funny mappings, e.g., in Turkish I guess "i" always gets translated to dotted-uppercase-I, so I suppose the "suspicious alist" could offer language-specific character mappings as well: if an alist entry's value were a string, that string would just be the language-specific mapping -- (?i . "dotted-uppercase-I") for Turkish -- and if it were `t', it would instead mean "suspicious" and result in the word-hook being called (funny Greek characters or whatever).

[I guess it's not possible to do a perfect job, but it seems possible to at least do a respectable one.]

-Miles
-- 
Do not taunt Happy Fun Ball.