From: Miles Bader <snogglethorpe@gmail.com>
Cc: emacs-pretest-bug@gnu.org, fx@gnu.org, rms@gnu.org,
	Emacs Devel <emacs-devel@gnu.org>
Subject: Re: case-table functions clobbering extra slots
Date: Wed, 2 Feb 2005 09:33:15 +0900
Message-ID: <fc339e4a05020116335cf0f02a@mail.gmail.com>
In-Reply-To: <200502012320.IAA18638@etlken.m17n.org>

I suppose one question is: should support for such weird cases be
expected to work only in specific language environments (e.g.,
Turkish), or generally?

For the purpose of supporting non-reversible mappings, it seems like
the Unicode suggestion would work -- a case mapping could have a flag
meaning "non-reversible", and if the up/down-casing code sees such a
flag, it would save a text property on the result saying what the
original character was.  So the "dotted-uppercase-I to i" mapping
could have a "non-reversible" flag, and the downcasing code would
notice this when changing it to a normal "i", and put an `uppercase'
text property on the result character.  Then if the user subsequently
did an upcase, the upcasing code could notice the `uppercase' property
and properly change the normal "i" back to a dotted-uppercase-I.  The
same thing would work in the reverse direction for German eszett
(upcasing it would change it to "SS" and attach a `lowercase' property
containing the eszett, and presumably some indication that the two S
characters should be merged).
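
A rough sketch of the property idea, in Python rather than Emacs Lisp
purely for illustration.  Everything here is an assumption made for
the example: the buffer is modelled as a list of (char, props) cells,
the names DOWNCASE, downcase, upcase and text are hypothetical, and
the eszett "merge two characters" case is not modelled at all.

```python
# Hypothetical model: a "buffer" is a list of (char, props) cells, where
# props is a dict standing in for Emacs text properties.

# Downcase table entry: source char -> (result char, non-reversible flag).
DOWNCASE = {"\u0130": ("i", True)}  # dotted-uppercase-I -> plain i


def downcase(cells):
    """Downcase each cell, remembering originals of non-reversible mappings."""
    out = []
    for ch, props in cells:
        if ch in DOWNCASE:
            low, non_reversible = DOWNCASE[ch]
            new_props = dict(props)
            if non_reversible:
                # Record what the character used to be.
                new_props["uppercase"] = ch
            out.append((low, new_props))
        else:
            out.append((ch.lower(), dict(props)))
    return out


def upcase(cells):
    """Upcase each cell, preferring a remembered `uppercase' property."""
    out = []
    for ch, props in cells:
        if "uppercase" in props:
            rest = {k: v for k, v in props.items() if k != "uppercase"}
            out.append((props["uppercase"], rest))
        else:
            out.append((ch.upper(), dict(props)))
    return out


def text(cells):
    return "".join(ch for ch, _ in cells)


original = [(ch, {}) for ch in "\u0130stanbul"]
lowered = downcase(original)   # plain "istanbul", first cell tagged
restored = upcase(lowered)     # dotted-uppercase-I comes back
```

Without the property (say, a freshly typed "istanbul"), upcase in this
sketch just falls back to the default mapping and produces a plain "I".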

The case of up/downcasing from scratch, where there's no text property
attached, is obviously language specific for characters which have a
one-to-many mapping.  It seems like this could be accomplished using a
language-environment-specific hook that gets called on _words_ (from
this thread, I get the idea that position within a word is
significant) which are noted to be potentially problematic.  For
efficiency, you probably don't want to call the hook on every word, so
in the up/down-case character tables, there could be a "suspicious"
flag (since it's usually only a few characters and they're language
specific, maybe this should be an alist or something similarly
sparse?).  The code would just do up/downcasing as normal, except that
if a character had the "suspicious" flag set, it would call the hook
on the whole word containing it instead, and skip ahead to the next
word.  In the Turkish case, there'd be a "suspicious" flag for the
normal ASCII "i" character.
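
To make the control flow concrete, here is a minimal sketch, again in
Python for illustration only.  SUSPICIOUS, turkish_upcase_word,
upcase_text and hook_calls are all hypothetical names, and splitting
words with the regexp \w+ is an assumption standing in for whatever
word-boundary logic the real code would use.

```python
import re

# Hypothetical sparse "suspicious" set for a Turkish language environment.
SUSPICIOUS = {"i"}

# Records which words were handed to the hook, to show it is rarely called.
hook_calls = []


def turkish_upcase_word(word):
    """Hypothetical language-specific hook: upcase one whole word."""
    hook_calls.append(word)
    # Map plain i to dotted-uppercase-I first, then upcase the rest normally.
    return word.replace("i", "\u0130").upper()


def upcase_text(s, suspicious=SUSPICIOUS, word_hook=turkish_upcase_word):
    """Upcase normally, but pass any word containing a suspicious char to the hook."""
    def convert(match):
        word = match.group(0)
        if any(ch in suspicious for ch in word):
            return word_hook(word)   # hook handles the whole word
        return word.upper()          # ordinary path, no hook call
    return re.sub(r"\w+", convert, s)


result = upcase_text("dil ve kalem")
```

In this run only "dil" reaches the hook; "ve" and "kalem" go through
the ordinary upcasing path.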

As for the interaction of these two mechanisms, I suppose a character
should _not_ be considered "suspicious" if it has an appropriate
`uppercase' or `lowercase' property, which would mean the hook would
only get called on new words.
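
That rule is small enough to state as a predicate.  The sketch below
is Python for illustration; the (char, props) cell model and the
names is_suspicious and word_needs_hook are assumptions, not anything
that exists in Emacs.

```python
# Hypothetical sparse "suspicious" set, as in the Turkish example.
SUSPICIOUS = {"i"}


def is_suspicious(ch, props, target):
    """A flagged char stops being suspicious once it carries the target property.

    target is "uppercase" when upcasing and "lowercase" when downcasing.
    """
    return ch in SUSPICIOUS and target not in props


def word_needs_hook(cells, target):
    """cells is a list of (char, props) pairs making up one word."""
    return any(is_suspicious(ch, props, target) for ch, props in cells)


# A freshly typed word: no properties, so the hook would be consulted.
fresh = [("i", {}), ("s", {})]

# A word that was downcased earlier and remembers its original character.
remembered = [("i", {"uppercase": "\u0130"}), ("s", {})]
```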

The word-hook is probably unnecessary even for most funny mappings,
e.g., in Turkish I guess "i" always gets translated to
dotted-uppercase-I, so I suppose the "suspicious alist" could offer
language-specific character mappings as well: if the alist value were
a string, it would simply be the language-specific mapping --
(?i . "dotted-uppercase-I") for Turkish -- and if it were `t', it
would instead mean "suspicious" and result in the word-hook being
called (funny Greek characters or whatever).
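
A sketch of that string-or-`t' dispatch, in Python for illustration
with True playing the role of `t'.  The names convert_word,
TURKISH_UPCASE, GREEK_DOWNCASE and greek_downcase_word are
hypothetical, and using capital sigma (whose lowercase form depends on
word position) as the Greek example is my assumption, not something
stated above.

```python
# Alist stand-ins: a string value is a direct mapping, True means "call the hook".
TURKISH_UPCASE = {"i": "\u0130"}     # i -> dotted-uppercase-I, no hook needed
GREEK_DOWNCASE = {"\u03a3": True}    # capital sigma: position-dependent


def greek_downcase_word(word):
    """Hypothetical hook: a word-final capital sigma becomes final sigma."""
    lowered = "".join(ch.lower() for ch in word)
    if word.endswith("\u03a3"):
        lowered = lowered[:-1] + "\u03c2"
    return lowered


def convert_word(word, special, default, word_hook=None):
    """Convert one word using the sparse table, falling back to `default`."""
    if any(special.get(ch) is True for ch in word):
        return word_hook(word)       # suspicious: the hook takes the whole word
    return "".join(
        special[ch] if isinstance(special.get(ch), str) else default(ch)
        for ch in word
    )


turkish = convert_word("iki", TURKISH_UPCASE, str.upper)
greek = convert_word("\u039f\u0394\u039f\u03a3", GREEK_DOWNCASE,
                     str.lower, greek_downcase_word)
```

The Turkish word never touches a hook; only the Greek word, which
contains a character marked True, is passed to one.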

[I guess it's not possible to do a perfect job, but it seems possible
to at least do a respectable one.]

-Miles
-- 
Do not taunt Happy Fun Ball.
