Re: case-table functions clobbering extra slots

unofficial mirror of emacs-devel@gnu.org 

* Re: case-table functions clobbering extra slots
From: Miles Bader @ 2005-02-02  0:33 UTC
  Cc: emacs-pretest-bug, fx, rms, Emacs Devel

I suppose one question is: should support for such weird cases be
expected to work only in specific language environments (e.g., Turkish),
or generally?

For the purpose of supporting non-reversible mappings, it seems like
the Unicode suggestion would work: a case mapping could have a flag
meaning "non-reversible", and if the up/down-casing code sees such a
flag, it would save a text property on the result saying what the
original character was.  So the "dotted-uppercase-I to i" mapping could
have a "non-reversible" flag, and the downcasing code would notice this
when changing it to a normal "i", and put an `uppercase' text property
on the result character.  Then if the user subsequently did an upcase,
the upcasing code could notice the `uppercase' property and properly
change the normal "i" to a dotted-uppercase-I.  The same thing would
work in the reverse direction for German eszett (upcasing it would
change it to "SS" and add a `lowercase' property containing the eszett,
and presumably some indication that the two S characters should be
merged).
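
A minimal sketch of the flag-plus-property idea, written in Python rather
than Emacs Lisp so that it can be run standalone.  Everything here is a
hypothetical model: `Cell` stands in for a buffer character with its text
properties, and the `DOWNCASE` table with its "non-reversible" flag is an
assumption for illustration, not an existing Emacs data structure.

```python
from dataclasses import dataclass, field

DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Hypothetical table: source char -> (result, non_reversible flag).
DOWNCASE = {DOTTED_CAPITAL_I: ("i", True)}


@dataclass
class Cell:
    """One character plus its text properties (a stand-in model)."""
    ch: str
    props: dict = field(default_factory=dict)


def downcase(cells):
    out = []
    for cell in cells:
        mapped, non_reversible = DOWNCASE.get(cell.ch, (cell.ch.lower(), False))
        # Remember the original only when the mapping cannot be undone.
        props = {"uppercase": cell.ch} if non_reversible else {}
        out.append(Cell(mapped, props))
    return out


def upcase(cells):
    out = []
    for cell in cells:
        # A saved original takes precedence over the ordinary mapping.
        original = cell.props.get("uppercase")
        out.append(Cell(original if original is not None else cell.ch.upper()))
    return out


def text(cells):
    return "".join(cell.ch for cell in cells)
```

In this model, downcasing "İZMİR" gives "izmir" with an `uppercase`
property on each "i", and upcasing that result restores "İZMİR".  A
freshly typed "izmir" carries no properties, so it upcases to "IZMIR".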

The case of up/downcasing from scratch, where there's no text property
attached, is obviously language specific for characters which have a
one-to-many mapping.  It seems like this could be accomplished using a
language-environment-specific hook that gets called on _words_ (from
this thread, I get the idea that position within a word is
significant) which are noted to be potentially problematic.  For
efficiency, you probably don't want to call the hook on every word, so
in the up/down-case character tables, there could be a "suspicious"
flag (since it's usually only a few characters and they're language
specific, maybe this should be an alist or something similarly
sparse?).  The code would just do up/downcasing as normal, except that
if a character had the "suspicious" flag set, it would call the hook
on the whole word containing it instead, and skip ahead to the next
word.  In the Turkish case, there'd be a "suspicious" flag for the
normal ASCII "i" character.
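
The "suspicious" flag and word hook could be modelled roughly as follows.
This is only a sketch in Python: the names `SUSPICIOUS`,
`turkish_upcase_word` and `upcase_text` are invented here, and the sketch
checks each word up front instead of scanning character by character and
skipping ahead, which gives the same result.

```python
import re

# Hypothetical per-language sparse set; Turkish flags the plain ASCII "i".
SUSPICIOUS = {"i"}


def turkish_upcase_word(word):
    """Hypothetical language-environment hook; it receives a whole word."""
    return "".join("\u0130" if ch == "i" else ch.upper() for ch in word)


def upcase_text(text, suspicious=SUSPICIOUS, word_hook=turkish_upcase_word):
    out = []
    # Split the text into alternating runs of word and non-word characters.
    for run in re.findall(r"\w+|\W+", text):
        if any(ch in suspicious for ch in run):
            out.append(word_hook(run))  # hand the whole word to the hook
        else:
            out.append(run.upper())     # ordinary path, no hook call
    return "".join(out)
```

With these definitions, `upcase_text("bir iki, dokuz")` calls the hook for
"bir" and "iki" only; "dokuz" contains no flagged character and takes the
ordinary path.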

As for the interaction of these two mechanisms, I suppose a character
should _not_ be considered "suspicious" if it has an appropriate
`uppercase' or `lowercase' property, which would mean the hook would
only get called on new words.

The word-hook is probably unnecessary even for most funny mappings;
e.g., in Turkish I guess "i" always gets translated to
dotted-uppercase-I.  So I suppose the "suspicious alist" could offer
language-specific character mappings as well: if the value in an alist
entry were a string, it would simply be the language-specific mapping,
e.g., (?i . "dotted-uppercase-I") for Turkish, and if it were `t', it
would instead mean "suspicious" and result in the word-hook being
called (funny Greek characters or whatever).
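
As a rough illustration of an alist whose values are either a replacement
string or `t', here is a Python sketch using dicts, with `True` playing
the role of `t'.  The table names, the `convert` function and the Greek
hook are all assumptions made up for this example; the final-sigma rule in
the hook is deliberately simplified.

```python
import re

CAPITAL_SIGMA = "\u03a3"

# Hypothetical sparse tables: char -> replacement string, or True,
# meaning "suspicious: call the word hook".
TURKISH_UPCASE = {"i": "\u0130"}
GREEK_DOWNCASE = {CAPITAL_SIGMA: True}


def greek_downcase_word(word):
    """Hypothetical hook: sigma becomes final sigma at the end of a word."""
    chars = []
    for idx, ch in enumerate(word):
        if ch == CAPITAL_SIGMA:
            is_final = idx > 0 and idx == len(word) - 1
            chars.append("\u03c2" if is_final else "\u03c3")
        else:
            chars.append(ch.lower())
    return "".join(chars)


def convert(text, special, default, word_hook=None):
    out = []
    for run in re.findall(r"\w+|\W+", text):
        if any(special.get(ch) is True for ch in run):
            # At least one character is only flagged, so the hook decides.
            out.append(word_hook(run))
        else:
            # Direct language-specific mapping, else the default mapping.
            out.append("".join(
                special[ch] if ch in special else default(ch) for ch in run))
    return "".join(out)
```

Here `convert("iyi", TURKISH_UPCASE, str.upper)` never needs a hook, while
downcasing Greek text with `GREEK_DOWNCASE` passes each word containing a
capital sigma to `greek_downcase_word`.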

[I guess it's not possible to do a perfect job, but it seems possible
to at least do a respectable one.]

-Miles
-- 
Do not taunt Happy Fun Ball.

* Re: case-table functions clobbering extra slots
From: Richard Stallman @ 2005-02-03  6:40 UTC
  Cc: emacs-pretest-bug, emacs-devel, fx, handa

      So the "dotted-uppercase-I to i" mapping could have a
    "non-reversible" flag, and the downcasing code would notice this when
    changing it to a normal "i", and put an `uppercase' text property on
    the result character.  Then if the user subsequently did an upcase,
    the upcasing code could notice the `uppercase' property and properly
    change the normal "i" to a dotted-uppercase-I.

This is not a useful way to handle the problem of `i' in Turkish.
Every `i' has to upcase to dotted-uppercase-I, in Turkish.  Doing it
only for an `i' that has a special `uppercase' property would not be
correct.

Perhaps you are thinking of this property as a kind of cache
mechanism.  If so, the most it could achieve is to increase
efficiency.  We would still need some other mechanism to get correct
results in general.

So I think we should first design the other mechanism.  If that
mechanism turns out not to be fast enough, and if caching could make
it faster, we could add caching.

Something like the "suspicious flag" might work for that mechanism.

* Re: case-table functions clobbering extra slots
From: Kenichi Handa @ 2005-02-02  2:57 UTC
  Cc: emacs-pretest-bug, fx, emacs-devel

In article <E1CuvLc-0007j3-Ko@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

>     Downcase of dotted-I is dotted-i (`i'), but, in
>     lang. env. other than Turkish, upcase of dotted-i must be
>     dotless-I (`I').  So we need
>       (set-downcase-syntax dotted-I dotted-i table)

> Ok, if we need them both, please add both.
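
The need for two separate one-way setters can be seen in a small model.
The following Python sketch uses two independent dicts for the down and
up directions; the helper names are hypothetical and only loosely echo
the set-downcase-syntax and set-upcase-syntax functions named in this
thread, whose exact behaviour is not reproduced here.

```python
DOTTED_CAPITAL_I = "\u0130"  # I-WITH-DOT-ABOVE
DOTLESS_SMALL_I = "\u0131"   # DOTLESS-i


def make_case_table():
    """Two independent one-way maps, starting with the ordinary I/i pair."""
    return {"down": {"I": "i"}, "up": {"i": "I"}}


def set_downcase(table, upper, lower):
    """One-way entry: UPPER downcases to LOWER; the up map is untouched."""
    table["down"][upper] = lower


def set_upcase(table, upper, lower):
    """One-way entry: LOWER upcases to UPPER; the down map is untouched."""
    table["up"][lower] = upper


def downcase(table, ch):
    return table["down"].get(ch, ch)


def upcase(table, ch):
    return table["up"].get(ch, ch)


# A non-Turkish table as described in the quoted text above.
table = make_case_table()
set_downcase(table, DOTTED_CAPITAL_I, "i")
set_upcase(table, "I", DOTLESS_SMALL_I)
```

In this table, dotted-I downcases to `i', but `i' still upcases to `I',
so the two directions are not inverses of each other and a single
pair-based setter could not express them.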

I've just installed these changes:

o lisp/case-table.el:
    Add set-upcase-syntax and set-downcase-syntax, modify
    the other functions to maintain the upcase table.

o lisp/international/characters.el, lisp/international/latin-5.el:
    Set up cases of GREEK-FINAL-SIGMA, Y-WITH-DIAERESIS,
    I-WITH-DOT-ABOVE, DOTLESS-i.

o src/casefiddle.c:
    Enable casify_object to change characters of different
    byte size.  Fix casify_region to work on such a case
    correctly.

o src/insdel.c:
    Fix bugs in replace_range_2.

Now, upcase/downcase commands on the above characters seem
to work correctly.

But I found that case-fold searching and regexp matching don't
work as expected.  For instance, searching for DOTLESS-i
ignores `i' and `I' but finds I-WITH-DOT-ABOVE.  On the other
hand, all of the following return nil:
  (string-match "DOTLESS-i" "i")
  (string-match "DOTLESS-i" "I")
  (string-match "DOTLESS-i" "I-WITH-DOT-ABOVE")

It seems that this behaviour is because compile-pattern is
given a downcase table and search_buffer is given a canon
table and an equiv table, but I still don't understand
the underlying logic of this difference.

---
Ken'ichi HANDA
handa@m17n.org
