* Re: case-table functions clobbering extra slots
       [not found] ` <200502012320.IAA18638@etlken.m17n.org>
@ 2005-02-02  0:33 ` Miles Bader
  2005-02-03  6:40   ` Richard Stallman
From: Miles Bader @ 2005-02-02  0:33 UTC
Cc: emacs-pretest-bug, fx, rms, Emacs Devel

I suppose one question is: should support for such weird cases be
expected to work only in specific language environments (e.g., Turkish),
or generally?

For the purpose of supporting non-reversible mappings, it seems like the
Unicode suggestion would work: a case mapping could have a flag meaning
"non-reversible", and if the up/down-casing code sees such a flag, it
would save a text property on the result saying what the original
character was.

So the "dotted-uppercase-I to i" mapping could have a "non-reversible"
flag, and the downcasing code would notice this when changing it to a
normal "i", and put an `uppercase' text property on the result
character.  Then if the user subsequently did an upcase, the upcasing
code could notice the `uppercase' property and properly change the
normal "i" to a dotted-uppercase-I.

The same thing would work in the reverse direction for German eszett
(upcasing it would change it to "SS" and get a `lowercase' property
containing the eszett, and presumably some indication that the two S
characters should be merged).

The case of up/downcasing from scratch, where there's no text property
attached, is obviously language specific for characters which have a
one-to-many mapping.  It seems like this could be accomplished using a
language-environment-specific hook that gets called on _words_ (from
this thread, I get the idea that position within a word is significant)
which are noted to be potentially problematic.
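The `uppercase' text-property idea above can be modelled in a few lines. This is only a sketch in Python, not Emacs code: text properties are imitated by pairing each character with a dict, and the table `NON_REVERSIBLE_DOWN` and the functions `downcase_remember`, `upcase_restore` and `plain` are hypothetical names invented for the illustration.

```python
# Hypothetical table of non-reversible downcase mappings (an assumption for
# illustration): plain lowercasing loses which capital the "i" came from.
NON_REVERSIBLE_DOWN = {"\u0130": "i"}  # LATIN CAPITAL LETTER I WITH DOT ABOVE


def downcase_remember(chars):
    """Downcase a list of (char, props) pairs, recording lost originals."""
    result = []
    for ch, props in chars:
        if ch in NON_REVERSIBLE_DOWN:
            # Save the original character in an `uppercase` property.
            result.append((NON_REVERSIBLE_DOWN[ch], {**props, "uppercase": ch}))
        else:
            result.append((ch.lower(), props))
    return result


def upcase_restore(chars):
    """Upcase a list of (char, props) pairs, honouring `uppercase` properties."""
    result = []
    for ch, props in chars:
        if "uppercase" in props:
            # Restore the remembered character and drop the property.
            rest = {k: v for k, v in props.items() if k != "uppercase"}
            result.append((props["uppercase"], rest))
        else:
            result.append((ch.upper(), props))
    return result


def plain(chars):
    """Return just the characters, dropping the properties."""
    return "".join(ch for ch, _ in chars)
```

Downcasing a word that contains dotted-uppercase-I and then upcasing the result gives back the original word, because each affected "i" carries its property. Text typed from scratch has no property and falls through to the default mapping, so this mechanism only covers round trips.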
For efficiency, you probably don't want to call the hook on every word,
so in the up/down-case character tables, there could be a "suspicious"
flag (since it's usually only a few characters and they're language
specific, maybe this should be an alist or something similarly sparse?).
The code would just do up/downcasing as normal, except that if a
character had the "suspicious" flag set, it would call the hook on the
whole word containing it instead, and skip ahead to the next word.

In the Turkish case, there'd be a "suspicious" flag for the normal ASCII
"i" character.

As for the interaction of these two mechanisms, I suppose a character
should _not_ be considered "suspicious" if it has an appropriate
`uppercase' or `lowercase' property, which would mean the hook would
only get called on new words.

The word-hook is probably unnecessary even for most funny mappings,
e.g., in Turkish I guess "i" always gets translated to
dotted-uppercase-I, so I suppose the "suspicious alist" could offer
language-specific character mappings as well: if the alist value were a
string, it would simply be the language-specific mapping, e.g.
(?i . "dotted-uppercase-I") for Turkish, and if it were `t', it would
instead mean "suspicious" and result in the word-hook being called
(funny Greek characters or whatever).

[I guess it's not possible to do a perfect job, but it seems possible to
at least do a respectable one.]

-Miles
-- 
Do not taunt Happy Fun Ball.
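A rough model of the "suspicious alist" plus word-hook scheme described above, written in Python rather than Emacs Lisp. Everything named here is an assumption made for the illustration: the tables `TURKISH_UPCASE` and `GREEK_DOWNCASE`, the hook `greek_sigma_hook`, and the driver `casify` do not exist in Emacs. A string value in a table stands for a direct language-specific mapping, and `True` plays the role of `t`, meaning "call the word hook".

```python
import re

# Hypothetical sparse tables of special cases, keyed by character.
TURKISH_UPCASE = {"i": "\u0130"}   # i -> LATIN CAPITAL LETTER I WITH DOT ABOVE
GREEK_DOWNCASE = {"\u03a3": True}  # capital sigma: result depends on position


def greek_sigma_hook(word):
    """Hypothetical word hook: downcase a word, using final sigma at the end."""
    lowered = "".join("\u03c3" if ch == "\u03a3" else ch.lower() for ch in word)
    if lowered.endswith("\u03c3"):
        lowered = lowered[:-1] + "\u03c2"
    return lowered


def casify(text, default_case, special, word_hook=None):
    """Apply default_case per character, consulting the sparse special table."""
    pieces = []
    # Split into alternating runs of word and non-word characters.
    for token in re.findall(r"\w+|\W+", text):
        if word_hook is not None and any(special.get(ch) is True for ch in token):
            # A suspicious character was found: hand the whole word to the
            # hook and skip ahead to the next word.
            pieces.append(word_hook(token))
            continue
        for ch in token:
            mapped = special.get(ch)
            # A string entry is a direct mapping; otherwise use the default.
            pieces.append(mapped if isinstance(mapped, str) else default_case(ch))
    return "".join(pieces)
```

With these tables, `casify("istanbul", str.upper, TURKISH_UPCASE)` never calls a hook, because the Turkish entry is a plain string mapping, while `casify(word, str.lower, GREEK_DOWNCASE, greek_sigma_hook)` passes any word containing a capital sigma to the hook. The interaction with the `uppercase'/`lowercase' properties is left out of this sketch.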
* Re: case-table functions clobbering extra slots
  2005-02-02  0:33 ` case-table functions clobbering extra slots Miles Bader
@ 2005-02-03  6:40   ` Richard Stallman
From: Richard Stallman @ 2005-02-03  6:40 UTC
Cc: emacs-pretest-bug, emacs-devel, fx, handa

    So the "dotted-uppercase-I to i" mapping could have a "non-reversible"
    flag, and the downcasing code would notice this when changing it to a
    normal "i", and put an `uppercase' text property on the result
    character.  Then if the user subsequently did an upcase, the upcasing
    code could notice the `uppercase' property and properly change the
    normal "i" to a dotted-uppercase-I.

This is not a useful way to handle the problem of `i' in Turkish.
Every `i' has to upcase to dotted-uppercase-I, in Turkish.  Doing it
only for an `i' that has a special `uppercase' property would not be
correct.

Perhaps you are thinking of this property as a kind of cache
mechanism.  If so, the most it could achieve is to increase
efficiency.  We would still need some other mechanism to get correct
results in general.

So I think we should first design the other mechanism.  If that
mechanism turns out not to be fast enough, and if caching could make
it faster, we could add caching.

Something like the "suspicious flag" might work for that mechanism.
* Re: case-table functions clobbering extra slots
       [not found] ` <E1CuvLc-0007j3-Ko@fencepost.gnu.org>
@ 2005-02-02  2:57 ` Kenichi Handa
From: Kenichi Handa @ 2005-02-02  2:57 UTC
Cc: emacs-pretest-bug, fx, emacs-devel

In article <E1CuvLc-0007j3-Ko@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

>     Downcase of dotted-I is dotted-i (`i'), but, in
>     lang. env. other than Turkish, upcase of dotted-i must be
>     dotless-I (`I').  So we need
>         (set-downcase-syntax dotted-I dotted-i table)

> Ok, if we need them both, please add both.

I've just installed these changes:

o lisp/case-table.el:
  Add set-upcase-syntax and set-downcase-syntax, modify the
  other functions to maintain the upcase table.

o lisp/international/characters.el, lisp/international/latin-5.el:
  Set up cases of GREEK-FINAL-SIGMA, Y-WITH-DIAERESIS,
  I-WITH-DOT-ABOVE, DOTLESS-i.

o src/casefiddle.c:
  Enable casify_object to change characters of different byte
  size.  Fix casify_region to work on such a case correctly.

o src/insdel.c:
  Fix bugs of replace_range_2.

Now, upcase/downcase commands on the above characters seem to work
correctly.

But I found that case-fold searching and regexp matching don't work as
expected.  For instance, searching for DOTLESS-i ignores `i' and `I'
but finds I-WITH-DOT-ABOVE.  On the other hand, all of the following
return nil:

  (string-match "DOTLESS-i" "i")
  (string-match "DOTLESS-i" "I")
  (string-match "DOTLESS-i" "I-WITH-DOT-ABOVE")

It seems that this behaviour is because compile-pattern is given a
downcase table and search_buffer is given a canon table and an equiv
table, but I still don't understand the underlying logic of this
difference.

---
Ken'ichi HANDA
handa@m17n.org