unofficial mirror of help-gnu-emacs@gnu.org
* i18n search/replace with input methods latin-4-postfix and rfc1345
@ 2005-03-14  3:39 B.T. Raven
  2005-03-14 13:33 ` Stefan Monnier
  0 siblings, 1 reply; 10+ messages in thread
From: B.T. Raven @ 2005-03-14  3:39 UTC (permalink / raw)


I have files with
;; -*- coding: utf-8 -*-
on first line and with many unicode characters. I use either
latin-4-postfix or a lisp routine to input the Latin Extended-A
characters. During an editing session I can search (and replace) only
those extended characters that I have input during that session.
Characters input in former sessions (before a file save and close
buffer) are not seen by any of the flavors of the search command. An
inspection of the characters with C-x = shows that what look like the
same characters have, in fact, different code points.
rfc1345 is not listed among my input methods but I suspect that I am
going to need something other than latin-4-postfix to search for these
characters. I have tried different combinations of unify-on-decode (and
encode) but that hasn't shed any light on my problem. The files seem to
be normal utf-8 files that can be imported into other applications
(Win98), and I can operate on them in emacs (NT build) in every other
way I am familiar with so far; it's just that no searching is possible.
Any suggestions?

Thanks,

Ed.


* Re: i18n search/replace with input methods latin-4-postfix and rfc1345
  2005-03-14  3:39 i18n search/replace with input methods latin-4-postfix and rfc1345 B.T. Raven
@ 2005-03-14 13:33 ` Stefan Monnier
  2005-03-14 17:43   ` B.T. Raven
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Monnier @ 2005-03-14 13:33 UTC (permalink / raw)


> I have files with
> ;; -*- coding: utf-8 -*-
> on first line and with many unicode characters. I use either
> latin-4-postfix or a lisp routine to input the Latin Extended-A
> characters. During an editing session I can search (and replace) only
> those extended characters that I have input during that session.

You need "input unification".  Try to put (unify-8859-on-decoding-mode 1)
in your .emacs.
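
A minimal sketch of such a .emacs entry. The require call and the fboundp
guard are additions for illustration only, on the assumption that the mode
is provided by the ucs-tables library; they are not part of the suggestion
itself.

```elisp
;; Sketch only: enable unification of ISO 8859 characters when decoding.
;; Assumption: the mode lives in the `ucs-tables' library.  The third
;; argument to `require' suppresses the error if that library is absent.
(require 'ucs-tables nil t)

;; The guard is a precaution added here so that an Emacs without this
;; mode starts up without an error.
(when (fboundp 'unify-8859-on-decoding-mode)
  (unify-8859-on-decoding-mode 1))
```

Note that only files visited after this form has been evaluated would be
decoded with unification; buffers already open need to be revisited.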


        Stefan


* Re: i18n search/replace with input methods latin-4-postfix and rfc1345
  2005-03-14 13:33 ` Stefan Monnier
@ 2005-03-14 17:43   ` B.T. Raven
  2005-03-15 20:33     ` Stefan Monnier
  0 siblings, 1 reply; 10+ messages in thread
From: B.T. Raven @ 2005-03-14 17:43 UTC (permalink / raw)



"Stefan Monnier" <monnier@iro.umontreal.ca> wrote in message
news:87psy273pq.fsf-monnier+gnu.emacs.help@gnu.org...
> > I have files with
> > ;; -*- coding: utf-8 -*-
> > on first line and with many unicode characters. I use either
> > latin-4-postfix or a lisp routine to input the Latin Extended-A
> > characters. During an editing session I can search (and replace) only
> > those extended characters that I have input during that session.
>
> You need "input unification".  Try to put (unify-8859-on-decoding-mode 1)
> in your .emacs.
>
>
>         Stefan

Thanks, Stefan, but no cigar. Here is the pertinent part of my .emacs:

[...]

;;(setq unify-8859-on-decoding-mode 1)
(unify-8859-on-decoding-mode 1)


(custom-set-variables
  ;; custom-set-variables was added by Custom -- don't edit or cut/paste it!
  ;; Your init file should contain only one such instance.
 '(case-fold-search t)
 '(current-language-environment "UTF-8")
 '(default-input-method "latin-4-postfix")
 '(diary-file "~/mydie" t)
 '(kill-read-only-ok t)
 '(unify-8859-on-encoding-mode t nil (ucs-tables)))

[...]

The first line I added at your suggestion (both as a variable and as a
function call, and later in the editor as a command). Before that, I had
it in custom-set-variables with arguments t, nil, (ucs-tables); I erased
this specific option with customize. I even tried copying Dave Love's
rfc1345.el file into \leim\quail, but something else is needed since
emacs still doesn't recognize it as a valid input method. Using your
suggestion, do I also have to require ucs-tables? Any other suggestions,
short of trying to use Yudit?
If a specific symptom is any help, then I identify a fresh o-with-macron
(input with latin-4-postfix) as 0xa72, and the exact same character (at
the same file position) after the file is saved and then revisited as
0x5106d.

Ed.


* Re: i18n search/replace with input methods latin-4-postfix and rfc1345
  2005-03-14 17:43   ` B.T. Raven
@ 2005-03-15 20:33     ` Stefan Monnier
  2005-03-16  0:11       ` B.T. Raven
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Monnier @ 2005-03-15 20:33 UTC (permalink / raw)


> (unify-8859-on-decoding-mode 1)

Good.

>  '(unify-8859-on-encoding-mode t nil (ucs-tables))

Good as well.  Except that the two do the same thing redundantly, so it's
better to get rid of one of them.  I.e. if you like to configure your system
with Custom, then keep the second, else keep the first.

If this doesn't work for you, maybe it's a bug in Emacs-21.[34].
Try it with Emacs-CVS where it *should* work.


        Stefan


* Re: i18n search/replace with input methods latin-4-postfix and rfc1345
  2005-03-15 20:33     ` Stefan Monnier
@ 2005-03-16  0:11       ` B.T. Raven
  2005-03-16  4:43         ` Stefan Monnier
  0 siblings, 1 reply; 10+ messages in thread
From: B.T. Raven @ 2005-03-16  0:11 UTC (permalink / raw)



"Stefan Monnier" <monnier@iro.umontreal.ca> wrote in message
news:87ll8omz87.fsf-monnier+gnu.emacs.help@gnu.org...
> > (unify-8859-on-decoding-mode 1)
>
> Good.
>
> >  '(unify-8859-on-encoding-mode t nil (ucs-tables))
>
> Good as well.  Except that the two do the same thing redundantly, so it's
> better to get rid of one of them.  I.e. if you like to configure your system
> with Custom, then keep the second, else keep the first.

They shouldn't do the same thing since one is for decoding and the other
for encoding. Anyway I think I'll stick with Custom since it's probably
the less error prone method. Apparently unify on encoding is safe but
the other one can cause information loss.

>
> If this doesn't work for you, maybe it's a bug in Emacs-21.[34].
> Try it with Emacs-CVS where it *should* work,
>
>
>         Stefan

I am using the NT build and am not comfortable compiling from source. I
have cygwin running under MS win but have never tried to build anything
with gcc. In my case it might be wiser just to wait for the 21.4 (22.0?)
NT binaries. I would like to install the rfc1345 input method but that
probably requires a rebuild also. I put this in my leim-list.el:

;;(register-input-method
;; "rfc1345" "UTF-8" 'quail-use-package   <- is this right activate function???
;; "&utf<" "Utf-8 characters input method &prefix with postfix modifiers"   <- is title to go in mode line arbitrary?
;; "quail/rfc1345")

but I don't dare to uncomment it because its dependencies seem to ramify
usque ad infinitum and I can't afford to break the functionality I have
now. I haven't run into any related files that aren't compiled lisp
functions, though. Does that mean that they are just *.elc, or are some
of them built-ins written in C?
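
On the two questions embedded in that form: quail-use-package is the
activate function normally registered for Quail-based input methods, and
the title is a free-form string displayed in the mode line. A sketch with
each argument labeled, reusing the values from the commented-out form;
whether quail/rfc1345 actually loads on this build is an untested
assumption.

```elisp
;; Sketch only: the same registration with each argument's role spelled out.
;; Argument order: INPUT-METHOD LANG-ENV ACTIVATE-FUNC TITLE DESCRIPTION
;; followed by extra arguments handed to ACTIVATE-FUNC.
(register-input-method
 "rfc1345"            ; INPUT-METHOD: the name typed at the input-method prompt
 "UTF-8"              ; LANG-ENV: language environment it is listed under
 'quail-use-package   ; ACTIVATE-FUNC: loads the Quail package on first use
 "&utf<"              ; TITLE: shown in the mode line while the method is on
 "Utf-8 characters input method &prefix with postfix modifiers" ; DESCRIPTION
 "quail/rfc1345")     ; extra argument: library that quail-use-package loads
```

Registration alone only records the name. The library named in the last
argument still has to be found on load-path and define a Quail package
called "rfc1345" for the method to activate.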

Again, thanks anyway.

Ed.


* Re: i18n search/replace with input methods latin-4-postfix and rfc1345
  2005-03-16  0:11       ` B.T. Raven
@ 2005-03-16  4:43         ` Stefan Monnier
  2005-03-16 17:18           ` B.T. Raven
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Monnier @ 2005-03-16  4:43 UTC (permalink / raw)


>> > (unify-8859-on-decoding-mode 1)
>> 
>> Good.
>> 
>> >  '(unify-8859-on-encoding-mode t nil (ucs-tables))
>> 
>> Good as well.  Except that the two do the same thing redundantly, so it's
>> better to get rid of one of them.  I.e. if you like to configure your system
>> with Custom, then keep the second, else keep the first.

> They shouldn't do the same thing since one is for decoding and the other
> for encoding.

Oops, sorry, I wasn't careful enough.

> Anyway I think I'll stick with Custom since it's probably the less error
> prone method.  Apparently unify on encoding is safe but the other one can
> cause information loss.

Indeed, but only in "unusual" situations (e.g. if you use encodings like
iso-2022).  And in your case, unification on decoding is exactly what you
need (provided you're not bumping into a bug that prevents it from doing
its job, of course).

> I am using the NT build and am not comfortable compiling from source. I
> have cygwin running under MS win but have never tried to build anything
> with gcc.

I think cygwin has a precompiled cygwin version of the CVS code.


        Stefan


* Re: i18n search/replace with input methods latin-4-postfix and rfc1345
  2005-03-16  4:43         ` Stefan Monnier
@ 2005-03-16 17:18           ` B.T. Raven
  2005-03-16 17:54             ` Stefan Monnier
  0 siblings, 1 reply; 10+ messages in thread
From: B.T. Raven @ 2005-03-16 17:18 UTC (permalink / raw)


Thanks again, monsieur Monnier. I posted a note about my problem to
gnu.emacs.bug. As a forlorn hope, I changed my encoding of .emacs from
emacs-mule to utf-8 but it didn't make any difference. A fresh latin-4
character is still different from the same one after it has been saved
and revisited.
Although I downloaded the entire suite of cygwin packages (including cvs
and stunnel), I know how to use only the shell tools (grep, sed, sort,
etc.), but these things aren't unicode aware and I couldn't conveniently
input strange characters from the command line anyway (if, for instance,
I wanted to make changes with sed by looking at the emacs buffer in one
window and the command line in a DOS window).

Ed.

* Re: i18n search/replace with input methods latin-4-postfix and rfc1345
  2005-03-16 17:18           ` B.T. Raven
@ 2005-03-16 17:54             ` Stefan Monnier
  2005-03-17  3:52               ` B.T. Raven
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Monnier @ 2005-03-16 17:54 UTC (permalink / raw)


> Thanks again, monsieur Monnier. I posted a note about my problem to
> gnu.emacs.bug.

I doubt you'll get much help this way: there have been various changes in
the way unify-on-decoding works, but I don't think anyone knows of
a particular change that would explain your problem.  Maybe your problem is
actually unrelated, or not fixed in Emacs-CVS.  Unify on decoding does work,
but the details matter (e.g. does unification take place only when reading
files, or also when inputting chars via quail?  What about when inputting
chars via XIM, ...?).

Now that I think about it, I don't know why I didn't think of telling you to
try one of the precompiled NTEmacs binaries that wander around on the net.
I don't know where to find them, but I know they exist.


        Stefan


* Re: i18n search/replace with input methods latin-4-postfix and rfc1345
  2005-03-16 17:54             ` Stefan Monnier
@ 2005-03-17  3:52               ` B.T. Raven
  2005-03-17 15:18                 ` Stefan Monnier
  0 siblings, 1 reply; 10+ messages in thread
From: B.T. Raven @ 2005-03-17  3:52 UTC (permalink / raw)



"Stefan Monnier" <monnier@iro.umontreal.ca> wrote in message
news:87sm2viir2.fsf-monnier+gnu.emacs.help@gnu.org...
> > Thanks again, monsieur Monnier. I posted a note about my problem to
> > gnu.emacs.bug.
>
> I doubt you'll get much help this way: there have been various changes in
> the way unify-on-decoding works, but I don't think anyone knows of
> a particular change that would explain your problem.  Maybe your problem is
> actually unrelated, or not fixed in Emacs-CVS.  Unify on decoding does work,
> but the details matter (e.g. does unification take place only when reading
> files or also when inputting chars via quail.  What about when inputting
> chars via XIM, ...).
>
> Now that I think about it, I don't know why I didn't think of telling you to
> try one of the precompiled NTEmacs binaries that wander around on the net.
> I don't know where to find them, but I know they exist.
>
>
>         Stefan

That is in fact what I use. It's from headquarters at:

http://ftp.gnu.org/gnu/windows/emacs/

It looks like what C-x = reports is not the code-point (e.g. U+0100) but
some transformation or offset from that. Now that I've made .emacs a
utf-8 file instead of an emacs-mule one, even the freshly input
characters (via Latin-4-postfix) now show yet another byte value. I
guess this could be anywhere from 2 to 4 bytes long.
Do you, Stefan, or does anyone out there have any idea when the 21.4
Windows binaries will be inserted into the ftp tree at the above cited
URL?
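
The two values reported earlier in this thread (0xa72 and 0x5106d) are
consistent with Emacs 21 packing a charset id together with position bytes
into one internal character code. The sketch below reproduces both numbers.
The charset ids (132 for latin-iso8859-4, 244 for mule-unicode-0100-24ff),
the packing formulas, and the helper names are assumptions about Emacs 21
internals, not something stated in the thread.

```elisp
;; Sketch only: rebuild the internal codes seen with C-x =.
;; The ids and formulas below are assumed, and the helpers are hypothetical.
(defun sketch-dim1-code (charset-id byte)
  "Pack a one-byte charset position: ((CHARSET-ID - #x70) << 7) | BYTE."
  (logior (ash (- charset-id #x70) 7) byte))

(defun sketch-dim2-code (charset-id byte1 byte2)
  "Pack a two-byte charset position for a charset id of #xf0 or above."
  (logior (ash (- charset-id #xe0) 14) (ash byte1 7) byte2))

;; o-with-macron is #xf2 in ISO 8859-4, stored without the high bit as #x72.
(format "#x%x" (sketch-dim1-code 132 #x72))        ; => "#xa72"

;; The same letter is U+014D: offset #x4d from U+0100, 96 positions per row,
;; each position byte biased by 32.
(let ((offset (- #x14d #x100)))
  (format "#x%x" (sketch-dim2-code 244
                                   (+ 32 (/ offset 96))
                                   (+ 32 (% offset 96)))))  ; => "#x5106d"
```

Under these assumptions the freshly typed character belongs to the
ISO 8859-4 charset and the revisited one to a Unicode charset, so a search
for one never matches the other unless the two are unified.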

Ed.


* Re: i18n search/replace with input methods latin-4-postfix and rfc1345
  2005-03-17  3:52               ` B.T. Raven
@ 2005-03-17 15:18                 ` Stefan Monnier
  0 siblings, 0 replies; 10+ messages in thread
From: Stefan Monnier @ 2005-03-17 15:18 UTC (permalink / raw)


> That is in fact what I use. It's from headquarters at:
> http://ftp.gnu.org/gnu/windows/emacs/

I meant precompiled NTEmacs binaries of Emacs-CVS, not of Emacs-21.[1234].

> It looks like what C-x = reports is not the code-point (e.g. U+0100) but
> some transformation or offset from that. Now that I've made .emacs a

Try C-u C-x =, it'll give you more info.
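
The same information can be pulled out from Lisp. This is a small sketch
with a hypothetical command name; it reports the internal code, the charset,
and the Unicode code point (if any) of the character after point.

```elisp
;; Sketch only: `sketch-describe-char-at-point' is a hypothetical helper.
(defun sketch-describe-char-at-point ()
  "Show internal code, charset and Unicode code point of the char at point."
  (interactive)
  (let ((ch (char-after)))
    (if (null ch)
        (message "No character at point")
      (message "internal=#x%x charset=%s ucs=%s"
               ch
               (char-charset ch)
               ;; `encode-char' returns nil when CH has no Unicode mapping.
               (let ((u (encode-char ch 'ucs)))
                 (if u (format "U+%04X" u) "none"))))))
```

Running it on a freshly typed character and on the same character after
revisiting the file should show whether the two differ only in charset
while mapping to the same Unicode code point.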

> Do you, Stefan, or does anyone out there have any idea when the 21.4
> Windows binaries will be inserted into the ftp tree at the above
> cited URL?

There's virtually no difference between 21.3, 21.4 (and 21.4a).


        Stefan
