all messages for Emacs-related lists mirrored at yhetil.org
* user sees \xxx but is thwarted from searching for them
@ 2002-04-16  2:24 Dan Jacobson
  2002-04-16  8:41 ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: Dan Jacobson @ 2002-04-16  2:24 UTC


The user is given no easy way to search for those \333 binary things
that he sees on his screen.

"Deep in my file there is some binary character[s] that are messing up
my life.  I must page thru the whole file looking around for their
\xxx butts, as emacs won't just let me do C-s \, which would find them
right away, if what we see is what we search."  Instead, emacs probably
wants me to do things a complicated way, doing C-s C-q followed by the
exact character, which I don't know until I've seen it, or emacs
probably wants me to specify a range in a regular expression, which
would be "all the characters that still cause a \xxx on the screen
even when in some Chinese mode etc. that encompasses most of
them..."

Anyway, the user sees a \.  The user wants to hunt for a \.  The user
must have a Ph.D. to hunt for a \. 

Wait.  Do emacs -nw file, somehow paste the whole buffer into the
x-windows mouse cut and paster, or xclip, then paste the file into
another file and then search :-(

By the way, the \xxx's are still octal.  Isn't today a hexy world?
-- 
http://jidanni.org/ Taiwan(04)25854780

* Re: user sees \xxx but is thwarted from searching for them
  2002-04-16  2:24 user sees \xxx but is thwarted from searching for them Dan Jacobson
@ 2002-04-16  8:41 ` Eli Zaretskii
  2002-04-16 10:55   ` Kai Großjohann
  2002-04-16 11:36   ` David Kastrup
  0 siblings, 2 replies; 19+ messages in thread
From: Eli Zaretskii @ 2002-04-16  8:41 UTC
  Cc: bug-gnu-emacs


On 16 Apr 2002, Dan Jacobson wrote:

> Anyway, the user sees a \.  The user wants to hunt for a \.  The user
> must have a Ph.D. to hunt for a \. 

Not really.  `M-: (skip-chars-forward "\000-\177") RET' will do.  
Wrapping this into a simple user command is left as an exercise for the 
interested reader.
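For readers who want to see the idea outside Emacs, here is a minimal Python sketch of what that `skip-chars-forward` call does to point. It moves past a run of characters whose codes lie in octal 000-177 (ASCII) and stops on the first one that falls outside that range. The buffer is modeled as a plain Python string, and the name `skip_ascii_forward` is hypothetical. Like the Lisp form, it stops on any non-ASCII character, including ordinary letters such as ä, and not only on characters that Emacs displays as \xxx.

```python
def skip_ascii_forward(text: str, pos: int = 0) -> int:
    """Return the index of the first character at or after POS whose code
    is above octal 177 (i.e. non-ASCII), or len(text) if there is none."""
    while pos < len(text) and ord(text[pos]) <= 0o177:
        pos += 1  # still ASCII: keep skipping, as skip-chars-forward would
    return pos


# Hypothetical buffer contents; "\x80" stands in for a raw byte.
buffer_text = "plain ASCII, then Gr\u00e4fin and a raw \x80 byte"
first = skip_ascii_forward(buffer_text)
second = skip_ascii_forward(buffer_text, first + 1)
print(first, repr(buffer_text[first]))    # stops on the a-umlaut
print(second, repr(buffer_text[second]))  # stops on the "\x80" character
```

If you call the function again from one position past the returned index, it finds the next non-ASCII character. A user command built on this idea would repeat that step in a loop.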

* Re: user sees \xxx but is thwarted from searching for them
  2002-04-16  8:41 ` Eli Zaretskii
@ 2002-04-16 10:55   ` Kai Großjohann
  2002-04-16 11:57     ` Heinrich Rommerskirchen
  2002-04-16 12:21     ` Eli Zaretskii
  2002-04-16 11:36   ` David Kastrup
  1 sibling, 2 replies; 19+ messages in thread
From: Kai Großjohann @ 2002-04-16 10:55 UTC


eliz@is.elta.co.il (Eli Zaretskii) writes:

> On 16 Apr 2002, Dan Jacobson wrote:
>
>> Anyway, the user sees a \.  The user wants to hunt for a \.  The user
>> must have a Ph.D. to hunt for a \. 
>
> Not really.  `M-: (skip-chars-forward "\000-\177") RET' will do.  
> Wrapping this into a simple user command is left as an exercise for the 
> interested reader.

This command finds ä, too, even if it is displayed without \ on
screen.  I presume it will also find a lot of other nonascii
characters.

Suppose you have a file which is mostly in the foo encoding, but
contains some bytes that are invalid in that encoding.  I think this
is the situation Dan is talking about.  He wants to find the invalid
bytes, IIUC.
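If the file is still available as raw bytes, you can locate the invalid bytes Kai describes before any decoding happens. The minimal Python sketch below assumes that Python's codec for the encoding in question accepts the same byte sequences as the Emacs coding system, although the two need not agree on every byte. The name `invalid_byte_offsets` is hypothetical.

```python
from typing import List


def invalid_byte_offsets(data: bytes, encoding: str) -> List[int]:
    """Return the offsets of bytes in DATA that ENCODING refuses to decode."""
    offsets: List[int] = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode(encoding)
            break  # everything from pos onward decodes cleanly
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice data[pos:].
            offsets.extend(range(pos + err.start, pos + err.end))
            # Resume just past the rejected bytes; for multibyte encodings
            # this can misalign the next character, a simplification here.
            pos += err.end
    return offsets


sample = b"caf\xc3\xa9 \xff ok"
print(invalid_byte_offsets(sample, "utf-8"))
```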

Maybe it helps to try to save the buffer to a file, and let Emacs
complain about the character that couldn't be encoded using the
current coding system.  But I'm not sure if that does the trick,
though.

kai
-- 
Silence is foo!

* Re: user sees \xxx but is thwarted from searching for them
  2002-04-16  8:41 ` Eli Zaretskii
  2002-04-16 10:55   ` Kai Großjohann
@ 2002-04-16 11:36   ` David Kastrup
  2002-04-16 11:57     ` Kai Großjohann
  2002-04-16 12:32     ` Eli Zaretskii
  1 sibling, 2 replies; 19+ messages in thread
From: David Kastrup @ 2002-04-16 11:36 UTC


eliz@is.elta.co.il (Eli Zaretskii) writes:

> On 16 Apr 2002, Dan Jacobson wrote:
> 
> > Anyway, the user sees a \.  The user wants to hunt for a \.  The user
> > must have a Ph.D. to hunt for a \. 
> 
> Not really.  `M-: (skip-chars-forward "\000-\177") RET' will do.  
> Wrapping this into a simple user command is left as an exercise for the 
> interested reader.

That's exactly what Dan means by "must have a Ph.D.".  It is easy, but
non-obvious.  It's something a highly specialized and skilled person
will be able to come up with, while a normal human won't.  That's why
the specialists are paid serious money, not because they are better at
grinding through tedious work that anybody else could do, but because
they know how to do the job fast, appropriately and properly.

There will always be enough of a job left for the specialists in
Emacs.

BTW, one common quirk of mine is that when I am saving or sending or
whatever else a file, and Emacs decides it is not able to find a
suitable encoding system for it, there is _no_ easy way to find out
just what characters prohibit encoding in, say, Latin-1.  I usually
end up deleting and replacing all accented letters in the text
manually and hoping that I will eventually have hit upon the culprit
from a different code page.

We have regular expressions like [[:ascii:]] or so; perhaps something
like [[:encodable-in-the-current-default-encoding:]] or
[[:not-encodable-in-latin2:]] (look for better names) would be a
first shot at making things easier to wrap into user accessible
functions.
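Outside Emacs, the per-character test David asks for is straightforward, because a codec reports each character it cannot represent. The minimal Python sketch below uses Python codec names such as "latin-1" in place of Emacs coding systems, and its function name is hypothetical. It says nothing about how such a character class would be implemented in Emacs's regexp engine.

```python
from typing import List, Tuple


def unencodable_chars(text: str, encoding: str) -> List[Tuple[int, str]]:
    """Return (index, character) pairs for each character of TEXT that
    ENCODING cannot represent."""
    bad: List[Tuple[int, str]] = []
    for index, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append((index, ch))  # this character blocks the encoding
    return bad


print(unencodable_chars("Grüße, 5 €", "latin-1"))
```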

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
Email: David.Kastrup@t-online.de

* Re: user sees \xxx but is thwarted from searching for them
  2002-04-16 11:36   ` David Kastrup
@ 2002-04-16 11:57     ` Kai Großjohann
  2002-04-16 14:07       ` Eli Zaretskii
  2002-04-16 12:32     ` Eli Zaretskii
  1 sibling, 1 reply; 19+ messages in thread
From: Kai Großjohann @ 2002-04-16 11:57 UTC


David.Kastrup@t-online.de (David Kastrup) writes:

> BTW, one common quirk of mine is that when I am saving or sending or
> whatever else a file, and Emacs decides it is not able to find a
> suitable encoding system for it, there is _no_ easy way to find out
> just what characters prohibit encoding in, say, Latin-1.

In Emacs 20.6, the offending characters are highlighted upon C-x C-s.
Not sure why this fails to happen in 21 from CVS.

kai
-- 
Silence is foo!

* Re: user sees \xxx but is thwarted from searching for them
  2002-04-16 10:55   ` Kai Großjohann
@ 2002-04-16 11:57     ` Heinrich Rommerskirchen
  2002-04-16 13:52       ` Kai Großjohann
  2002-04-16 14:08       ` Eli Zaretskii
  2002-04-16 12:21     ` Eli Zaretskii
  1 sibling, 2 replies; 19+ messages in thread
From: Heinrich Rommerskirchen @ 2002-04-16 11:57 UTC


Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> Maybe it helps to try to save the buffer to a file, and let Emacs
> complain about the character that couldn't be encoded using the
> current coding system.  But I'm not sure if that does the trick,
> though.

But that doesn't always help:

Assume you have a buffer in latin-1 encoding which contains umlauts, 
add an umlaut in latin-9 and press C-x C-s. Then emacs complains that it
cannot safely save the buffer without showing the offending characters. In
fact it can't because it doesn't know if the latin-1 umlaut is wrong or the
latin-9 umlaut; and both look the same in the buffer :-(

-- 
Regards

Heinz

* Re: user sees \xxx but is thwarted from searching for them
  2002-04-16 10:55   ` Kai Großjohann
  2002-04-16 11:57     ` Heinrich Rommerskirchen
@ 2002-04-16 12:21     ` Eli Zaretskii
  2002-04-16 13:56       ` Kai Großjohann
  1 sibling, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2002-04-16 12:21 UTC
  Cc: bug-gnu-emacs

> From: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann)
> Newsgroups: gnu.emacs.bug
> Date: Tue, 16 Apr 2002 12:55:12 +0200
> 
> eliz@is.elta.co.il (Eli Zaretskii) writes:
> 
> > `M-: (skip-chars-forward "\000-\177") RET' will do.  
> 
> This command finds ä, too, even if it is displayed without \ on
> screen.  I presume it will also find a lot of other nonascii
> characters.

More accurately, it finds _any_ non-ASCII character.  That's what it
is supposed to do.

> Suppose you have a file which is mostly in the foo encoding, but
> contains some bytes that are invalid in that encoding.  I think this
> is the situation Dan is talking about.  He wants to find the invalid
> bytes, IIUC.

Perhaps I don't understand the original request, but if I do, it is
very hard to do that (AFAIK) without knowing what (i.e. which
character sets) you are looking for.  Recall that, once the file is
visited in a buffer, there are no bytes, just characters.  What you
want is to find characters that don't belong to some set of
characters, without actually telling Emacs which sets are the ``good''
ones.  This might be relatively easy if your buffer holds characters
from a single charset, but I doubt that Emacs users can be charged
with the burden of knowing about such technicalities.

> Maybe it helps to try to save the buffer to a file, and let Emacs
> complain about the character that couldn't be encoded using the
> current coding system.  But I'm not sure if that does the trick,
> though.

It doesn't; see my other mail.

* Re: user sees \xxx but is thwarted from searching for them
  2002-04-16 11:36   ` David Kastrup
  2002-04-16 11:57     ` Kai Großjohann
@ 2002-04-16 12:32     ` Eli Zaretskii
  1 sibling, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2002-04-16 12:32 UTC
  Cc: bug-gnu-emacs

> From: David Kastrup <David.Kastrup@t-online.de>
> Newsgroups: gnu.emacs.bug
> Date: 16 Apr 2002 13:36:45 +0200
> 
> eliz@is.elta.co.il (Eli Zaretskii) writes:
> 
> > On 16 Apr 2002, Dan Jacobson wrote:
> > 
> > > Anyway, the user sees a \.  The user wants to hunt for a \.  The user
> > > must have a Ph.D. to hunt for a \. 
> > 
> > Not really.  `M-: (skip-chars-forward "\000-\177") RET' will do.  
> > Wrapping this into a simple user command is left as an exercise for the 
> > interested reader.
> 
> That's exactly what Dan means by "must have a Ph.D.".  It is easy, but
> non-obvious.

It's easy once you know what to do.  To _know_ it might require
specific knowledge, but to _use_ it does not.

> We have regular expressions like [[:ascii:]] or so; perhaps something
> like [[:encodable-in-the-current-default-encoding:]] or
> [[:not-encodable-in-latin2:]] (look for better names) would be a
> first shot at making things easier to wrap into user accessible
> functions.

This was discussed in preparation for Emacs 21.1, and turned out to be
a very complex job.  The main problem is that, contrary to what users
may expect, Emacs does not actually know which characters prevent it from
encoding the buffer in the default coding systems.  The code which
implements this test (see the function select-safe-coding-system and
its subroutines) calls primitives that don't return this information.
Instead, they return a list of encodings that can safely encode all of
the characters in the region; Emacs then compares that list with the
list of default and preferred encodings, and if these two lists don't
intersect, it pops up the question.
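The check Eli describes can be modeled roughly as follows. This is a sketch under stated assumptions and not the code of select-safe-coding-system: Python codecs stand in for Emacs coding systems, and both function names are hypothetical. It illustrates his point that the intermediate result is only a list of encodings, so nothing in it records which characters caused the failure.

```python
from typing import List, Optional


def safe_encodings(text: str, candidates: List[str]) -> List[str]:
    """Return the candidates that can encode every character of TEXT."""
    safe: List[str] = []
    for encoding in candidates:
        try:
            text.encode(encoding)
        except UnicodeEncodeError:
            continue  # this encoding fails somewhere; the culprit is dropped
        safe.append(encoding)
    return safe


def choose_encoding(text: str, preferred: List[str],
                    candidates: List[str]) -> Optional[str]:
    """Pick the first preferred encoding that is also safe, or None.

    None corresponds to the case where Emacs pops up its question.
    """
    safe = safe_encodings(text, candidates)
    for encoding in preferred:
        if encoding in safe:
            return encoding
    return None
```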

Several alternatives were suggested to show the offending characters,
but IIRC they were all non-trivial.  On top of that, all the effort to
implement that will go down the drain when Emacs switches to
Unicode-based internal representation of characters.  And since
Handa-san, who does most of the Mule-related development, is currently
busy working on Unicode support... well, you can guess the rest.

* Re: user sees \xxx but is thwarted from searching for them
  2002-04-16 11:57     ` Heinrich Rommerskirchen
@ 2002-04-16 13:52       ` Kai Großjohann
  2002-04-16 14:08       ` Eli Zaretskii
  1 sibling, 0 replies; 19+ messages in thread
From: Kai Großjohann @ 2002-04-16 13:52 UTC


Heinrich.Rommerskirchen@icn.siemen.de (Heinrich Rommerskirchen) writes:

> Assume you have a buffer in latin-1 encoding which contains umlauts, 
> add an umlaut in latin-9 and press C-x C-s. Then emacs complains that it
> cannot safely save the buffer without showing the offending characters. In
> fact it can't because it doesn't know if the latin-1 umlaut is wrong or the
> latin-9 umlaut; and both look the same in the buffer :-(

Remember that first you select a coding system.  This is either
latin-1 (then ä in latin-9 is wrong), or it is latin-9 (then ä in
latin-1 is wrong).  So you can show the wrong character in both cases.

Oh, I didn't say that you have to choose the coding system first.
Sorry.

kai
-- 
Silence is foo!

* Re: user sees \xxx but is thwarted from searching for them
  2002-04-16 12:21     ` Eli Zaretskii
@ 2002-04-16 13:56       ` Kai Großjohann
  2002-04-18  2:15         ` Dan Jacobson
  0 siblings, 1 reply; 19+ messages in thread
From: Kai Großjohann @ 2002-04-16 13:56 UTC


"Eli Zaretskii" <eliz@is.elta.co.il> writes:

>> From: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann)
>> Newsgroups: gnu.emacs.bug
>> Date: Tue, 16 Apr 2002 12:55:12 +0200
>> 
>> eliz@is.elta.co.il (Eli Zaretskii) writes:
>> 
>> > `M-: (skip-chars-forward "\000-\177") RET' will do.  
>> 
>> This command finds ä, too, even if it is displayed without \ on
>> screen.  I presume it will also find a lot of other nonascii
>> characters.
>
> More accurately, it finds _any_ non-ASCII character.  That's what it
> is supposed to do.

Let me cite again from Dan's original posting:

/----
| "Deep in my file there is some binary character[s] that are messing up
| my life.  I must page thru the whole file looking around for their
| \xxx butts, as emacs won't just let me do C-s \, which would find them
| right away, if what we see is what we search."  Instead, emacs probably
| wants me to do things a complicated way, doing C-s C-q followed by the
| exact character, which I don't know until I've seen it, or emacs
| probably wants me to specify a range in a regular expression, which
| would be "all the characters that still cause a \xxx on the screen
| even when in some Chinese mode etc. that encompasses most of
| them..."
\----

In the last sentence, he mentions Chinese.  This means that he
doesn't want to find Chinese characters.

He only wants to find characters which are displayed as \xxx.

>> Suppose you have a file which is mostly in the foo encoding, but
>> contains some bytes that are invalid in that encoding.  I think this
>> is the situation Dan is talking about.  He wants to find the invalid
>> bytes, IIUC.
>
> Perhaps I don't understand the original request, but if I do, it is
> very hard to do that (AFAIK) without knowing what (i.e. which
> character sets) you are looking for.

Hm, yes.

Maybe Dan should say how the \xxx things got into the buffer in
the first place.  For example, maybe Dan said C-x RET c foo RET C-x
C-f /tmp/somefile RET.  Further suppose that /tmp/somefile contains
byte sequences not valid in the foo coding.

Then it is clear that Dan wants to search for buffer parts that
aren't representable in the foo coding.  Right?

Dan?

> Recall that, once the file is visited in a buffer, there are no
> bytes, just characters.  What you want is to find characters that
> don't belong to some set of characters, without actually telling
> Emacs which sets are the ``good'' ones.

Maybe Dan knows which sets are the good ones.

kai
-- 
Silence is foo!

* Re: user sees \xxx but is thwarted from searching for them
  2002-04-16 11:57     ` Kai Großjohann
@ 2002-04-16 14:07       ` Eli Zaretskii
  0 siblings, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2002-04-16 14:07 UTC
  Cc: bug-gnu-emacs

> From: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann)
> Newsgroups: gnu.emacs.bug
> Date: Tue, 16 Apr 2002 13:57:26 +0200
> 
> In Emacs 20.6, the offending characters are highlighted upon C-x C-s.
> Not sure why this fails to happen in 21 from CVS.

Yes, this is a known (and unfortunate) side effect of changes in Mule
between Emacs 20.x and Emacs 21.

* Re: user sees \xxx but is thwarted from searching for them
  2002-04-16 11:57     ` Heinrich Rommerskirchen
  2002-04-16 13:52       ` Kai Großjohann
@ 2002-04-16 14:08       ` Eli Zaretskii
  2002-04-16 14:41         ` Heinrich Rommerskirchen
  1 sibling, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2002-04-16 14:08 UTC
  Cc: bug-gnu-emacs

> From: Heinrich Rommerskirchen <Heinrich.Rommerskirchen@icn.siemen.de>
> Newsgroups: gnu.emacs.bug
> Date: 16 Apr 2002 11:57:42 +0000
> 
> Assume you have a buffer in latin-1 encoding which contains umlauts, 
> add an umlaut in latin-9 and press C-x C-s. Then emacs complains that it
> cannot safely save the buffer without showing the offending characters. In
> fact it can't because it doesn't know if the latin-1 umlaut is wrong or the
> latin-9 umlaut; and both look the same in the buffer :-(

Try "C-u C-x =", it should tell.

* Re: user sees \xxx but is thwarted from searching for them
  2002-04-16 14:08       ` Eli Zaretskii
@ 2002-04-16 14:41         ` Heinrich Rommerskirchen
  2002-04-16 18:19           ` Eli Zaretskii
  2002-04-17 16:04           ` Richard Stallman
  0 siblings, 2 replies; 19+ messages in thread
From: Heinrich Rommerskirchen @ 2002-04-16 14:41 UTC


"Eli Zaretskii" <eliz@is.elta.co.il> writes:

> Try "C-u C-x =", it should tell.

But you have to do it for each candidate character. Or is there some easy
way to search for the next latin-9 character in a predominantly latin-1
buffer?

I was in such a situation yesterday. Normally I use latin-1 encoding but
switched to language environment latin-9 to edit some files containing Euro
signs and forgot about this change. Then I loaded a 400 line DOS file
containing "-*- coding: cp850 -*-" in the first line. Emacs encoded all the
umlauts already in the file as latin-1 but encoded the typed umlauts as
latin-9. And after minor changes all over the file and a few interruptions
I didn't remember which parts were changed and which were old ...

-- 
Regards

Heinz

* Re: user sees \xxx but is thwarted from searching for them
  2002-04-16 14:41         ` Heinrich Rommerskirchen
@ 2002-04-16 18:19           ` Eli Zaretskii
  2002-04-17 16:04           ` Richard Stallman
  1 sibling, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2002-04-16 18:19 UTC
  Cc: bug-gnu-emacs

> From: Heinrich Rommerskirchen <Heinrich.Rommerskirchen@icn.siemen.de>
> Newsgroups: gnu.emacs.bug
> Date: 16 Apr 2002 14:41:02 +0000
> 
> "Eli Zaretskii" <eliz@is.elta.co.il> writes:
> 
> > Try "C-u C-x =", it should tell.
> 
> But you have to do it for each candidate character. Or is there some easy
> way to search for the next latin-9 character in a predominantly latin-1
> buffer?

I'm not aware of a command to do that (but that might be because I never
bumped into it, not because it doesn't exist).  If indeed there is
none, you should be able to write something like this using the
functions char-charset and find-charset-region.

* Re: user sees \xxx but is thwarted from searching for them
  2002-04-16 14:41         ` Heinrich Rommerskirchen
  2002-04-16 18:19           ` Eli Zaretskii
@ 2002-04-17 16:04           ` Richard Stallman
  2002-04-17 17:04             ` Stefan Monnier
  2002-04-17 17:18             ` Eli Zaretskii
  1 sibling, 2 replies; 19+ messages in thread
From: Richard Stallman @ 2002-04-17 16:04 UTC
  Cc: emacs-devel

    I was in such a situation yesterday. Normally I use latin-1 encoding but
    switched to language environment latin-9 to edit some files containing Euro
    signs and forgot about this change. Then I loaded a 400 line DOS file
    containing "-*- coding: cp850 -*-" in the first line. Emacs encoded all the
    umlauts already in the file as latin-1 but encoded the typed umlauts as
    latin-9. And after minor changes all over the file and a few interruptions
    I didn't remember which parts were changed and which were old ...

Does anyone have an idea for what we should do about this?
Does the change to turn on unify-on-encoding fix this automatically?

Will the switch to native Unicode fix it?

* Re: user sees \xxx but is thwarted from searching for them
  2002-04-17 16:04           ` Richard Stallman
@ 2002-04-17 17:04             ` Stefan Monnier
  2002-04-17 17:18             ` Eli Zaretskii
  1 sibling, 0 replies; 19+ messages in thread
From: Stefan Monnier @ 2002-04-17 17:04 UTC
  Cc: Heinrich.Rommerskirchen, emacs-devel

>     I was in such a situation yesterday. Normally I use latin-1 encoding but
>     switched to language environment latin-9 to edit some files containing Euro
>     signs and forgot about this change. Then I loaded a 400 line DOS file
>     containing "-*- coding: cp850 -*-" in the first line. Emacs encoded all the
>     umlauts already in the file as latin-1 but encoded the typed umlauts as
>     latin-9. And after minor changes all over the file and a few interruptions
>     I didn't remember which parts were changed and which were old ...
> 
> Does anyone have an idea for what we should do about this?
> Does the change to turn on unify-on-encoding fix this automatically?

It does fix the above case, yes.

> Will the switch to native Unicode fix it?

It will also fix it.


	Stefan

* Re: user sees \xxx but is thwarted from searching for them
  2002-04-17 16:04           ` Richard Stallman
  2002-04-17 17:04             ` Stefan Monnier
@ 2002-04-17 17:18             ` Eli Zaretskii
  1 sibling, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2002-04-17 17:18 UTC
  Cc: Heinrich.Rommerskirchen, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> Date: Wed, 17 Apr 2002 10:04:50 -0600 (MDT)
> 
>     I was in such a situation yesterday. Normally I use latin-1 encoding but
>     switched to language environment latin-9 to edit some files containing Euro
>     signs and forgot about this change. Then I loaded a 400 line DOS file
>     containing "-*- coding: cp850 -*-" in the first line. Emacs encoded all the
>     umlauts already in the file as latin-1 but encoded the typed umlauts as
>     latin-9. And after minor changes all over the file and a few interruptions
>     I didn't remember which parts were changed and which were old ...
> 
> Does anyone have an idea for what we should do about this?

Help Handa-san make the switch to Unicode ;-)

The problem is that the target charset of cp850 is Latin-1, not
Latin-9.  OTOH, in a Latin-9 language environment, non-ASCII
characters typed by the user are by default converted to Latin-9
characters.

> Does the change to turn on unify-on-encoding fix this automatically?

Yes, as long as the user doesn't type both characters that are unique
to Latin-1 and characters that are unique to Latin-9 (for example, both
the currency symbol and the Euro symbol in the same buffer).  If they
do, the buffer can only be saved in some other encoding, probably
UTF-8, and I assume that is not what users expect.

> Will the switch to native Unicode fix it?

Yes.

* Re: user sees \xxx but is thwarted from searching for them
  2002-04-16 13:56       ` Kai Großjohann
@ 2002-04-18  2:15         ` Dan Jacobson
  2002-04-18  9:42           ` Kai Großjohann
  0 siblings, 1 reply; 19+ messages in thread
From: Dan Jacobson @ 2002-04-18  2:15 UTC


>>>>> "K" == Kai Großjohann <Kai.Grossjohann@CS.Uni-Dortmund.DE> writes:

K> Then it is clear that Dan wants to search for buffer parts that
K> aren't representable in the foo coding.  Right?

K> Dan?

[sniff] they referred to me by name.  it's almost like I exist
[sniff].  Sorry, I've been chasing the wild pig hunters of my land.

OK, my file would be a well-behaved big5 Chinese file except for a few
scattered characters that the author was using to represent some IPA
symbols.  My mission: to hunt them down and deal with them so that the
file can then be used with emacs.

what I probably should do is find a perl script that will replace any
characters outside the intended coding system of the file [which I
could tell it explicitly], "with ***\343\433 was here***" [ASCII]
which I could then deal with later in emacs.

Hmmm, this seems hard in perl, given big5's definition of
/[\x80-\xFE][\x40-\x7E\xA1-\xFE]/.  Also, one should ignore any
byte in [\x00-\x7F].

Indeed, how do the \xxx's get on my screen in the first place?  Well,
C-x C-f is just going to make the whole file \xxx, so I do M-! cat
file; at least there I can see most of the Chinese, and the \xxx's
stick out like a sore thumb.  But what a drag it is that one can see
the \xxx's but can't search for them.  It almost makes one want to wrap
this emacs inside another emacs to be able to search for them [but a
screen at a time].

Anyway, I would just be searching in *Shell Command Output*, and still
have to navigate the now 100% \xxx source file.  So, my perl script
idea seems better.
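Dan's script idea can be sketched in Python rather than Perl. The byte ranges below come from his message and have not been checked against a Big5 specification: lead byte \x80-\xFE, trail byte \x40-\x7E or \xA1-\xFE, and ASCII \x00-\x7F passed through. The check is purely structural, so a well-formed pair that maps to no assigned character is not flagged. The function name `mark_non_big5` is hypothetical, and the marker uses octal to match what Emacs shows on screen.

```python
def mark_non_big5(data: bytes) -> bytes:
    """Copy DATA, replacing every byte that does not fit Dan's Big5 pattern
    with an ASCII marker such as ***\\377 was here***."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7F:
            # ASCII byte: passed through unchanged.
            out.append(b)
            i += 1
        elif (b <= 0xFE and i + 1 < len(data)
              and (0x40 <= data[i + 1] <= 0x7E or 0xA1 <= data[i + 1] <= 0xFE)):
            # Lead byte (0x80-0xFE) followed by a valid trail byte: keep the pair.
            out += data[i:i + 2]
            i += 2
        else:
            # Stray byte: replace it with a searchable ASCII marker in octal.
            out += b"***\\%03o was here***" % b
            i += 1
    return bytes(out)
```

To use it, read the file in binary mode, pass its contents through `mark_non_big5`, and write the result to a new file. That file would contain plain ASCII markers that an ordinary C-s can find.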

By the way, apparently gnus asked me if I wanted to save Kai's name to
BBDB and I hit "y" or something.  Well, as Kai has that big B in his
name [=ss I think], and as I already had some big5 in my BBDB, well,
when it came time to save I was given the Spanish inquisition about
coding sets or something ... who knows, one false step here and your
file will become Coptic Egyptian or something.  So I went back and
switched the B for ss before saving.

By the way, does emacs require one to see octal codes on the screen,
or can it live in a hex world yet?
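The octal-versus-hex question is only a matter of notation: the same raw byte can be written as three octal digits or as two hex digits. The tiny Python illustration below uses a hypothetical helper and the byte 0xE3 (the \343 in Dan's marker example). It does not show how, or whether, Emacs can be told to use the hex form.

```python
def show_raw_byte(value: int, hex_style: bool = False) -> str:
    """Spell a raw byte as \\ooo (three octal digits) or as \\xHH."""
    if not 0 <= value <= 0xFF:
        raise ValueError("a raw byte must be in the range 0-255")
    return "\\x%02x" % value if hex_style else "\\%03o" % value


print(show_raw_byte(0xE3), show_raw_byte(0xE3, hex_style=True))
```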
-- 
http://jidanni.org/ Taiwan(04)25854780

* Re: user sees \xxx but is thwarted from searching for them
  2002-04-18  2:15         ` Dan Jacobson
@ 2002-04-18  9:42           ` Kai Großjohann
  0 siblings, 0 replies; 19+ messages in thread
From: Kai Großjohann @ 2002-04-18  9:42 UTC


jidanni@deadspam.com (Dan Jacobson) writes:

> OK, my file would be a well behaved big5 chinese file except for a few
> scattered characters that the author was using to represent some IPA
> symbols.  My mission: to hunt them down and deal with them so that the
> file can then be used with emacs.

Right.  So, I think that C-x RET c big5 RET C-x C-f /tmp/somefile RET
would visit the file in the manner you want.  And then, if you were
using Emacs 20, you could try to save the file in the same coding
system and then Emacs would highlight the characters that aren't
encodable.

But with Emacs 21, you need some Lisp that searches for characters
not encodable in coding system X.

kai
-- 
Silence is foo!

end of thread

Thread overview: 19+ messages
2002-04-16  2:24 user sees \xxx but is thwarted from searching for them Dan Jacobson
2002-04-16  8:41 ` Eli Zaretskii
2002-04-16 10:55   ` Kai Großjohann
2002-04-16 11:57     ` Heinrich Rommerskirchen
2002-04-16 13:52       ` Kai Großjohann
2002-04-16 14:08       ` Eli Zaretskii
2002-04-16 14:41         ` Heinrich Rommerskirchen
2002-04-16 18:19           ` Eli Zaretskii
2002-04-17 16:04           ` Richard Stallman
2002-04-17 17:04             ` Stefan Monnier
2002-04-17 17:18             ` Eli Zaretskii
2002-04-16 12:21     ` Eli Zaretskii
2002-04-16 13:56       ` Kai Großjohann
2002-04-18  2:15         ` Dan Jacobson
2002-04-18  9:42           ` Kai Großjohann
2002-04-16 11:36   ` David Kastrup
2002-04-16 11:57     ` Kai Großjohann
2002-04-16 14:07       ` Eli Zaretskii
2002-04-16 12:32     ` Eli Zaretskii
