replacing characters and whacky trans-buffer conversion


* replacing characters and whacky trans-buffer conversion
@ 2007-03-06 15:15 ken
  2007-03-06 16:28 ` Peter Dyballa
  2007-03-07 20:48 ` ken
  0 siblings, 2 replies; 34+ messages in thread
From: ken @ 2007-03-06 15:15 UTC (permalink / raw)
  To: GNU Emacs List

An email comes in with this (emdash) character in it: –

It looks like an em-dash until the text containing it is pasted into an
emacs buffer; then it appears as a series of "garbage characters".
(Copy and paste the emdash into an emacs buffer yourself, and perhaps
you'll see what I mean.)

To me, and possibly to you, this emdash appears in emacs as nine (9)
"garbage" characters.

Because I want to programmatically replace these 9 garbage characters
with something latin1-friendly, I copy-and-paste these nine characters
into an *.el file containing a line like this:

  (replace-string "–" "--" nil (point-min) (point-max))

The sought string (i.e., the first argument above) isn't found, however,
because, for some whacky reason, the emdash pasted into the *.el file is
different-- by one character-- from exactly the same emdash pasted into
the other emacs buffer (the one I'm saving the email in).

In the emacs buffer containing the email, the fourth garbage character
(as shown by C-u C-x=) is:

  character: β (05542, 2914, 0xb62)
    charset: greek-iso8859-7
	     (Right-Hand Part of Latin/Greek Alphabet (ISO/IEC 8859-7): ISO-IR-126)
 code point: 98
     syntax: word
   category: g:Greek
buffer code: 0x86 0xE2
  file code: not encodable by coding system undecided-unix
       font: -ETL-Fixed-Medium-R-Normal--16-160-72-72-C-80-ISO8859-7

In the *.el buffer, the fourth garbage character (which should be
exactly the same character) is:

  character: â (0342, 226, 0xe2)
    charset: eight-bit-graphic (8-bit graphic char (0xA0..0xFF))
 code point: 226
     syntax: whitespace
   category:
buffer code: 0xE2
  file code: 0xE2 (encoded by coding system raw-text-unix)
       font: -ETL-Fixed-Medium-R-Normal--16-160-72-72-C-80-ISO8859-1

I tried entering "C-q 5542 RETURN" into the *.el file, but emacs
immediately makes it into the second (â, or 0342) character.  Doing the
same into the other emacs buffer (containing my copy of the email)
*does* enter the good (β, or 05542) character.

All I really want is for the above replace-string function to work as
expected.  But emacs consistently converts that fourth character in the
emdash string into a different character, subsequently causing the
search to fail.  So how do I get the correct "garbage" characters into
the first argument of the replace-string function-- i.e., into the *.el
file?

tnx,
ken

-- 
"Genius might be described as a supreme capacity for getting its
possessors into trouble of all kinds."
	-- Samuel Butler


* Re: replacing characters and whacky trans-buffer conversion
  2007-03-06 15:15 replacing characters and whacky trans-buffer conversion ken
@ 2007-03-06 16:28 ` Peter Dyballa
  2007-03-07  7:38   ` Matthew Flaschen
                     ` (2 more replies)
  2007-03-07 20:48 ` ken
  1 sibling, 3 replies; 34+ messages in thread
From: Peter Dyballa @ 2007-03-06 16:28 UTC (permalink / raw)
  To: ken; +Cc: GNU Emacs List

Am 06.03.2007 um 16:15 schrieb ken:

> An email comes in with this (emdash) character in it: –

It's only an EN DASH, U+2013 (dec 8211, oct 20023). There are only a  
few encodings that contain it:

	CP1250
	CP1251
	CP1252
	NeXT
	Mac-Greek
	Mac-Cyrillic
	Mac-Roman
	Adobe Standard Encoding

(not complete, I presume). In UTF-8 this character is represented by
three bytes: 0xE2 0x80 0x93. It seems that somehow each of these three
bytes is converted into some three-byte representation. A malfunction
in GNOME? (At least I once had such problems in Fedora Core 1.)
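
The UTF-8 bytes of U+2013 can be checked directly in Emacs Lisp. This is only a small sketch: the helper name `my-utf-8-bytes` is made up for illustration, and the `\u2013` string escape assumes a reasonably recent Emacs.

```elisp
;; Hypothetical helper (not from this thread): show the UTF-8 bytes of a string.
(defun my-utf-8-bytes (text)
  "Return the UTF-8 encoding of TEXT as a list of two-digit hex strings."
  (mapcar (lambda (byte) (format "%02X" byte))
          ;; `encode-coding-string' returns a unibyte string, so each
          ;; element of the list below is one byte of the encoding.
          (string-to-list (encode-coding-string text 'utf-8))))

(my-utf-8-bytes "\u2013")   ; => ("E2" "80" "93")
```

The last two bytes, 0x80 and 0x93, are octal 200 and 223, which is how Emacs displays them when they show up as raw bytes.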

Can you try to paste into a UTF-8 encoded buffer? Its mode-line  
should start with -u: (or -U: in GNU Emacs 23.0.0).

--
Greetings

              ~  O
   Pete       ~~_\\_/%
              ~  O  o


* Re: replacing characters and whacky trans-buffer conversion
  2007-03-06 16:28 ` Peter Dyballa
@ 2007-03-07  7:38   ` Matthew Flaschen
  2007-03-07  9:59     ` Peter Dyballa
  2007-03-08 12:16   ` ken
  2007-03-08 20:43   ` ken
  2 siblings, 1 reply; 34+ messages in thread
From: Matthew Flaschen @ 2007-03-07  7:38 UTC (permalink / raw)
  To: emacs

Peter Dyballa wrote:
> A malfunction in
> GNOME? (At least I had once such problems in Fedora Core 1.)
> 
FWIW, I'm running Thunderbird and emacs 21.4 on gnewsense-kde and have
the same behavior.

Matt Flaschen


* Re: replacing characters and whacky trans-buffer conversion
  2007-03-07  7:38   ` Matthew Flaschen
@ 2007-03-07  9:59     ` Peter Dyballa
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Dyballa @ 2007-03-07  9:59 UTC (permalink / raw)
  To: Matthew Flaschen; +Cc: emacs

Am 07.03.2007 um 08:38 schrieb Matthew Flaschen:

> Peter Dyballa wrote:
>> A malfunction in
>> GNOME? (At least I had once such problems in Fedora Core 1.)
>>
> FWIW, I'm running Thunderbird and emacs 21.4 on gnewsense-kde and have
> the same behavior.
>

Then it looks more like a malfunction in Thunderbird ...

Ken, the original poster, is using Thunderbird 2.0pre (X11/20070214);  
you are using Thunderbird 1.5.0.9 (X11/20070104). Or perhaps you need  
to tell Thunderbird that you are composing in UTF-8. I think modern  
Linux systems in this millennium use UTF-8 internally, and when a  
client such as Thunderbird claims to be UTF-8 aware, the X selection  
is not converted. That conversion would be Thunderbird's task, because  
it is Thunderbird that "knows" which encodings are used inside its  
"buffers."

(I found Thunderbird a bit troublesome in the way it handles  
encodings. If I used it more often than once a month, I would have  
written a few bug reports.)

--
Greetings

   Pete

When confronted with actual numbers, a mathematician is at a loss.
                                          (Steffen Hokland)


* Re: replacing characters and whacky trans-buffer conversion
  2007-03-06 16:28 ` Peter Dyballa
  2007-03-07  7:38   ` Matthew Flaschen
@ 2007-03-08 12:16   ` ken
  2007-03-08 16:31     ` Peter Dyballa
  2007-03-08 20:43   ` ken
  2 siblings, 1 reply; 34+ messages in thread
From: ken @ 2007-03-08 12:16 UTC (permalink / raw)
  To: Peter Dyballa; +Cc: GNU Emacs List

On 03/06/2007 11:28 AM somebody named Peter Dyballa wrote:
> 
> Am 06.03.2007 um 16:15 schrieb ken:
> 
>> An email comes in with this (emdash) character in it: –
> 
> It's only an EN DASH, U+2013 (dec 8211, oct 20023). 

Looking at it, it's obvious you're right.  The author of the email used
it where an emdash should have been used instead.  I guess I shouldn't
have blindly assumed the author was correct... should have trusted my
own eyes.  Thanks for pointing that out... brings up another issue with
what I'm trying to do.


> There are only a few
> encodings that contain it:
> 
>     CP1250
>     CP1251
>     CP1252
>     NeXT
>     Mac-Greek
>     Mac-Cyrillic
>     Mac-Roman
>     Adobe Standard Encoding
> 
> (not complete, I presume). This character has in UTF-8 a representation
> of 0xE2 0x80 0x93, three bytes. It seems that somehow each of these three
> bytes is converted into some three byte representation. A malfunction in
> GNOME? (At least I had once such problems in Fedora Core 1.)
> 
> Can you try to paste into an UTF-8 encoding buffer? Its mode-line should
> start with -u: (or -U: in GNU Emacs 23.0.0).

How do I open up a utf-8 buffer?  I'd much prefer to do this just for
this one time, not change my emacs configuration to do it forever.


> 
> -- 
> Greetings
> 
>              ~  O
>   Pete       ~~_\\_/%
>              ~  O  o
> 
> 


-- 
"Genius might be described as a supreme capacity for getting its
possessors into trouble of all kinds."
	-- Samuel Butler


* Re: replacing characters and whacky trans-buffer conversion
  2007-03-08 12:16   ` ken
@ 2007-03-08 16:31     ` Peter Dyballa
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Dyballa @ 2007-03-08 16:31 UTC (permalink / raw)
  To: ken; +Cc: GNU Emacs List


Am 08.03.2007 um 13:16 schrieb ken:

> How do I open up a utf-8 buffer?  I'd much prefer to do this just for
> this one time, not change my emacs configuration to do it forever.

C-x RET c utf-8-unix RET C-x C-f, for example.
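
In Lisp terms, C-x RET c (`universal-coding-system-argument`) binds the coding system for the next command only, so nothing in the configuration changes. A rough equivalent for visiting a single file might look like the sketch below; the function name `my-visit-as-utf-8` is hypothetical.

```elisp
;; Hypothetical helper: visit one file as UTF-8 without touching any defaults.
(defun my-visit-as-utf-8 (filename)
  "Return a buffer visiting FILENAME, decoded as UTF-8 for this call only."
  ;; The binding is dynamic and ends when the `let' form exits, so other
  ;; files are still decoded according to the normal settings.
  (let ((coding-system-for-read 'utf-8-unix))
    (find-file-noselect filename)))
```

To display the result, wrap the call in `switch-to-buffer`. Note that if the file is already being visited, `find-file-noselect` returns the existing buffer and does not decode it again.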

--
Greetings

   Pete

Without vi there is only GNU Emacs


* Re: replacing characters and whacky trans-buffer conversion
  2007-03-06 16:28 ` Peter Dyballa
  2007-03-07  7:38   ` Matthew Flaschen
  2007-03-08 12:16   ` ken
@ 2007-03-08 20:43   ` ken
  2007-03-08 23:14     ` Peter Dyballa
       [not found]     ` <mailman.688.1173395790.7795.help-gnu-emacs@gnu.org>
  2 siblings, 2 replies; 34+ messages in thread
From: ken @ 2007-03-08 20:43 UTC (permalink / raw)
  To: GNU Emacs List

On 03/06/2007 11:28 AM somebody named Peter Dyballa wrote:
> 
> Am 06.03.2007 um 16:15 schrieb ken:
> 
>> An email comes in with this (emdash) character in it: –
> 
> ....
> 
> Can you try to paste into an UTF-8 encoding buffer? Its mode-line should
> start with -u: (or -U: in GNU Emacs 23.0.0).

Copying the endash above and yanking it into the utf-8 buffer, I get the
same string of garbage that appears when I yank the endash into my *.el
file.  It looks like this:

^[%GX\200\223^[%@

except that the fourth character-- represented above by the X-- is

  character: â (04342, 2274, 0x8e2)
    charset: latin-iso8859-1
	     (Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100)
 code point: 98
     syntax: word
   category: l:Latin
buffer code: 0x81 0xE2
  file code: 0xC3 0xA2 (encoded by coding system utf-8-unix)
       font: -ETL-Fixed-Medium-R-Normal--16-160-72-72-C-80-ISO8859-1

when it should be the Greek lowercase beta... as previously discussed.



-- 
"Genius might be described as a supreme capacity for getting its
possessors into trouble of all kinds."
	-- Samuel Butler


* Re: replacing characters and whacky trans-buffer conversion
  2007-03-08 20:43   ` ken
@ 2007-03-08 23:14     ` Peter Dyballa
       [not found]     ` <mailman.688.1173395790.7795.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 34+ messages in thread
From: Peter Dyballa @ 2007-03-08 23:14 UTC (permalink / raw)
  To: ken; +Cc: GNU Emacs List

Am 08.03.2007 um 21:43 schrieb ken:

> ^[%GX\200\223^[%@

Yes, this is the scrap I found in GNU Emacs 21.x in Fedora Core 1  
when I pasted some contents from some other GNOME application!

Since this Linux PC was just my desktop machine and my /real/ Emacsen  
were running on Sun servers my workaround was very simple: vi(m) in  
some terminal emulation. Sometimes a simple echo or cat was my  
friend ...

I never studied GNOME closely enough to find a clue. It might help to  
set the environment variables LC_ALL and LANG to some UTF-8 based  
value in the login shell and in any other shell. I also observed that  
parts of the ANSI-like ESC sequences changed when I copied from  
differently encoded files. So, IMO, what you are pasting is actually  
not the *real* X selection but some GNOME selection that also tells  
all GNOME applications how the pasted text is encoded – which is  
clever!

--
Greetings

   Pete

Make it simple, as simple as possible but no simpler.
                               Albert Einstein


[parent not found: <mailman.688.1173395790.7795.help-gnu-emacs@gnu.org>]

* Re: replacing characters and whacky trans-buffer conversion
       [not found]     ` <mailman.688.1173395790.7795.help-gnu-emacs@gnu.org>
@ 2007-03-09 14:28       ` Oliver Scholz
  0 siblings, 0 replies; 34+ messages in thread
From: Oliver Scholz @ 2007-03-09 14:28 UTC (permalink / raw)
  To: help-gnu-emacs

Peter Dyballa <Peter_Dyballa@Web.DE> writes:

> Am 08.03.2007 um 21:43 schrieb ken:
>
>> ^[%GX\200\223^[%@

[...]
> So it's actually, IMO, not the *real* X selection you are pasting,
> but some GNOME selection that also tells all GNOME applications how
> the text you paste is encoded – which is clever!

Actually, it just looks like something ISO 2022-ish.
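
For what it's worth, in ISO 2022 the sequence ESC % G switches to UTF-8 and ESC % @ switches back. That would account for the nine characters: three for each escape sequence plus the three UTF-8 bytes of the en dash. Below is a minimal sketch with a hypothetical helper name, assuming the pasted text is available as a unibyte string and consists of exactly one such wrapped run.

```elisp
;; Hypothetical helper: strip an ESC % G ... ESC % @ wrapper and decode
;; the bytes in between as UTF-8.
(defun my-unwrap-utf-8-escape (raw)
  "If RAW is ESC % G <bytes> ESC % @, return <bytes> decoded as UTF-8.
Otherwise return RAW unchanged."
  (let ((open "\e%G")    ; ISO 2022: switch to UTF-8
        (close "\e%@"))  ; ISO 2022: return to the previous coding system
    (if (and (>= (length raw) (+ (length open) (length close)))
             (string-prefix-p open raw)
             (string-suffix-p close raw))
        (decode-coding-string
         (substring raw (length open) (- (length close)))
         'utf-8)
      raw)))

;; The nine "garbage" characters from the earlier message:
(my-unwrap-utf-8-escape "\e%G\342\200\223\e%@")   ; => "\u2013" (EN DASH)
```

Text that mixes several wrapped runs with other material would need a real decoder; this only handles the single-run case seen in this thread.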

It wouldn't explain the weird difference between unibyte and multibyte
in those two buffers, anyways. Ken, can you reproduce all that when
you start Emacs with -q --no-site-file?


    Oliver
-- 
19 Ventôse an 215 de la Révolution
Liberté, Egalité, Fraternité!


* Re: replacing characters and whacky trans-buffer conversion
  2007-03-06 15:15 replacing characters and whacky trans-buffer conversion ken
  2007-03-06 16:28 ` Peter Dyballa
@ 2007-03-07 20:48 ` ken
  2007-03-07 21:03   ` ken
  1 sibling, 1 reply; 34+ messages in thread
From: ken @ 2007-03-07 20:48 UTC (permalink / raw)
  To: GNU Emacs List


Okay, try this:

Create two buffers in one emacs frame.

In one of them enter C-q 5442 RETURN.  You should get a Greek character
which looks much like a German double-s.

Using only emacs, put this character into the kill ring and yank it
into the second buffer.

When I do this, I get a different character in the second buffer.  Its
coding (ascertained via C-x=) is also different.

Go back to the first buffer and do another yank.  What character do you
get?  I get the original character, the one inserted with C-q 5442 RETURN.





-- 
"Genius might be described as a supreme capacity for getting its
possessors into trouble of all kinds."
	-- Samuel Butler


* Re: replacing characters and whacky trans-buffer conversion
  2007-03-07 20:48 ` ken
@ 2007-03-07 21:03   ` ken
  2007-03-07 21:30     ` Peter Dyballa
  0 siblings, 1 reply; 34+ messages in thread
From: ken @ 2007-03-07 21:03 UTC (permalink / raw)
  To: GNU Emacs List


Sorry... in the below it should say 5542 instead of 5442.


On 03/07/2007 03:48 PM somebody named ken wrote:
> Okay, try this:
> 
> Create two buffers in one emacs frame.
> 
> In one of them enter C-q 5442 RETURN.  You should get a Greek character
> which looks much like a German double-s.
> 
> Using only emacs, put this character into the kill buffer and yank it
> into the second buffer.
> 
> When I do this, I get a different character in the second buffer.  Its
> coding (ascertained via C-x=) is also different.
> 
> Go back to the first buffer and do another yank.  What character do you
> get.  I get the original character, the one inserted with C-q 5442 RETURN.


-- 
"Genius might be described as a supreme capacity for getting its
possessors into trouble of all kinds."
	-- Samuel Butler


* Re: replacing characters and whacky trans-buffer conversion
  2007-03-07 21:03   ` ken
@ 2007-03-07 21:30     ` Peter Dyballa
  2007-03-08  1:11       ` ken
       [not found]       ` <mailman.627.1173316331.7795.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 34+ messages in thread
From: Peter Dyballa @ 2007-03-07 21:30 UTC (permalink / raw)
  To: ken; +Cc: GNU Emacs List

Am 07.03.2007 um 22:03 schrieb ken:

> Sorry... in the below it should say 5542 instead of 5442.

Sorry, it doesn't work like that in my GNU Emacs 22.0.93! In both  
cases, in UTF-8 encoded buffers, I get the Greek small letter beta,  
i.e. ``β´´. And I use C-u C-x = to determine what it is.

Can you pay a bit of attention to the few characters to the left of  
the leftmost ``:´´ in the mode-line? These characters, with which the  
mode-line actually starts on the left, show the encoding used in that  
buffer. This information is as important as the name of the currency  
unit on your coins or bank-notes. I mean, what are 1 million Lire?!

--
Greetings

   Pete

The human brain operates at only 10% of its capacity. The rest is  
overhead for the operating system.


* Re: replacing characters and whacky trans-buffer conversion
  2007-03-07 21:30     ` Peter Dyballa
@ 2007-03-08  1:11       ` ken
       [not found]       ` <mailman.627.1173316331.7795.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 34+ messages in thread
From: ken @ 2007-03-08  1:11 UTC (permalink / raw)
  To: Peter Dyballa; +Cc: GNU Emacs List

On 03/07/2007 04:30 PM somebody named Peter Dyballa wrote:
> 
> Am 07.03.2007 um 22:03 schrieb ken:
> 
>> Sorry... in the below it should say 5542 instead of 5442.
> 
> Sorry, doesn't work like that in my GNU Emacs 22.0.93! I have both
> times, in UTF-8 encoded buffers, the Greek small letter beta, i.e.
> ``β´´. And I use C-u C-x = to determine what it is.
> 
> Can you take a bit of care on the few characters left of the leftmost
> ``:´´ in the mode-line? These characters, by which the mode-line
> actually starts on the left, show the encoding used in that buffer. This
> information is as important as the name of the currency unit on your
> coins or bank-notes. I mean, what are 1 million Lire?!
> 
>....


The first buffer is a *scratch* buffer, the modeline starts "--:".  The
second contains the *.el file mentioned in the original post; its
modeline begins "-:".

Good analogy about currency.  Via email I get all kinds, and I want to
convert it all into something I can use... otherwise it's garbage.  But
emacs isn't letting me convert it, because it converts it into something
else which it won't let me insert into my conversion function's
buffer/file.


-- 
"Genius might be described as a supreme capacity for getting its
possessors into trouble of all kinds."
	-- Samuel Butler


[parent not found: <mailman.627.1173316331.7795.help-gnu-emacs@gnu.org>]

* Re: replacing characters and whacky trans-buffer conversion
       [not found]       ` <mailman.627.1173316331.7795.help-gnu-emacs@gnu.org>
@ 2007-03-08  7:50         ` Stefan Monnier
  2007-03-08 10:40           ` ken
       [not found]           ` <mailman.648.1173350436.7795.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 34+ messages in thread
From: Stefan Monnier @ 2007-03-08  7:50 UTC (permalink / raw)
  To: help-gnu-emacs

> The first buffer is a *scratch* buffer, the modeline starts "--:".  The
> second contains the *.el file mentioned in the original post; its
> modeline begins "-:".

For some reason this second buffer is in unibyte mode.
That's the source of your problem.  Tell us how you created that buffer.
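
Whether a buffer is unibyte can be checked from Lisp through the buffer-local variable `enable-multibyte-characters`, and `set-buffer-multibyte` switches the mode. The two helper names below are hypothetical; this is only a sketch of how one might inspect and fix a buffer by hand.

```elisp
;; Hypothetical helpers for checking and fixing a buffer's byte mode.
(defun my-buffer-byte-mode (&optional buffer)
  "Return the symbol `multibyte' or `unibyte' for BUFFER.
BUFFER defaults to the current buffer."
  (with-current-buffer (or buffer (current-buffer))
    (if enable-multibyte-characters 'multibyte 'unibyte)))

(defun my-ensure-multibyte ()
  "Switch the current buffer to multibyte mode if it is unibyte.
Return the resulting mode as a symbol."
  (interactive)
  (unless enable-multibyte-characters
    ;; With argument t, existing bytes are reinterpreted as multibyte
    ;; characters where they form valid sequences.
    (set-buffer-multibyte t))
  (my-buffer-byte-mode))
```

Evaluating `(my-buffer-byte-mode)` with M-: in each of the two buffers should show which of them is the unibyte one.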


        Stefan


* Re: replacing characters and whacky trans-buffer conversion
  2007-03-08  7:50         ` Stefan Monnier
@ 2007-03-08 10:40           ` ken
  2007-03-08 11:55             ` ken
       [not found]           ` <mailman.648.1173350436.7795.help-gnu-emacs@gnu.org>
  1 sibling, 1 reply; 34+ messages in thread
From: ken @ 2007-03-08 10:40 UTC (permalink / raw)
  To: help-gnu-emacs

On 03/08/2007 02:50 AM somebody named Stefan Monnier wrote:
>> The first buffer is a *scratch* buffer, the modeline starts "--:".  The
>> second contains the *.el file mentioned in the original post; its
>> modeline begins "-:".
> 
> For some reason this second buffer is in unibyte mode.
> That's the source of your problem.  Tell us how you created that buffer.

C-x C-f

++++++++++++++++++++++++++++++++++++++++++++++

C-x C-f runs the command find-file
   which is an interactive compiled Lisp function in `files'.
(find-file FILENAME &optional WILDCARDS)

Edit file FILENAME.
Switch to a buffer visiting file FILENAME,
creating one if none already exists.
Interactively, or if WILDCARDS is non-nil in a call from Lisp,
expand wildcards (if any) and visit multiple files.  Wildcard expansion
can be suppressed by setting `find-file-wildcards'.

++++++++++++++++++++++++++++++++++++++++++++++


-- 
"Genius might be described as a supreme capacity for getting its
possessors into trouble of all kinds."
	-- Samuel Butler


* Re: replacing characters and whacky trans-buffer conversion
  2007-03-08 10:40           ` ken
@ 2007-03-08 11:55             ` ken
  0 siblings, 0 replies; 34+ messages in thread
From: ken @ 2007-03-08 11:55 UTC (permalink / raw)
  To: help-gnu-emacs

On 03/08/2007 05:40 AM somebody named ken wrote:
> On 03/08/2007 02:50 AM somebody named Stefan Monnier wrote:
>>> The first buffer is a *scratch* buffer, the modeline starts "--:".  The
>>> second contains the *.el file mentioned in the original post; its
>>> modeline begins "-:".
>> For some reason this second buffer is in unibyte mode.
>> That's the source of your problem.  Tell us how you created that buffer.
> 
> C-x C-f
> 
> ....

I should add that this is how I create almost all files in emacs.
Occasionally I'll use vi(m) to create or edit a file and subsequently
edit it using emacs.  Less occasionally I'll use "cat > filename" to
create a file, or "[some (series of) command(s)] > filename" and then
pull it into emacs.  And, as mentioned earlier, sometimes I'll open a
file (C-x C-f) and yank the clipboard contents into it.  It seems you're
implying that different means of creating a file will force emacs' use
of different character encodings, yes?  (Actually, I've encountered many
times that emacs, after C-x C-s, tells me to select some other encoding
system... which I almost never want to do.)

tnx

-- 
"Genius might be described as a supreme capacity for getting its
possessors into trouble of all kinds."
	-- Samuel Butler


[parent not found: <mailman.648.1173350436.7795.help-gnu-emacs@gnu.org>]

* Re: replacing characters and whacky trans-buffer conversion
       [not found]           ` <mailman.648.1173350436.7795.help-gnu-emacs@gnu.org>
@ 2007-03-09  1:51             ` Stefan Monnier
  2007-03-09 10:15               ` ken
                                 ` (3 more replies)
  0 siblings, 4 replies; 34+ messages in thread
From: Stefan Monnier @ 2007-03-09  1:51 UTC (permalink / raw)
  To: help-gnu-emacs

>>> The first buffer is a *scratch* buffer, the modeline starts "--:".  The
>>> second contains the *.el file mentioned in the original post; its
>>> modeline begins "-:".
>> 
>> For some reason this second buffer is in unibyte mode.
>> That's the source of your problem.  Tell us how you created that buffer.

> C-x C-f

Hmm, that doesn't say much.
Tell us the value of C-h v default-enable-multibyte-characters.
Also show us the first few and last few lines of the file.


        Stefan


* Re: replacing characters and whacky trans-buffer conversion
  2007-03-09  1:51             ` Stefan Monnier
@ 2007-03-09 10:15               ` ken
  2007-03-09 13:14                 ` Peter Dyballa
  2007-03-09 10:21               ` ken
                                 ` (2 subsequent siblings)
  3 siblings, 1 reply; 34+ messages in thread
From: ken @ 2007-03-09 10:15 UTC (permalink / raw)
  To: help-gnu-emacs

On 03/08/2007 08:51 PM somebody named Stefan Monnier wrote:
>>>> The first buffer is a *scratch* buffer, the modeline starts "--:".  The
>>>> second contains the *.el file mentioned in the original post; its
>>>> modeline begins "-:".
>>> For some reason this second buffer is in unibyte mode.
>>> That's the source of your problem.  Tell us how you created that buffer.
> 
>> C-x C-f
> 
> Hmm, that doesn't say much.
> Tell us the value of C-h v default-enable-multibyte-characters.
> Also shows us the first few and last few lines of the file.

In both the *.el file and in the *scratch* buffer
default-enable-multibyte-characters is t.

Here's the entirety of the *.el file:

;Replace goofy MS chars with latin1 equivalents.
;You can, of course, add to the list of chars.

; Multi-byte strings such as the one below should be toward
; the top of the list so that single-byte replacements don't
; cut them up, making subsequent searches for them impossible.
;"—" => "--"

; Also, to enter the escaped numbers, e.g. "\221", do
; C-q 2 2 1 RETURN.

;To discover the code for a new (garbage) char to be replaced,
;put the point over it and do "C-x="; the first code returned in
;the minibuffer tells you the escaped number you want to replace.

;Wrote up more on this at
;<http://www.emacswiki.org/cgi-bin/emacs-en/ReplaceGarbageChars>

(defun replace-garbage-chars ()
"Replace goofy MS and other garbage characters with latin1 equivalents."
(interactive)
(save-excursion				;save the current point
  (replace-string "—" "--" nil (point-min) (point-max)); multi-byte
  (replace-string "“" "``" nil (point-min) (point-max))
  (replace-string "–" "--" nil (point-min) (point-max))
  (replace-string "–" "'" nil (point-min) (point-max))
  (replace-string "k," "i" nil (point-min) (point-max))
  (replace-string "¢" "'" nil (point-min) (point-max))
  (replace-string "”" "'" nil (point-min) (point-max))
  (replace-string "?" "`" nil (point-min) (point-max))
  (replace-string "?" "'" nil (point-min) (point-max))
  (replace-string "?" "``" nil (point-min) (point-max))
  (replace-string "?" "''" nil (point-min) (point-max))
  (replace-string "?" "--" nil (point-min) (point-max))
))
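
One way to avoid pasting the problem characters into the *.el file at all is to write them as `\uNNNN` escapes, so the source file stays pure ASCII. The sketch below is not the file above: the names and the mapping table are assumptions for illustration, it assumes an Emacs that understands `\u` escapes in strings, and it uses a `search-forward`/`replace-match` loop, which is the usual idiom in Lisp code in place of the interactive `replace-string`.

```elisp
;; Hypothetical table: each entry maps one Unicode character to an
;; ASCII stand-in.  Extend it as new characters turn up.
(defvar my-garbage-char-replacements
  '(("\u2014" . "--")    ; EM DASH
    ("\u2013" . "--")    ; EN DASH
    ("\u201C" . "``")    ; LEFT DOUBLE QUOTATION MARK
    ("\u201D" . "''")    ; RIGHT DOUBLE QUOTATION MARK
    ("\u2018" . "`")     ; LEFT SINGLE QUOTATION MARK
    ("\u2019" . "'"))    ; RIGHT SINGLE QUOTATION MARK
  "Alist of (CHARACTER-STRING . ASCII-REPLACEMENT) pairs.")

(defun my-replace-garbage-chars ()
  "Replace the characters listed in `my-garbage-char-replacements'.
Works on the whole accessible portion of the current buffer."
  (interactive)
  (save-excursion                       ; restore point afterwards
    (dolist (pair my-garbage-char-replacements)
      (goto-char (point-min))
      (while (search-forward (car pair) nil t)
        ;; FIXEDCASE and LITERAL are both t: insert the text verbatim.
        (replace-match (cdr pair) t t)))))
```

This only helps when the buffer being cleaned is multibyte and the text has been decoded properly; raw escape-sequence garbage would need to be decoded first.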



-- 
"Genius might be described as a supreme capacity for getting its
possessors into trouble of all kinds."
	-- Samuel Butler


* Re: replacing characters and whacky trans-buffer conversion
  2007-03-09 10:15               ` ken
@ 2007-03-09 13:14                 ` Peter Dyballa
  2007-03-09 15:54                   ` ken
  2007-03-09 18:41                   ` Reiner Steib
  0 siblings, 2 replies; 34+ messages in thread
From: Peter Dyballa @ 2007-03-09 13:14 UTC (permalink / raw)
  To: ken; +Cc: help-gnu-emacs

Am 09.03.2007 um 11:15 schrieb ken:

> "Replace goofy MS and other garbage characters with latin1  
> equivalents."

I wouldn't recommend doing so! You can easily open such a file in  
some MS Windows encoding, enter a SPC, save in some ISO Latin, and  
remove the SPC (I think one change is needed to make GNU Emacs save a  
file, because: what's the sense in saving a saved file?). Don't  
forget to save again ...

If you have more such files you can use recode or iconv to convert  
them to some ISO Latin. Or you can use Samba to present all MS  
Windows files in ISO Latin ... (a mount option)
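
Before converting, it can help to know which characters in a text have no place in ISO Latin-1 at all. Here is a small sketch with a hypothetical function name; it relies on `encode-coding-char` returning nil for a character the coding system cannot encode.

```elisp
;; Hypothetical helper: find the characters that Latin-1 cannot hold.
(defun my-chars-not-in-latin-1 (text)
  "Return the distinct characters in TEXT that iso-latin-1 cannot encode.
The result is a list of character codes in order of first appearance."
  (let (result)
    (dolist (ch (string-to-list text))
      (unless (or (memq ch result)
                  (encode-coding-char ch 'iso-latin-1))
        (push ch result)))
    (nreverse result)))

;; Byte 0x96 (octal 226) is EN DASH in windows-1252; 0xE9 is e-acute.
(my-chars-not-in-latin-1
 (decode-coding-string "caf\351 \226 ok" 'windows-1252))
```

In this example only the en dash (U+2013, decimal 8211) should be reported, since e-acute exists in Latin-1.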

--
Greetings

   Pete

"If you don't find it in the index, look very carefully through the  
entire
  catalogue."          –  Sears, Roebuck, and Co., Consumer's Guide,
1897

* Re: replacing characters and whacky trans-buffer conversion
  2007-03-09 13:14                 ` Peter Dyballa
@ 2007-03-09 15:54                   ` ken
  2007-03-09 16:13                     ` Peter Dyballa
  2007-03-09 18:41                   ` Reiner Steib
  1 sibling, 1 reply; 34+ messages in thread
From: ken @ 2007-03-09 15:54 UTC (permalink / raw)
  To: help-gnu-emacs

On 03/09/2007 08:14 AM somebody named Peter Dyballa wrote:
> 
> Am 09.03.2007 um 11:15 schrieb ken:
> 
>> "Replace goofy MS and other garbage characters with latin1 equivalents."
> 
> I wouldn't recommend doing so! You can easily open such a file in some
> MS Windows encoding, enter a SPC, save in some ISO Latin, and remove the
> SPC (I think one change is needed to make GNU Emacs save a file,
> because: what's the sense in saving a saved file?). Don't forget to save
> again ...
> 
> If you have more of such files you can use recode or iconv to convert
> them to some ISO Latin. Or: you can use Samba to present all MS Windows
> files in ISO Latin ... (a mount option)

Peter,

Thanks for the suggestions.  I can see situations in which
they'd be better solutions.  Much of what I do entails copying & yanking
text into an existing emacs buffer.  I might do this a number of times
into the same file and each of the coding types of the insertions might
be different.  If there were one particular encoding type in emacs which
would display all possible other encoding types properly and allow me to
edit that file, I would try that and maybe even change all my files to
that.  But I doubt that capability exists (or ever will in my current
lifetime).

-- 
"Genius might be described as a supreme capacity for getting its
possessors into trouble of all kinds."
	-- Samuel Butler

* Re: replacing characters and whacky trans-buffer conversion
  2007-03-09 15:54                   ` ken
@ 2007-03-09 16:13                     ` Peter Dyballa
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Dyballa @ 2007-03-09 16:13 UTC (permalink / raw)
  To: ken; +Cc: help-gnu-emacs

Am 09.03.2007 um 16:54 schrieb ken:

> Much of what I do entails copying & yanking
> text into an existing emacs buffer.  I might do this a number of times
> into the same file and each of the coding types of the insertions might
> be different.  If there were one particular encoding type in emacs which
> would display all possible other encoding types properly and allow me to
> edit that file, I would try that and maybe even change all my files to
> that.  But I doubt that capability exists (or ever will in my current
> lifetime).

There definitely is! I have test files encoded in 20 or 30 encodings  
(ISO 8859, some Mac, Adobe PostScript, NeXT, UTF-8, some MS Windows).  
In Mac OS X for example, without GNOME, it works without flaw to  
copy&paste between buffers of the same Emacs, between X clients, and  
between X and Aqua (Apple's "Display PDF" windowing system) clients –
provided one thing: the contents you paste into some buffer fit into
this buffer's encoding! Otherwise you get empty boxes ...

GNU Emacs uses a particular encoding internally for text. Depending
on the coding system you have chosen (by default or deliberately) you
can get different presentations of the same content. Mac, NeXT, and
MS encodings are a problem: they use the area of 8-bit control
characters (U+0080...U+009F) to encode "real" characters. So they
have 32 entities more than the more useful ISO Latin (ISO 8859-x)
encodings. Copying from them and pasting into ISO Latin buffers can
easily fail because many of these "proprietary" characters cannot be
found in ISO Latin. For example EN or EM DASH, DOUBLE QUOTATION
MARKs ...

UTF-8 or the iso-2022-x encodings offer some kind of 'one encoding  
fits all' ...
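
The point about the 0x80...0x9F range can be checked from Lisp. This is
only a minimal sketch, and it assumes a Unicode-based Emacs (23 or
later), which is newer than the Emacs 22 discussed in this thread:

```elisp
;; Assumes a Unicode-based Emacs (23 or later).

;; The single byte 0x96 decodes to EN DASH under windows-1252.
(format "U+%04X"
        (string-to-char (decode-coding-string "\x96" 'windows-1252)))
;; => "U+2013"

;; The same character needs three bytes in UTF-8 ...
(encode-coding-char #x2013 'utf-8)        ; => "\342\200\223"
;; ... one byte in windows-1252 ...
(encode-coding-char #x2013 'windows-1252) ; => "\226"
;; ... and cannot be encoded at all in ISO 8859-1.
(encode-coding-char #x2013 'iso-8859-1)   ; => nil
```

The last form returning nil is the Lisp-level view of "cannot be found
in ISO Latin".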

--
Greetings

   Pete

Eat the rich – the poor are tough and stringy.

* Re: replacing characters and whacky trans-buffer conversion
  2007-03-09 13:14                 ` Peter Dyballa
  2007-03-09 15:54                   ` ken
@ 2007-03-09 18:41                   ` Reiner Steib
  2007-03-10 18:29                     ` ken
  1 sibling, 1 reply; 34+ messages in thread
From: Reiner Steib @ 2007-03-09 18:41 UTC (permalink / raw)
  To: help-gnu-emacs

On Fri, Mar 09 2007, Peter Dyballa wrote:

> You can easily open such a file in some MS Windows encoding, enter a
> SPC, save in some ISO Latin, and remove the SPC (I think one change
> is needed to make GNU Emacs save a file, because: what's the sense
> in saving a saved file?). Don't forget to save again ...

At least in Emacs 22 you don't need to change the buffer:

C-x C-m r windows-1252 RET
C-x C-m f new-coding RET
C-x C-s
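
The three key sequences above could be wrapped in one command, roughly
as sketched here. The command name `my-recode-current-file` is made up
for illustration; the three functions it calls are the ones bound to
those keys.

```elisp
;; Hypothetical command name; it only chains the three steps above.
(defun my-recode-current-file (from to)
  "Re-read the visited file as FROM, then save it encoded as TO."
  (interactive "zRe-read file as: \nzSave file as: ")
  (revert-buffer-with-coding-system from)   ; C-x C-m r
  (set-buffer-file-coding-system to)        ; C-x C-m f
  (save-buffer))                            ; C-x C-s
```

For example, `M-x my-recode-current-file RET windows-1252 RET utf-8 RET`.
The revert step may still ask for confirmation, as it does when typed
by hand.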

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/

* Re: replacing characters and whacky trans-buffer conversion
  2007-03-09 18:41                   ` Reiner Steib
@ 2007-03-10 18:29                     ` ken
  2007-03-10 18:57                       ` Reiner Steib
                                         ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: ken @ 2007-03-10 18:29 UTC (permalink / raw)
  To: Reiner Steib; +Cc: help-gnu-emacs

On 03/09/2007 01:41 PM somebody named Reiner Steib wrote:
> On Fri, Mar 09 2007, Peter Dyballa wrote:
> 
>> You can easily open such a file in some MS Windows encoding, enter a
>> SPC, save in some ISO Latin, and remove the SPC (I think one change
>> is needed to make GNU Emacs save a file, because: what's the sense
>> in saving a saved file?). Don't forget to save again ...
> 
> At least in Emacs 22 you don't need to change the buffer:
> 
> C-x C-m r windows-1252 RET
> C-x C-m f new-coding RET
> C-x C-s
> 
> Bye, Reiner.

Thanks for these.  Is there a function which will tell me a buffer's
current encoding?  I found describe-coding-system, but when I tried it,
it listed dozens of-- maybe a hundred-- codings.

-- 
"Genius might be described as a supreme capacity for getting its
possessors into trouble of all kinds."
	-- Samuel Butler

* Re: replacing characters and whacky trans-buffer conversion
  2007-03-10 18:29                     ` ken
@ 2007-03-10 18:57                       ` Reiner Steib
  2007-03-10 19:00                       ` Peter Dyballa
  2007-03-10 19:12                       ` Eli Zaretskii
  2 siblings, 0 replies; 34+ messages in thread
From: Reiner Steib @ 2007-03-10 18:57 UTC (permalink / raw)
  To: help-gnu-emacs

On Sat, Mar 10 2007, ken wrote:

> On 03/09/2007 01:41 PM somebody named Reiner Steib wrote:
>> At least in Emacs 22 you don't need to change the buffer:
>> 
>> C-x C-m r windows-1252 RET
>> C-x C-m f new-coding RET
>> C-x C-s
>
> Thanks for these.  Is there a function which will tell me a buffer's
> current encoding?  I found describe-coding-system, but when I tried it,
> it listed dozens of-- maybe a hundred-- codings.

`M-x describe-coding-system RET RET' seems to describe the right one.
Or `<f1> v buffer-file-coding-system RET' ...

,----[ <f1> v buffer-file-coding-system RET ]
| buffer-file-coding-system is a variable defined in `C source code'.
| Its value is emacs-mule
| Local in buffer *followup to ken on gmane.emacs.help*; global value
| is iso-latin-1
| Automatically becomes buffer-local when set in any fashion.
| 
| Documentation:
| Coding system to be used for encoding the buffer contents on saving.
| [...]
`----

Bye, Reiner.

* Re: replacing characters and whacky trans-buffer conversion
  2007-03-10 18:29                     ` ken
  2007-03-10 18:57                       ` Reiner Steib
@ 2007-03-10 19:00                       ` Peter Dyballa
  2007-03-10 19:12                       ` Eli Zaretskii
  2 siblings, 0 replies; 34+ messages in thread
From: Peter Dyballa @ 2007-03-10 19:00 UTC (permalink / raw)
  To: ken; +Cc: help-gnu-emacs, Reiner Steib


Am 10.03.2007 um 19:29 schrieb ken:

> Is there a function which will tell me a buffer's
> current encoding?  I found describe-coding-system, but when I tried it,
> it listed dozens of-- maybe a hundred-- codings.

C-u C-x = lists it among other things – you can customise what it shows!

--
Greetings

   Pete

The human brain operates at only 10% of its capacity. The rest is  
overhead for the operating system.

* Re: replacing characters and whacky trans-buffer conversion
  2007-03-10 18:29                     ` ken
  2007-03-10 18:57                       ` Reiner Steib
  2007-03-10 19:00                       ` Peter Dyballa
@ 2007-03-10 19:12                       ` Eli Zaretskii
  2 siblings, 0 replies; 34+ messages in thread
From: Eli Zaretskii @ 2007-03-10 19:12 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Sat, 10 Mar 2007 13:29:33 -0500
> From: ken <gebser@speakeasy.net>
> Cc: help-gnu-emacs@gnu.org
> 
> Is there a function which will tell me a buffer's current encoding?

Not a function, but a variable:

    "C-h v buffer-file-coding-system RET"

Also, if you hover the mouse pointer above the encoding indicator at
the left of the mode line, Emacs will show the buffer's encoding in a
tooltip.
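
If a command is preferred after all, the variable can be wrapped in
one. A small sketch: the name `my-show-buffer-encoding` is
hypothetical, and it also reports `enable-multibyte-characters`, since
the unibyte/multibyte distinction comes up elsewhere in this thread.

```elisp
;; Hypothetical command; it only reads two standard buffer-local variables.
(defun my-show-buffer-encoding ()
  "Report the current buffer's coding system and whether it is multibyte."
  (interactive)
  (message "coding system: %s, %s buffer"
           buffer-file-coding-system
           (if enable-multibyte-characters "multibyte" "unibyte")))
```

`M-x my-show-buffer-encoding` then prints one line in the echo area
instead of the full description.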

* Re: replacing characters and whacky trans-buffer conversion
  2007-03-09  1:51             ` Stefan Monnier
  2007-03-09 10:15               ` ken
@ 2007-03-09 10:21               ` ken
  2007-03-09 13:02                 ` Peter Dyballa
       [not found]               ` <mailman.699.1173435731.7795.help-gnu-emacs@gnu.org>
       [not found]               ` <mailman.698.1173435330.7795.help-gnu-emacs@gnu.org>
  3 siblings, 1 reply; 34+ messages in thread
From: ken @ 2007-03-09 10:21 UTC (permalink / raw)
  To: GNU Emacs List

On 03/08/2007 08:51 PM somebody named Stefan Monnier wrote:
>>>> The first buffer is a *scratch* buffer, the modeline starts "--:".  The
>>>> second contains the *.el file mentioned in the original post; its
>>>> modeline begins "-:".
>>> For some reason this second buffer is in unibyte mode.
>>> That's the source of your problem.  Tell us how you created that buffer.
> 
>> C-x C-f
> 
> ....

I've further discovered that it's not even a matter of cutting &
pasting/yanking.  If I simply enter C-q 5 5 4 2 RETURN in *.el and again
in *scratch*, this results in a different character in each buffer, the
*.el file getting it wrong (which is the kernel of the problem).


-- 
"Genius might be described as a supreme capacity for getting its
possessors into trouble of all kinds."
	-- Samuel Butler

* Re: replacing characters and whacky trans-buffer conversion
  2007-03-09 10:21               ` ken
@ 2007-03-09 13:02                 ` Peter Dyballa
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Dyballa @ 2007-03-09 13:02 UTC (permalink / raw)
  To: ken; +Cc: GNU Emacs List

Am 09.03.2007 um 11:21 schrieb ken:

> I've further discovered that it's not even a matter of cutting &
> pasting/yanking.  If I simply enter C-q 5 5 4 2 RETURN in *.el and again
> in *scratch*, this results in a different character in each buffer, the
> *.el file getting it wrong (which is the kernel of the problem).

Emacs Lisp files are usually kept in some ISO Latin. If you want to
enter the GREEK SMALL LETTER BETA, U+03B2, i.e. ``β´´ (and not the  
German LATIN SMALL LETTER SHARP S, U+00DF, i.e. ``ß´´), you should  
add this as the first line of your ~/.emacs file (more variables  
possible):

	;;; -*- mode: Emacs-Lisp; coding: utf-8-unix; -*-

First open the file as usual, then add the new line as first line (or  
enter them as local variables at the file's end), and finally save as  
utf-8.

--
Greetings

   Pete

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
      -Benjamin Franklin, Historical Review of Pennsylvania.

[parent not found: <mailman.699.1173435731.7795.help-gnu-emacs@gnu.org>]

* Re: replacing characters and whacky trans-buffer conversion
       [not found]               ` <mailman.699.1173435731.7795.help-gnu-emacs@gnu.org>
@ 2007-03-09 20:20                 ` Stefan Monnier
  2007-03-10 18:32                   ` ken
  0 siblings, 1 reply; 34+ messages in thread
From: Stefan Monnier @ 2007-03-09 20:20 UTC (permalink / raw)
  To: help-gnu-emacs

> the *.el file getting it wrong (which is the kernel of the problem).

Yup, and it gets it wrong because it's in unibyte mode.


        Stefan

* Re: replacing characters and whacky trans-buffer conversion
  2007-03-09 20:20                 ` Stefan Monnier
@ 2007-03-10 18:32                   ` ken
  0 siblings, 0 replies; 34+ messages in thread
From: ken @ 2007-03-10 18:32 UTC (permalink / raw)
  To: help-gnu-emacs

On 03/09/2007 03:20 PM somebody named Stefan Monnier wrote:
>> the *.el file getting it wrong (which is the kernel of the problem).
> 
> Yup, and it gets it wrong because it's in unibyte mode.

So it should be in another mode...?  If so, which?  And how to do that?


-- 
"Genius might be described as a supreme capacity for getting its
possessors into trouble of all kinds."
	-- Samuel Butler

[parent not found: <mailman.698.1173435330.7795.help-gnu-emacs@gnu.org>]

* Re: replacing characters and whacky trans-buffer conversion
       [not found]               ` <mailman.698.1173435330.7795.help-gnu-emacs@gnu.org>
@ 2007-03-09 20:34                 ` Stefan Monnier
  2007-03-09 22:00                   ` Oliver Scholz
  0 siblings, 1 reply; 34+ messages in thread
From: Stefan Monnier @ 2007-03-09 20:34 UTC (permalink / raw)
  To: help-gnu-emacs

> In both the *.el file and in the *scratch* buffer
> default-enable-multibyte-characters is t.

Good (it can't be different: it's a global var.  To see differences, you'd
have to look at enable-multibyte-characters).

> Here's the entirety of the *.el file:

Hmm... the problem is that you have bytes in your file, rather than chars.
Part of it is likely due to bugs in Emacs (it shouldn't let you get into
such a state).

Try to open your file and replace all the funny "“" or "\NNN" (3-chars) with
"\NNN" (6 chars).  Then save&reload.  I.e. turn your file into a plain
ASCII file.

To help you find all the chars you need to replace by escape sequences, you
can use  C-u C-s [^[:ascii:]]
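
A plain-ASCII version of such a file might look like the sketch below.
It is not ken's original function: the table entries and the `my-`
names are assumptions for illustration, and it uses \uXXXX character
escapes rather than octal byte escapes, so it assumes a Unicode-based
Emacs (23 or later) and a multibyte buffer holding properly decoded
text. It also loops with search-forward and replace-match, which is
what the docstring of replace-string recommends for Lisp code.

```elisp
;; Alist of (CHARACTER . REPLACEMENT).  The entries are illustrative
;; assumptions.  Writing the characters as \uXXXX escapes keeps this
;; file plain ASCII.
(defvar my-garbage-char-table
  '(("\u2014" . "--")    ; EM DASH
    ("\u2013" . "--")    ; EN DASH
    ("\u201C" . "``")    ; LEFT DOUBLE QUOTATION MARK
    ("\u201D" . "''")    ; RIGHT DOUBLE QUOTATION MARK
    ("\u2018" . "`")     ; LEFT SINGLE QUOTATION MARK
    ("\u2019" . "'"))    ; RIGHT SINGLE QUOTATION MARK
  "Alist mapping typographic characters to ASCII replacements.")

(defun my-replace-garbage-chars ()
  "Replace the characters in `my-garbage-char-table' throughout the buffer."
  (interactive)
  (save-excursion
    (dolist (pair my-garbage-char-table)
      (goto-char (point-min))
      ;; Literal, case-preserving replacement of every occurrence.
      (while (search-forward (car pair) nil t)
        (replace-match (cdr pair) t t)))))
```

Because the source file contains only ASCII, it no longer matters which
coding system Emacs picks when visiting it.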

        Stefan

* Re: replacing characters and whacky trans-buffer conversion
  2007-03-09 20:34                 ` Stefan Monnier
@ 2007-03-09 22:00                   ` Oliver Scholz
  0 siblings, 0 replies; 34+ messages in thread
From: Oliver Scholz @ 2007-03-09 22:00 UTC (permalink / raw)
  To: help-gnu-emacs

Stefan Monnier <monnier@iro.umontreal.ca> writes:

[...]
> Hmm... the problem is that you have bytes in your file, rather than chars.
> Part of it is likely due to bugs in Emacs (it shouldn't let you get into
> such a state).

Ah! I didn't see that. I remember now that Emacs tries to auto-detect
binary files. This isn't a bug.


    Oliver
-- 
19 Ventôse an 215 de la Révolution
Liberté, Egalité, Fraternité!

[parent not found: <mailman.528.1173194164.7795.help-gnu-emacs@gnu.org>]

* Re: replacing characters and whacky trans-buffer conversion
       [not found] <mailman.528.1173194164.7795.help-gnu-emacs@gnu.org>
@ 2007-03-06 16:41 ` Oliver Scholz
  2007-03-06 17:52 ` Stefan Monnier
  1 sibling, 0 replies; 34+ messages in thread
From: Oliver Scholz @ 2007-03-06 16:41 UTC (permalink / raw)
  To: help-gnu-emacs

ken <gebser@speakeasy.net> writes:

> An email comes in with this (emdash) character in it: –
>
> It looks like an em-dash until the text containing it is pasted into an
> emacs buffer; then it appears as a series of "garbage characters".
> (Copy and paste the emdash into an emacs buffer yourself, and perhaps
> you'll see what I mean.)

I am afraid you're making assumptions here that don't hold true. I
am using Emacs to read mail and news. Your message was encoded in
UTF-8 and what I see is an en-dash U+2013 (are you sure this was
supposed to be an e_m_-dash?). This does not change when I kill&yank
it into another buffer.

I assume you are using another (non-Emacs) E-Mail client. And you are
copying&pasting it via X-selection, Gnome-Clipboard, MS Windows, the
MacOS GUI or something like this? You might be interested in fixing
this instead of the retroactive fix you're trying to undertake. (I
wouldn't know how though, X-selection is a mystery to me.)

(Have you tried saving the message from your Email application instead
of c&p?)

> To me and, possibly to you, this emdash appears in emacs as nine (9)
> "garbage" characters.
>
> Because I want to programmatically replace these 9 garbage characters
> into something latin1-friendly, I copy-and-paste these nine characters
> into an *.el file containing a line like this:
>
>   (replace-string "–" "--" nil (point-min) (point-max))
                    ^^^

I assume this consisted of nine garbage characters when you wrote your
message? Because here it showed up as a valid U+2013 (EN DASH) encoded
in UTF-8. (I'd have no explanation for this though, because EN DASH in
unrecognized UTF-8 would show up as 3 garbage characters, not 9.)

> The sought string (i.e., the first argument above) isn't found, however
> because, for some whacky reason, the emdash pasted into the *.el file is
> different-- by one character-- from exactly the same emdash pasted into
> the other emacs buffer (the one I'm saving the email in).

The reason is: characters aren't bytes. Because of some bug or
missing feature you are exposed to the internal byte/octet
representation of characters in one way or the other (I'd guess the
internal representation used by the clipboard). Emacs tries to fit
this into its own internal representation; unfortunately it has two of
them: unibyte and multibyte.

> In the emacs buffer containing the email, the fourth garbage character
> (as shown by C-u C-x=) is:

[...]
> buffer code: 0x86 0xE2
>   file code: not encodable by coding system undecided-unix

This is a multibyte buffer.

> In the *.el buffer, the fourth garbage character (which should be
> exactly the same character) is:

[...]
> buffer code: 0xE2
>   file code: 0xE2 (encoded by coding system raw-text-unix)

This is a unibyte buffer. Normally el-files shouldn't be encoded in
raw-text-unix. Were you visiting this file with find-file-literally?
If not what setting has caused this?

It is not a proper solution, but if you are really keen on searching
and replacing the octets rather than fixing the character encoding,
then the proper way would be to make sure that *both* buffers are
unibyte, because that's for dealing with binaries.
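
The difference between bytes and characters can be seen directly in
Lisp. A rough sketch, assuming a Unicode-based Emacs (23 or later); the
helper `my-buffer-multibyte-p` is a made-up name:

```elisp
;; A string built only from octal byte escapes is unibyte: three raw bytes.
(multibyte-string-p "\342\200\223")   ; => nil
(length "\342\200\223")               ; => 3

;; Decoding those bytes as UTF-8 yields a single multibyte character.
(let ((s (decode-coding-string "\342\200\223" 'utf-8)))
  (list (multibyte-string-p s)
        (length s)
        (format "U+%04X" (string-to-char s))))
;; => (t 1 "U+2013")

;; Hypothetical helper: non-nil if BUFFER holds characters, nil if bytes.
(defun my-buffer-multibyte-p (&optional buffer)
  "Return non-nil if BUFFER (default: the current buffer) is multibyte."
  (buffer-local-value 'enable-multibyte-characters
                      (or buffer (current-buffer))))
```

A search string typed into a unibyte buffer is a sequence of bytes like
the first form, so it does not match the decoded character in a
multibyte buffer.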

    Oliver
-- 
16 Ventôse an 215 de la Révolution
Liberté, Egalité, Fraternité!

* Re: replacing characters and whacky trans-buffer conversion
       [not found] <mailman.528.1173194164.7795.help-gnu-emacs@gnu.org>
  2007-03-06 16:41 ` Oliver Scholz
@ 2007-03-06 17:52 ` Stefan Monnier
  1 sibling, 0 replies; 34+ messages in thread
From: Stefan Monnier @ 2007-03-06 17:52 UTC (permalink / raw)
  To: help-gnu-emacs

> It looks like an em-dash until the text containing it is pasted into an
> Emacs buffer; then it appears as a series of "garbage characters".

It may be a bug elsewhere (e.g. in the application from which you paste),
but please report this via M-x report-emacs-bug.


        Stefan
