Emacs text bug

unofficial mirror of help-gnu-emacs@gnu.org

* Emacs text bug
@ 2013-01-26 20:23 drain
  2013-01-26 22:26 ` Peter Dyballa
  0 siblings, 1 reply; 15+ messages in thread
From: drain @ 2013-01-26 20:23 UTC (permalink / raw)
  To: Help-gnu-emacs

Before I report this as a bug, I want to make sure it doesn't already have
a solution:

All of the "-" characters have been replaced with "\342\200\224" (which
has a different face and cannot be replaced with replace-string).

--
View this message in context: http://emacs.1067599.n5.nabble.com/Emacs-text-bug-tp276577.html
Sent from the Emacs - Help mailing list archive at Nabble.com.


* Re: Emacs text bug
  2013-01-26 20:23 Emacs text bug drain
@ 2013-01-26 22:26 ` Peter Dyballa
  2013-01-26 22:43   ` drain
  2013-01-26 22:48   ` Drew Adams
  0 siblings, 2 replies; 15+ messages in thread
From: Peter Dyballa @ 2013-01-26 22:26 UTC (permalink / raw)
  To: drain; +Cc: Help-gnu-emacs

On 26.01.2013 at 21:23, drain wrote:

> All of the "-" characters have been replaced with "\342\200\224" (which
> has a different face and cannot be replaced with replace-string).

Because the encoding of the buffer has changed? I can see similar things in one specific user's GNU Emacs. In *compilation* buffers the curly quotes are turned into their byte triplets, and in dired buffers the "ä" in März, the German name for March, is also sometimes lost. But why and when does this happen? Without this knowledge it's kind of senseless to report…

--
Greetings

  Pete

The best way to accelerate a PC is 9.8 m/s²


* Re: Emacs text bug
  2013-01-26 22:26 ` Peter Dyballa
@ 2013-01-26 22:43   ` drain
  2013-01-26 22:59     ` Peter Dyballa
  2013-01-26 22:48   ` Drew Adams
  1 sibling, 1 reply; 15+ messages in thread
From: drain @ 2013-01-26 22:43 UTC (permalink / raw)
  To: Help-gnu-emacs

Perhaps the encoding did change. I recall copy / pasting a bunch of text
from a book online into the buffer, and somewhere along the way I might
have blindly changed the setting.

Which encoding system supports the "—" character?

--
View this message in context: http://emacs.1067599.n5.nabble.com/Emacs-text-bug-tp276577p276587.html
Sent from the Emacs - Help mailing list archive at Nabble.com.


* Re: Emacs text bug
  2013-01-26 22:43   ` drain
@ 2013-01-26 22:59     ` Peter Dyballa
  2013-01-26 23:23       ` drain
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Dyballa @ 2013-01-26 22:59 UTC (permalink / raw)
  To: drain; +Cc: Help-gnu-emacs


On 26.01.2013 at 23:43, drain wrote:

> Which encoding system supports the "—" character?

You showed before that three bytes were used for the EM DASH's encoding, so it was done in UTF-8. (This character can also be encoded in CP125[0-2], but then as 1 byte only. ISO 8859-1 has no EM DASH.)
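
To make the byte counts concrete, here is a small Python sketch (Python rather than Emacs, purely for illustration). It encodes U+2014 EM DASH and prints the UTF-8 bytes as the backslash-octal escapes seen in the buffer. The variable names are only illustrative.

```python
# Encode U+2014 EM DASH under several codecs and inspect the bytes.
EM_DASH = "\u2014"

utf8_bytes = EM_DASH.encode("utf-8")
# Undecoded bytes are commonly displayed as backslash-octal escapes.
octal_escapes = "".join(f"\\{b:03o}" for b in utf8_bytes)

# CP1252 has a single-byte code for the same character.
cp1252_bytes = EM_DASH.encode("cp1252")

# ISO 8859-1 has no code for it, so strict encoding raises an error.
try:
    EM_DASH.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(octal_escapes)       # \342\200\224
print(cp1252_bytes.hex())  # 97
print(latin1_ok)           # False
```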

--
Greetings

  Pete

Chicago, n.:
        Where the dead still vote … early and often!





* Re: Emacs text bug
  2013-01-26 22:59     ` Peter Dyballa
@ 2013-01-26 23:23       ` drain
  2013-01-26 23:29         ` Peter Dyballa
  0 siblings, 1 reply; 15+ messages in thread
From: drain @ 2013-01-26 23:23 UTC (permalink / raw)
  To: Help-gnu-emacs

That was a bit tricky. The local buffer setting was "raw text", and I had
to change it to UTF-8. But the strings of codes were not automatically
converted (which would have been nice); I had to copy / paste the text into
the buffer again.

Is there a way to reload these characters once the encoding is changed? I
might have a few buffers like this, and it would save me copy / pasting
texts again. replace-string modus operandi would even work for me.

--
View this message in context: http://emacs.1067599.n5.nabble.com/Emacs-text-bug-tp276577p276591.html
Sent from the Emacs - Help mailing list archive at Nabble.com.


* Re: Emacs text bug
  2013-01-26 23:23       ` drain
@ 2013-01-26 23:29         ` Peter Dyballa
  2013-01-31 17:55           ` drain
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Dyballa @ 2013-01-26 23:29 UTC (permalink / raw)
  To: drain; +Cc: Help-gnu-emacs


On 27.01.2013 at 00:23, drain wrote:

> Is there a way to reload these characters once the encoding is changed?

Yes: revert-buffer-with-coding-system or C-x RET r <encoding> RET
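
Conceptually, reverting with a coding system re-reads the same bytes from disk and decodes them differently; the file itself is not changed. A rough Python sketch of that idea follows. It assumes a throwaway file, uses latin-1 as a stand-in for a one-character-per-byte view like raw-text, and the sample text is hypothetical.

```python
import os
import tempfile

# Hypothetical sample text; the em dash is stored as three UTF-8 bytes.
text = "wait \u2014 what?"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.org")
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))

    # First read: one character per byte (latin-1 stands in for raw-text).
    with open(path, encoding="latin-1") as f:
        per_byte = f.read()

    # "Revert" with the right coding system: same bytes, decoded as UTF-8.
    with open(path, encoding="utf-8") as f:
        as_utf8 = f.read()

# The per-byte view is two characters longer: the dash shows as three bytes.
print(len(per_byte), len(as_utf8))
print(as_utf8 == text)
```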

--
Greetings

  Pete

Work is the curse of the drinking class.
				– Oscar Wilde





* Re: Emacs text bug
  2013-01-26 23:29         ` Peter Dyballa
@ 2013-01-31 17:55           ` drain
  2013-01-31 18:36             ` Doug Lewan
  2013-01-31 18:52             ` Eli Zaretskii
  0 siblings, 2 replies; 15+ messages in thread
From: drain @ 2013-01-31 17:55 UTC (permalink / raw)
  To: Help-gnu-emacs

Still problems.

(1) revert-buffer-with-coding-system RET
(2) utf-8 RET
(3) "Revert buffer from file[...]" y RET
(4) [characters appear as they should now]
(5) [make change so I can save]
(6) save-buffer
(7) "Select coding system (default raw-text)" utf-8
(8) "wrote buffer [...]"
(9) kill-buffer RET foo.org RET
(10) find-file foo.org RET, sees it's back to raw-text, not utf-8, with
     characters mangled.



--
View this message in context: http://emacs.1067599.n5.nabble.com/Emacs-text-bug-tp276577p276925.html
Sent from the Emacs - Help mailing list archive at Nabble.com.




* RE: Emacs text bug
  2013-01-31 17:55           ` drain
@ 2013-01-31 18:36             ` Doug Lewan
  2013-01-31 18:45               ` drain
  2013-01-31 18:52             ` Eli Zaretskii
  1 sibling, 1 reply; 15+ messages in thread
From: Doug Lewan @ 2013-01-31 18:36 UTC (permalink / raw)
  To: drain, Help-gnu-emacs@gnu.org

> (9) kill-buffer RET foo.org RET
> (10) find-file foo.org RET, sees it's back to raw-text, not utf-8, with
>      characters mangled.

I think that's what you should expect. Once you kill the buffer, Emacs forgets all about the file that it had held.

Apparently Emacs can't figure out that the file is UTF-8. You'll need to provide a hint. `-*- coding: utf-8 -*-' in the first line is one way. You'll find more in the Emacs Info manual, node `Coding Systems'.
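
As a rough illustration of what such a hint looks like to a program reading the first line, here is a Python sketch. The regular expression is a simplification and not Emacs's actual parser, and `find_coding_cookie` is a hypothetical helper name. In an Org file the cookie would normally sit in a comment line starting with "# ".

```python
import re

# Simplified pattern for a "-*- ... coding: NAME ... -*-" cookie.
COOKIE = re.compile(r"-\*-.*?\bcoding:\s*([\w.-]+).*?-\*-")

def find_coding_cookie(first_line):
    """Return the coding system named in the cookie, or None if absent."""
    match = COOKIE.search(first_line)
    return match.group(1) if match else None

print(find_coding_cookie("# -*- coding: utf-8 -*-"))             # utf-8
print(find_coding_cookie("# -*- mode: org; coding: utf-8 -*-"))  # utf-8
print(find_coding_cookie("* An ordinary Org heading"))           # None
```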

I hope this helps.

,Douglas
Douglas Lewan
Shubert Ticketing
(201) 489-8600 ext 224

When I do good, I feel good. When I do bad, I feel bad and that's my religion. - Abraham Lincoln





* RE: Emacs text bug
  2013-01-31 18:36             ` Doug Lewan
@ 2013-01-31 18:45               ` drain
  2013-01-31 19:08                 ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: drain @ 2013-01-31 18:45 UTC (permalink / raw)
  To: Help-gnu-emacs

Doug Lewan wrote
> You'll need to provide a hint. `-*- coding: utf-8 -*-' in the first line
> is one way. 

That appears to have worked. A bit ugly having that instruction at the top,
but better than manually reverting the buffer every single time.




--
View this message in context: http://emacs.1067599.n5.nabble.com/Emacs-text-bug-tp276577p276937.html
Sent from the Emacs - Help mailing list archive at Nabble.com.




* Re: Emacs text bug
  2013-01-31 18:45               ` drain
@ 2013-01-31 19:08                 ` Eli Zaretskii
  0 siblings, 0 replies; 15+ messages in thread
From: Eli Zaretskii @ 2013-01-31 19:08 UTC (permalink / raw)
  To: Help-gnu-emacs

> Date: Thu, 31 Jan 2013 10:45:31 -0800 (PST)
> From: drain <aeuster@gmail.com>
> 
> Doug Lewan wrote
> > You'll need to provide a hint. `-*- coding: utf-8 -*-' in the first line
> > is one way. 
> 
> That appears to have worked. A bit ugly having that instruction at the top,
> but better than manually reverting the buffer every single time.

You shouldn't need that.  You need to clean up your file instead.




* Re: Emacs text bug
  2013-01-31 17:55           ` drain
  2013-01-31 18:36             ` Doug Lewan
@ 2013-01-31 18:52             ` Eli Zaretskii
  2013-01-31 19:28               ` drain
  1 sibling, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2013-01-31 18:52 UTC (permalink / raw)
  To: Help-gnu-emacs

> Date: Thu, 31 Jan 2013 09:55:52 -0800 (PST)
> From: drain <aeuster@gmail.com>
> 
> Still problems.
> 
> (1) revert-buffer-with-coding system RET
> (2) utf-8 RET
> (3) "Revert buffer from file[...]" y RET
> (4) [characters appear as they should now]
> (5) [make change so I can save]
> (6) save-buffer
> (7) "Select coding system (default raw-text)" utf-8
> (8) "wrote buffer [...]"
> (9) kill-buffer RET foo.org RET
> (10) find-file foo.org RET, sees it's back to raw-text, not utf-8, with
>      characters mangled.

Evidently, you have in that file bytes that are not valid UTF-8
sequences.  You need to fix them (the "Select coding system ..."
prompt tells you which characters cannot be encoded in UTF-8 -- those
are the ones you need to fix.).
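
Outside Emacs, one way to locate such bytes is to decode strictly and record where decoding fails. The Python sketch below does that. `invalid_utf8_spans` is a hypothetical helper, and the sample data (valid UTF-8 plus one stray CP1252 byte) is an assumption for illustration, not the contents of the actual file.

```python
def invalid_utf8_spans(data):
    """Return (start, end) byte offsets of sequences that are not valid UTF-8."""
    spans = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # everything from pos onward decodes cleanly
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice being decoded.
            spans.append((pos + err.start, pos + err.end))
            pos += err.end
    return spans

# Hypothetical sample: valid UTF-8 followed by a stray CP1252 em dash (0x97).
sample = "ok \u2014 ".encode("utf-8") + b"\x97" + b" end"
print(invalid_utf8_spans(sample))  # [(7, 8)]
```

The reported offsets are byte positions in the file, so they can be used to jump to the offending spot and retype or delete the character there.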




* Re: Emacs text bug
  2013-01-31 18:52             ` Eli Zaretskii
@ 2013-01-31 19:28               ` drain
  2013-01-31 20:04                 ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: drain @ 2013-01-31 19:28 UTC (permalink / raw)
  To: Help-gnu-emacs

Now I see. This problem must have started when I copied an early 19th
century letter into the buffer, and the characters did not transliterate
properly into modern English. Whatever those characters were, they turned
into circumflexed /a/ (â), the pound sign (£), and a (special) right double
quotation mark (”). utf-8 apparently cannot handle these.

But why would this prevent utf-8 from encoding the rest of the buffer? Why
not just leave those three characters mangled, and display the rest
properly? It reverted fine; it just would not stay in utf-8 unless I (1)
put the instruction at the top of the buffer or (2) deleted those special
characters. So the functionality appears to be there: Emacs just would not
accept it as a saved state (absent instruction at the top).

Somehow that buffer got stuck with a limited encoding system. I'm composing
this message right now in a "scratch.org" buffer which is using utf-8-unix
-- and apparently handles those three characters fine (consequently I'm
switching the problem file from utf-8 to utf-8-unix).

Anyway, glad to get that sorted.

--
View this message in context: http://emacs.1067599.n5.nabble.com/Emacs-text-bug-tp276577p276942.html
Sent from the Emacs - Help mailing list archive at Nabble.com.


* Re: Emacs text bug
  2013-01-31 19:28               ` drain
@ 2013-01-31 20:04                 ` Eli Zaretskii
  0 siblings, 0 replies; 15+ messages in thread
From: Eli Zaretskii @ 2013-01-31 20:04 UTC (permalink / raw)
  To: Help-gnu-emacs

> Date: Thu, 31 Jan 2013 11:28:47 -0800 (PST)
> From: drain <aeuster@gmail.com>
> 
> Now I see. This problem must have started when I copied an early 19th
> century letter into the buffer, and the characters did not transliterate
> properly into modern English. Whatever those characters were, they turned
> into circumflexed /a/ (â), the pound sign (£), and a (special) right double
> quotation mark (”). utf-8 apparently cannot handle these.

UTF-8 certainly _can_ handle them.  I suspect that these characters
got copied as raw bytes instead.

> But why would this prevent utf-8 from encoding the rest of the buffer? Why
> not just leave those three characters mangled, and display the rest
> properly? It reverted fine; it just would not stay in utf-8 unless I (1)
> put the instruction at the top of the buffer or (2) deleted those special
> characters. So the functionality appears to be there: Emacs just would not
> accept it as a saved state (absent instruction at the top).

Emacs auto-detects the encoding each time you visit a file, unless
either the file (by the 'coding:' cookie) or you (by using "C-x RET c")
tell it exactly how to decode the file.
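
A toy Python model of that rule may help explain why a few stray bytes affect the whole file. It is only a sketch under strong assumptions: the real detection in Emacs weighs a priority list of coding systems and is considerably more involved, and `guess_coding` and `visit` are hypothetical names.

```python
def guess_coding(data):
    """Toy stand-in for auto-detection: strict UTF-8, else fall back."""
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "raw-text"

def visit(data, cookie=None):
    """An explicit instruction wins; otherwise guess on every visit."""
    return cookie or guess_coding(data)

clean = "caf\u00e9 \u2014 ok".encode("utf-8")
dirty = clean + b"\x97"  # one stray byte, hypothetical

print(visit(clean))                  # utf-8
print(visit(dirty))                  # raw-text
print(visit(dirty, cookie="utf-8"))  # utf-8
```

In this model the guess is made for the file as a whole, so one invalid sequence is enough to tip it away from UTF-8, while an explicit instruction bypasses the guess entirely.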





* RE: Emacs text bug
  2013-01-26 22:26 ` Peter Dyballa
  2013-01-26 22:43   ` drain
@ 2013-01-26 22:48   ` Drew Adams
  2013-01-26 23:26     ` Peter Dyballa
  1 sibling, 1 reply; 15+ messages in thread
From: Drew Adams @ 2013-01-26 22:48 UTC (permalink / raw)
  To: 'Peter Dyballa', 'drain'; +Cc: Help-gnu-emacs

> But why and when does this happen? Without this knowledge
> it's kind of senseless to report.

I disagree with that claim.

While it is always better to base a bug report on more information, even just
reporting a problem can sometimes help.  At the very least it gives Emacs core
developers and other users a heads-up to look further wrt the problem and its
details (e.g. "why and when").

That's already happening, because the OP posted here, thanks to your reply and
his followup wrt encoding.

Staying in one's corner because one does not have all the info or understanding
is too often a brake on progress.

Not every user has the motivation or the means, including time, to dig deeper
and investigate a problem encountered, to determine the why & when.  Just
communicating that there seems to be a problem, even if one is not sure, is a
good start.

There is no way that Emacs developers can completely test every change they
make.  Users reporting questions and perceived problems are indispensable to
getting it right.

IMHO, it is better for users, especially new users or those who feel unsure, to
err on the side of reporting too much than too little.  It is definitely _not_
the case, IMO, that "it's kind of senseless to report" without knowledge of the
why & when.

The OP brought up the question here first, before reporting, in order to ask
whether he was missing something.  That's a good thing.  If the replies here
ultimately suggest that "it doesn't already have a solution", then I, for one,
encourage a bug report.


* Re: Emacs text bug
  2013-01-26 22:48   ` Drew Adams
@ 2013-01-26 23:26     ` Peter Dyballa
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Dyballa @ 2013-01-26 23:26 UTC (permalink / raw)
  To: Drew Adams; +Cc: 'drain', Help-gnu-emacs

On 26.01.2013 at 23:48, Drew Adams wrote:

> While it is always better to base a bug report on more information, even just
> reporting a problem can sometimes help.  At the very least it gives Emacs core
> developers and other users a heads-up to look further wrt the problem and its
> details (e.g. "why and when").

As far as I can see, this happens rarely. Just some days ago it happened again, and I was on the spot very soon. C-h l did not show anything. While the compilation was still going on and the mode line showed UTF-8 encoding, I tried to fix the way the buffer contents were presented by invoking revert-buffer-with-coding-system, C-x RET r, but it did not change anything. All other buffers (that I visited) containing non-US-ASCII characters showed the same fault: the UTF-8 encoding bytes were displayed.

This could be a Mac OS X problem. Here I can see that 'find … -ls' inserts ASCII NULs, ^@, into *shell* buffer at the transition from the column with the file size to the next one, the one with the date. Or it happens between the date column and the file name column – I am not completely sure about it. Something like these extra characters or bytes could be inserted into the *compilation* buffer as well and then the binary byte sequence gets out of sequence and order. But why does it hit all buffers and not only the faulty one with the extraneous bytes?

There seems to be one more indication: the hardware is PowerPC, 32-bit. The Mac OS X version is also close to ancient: Mac OS X 10.4 or 10.5 (Tiger or Leopard). On Intel hardware it has not occurred yet…

--
Greetings

  Pete

A blizzard is when it snows sideways.

