unofficial mirror of help-gnu-emacs@gnu.org
* changing encoding of buffer
@ 2007-05-28 10:34 M G Berberich
  2007-05-28 20:21 ` Eli Zaretskii
       [not found] ` <mailman.1365.1180383723.32220.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 8+ messages in thread
From: M G Berberich @ 2007-05-28 10:34 UTC
  To: help-gnu-emacs

Hello,

I have a newsreader that conforms to the standard and assumes ansi
encoding if none is declared. Microsoft doesn't care about standards, so
MS-products produce postings that are encoded in windows-125* without
a declaration. A real problem in non-English-speaking countries.

When I reply, my newsreader appends my UTF-8 signature and starts
Emacs. Emacs reads the file and sets the coding-system to latin1,
displaying the posting correctly but garbling my signature and,
worse, saving it as latin1 while the newsreader expects UTF-8.

I wrote this function to solve the problem:

(defun fix-ms-posting ()
  "Fixes newsposting that are garbled up by Microsoft-Software"
  (interactive)
  (let ((coding-system-for-write 'raw-text)
	(coding-system-for-read 'utf-8)
	(end (progn (end-of-buffer) (search-backward "\n-- \n"))))
    (revert-buffer-with-coding-system 'utf-8)
    (set-buffer-file-coding-system 'utf-8)
    (shell-command-on-region (point-min) end 
			     "recode windows-1252..utf-8" nil t)))

- Is there really no other way to change the encoding of the buffer
  than doing a revert? Can't this be done in-place?

- revert-buffer-with-coding-system always asks if it should do so, can
  this be switched off?

- I moved the search-backward to the variables list of let to make it
  fail before harm is done if there is no signature. Is this the way
  to do it?

MfG
bmg

-- 
„Des is völlig wurscht, was heut beschlos- | M G Berberich
 sen wird: I bin sowieso dagegn!“          | berberic@fmi.uni-passau.de
(SPD-Stadtrat Kurt Schindler; Regensburg)  | www.fmi.uni-passau.de/~berberic

* Re: changing encoding of buffer
  2007-05-28 10:34 changing encoding of buffer M G Berberich
@ 2007-05-28 20:21 ` Eli Zaretskii
       [not found] ` <mailman.1365.1180383723.32220.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 8+ messages in thread
From: Eli Zaretskii @ 2007-05-28 20:21 UTC
  To: help-gnu-emacs

> From: M G Berberich <berberic@forwiss.uni-passau.de>
> Date: Mon, 28 May 2007 12:34:47 +0200
> 
> When I reply, my newsreader appends my UTF-8 signature and starts
> emacs.

If you append UTF-8 text unconditionally, I think you are guilty no
less than MS: no one said that arbitrary encoded text can be freely
mixed with UTF-8.  Suppose those unnamed "MS-products" did announce
they produce text in windows-1252, how would that help you avoid the
problem?

> I wrote this function to solve the problem:
> 
> (defun fix-ms-posting ()
>   "Fixes newsposting that are garbled up by Microsoft-Software"
>   (interactive)
>   (let ((coding-system-for-write 'raw-text)
> 	(coding-system-for-read 'utf-8)
> 	(end (progn (end-of-buffer) (search-backward "\n-- \n"))))
>     (revert-buffer-with-coding-system 'utf-8)
>     (set-buffer-file-coding-system 'utf-8)
>     (shell-command-on-region (point-min) end 
> 			     "recode windows-1252..utf-8" nil t)))

I see no need to call `recode': Emacs can do that itself.

> - Is there really no other way to change the encoding of the buffer
>   than doing a revert? Can't this be done in-place?

No, it cannot be done in-place, because by the time you look at the
text in the buffer, it was already converted (a.k.a. "decoded") from
the external encoding on the disk file to the internal representation
Emacs uses in buffers and strings.  The original byte stream is gone,
vanished without a trace.

> - revert-buffer-with-coding-system always asks if it should do so, can
>   this be switched off?

Currently, the only practical way is to define a revert-buffer-function
that simply invokes revert-buffer with its NOCONFIRM arg non-nil.
(Don't forget to unbind revert-buffer-function before calling
revert-buffer, to avoid infinite recursion!)
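
A possible shortcut, sketched under the assumption that revert-buffer
honors a let-bound coding-system-for-read (which is how
revert-buffer-with-coding-system appears to work): bind the coding
system yourself and pass NOCONFIRM directly.  The name
my-revert-buffer-as is made up for this example.

```elisp
(defun my-revert-buffer-as (coding-system)
  "Re-read the visited file, decoding it as CODING-SYSTEM, without asking.
Hypothetical helper; assumes `revert-buffer' honors a let-bound
`coding-system-for-read'."
  (interactive "zCoding system: ")
  (let ((coding-system-for-read coding-system))
    ;; First argument, IGNORE-AUTO = t: do not offer the auto-save file.
    ;; Second argument, NOCONFIRM = t: skip the confirmation prompt.
    (revert-buffer t t)))
```

Note that this silently discards unsaved changes in the buffer, so it
is only suitable when the buffer has not been edited yet.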

> - I moved the search-backward to the variables list of let to make it
>   fail before harm is done if there is no signature. Is this the way
>   to do it?

Sorry, I don't understand what you want to do, or why that is a
problem.
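
If the goal is only to abort before the buffer is touched when there is
no signature separator, one way to make that explicit is a small helper
that either returns the separator position or signals an error.  This
is a sketch; my-signature-start is a hypothetical name, and it uses
(goto-char (point-max)) because end-of-buffer is meant for interactive
use.

```elisp
(defun my-signature-start ()
  "Return the position of the last \"\\n-- \\n\" separator in the buffer.
Signal an error if there is none.  Point is left where it was.
Hypothetical helper written for illustration."
  (save-excursion
    (goto-char (point-max))
    ;; With NOERROR = t, `search-backward' returns nil instead of
    ;; signaling, so the failure message can be chosen here.
    (or (search-backward "\n-- \n" nil t)
        (user-error "No signature separator found"))))
```

Calling it first, for example in the variable list of a let, means
nothing else runs when the separator is missing.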

* Re: changing encoding of buffer
       [not found] ` <mailman.1365.1180383723.32220.help-gnu-emacs@gnu.org>
@ 2007-05-29 12:39   ` M G Berberich
  2007-05-29 19:52     ` Eli Zaretskii
       [not found]     ` <mailman.1402.1180468336.32220.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 8+ messages in thread
From: M G Berberich @ 2007-05-29 12:39 UTC
  To: help-gnu-emacs

On Mon, 28 May 2007 23:21:34 +0300, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: M G Berberich <berberic@forwiss.uni-passau.de>
>> Date: Mon, 28 May 2007 12:34:47 +0200
>> 
>> When I reply, my newsreader appends my UTF-8 signature and starts
>> emacs.
>
> If you append UTF-8 text unconditionally, I think you are guilty no
> less than MS: no one said that arbitrary encoded text can be freely
> mixed with UTF-8.  Suppose those unnamed "MS-products" did announce
> they produce text in windows-1252, how would that help you avoid the
> problem?

Then my newsreader would convert the text from windows-1252 to UTF-8
(which is my locale environment) before calling emacs and all would
work fine. The problem is that the encoding is not declared, so the
newsreader does not know that it is windows-1252 and not koi8-r or
EBCDIC or latin-2 or …

>> I wrote this function to solve the problem:
>> 
>> (defun fix-ms-posting ()
>>   "Fixes newsposting that are garbled up by Microsoft-Software"
>>   (interactive)
>>   (let ((coding-system-for-write 'raw-text)
>> 	(coding-system-for-read 'utf-8)
>> 	(end (progn (end-of-buffer) (search-backward "\n-- \n"))))
>>     (revert-buffer-with-coding-system 'utf-8)
>>     (set-buffer-file-coding-system 'utf-8)
>>     (shell-command-on-region (point-min) end 
>> 			     "recode windows-1252..utf-8" nil t)))
>
> I see no need to call `recode': Emacs can do that itself.

Fine, how can this be done?

      MfG
      bmg
-- 
Artikel 3 und 12a des Grundgesetzes  kurz  | M G Berberich
zusammengefaßt: Männer  und  Frauen  sind  | berberic@fmi.uni-passau.de
gleichberechtigt, ausgenommen Männer, die  |
müssen zur Bundeswehr.                     | http://www.uni-passau.de/~berberic

* Re: changing encoding of buffer
  2007-05-29 12:39   ` M G Berberich
@ 2007-05-29 19:52     ` Eli Zaretskii
       [not found]     ` <mailman.1402.1180468336.32220.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 8+ messages in thread
From: Eli Zaretskii @ 2007-05-29 19:52 UTC
  To: help-gnu-emacs

> From: M G Berberich <berberic@forwiss.uni-passau.de>
> Date: Tue, 29 May 2007 14:39:20 +0200
> 
> > Suppose those unnamed "MS-products" did announce they produce text
> > in windows-1252, how would that help you avoid the problem?
> 
> Then my newsreader would convert the text from windows-1252 to UTF-8
> (which is my locale environment) before calling emacs and all would
> work fine.

And your newsreader cannot be told to assume windows-1252 when the
encoding is not stated, as your fix-ms-posting does?

(Btw, did you use that newsreader to post your article?  If so, it
also lies about the encoding: it claimed the message was in Latin-9
(iso-8859-15), when in fact it was in UTF-8.)

> The problem is that the encoding is not declared, so the
> newsreader does not know that it is windows-1252 and not koi8-r or
> EBCDIC or latin-2 or …

In my experience, when the encoding is not stated, or stated as
Latin-1, it is windows-1252.  I have yet to see a koi8-r encoded
message that doesn't say it, but I guess anything could happen.

> > I see no need to call `recode': Emacs can do that itself.
> 
> Fine, how can this be done?

Use encode-coding-region to encode the region in windows-1252, then
use decode-coding-region to decode it back as UTF-8.  That's it!

* Re: changing encoding of buffer
       [not found]     ` <mailman.1402.1180468336.32220.help-gnu-emacs@gnu.org>
@ 2007-05-29 21:06       ` M G Berberich
  2007-05-30  3:22         ` Eli Zaretskii
       [not found]         ` <mailman.1416.1180495357.32220.help-gnu-emacs@gnu.org>
  2007-05-31 18:19       ` Giorgos Keramidas
  1 sibling, 2 replies; 8+ messages in thread
From: M G Berberich @ 2007-05-29 21:06 UTC
  To: help-gnu-emacs

On Tue, 29 May 2007 22:52:11 +0300, Eli Zaretskii wrote:
>> From: M G Berberich <berberic@forwiss.uni-passau.de>
>> Date: Tue, 29 May 2007 14:39:20 +0200
>> 
>> > Suppose those unnamed "MS-products" did announce they produce text
>> > in windows-1252, how would that help you avoid the problem?
>> 
>> Then my newsreader would convert the text from windows-1252 to UTF-8
>> (which is my locale environment) before calling emacs and all would
>> work fine.
>
> And your newsreader cannot be told to assume windows-1252 when the
> encoding is not stated, as your fix-ms-posting does?

Unfortunately not, it sticks to the standard and that says: No
declaration => ansi.

> (Btw, did you use that newsreader to post your article?  

No. This was the newsreader at work. 

> If so, it also lies about the encoding: it claimed the message was
> in Latin-9 (iso-8859-15), when in fact it was in UTF-8.)

You are right. It was definitely misconfigured. And yes, it's my
fault :(

>> > I see no need to call `recode': Emacs can do that itself.
>> 
>> Fine, how can this be done?
>
> Use encode-coding-region to encode the region in windows-1252, then
> use decode-coding-region to decode it back as UTF-8.  That's it!

Thanks. It has to be:

    (encode-coding-region (point-min) end 'utf-8)
    (decode-coding-region (point-min) end 'windows-1252)

to work for me.

   MfG
   bmg
-- 
„Des is völlig wurscht, was heut beschlos- | M G Berberich
 sen wird: I bin sowieso dagegn!“          | berberic@fmi.uni-passau.de
(SPD-Stadtrat Kurt Schindler; Regensburg)  | www.fmi.uni-passau.de/~berberic

* Re: changing encoding of buffer
  2007-05-29 21:06       ` M G Berberich
@ 2007-05-30  3:22         ` Eli Zaretskii
       [not found]         ` <mailman.1416.1180495357.32220.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 8+ messages in thread
From: Eli Zaretskii @ 2007-05-30  3:22 UTC
  To: help-gnu-emacs

> From: M G Berberich <berberic@forwiss.uni-passau.de>
> Date: Tue, 29 May 2007 23:06:48 +0200
> 
> > Use encode-coding-region to encode the region in windows-1252, then
> > use decode-coding-region to decode it back as UTF-8.  That's it!
> 
> Thanks. It has to be:
> 
>     (encode-coding-region (point-min) end 'utf-8)
>     (decode-coding-region (point-min) end 'windows-1252)
> 
> to work for me.

That's strange: the function you showed that used recode did it the
other way around, no?

* Re: changing encoding of buffer
       [not found]         ` <mailman.1416.1180495357.32220.help-gnu-emacs@gnu.org>
@ 2007-05-30  7:35           ` M G Berberich
  0 siblings, 0 replies; 8+ messages in thread
From: M G Berberich @ 2007-05-30  7:35 UTC
  To: help-gnu-emacs

On Wed, 30 May 2007 06:22:29 +0300, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: M G Berberich <berberic@forwiss.uni-passau.de>
>> Date: Tue, 29 May 2007 23:06:48 +0200
>> 
>> > Use encode-coding-region to encode the region in windows-1252, then
>> > use decode-coding-region to decode it back as UTF-8.  That's it!
>> 
>> Thanks. It has to be:
>> 
>>     (encode-coding-region (point-min) end 'utf-8)
>>     (decode-coding-region (point-min) end 'windows-1252)
>> 
>> to work for me.
>
> That's strange: the function you showed that used recode did it the
> other way around, no?

No. It works this way.

  Given: text in windows-1252-encoding in a utf-8-buffer
  Result: same text in UTF-8-encoding in a utf-8-buffer

Don't ask me why.
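
A likely explanation, offered as a sketch rather than a definitive
account: after the revert, the windows-1252 bytes have been decoded as
UTF-8, so the buffer holds the result of a wrong decoding.  Encoding
with the coding system that was wrongly assumed (utf-8) recovers the
original bytes, and decoding with the real one (windows-1252) then
yields the intended characters.  The helper below wraps the two calls;
its name and argument names are invented for illustration, and it
narrows to the region because encoding can change the text length and
make a saved end position stale.

```elisp
(defun my-redecode-region (start end assumed-coding actual-coding)
  "Re-interpret the text between START and END.
The text was decoded as ASSUMED-CODING, but its bytes were really
written in ACTUAL-CODING.  Hypothetical helper for illustration."
  (save-restriction
    (narrow-to-region start end)
    ;; Step 1: undo the wrong decoding, recovering the original bytes.
    (encode-coding-region (point-min) (point-max) assumed-coding)
    ;; Step 2: decode those bytes with the encoding they were written in.
    (decode-coding-region (point-min) (point-max) actual-coding)))

;; For the case in this thread (END being the signature position):
;;   (my-redecode-region (point-min) end 'utf-8 'windows-1252)
```

Emacs also provides an interactive command, recode-region, intended for
this kind of repair.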

MfG
bmg

-- 
Artikel 3 und 12a des Grundgesetzes  kurz  | M G Berberich
zusammengefaßt: Männer  und  Frauen  sind  | berberic@fmi.uni-passau.de
gleichberechtigt, ausgenommen Männer, die  |
müssen zur Bundeswehr.                     | http://www.uni-passau.de/~berberic

* Re: changing encoding of buffer
       [not found]     ` <mailman.1402.1180468336.32220.help-gnu-emacs@gnu.org>
  2007-05-29 21:06       ` M G Berberich
@ 2007-05-31 18:19       ` Giorgos Keramidas
  1 sibling, 0 replies; 8+ messages in thread
From: Giorgos Keramidas @ 2007-05-31 18:19 UTC
  To: help-gnu-emacs

On Tue, 29 May 2007 22:52:11 +0300, Eli Zaretskii <eliz@gnu.org> wrote:
> In my experience, when the encoding is not stated, or stated as
> Latin-1, it is windows-1252.

That's true.  At least most of the time.

In Greek-speaking user groups and mailing lists, it's also common
to see messages which contain ISO 8859-7 text, but have their
'charset=' set to ISO 8859-1.

Most of the 'old timers' complain loudly for a while, the user
fixes his mailer, and then a few weeks pass until the next
newcomer hits the list with another misconfigured web-based mail
UI, or a misconfigured MUA.  Oh well... :-)
