unofficial mirror of emacs-devel@gnu.org 
* why unrmail fails with raw-text on version 22 [WAS: Re: help needed with coding systems (unrmail problems)]
From: Mark Lillibridge @ 2011-01-09  5:52 UTC
  To: emacs-devel


    Ok, I think I now understand why unrmail fails with raw-text*.
Unrmail reads and decodes BABYL files in a subtly different way than
Rmail does.  In particular, it does:

  (with-temp-buffer
    ;; Read in the old Rmail file with no decoding.
    (let ((coding-system-for-read 'raw-text))
      (insert-file-contents file))
    ;; But make it multibyte.
    (set-buffer-multibyte t)
    (setq buffer-file-coding-system 'raw-text-unix)

Not obvious, but important: with-temp-buffer creates a multibyte buffer
so that insert-file-contents is decoding from raw-text to a multibyte
buffer, producing raw 8-bit bytes for x80-xff.  The
(set-buffer-multibyte t) here is a no-op as far as I can tell as the
buffer is already multibyte at that point.


    It then decodes the middle part as Rmail does:

      (unless (and coding-system
		   (coding-system-p coding-system))
	(setq coding-system
	      ;; Emacs 21.1 and later writes RMAIL files in emacs-mule, but
	      ;; earlier versions did that with the current buffer's encoding.
	      ;; So we want to favor detection of emacs-mule (whose normal
	      ;; priority is quite low), but still allow detection of other
	      ;; encodings if emacs-mule won't fit.  The call to
	      ;; detect-coding-with-priority below achieves that.
	      (car (detect-coding-with-priority
		    from to
		    '((coding-category-emacs-mule . emacs-mule))))))
      (unless (memq coding-system
		    '(undecided undecided-unix))
	(set-buffer-modified-p t)	; avoid locking when decoding
	(let ((buffer-undo-list t))
	  (decode-coding-region from to coding-system))
	(setq coding-system last-coding-system-used))


    So, Rmail is doing read unibyte, decode, then convert to multibyte
while unrmail does read multibyte then decode.  This produces the same
results for all coding systems except raw-text*.  The reason is that
read raw-text unibyte then convert to multibyte produces a different
result than reading raw-text directly to multibyte!  The latter produces
raw bytes while the former produces code points.  Needless to say,
trying to decode from raw bytes instead of code points gives different
results.
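
    The two read orders compared above can be put side by side in a
small sketch.  The function names are hypothetical and are not part of
Rmail or unrmail; what each call returns for bytes in the x80-xff range
depends on the Emacs version, so no output is shown.

```elisp
;; Hypothetical helpers for illustration; not part of rmail.el or unrmail.el.

(defun my-read-raw-text-multibyte (file)
  "Return FILE's contents read as raw-text straight into a multibyte buffer.
This mirrors the unrmail excerpt: `with-temp-buffer' starts out multibyte."
  (with-temp-buffer
    (let ((coding-system-for-read 'raw-text))
      (insert-file-contents file))
    (buffer-string)))

(defun my-read-raw-text-unibyte-first (file)
  "Return FILE's contents read as raw-text into a unibyte buffer.
The buffer is converted to multibyte only after the read."
  (with-temp-buffer
    (set-buffer-multibyte nil)          ; make the empty buffer unibyte
    (let ((coding-system-for-read 'raw-text))
      (insert-file-contents file))
    (set-buffer-multibyte t)            ; reinterpret the bytes as multibyte text
    (buffer-string)))
```

Calling both functions on the same BABYL file and comparing the results
with `string=' is one way to see whether a given Emacs version treats
the two orders differently.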


    My testing so far shows that this problem can be fixed for version
22 by switching to the Rmail way of doing things (i.e., read unibyte and
only convert to multibyte at the end).  A more complicated solution will
be needed for version 23.  Should I produce a patch for version 22 given
that it will not work for version 23?

- Mark




* Re: why unrmail fails with raw-text on version 22 [WAS: Re: help needed with coding systems (unrmail problems)]
From: Eli Zaretskii @ 2011-01-09  8:25 UTC
  To: mark.lillibridge; +Cc: emacs-devel

> Date: Sat, 08 Jan 2011 21:52:27 -0800
> From: Mark Lillibridge <mark.lillibridge@hp.com>
> Reply-To: mark.lillibridge@hp.com
> 
>   (with-temp-buffer
>     ;; Read in the old Rmail file with no decoding.
>     (let ((coding-system-for-read 'raw-text))
>       (insert-file-contents file))
>     ;; But make it multibyte.
>     (set-buffer-multibyte t)
>     (setq buffer-file-coding-system 'raw-text-unix)
> 
> Not obvious, but important: with-temp-buffer creates a multibyte buffer
> so that insert-file-contents is decoding from raw-text to a multibyte
> buffer, producing raw 8-bit bytes for x80-xff.

But doesn't insert-file-contents make the buffer unibyte due to the
fact that raw-text is being used for decoding?

>     My testing so far shows that this problem can be fixed for version
> 22 by switching to the Rmail way of doing things (i.e., read unibyte and
> only convert to multibyte at the end).  A more complicated solution will
> be needed for version 23.  Should I produce a patch for version 22 given
> that it will not work for version 23?

Version 22 is no longer maintained, so providing patches for it would
be pointless, especially as Emacs 23 has fundamentally changed the way
raw bytes are represented and handled.

Emacs 23.3 is in pretest, so if you hurry, you could get the fix into
it (and into all the later versions).

Thanks.




* Re: why unrmail fails with raw-text on version 22 [WAS: Re: help needed with coding systems (unrmail problems)]
From: Eli Zaretskii @ 2011-01-09 10:01 UTC
  To: mark.lillibridge; +Cc: emacs-devel

> From: Eli Zaretskii <eliz@gnu.org>
> Date: Sun, 09 Jan 2011 03:25:11 -0500
> Cc: emacs-devel@gnu.org
> Reply-To: Eli Zaretskii <eliz@gnu.org>
> 
> > Not obvious, but important: with-temp-buffer creates a multibyte buffer
> > so that insert-file-contents is decoding from raw-text to a multibyte
> > buffer, producing raw 8-bit bytes for x80-xff.
> 
> But doesn't insert-file-contents make the buffer unibyte due to the
> fact that raw-text is being used for decoding?

Answering my own question here: yes, it does.  Here's the relevant
portion of insert-file-contents from the current development sources:

  if (!NILP (visit))
    {
      /* When we visit a file by raw-text, we change the buffer to
	 unibyte.  */
      if (CODING_FOR_UNIBYTE (&coding)
	  /* Can't do this if part of the buffer might be preserved.  */
	  && NILP (replace))
	/* Visiting a file with these coding system makes the buffer
	   unibyte. */
	current_buffer->enable_multibyte_characters = Qnil;
    }

The current sources of the Emacs 23.3 release branch have the same
code.

So I would expect this problem not to exist in Emacs 23 and later.  If
it does exist, then probably there's some other factor at work here.




* Re: why unrmail fails with raw-text on version 22 [WAS: Re: help needed with coding systems (unrmail problems)]
From: Mark Lillibridge @ 2011-01-09 18:41 UTC
  To: Eli Zaretskii; +Cc: emacs-devel


Eli wrote:
>  I (Mark) wrote:
>  > Not obvious, but important: with-temp-buffer creates a multibyte buffer
>  > so that insert-file-contents is decoding from raw-text to a multibyte
>  > buffer, producing raw 8-bit bytes for x80-xff.
>  
>  But doesn't insert-file-contents make the buffer unibyte due to the
>  fact that raw-text is being used for decoding?

    I looked again at the source code of unrmail, and it does not call
insert-file-contents with visit set, so the buffer stays multibyte.
If the author had set visit and left replace nil (see your excerpted
source code fragment), presumably the buffer would have been converted
to unibyte.  There would still have been a bug because the author also
had code to convert to multibyte before doing decoding.
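
    One way to check the effect of the visit argument is a sketch along
these lines.  The helper name is hypothetical, and the result for a
non-nil visit is left for the reader to observe rather than asserted
here.

```elisp
;; Hypothetical helper for illustration; not part of unrmail.
(defun my-raw-text-read-leaves-multibyte-p (file &optional visit)
  "Read FILE as raw-text into a fresh temp buffer; return its multibyteness.
VISIT is passed through as the second argument of `insert-file-contents'."
  (with-temp-buffer
    (let ((coding-system-for-read 'raw-text))
      (insert-file-contents file visit))
    ;; Buffer-local flag: non-nil means the buffer is still multibyte.
    enable-multibyte-characters))
```

Calling it with VISIT omitted corresponds to the call unrmail makes;
calling it with a non-nil VISIT exercises the branch in the excerpted
C code.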


>  Version 22 is no longer maintained, so providing patches for it would
>  be pointless, especially as Emacs 23 has fundamentally changed the way
>  raw bytes are represented and handled.

Ok.


>  Emacs 23.3 is in pretest, so if you hurry, you could get the fix into
>  it (and into all the later versions).

    Building a fix for version 23 is trickier.  We need to decode the
output of version 22's raw-text into the new buffer internal
representation.  Can we use emacs-mule for this purpose?  Is it true
that decoding an arbitrary byte stream and then writing it out via
emacs-mule (version 23) produces exactly the same byte sequence?
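
    The round-trip question can be probed empirically with a sketch
like the following.  The helper name is hypothetical, and it assumes
that both the decode and the encode step use emacs-mule; it only tests
individual inputs and does not prove the property for arbitrary bytes.

```elisp
;; Hypothetical helper for illustration only.
(defun my-emacs-mule-round-trip-p (bytes)
  "Return non-nil if unibyte string BYTES survives an emacs-mule round trip.
BYTES is decoded with emacs-mule, encoded again, and compared to the input."
  (let* ((decoded (decode-coding-string bytes 'emacs-mule))
         (encoded (encode-coding-string decoded 'emacs-mule)))
    (string= bytes encoded)))
```

Running it over the raw contents of a few real BABYL files would show
whether any byte sequences fail to survive.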

- Mark


