From: Mark Lillibridge <mark.lillibridge@hp.com>
To: emacs-devel@gnu.org
Subject: why unrmail fails with raw-text on version 22 [WAS: Re: help needed with coding systems (unrmail problems)]
Date: Sat, 08 Jan 2011 21:52:27 -0800
Message-ID: <qmh7hee8vz8.fsf@hp.com>
In-Reply-To: <qmhd3o780jy.fsf@hp.com> (message from Mark Lillibridge on Sat, 8 Jan 2011 14:58:57 -0800)


    Ok, I think I now understand why unrmail fails with raw-text*.
Unrmail reads and decodes BABYL files in a subtly different way than
Rmail does.  In particular, it does:

  (with-temp-buffer
    ;; Read in the old Rmail file with no decoding.
    (let ((coding-system-for-read 'raw-text))
      (insert-file-contents file))
    ;; But make it multibyte.
    (set-buffer-multibyte t)
    (setq buffer-file-coding-system 'raw-text-unix)

Not obvious, but important: with-temp-buffer creates a multibyte buffer,
so insert-file-contents decodes from raw-text into a multibyte buffer,
producing raw 8-bit bytes for 0x80-0xff.  As far as I can tell, the
(set-buffer-multibyte t) here is a no-op, because the buffer is already
multibyte at that point.
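
A quick way to check the claim about with-temp-buffer is sketched below.
The function name my-temp-buffer-multibyte-flags is hypothetical (it is
not part of unrmail), and the result assumes the default value of
enable-multibyte-characters has not been changed in the running session.

```elisp
;; Hypothetical helper, not part of unrmail or Rmail.
(defun my-temp-buffer-multibyte-flags ()
  "Return (BEFORE AFTER): the multibyte flag of a fresh temp buffer
before and after calling `set-buffer-multibyte' with t."
  (with-temp-buffer
    ;; A new buffer inherits the default multibyte setting.
    (let ((before enable-multibyte-characters))
      (set-buffer-multibyte t)
      (list before enable-multibyte-characters))))
```

If both elements are non-nil, the buffer was already multibyte and the
explicit call changed nothing about the flag.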


    It then decodes the middle part as Rmail does:

      (unless (and coding-system
		   (coding-system-p coding-system))
	(setq coding-system
	      ;; Emacs 21.1 and later writes RMAIL files in emacs-mule, but
	      ;; earlier versions did that with the current buffer's encoding.
	      ;; So we want to favor detection of emacs-mule (whose normal
	      ;; priority is quite low), but still allow detection of other
	      ;; encodings if emacs-mule won't fit.  The call to
	      ;; detect-coding-with-priority below achieves that.
	      (car (detect-coding-with-priority
		    from to
		    '((coding-category-emacs-mule . emacs-mule))))))
      (unless (memq coding-system
		    '(undecided undecided-unix))
	(set-buffer-modified-p t)	; avoid locking when decoding
	(let ((buffer-undo-list t))
	  (decode-coding-region from to coding-system))
	(setq coding-system last-coding-system-used))


    So, Rmail reads unibyte, decodes, and then converts to multibyte,
while unrmail reads multibyte and then decodes.  This produces the same
results for all coding systems except raw-text*.  The reason is that
reading raw-text into a unibyte buffer and then converting to multibyte
produces a different result than reading raw-text directly into a
multibyte buffer!  The latter produces raw bytes while the former
produces code points.  Needless to say, trying to decode from raw bytes
instead of code points gives different results.
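
The two read orders can be compared side by side with a sketch like the
following.  Both function names are hypothetical, the decode step is left
out on purpose so that only the read and the unibyte/multibyte switch
differ, and what the second function returns for bytes 0x80-0xff depends
on the Emacs version (the difference described above was seen on
version 22).

```elisp
;; Hypothetical helpers for comparison only; not the actual unrmail code.
(defun my-read-raw-text-multibyte (file)
  "Read FILE as raw-text directly into a multibyte buffer (unrmail's order).
Return the buffer's characters as a list of integers."
  (with-temp-buffer
    (let ((coding-system-for-read 'raw-text))
      (insert-file-contents file))
    (string-to-list (buffer-string))))

(defun my-read-raw-text-unibyte-first (file)
  "Read FILE as raw-text into a unibyte buffer, then switch to multibyte
\(Rmail's order, without the decode step).
Return the buffer's characters as a list of integers."
  (with-temp-buffer
    (set-buffer-multibyte nil)          ; start out unibyte
    (let ((coding-system-for-read 'raw-text))
      (insert-file-contents file))
    (set-buffer-multibyte t)            ; convert only at the end
    (string-to-list (buffer-string))))
```

For a file containing only ASCII the two lists are equal; comparing them
with `equal' on a file that contains bytes in the 0x80-0xff range shows
whether a given Emacs version treats the two orders differently.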


    My testing so far shows that this problem can be fixed for version
22 by switching to the Rmail way of doing things (i.e., read unibyte and
only convert to multibyte at the end).  A more complicated solution will
be needed for version 23.  Should I produce a patch for version 22 given
that it will not work for version 23?

- Mark


