* why unrmail fails with raw-text on version 22 [WAS: Re: help needed with coding systems (unrmail problems)]
From: Mark Lillibridge @ 2011-01-09 5:52 UTC
To: emacs-devel

Ok, I think I now understand why unrmail fails with raw-text*.  Unrmail
reads and decodes BABYL files in a subtly different way than Rmail does.
In particular, it does:

    (with-temp-buffer
      ;; Read in the old Rmail file with no decoding.
      (let ((coding-system-for-read 'raw-text))
        (insert-file-contents file))
      ;; But make it multibyte.
      (set-buffer-multibyte t)
      (setq buffer-file-coding-system 'raw-text-unix)

Not obvious, but important: with-temp-buffer creates a multibyte buffer
so that insert-file-contents is decoding from raw-text to a multibyte
buffer, producing raw 8-bit bytes for 0x80-0xff.  The
(set-buffer-multibyte t) here is a no-op as far as I can tell, because
the buffer is already multibyte at that point.

It then decodes the middle part as Rmail does:

      (unless (and coding-system
                   (coding-system-p coding-system))
        (setq coding-system
              ;; Emacs 21.1 and later writes RMAIL files in emacs-mule, but
              ;; earlier versions did that with the current buffer's encoding.
              ;; So we want to favor detection of emacs-mule (whose normal
              ;; priority is quite low), but still allow detection of other
              ;; encodings if emacs-mule won't fit.  The call to
              ;; detect-coding-with-priority below achieves that.
              (car (detect-coding-with-priority
                    from to
                    '((coding-category-emacs-mule . emacs-mule))))))
      (unless (memq coding-system
                    '(undecided undecided-unix))
        (set-buffer-modified-p t) ; avoid locking when decoding
        (let ((buffer-undo-list t))
          (decode-coding-region from to coding-system))
        (setq coding-system last-coding-system-used))

So, Rmail reads unibyte, decodes, then converts to multibyte, while
unrmail reads multibyte, then decodes.
This produces the same results for all coding systems except raw-text*.
The reason is that reading raw-text into a unibyte buffer and then
converting to multibyte produces a different result than reading
raw-text directly into a multibyte buffer!  The latter produces raw
bytes, while the former produces code points.  Needless to say,
encoding from raw bytes instead of code points gives different results.

My testing so far shows that this problem can be fixed for version 22 by
switching to the Rmail way of doing things (i.e., read unibyte and only
convert to multibyte at the end).  A more complicated solution will be
needed for version 23.  Should I produce a patch for version 22, given
that it will not work for version 23?

- Mark
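The raw-byte versus code-point distinction can be mimicked outside Emacs. The sketch below is a Python analogy, not Emacs code: it assumes Python's `surrogateescape` error handler as a stand-in for Emacs's raw 8-bit byte characters, and `latin-1` decoding as a stand-in for bytes that were turned into code points. The input `DATA` and the helper names are hypothetical.

```python
# Python analogy (not Emacs code) for raw bytes vs. code points.
# Assumption: surrogateescape plays the role of Emacs raw 8-bit bytes,
# and latin-1 decoding plays the role of turning bytes into code points.

DATA = b"caf\xe9"  # hypothetical input with one byte in 0x80-0xff


def as_raw_bytes(data: bytes) -> str:
    """Keep each byte 0x80-0xff as an opaque 'raw byte' character."""
    return data.decode("ascii", errors="surrogateescape")


def as_code_points(data: bytes) -> str:
    """Map each byte 0x00-0xff to the code point with the same value."""
    return data.decode("latin-1")


def encode_utf8(text: str) -> bytes:
    """Encode to UTF-8; raw-byte characters are written back unchanged."""
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    print(encode_utf8(as_raw_bytes(DATA)))    # b'caf\xe9'
    print(encode_utf8(as_code_points(DATA)))  # b'caf\xc3\xa9'
```

The same input yields different encoded output depending on whether the high byte was kept as a raw byte or converted to a code point first, which is the kind of divergence described above for raw-text*.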
* Re: why unrmail fails with raw-text on version 22 [WAS: Re: help needed with coding systems (unrmail problems)]
From: Eli Zaretskii @ 2011-01-09 8:25 UTC
To: mark.lillibridge; +Cc: emacs-devel

> Date: Sat, 08 Jan 2011 21:52:27 -0800
> From: Mark Lillibridge <mark.lillibridge@hp.com>
> Reply-To: mark.lillibridge@hp.com
>
>     (with-temp-buffer
>       ;; Read in the old Rmail file with no decoding.
>       (let ((coding-system-for-read 'raw-text))
>         (insert-file-contents file))
>       ;; But make it multibyte.
>       (set-buffer-multibyte t)
>       (setq buffer-file-coding-system 'raw-text-unix)
>
> Not obvious, but important: with-temp-buffer creates a multibyte buffer
> so that insert-file-contents is decoding from raw-text to a multibyte
> buffer, producing raw 8-bit bytes for 0x80-0xff.

But doesn't insert-file-contents make the buffer unibyte due to the
fact that raw-text is being used for decoding?

> My testing so far shows that this problem can be fixed for version
> 22 by switching to the Rmail way of doing things (e.g., read unibyte and
> only convert to multibyte at the end).  A more complicated solution will
> be needed for version 23.  Should I produce a patch for version 22 given
> that it will not work for version 23?

Version 22 is no longer maintained, so providing patches for it would
be pointless, especially as Emacs 23 has fundamentally changed the way
raw bytes are represented and handled.

Emacs 23.3 is in pretest, so if you hurry, you could get the fix into
it (and into all the later versions).

Thanks.
* Re: why unrmail fails with raw-text on version 22 [WAS: Re: help needed with coding systems (unrmail problems)]
From: Eli Zaretskii @ 2011-01-09 10:01 UTC
To: mark.lillibridge; +Cc: emacs-devel

> From: Eli Zaretskii <eliz@gnu.org>
> Date: Sun, 09 Jan 2011 03:25:11 -0500
> Cc: emacs-devel@gnu.org
> Reply-To: Eli Zaretskii <eliz@gnu.org>
>
> > Not obvious, but important: with-temp-buffer creates a multibyte buffer
> > so that insert-file-contents is decoding from raw-text to a multibyte
> > buffer, producing raw 8-bit bytes for 0x80-0xff.
>
> But doesn't insert-file-contents make the buffer unibyte due to the
> fact that raw-text is being used for decoding?

Answering my own question here: yes, it does.  Here's the relevant
portion of insert-file-contents from the current development sources:

  if (!NILP (visit))
    {
      /* When we visit a file by raw-text, we change the buffer to
         unibyte.  */
      if (CODING_FOR_UNIBYTE (&coding)
          /* Can't do this if part of the buffer might be preserved.  */
          && NILP (replace))
        /* Visiting a file with these coding system makes the buffer
           unibyte. */
        current_buffer->enable_multibyte_characters = Qnil;
    }

The current sources of the Emacs 23.3 release branch have the same
code.  So I would expect this problem not to exist in Emacs 23 and
later.  If it does exist, then probably there's some other factor at
work here.
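The excerpt only clears the multibyte flag when `visit` is non-nil, `CODING_FOR_UNIBYTE (&coding)` holds, and `replace` is nil. A minimal Python model of that condition follows. It is not Emacs code: the function and parameter names are hypothetical stand-ins for the C variables, and it assumes this fragment is the only place the flag changes during the call.

```python
# Hypothetical Python model of the excerpted insert-file-contents check.
# Not Emacs code: parameters stand in for the C variables of the same role.


def multibyte_after_insert(multibyte: bool, visit: bool,
                           coding_for_unibyte: bool, replace: bool) -> bool:
    """Return the buffer's multibyte flag after the excerpted check runs."""
    if visit:
        # Visiting with a unibyte coding system such as raw-text makes the
        # buffer unibyte, unless part of the buffer might be preserved.
        if coding_for_unibyte and not replace:
            return False
    return multibyte


if __name__ == "__main__":
    # A multibyte temp buffer read with raw-text but without VISIT.
    print(multibyte_after_insert(True, visit=False,
                                 coding_for_unibyte=True, replace=False))
    # The same read with VISIT set and REPLACE nil.
    print(multibyte_after_insert(True, visit=True,
                                 coding_for_unibyte=True, replace=False))
```

In this model, a call that does not pass `visit` leaves the buffer multibyte whatever the coding system, so the unibyte conversion depends on how insert-file-contents is called.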
* Re: why unrmail fails with raw-text on version 22 [WAS: Re: help needed with coding systems (unrmail problems)]
From: Mark Lillibridge @ 2011-01-09 18:41 UTC
To: Eli Zaretskii; +Cc: emacs-devel

Eli wrote:
> I (Mark) wrote:
> > Not obvious, but important: with-temp-buffer creates a multibyte buffer
> > so that insert-file-contents is decoding from raw-text to a multibyte
> > buffer, producing raw 8-bit bytes for 0x80-0xff.
>
> But doesn't insert-file-contents make the buffer unibyte due to the
> fact that raw-text is being used for decoding?

I looked again at the source code of unrmail, and it does not call
insert-file-contents with visit set, hence the buffer remains
multibyte.  If the author had done so, with replace left nil (see your
excerpted source code fragment), presumably the buffer would have been
converted to unibyte.  There would still have been a bug, because the
author also had code to convert to multibyte before doing the decoding.

> Version 22 is no longer maintained, so providing patches for it would
> be pointless, especially as Emacs 23 has fundamentally changed the way
> raw bytes are represented and handled.

Ok.

> Emacs 23.3 is in pretest, so if you hurry, you could get the fix into
> it (and into all the later versions).

Building a fix for version 23 is trickier.  We need to decode the
output of version 22's raw-text into the new internal buffer
representation.  Can we use emacs-mule for this purpose?  Is it true
that decoding an arbitrary byte stream and then writing it out via
emacs-mule (version 23) produces exactly the same byte sequence?

- Mark
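The closing question is a round-trip property: for every byte string b, encoding the result of decoding b must give back b exactly. The Python sketch below shows one way to test such a property empirically, over all 256 single-byte values plus a few longer samples. It uses Python codecs purely as stand-ins; emacs-mule is not a Python codec, so this illustrates the check and does not answer the question for Emacs 23. The helper, sample, and table names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Codec = Tuple[Callable[[bytes], str], Callable[[str], bytes]]


def round_trip_failures(decode: Callable[[bytes], str],
                        encode: Callable[[str], bytes],
                        samples: Iterable[bytes]) -> List[bytes]:
    """Return every sample whose decode-then-encode result differs."""
    return [b for b in samples if encode(decode(b)) != b]


# All single bytes, plus a few longer hypothetical samples.
SAMPLES = [bytes([i]) for i in range(256)] + [
    b"caf\xe9",           # lone high byte
    b"\xc3\xa9",          # valid UTF-8 for U+00E9
    b"\xff\xfe\x00\x80",  # arbitrary binary
]

# Stand-in codecs (Python, not Emacs) as (decode, encode) pairs.
CHECKS: Dict[str, Codec] = {
    "latin-1": (lambda b: b.decode("latin-1"),
                lambda s: s.encode("latin-1")),
    "utf-8 + surrogateescape": (
        lambda b: b.decode("utf-8", errors="surrogateescape"),
        lambda s: s.encode("utf-8", errors="surrogateescape")),
    "utf-8 + replace": (
        lambda b: b.decode("utf-8", errors="replace"),
        lambda s: s.encode("utf-8")),
}

if __name__ == "__main__":
    for name, (dec, enc) in CHECKS.items():
        bad = round_trip_failures(dec, enc, SAMPLES)
        print(f"{name}: {len(bad)} failing samples")
```

A codec that substitutes a replacement character for undecodable input (the `replace` case) fails the check, while one that preserves every byte (`latin-1`, or UTF-8 with `surrogateescape`) passes.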