unofficial mirror of emacs-devel@gnu.org 
* [HELP] (bug?) Saving a buffer without any conversion?
@ 2003-01-13 17:52 Mario Lang
  2003-01-14  1:00 ` Kim F. Storm
  2003-01-15  1:16 ` Kenichi Handa
  0 siblings, 2 replies; 19+ messages in thread
From: Mario Lang @ 2003-01-13 17:52 UTC (permalink / raw)


Hi there.

I've been trying to sort out a bug in erc-dcc.el for some time now, and since
I'm really lost, I thought I'd ask here and maybe get some enlightenment:

We're receiving binary content via a network process.  After the
transfer is complete, this buffer should be saved to a file.

The effect I'm having is that we receive 1372422 bytes via the process
filter function STRING argument, and after insertion into a buffer,
we have a buffer with buffer-size 1372422, but after calling (save-buffer)
we get this:

-rw-r--r--    1 root     root      1865264 Jan 13 18:35 blah28.mp3

I'm using:

      (set-process-coding-system proc 'binary 'binary)
      (set-buffer-file-coding-system 'no-conversion t)

To set up my process/buffer.  I've tried virtually every possible combination
of 'binary, 'no-conversion, 'raw-text, 'raw-text-unix, and whatever the Elisp manual may
have suggested :-), but I could not find any combination which
would allow me to save this buffer without Emacs doing any magic modifications
in between.  I know and love the automagic way we're getting coding-systems
converted these days, but I'd also like to know how to consistently turn
it off for such a case.

Can anyone help?

-- 
CYa,
  Mario

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [HELP] (bug?) Saving a buffer without any conversion?
  2003-01-13 17:52 [HELP] (bug?) Saving a buffer without any conversion? Mario Lang
@ 2003-01-14  1:00 ` Kim F. Storm
  2003-01-14  6:06   ` Eli Zaretskii
  2003-01-14 16:19   ` Stefan Monnier
  2003-01-15  1:16 ` Kenichi Handa
  1 sibling, 2 replies; 19+ messages in thread
From: Kim F. Storm @ 2003-01-14  1:00 UTC (permalink / raw)
  Cc: Mario Lang

Mario Lang <mlang@delysid.org> writes:

> Hi there.
> 
> I've been trying to sort out a bug in erc-dcc.el for some time now, and since
> I'm really lost, I thought I'd ask here and maybe get some enlightenment:
> 
> We're receiving binary content via a network process.  After the
> transfer is complete, this buffer should be saved to a file.
> 
> The effect I'm having is that we receive 1372422 bytes via the process
> filter function STRING argument, and after insertion into a buffer,
> we have a buffer with buffer-size 1372422, but after calling (save-buffer)
> we get this:
> 
> -rw-r--r--    1 root     root      1865264 Jan 13 18:35 blah28.mp3
> 
> I'm using:
> 
>       (set-process-coding-system proc 'binary 'binary)
>       (set-buffer-file-coding-system 'no-conversion t)
> 

I have looked at Mario's data before sending it to emacs and after
emacs has written it to a file.

It seems that every byte in the range 0xa0 .. 0xff that were in the
original file is prefixed with an 0x81 byte in the file containing the
received data.  To me, that looks like the internal multi-byte
representation for the binary data.
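
A quick sanity check of that reading, using only the two sizes quoted
above: if every byte in 0xa0 .. 0xff gained exactly one prefix byte, the
size difference is the number of such bytes in the original.  This is a
minimal sketch; the function name is made up for illustration.

```elisp
;; Hypothetical helper: how many bytes were added on the way to disk?
(defun my-extra-bytes (received written)
  "Return WRITTEN minus RECEIVED, i.e. the number of added prefix bytes."
  (- written received))

;; Sizes from the report: 1372422 bytes received, 1865264 bytes on disk.
(my-extra-bytes 1372422 1865264)   ; => 492842
```

That is roughly 36% of the received bytes, which is plausible for
compressed audio data, where about 96 of 256 byte values (37.5%) fall in
that range.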

Actually the value returned by (buffer-size) in the buffer containing
the received binary data equals the size of the original data (so if
seen from the lisp level, the buffer contains the right number of
bytes).  But when written to disk, the internal representation of the
buffer is stored instead of the "data".

The buffer's coding system for save is no-conversion.  How did
that internal data end up in the file?

What coding systems should be set on the network process and on the
buffer to make it possible to have the received binary data in the
buffer make its way unmangled into the file on the disk?


-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk


* Re: [HELP] (bug?) Saving a buffer without any conversion?
  2003-01-14  1:00 ` Kim F. Storm
@ 2003-01-14  6:06   ` Eli Zaretskii
  2003-01-14  6:46     ` Mario Lang
  2003-01-14 16:19   ` Stefan Monnier
  1 sibling, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2003-01-14  6:06 UTC (permalink / raw)
  Cc: emacs-devel


On 14 Jan 2003, Kim F. Storm wrote:

> > We're receiving binary content via a network process.  After the
> > transfer is complete, this buffer should be saved to a file.
> > 
> > The effect I'm having is that we receive 1372422 bytes via the process
> > filter function STRING argument, and after insertion into a buffer,
> > we have a buffer with buffer-size 1372422, but after calling (save-buffer)
> > we get this:
> > 
> > -rw-r--r--    1 root     root      1865264 Jan 13 18:35 blah28.mp3
> > 
> > I'm using:
> > 
> >       (set-process-coding-system proc 'binary 'binary)
> >       (set-buffer-file-coding-system 'no-conversion t)
> > 
> 
> I have looked at Mario's data before sending it to emacs and after
> emacs has written it to a file.
> 
> It seems that every byte in the range 0xa0 .. 0xff that were in the
> original file is prefixed with an 0x81 byte in the file containing the
> received data.  To me, that looks like the internal multi-byte
> representation for the binary data.

Yes.  That's what no-conversion does: it prevents encoding of the 
internal buffer's contents.

I suggest to use raw-text for both coding systems above, and see if that 
helps.

An alternative approach is to (set-buffer-multibyte nil) before reading 
the data into it and before saving it.

> The buffer's coding system for save is no-conversion.  How did
> that internal data end up in the file?

Probably because the buffer was a multibyte buffer, in which case 
no-conversion writes out the internal representation.  That's why I 
suggested using raw-text to save the buffer.

The reason you seem to see the right size is that Emacs tries very hard 
to conceal the fact that some characters in the 128-255 code range are 
stored in a multibyte buffer as multibyte sequences.  Using no-conversion 
to save such a buffer exposes the internal representation, so it is 
exactly the thing _not_to_do_ in this case.

> What coding systems should be set on the network process and on the
> buffer to make it possible to have the received binary data in the
> buffer make its way unmangled into the file on the disk?

As I said above, two ways: either force the receiving buffer to be 
unibyte, or use raw-text to save it.  Both ways should have the same 
effect; however, I'm personally biased towards not using unibyte buffers, 
so my preference would be to try the raw-text approach first.


* Re: [HELP] (bug?) Saving a buffer without any conversion?
  2003-01-14  6:06   ` Eli Zaretskii
@ 2003-01-14  6:46     ` Mario Lang
  2003-01-14 18:37       ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: Mario Lang @ 2003-01-14  6:46 UTC (permalink / raw)
  Cc: emacs-devel

Eli Zaretskii <eliz@is.elta.co.il> writes:

> On 14 Jan 2003, Kim F. Storm wrote:
>
>> > We're receiving binary content via a network process.  After the
>> > transfer is complete, this buffer should be saved to a file.
>> > 
>> > The effect I'm having is that we receive 1372422 bytes via the process
>> > filter function STRING argument, and after insertion into a buffer,
>> > we have a buffer with buffer-size 1372422, but after calling (save-buffer)
>> > we get this:
>> > 
>> > -rw-r--r--    1 root     root      1865264 Jan 13 18:35 blah28.mp3
>> > 
>> > I'm using:
>> > 
>> >       (set-process-coding-system proc 'binary 'binary)
>> >       (set-buffer-file-coding-system 'no-conversion t)
>> > 
>> 
>> I have looked at Mario's data before sending it to emacs and after
>> emacs has written it to a file.
>> 
>> It seems that every byte in the range 0xa0 .. 0xff that were in the
>> original file is prefixed with an 0x81 byte in the file containing the
>> received data.  To me, that looks like the internal multi-byte
>> representation for the binary data.
>
> Yes.  That's what no-conversion does: it prevents encoding of the 
> internal buffer's contents.
>
> I suggest to use raw-text for both coding systems above, and see if that 
> helps.

I've tried that now, and no.  'raw-text has the same effect as
'no-conversion or 'binary....

> An alternative approach is to (set-buffer-multibyte nil) before reading 
> the data into it and before saving it.

This works!  Thanks!  But I'm still a bit confused as to why
setting coding-system does not help.

>> The buffer's coding system for save is no-conversion.  How did
>> that internal data end up in the file?
>
> Probably because the buffer was a multibyte buffer, in which case 
> no-conversion writes out the internal representation.  That's why I 
> suggested using raw-text to save the buffer.

Hmm, well raw-text sounds right, but it doesn't work.  Any idea why?

-- 
CYa,
  Mario


* Re: [HELP] (bug?) Saving a buffer without any conversion?
  2003-01-14  1:00 ` Kim F. Storm
  2003-01-14  6:06   ` Eli Zaretskii
@ 2003-01-14 16:19   ` Stefan Monnier
  1 sibling, 0 replies; 19+ messages in thread
From: Stefan Monnier @ 2003-01-14 16:19 UTC (permalink / raw)
  Cc: emacs-devel

> > The effect I'm having is that we receive 1372422 bytes via the process
> > filter function STRING argument, and after insertion into a buffer,
> > we have a buffer with buffer-size 1372422, but after calling (save-buffer)
> > we get this:
> > 
> > -rw-r--r--    1 root     root      1865264 Jan 13 18:35 blah28.mp3
> > 
> > I'm using:
> > 
> >       (set-process-coding-system proc 'binary 'binary)
> >       (set-buffer-file-coding-system 'no-conversion t)
> 
> It seems that every byte in the range 0xa0 .. 0xff that were in the
> original file is prefixed with an 0x81 byte in the file containing the
> received data.  To me, that looks like the internal multi-byte
> representation for the binary data.

I think the mistake is to use a multibyte buffer.
A multibyte buffer stores a sequence of characters but what you
really want here is to store a sequence of bytes, which is what
a unibyte buffer does.
If you're careful, you can get things to work using a multibyte
buffer, but why bother when a unibyte buffer is simpler and
more efficient.
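
A minimal sketch of the unibyte approach, assuming a later Emacs that
provides `with-temp-buffer' and `coding-system-for-write'.  The function
name is hypothetical and is not taken from erc-dcc.el.

```elisp
;; Hypothetical helper: write raw bytes to a file through a unibyte buffer.
(defun my-save-bytes (bytes file)
  "Write the unibyte string BYTES to FILE without any conversion."
  (with-temp-buffer
    (set-buffer-multibyte nil)          ; the buffer now holds bytes, not characters
    (insert bytes)
    ;; Bind the coding system explicitly so no default encoding is applied.
    (let ((coding-system-for-write 'no-conversion))
      (write-region (point-min) (point-max) file nil 'silent))))
```

Since the buffer is unibyte from the start, each byte of the string
should map to exactly one byte in the file.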


	Stefan


* Re: [HELP] (bug?) Saving a buffer without any conversion?
  2003-01-14  6:46     ` Mario Lang
@ 2003-01-14 18:37       ` Eli Zaretskii
  0 siblings, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2003-01-14 18:37 UTC (permalink / raw)
  Cc: emacs-devel

> From: Mario Lang <mlang@delysid.org>
> Date: Tue, 14 Jan 2003 07:46:11 +0100
> 
> > I suggest to use raw-text for both coding systems above, and see if that 
> > helps.
> 
> I've tried that now, and no.  'raw-text has the same effect as
> 'no-conversion or 'binary....
> 
> > An alternative approach is to (set-buffer-multibyte nil) before reading 
> > the data into it and before saving it.
> 
> This works!  Thanks!  But I'm still a bit confused as to why
> setting coding-system does not help.

Probably because something was not set up quite as it should have
been.  What does Emacs say about non-ASCII characters in the buffer
(_before_ you save it) if you go to those characters and type
"C-u C-x ="?


* Re: [HELP] (bug?) Saving a buffer without any conversion?
  2003-01-13 17:52 [HELP] (bug?) Saving a buffer without any conversion? Mario Lang
  2003-01-14  1:00 ` Kim F. Storm
@ 2003-01-15  1:16 ` Kenichi Handa
  2003-01-15 11:02   ` Kim F. Storm
  1 sibling, 1 reply; 19+ messages in thread
From: Kenichi Handa @ 2003-01-15  1:16 UTC (permalink / raw)
  Cc: emacs-devel

In article <87fzrxszs8.fsf@lexx.delysid.org>, Mario Lang <mlang@delysid.org> writes:
> We're receiving binary content via a network process.  After the
> transfer is complete, this buffer should be saved to a file.

> The effect I'm having is that we receive 1372422 bytes via the process
> filter function STRING argument, and after insertion into a buffer,
> we have a buffer with buffer-size 1372422, but after calling (save-buffer)
> we get this:

> -rw-r--r--    1 root     root      1865264 Jan 13 18:35 blah28.mp3

> I'm using:

>       (set-process-coding-system proc 'binary 'binary)
>       (set-buffer-file-coding-system 'no-conversion t)

storm@cua.dk (Kim F. Storm) writes:
> I have looked at Mario's data before sending it to emacs and after
> emacs has written it to a file.

> It seems that every byte in the range 0xa0 .. 0xff that were in the
> original file is prefixed with an 0x81 byte in the file containing the
> received data.  To me, that looks like the internal multi-byte
> representation for the binary data.

No.  0x81 means that 0xA0..0xFF are decoded as Latin-1
chars.  That's why raw-text and no-conversion write out 0x81
as is to a file.  And that means that somehow:
	(set-process-coding-system proc 'binary 'binary)
didn't take effect.  When did you execute this function?  It
should be before accepting any data from the process
(usually just after start-process or open-network-stream).

I tried the following code and the written file "temp" was
the same as "temp.png".

(defun temp-sentinel (proc str)
  (if (string= str "finished\n")
      (save-excursion
	(set-buffer (process-buffer proc))
	(write-file "~/temp"))))

(let (proc)
  (save-excursion
    (set-buffer (get-buffer-create "temp"))
    (set-buffer-file-coding-system 'binary)
    (erase-buffer))
  (setq proc (start-process "cat" "temp" "cat" "/home/handa/temp.png"))
  (set-process-sentinel proc 'temp-sentinel)
  (set-process-coding-system proc 'binary 'binary))

Eli Zaretskii <eliz@is.elta.co.il> writes:
>>  It seems that every byte in the range 0xa0 .. 0xff that were in the
>>  original file is prefixed with an 0x81 byte in the file containing the
>>  received data.  To me, that looks like the internal multi-byte
>>  representation for the binary data.

> Yes.  That's what no-conversion does: it prevents encoding of the 
> internal buffer's contents.

> I suggest to use raw-text for both coding systems above, and see if that 
> helps.

The difference between no-conversion and raw-text is only in
handling of EOL format.  He should surely use no-conversion
because raw-text will convert both CRLF and LF into LF.

> An alternative approach is to (set-buffer-multibyte nil) before reading 
> the data into it and before saving it.

Yes.  For instance, by slightly modifying the above code as below:

(let (proc)
  (save-excursion
    (set-buffer (get-buffer-create "temp"))
    (set-buffer-file-coding-system 'binary)
    (erase-buffer)
    (set-buffer-multibyte nil))
  (setq proc (start-process "cat" "temp" "cat" "/home/handa/temp.png"))
  (set-process-sentinel proc 'temp-sentinel)
  (set-process-coding-system proc 'binary 'binary))

we get the same result more efficiently.

>>  The buffer's coding system for save is no-conversion.  How did
>>  that internal data end up in the file?

> Probably because the buffer was a multibyte buffer, in which case 
> no-conversion writes out the internal representation.  That's why I 
> suggested using raw-text to save the buffer.

Please note that the internal representation for raw-bytes
(eight-bit-control and eight-bit-graphic) are never exposed
in a file even by no-conversion.  As I wrote above, 0x81 is
not a leading-byte for raw-bytes but for Latin-1.

---
Ken'ichi HANDA
handa@m17n.org


* Re: [HELP] (bug?) Saving a buffer without any conversion?
  2003-01-15 11:02   ` Kim F. Storm
@ 2003-01-15 10:59     ` Kenichi Handa
  2003-01-15 13:27       ` Kim F. Storm
                         ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Kenichi Handa @ 2003-01-15 10:59 UTC (permalink / raw)
  Cc: emacs-devel

In article <5xk7h67k1b.fsf@kfs2.cua.dk>, storm@cua.dk (Kim F. Storm) writes:
> He uses a process filter to "insert" the received strings to the
> buffer like this [approximately]:

>         (defun filter (proc string)
>           (with-current-buffer (process-buffer proc)
>             (insert string)))

Ah!  Now I see what's going on.  If the coding system for
proc is no-conversion or raw-text, STRING above is unibyte,
thus, when inserted in a multibyte buffer, it is converted
to the corresponding multibyte string.  This conversion
converts all 0xA0..0xFF to Latin-1 (in Latin-1 lang. env.).

> Here is a small, self-contained test case.

> If you eval the following form, wait a few seconds, the result is
>         "BUFFER=10 FILE=20"
> meaning that the temp.out buffer is 10 "bytes", but the written
> file is 20 "bytes".

> Adding the "set-buffer-multibyte" line produces the right result.

Yes.  And, instead of adding that, changing this:

> 	      :filter (lambda (proc string)
> 			(with-current-buffer (get-buffer "temp.out")
> 			  (insert string)))

to this:

> 	      :filter (lambda (proc string)
> 			(with-current-buffer (get-buffer "temp.out")
> 			  (insert (string-as-multibyte string))))

also produces the right result.
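
The idea can be sketched in isolation, without a process.  This assumes
a later Emacs that provides `unibyte-string' and `string-to-multibyte'
(used here in place of `string-as-multibyte'); the function name is made
up for illustration.

```elisp
;; Hypothetical check: do raw bytes survive a trip through multibyte form?
(defun my-bytes-round-trip-p (bytes)
  "Return t if the unibyte string BYTES is unchanged after the round trip."
  ;; Bytes >= 0x80 become raw-byte characters, not Latin-1 characters.
  (let ((chars (string-to-multibyte bytes)))
    ;; Encoding with no-conversion turns raw-byte characters back into bytes.
    (equal (encode-coding-string chars 'no-conversion) bytes)))

(my-bytes-round-trip-p (unibyte-string #xff #xa0 #x41))   ; => t
```

The point is that the bytes must enter the multibyte buffer as raw-byte
characters; once they have been turned into Latin-1 characters, neither
no-conversion nor raw-text will turn them back into single bytes.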

Which is the better solution?  It depends on how the buffer
is used later.  If it is just to save the received bytes in
a file, using a unibyte buffer is better.  But, in that
case, first of all, why is the process filter necessary?


---
Ken'ichi HANDA
handa@m17n.org


* Re: [HELP] (bug?) Saving a buffer without any conversion?
  2003-01-15  1:16 ` Kenichi Handa
@ 2003-01-15 11:02   ` Kim F. Storm
  2003-01-15 10:59     ` Kenichi Handa
  0 siblings, 1 reply; 19+ messages in thread
From: Kim F. Storm @ 2003-01-15 11:02 UTC (permalink / raw)
  Cc: emacs-devel


Kenichi Handa <handa@m17n.org> writes:

> In article <87fzrxszs8.fsf@lexx.delysid.org>, Mario Lang <mlang@delysid.org> writes:
> > We're receiving binary content via a network process.  After the
> > transfer is complete, this buffer should be saved to a file.
> 
> > The effect I'm having is that we receive 1372422 bytes via the process
> > filter function STRING argument, and after insertion into a buffer,
> > we have a buffer with buffer-size 1372422, but after calling (save-buffer)
> > we get this:
> 
> > -rw-r--r--    1 root     root      1865264 Jan 13 18:35 blah28.mp3
> 
> > I'm using:
> 
> >       (set-process-coding-system proc 'binary 'binary)
> >       (set-buffer-file-coding-system 'no-conversion t)
> 
> storm@cua.dk (Kim F. Storm) writes:
> > I have looked at Mario's data before sending it to emacs and after
> > emacs has written it to a file.
> 
> > It seems that every byte in the range 0xa0 .. 0xff that were in the
> > original file is prefixed with an 0x81 byte in the file containing the
> > received data.  To me, that looks like the internal multi-byte
> > representation for the binary data.
> 
> No.  0x81 means that 0xA0..0xFF are decoded as Latin-1
> chars.  That's why raw-text and no-conversion write out 0x81
> as is to a file.  And that means that somehow:
> 	(set-process-coding-system proc 'binary 'binary)
> didn't take effect.  When did you execute this function?  It
> should be before accepting any data from the process
> (usually just after start-process or open-network-stream).
> 

Mario's code is using open-network-stream, and the set-process-... things are
done immediately after creating the process (IIRC).

He uses a process filter to "insert" the received strings to the
buffer like this [approximately]:

        (defun filter (proc string)
          (with-current-buffer (process-buffer proc)
            (insert string)))


Here is a small, self-contained test case.

If you eval the following form, wait a few seconds, the result is
        "BUFFER=10 FILE=20"
meaning that the temp.out buffer is 10 "bytes", but the written
file is 20 "bytes".

Adding the "set-buffer-multibyte" line produces the right result.


(let (proc proc2)
  (if (get-buffer "temp.out")
      (kill-buffer "temp.out"))

  (save-excursion
    (set-buffer (get-buffer-create "temp.out"))
    (set-buffer-file-coding-system 'binary)
    (erase-buffer)
;;    (set-buffer-multibyte nil)
    )

  (setq proc (make-network-process
	      :server t
	      :name "temp"
	      :buffer "temp.out"
	      :service 2000
	      :sentinel nil
	      :filter (lambda (proc string)
			(with-current-buffer (get-buffer "temp.out")
			  (insert string)))
	      :coding 'binary))

  (setq proc2 (make-network-process
	       :name "temp2"
	       :buffer nil
	       :service 2000
	       :host 'local
	       :coding 'binary))

  (sleep-for 1)
  (process-send-string proc2 (make-string 10 255))
  (sleep-for 1)
  (delete-process proc2)

  (save-excursion
    (set-buffer (get-buffer "temp.out"))
    (let (require-final-newline)
      (delete-file "/tmp/temp.out")
      (write-file "/tmp/temp.out")))

  (sleep-for 1)
  (delete-process proc)

  (format "BUFFER=%d FILE=%d" (buffer-size (get-buffer "temp.out"))
	  (nth 7 (file-attributes "/tmp/temp.out")))
)



-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk


* Re: [HELP] (bug?) Saving a buffer without any conversion?
  2003-01-15 10:59     ` Kenichi Handa
@ 2003-01-15 13:27       ` Kim F. Storm
  2003-01-15 16:30         ` Eli Zaretskii
                           ` (2 more replies)
  2003-01-15 16:59       ` Mario Lang
  2003-01-15 23:27       ` Richard Stallman
  2 siblings, 3 replies; 19+ messages in thread
From: Kim F. Storm @ 2003-01-15 13:27 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <5xk7h67k1b.fsf@kfs2.cua.dk>, storm@cua.dk (Kim F. Storm) writes:
> > He uses a process filter to "insert" the received strings to the
> > buffer like this [approximately]:
> 
> >         (defun filter (proc string)
> >           (with-current-buffer (process-buffer proc)
> >             (insert string)))
> 
> Ah!  Now I see what's going on.  If the coding system for
> proc is no-conversion or raw-text, STRING above is unibyte,
> thus, when inserted in a multibyte buffer, it is converted
> to the corresponding multibyte string.  This conversion
> converts all 0xA0..0xFF to Latin-1 (in Latin-1 lang. env.).


I see.  Now I understand it too ... 


> Yes.  And, instead of adding that, changing this:
> ...
> to this:
> 
> > 	      :filter (lambda (proc string)
> > 			(with-current-buffer (get-buffer "temp.out")
> > 			  (insert (string-as-multibyte string))))
> 
> also produces the right result.

I think we need to write something about this somewhere.

E.g. add this to the doc string for `set-process-filter':

If the process' input coding system is no-conversion or raw-text, the
string argument to the filter function is a unibyte string; otherwise
it is a multibyte string.  Use `string-as-multibyte' on a unibyte
string before inserting it in a multibyte buffer.

Note: If the sole purpose of the filter is to insert received data
into a specific buffer, it is better NOT to define a process filter,
but instead set the process' buffer to that buffer.



> 
> Which is the better solution?  It depends on how the buffer
> is used later.  If it is just to save the received bytes in
> a file, using a unibyte buffer is better.  But, in that
> case, first of all, why is the process filter necessary?
> 

I don't know ... I didn't write the code :-)

In any case, I have just confirmed that if you DON'T use a filter
function, but rather rely on Emacs itself to insert received data
into the buffer, it works nicely even with a multibyte buffer.

-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk


* Re: [HELP] (bug?) Saving a buffer without any conversion?
  2003-01-15 13:27       ` Kim F. Storm
@ 2003-01-15 16:30         ` Eli Zaretskii
  2003-01-16 22:52           ` Kim F. Storm
  2003-01-16  1:18         ` Kenichi Handa
  2003-01-17  9:23         ` Richard Stallman
  2 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2003-01-15 16:30 UTC (permalink / raw)
  Cc: handa

> From: storm@cua.dk (Kim F. Storm)
> Date: 15 Jan 2003 14:27:23 +0100
> 
> E.g. add this to the doc string for `set-process-filter':
> 
> If the process' input coding system is no-conversion or raw-text, the
> string argument to the filter function is a unibyte string; otherwise
> it is a multibyte string.  Use `string-as-multibyte' on a unibyte
> string before inserting it in a multibyte buffer.
> 
> Note: If the sole purpose of the filter is to insert received data
> into a specific buffer, it is better NOT to define a process filter,
> but instead set the process' buffer to that buffer.

The issue involved is not specific to process filters, so it should
probably be mentioned in other doc strings as well, like in that of
`insert'.


* Re: [HELP] (bug?) Saving a buffer without any conversion?
  2003-01-15 10:59     ` Kenichi Handa
  2003-01-15 13:27       ` Kim F. Storm
@ 2003-01-15 16:59       ` Mario Lang
  2003-01-15 23:27       ` Richard Stallman
  2 siblings, 0 replies; 19+ messages in thread
From: Mario Lang @ 2003-01-15 16:59 UTC (permalink / raw)
  Cc: storm

Kenichi Handa <handa@m17n.org> writes:

> In article <5xk7h67k1b.fsf@kfs2.cua.dk>, storm@cua.dk (Kim F. Storm) writes:
>> He uses a process filter to "insert" the received strings to the
>> buffer like this [approximately]:
>
>>         (defun filter (proc string)
>>           (with-current-buffer (process-buffer proc)
>>             (insert string)))
>
> Ah!  Now I see what's going on.  If the coding system for
> proc is no-conversion or raw-text, STRING above is unibyte,
> thus, when inserted in a multibyte buffer, it is converted
> to the corresponding multibyte string.  This conversion
> converts all 0xA0..0xFF to Latin-1 (in Latin-1 lang. env.).

Thanks for this explanation.  As Kim already pointed out, I think this
should be documented somewhere more clearly.  At least to me it was very
mysterious why some kind of conversion happened, even though I specified
'no-conversion in all possible places...

>> Here is a small, self-contained test case.
>
>> If you eval the following form, wait a few seconds, the result is
>>         "BUFFER=10 FILE=20"
>> meaning that the temp.out buffer is 10 "bytes", but the written
>> file is 20 "bytes".
>
>> Adding the "set-buffer-multibyte" line produces the right result.
>
> Yes.  And, instead of adding that, changing this:
>
>> 	      :filter (lambda (proc string)
>> 			(with-current-buffer (get-buffer "temp.out")
>> 			  (insert string)))
>
> to this:
>
>> 	      :filter (lambda (proc string)
>> 			(with-current-buffer (get-buffer "temp.out")
>> 			  (insert (string-as-multibyte string))))
>
> also produces the right result.
>
> Which is the better solution?  It depends on how the buffer
> is used later.  If it is just to save the received bytes in
> a file, using a unibyte buffer is better.  But, in that
> case, first of all, why is the process filter necessary?

In that specific case, the filter function is necessary because the protocol
used requires sending confirmation-packets whenever we received data...
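
A filter of that shape could combine the unibyte-buffer advice with the
acknowledgment step roughly as follows.  This is only a sketch: both
function names are hypothetical, and ACK-FN stands in for whatever the
protocol actually requires, which is not shown in this thread.

```elisp
;; Hypothetical helper: append raw bytes to a buffer that is kept unibyte.
(defun my-dcc-insert-bytes (buffer bytes)
  "Append the unibyte string BYTES to BUFFER and return the new buffer size."
  (with-current-buffer buffer
    (when enable-multibyte-characters
      (set-buffer-multibyte nil))       ; store bytes, not characters
    (goto-char (point-max))
    (insert bytes)
    (buffer-size)))

;; Hypothetical filter: store the data, then let ACK-FN confirm receipt.
(defun my-dcc-filter (proc bytes &optional ack-fn)
  "Store BYTES received from PROC, then call ACK-FN with PROC and the total."
  (let ((total (my-dcc-insert-bytes (process-buffer proc) bytes)))
    (when ack-fn
      (funcall ack-fn proc total))))
```

It could be installed with something like
(set-process-filter proc (lambda (p s) (my-dcc-filter p s #'my-send-ack))),
where `my-send-ack' is again a placeholder.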

-- 
Thanks again,
  Mario


* Re: [HELP] (bug?) Saving a buffer without any conversion?
  2003-01-15 10:59     ` Kenichi Handa
  2003-01-15 13:27       ` Kim F. Storm
  2003-01-15 16:59       ` Mario Lang
@ 2003-01-15 23:27       ` Richard Stallman
  2003-01-16  6:45         ` Kenichi Handa
  2 siblings, 1 reply; 19+ messages in thread
From: Richard Stallman @ 2003-01-15 23:27 UTC (permalink / raw)
  Cc: storm

    to this:

    > 	      :filter (lambda (proc string)
    > 			(with-current-buffer (get-buffer "temp.out")
    > 			  (insert (string-as-multibyte string))))

    also produces the right result.

This would put the characters into the buffer.
Some of them would be represented with multibyte sequences.
What coding system would work then to save the file?
What coding system can we use to write out that text
converting it back to single-byte characters?

If we don't have one, perhaps we should add one.
It could be called `unibyte'.  What do you think?


* Re: [HELP] (bug?) Saving a buffer without any conversion?
  2003-01-15 13:27       ` Kim F. Storm
  2003-01-15 16:30         ` Eli Zaretskii
@ 2003-01-16  1:18         ` Kenichi Handa
  2003-01-17  9:23         ` Richard Stallman
  2 siblings, 0 replies; 19+ messages in thread
From: Kenichi Handa @ 2003-01-16  1:18 UTC (permalink / raw)
  Cc: emacs-devel

In article <5xbs2i7dc4.fsf@kfs2.cua.dk>, storm@cua.dk (Kim F. Storm) writes:
> I think we need to write something about this somewhere.

Sure.

> E.g. add this to the doc string for `set-process-filter':

> If the process' input coding system is no-conversion or raw-text, the
> string argument to the filter function is a unibyte string; otherwise
> it is a multibyte string.

More precisely,

If the process' input coding system is no-conversion or raw-text, the
string argument to the filter function is a unibyte string; otherwise
if default-enable-multibyte-characters is nil, the string is
a unibyte string that is converted from a multibyte string
(that is the decoding result) by string-make-unibyte.  In
the other case, the string is a multibyte string.

Could you paraphrase the above into better English and add
it in the docstring?

> Use `string-as-multibyte' on a unibyte
> string before inserting it in a multibyte buffer.

As Eli wrote, it is better to add something like the above
in the docstring of insert.

> Note: If the sole purpose of the filter is to insert received data
> into a specific buffer, it is better NOT to define a process filter,
> but instead set the process' buffer to that buffer.

I think the above should go to Elisp info, not to the docstring.

---
Ken'ichi HANDA
handa@m17n.org


* Re: [HELP] (bug?) Saving a buffer without any conversion?
  2003-01-15 23:27       ` Richard Stallman
@ 2003-01-16  6:45         ` Kenichi Handa
  0 siblings, 0 replies; 19+ messages in thread
From: Kenichi Handa @ 2003-01-16  6:45 UTC (permalink / raw)
  Cc: storm

In article <E18Ywws-0001Gr-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
>     to this:
>>  	      :filter (lambda (proc string)
>>  			(with-current-buffer (get-buffer "temp.out")
>>  			  (insert (string-as-multibyte string))))

>     also produces the right result.

> This would put the characters into the buffer.
> Some of them would be represented with multibyte sequences.
> What coding system would work then to save the file?

no-conversion and raw-text.

> What coding system can we use to write out that text
> converting it back to single-byte characters?

The same.

---
Ken'ichi HANDA
handa@m17n.org


* Re: [HELP] (bug?) Saving a buffer without any conversion?
  2003-01-15 16:30         ` Eli Zaretskii
@ 2003-01-16 22:52           ` Kim F. Storm
  2003-01-17  2:35             ` Kenichi Handa
  0 siblings, 1 reply; 19+ messages in thread
From: Kim F. Storm @ 2003-01-16 22:52 UTC (permalink / raw)
  Cc: handa

"Eli Zaretskii" <eliz@is.elta.co.il> writes:

> > From: storm@cua.dk (Kim F. Storm)
> > Date: 15 Jan 2003 14:27:23 +0100
> > 
> > E.g. add this to the doc string for `set-process-filter':
> > 
> > If the process' input coding system is no-conversion or raw-text, the
> > string argument to the filter function is a unibyte string; otherwise
> > it is a multibyte string.  Use `string-as-multibyte' on a unibyte
> > string before inserting it in a multibyte buffer.
> > 
> > Note: If the sole purpose of the filter is to insert received data
> > into a specific buffer, it is better NOT to define a process filter,
> > but instead set the process' buffer to that buffer.
> 
> The issue involved is not specific to process filters, so it should
> probably be mentioned in other doc strings as well, like in that of
> `insert'.

Well, I added something to set-process-filter, but I'm quite unsure
what to add to `insert's doc string which already says this about the
issue:

  If the current buffer is multibyte, unibyte strings are converted
  to multibyte for insertion (see `unibyte-char-to-multibyte').
  If the current buffer is unibyte, multibyte strings are converted
  to unibyte for insertion.

It seems very odd that we have to suggest using string-as-multibyte
(or string-as-unibyte) to convert strings prior to insertion when the
doc string says it does that automatically.  I guess it has to say
something about buffer coding systems here, but what ...?

Handa-san, maybe you can tell the "true story" ?

-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk

* Re: [HELP] (bug?) Saving a buffer without any conversion?
  2003-01-16 22:52           ` Kim F. Storm
@ 2003-01-17  2:35             ` Kenichi Handa
  0 siblings, 0 replies; 19+ messages in thread
From: Kenichi Handa @ 2003-01-17  2:35 UTC (permalink / raw)
  Cc: emacs-devel

In article <5xd6mwoggt.fsf@kfs2.cua.dk>, storm@cua.dk (Kim F. Storm) writes:
> Well, I added something to set-process-filter,

Thank you.

> but I'm quite unsure what to add to `insert's doc string
> which already says this about the issue:

>   If the current buffer is multibyte, unibyte strings are converted
>   to multibyte for insertion (see `unibyte-char-to-multibyte').
>   If the current buffer is unibyte, multibyte strings are converted
>   to unibyte for insertion.

> It seems very odd that we have to suggest using string-as-multibyte
> (or string-as-unibyte) to convert strings prior to insertion when the
> doc string says it does that automatically.  I guess it has to say
> something about buffer coding systems here, but what ...?

> Handa-san, maybe you can tell the "true story" ?

Coding systems are not relevant to `insert'.

There are two ways to convert a unibyte string to multibyte:
string-make-multibyte and string-as-multibyte.  Emacs' default
conversion from unibyte to multibyte (including the case of
`insert') uses string-make-multibyte.  But if one wants to
preserve the original bytes, one must use
string-as-multibyte.
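
To illustrate the byte-preserving property, here is a small sketch
(the function name `my-bytes-survive-p' is hypothetical).  It only
checks that string-as-multibyte followed by string-as-unibyte gives
back the original bytes.  It does not show what string-make-multibyte
produces, because that result may depend on the Emacs version and the
language environment.

```elisp
;; Sketch only: `my-bytes-survive-p' is a hypothetical name.
(defun my-bytes-survive-p (bytes)
  "Return non-nil if the unibyte string BYTES survives a round trip.
The round trip is unibyte -> multibyte -> unibyte, using the
byte-preserving conversions rather than the character-converting ones."
  (let ((as-multibyte (string-as-multibyte bytes)))
    (and (not (multibyte-string-p bytes))      ; input really is unibyte
         (multibyte-string-p as-multibyte)     ; result is multibyte
         (equal (string-as-unibyte as-multibyte) bytes))))

;; Example call with a few arbitrary 8-bit bytes.
(my-bytes-survive-p (unibyte-string 0 128 255 97))
```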

---
Ken'ichi HANDA
handa@m17n.org

* Re: [HELP] (bug?) Saving a buffer without any conversion?
  2003-01-15 13:27       ` Kim F. Storm
  2003-01-15 16:30         ` Eli Zaretskii
  2003-01-16  1:18         ` Kenichi Handa
@ 2003-01-17  9:23         ` Richard Stallman
  2003-01-17 11:07           ` Kenichi Handa
  2 siblings, 1 reply; 19+ messages in thread
From: Richard Stallman @ 2003-01-17  9:23 UTC (permalink / raw)
  Cc: handa

    If the process' input coding system is no-conversion or raw-text, the
    string argument to the filter function is a unibyte string; otherwise
    it is a multibyte string.  Use `string-as-multibyte' on a unibyte
    string before inserting it in a multibyte buffer.

You might or might not want to use string-as-multibyte, depending
on what results you want to get.

    In any case, I have just confirmed that if you DON'T use a filter
    function, but rather rely on Emacs itself to insert received data
    into the buffer, it works nicely even with a multibyte buffer.

Does that mean it does the equivalent of using string-as-multibyte?

* Re: [HELP] (bug?) Saving a buffer without any conversion?
  2003-01-17  9:23         ` Richard Stallman
@ 2003-01-17 11:07           ` Kenichi Handa
  0 siblings, 0 replies; 19+ messages in thread
From: Kenichi Handa @ 2003-01-17 11:07 UTC (permalink / raw)
  Cc: storm

In article <E18ZSiR-0007vD-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
>     If the process' input coding system is no-conversion or raw-text, the
>     string argument to the filter function is a unibyte string; otherwise
>     it is a multibyte string.  Use `string-as-multibyte' on a unibyte
>     string before inserting it in a multibyte buffer.

> You might or might not want to use string-as-multibyte, depending
> on what results you want to get.

Right.

>     In any case, I have just confirmed that if you DON'T use a filter
>     function, but rather rely on Emacs itself to insert received data
>     into the buffer, it works nicely even with a multibyte buffer.

> Does that mean it does the equivalent of using string-as-multibyte?

Yes.  This is the relevant code (in read_process_output):

      /* Adjust the multibyteness of TEXT to that of the buffer.  */
      if (NILP (current_buffer->enable_multibyte_characters)
	  != ! STRING_MULTIBYTE (text))
	text = (STRING_MULTIBYTE (text)
		? Fstring_as_unibyte (text)
		: Fstring_as_multibyte (text));
      nbytes = SBYTES (text);
      nchars = SCHARS (text);
      /* Insert before markers in case we are inserting where
	 the buffer's mark is, and the user's next command is Meta-y.  */
      insert_from_string_before_markers (text, 0, 0, nchars, nbytes, 0);
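
For anyone who does need a filter, a rough Lisp counterpart of the
adjustment above might look like the following.  This is a sketch,
not what Emacs runs: both function names are hypothetical, and the
filter assumes the process has a live buffer with a valid process
mark.

```elisp
;; Sketch only: both function names below are hypothetical.
(defun my-adjust-multibyteness (text)
  "Return TEXT with the multibyteness of the current buffer.
The bytes are kept as they are; no character conversion is done."
  (cond ((eq (and enable-multibyte-characters t)
             (multibyte-string-p text))
         text)                                  ; already matches
        ((multibyte-string-p text)
         (string-as-unibyte text))              ; buffer is unibyte
        (t
         (string-as-multibyte text))))          ; buffer is multibyte

(defun my-byte-preserving-filter (proc string)
  "Insert STRING from PROC at PROC's mark without converting bytes."
  (let ((buf (process-buffer proc)))
    (when (buffer-live-p buf)
      (with-current-buffer buf
        (save-excursion
          (goto-char (process-mark proc))
          (insert-before-markers (my-adjust-multibyteness string))
          (set-marker (process-mark proc) (point)))))))
```

It would be installed with (set-process-filter PROC
'my-byte-preserving-filter).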

---
Ken'ichi HANDA
handa@m17n.org

Thread overview: 19+ messages
2003-01-13 17:52 [HELP] (bug?) Saving a buffer without any conversion? Mario Lang
2003-01-14  1:00 ` Kim F. Storm
2003-01-14  6:06   ` Eli Zaretskii
2003-01-14  6:46     ` Mario Lang
2003-01-14 18:37       ` Eli Zaretskii
2003-01-14 16:19   ` Stefan Monnier
2003-01-15  1:16 ` Kenichi Handa
2003-01-15 11:02   ` Kim F. Storm
2003-01-15 10:59     ` Kenichi Handa
2003-01-15 13:27       ` Kim F. Storm
2003-01-15 16:30         ` Eli Zaretskii
2003-01-16 22:52           ` Kim F. Storm
2003-01-17  2:35             ` Kenichi Handa
2003-01-16  1:18         ` Kenichi Handa
2003-01-17  9:23         ` Richard Stallman
2003-01-17 11:07           ` Kenichi Handa
2003-01-15 16:59       ` Mario Lang
2003-01-15 23:27       ` Richard Stallman
2003-01-16  6:45         ` Kenichi Handa
