unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#50560: 28.0.50; 'insert-file-contents-literally' on multibyte buffers
@ 2021-09-13  6:58 Augusto Stoffel
  2021-09-13  7:10 ` Lars Ingebrigtsen
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Augusto Stoffel @ 2021-09-13  6:58 UTC (permalink / raw)
  To: 50560

I thought 'insert-file-contents-literally' literally just inserted the
file contents, as bytes, but I noticed that in the following code

    (create-image
     (with-temp-buffer
       (set-buffer-multibyte nil)
       (insert-file-contents-literally "picture.jpg")
       (buffer-substring-no-properties (point-min) (point-max)))
     nil t)

the call to 'set-buffer-multibyte' is really essential.

Is this intended?  If so, I think a note in the docstring is due.






* bug#50560: 28.0.50; 'insert-file-contents-literally' on multibyte buffers
  2021-09-13  6:58 bug#50560: 28.0.50; 'insert-file-contents-literally' on multibyte buffers Augusto Stoffel
@ 2021-09-13  7:10 ` Lars Ingebrigtsen
  2021-09-13  7:16   ` Lars Ingebrigtsen
  2021-09-13  8:13   ` Augusto Stoffel
  2021-09-13  8:42 ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-09-13 11:52 ` Eli Zaretskii
  2 siblings, 2 replies; 12+ messages in thread
From: Lars Ingebrigtsen @ 2021-09-13  7:10 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 50560

Augusto Stoffel <arstoffel@gmail.com> writes:

> I thought 'insert-file-contents-literally' literally just inserted the
> file contents, as bytes, but I noticed that in the following code
>
>     (create-image
>      (with-temp-buffer
>        (set-buffer-multibyte nil)
>        (insert-file-contents-literally "picture.jpg")
>        (buffer-substring-no-properties (point-min) (point-max)))
>      nil t)
>
> the call to 'set-buffer-multibyte' is really essential.

In what way?  If the first byte in a binary file is #xff, inserting the
file literally in a buffer and saying `(following-char)' on the first
character in the buffer will say #xff.

But, yes, when dealing with octet streams, it's a lot less confusing if
you're using unibyte buffers (and strings).

> Is this intended?  If so, I think a note in the docstring is due.

The doc string doesn't say anything about bytes, so I think that's an
interpretation on your side.

`insert-file-contents-literally' does insert "literally" -- but the byte
contents of the internal buffer structure can't be violated (Emacs uses
utf-8 (plus extensions) for multibyte buffers).

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#50560: 28.0.50; 'insert-file-contents-literally' on multibyte buffers
  2021-09-13  7:10 ` Lars Ingebrigtsen
@ 2021-09-13  7:16   ` Lars Ingebrigtsen
  2021-09-13  8:13   ` Augusto Stoffel
  1 sibling, 0 replies; 12+ messages in thread
From: Lars Ingebrigtsen @ 2021-09-13  7:16 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 50560

Lars Ingebrigtsen <larsi@gnus.org> writes:

> In what way?  If the first byte in a binary file is #xff, inserting the
> file literally in a buffer and saying `(following-char)' on the first
> character in the buffer will say #xff.

Sorry, I meant `(get-byte (point))'.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#50560: 28.0.50; 'insert-file-contents-literally' on multibyte buffers
  2021-09-13  7:10 ` Lars Ingebrigtsen
  2021-09-13  7:16   ` Lars Ingebrigtsen
@ 2021-09-13  8:13   ` Augusto Stoffel
  2021-09-13  8:19     ` Lars Ingebrigtsen
  1 sibling, 1 reply; 12+ messages in thread
From: Augusto Stoffel @ 2021-09-13  8:13 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 50560

On Mon, 13 Sep 2021 at 09:10, Lars Ingebrigtsen <larsi@gnus.org> wrote:

> Augusto Stoffel <arstoffel@gmail.com> writes:
>
>> I thought 'insert-file-contents-literally' literally just inserted the
>> file contents, as bytes, but I noticed that in the following code
>>
>>     (create-image
>>      (with-temp-buffer
>>        (set-buffer-multibyte nil)
>>        (insert-file-contents-literally "picture.jpg")
>>        (buffer-substring-no-properties (point-min) (point-max)))
>>      nil t)
>>
>> the call to 'set-buffer-multibyte' is really essential.
>
> In what way?  If the first byte in a binary file is #xff, inserting the
> file literally in a buffer and saying `(following-char)' on the first
> character in the buffer will say #xff.
>
> But, yes, when dealing with octet streams, it's a lot less confusing if
> you're using unibyte buffers (and strings).
>
>> Is this intended?  If so, I think a note in the docstring is due.
>
> The doc string doesn't say anything about bytes, so I think that's an
> interpretation on your side.
>
> `insert-file-contents-literally' does insert "literally" -- but the byte
> contents of the internal buffer structure can't be violated (Emacs uses
> utf-8 (plus extensions) for multibyte buffers).

Ah, sure, there is no coding _conversion_, but the bytes are still
interpreted according to the buffer's coding system.

I guess that's obvious in hindsight.  Still, reading the bytes from a
file is slightly trickier than it might seem, so there could be a word
of caution somewhere.






* bug#50560: 28.0.50; 'insert-file-contents-literally' on multibyte buffers
  2021-09-13  8:13   ` Augusto Stoffel
@ 2021-09-13  8:19     ` Lars Ingebrigtsen
  0 siblings, 0 replies; 12+ messages in thread
From: Lars Ingebrigtsen @ 2021-09-13  8:19 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 50560

Augusto Stoffel <arstoffel@gmail.com> writes:

>> `insert-file-contents-literally' does insert "literally" -- but the byte
>> contents of the internal buffer structure can't be violated (Emacs uses
>> utf-8 (plus extensions) for multibyte buffers).
>
> Ah, sure, there is no coding _conversion_, but the bytes are still
> interpreted according to the buffer's coding system.

No, quite the opposite -- `insert-file-contents-literally' inserts the
octets from the file in a way that makes them not be interpreted as
characters:  You end up with a buffer where each point in the buffer has
something that represents one octet.  (In reality, there's usually more
than one byte "in the background", since it takes several bytes to
represent an octet like #x90 in a multibyte buffer.)
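
A minimal sketch of what this looks like in practice, assuming the temp
buffer is multibyte (the default).  The function name
`demo-literal-insert' and the two test octets are made up for
illustration; only standard functions are used:

```elisp
;; Illustrative sketch; `demo-literal-insert' is a made-up name.
(defun demo-literal-insert ()
  "Insert two octets literally into a multibyte buffer and inspect them."
  (let ((file (make-temp-file "literal-demo")))
    (unwind-protect
        (progn
          ;; Write the octets #xff #x90 to FILE without any encoding.
          (let ((coding-system-for-write 'binary))
            (with-temp-buffer
              (set-buffer-multibyte nil)
              (insert (unibyte-string #xff #x90))
              (write-region (point-min) (point-max) file nil 'silent)))
          ;; Read them back into a buffer that is assumed to be multibyte.
          (with-temp-buffer
            (insert-file-contents-literally file)
            (list :positions (buffer-size)
                  ;; Character code of the raw-byte character at point-min.
                  :char (char-after (point-min))
                  ;; The octet that character stands for.
                  :byte (get-byte (point-min))
                  :multibyte (multibyte-string-p (buffer-string))
                  ;; Size of the internal representation, in bytes.
                  :internal-bytes (string-bytes (buffer-string)))))
      (delete-file file))))
```

Evaluating `(demo-literal-insert)' should report two buffer positions
and `:byte' #xff for the first one, while `:char' is the raw-byte
character #x3fffff and `:internal-bytes' is larger than the number of
octets in the file.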

> I guess that's obvious in hindsight.  Still, reading the bytes from a
> file is slightly trickier than it might seem, so there could be a word
> of caution somewhere.

I think this is all covered in the lispref manual.  It's a very
complicated and confusing subject, and I don't think this docstring is
the place to get into it.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#50560: 28.0.50; 'insert-file-contents-literally' on multibyte buffers
  2021-09-13  6:58 bug#50560: 28.0.50; 'insert-file-contents-literally' on multibyte buffers Augusto Stoffel
  2021-09-13  7:10 ` Lars Ingebrigtsen
@ 2021-09-13  8:42 ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-09-13 12:05   ` Eli Zaretskii
  2021-09-13 11:52 ` Eli Zaretskii
  2 siblings, 1 reply; 12+ messages in thread
From: Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-09-13  8:42 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 50560

Augusto Stoffel <arstoffel@gmail.com> writes:

> I thought 'insert-file-contents-literally' literally just inserted the
> file contents, as bytes, but I noticed that in the following code
>
>     (create-image
>      (with-temp-buffer
>        (set-buffer-multibyte nil)
>        (insert-file-contents-literally "picture.jpg")
>        (buffer-substring-no-properties (point-min) (point-max)))
>      nil t)
>
> the call to 'set-buffer-multibyte' is really essential.
>
> Is this intended?  If so, I think a note in the docstring is due.

It is intended, and the source of confusion may be the apparently
symmetric `find-file-literally`, which _does_ make the buffer unibyte
before filling the new buffer with the contents from a file (and
documents this behavior).

But if you think about it, it makes sense that
`insert-file-contents-literally` does not set the buffer as unibyte,
because it's intended for programmatic cases where you insert the
content inside a buffer that may already have other content, so making
the buffer unibyte unconditionally may cause unexpected results.

So yeah, perhaps we can add a small sentence that clarifies this
behavior.






* bug#50560: 28.0.50; 'insert-file-contents-literally' on multibyte buffers
  2021-09-13  6:58 bug#50560: 28.0.50; 'insert-file-contents-literally' on multibyte buffers Augusto Stoffel
  2021-09-13  7:10 ` Lars Ingebrigtsen
  2021-09-13  8:42 ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-09-13 11:52 ` Eli Zaretskii
  2021-09-13 12:44   ` Augusto Stoffel
  2 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2021-09-13 11:52 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 50560

> From: Augusto Stoffel <arstoffel@gmail.com>
> Date: Mon, 13 Sep 2021 08:58:06 +0200
> 
> I thought 'insert-file-contents-literally' literally just inserted the
> file contents, as bytes, but I noticed that in the following code
> 
>     (create-image
>      (with-temp-buffer
>        (set-buffer-multibyte nil)
>        (insert-file-contents-literally "picture.jpg")
>        (buffer-substring-no-properties (point-min) (point-max)))
>      nil t)
> 
> the call to 'set-buffer-multibyte' is really essential.

It is only essential for some very specific uses of the resulting
buffer, but definitely not for all.






* bug#50560: 28.0.50; 'insert-file-contents-literally' on multibyte buffers
  2021-09-13  8:42 ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-09-13 12:05   ` Eli Zaretskii
  2021-09-13 13:18     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2021-09-13 12:05 UTC (permalink / raw)
  To: Daniel Martín; +Cc: arstoffel, 50560

> Cc: 50560@debbugs.gnu.org
> Date: Mon, 13 Sep 2021 10:42:51 +0200
> From:  Daniel Martín via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> So yeah, perhaps we can add a small sentence that clarifies this
> behavior.

What kind of sentence would you like to add there?  IME, this stuff
can rarely be explained by small sentences ;-)






* bug#50560: 28.0.50; 'insert-file-contents-literally' on multibyte buffers
  2021-09-13 11:52 ` Eli Zaretskii
@ 2021-09-13 12:44   ` Augusto Stoffel
  2021-09-13 13:26     ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Augusto Stoffel @ 2021-09-13 12:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50560

On Mon, 13 Sep 2021 at 14:52, Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Augusto Stoffel <arstoffel@gmail.com>
>> Date: Mon, 13 Sep 2021 08:58:06 +0200
>> 
>> I thought 'insert-file-contents-literally' literally just inserted the
>> file contents, as bytes, but I noticed that in the following code
>> 
>>     (create-image
>>      (with-temp-buffer
>>        (set-buffer-multibyte nil)
>>        (insert-file-contents-literally "picture.jpg")
>>        (buffer-substring-no-properties (point-min) (point-max)))
>>      nil t)
>> 
>> the call to 'set-buffer-multibyte' is really essential.
>
> It is only essential for some very specific uses of the resulting
> buffer, but definitely not for all.

That's a good point.  Maybe the issue is actually with 'create-image',
which seems to only work correctly when the data is passed as a unibyte
string, but gives no warning if you pass a multibyte one.






* bug#50560: 28.0.50; 'insert-file-contents-literally' on multibyte buffers
  2021-09-13 12:05   ` Eli Zaretskii
@ 2021-09-13 13:18     ` Lars Ingebrigtsen
  2021-09-13 21:37       ` Augusto Stoffel
  0 siblings, 1 reply; 12+ messages in thread
From: Lars Ingebrigtsen @ 2021-09-13 13:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: arstoffel, 50560, Daniel Martín

Eli Zaretskii <eliz@gnu.org> writes:

>> So yeah, perhaps we can add a small sentence that clarifies this
>> behavior.
>
> What kind of sentence would you like to add there?  IME, this stuff
> can rarely be explained by small sentences ;-)

I've added a paragraph to the doc string mentioning that there might be
issues, but referring the user to `(elisp)Character Codes'.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#50560: 28.0.50; 'insert-file-contents-literally' on multibyte buffers
  2021-09-13 12:44   ` Augusto Stoffel
@ 2021-09-13 13:26     ` Eli Zaretskii
  0 siblings, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2021-09-13 13:26 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 50560

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: 50560@debbugs.gnu.org
> Date: Mon, 13 Sep 2021 14:44:24 +0200
> 
> > It is only essential for some very specific uses of the resulting
> > buffer, but definitely not for all.
> 
> That's a good point.  Maybe the issue is actually with 'create-image',
> which seems to only work correctly when the data is passed as a unibyte
> string, but gives no warning if you pass a multibyte one.

Maybe we should have create-image convert the :data string to unibyte
if it isn't already so.
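
A rough sketch of such a conversion, done on the caller's side.  The
helper name `demo-image-data-as-unibyte' is hypothetical, and this is
not a description of what `create-image' actually does; it only relies
on `multibyte-string-p' and `string-to-unibyte':

```elisp
;; Illustrative sketch; `demo-image-data-as-unibyte' is a made-up name
;; and does not reflect the implementation of `create-image'.
(defun demo-image-data-as-unibyte (data)
  "Return DATA as a unibyte string.
If DATA is multibyte, convert it with `string-to-unibyte', which
signals an error when DATA contains characters that are neither
ASCII nor raw bytes."
  (if (multibyte-string-p data)
      (string-to-unibyte data)
    data))
```

With that helper, a string taken from a multibyte buffer after a
literal insert could be passed along as
`(create-image (demo-image-data-as-unibyte data) nil t)'.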






* bug#50560: 28.0.50; 'insert-file-contents-literally' on multibyte buffers
  2021-09-13 13:18     ` Lars Ingebrigtsen
@ 2021-09-13 21:37       ` Augusto Stoffel
  0 siblings, 0 replies; 12+ messages in thread
From: Augusto Stoffel @ 2021-09-13 21:37 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 50560, Daniel Martín

On Mon, 13 Sep 2021 at 15:18, Lars Ingebrigtsen <larsi@gnus.org> wrote:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> So yeah, perhaps we can add a small sentence that clarifies this
>>> behavior.
>>
>> What kind of sentence would you like to add there?  IME, this stuff
>> can rarely be explained by small sentences ;-)
>
> I've added a paragraph to the doc string mentioning that there might be
> issues, but referring the user to `(elisp)Character Codes'.

Thanks, I think it's a good clarification.





