all messages for Emacs-related lists mirrored at yhetil.org
From: Ted Zlatanov <tzz@lifelogs.com>
To: emacs-devel@gnu.org
Subject: Re: docs for insert-file-contents use 'bytes'
Date: Mon, 29 Sep 2008 16:04:13 -0500
Message-ID: <86hc7y64vm.fsf@lifelogs.com>
In-Reply-To: ur672k8xh.fsf@gnu.org

On Mon, 29 Sep 2008 23:12:58 +0300 Eli Zaretskii <eliz@gnu.org> wrote: 

>> From: Ted Zlatanov <tzz@lifelogs.com>
>> Date: Mon, 29 Sep 2008 14:58:17 -0500
>> 
>> The docs for insert-file-contents say the range is in bytes, but that
>> function does decoding of the contents.  Can it, therefore, read from an
>> undesirable position (e.g. the middle of a UTF-8 sequence)?

EZ> The range _is_ in bytes (you will see in fileio.c that Emacs uses
EZ> `lseek' to get to the required file positions).  Yes, reading a part
EZ> of a multibyte sequence is a possibility.

>> How does Emacs handle that?

EZ> Like with any other random bytes, I think: it will produce eight-bit-*
EZ> characters in the buffer.  IOW, you get garbled text.
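A minimal sketch of the effect Eli describes. This is not Emacs code: Python stands in for the C internals, and `errors="surrogateescape"` (which keeps undecodable bytes as escaped code points) is only a loose analogy to Emacs's eight-bit-* characters. The sample string and offsets are made up for illustration.

```python
# Illustration only: what decoding a byte range that starts in the
# middle of a UTF-8 sequence looks like. Not how Emacs implements it.

text = "naïve café"
data = text.encode("utf-8")  # b'na\xc3\xafve caf\xc3\xa9'

# "ï" occupies two bytes (0xC3 0xAF) at byte offsets 2 and 3, so a
# range beginning at byte 3 starts inside that character.
beg, end = 3, len(data)
chunk = data[beg:end]

# The stray continuation byte 0xAF cannot be decoded on its own; it is
# preserved as a raw-byte placeholder, i.e. the text comes out garbled.
decoded = chunk.decode("utf-8", errors="surrogateescape")
print(repr(decoded))
```

The rest of the range decodes normally; only the split character at the start is damaged.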

This is not a safe mode of operation with multibyte sequences; is there a
way to DTRT (i.e. start and end the range on character boundaries)?  I'm
specifically thinking about a paged buffer mode where you only see a
small portion of the file (for editing large files, as we discussed in
another newsgroup a while ago).
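One way a paged mode could avoid split characters, sketched under the assumption that the file is UTF-8: snap each page boundary back to the start of a character before decoding. The helper names (`is_continuation`, `align_to_char_start`, `read_page`) are hypothetical, Python is used only for illustration, and nothing here is an existing Emacs API. For simplicity the sketch works on bytes already in memory; a real pager would seek to the page and read a few extra bytes before it (a UTF-8 sequence is at most 4 bytes long).

```python
# Hypothetical sketch: align byte offsets to UTF-8 character starts so a
# page never begins or ends inside a multibyte sequence.
# Assumes UTF-8; other variable-length encodings need their own rules.

def is_continuation(byte: int) -> bool:
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return byte & 0xC0 == 0x80

def align_to_char_start(data: bytes, pos: int) -> int:
    # Step backward while pos points at a continuation byte, so pos ends
    # up on the first byte of a character (or at 0 / end of data).
    while 0 < pos < len(data) and is_continuation(data[pos]):
        pos -= 1
    return pos

def read_page(data: bytes, beg: int, end: int) -> str:
    # Both ends are snapped back, so a character split by END is left
    # out of this page and starts the next one instead.
    beg = align_to_char_start(data, beg)
    end = align_to_char_start(data, end)
    return data[beg:end].decode("utf-8")

data = "naïve café".encode("utf-8")
print(read_page(data, 3, 11))
```

Because every page boundary is snapped the same way, consecutive pages of a fixed byte size still tile the file exactly: each split character appears in exactly one page.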

>> Either way the docs need to state the operation mode clearly.

EZ> Assuming I don't miss anything, and the above is indeed correct, what
EZ> would you like the doc string to say, exactly?

Maybe add:

"Warning: BEG and END are byte offsets into the file and are not
adjusted to character boundaries.  With variable-length multibyte
encodings such as UTF-8, a range that starts or ends in the middle of a
multibyte sequence produces raw eight-bit bytes, i.e. garbled text.
See ??? instead."

I don't know if this is the right wording, but since this is an
essential operation, the doc string should warn about this (nowadays
common) case.

Ted


Thread overview: 15+ messages
2008-09-29 19:58 docs for insert-file-contents use 'bytes' Ted Zlatanov
2008-09-29 20:12 ` Eli Zaretskii
2008-09-29 21:04   ` Ted Zlatanov [this message]
2008-09-30  6:06     ` Miles Bader
2008-09-30  7:19     ` Eli Zaretskii
2008-09-30 13:48       ` Ted Zlatanov
2008-09-30 15:58         ` Stefan Monnier
2008-09-30 16:29         ` Eli Zaretskii
2008-10-01  0:44         ` Kenichi Handa
2008-10-01 16:54           ` Ted Zlatanov
2008-10-02  1:33             ` Kenichi Handa
2008-10-02 13:42               ` Ted Zlatanov
2008-10-02 18:55                 ` Stefan Monnier
2008-10-03 13:55                   ` Ted Zlatanov
2008-10-02  1:58             ` Kenichi Handa
