From: Ted Zlatanov <tzz@lifelogs.com>
To: emacs-devel@gnu.org
Subject: Re: docs for insert-file-contents use 'bytes'
Date: Tue, 30 Sep 2008 08:48:28 -0500
Message-ID: <8663od68yb.fsf@lifelogs.com>
In-Reply-To: uod26je2p.fsf@gnu.org

On Tue, 30 Sep 2008 10:19:26 +0300 Eli Zaretskii <eliz@gnu.org> wrote: 

>> From: Ted Zlatanov <tzz@lifelogs.com>
>> Date: Mon, 29 Sep 2008 16:04:13 -0500
>> 
>> This is not a safe operation mode with multibyte sequences; is there a
>> way to DTRT?  I'm specifically thinking about a paged buffer mode where
>> you only see a small portion of the file (for editing large files, as we
>> discussed in another newsgroup a while ago).

EZ> How about this idea: read a bit more than you want, then find safe
EZ> place to end this page-full?

How do I find the next safe position in the byte flow?

>> I don't know if this is the right wording, but it's a pretty essential
>> operation so it should give some warning about this common (nowadays)
>> case.

EZ> Is it really a common case that insert-file-contents is used to read a
EZ> portion of a file?  Where is this used?

I want to use it to implement a paged view of large files.  We discussed
this in emacs-help and you suggested using insert-file-contents IIRC.

Anyhow, the point is that the docs don't mention this issue; let's fix
that first.  I mention one possible way to write the code below.

On Tue, 30 Sep 2008 15:06:17 +0900 Miles Bader <miles.bader@necel.com> wrote: 

MB> Ted Zlatanov <tzz@lifelogs.com> writes:
EZ> Like with any other random bytes, I think: it will produce eight-bit-*
EZ> characters in the buffer.  IOW, you get garbled text.
>> 
>> This is not a safe operation mode with multibyte sequences; is there a
>> way to DTRT?  I'm specifically thinking about a paged buffer mode where
>> you only see a small portion of the file (for editing large files, as we
>> discussed in another newsgroup a while ago).

MB> Why is it "not safe"?  

Because the text will be corrupted if you seek in the middle of a
multibyte sequence, and there's no way to know in advance if a position
is safe without at least some scanning.
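
To make the failure concrete, here is a minimal Python sketch (not Emacs
code) of what happens when decoding starts at a byte offset that falls
inside a multibyte UTF-8 sequence.  The helper name decode_from and the
sample string are assumptions made up for illustration.  Python shows the
stray byte as U+FFFD, whereas Emacs would show an eight-bit character,
but the effect is the same kind of garbled text.

```python
# Illustrative only: a stand-in for "seek to a byte offset, then decode".
def decode_from(data: bytes, offset: int) -> str:
    """Decode DATA as UTF-8 starting at byte OFFSET, marking bad bytes."""
    return data[offset:].decode("utf-8", errors="replace")


# "ï" is encoded as two bytes (0xC3 0xAF), at byte offsets 2 and 3.
sample = "naïve café".encode("utf-8")

good = decode_from(sample, 2)  # starts on the lead byte of "ï"
bad = decode_from(sample, 3)   # starts on the continuation byte 0xAF

print(repr(good))
print(repr(bad))
```

Offset 2 decodes cleanly, while offset 3 begins with a replacement
character because the lead byte of the sequence was skipped.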

MB> How would you do things differently?

I don't know, I'm just saying the docs don't mention the possibility of
corrupted text.  Can we fix that, if possible?  The docs just need to
warn, not solve the issue.

MB> In conjunction with _file_ contents, a byte offset seems certainly the
MB> most natural thing.  An "encoded character offset", for instance, would
MB> be far less efficient, much more complex to implement (and thus
MB> buggier), and harder to use in general.

Agreed.  Still, encoding schemes like UTF-8 are so popular today that
the docs should at least warn about careless seeking to a byte offset.

There could be an insert-file-decoded-contents that seeks to a byte
position and gets the next character at or after that position.  That's
not too hard to implement and it's fast.
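
A rough sketch of that idea in Python, assuming the file is valid UTF-8
(it relies on continuation bytes having the bit pattern 10xxxxxx, so it
would not carry over to encodings that are not self-synchronizing).  The
names next_utf8_boundary and decode_page are hypothetical and are not an
existing Emacs or Python API.

```python
def is_continuation(byte: int) -> bool:
    """True if BYTE is a UTF-8 continuation byte (bit pattern 10xxxxxx)."""
    return (byte & 0xC0) == 0x80


def next_utf8_boundary(data: bytes, offset: int) -> int:
    """Return the first index >= OFFSET that starts a character (or len(data))."""
    pos = offset
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


def decode_page(data: bytes, offset: int, size: int) -> str:
    """Decode roughly SIZE bytes starting at or after OFFSET.

    Both ends are moved forward to character boundaries, so the page
    never begins or ends in the middle of a multibyte sequence.
    """
    start = next_utf8_boundary(data, offset)
    end = next_utf8_boundary(data, min(start + size, len(data)))
    return data[start:end].decode("utf-8")


# "a" is 1 byte, "é" is 2 bytes, "€" is 3 bytes, "b" is 1 byte.
sample = "aé€b".encode("utf-8")
page = decode_page(sample, 2, 1)  # offset 2 is inside "é"
print(page)
```

In valid UTF-8 the scan skips at most three bytes at either end, so the
cost of finding a safe position does not depend on the file size.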

Ted




