From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Stefan Monnier <monnier@IRO.UMontreal.CA>
Newsgroups: gmane.emacs.devel
Subject: Re: docs for insert-file-contents use 'bytes'
Date: Tue, 30 Sep 2008 11:58:12 -0400
Message-ID: <jwv63odd4hr.fsf-monnier+emacs@gnu.org>
References: <86ljxa67xi.fsf@lifelogs.com> <ur672k8xh.fsf@gnu.org>
	<86hc7y64vm.fsf@lifelogs.com> <uod26je2p.fsf@gnu.org>
	<8663od68yb.fsf@lifelogs.com>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1222790325 5330 80.91.229.12 (30 Sep 2008 15:58:45 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Tue, 30 Sep 2008 15:58:45 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: Ted Zlatanov <tzz@lifelogs.com>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue Sep 30 17:59:43 2008
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1KkhdI-0003QV-Mi
	for ged-emacs-devel@m.gmane.org; Tue, 30 Sep 2008 17:59:28 +0200
Original-Received: from localhost ([127.0.0.1]:39792 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1KkhcG-0000YE-1Z
	for ged-emacs-devel@m.gmane.org; Tue, 30 Sep 2008 11:58:24 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1KkhcA-0000UG-Lb
	for emacs-devel@gnu.org; Tue, 30 Sep 2008 11:58:18 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1Kkhc8-0000Pj-DD
	for emacs-devel@gnu.org; Tue, 30 Sep 2008 11:58:17 -0400
Original-Received: from [199.232.76.173] (port=45192 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Kkhc8-0000PY-A9
	for emacs-devel@gnu.org; Tue, 30 Sep 2008 11:58:16 -0400
Original-Received: from chene.dit.umontreal.ca ([132.204.246.20]:44875)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <monnier@IRO.UMontreal.CA>) id 1Kkhc7-0007l1-TH
	for emacs-devel@gnu.org; Tue, 30 Sep 2008 11:58:16 -0400
Original-Received: from alfajor.home (vpn-132-204-232-41.acd.umontreal.ca
	[132.204.232.41])
	by chene.dit.umontreal.ca (8.14.1/8.14.1) with ESMTP id m8UFwD9N006970; 
	Tue, 30 Sep 2008 11:58:13 -0400
Original-Received: by alfajor.home (Postfix, from userid 20848)
	id 35CF31C3E8; Tue, 30 Sep 2008 11:58:12 -0400 (EDT)
In-Reply-To: <8663od68yb.fsf@lifelogs.com> (Ted Zlatanov's message of "Tue, 30
	Sep 2008 08:48:28 -0500")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (gnu/linux)
X-NAI-Spam-Score: 0
X-NAI-Spam-Rules: 1 Rules triggered
	RV3114=0
X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6 (newer, 3)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:104260
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/104260>

>>> This is not a safe operation mode with multibyte sequences; is there a
>>> way to DTRT?  I'm specifically thinking about a paged buffer mode where
>>> you only see a small portion of the file (for editing large files, as we
>>> discussed in another newsgroup a while ago).

EZ> How about this idea: read a bit more than you want, then find safe
EZ> place to end this page-full?

> How do I find the next safe position in the byte flow?

It's a difficult problem for everyone.  Which is why Emacs doesn't do it
for you, basically: I don't think anyone has made serious use of that
feature yet, so nobody has gone to the trouble of coming up with
a good solution.

Maybe you can simply look at the end of the previous insertion, count
the number of eight-bit-* chars that were inserted (these correspond to
bytes that belong to the char that straddles the boundary) so as to find
the end of the last complete char you encountered.
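
To make the idea concrete, here is a minimal sketch in Python rather than
Emacs Lisp.  It assumes the file is valid UTF-8 (Emacs handles many more
coding systems), and it inspects the raw bytes directly instead of counting
eight-bit chars after decoding.  The names incomplete_tail_len and
split_page are hypothetical, chosen only for this illustration.

```python
def incomplete_tail_len(chunk: bytes) -> int:
    """Return how many bytes at the end of CHUNK belong to a UTF-8
    character that was cut off by the chunk boundary (0 if none).

    Assumption: the data is valid UTF-8.
    """
    # A UTF-8 character is at most 4 bytes long, so the lead byte of a
    # truncated character is at most 3 bytes from the end; looking back
    # 4 bytes also lets us recognize a complete 4-byte character.
    for back in range(1, min(4, len(chunk)) + 1):
        byte = chunk[-back]
        if byte & 0xC0 == 0x80:
            continue  # continuation byte: keep looking for the lead byte
        if byte < 0x80:
            return 0  # ASCII byte: nothing is truncated
        # Lead byte: its high bits give the expected sequence length.
        if byte >= 0xF0:
            need = 4
        elif byte >= 0xE0:
            need = 3
        else:
            need = 2
        return back if back < need else 0
    return 0


def split_page(chunk: bytes) -> tuple[bytes, bytes]:
    """Split CHUNK into (complete characters, bytes of a truncated one)."""
    cut = len(chunk) - incomplete_tail_len(chunk)
    return chunk[:cut], chunk[cut:]
```

The second value returned by split_page is what you would carry over to
the start of the next page, so that no char straddling the boundary is lost.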

> I want to use it to implement a paged view of large files.  We discussed
> this in emacs-help and you suggested using insert-file-contents IIRC.

This is a very good application indeed.

> Because the text will be corrupted if you seek in the middle of a
> multibyte sequence, and there's no way to know in advance if a position
> is safe without at least some scanning.

It's not exactly "corrupted" in the sense that, while it is not
displayed correctly, it should be correctly saved back so no information
is lost.  Basically, some of the bytes are decoded with the wrong
coding-system, but this coding system is supposed to be safe.
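
As an analogy only (this is Python, not what Emacs does internally), the
following sketch shows the same property: bytes that cannot be decoded are
kept in an escaped form, so writing the text back reproduces the original
bytes even when a page starts or ends in the middle of a multibyte
sequence.  It assumes UTF-8 and uses Python's "surrogateescape" error
handler; the function name roundtrip is hypothetical.

```python
def roundtrip(page: bytes) -> bytes:
    """Decode PAGE leniently, then encode it back to bytes."""
    # Undecodable bytes become lone surrogates instead of raising an
    # error; they look wrong if displayed, but the byte values survive.
    text = page.decode("utf-8", errors="surrogateescape")
    # Encoding with the same handler turns those surrogates back into
    # the exact bytes they came from.
    return text.encode("utf-8", errors="surrogateescape")


data = "na\u00efve \u20ac".encode("utf-8")
head = data[:-1]   # page ends in the middle of the last character
tail = data[-2:]   # page starts in the middle of the last character
assert roundtrip(head) == head
assert roundtrip(tail) == tail
```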

No doubt that it's not "good enough" in general.

> There could be a insert-file-decoded-contents that seeks to a byte
> position and gets the next character at or after that position.  That's
> not too hard to implement and it's fast.

It wouldn't be good enough for your application because you might then
lose the chars that straddle a boundary.


        Stefan