From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: docs for insert-file-contents use 'bytes'
Date: Tue, 30 Sep 2008 11:58:12 -0400
References: <86ljxa67xi.fsf@lifelogs.com> <86hc7y64vm.fsf@lifelogs.com> <8663od68yb.fsf@lifelogs.com>
In-Reply-To: <8663od68yb.fsf@lifelogs.com> (Ted Zlatanov's message of "Tue, 30 Sep 2008 08:48:28 -0500")
To: Ted Zlatanov
Cc: emacs-devel@gnu.org
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (gnu/linux)
Xref: news.gmane.org gmane.emacs.devel:104260

>>> This is not a safe operation mode with multibyte sequences; is there a
>>> way to DTRT?  I'm specifically thinking about a paged buffer mode where
>>> you only see a small portion of the file (for editing large files, as we
>>> discussed in another newsgroup a while ago).

EZ> How about this idea: read a bit more than you want, then find safe
EZ> place to end this page-full?

> How do I find the next safe position in the byte flow?

It's a difficult problem for everyone.  Which is why Emacs doesn't do it
for you, basically: I don't think anyone has made serious use of that
feature yet, so nobody has gone to the trouble of coming up with a good
solution.

Maybe you can simply look at the end of the previous insertion and count
the number of eight-bit-* chars that were inserted (these correspond to
bytes that belong to the char that straddles the boundary), so as to find
the end of the last complete char you encountered.

> I want to use it to implement a paged view of large files.
> We discussed this in emacs-help and you suggested using
> insert-file-contents IIRC.

This is a very good application indeed.

> Because the text will be corrupted if you seek in the middle of a
> multibyte sequence, and there's no way to know in advance if a position
> is safe without at least some scanning.

It's not exactly "corrupted" in the sense that, while it is not
displayed correctly, it should be correctly saved back so no information
is lost.  Basically, some of the bytes are decoded with the wrong
coding-system, but this coding system is supposed to be safe.
No doubt that it's not "good enough" in general.

> There could be a insert-file-decoded-contents that seeks to a byte
> position and gets the next character at or after that position.  That's
> not too hard to implement and it's fast.

It wouldn't be good enough for your application because you might then
lose the chars that straddle a boundary.


        Stefan
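
For illustration only, here is a minimal sketch of the "find a safe place
to end this page-full" idea from the thread.  It is written in Python
rather than Emacs Lisp, it assumes the file is UTF-8 (other coding
systems need other boundary rules), and it works on raw bytes instead of
counting eight-bit-* chars in a buffer, so it is a stand-in for the
approach discussed above and not what Emacs does.  The names
utf8_safe_end and read_page are hypothetical.

```python
def utf8_safe_end(chunk: bytes) -> int:
    """Length of the longest prefix of CHUNK that does not stop in the
    middle of a UTF-8 sequence.  Assumes CHUNK starts on a boundary."""
    n = len(chunk)
    i = n - 1
    # Step back over at most 4 trailing bytes while they are
    # continuation bytes (10xxxxxx), looking for the lead byte.
    while i >= 0 and i >= n - 4 and (chunk[i] & 0xC0) == 0x80:
        i -= 1
    if i < 0 or i < n - 4:
        return n  # no lead byte nearby: not valid UTF-8, leave as is
    lead = chunk[i]
    if lead < 0x80:
        need = 1
    elif lead >> 5 == 0b110:
        need = 2
    elif lead >> 4 == 0b1110:
        need = 3
    elif lead >> 3 == 0b11110:
        need = 4
    else:
        return n  # invalid lead byte, leave as is
    # Cut before the lead byte if its sequence is incomplete.
    return i if n - i < need else n


def read_page(path: str, offset: int, size: int):
    """Read up to SIZE bytes at OFFSET and return (text, next_offset).

    OFFSET must be 0 or a next_offset returned by an earlier call, so
    that every page starts on a character boundary."""
    if size < 4:
        raise ValueError("size must be at least 4 to guarantee progress")
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read(size)
    # At end of file there is nothing more to wait for: keep everything.
    end = len(chunk) if len(chunk) < size else utf8_safe_end(chunk)
    text = chunk[:end].decode("utf-8", errors="replace")
    return text, offset + end
```

Because a page is cut before an incomplete sequence and the next page
starts at that cut, the char that straddles the boundary is re-read
whole by the next call instead of being lost or split.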