From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Kenichi Handa <handa@m17n.org>
Newsgroups: gmane.emacs.devel
Subject: Re: docs for insert-file-contents use 'bytes'
Date: Thu, 02 Oct 2008 10:33:49 +0900
Message-ID: <E1KlD4f-0002xS-U9@etlken.m17n.org>
References: <86ljxa67xi.fsf@lifelogs.com>
	<ur672k8xh.fsf@gnu.org>	<86hc7y64vm.fsf@lifelogs.com>
	<uod26je2p.fsf@gnu.org>	<8663od68yb.fsf@lifelogs.com>
	<E1KkppP-0005vj-S6@etlken.m17n.org> <868wt845op.fsf@lifelogs.com>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
X-Trace: ger.gmane.org 1222911251 3311 80.91.229.12 (2 Oct 2008 01:34:11 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 2 Oct 2008 01:34:11 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: Ted Zlatanov <tzz@lifelogs.com>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Oct 02 03:35:09 2008
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1KlD5w-00016l-PK
	for ged-emacs-devel@m.gmane.org; Thu, 02 Oct 2008 03:35:09 +0200
Original-Received: from localhost ([127.0.0.1]:46229 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1KlD4t-0000g8-Ik
	for ged-emacs-devel@m.gmane.org; Wed, 01 Oct 2008 21:34:03 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1KlD4p-0000g3-KY
	for emacs-devel@gnu.org; Wed, 01 Oct 2008 21:33:59 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1KlD4o-0000fr-AU
	for emacs-devel@gnu.org; Wed, 01 Oct 2008 21:33:59 -0400
Original-Received: from [199.232.76.173] (port=59172 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1KlD4o-0000fm-4L
	for emacs-devel@gnu.org; Wed, 01 Oct 2008 21:33:58 -0400
Original-Received: from mx1.aist.go.jp ([150.29.246.133]:42654)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <handa@m17n.org>) id 1KlD4n-0002ze-Fi
	for emacs-devel@gnu.org; Wed, 01 Oct 2008 21:33:57 -0400
Original-Received: from rqsmtp1.aist.go.jp (rqsmtp1.aist.go.jp [150.29.254.115])
	by mx1.aist.go.jp  with ESMTP id m921XqP2024338;
	Thu, 2 Oct 2008 10:33:52 +0900 (JST) env-from (handa@m17n.org)
Original-Received: from smtp3.aist.go.jp
	by rqsmtp1.aist.go.jp  with ESMTP id m921XqqK013711;
	Thu, 2 Oct 2008 10:33:52 +0900 (JST) env-from (handa@m17n.org)
Original-Received: by smtp3.aist.go.jp  with ESMTP id m921Xosd006895;
	Thu, 2 Oct 2008 10:33:50 +0900 (JST) env-from (handa@m17n.org)
Original-Received: from handa by etlken.m17n.org with local (Exim 4.69)
	(envelope-from <handa@m17n.org>)
	id 1KlD4f-0002xS-U9; Thu, 02 Oct 2008 10:33:49 +0900
In-reply-to: <868wt845op.fsf@lifelogs.com> (message from Ted Zlatanov on Wed, 
	01 Oct 2008 11:54:14 -0500)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2
	Emacs/23.0.60 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)
X-detected-operating-system: by monty-python.gnu.org: Solaris 9
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:104288
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/104288>

In article <868wt845op.fsf@lifelogs.com>, Ted Zlatanov <tzz@lifelogs.com> writes:

KH> It's not that easy.  Some encodings require seeking back to an
KH> escape sequence to decode the next character.  And, for UTF-16
KH> with a BOM, we have to check the first 2 bytes.
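To make the UTF-16 point concrete, here is a minimal Python sketch (not Emacs code; the function names are hypothetical) of decoding a slice from the middle of a UTF-16 file.  The byte order is known only from the 2-byte BOM at the very start, so the reader has to look there before it can decode anything at offset `start`.  It assumes the file begins with a BOM, and a surrogate pair split at either edge of the slice simply decodes as U+FFFD.

```python
import codecs
import io
from typing import BinaryIO, Optional


def utf16_order(head: bytes) -> Optional[str]:
    """Map a leading UTF-16 BOM to a Python codec name, or None if absent."""
    if head.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None


def read_utf16_slice(fp: BinaryIO, start: int, length: int) -> str:
    """Decode about `length` bytes at byte offset `start` of a BOM-prefixed UTF-16 file."""
    fp.seek(0)
    codec = utf16_order(fp.read(2))  # the first 2 bytes decide the byte order
    if codec is None:
        raise ValueError("no UTF-16 BOM at start of file")
    start = max(start, 2)            # code units begin right after the BOM
    start -= (start - 2) % 2         # snap back to a 2-byte code-unit boundary
    fp.seek(start)
    data = fp.read(length)
    data = data[: len(data) - len(data) % 2]  # drop a trailing half code unit
    return data.decode(codec, errors="replace")


if __name__ == "__main__":
    blob = io.BytesIO(codecs.BOM_UTF16_BE + "hello".encode("utf-16-be"))
    print(read_utf16_slice(blob, 7, 4))  # offset 7 snaps back to 6: ll
```

Without the BOM lookup, the same bytes read little-endian would decode to entirely different characters.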

> OK.  Does it ever require going more than N*2 (where N = max sequence
> length for the encoding) bytes back?  Is N ever bigger than 10?  If not,
> it may be complicated code but at least it will be fairly fast.

N can be much, much larger than 10.  For instance, the
following is the iso-2022-jp byte sequence for a Japanese
sentence (the ESC code is shown as "^[").

^[$BA0$N2hLL$H<!$N2hLL$H$G$O!I=<($5$l$kFbMF$K2?9T$+$N=E$J$j$,$$j$^$9!#$3^[(B
^[$B$l$O!I=<($5$l$F$$$kFbMF$,OB3$7$F$$$k$3$H$,$9$0H=$k$h$&$K$9$k$?$a$G$9!#^[(B

For iso-2022-jp, we must search backward for the sequence
^[$B or ^[(B, which can be any distance away.  Which
patterns to search for depends on the coding system.
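A rough Python sketch of that backward search, assuming only the two designations shown above (real iso-2022-jp also has ESC $ @ and ESC ( J, which this ignores) and assuming `pos` does not fall inside an escape sequence.  The helper names are hypothetical.  Note that the `rfind` calls have no natural bound: the designation may sit at the start of an arbitrarily long run, so the scan window cannot be limited to a small N.

```python
ESC_KANJI = b"\x1b$B"  # ESC $ B: switch to JIS X 0208, two bytes per character
ESC_ASCII = b"\x1b(B"  # ESC ( B: switch back to ASCII, one byte per character


def mode_at(data: bytes, pos: int) -> str:
    """Charset in effect at byte `pos`, judged from the last designation before it."""
    kanji = data.rfind(ESC_KANJI, 0, pos)  # -1 if not found
    ascii_ = data.rfind(ESC_ASCII, 0, pos)
    # iso-2022-jp starts in ASCII, so with no designation at all we are in ASCII.
    return "jis0208" if kanji > ascii_ else "ascii"


def char_start(data: bytes, pos: int) -> int:
    """Byte offset of the character that contains byte `pos`."""
    if mode_at(data, pos) == "ascii":
        return pos
    # Inside a JIS X 0208 run, characters are byte pairs counted from the end
    # of the ESC $ B sequence, which may be any distance back.
    body = data.rfind(ESC_KANJI, 0, pos) + len(ESC_KANJI)
    return pos - (pos - body) % 2


if __name__ == "__main__":
    # 1000 two-byte characters after a single ESC $ B, then ASCII again.
    sample = ESC_KANJI + b"$3$l" * 500 + ESC_ASCII + b"abc"
    print(mode_at(sample, 1500), char_start(sample, 1500))  # jis0208 1499
```

Here the byte at offset 1500 can only be interpreted after finding the designation 1500 bytes earlier.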

> The semantics could be (given N as above):

> 1) jump to character number C: scan from beginning of file and count
> characters up to C if the encoding has a variable length.  Otherwise the
> offset is obvious.

> 2) jump to character around/at byte B: jump to B-N*2 and scan characters
> forward until you find the one that straddles or begins at B.  There
> should also be a way to report that character's actual starting byte
> position.

> 3) jump to byte: operate as now, just an fseek
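For a self-synchronizing encoding such as UTF-8, semantics (1) and (2) are straightforward.  A minimal Python sketch, assuming UTF-8 input and a seekable binary file (the function names are hypothetical): (1) counts character-starting bytes from the beginning of the file, and (2) backs up at most 3 bytes from B to the nearest lead byte, because UTF-8 continuation bytes (10xxxxxx) are recognizable on their own.  It scans backward from B rather than forward from B-N*2 as proposed; for valid UTF-8 both give the same answer.  iso-2022-jp has no such property.

```python
import io
from typing import BinaryIO, Optional


def is_lead(byte: int) -> bool:
    """True for ASCII or a UTF-8 lead byte, False for a continuation byte."""
    return byte & 0xC0 != 0x80


def offset_of_char(fp: BinaryIO, c: int, chunk_size: int = 1 << 16) -> Optional[int]:
    """Semantics (1): byte offset of the c-th character (0-based), scanning from the start."""
    fp.seek(0)
    count = offset = 0
    while chunk := fp.read(chunk_size):
        for i, byte in enumerate(chunk):
            if is_lead(byte):
                if count == c:
                    return offset + i
                count += 1
        offset += len(chunk)
    return None  # the file has c characters or fewer


def char_start_near(fp: BinaryIO, b: int) -> int:
    """Semantics (2): start offset of the character that straddles or begins at byte b."""
    lo = max(0, b - 3)  # a UTF-8 character is at most 4 bytes long
    fp.seek(lo)
    window = fp.read(b - lo + 1)
    for i in range(len(window) - 1, -1, -1):
        if is_lead(window[i]):
            return lo + i
    return lo  # malformed input (only continuation bytes); give up gracefully


if __name__ == "__main__":
    f = io.BytesIO("aé€b".encode("utf-8"))  # byte offsets: a=0, é=1, €=3, b=6
    print(offset_of_char(f, 2), char_start_near(f, 5))  # 3 3
```

(1) is linear in the file size, which matches the concern that it only suits small files; (2) reads at most 4 bytes regardless of file size.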

> For my purposes (2) is most useful, but I can use (3) and bypass
> encodings.  (1) is not good for me, since the application is to view
> large files, but (1) is OK for small files.
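Bypassing the coding system with (3) amounts to reading raw bytes at an offset.  A small Python sketch (hypothetical helper, assuming a viewer that can tolerate an approximate rendering): a chunk cut at arbitrary byte positions can split a multibyte character at either end, so a lenient decode shows U+FFFD there.  For iso-2022-jp the damage is worse, since a chunk that starts inside a JIS X 0208 run has lost its ESC $ B and would be misread as ASCII.

```python
import io
from typing import BinaryIO


def view_chunk(fp: BinaryIO, b: int, n: int, encoding: str = "utf-8") -> str:
    """Read n raw bytes at byte offset b (a plain seek) and decode them leniently."""
    fp.seek(b)              # semantics (3): no decoding is involved in the seek
    raw = fp.read(n)
    return raw.decode(encoding, errors="replace")  # broken edges become U+FFFD


if __name__ == "__main__":
    f = io.BytesIO("aé€b".encode("utf-8"))
    # Offset 2 is the second byte of "é", so the chunk starts with U+FFFD.
    print(view_chunk(f, 2, 5) == "\ufffd€b")  # True
```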

As the above example shows, implementing (2) is very
difficult: the escape sequence that determines how byte B
is decoded can lie any distance before B.  And, for small
files, we don't need (1); we can just read the whole file.

---
Kenichi Handa
handa@ni.aist.go.jp