From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: docs for insert-file-contents use 'bytes'
Date: Thu, 02 Oct 2008 10:33:49 +0900
To: Ted Zlatanov
Cc: emacs-devel@gnu.org
References: <86ljxa67xi.fsf@lifelogs.com> <86hc7y64vm.fsf@lifelogs.com> <8663od68yb.fsf@lifelogs.com> <868wt845op.fsf@lifelogs.com>
In-reply-to: <868wt845op.fsf@lifelogs.com> (message from Ted Zlatanov on Wed, 01 Oct 2008 11:54:14 -0500)
Xref: news.gmane.org gmane.emacs.devel:104288

In article <868wt845op.fsf@lifelogs.com>, Ted Zlatanov writes:

KH> It's not that easy. Some encodings require seeking back to an
KH> escape sequence to decode the next character. And, for UTF-16
KH> with BOM, we have to check the first 2 bytes.

> OK. Does it ever require going more than N*2 (where N = max sequence
> length for the encoding) bytes back? Is N ever bigger than 10? If not,
> it may be complicated code but at least it will be fairly fast.

N can be much, much longer than 10. For instance, the following is the
iso-2022-jp byte sequence for a Japanese sentence (the ESC code is
represented by "^[").
^[$BA0$N2hLL$H

> The semantics could be (given N as above):
> 1) jump to character number C: scan from beginning of file and count
> characters up to C if the encoding has a variable length. Otherwise the
> offset is obvious.
> 2) jump to character around/at byte B: jump to B-N*2 and scan characters
> forward until you find the one that straddles or begins at B. Also
> should have a way to report that character's actual starting byte
> position.
> 3) jump to byte: operate as now, just a fseek
> For my purposes (2) is most useful, but I can use (3) and bypass
> encodings. (1) is not good for me, since the application is to view
> large files, but (1) is OK for small files.

As you can now see from the above example, implementing (2) is very
difficult. And, for small files, we don't need (1). We can just read
the whole file.

---
Kenichi Handa
handa@ni.aist.go.jp
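The problem with iso-2022-jp can be reproduced with Python 3's standard `iso2022_jp` codec. Python and the helper name `decode_from` are illustrative choices for this sketch, not anything Emacs provides. The sample bytes are the start of the sequence quoted above, with an `ESC ( B` appended (an assumption) so the sample ends in ASCII mode. Decoded from offset 0, the byte pairs after `ESC $ B` are read as JIS X 0208 characters. Decoded from offset 3, which is what an fseek just past the escape sequence would give, the same bytes come out as plain ASCII.

```python
# Minimal sketch, assuming Python 3's built-in "iso2022_jp" codec.
ESC = b"\x1b"

# ESC $ B switches to two-byte JIS X 0208; ESC ( B switches back to ASCII.
# The closing ESC ( B is added here so the sample is a complete sequence.
data = ESC + b"$BA0$N2hLL$H" + ESC + b"(B"


def decode_from(buf: bytes, offset: int) -> str:
    """Hypothetical helper: decode buf as if reading started at offset."""
    return buf[offset:].decode("iso2022_jp")


whole = decode_from(data, 0)   # shift state known from the escape sequence
middle = decode_from(data, 3)  # starts just after ESC $ B, so the state is lost

print(len(whole), whole)
print(len(middle), middle)
```

Nothing in the bytes "A0$N2hLL$H" themselves marks them as two-byte characters. Only the escape sequence does, and it can be any distance back in the file.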
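Ted's option (2) does work for a self-synchronizing encoding such as UTF-8. There N is 4, and every continuation byte has the bit pattern 10xxxxxx, so the start of the character at or straddling byte B is never more than three bytes back. The Python sketch below shows that bounded back-scan. The helper `char_start_at` is hypothetical, and the sketch assumes valid UTF-8 input. For iso-2022-jp the bytes carry no such marker, so the corresponding scan has to go back to the most recent escape sequence.

```python
def char_start_at(buf: bytes, b: int) -> int:
    """Hypothetical helper: return the offset of the first byte of the
    UTF-8 character that begins at or straddles byte offset b.
    Assumes buf is valid UTF-8 and 0 <= b < len(buf)."""
    start = b
    # Continuation bytes match 10xxxxxx; a UTF-8 character has at most
    # three of them, so this loop steps back at most three times.
    while start > 0 and b - start < 3 and (buf[start] & 0xC0) == 0x80:
        start -= 1
    return start


# "a", "\u00e9", "\u20ac" and U+1D11E take 1, 2, 3 and 4 bytes respectively.
buf = "a\u00e9\u20ac\U0001d11e".encode("utf-8")
for b in range(len(buf)):
    s = char_start_at(buf, b)
    print(b, s, buf[s:].decode("utf-8")[0])
```

The same loop would not work for iso-2022-jp, because both bytes of a JIS X 0208 character are ordinary printable ASCII bytes (0x21 to 0x7E).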