From: Klaus-Dieter Bauer
Newsgroups: gmane.emacs.help
Subject: Re: Handling large files with emacs lisp?
Date: Wed, 5 Jun 2013 12:47:10 +0200
References: <87d2s26ihw.fsf@gmail.com>
To: Jambunathan K
Cc: help-gnu-emacs@gnu.org
Oddly, when I tried again today, I saw constant-time file access (read
time independent of the BEG offset), at roughly 80 MB/s across the
183 MB LibreOffice installer and 240-290 MB/s on a repetitive text file.
The most likely explanation is a bug in my earlier test function (e.g.
the length of the inserted text accidentally not being constant). A bit
embarrassing ^^'

On the other hand, this shows me that Emacs Lisp is indeed usable for
general-purpose processing.

kind regards, Klaus

2013/6/4 Jambunathan K

>
> Maybe you can steal some stuff from here.
>
> http://elpa.gnu.org/packages/vlf.html
>
> It is a GNU ELPA package that you can install with M-x list-packages
> RET.
>
> Klaus-Dieter Bauer writes:
>
> > Hello!
> >
> > Is there a method in Emacs Lisp to handle large files (hundreds of MB)
> > efficiently? I am looking specifically for a function that allows
> > processing file contents either sequentially or (better) with random
> > access.
> >
> > Looking through the code of `find-file' I found that
> > `insert-file-contents' and `insert-file-contents-literally' seem to be
> > pretty much the lowest-level functions available to Emacs Lisp. When
> > files approach GB size, however, inserting the whole file contents is
> > undesirable, even assuming a 32-bit Emacs were able to handle such
> > large buffers.
> >
> > Using the BEG and END parameters of `insert-file-contents', however,
> > shows a linear time dependence on BEG. So implementing buffered file
> > processing for large files by keeping only parts of the file in a
> > temporary buffer doesn't seem feasible either.
> >
> > I'd also be interested in why there is this linear time dependence. Is
> > this a limitation of how fseek works or of how `insert-file-contents'
> > is implemented? I've read[1] that fseek "just updates pointers", so
> > random reads in a large file, especially on an SSD, should be
> > constant-time, but I couldn't find further verification.
> >
> > kind regards, Klaus
> >
> > PS: I'm well aware that I'm asking for something that likely wasn't
> > within the design goals of Emacs Lisp. It is interesting to push
> > the limits though ;)
> >
> > ------------------------------------------------------------
> >
> > [1] https://groups.google.com/d/msg/comp.unix.aix/AXInTbcjsKo/qt-XnL12upgJ
>
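For reference, the buffered approach discussed in this thread (keeping only one window of the file in a temporary buffer via the BEG and END arguments) might look roughly like the sketch below. It is a minimal illustration, not code from the thread: `my-file-read-chunk' and `my-file-map-chunks' are hypothetical names, and it assumes the documented behaviour that BEG and END count bytes in the file. It uses `insert-file-contents-literally' so that no decoding is done, and it makes no claim about the timing behaviour measured above.

```elisp
;; Hypothetical helpers (not part of Emacs): process a large file in
;; fixed-size chunks instead of inserting it into one buffer.
;; BEG and END passed to `insert-file-contents-literally' are byte
;; offsets into the file; no decoding or format conversion is done.

(defun my-file-read-chunk (file beg end)
  "Return bytes BEG (inclusive) to END (exclusive) of FILE as a unibyte string."
  (with-temp-buffer
    ;; A unibyte buffer keeps the raw bytes exactly as read.
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file nil beg end)
    (buffer-string)))

(defun my-file-map-chunks (file chunk-size fn)
  "Call FN on each successive CHUNK-SIZE-byte piece of FILE.
FN is called with two arguments: the chunk (a unibyte string) and
its starting byte offset.  The last chunk may be shorter."
  (let ((total (nth 7 (file-attributes file))) ; file size in bytes
        (offset 0))
    (while (< offset total)
      ;; Clamp END to the file size so the last read never asks for
      ;; bytes past end of file.
      (let ((end (min (+ offset chunk-size) total)))
        (funcall fn (my-file-read-chunk file offset end) offset)
        (setq offset end)))))
```

`my-file-read-chunk' alone gives random access to an arbitrary byte range; `my-file-map-chunks' gives sequential processing, e.g. with a CHUNK-SIZE of (* 1024 1024). Records that straddle a chunk boundary (lines, for instance) have to be carried over by FN itself. Creating a fresh temporary buffer per chunk keeps the sketch simple; reusing one buffer would avoid that overhead. The vlf package mentioned above is a far more complete take on the same batch-reading idea.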