unofficial mirror of help-gnu-emacs@gnu.org
* Efficiently checking the initial contents of a file
@ 2008-05-16 10:16 Nordlöw
  2008-05-16 11:08 ` Juanma Barranquero
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Nordlöw @ 2008-05-16 10:16 UTC (permalink / raw)
  To: help-gnu-emacs

How can I efficiently investigate the first bytes of a file using pure
Emacs Lisp (without calling any external process)?

My guess is:
- Open part of the file into a buffer or string.
- Alt 1. Switch to the buffer and work there.
  Alt 2. Or operate directly on the string?

Which operations should be performed via a buffer and which should
operate on a string?

/Nordlöw



* Re: Efficiently checking the initial contents of a file
  2008-05-16 10:16 Efficiently checking the initial contents of a file Nordlöw
@ 2008-05-16 11:08 ` Juanma Barranquero
  2008-05-16 11:12 ` David Hansen
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Juanma Barranquero @ 2008-05-16 11:08 UTC (permalink / raw)
  To: Nordlöw; +Cc: help-gnu-emacs

On Fri, May 16, 2008 at 12:16 PM, Nordlöw <per.nordlow@gmail.com> wrote:
> How can I efficiently using pure emacs-lisp (without calling any
> external process) investigate the first bytes of a file?
>
> My guess is
> - Open parts of the file into a buffer or string.
> - Alt 1. Switch to the buffer and do things.
>  Alt 2. Or do stuff directly the string?

  (with-temp-buffer
     (insert-file-contents "my-file" nil BEG END)
     ;; etc
     )

should be pretty fast, and more so with wisely chosen BEG / END
values. For additional speed you can use
insert-file-contents-literally, if you don't need code conversions,
decompression, etc.
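
A minimal sketch of the above, wrapped as a helper. The name
`my-file-head-bytes` is hypothetical (it is not an existing Emacs
function), and the sketch assumes the caller wants raw bytes, hence
`insert-file-contents-literally` and a unibyte temp buffer. BEG and END
count bytes in the file, so only the requested part is read.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-file-head-bytes (filename n)
  "Return the first N bytes of FILENAME as a unibyte string.
Return nil if FILENAME is not a readable regular file."
  (when (and (file-regular-p filename)
             (file-readable-p filename))
    (with-temp-buffer
      ;; A unibyte buffer keeps bytes >= 128 as raw bytes.
      (set-buffer-multibyte nil)
      ;; Only bytes 0..N of the file are read, not the whole file.
      (insert-file-contents-literally filename nil 0 n)
      (buffer-string))))
```

If the file is shorter than N bytes, the result is simply the whole
file.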

> Which operations should be performed via a buffer and which should
> operate on a string?

Once the required part of the file is in the buffer, whether to
manipulate it directly or go through buffer-(sub)string depends on what
you want to do with the text, but I wouldn't convert it to a string
unless necessary; i.e., I would use looking-at and re-search-forward
rather than buffer-string + string-match, for example.
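
To illustrate the difference, here are two sketches of the same test.
Both function names are hypothetical. The first matches in the buffer;
the second copies the text into a string before matching, which is the
extra step being advised against.

```elisp
;; Both helpers are hypothetical names used only for illustration.
(defun my-head-matches-in-buffer-p (filename regexp n)
  "Non-nil if the first N bytes of FILENAME start with a match for REGEXP.
The match is done in the temp buffer; no intermediate string is made."
  (with-temp-buffer
    (insert-file-contents filename nil 0 n)
    (goto-char (point-min))
    (looking-at regexp)))

(defun my-head-matches-as-string-p (filename regexp n)
  "Same test, but copy the buffer text into a string first."
  (with-temp-buffer
    (insert-file-contents filename nil 0 n)
    ;; The leading \\` anchors the match at the start of the string.
    (eq 0 (string-match (concat "\\`\\(?:" regexp "\\)")
                        (buffer-string)))))
```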

 Juanma


* Re: Efficiently checking the initial contents of a file
  2008-05-16 10:16 Efficiently checking the initial contents of a file Nordlöw
  2008-05-16 11:08 ` Juanma Barranquero
@ 2008-05-16 11:12 ` David Hansen
  2008-05-16 11:15 ` Eli Zaretskii
       [not found] ` <mailman.11680.1210937109.18990.help-gnu-emacs@gnu.org>
  3 siblings, 0 replies; 8+ messages in thread
From: David Hansen @ 2008-05-16 11:12 UTC (permalink / raw)
  To: help-gnu-emacs

On Fri, 16 May 2008 03:16:13 -0700 (PDT) Nordlöw wrote:

> How can I efficiently using pure emacs-lisp (without calling any
> external process) investigate the first bytes of a file?
>
> My guess is
> - Open parts of the file into a buffer or string.
> - Alt 1. Switch to the buffer and do things.

(with-temp-buffer
  ;; Read the first 42 bytes into the temp buffer (BEG and END count
  ;; bytes in the file, not characters in the buffer).
  (insert-file-contents filename nil 0 42)
  ;; Do whatever you want to do here.
  )

David






* Re: Efficiently checking the initial contents of a file
  2008-05-16 10:16 Efficiently checking the initial contents of a file Nordlöw
  2008-05-16 11:08 ` Juanma Barranquero
  2008-05-16 11:12 ` David Hansen
@ 2008-05-16 11:15 ` Eli Zaretskii
       [not found] ` <mailman.11680.1210937109.18990.help-gnu-emacs@gnu.org>
  3 siblings, 0 replies; 8+ messages in thread
From: Eli Zaretskii @ 2008-05-16 11:15 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Nordlöw <per.nordlow@gmail.com>
> Date: Fri, 16 May 2008 03:16:13 -0700 (PDT)
> 
> Which operations should be performed via a buffer and which should
> operate on a string?

In Emacs, it's generally better (faster and more convenient) to work
with a buffer than with a string.





* Re: Efficiently checking the initial contents of a file
       [not found] ` <mailman.11680.1210937109.18990.help-gnu-emacs@gnu.org>
@ 2008-05-16 12:52   ` Nordlöw
  2008-05-16 14:03     ` Juanma Barranquero
       [not found]     ` <mailman.11696.1210946594.18990.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 8+ messages in thread
From: Nordlöw @ 2008-05-16 12:52 UTC (permalink / raw)
  To: help-gnu-emacs

On 16 May, 13:12, David Hansen <david.han...@gmx.net> wrote:
> On Fri, 16 May 2008 03:16:13 -0700 (PDT) Nordlöw wrote:
>
> > How can I efficiently using pure emacs-lisp (without calling any
> > external process) investigate the first bytes of a file?
>
> > My guess is
> > - Open parts of the file into a buffer or string.
> > - Alt 1. Switch to the buffer and do things.
>
> (with-temp-buffer
>   ;; Read the first 42 characters (not bytes) into the temp buffer.
>   (insert-file-contents filename nil 0 42)
>   ;; Do whatever you want to do here.
>   )
>
> David

Great!

Thanks for all the help!

I wanted a quick way of discarding ELFs from my tags-query-replace()
operations.

This is the result of my coding.
Is it OK to use string-width() and looking-at() if I don't care about
different string encodings, that is, if I just want to compare binary
byte arrays?

;; For additional speed you can use
;; `insert-file-contents-literally', if you don't need code
;; conversions, decompression, etc.
(defun file-begin-p (filename beg)
  "Return non-nil if FILENAME begins with BEG."
  (interactive "fFile to investigate: \nsInitial contents: ")
  (if (and (file-exists-p filename)
           (file-readable-p filename))
      (with-temp-buffer
        (let ((width (string-width beg)))
          (insert-file-contents-literally filename nil 0 width)
          (looking-at beg)))))
;; TEST: (file-begin-p "/bin/ls" "\177ELF")
;; The octal escape is deliberate: in "\x7fELF" the reader takes the E
;; as a third hex digit, so the string would not start with byte 0x7f.

(defun file-begin-ELF-p (filename)
  "Return non-nil if FILENAME is an ELF (Executable and Linkable Format) file."
  (interactive "fFile to investigate: ")
  (file-begin-p filename "\177ELF"))
;; TEST: (file-begin-ELF-p "/bin/ls")


Thanks again,
Nordlöw



* Re: Efficiently checking the initial contents of a file
  2008-05-16 12:52   ` Nordlöw
@ 2008-05-16 14:03     ` Juanma Barranquero
       [not found]     ` <mailman.11696.1210946594.18990.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 8+ messages in thread
From: Juanma Barranquero @ 2008-05-16 14:03 UTC (permalink / raw)
  To: Nordlöw; +Cc: help-gnu-emacs

On Fri, May 16, 2008 at 2:52 PM, Nordlöw <per.nordlow@gmail.com> wrote:

> (defun file-begin-p (filename beg)
>  "Determine if FILENAME begins with BEG."
>  (interactive "fFile to investigate: ")
>  (if (and (file-exists-p filename)
>           (file-readable-p filename))
>      (with-temp-buffer
>        (let ((width (string-width beg)))
>          (insert-file-contents-literally filename nil 0 width)
>          (looking-at beg)
>          ))))

A few additional comments:

 - BEG can be a regular expression, in which case its length can
be a red herring; for example (file-begin-p FILE "[ABC]\\{20\\}") will
always return nil. Perhaps you could do

   (defun file-begin-p (filename beg &optional len)
      ...
     (let ((width (or len (string-width beg))))
       ...

so you can pass a length if needed.

 - If you don't want to pass a regexp, remember to use
regexp-quote; otherwise (file-begin-p FILE "A*") is always going to
return t.

 - Mixing `insert-file-contents-literally' and `string-width' does not
seem like a good idea. Better use `string-bytes', or, if BEG can
contain non-ASCII chars, use `insert-file-contents' and `length'. I'd
recommend that second route.
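
One way the three points above might be combined, as a sketch only.
The name `my-file-begin-p` and the REGEXP-P argument are hypothetical,
and the sketch takes the `string-bytes` route, so it assumes BEG holds
only ASCII characters or raw bytes.

```elisp
;; Sketch only; `my-file-begin-p' and REGEXP-P are hypothetical.
(defun my-file-begin-p (filename beg &optional regexp-p len)
  "Return non-nil if FILENAME begins with BEG.
BEG is taken literally unless REGEXP-P is non-nil, in which case it is
a regexp and LEN should be the number of bytes to read."
  (when (and (file-regular-p filename)
             (file-readable-p filename))
    (with-temp-buffer
      ;; Compare raw bytes: unibyte buffer, no decoding.
      (set-buffer-multibyte nil)
      (insert-file-contents-literally
       filename nil 0 (or len (string-bytes beg)))
      (goto-char (point-min))
      ;; Quote BEG unless the caller asked for regexp matching.
      (looking-at (if regexp-p beg (regexp-quote beg))))))
```

With this, (my-file-begin-p FILE "A*") looks for the two literal
characters, and a regexp caller passes LEN explicitly, e.g.
(my-file-begin-p FILE "[ABC]\\{20\\}" t 20).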

Hope this helps,

   Juanma


* Re: Efficiently checking the initial contents of a file
       [not found]     ` <mailman.11696.1210946594.18990.help-gnu-emacs@gnu.org>
@ 2008-05-19  7:20       ` Nordlöw
  2008-05-22 16:58         ` Thien-Thi Nguyen
  0 siblings, 1 reply; 8+ messages in thread
From: Nordlöw @ 2008-05-19  7:20 UTC (permalink / raw)
  To: help-gnu-emacs

On 16 May, 16:03, "Juanma Barranquero" <lek...@gmail.com> wrote:
> On Fri, May 16, 2008 at 2:52 PM, Nordlöw <per.nord...@gmail.com> wrote:
> > (defun file-begin-p (filename beg)
> >  "Determine if FILENAME begins with BEG."
> >  (interactive "fFile to investigate: ")
> >  (if (and (file-exists-p filename)
> >           (file-readable-p filename))
> >      (with-temp-buffer
> >        (let ((width (string-width beg)))
> >          (insert-file-contents-literally filename nil 0 width)
> >          (looking-at beg)
> >          ))))
>
> A few additional comments:
>
>  - BEG can be a regular expression, in which case the length of it can
> be a red herring; for example (file-begin-p "[ABC]\\{20\\}") will
> always return nil. Perhaps you could do
>
>    (defun file-begin-p (filename beg &optional len)
>       ...
>      (let ((width (or len (string-width beg))))
>        ...
>
> so you can pass a length if needed.
>
>  - If you don't want to pass a regexp, it is advisable to remember
> using regexp-quote, otherwise (file-begin-p "A*") is always going to
> return t.
>
>  - Mixing `insert-file-contents-literally' and `string-width' does not
> seem like a good idea. Better use `string-bytes', or, if BEG can
> contain non-ASCII chars, use `insert-file-contents' and `length'. I'd
> recommend that second route.
>
> Hope this helps,
>
>    Juanma

Hey again!

As I see it, the most general and efficient solution to this problem
would be to make the looking-at() logic stream-based, so that it does
not require the whole file to be read into memory regardless of the
length of BEG. Is there some way of opening a file into a buffer
without actually reading its whole contents into memory before they
are needed by, in our case, looking-at()?

A less optimal solution could use a function, say
regexp-max-match-length(REGEXP), that determines the longest possible
string a regexp can match, possibly infinite. The return value of this
function could then be used as the length argument to
insert-file-contents-literally().
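
A rough sketch of a third option: read the file in growing chunks and
retry looking-at() after each one. Everything here is an assumption
made for illustration (the name `my-file-begin-regexp-p`, the 64-byte
first chunk, the doubling, the 65536-byte default limit); it is not an
existing Emacs facility. It relies only on the BEG and END arguments
of `insert-file-contents-literally`.

```elisp
;; Hypothetical sketch; names and sizes are arbitrary choices.
(defun my-file-begin-regexp-p (filename regexp &optional limit)
  "Return non-nil if FILENAME begins with a match for REGEXP.
Read the file in doubling chunks, stopping at the first match, at end
of file, or after LIMIT bytes (default 65536)."
  (let* ((size (or (nth 7 (file-attributes filename)) 0))
         (bound (min size (or limit 65536)))
         (chunk 64)
         (nread 0)
         (found nil))
    (with-temp-buffer
      (while (progn
               ;; Try to match what has been read so far.
               (goto-char (point-min))
               (setq found (looking-at regexp))
               (and (not found) (< nread bound)))
        (let ((end (min bound (+ nread chunk))))
          ;; Append only the bytes not read yet.
          (goto-char (point-max))
          (insert-file-contents-literally filename nil nread end)
          (setq nread end
                chunk (* 2 chunk)))))
    found))
```

A regexp that anchors at the end of the buffer would see the end of
the current chunk rather than the end of the file, so this only suits
prefix-style patterns.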

By the way, I am surprised that the function I am looking for does not
already exist in GNU Emacs. Could that be because it is difficult to
design a solution that satisfies *all* of the needs given above?

/Nordlöw



* Re: Efficiently checking the initial contents of a file
  2008-05-19  7:20       ` Nordlöw
@ 2008-05-22 16:58         ` Thien-Thi Nguyen
  0 siblings, 0 replies; 8+ messages in thread
From: Thien-Thi Nguyen @ 2008-05-22 16:58 UTC (permalink / raw)
  To: Nordlöw; +Cc: help-gnu-emacs

() Nordlöw <per.nordlow@gmail.com>
() Mon, 19 May 2008 00:20:26 -0700 (PDT)

   Is I see it the most general and efficient solution
   to this problem would be to

Hmm, i tend to find "most general" and "most efficient"
to be mutually exclusive.  Perhaps we have different
ideas of what is general and what is efficient.

Here is a relatively efficient and highly parsimonious
way, made by squeezing the ideas previously presented
by others into a lower-bound wrap-check:

(defun elf-p (filename)
  (when (< 4 (nth 7 (file-attributes filename)))
    (with-temp-buffer
      (insert-file-contents filename nil 0 4)
      ;; Octal escape; in "\x7fELF" the E would be read as a hex digit.
      (string= "\177ELF" (buffer-string)))))

For a "most general" way, i'm in the process of
implementing a file(1)-workalike in Emacs Lisp.  For a
peek at its design, you can see the sexp-based ruleset
(and a Scheme prototype linked therefrom) at:

http://www.gnuvola.org/data/  (de-uglified magic file)

Lastly, if the original problem uses `elf-p' as a
filter, unless you labor under a pitifully degrading
(eg., usloth) environment, you may find the "best way"
is to use `file-executable-p' instead.
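
A sketch of that filter, assuming the candidates arrive as a list of
file names. The function name is hypothetical. The test is coarser
than `elf-p`: it also drops executable scripts and directories, and it
keeps ELF files that lack the execute bit, such as object files.

```elisp
;; Hypothetical helper; a coarser stand-in for an ELF check.
(defun my-non-executable-files (files)
  "Return the members of FILES for which `file-executable-p' is nil."
  (delq nil
        (mapcar (lambda (file)
                  (unless (file-executable-p file)
                    file))
                files)))
```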

thi



