* Efficiently checking the initial contents of a file
From: Nordlöw @ 2008-05-16 10:16 UTC
To: help-gnu-emacs

How can I efficiently inspect the first bytes of a file using pure
Emacs Lisp (without calling any external process)?

My guess is:
- Open part of the file into a buffer or string.
- Alt 1. Switch to the buffer and work on it there.
  Alt 2. Or operate directly on the string?

Which operations should be performed via a buffer and which should
operate on a string?

/Nordlöw
* Re: Efficiently checking the initial contents of a file
From: Juanma Barranquero @ 2008-05-16 11:08 UTC
To: Nordlöw; +Cc: help-gnu-emacs

On Fri, May 16, 2008 at 12:16 PM, Nordlöw <per.nordlow@gmail.com> wrote:

> How can I efficiently using pure emacs-lisp (without calling any
> external process) investigate the first bytes of a file?
>
> My guess is
> - Open parts of the file into a buffer or string.
> - Alt 1. Switch to the buffer and do things.
>   Alt 2. Or do stuff directly the string?

(with-temp-buffer
  (insert-file-contents "my-file" nil BEG END)
  ;; etc
  )

should be pretty fast, and more so with wisely chosen BEG / END
values. For additional speed you can use
insert-file-contents-literally, if you don't need code conversions,
decompression, etc.

> Which operations should be performed via a buffer and which should
> operate on a string?

Once the required part of the file is in the buffer, whether to
manipulate it directly or to use buffer-(sub)string depends on what
you want to do with the text, but I wouldn't convert it to a string
unless necessary; i.e., I would use looking-at and re-search-forward
rather than buffer-string + string-match, for example.

Juanma
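
[Editorial sketch] The approach described above (read only a slice of
the file into a temporary buffer, then match in the buffer rather than
in a string) might look roughly like this. The function name
`my-file-head-matches-p' and the 64-byte default are assumptions made
for illustration; they do not come from the thread. According to the
`insert-file-contents' docstring, BEG and END count bytes in the file.

```elisp
(defun my-file-head-matches-p (file regexp &optional nbytes)
  "Return non-nil if the first NBYTES bytes of FILE match REGEXP.
NBYTES defaults to 64.  Hypothetical helper, for illustration only."
  (with-temp-buffer
    ;; Read only bytes [0, NBYTES) of FILE, with no decoding or
    ;; decompression.
    (insert-file-contents-literally file nil 0 (or nbytes 64))
    (goto-char (point-min))
    ;; Match in the buffer instead of copying the text to a string.
    ;; Bind `case-fold-search' so the match is case-sensitive.
    (let ((case-fold-search nil))
      (looking-at regexp))))
```

For instance, (my-file-head-matches-p "some-script" "#!") would test
for a shebang line; "some-script" is a placeholder file name.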
* Re: Efficiently checking the initial contents of a file
From: David Hansen @ 2008-05-16 11:12 UTC
To: help-gnu-emacs

On Fri, 16 May 2008 03:16:13 -0700 (PDT) Nordlöw wrote:

> How can I efficiently using pure emacs-lisp (without calling any
> external process) investigate the first bytes of a file?
>
> My guess is
> - Open parts of the file into a buffer or string.
> - Alt 1. Switch to the buffer and do things.

(with-temp-buffer
  ;; Read the first 42 bytes into the temp buffer (BEG and END
  ;; count bytes in the file, not characters in the buffer).
  (insert-file-contents filename nil 0 42)
  ;; Do whatever you want to do here.
  )

David
* Re: Efficiently checking the initial contents of a file
From: Eli Zaretskii @ 2008-05-16 11:15 UTC
To: help-gnu-emacs

> From: Nordlöw <per.nordlow@gmail.com>
> Date: Fri, 16 May 2008 03:16:13 -0700 (PDT)
>
> Which operations should be performed via a buffer and which should
> operate on a string?

In Emacs, it's generally better (faster and more convenient) to work
with a buffer than with a string.
* Re: Efficiently checking the initial contents of a file
From: Nordlöw @ 2008-05-16 12:52 UTC
To: help-gnu-emacs

On 16 May, 13:12, David Hansen <david.han...@gmx.net> wrote:

> On Fri, 16 May 2008 03:16:13 -0700 (PDT) Nordlöw wrote:
>
> > How can I efficiently using pure emacs-lisp (without calling any
> > external process) investigate the first bytes of a file?
>
> > My guess is
> > - Open parts of the file into a buffer or string.
> > - Alt 1. Switch to the buffer and do things.
>
> (with-temp-buffer
>   ;; Read the first 42 bytes into the temp buffer (BEG and END
>   ;; count bytes in the file, not characters in the buffer).
>   (insert-file-contents filename nil 0 42)
>   ;; Do whatever you want to do here.
>   )
>
> David

Great! Thanks for all the help!

I wanted a quick way of discarding ELFs from my tags-query-replace()
operations. This is the result of my coding. Is it OK to use
string-width() and looking-at() if I don't care about different
string encodings, that is, if I just want to compare binary byte
arrays?

;; For additional speed you can use
;; `insert-file-contents-literally', if you don't need code
;; conversions, decompression, etc.

(defun file-begin-p (filename beg)
  "Determine if FILENAME begins with BEG."
  (interactive "fFile to investigate: ")
  (if (and (file-exists-p filename)
           (file-readable-p filename))
      (with-temp-buffer
        (let ((width (string-width beg)))
          (insert-file-contents-literally filename nil 0 width)
          (looking-at beg)))))
;; TEST: (file-begin-p "/bin/ls" "\177ELF")

(defun file-begin-ELF-p (filename)
  "Return non-nil if FILENAME is an ELF (Executable and Linkable Format) file."
  (interactive "fFile to investigate: ")
  ;; Written as octal "\177" because in "\x7fELF" the hex escape
  ;; would also swallow the following `E'.
  (file-begin-p filename "\177ELF"))
;; TEST: (file-begin-ELF-p "/bin/ls")

Thanks again,
Nordlöw
* Re: Efficiently checking the initial contents of a file
From: Juanma Barranquero @ 2008-05-16 14:03 UTC
To: Nordlöw; +Cc: help-gnu-emacs

On Fri, May 16, 2008 at 2:52 PM, Nordlöw <per.nordlow@gmail.com> wrote:

> (defun file-begin-p (filename beg)
>   "Determine if FILENAME begins with BEG."
>   (interactive "fFile to investigate: ")
>   (if (and (file-exists-p filename)
>            (file-readable-p filename))
>       (with-temp-buffer
>         (let ((width (string-width beg)))
>           (insert-file-contents-literally filename nil 0 width)
>           (looking-at beg)
>           ))))

A few additional comments:

- BEG can be a regular expression, in which case its length can be a
  red herring; for example (file-begin-p "[ABC]\\{20\\}") will always
  return nil. Perhaps you could do

  (defun file-begin-p (filename beg &optional len)
    ...
    (let ((width (or len (string-width beg))))
      ...

  so you can pass a length if needed.

- If you don't want to pass a regexp, remember to use regexp-quote;
  otherwise (file-begin-p "A*") is always going to return t.

- Mixing `insert-file-contents-literally' and `string-width' does not
  seem like a good idea. Better use `string-bytes', or, if BEG can
  contain non-ASCII chars, use `insert-file-contents' and `length'.
  I'd recommend that second route.

Hope this helps,

Juanma
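
[Editorial sketch] Two of the points above (quote the prefix with
regexp-quote, and size the read with string-bytes when reading
literally) could be combined along these lines. The name
`my-file-begins-with-p' is hypothetical, and the sketch assumes PREFIX
is plain ASCII; for non-ASCII prefixes the `insert-file-contents' plus
`length' route recommended above would be the one to follow instead.

```elisp
(defun my-file-begins-with-p (file prefix)
  "Return non-nil if FILE starts with the literal string PREFIX.
PREFIX is assumed to be ASCII and is never treated as a regexp.
Hypothetical helper, for illustration only."
  (and (file-readable-p file)
       (with-temp-buffer
         ;; Read exactly as many bytes as PREFIX occupies.
         (insert-file-contents-literally file nil 0 (string-bytes prefix))
         (goto-char (point-min))
         ;; Quote PREFIX so that `*', `[' and friends match literally,
         ;; and make the comparison case-sensitive.
         (let ((case-fold-search nil))
           (looking-at (regexp-quote prefix))))))
```

With this version a call such as (my-file-begins-with-p FILE "A*")
only succeeds when the file really starts with the two characters
`A' and `*'.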
* Re: Efficiently checking the initial contents of a file
From: Nordlöw @ 2008-05-19 7:20 UTC
To: help-gnu-emacs

On 16 May, 16:03, "Juanma Barranquero" <lek...@gmail.com> wrote:

> On Fri, May 16, 2008 at 2:52 PM, Nordlöw <per.nord...@gmail.com> wrote:
> > (defun file-begin-p (filename beg)
> >   "Determine if FILENAME begins with BEG."
> >   (interactive "fFile to investigate: ")
> >   (if (and (file-exists-p filename)
> >            (file-readable-p filename))
> >       (with-temp-buffer
> >         (let ((width (string-width beg)))
> >           (insert-file-contents-literally filename nil 0 width)
> >           (looking-at beg)
> >           ))))
>
> A few additional comments:
>
> - BEG can be a regular expression, in which case the length of it can
>   be a red herring; for example (file-begin-p "[ABC]\\{20\\}") will
>   always return nil. Perhaps you could do
>
>   (defun file-begin-p (filename beg &optional len)
>     ...
>     (let ((width (or len (string-width beg))))
>       ...
>
>   so you can pass a length if needed.
>
> - If you don't want to pass a regexp, it is advisable to remember
>   using regexp-quote, otherwise (file-begin-p "A*") is always going to
>   return t.
>
> - Mixing `insert-file-contents-literally' and `string-width' does not
>   seem like a good idea. Better use `string-bytes', or, if BEG can
>   contain non-ASCII chars, use `insert-file-contents' and `length'. I'd
>   recommend that second route.
>
> Hope this helps,
>
> Juanma

Hey again!

As I see it, the most general and efficient solution to this problem
would be to make the looking-at() logic stream-based, since we want to
avoid having to read the whole file into memory regardless of the
length of BEG. Is there some way of opening a file into a buffer
without actually reading its whole contents into memory before they
are used by, in our case, looking-at()?

A less optimal solution could use a function, say
regexp-max-match-length(REGEXP), that determines the longest possible
string a regexp can match, possibly infinity. The return value of this
function could then be used as the length argument to
insert-file-contents-literally().

By the way, I am surprised that the function I am after does not
already exist in GNU Emacs. Could that be because it is difficult to
design a solution that satisfies *all* of the needs given above?

/Nordlöw
* Re: Efficiently checking the initial contents of a file
From: Thien-Thi Nguyen @ 2008-05-22 16:58 UTC
To: Nordlöw; +Cc: help-gnu-emacs

() Nordlöw <per.nordlow@gmail.com>
() Mon, 19 May 2008 00:20:26 -0700 (PDT)

   Is I see it the most general and efficient solution
   to this problem would be to

Hmm, i tend to find "most general" and "most efficient" to be
mutually exclusive. Perhaps we have different ideas of what
is general and what is efficient.

Here is a relatively efficient and highly parsimonious way, made by
squeezing the ideas previously presented by others into a lower-bound
wrap-check:

(defun elf-p (filename)
  (when (< 4 (nth 7 (file-attributes filename)))
    (with-temp-buffer
      (insert-file-contents filename nil 0 4)
      ;; "\177" is octal for 0x7f; in "\x7fELF" the hex escape
      ;; would also swallow the following `E'.
      (string= "\177ELF" (buffer-string)))))

For a "most general" way, i'm in the process of implementing a
file(1)-workalike in Emacs Lisp. For a peek at its design, you can
see the sexp-based ruleset (and a Scheme prototype linked therefrom)
at:

  http://www.gnuvola.org/data/ (de-uglified magic file)

Lastly, if the original problem uses `elf-p' as a filter, unless you
labor under a pitifully degrading (e.g., usloth) environment, you may
find the "best way" is to use `file-executable-p' instead.

thi
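
[Editorial sketch] The size-guarded check above can be made a little
more defensive, for example against missing files and symbolic links.
This is only a sketch under stated assumptions: the name
`my-elf-file-p' is hypothetical, and resolving the link with
`file-truename' is a design choice of this sketch, not something the
thread proposes. `file-attributes' returns nil for a file that does
not exist and describes the link itself rather than its target, which
is why both cases are handled explicitly here.

```elisp
(defun my-elf-file-p (file)
  "Return non-nil if FILE looks like an ELF object.
Hypothetical variant of `elf-p', for illustration only."
  (let* ((true-name (file-truename file))       ; follow symlinks
         (attrs (file-attributes true-name)))
    (and attrs                                  ; nil if FILE is missing
         (file-regular-p true-name)
         (file-readable-p true-name)
         (<= 4 (nth 7 attrs))                   ; size in bytes
         (with-temp-buffer
           ;; Read just the four magic bytes, without decoding.
           (insert-file-contents-literally true-name nil 0 4)
           ;; Octal "\177" is byte 0x7f; a "\x7f" escape would keep
           ;; consuming hex digits and absorb the `E'.
           (string= "\177ELF" (buffer-string))))))
```

On a typical GNU/Linux system one would expect
(my-elf-file-p "/bin/ls") to return non-nil, though that depends on
the machine.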