unofficial mirror of emacs-devel@gnu.org 
* recognizing a file by scanning it
From: Thien-Thi Nguyen @ 2008-04-27 11:36 UTC
  To: emacs-devel

Some time back (within the last half-year or so) there was discussion
about Emacs being able to recognize file types by scanning content
rather than (or in addition to) using name-based heuristics.

One model for such a capability is the external command file(1), which
takes as its data a magic(5) file containing (possibly-chained) rules
specifying where and what to look for in the target file in order to
make a match, and additionally what to display on match.

For example, here is a fragment of ~/.magic ("|"-prefixed):

|# Emacs 18 - this is always correct, but not very magical.
|0      string  \012(         Emacs v18 byte-compiled Lisp data
|# Emacs 19+ - ver. recognition added by Ian Springer
|# Also applies to XEmacs 19+ .elc files; could tell them apart if we had regexp
|# support or similar - Chris Chittleborough <cchittleborough@yahoo.com.au>
|0      string  ;ELC
|>4     byte    >19
|>4     byte    <32           Emacs/XEmacs v%d byte-compiled Lisp data

I have written a Scheme program to translate this into sexps amenable
to both Scheme and Emacs Lisp `read'.  To continue the example:

|(0 0 string (= . "\n(") "Emacs v18 byte-compiled Lisp data")
|(0 0 string (= . ";ELC") "")
|(1 4 byte (> 19) "")
|(1 4 byte (< 32) "Emacs/XEmacs v%d byte-compiled Lisp data")

(See <http://www.gnuvola.org/data/> for the complete translation.)
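
As a rough illustration of that translation step (this is not the Scheme
program itself), the Python sketch below turns the four rules above into
tuples shaped like the sexps.  The name `parse_magic_line` is hypothetical,
and the sketch assumes only what the fragment needs: `string` and numeric
tests, octal escapes, and a leading run of ">" as the continuation level.
Escaped spaces, comment lines, and the rest of magic(5) are not handled.

```python
import re

def parse_magic_line(line):
    """Translate one magic(5) rule into (level, offset, type, (op, value), message)."""
    fields = line.split(None, 3)          # offset, type, test, optional message
    offset_field, kind, test = fields[:3]
    message = fields[3] if len(fields) > 3 else ""
    # The continuation level is the number of leading ">" characters.
    level = len(offset_field) - len(offset_field.lstrip(">"))
    offset = int(offset_field.lstrip(">"), 0)
    if kind == "string":
        # Decode octal escapes such as \012; other escapes are left alone.
        text = re.sub(r"\\([0-7]{1,3})", lambda m: chr(int(m.group(1), 8)), test)
        return (level, offset, kind, ("=", text.encode("latin-1")), message)
    # Numeric test: an optional operator followed by a number; "=" is implied.
    op = test[0] if test[0] in "=<>!&^" else "="
    value = int(test.lstrip("=<>!&^"), 0)
    return (level, offset, kind, (op, value), message)

rules = [parse_magic_line(line) for line in [
    r"0      string  \012(         Emacs v18 byte-compiled Lisp data",
    "0      string  ;ELC",
    ">4     byte    >19",
    ">4     byte    <32           Emacs/XEmacs v%d byte-compiled Lisp data",
]]
```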

The Scheme program also mimics basic file(1) functionality; it can
recognize an unknown bag of bytes using the rules in either the original
magic(5) format or the translated-to-sexps variant, displaying output
indistinguishable (for the most part) from that of "file -n -N".

|$ ls="src/temacs etc/images/info.pbm lisp/startup.el lisp/startup.elc"
|$ for f in $ls ; do file -n -N $f ; ttn-do magic $f ; done
|src/temacs: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.4.1, dynamically linked (uses shared libs), for GNU/Linux 2.4.1, not stripped
|src/temacs: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV)
|etc/images/info.pbm: Netpbm PBM "rawbits" image data
|etc/images/info.pbm: Netpbm PBM "rawbits" image data
|lisp/startup.el: Lisp/Scheme program text
|lisp/startup.el: Lisp/Scheme program text
|lisp/startup.elc: Emacs/XEmacs v23 byte-compiled Lisp data
|lisp/startup.elc: Emacs/XEmacs v23 byte-compiled Lisp data

Although it lacks advanced file(1) functionality (integrated ELF
grokking, charset guesstimation, fancy printf(3) output, etc.), I
consider it complete enough to be a good starting point for a port to
Emacs Lisp.  (Indeed, Emacs is a much nicer environment for implementing
features such as charset guesstimation.)

But before continuing, I would like to find out whether anyone else is
working on something similar, to avoid (more?) duplicate effort.

thi



