From: Thien-Thi Nguyen <ttn@gnuvola.org>
To: emacs-devel@gnu.org
Subject: recognizing a file by scanning it
Date: Sun, 27 Apr 2008 13:36:50 +0200
Message-ID: <87od7vr0kt.fsf@ambire.localdomain>

Some time back (within the last half-year or so) there was discussion
about Emacs being able to recognize file types by scanning content
rather than (or in addition to) using name-based heuristics.

One model for such a capability is the external command file(1), which
takes as its data a magic(5) file containing (possibly-chained) rules
specifying where and what to look for in the target file in order to
make a match, and additionally what to display on match.

For example, here is a fragment of ~/.magic ("|"-prefixed):

|# Emacs 18 - this is always correct, but not very magical.
|0      string  \012(         Emacs v18 byte-compiled Lisp data
|# Emacs 19+ - ver. recognition added by Ian Springer
|# Also applies to XEmacs 19+ .elc files; could tell them apart if we had regexp
|# support or similar - Chris Chittleborough <cchittleborough@yahoo.com.au>
|0      string  ;ELC
|>4     byte    >19
|>4     byte    <32           Emacs/XEmacs v%d byte-compiled Lisp data

I have written a Scheme program to translate this into sexps amenable
to both Scheme and Emacs Lisp `read'.  To continue the example:

|(0 0 string (= . "\n(") "Emacs v18 byte-compiled Lisp data")
|(0 0 string (= . ";ELC") "")
|(1 4 byte (> 19) "")
|(1 4 byte (< 32) "Emacs/XEmacs v%d byte-compiled Lisp data")

(See <http://www.gnuvola.org/data/> for the complete translation.)
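
As a rough illustration of how rules in this shape can be applied to a
bag of bytes, here is a small Python sketch.  It is not the Scheme
program described in this message: RULES is a hand-written Python
rendering of the four sexps above, describe is a hypothetical name, and
the matching logic is a simplified reading of file(1) behavior (a
level-N rule is tried only when a rule one level up has matched, and
scanning stops at the first matching top-level rule).

```python
import operator

# Hand-written Python rendering of the four translated rules above:
# (level, offset, type, (op, value), message).
RULES = [
    (0, 0, "string", ("=", "\n("), "Emacs v18 byte-compiled Lisp data"),
    (0, 0, "string", ("=", ";ELC"), ""),
    (1, 4, "byte", (">", 19), ""),
    (1, 4, "byte", ("<", 32), "Emacs/XEmacs v%d byte-compiled Lisp data"),
]

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def describe(data, rules=RULES):
    """Return a description of DATA (bytes), or None if no rule matches."""
    parts = []
    matched_level = -1          # deepest level whose latest rule matched
    for level, offset, type_, (op, expected), message in rules:
        if level == 0 and matched_level >= 0:
            break               # a top-level rule already matched; done
        if level > matched_level + 1:
            continue            # parent rule did not match; skip
        if type_ == "string":
            wanted = expected.encode("latin-1")
            value = data[offset:offset + len(wanted)]
            ok = OPS[op](value, wanted)
        elif type_ == "byte":
            ok = offset < len(data)
            value = data[offset] if ok else None
            ok = ok and OPS[op](value, expected)
        else:
            ok = False          # type not handled by this sketch
        if ok:
            matched_level = level
            if message:
                # Only %d is substituted here; file(1) supports more.
                parts.append(message.replace("%d", str(value)))
        else:
            matched_level = min(matched_level, level - 1)
    return " ".join(parts) if matched_level >= 0 else None


print(describe(b";ELC\x17\x00\x00\x00"))
print(describe(b"\n(defun foo ())"))
print(describe(b"plain text"))
```

With the first sample, whose fifth byte is 23, both level-1 tests pass
and the %d in the last message is filled in with that byte value.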

The Scheme program also mimics basic file(1) functionality; it can
recognize an unknown bag of bytes using the rules in either the original
magic(5) format or the translated-to-sexps variant, displaying output
indistinguishable (for the most part) from that of "file -n -N".

|$ ls="src/temacs etc/images/info.pbm lisp/startup.el lisp/startup.elc"
|$ for f in $ls ; do file -n -N $f ; ttn-do magic $f ; done
|src/temacs: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.4.1, dynamically linked (uses shared libs), for GNU/Linux 2.4.1, not stripped
|src/temacs: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV)
|etc/images/info.pbm: Netpbm PBM "rawbits" image data
|etc/images/info.pbm: Netpbm PBM "rawbits" image data
|lisp/startup.el: Lisp/Scheme program text
|lisp/startup.el: Lisp/Scheme program text
|lisp/startup.elc: Emacs/XEmacs v23 byte-compiled Lisp data
|lisp/startup.elc: Emacs/XEmacs v23 byte-compiled Lisp data

Although it lacks advanced file(1) functionality (integrated ELF
grokking, charset guesstimation, fancy printf(3) output, etc.), i
consider it complete enough to be a good starting point for a port to
Emacs Lisp.  (Indeed, Emacs is much nicer for implementing such features
as charset guesstimation.)

But before continuing, i would like to discover if anyone else is
working on something similar, to avoid (more?) duplicate effort.

thi



