From: Thien-Thi Nguyen
Newsgroups: gmane.emacs.devel
Subject: recognizing a file by scanning it
Date: Sun, 27 Apr 2008 13:36:50 +0200
Message-ID: <87od7vr0kt.fsf@ambire.localdomain>
To: emacs-devel@gnu.org

Some time back (within the last half-year or so) there was
discussion about Emacs being able to recognize file types by
scanning content rather than (or in addition to) using name-based
heuristics.

One model for such a capability is the external command file(1),
which takes as its data a magic(5) file containing
(possibly-chained) rules specifying where and what to look for in
the target file in order to make a match, and additionally what to
display on match. For example, here is a fragment of ~/.magic
("|"-prefixed):

|# Emacs 18 - this is always correct, but not very magical.
|0 string \012( Emacs v18 byte-compiled Lisp data
|# Emacs 19+ - ver. recognition added by Ian Springer
|# Also applies to XEmacs 19+ .elc files; could tell them apart if we had regexp
|# support or similar - Chris Chittleborough
|0 string ;ELC
|>4 byte >19
|>4 byte <32 Emacs/XEmacs v%d byte-compiled Lisp data

I have written a Scheme program to translate this into sexps
amenable to both Scheme and Emacs Lisp `read'. To continue the
example:

|(0 0 string (= . "\n(") "Emacs v18 byte-compiled Lisp data")
|(0 0 string (= . ";ELC") "")
|(1 4 byte (> 19) "")
|(1 4 byte (< 32) "Emacs/XEmacs v%d byte-compiled Lisp data")

(See for the complete translation.)

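To make the chaining concrete, here is a minimal sketch of how rules
in the translated shape might be applied to a bag of bytes. It is
written in Python purely for illustration; it is not the Scheme
program, and the names RULES, OPS, read_value and describe are
hypothetical. It assumes a simplified reading of magic(5): only the
"string" and "byte" types, only the =, > and < tests, a continuation
rule at level n is tried only if the most recent rule at level n-1
matched, and the first matching top-level rule wins.

```python
import operator

# Rules mirror the sexps above: (level, offset, type, (op, value), message).
# This tuple layout is an assumption made for the sketch.
RULES = [
    (0, 0, "string", ("=", b"\n("), "Emacs v18 byte-compiled Lisp data"),
    (0, 0, "string", ("=", b";ELC"), ""),
    (1, 4, "byte", (">", 19), ""),
    (1, 4, "byte", ("<", 32), "Emacs/XEmacs v%d byte-compiled Lisp data"),
]

OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def read_value(data, offset, kind, want):
    """Return the value found at offset, or None if data is too short."""
    if kind == "string":
        chunk = data[offset:offset + len(want)]
        return chunk if len(chunk) == len(want) else None
    if kind == "byte":
        return data[offset] if offset < len(data) else None
    raise ValueError("unsupported type: %s" % kind)


def describe(data, rules=RULES):
    """Return a description of data, or None if no top-level rule matches."""
    matched = {}  # level -> did the most recent rule at that level match?
    parts = []
    for level, offset, kind, (op, want), message in rules:
        if level == 0 and matched.get(0):
            break  # an earlier top-level rule already matched; stop there
        if level > 0 and not matched.get(level - 1):
            matched[level] = False  # parent failed, so skip this continuation
            continue
        value = read_value(data, offset, kind, want)
        ok = value is not None and OPS[op](value, want)
        matched[level] = ok
        if ok and message:
            # Only messages containing %d are formatted with the value read.
            parts.append(message % value if "%d" in message else message)
    return " ".join(parts) if matched.get(0) else None
```

With these rules, describe(b";ELC\x17\x00\x00\x00") reads the byte 23
at offset 4, passes both level-1 tests, and yields "Emacs/XEmacs v23
byte-compiled Lisp data"; input that starts with neither magic string
yields None.
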
The Scheme program also mimics basic file(1) functionality; it can
recognize an unknown bag of bytes using the rules in either the
original magic(5) format or the translated-to-sexps variant,
displaying output indistinguishable (for the most part) from that
of "file -n -N".

|$ ls="src/temacs etc/images/info.pbm lisp/startup.el lisp/startup.elc"
|$ for f in $ls ; do file -n -N $f ; ttn-do magic $f ; done
|src/temacs: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.4.1, dynamically linked (uses shared libs), for GNU/Linux 2.4.1, not stripped
|src/temacs: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV)
|etc/images/info.pbm: Netpbm PBM "rawbits" image data
|etc/images/info.pbm: Netpbm PBM "rawbits" image data
|lisp/startup.el: Lisp/Scheme program text
|lisp/startup.el: Lisp/Scheme program text
|lisp/startup.elc: Emacs/XEmacs v23 byte-compiled Lisp data
|lisp/startup.elc: Emacs/XEmacs v23 byte-compiled Lisp data

Although it lacks advanced file(1) functionality (integrated ELF
grokking, charset guesstimation, fancy printf(3) output, etc), i
consider it complete enough to be a good starting point for a port
to Emacs Lisp. (Indeed, Emacs is much nicer for implementing such
features as charset guesstimation.) But before continuing, i would
like to discover if anyone else is working on something similar, to
avoid (more?) duplicate effort.

thi