unofficial mirror of emacs-devel@gnu.org 
* recognizing a file by scanning it
@ 2008-04-27 11:36 Thien-Thi Nguyen
  2008-04-27 21:06 ` Stephen J. Turnbull
  2008-04-27 21:15 ` Jason Rumney
  0 siblings, 2 replies; 11+ messages in thread
From: Thien-Thi Nguyen @ 2008-04-27 11:36 UTC (permalink / raw)
  To: emacs-devel

Some time back (within the last half-year or so) there was discussion
about Emacs being able to recognize file types by scanning content
rather than (or in addition to) using name-based heuristics.

One model for such a capability is the external command file(1), which
takes as its data a magic(5) file containing (possibly-chained) rules
specifying where and what to look for in the target file in order to
make a match, and additionally what to display on match.

For example, here is a fragment of ~/.magic ("|"-prefixed):

|# Emacs 18 - this is always correct, but not very magical.
|0      string  \012(         Emacs v18 byte-compiled Lisp data
|# Emacs 19+ - ver. recognition added by Ian Springer
|# Also applies to XEmacs 19+ .elc files; could tell them apart if we had regexp
|# support or similar - Chris Chittleborough <cchittleborough@yahoo.com.au>
|0      string  ;ELC
|>4     byte    >19
|>4     byte    <32           Emacs/XEmacs v%d byte-compiled Lisp data

I have written a Scheme program to translate this into sexps amenable
to both Scheme and Emacs Lisp `read'.  To continue the example:

|(0 0 string (= . "\n(") "Emacs v18 byte-compiled Lisp data")
|(0 0 string (= . ";ELC") "")
|(1 4 byte (> 19) "")
|(1 4 byte (< 32) "Emacs/XEmacs v%d byte-compiled Lisp data")

(See <http://www.gnuvola.org/data/> for the complete translation.)

The Scheme program also mimics basic file(1) functionality; it can
recognize an unknown bag of bytes using the rules in either the original
magic(5) format or the translated-to-sexps variant, displaying output
indistinguishable (for the most part) from that of "file -n -N".

|$ ls="src/temacs etc/images/info.pbm lisp/startup.el lisp/startup.elc"
|$ for f in $ls ; do file -n -N $f ; ttn-do magic $f ; done
|src/temacs: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.4.1, dynamically linked (uses shared libs), for GNU/Linux 2.4.1, not stripped
|src/temacs: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV)
|etc/images/info.pbm: Netpbm PBM "rawbits" image data
|etc/images/info.pbm: Netpbm PBM "rawbits" image data
|lisp/startup.el: Lisp/Scheme program text
|lisp/startup.el: Lisp/Scheme program text
|lisp/startup.elc: Emacs/XEmacs v23 byte-compiled Lisp data
|lisp/startup.elc: Emacs/XEmacs v23 byte-compiled Lisp data

Although it lacks advanced file(1) functionality (integrated ELF
grokking, charset guesstimation, fancy printf(3) output, etc.), i
consider it complete enough to be a good starting point for a port to
Emacs Lisp.  (Indeed, Emacs is much nicer for implementing such features
as charset guesstimation.)

But before continuing, i would like to discover if anyone else is
working on something similar, to avoid (more?) duplicate effort.

thi





* recognizing a file by scanning it
  2008-04-27 11:36 recognizing a file by scanning it Thien-Thi Nguyen
@ 2008-04-27 21:06 ` Stephen J. Turnbull
  2008-04-28  3:02   ` Thien-Thi Nguyen
  2008-04-27 21:15 ` Jason Rumney
  1 sibling, 1 reply; 11+ messages in thread
From: Stephen J. Turnbull @ 2008-04-27 21:06 UTC (permalink / raw)
  To: Thien-Thi Nguyen; +Cc: emacs-devel

Thien-Thi Nguyen writes:

 > But before continuing, i would like to discover if anyone else is
 > working on something similar, to avoid (more?) duplicate effort.

file(1) contains a rather complete set of such heuristics and has been
stable for a long time.  AFAIK libmagic is free and GPL-compatible.

Also, it might be nice if you could call this facility from coding
systems (after all, what else is a BOM but file magic?)  This kind of
thing really doesn't need to be in Lisp, and might benefit from being
in C.
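
To make the BOM remark concrete, here is a small Python sketch that
treats byte-order marks as ordinary magic prefixes.  The table and the
function name are assumptions for illustration only and say nothing
about how Emacs coding systems actually detect BOMs.

```python
import codecs

# Longer prefixes first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
BOM_RULES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return (encoding name, BOM length), or (None, 0) if no BOM is found."""
    for magic, name in BOM_RULES:
        if data.startswith(magic):
            return name, len(magic)
    return None, 0


print(sniff_bom(codecs.BOM_UTF8 + b"hello"))
```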






* Re: recognizing a file by scanning it
  2008-04-27 11:36 recognizing a file by scanning it Thien-Thi Nguyen
  2008-04-27 21:06 ` Stephen J. Turnbull
@ 2008-04-27 21:15 ` Jason Rumney
  2008-04-27 22:52   ` Chong Yidong
  1 sibling, 1 reply; 11+ messages in thread
From: Jason Rumney @ 2008-04-27 21:15 UTC (permalink / raw)
  To: Thien-Thi Nguyen; +Cc: emacs-devel

Thien-Thi Nguyen wrote:
> But before continuing, i would like to discover if anyone else is
> working on something similar, to avoid (more?) duplicate effort.
>   

magic-mode-alist was added early in the development of Emacs 22. All 
entries were removed from it shortly before Emacs 22.1 was released due 
to security issues that were raised during pretest.






* Re: recognizing a file by scanning it
  2008-04-27 21:15 ` Jason Rumney
@ 2008-04-27 22:52   ` Chong Yidong
  2008-04-27 23:28     ` Jason Rumney
  0 siblings, 1 reply; 11+ messages in thread
From: Chong Yidong @ 2008-04-27 22:52 UTC (permalink / raw)
  To: Jason Rumney; +Cc: Thien-Thi Nguyen, emacs-devel

Jason Rumney <jasonr@gnu.org> writes:

> Thien-Thi Nguyen wrote:
>> But before continuing, i would like to discover if anyone else is
>> working on something similar, to avoid (more?) duplicate effort.
>>   
>
> magic-mode-alist was added early in the development of Emacs 22. All
> entries were removed from it shortly before Emacs 22.1 was released
> due to security issues that were raised during pretest.

I don't recall that there were any security issues.  Rather, these
entries were moved to magic-fallback-mode-alist, so that they have a
lower priority than filename autodetection.





* Re: recognizing a file by scanning it
  2008-04-27 22:52   ` Chong Yidong
@ 2008-04-27 23:28     ` Jason Rumney
  2008-04-28  3:16       ` Thien-Thi Nguyen
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Rumney @ 2008-04-27 23:28 UTC (permalink / raw)
  To: Chong Yidong; +Cc: Thien-Thi Nguyen, emacs-devel

Chong Yidong wrote:
> Jason Rumney <jasonr@gnu.org> writes:
>
>   
>> Thien-Thi Nguyen wrote:
>>     
>>> But before continuing, i would like to discover if anyone else is
>>> working on something similar, to avoid (more?) duplicate effort.
>>>   
>>>       
>> magic-mode-alist was added early in the development of Emacs 22. All
>> entries were removed from it shortly before Emacs 22.1 was released
>> due to security issues that were raised during pretest.
>>     
>
> I don't recall that there were any security issues.  Rather, these
> entries were moved to magic-fallback-mode-alist, so that they have a
> lower priority than filename autodetection.
>   

The image entries were all dropped completely because of the supposed 
security issues.






* Re: recognizing a file by scanning it
  2008-04-27 21:06 ` Stephen J. Turnbull
@ 2008-04-28  3:02   ` Thien-Thi Nguyen
  2008-04-28 14:52     ` Stefan Monnier
  0 siblings, 1 reply; 11+ messages in thread
From: Thien-Thi Nguyen @ 2008-04-28  3:02 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: emacs-devel

() "Stephen J. Turnbull" <stephen@xemacs.org>
() Mon, 28 Apr 2008 06:06:42 +0900

   file(1) contains a rather complete set of such heuristics and
   has been stable for a long time.  AFAIK libmagic is free and
   GPL-compatible.

Yes, i believe so.  My concern w/ using libmagic would be
portability, and handling the case where libmagic is not
available.  I envision the feature to be non-optional.

   Also, it might be nice if you could call this facility from
   coding systems (after all, what else is a BOM but file magic?)
   This kind of thing really doesn't need to be in Lisp, and might
   benefit from being in C.

Certainly, in C it can be much faster.  

thi





* Re: recognizing a file by scanning it
  2008-04-27 23:28     ` Jason Rumney
@ 2008-04-28  3:16       ` Thien-Thi Nguyen
  2008-04-28  8:05         ` Jason Rumney
  0 siblings, 1 reply; 11+ messages in thread
From: Thien-Thi Nguyen @ 2008-04-28  3:16 UTC (permalink / raw)
  To: Jason Rumney; +Cc: Chong Yidong, emacs-devel

() Jason Rumney <jasonr@gnu.org>
() Mon, 28 Apr 2008 00:28:03 +0100

   The image entries were all dropped completely because of the
   supposed security issues.

Searching gmane, i see a huge list of postings w/ the subject
"Image mode".  Is that what you are referring to?

A brief read (handful of pages) gives me the impression that the
issue there is the security of image libraries (libpng, etc).  Am
i missing something?

thi





* Re: recognizing a file by scanning it
  2008-04-28  3:16       ` Thien-Thi Nguyen
@ 2008-04-28  8:05         ` Jason Rumney
  2008-04-28 18:07           ` Reiner Steib
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Rumney @ 2008-04-28  8:05 UTC (permalink / raw)
  To: Thien-Thi Nguyen; +Cc: Chong Yidong, emacs-devel

Thien-Thi Nguyen wrote:
> () Jason Rumney <jasonr@gnu.org>
> () Mon, 28 Apr 2008 00:28:03 +0100
>
>    The image entries were all dropped completely because of the
>    supposed security issues.
>
> Searching gmane, i see a huge list of postings w/ the subject
> "Image mode".  Is that what you are referring to?
>
> A brief read (handful of pages) gives me the impression that the
> issue there is the security of image libraries (libpng, etc).  Am
> i missing something?
>   

Yes, that was the issue. Rereading that thread and others on the 
subject, it seems that RMS's solution was different than I remembered - 
he made image-mode prompt the user by default in all cases, and it was 
later restored when Lars Magnusson convinced him that no other 
application was this paranoid.





* Re: recognizing a file by scanning it
  2008-04-28  3:02   ` Thien-Thi Nguyen
@ 2008-04-28 14:52     ` Stefan Monnier
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Monnier @ 2008-04-28 14:52 UTC (permalink / raw)
  To: Thien-Thi Nguyen; +Cc: Stephen J. Turnbull, emacs-devel

>    file(1) contains a rather complete set of such heuristics and
>    has been stable for a long time.  AFAIK libmagic is free and
>    GPL-compatible.

> Yes, i believe so.  My concern w/ using libmagic would be
> portability, and handling the case where libmagic is not
> available.  I envision the feature to be non-optional.

We could do as we do for `ls': use libmagic if it's available and
fall back on an Elisp replacement if not.  The Elisp replacement would
need to work even in the absence of /etc/magic.
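
The "use the external facility if present, else fall back" shape can be
sketched in Python as follows.  This is only an analogy: it shells out
to file(1) rather than linking libmagic, and the built-in table, its
descriptions, and the function names are hypothetical stand-ins for the
Elisp replacement.

```python
import shutil
import subprocess

# Tiny built-in table, so the fallback needs no /etc/magic at all.
BUILTIN_MAGIC = [
    (b";ELC", "Emacs/XEmacs byte-compiled Lisp data"),
    (b"\x7fELF", "ELF"),
    (b"P4", 'Netpbm PBM "rawbits" image data'),
]


def builtin_identify(path):
    """Identify PATH using only the built-in prefix table."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, desc in BUILTIN_MAGIC:
        if head.startswith(magic):
            return desc
    return "data"


def identify(path, use_external=True):
    """Prefer file(1) when it is installed; otherwise use the built-in table."""
    tool = shutil.which("file") if use_external else None
    if tool is not None:
        try:
            out = subprocess.run([tool, "-b", path], capture_output=True,
                                 text=True, check=True)
            return out.stdout.strip()
        except (OSError, subprocess.CalledProcessError):
            pass  # fall through to the built-in rules
    return builtin_identify(path)
```

Calling identify(path, use_external=False) exercises only the fallback,
which is the path that has to work on systems without the external tool.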

The higher-level functionality also needs adjusting: we want to bring
magic-mode-alist and auto-mode-alist to the same level (rather than give
absolute priority to one of the two).  Basically, try them both, each
one can return a set/list of potential modes, then try and figure out
which mode to use (I've done something similar in doc-view-mode).
Hopefully this can be used to disambiguate the .arc archives from the
.arc Lisp files as well.
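
A minimal Python sketch of that "try both, then reconcile" idea, using
the .arc case.  Everything here is assumed for illustration: the rule
tables, the `arc-lisp-mode' name, the leading 0x1A byte taken as the
archive marker, and the tie-breaking policy.  It is not how
doc-view-mode or set-auto-mode does it.

```python
import re

# Name-based candidates (stand-in for auto-mode-alist).
NAME_RULES = [
    (re.compile(r"\.arc\Z"), {"archive-mode", "arc-lisp-mode"}),
    (re.compile(r"\.el\Z"), {"emacs-lisp-mode"}),
    (re.compile(r"\.txt\Z"), {"text-mode"}),
]

# Content-based candidates (stand-in for magic-mode-alist).
CONTENT_RULES = [
    (lambda h: h[:1] == b"\x1a", {"archive-mode"}),
    (lambda h: h.lstrip()[:1] in (b"(", b";"),
     {"arc-lisp-mode", "emacs-lisp-mode"}),
]


def candidates_by_name(filename):
    """Collect every mode suggested by the file name."""
    modes = set()
    for regexp, found in NAME_RULES:
        if regexp.search(filename):
            modes |= found
    return modes


def candidates_by_content(head):
    """Collect every mode suggested by the first bytes of the file."""
    modes = set()
    for predicate, found in CONTENT_RULES:
        if predicate(head):
            modes |= found
    return modes


def choose_mode(filename, head):
    """Prefer modes both sources agree on; else take whichever has an opinion."""
    by_name = candidates_by_name(filename)
    by_content = candidates_by_content(head)
    pool = (by_name & by_content) or by_name or by_content
    # min() is an arbitrary alphabetical tie-break; a real policy would differ.
    return min(pool) if pool else "fundamental-mode"


print(choose_mode("game.arc", b"\x1a\x08data"))
print(choose_mode("prog.arc", b"(def foo 1)"))
```

When the two sources disagree outright, this sketch sides with the file
name, which is just one possible policy.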

>    Also, it might be nice if you could call this facility from
>    coding systems (after all, what else is a BOM but file magic?)
>    This kind of thing really doesn't need to be in Lisp, and might
>    benefit from being in C.

> Certainly, in C it can be much faster.  

If it's used just to choose the major-mode, speed is a non-issue.


        Stefan





* Re: recognizing a file by scanning it
  2008-04-28  8:05         ` Jason Rumney
@ 2008-04-28 18:07           ` Reiner Steib
  2008-04-28 18:13             ` Jason Rumney
  0 siblings, 1 reply; 11+ messages in thread
From: Reiner Steib @ 2008-04-28 18:07 UTC (permalink / raw)
  To: Jason Rumney; +Cc: emacs-devel

On Mon, Apr 28 2008, Jason Rumney wrote:

> he made image-mode prompt the user by default in all cases, and it
> was later restored when Lars Magnusson convinced him that no other
> application was this paranoid.

s/Lars Magnusson/Lars Magne Ingebrigtsen/ ;-)

It's not only paranoid, but a security risk itself...

,----[ <http://article.gmane.org/gmane.emacs.devel/66004> ]
| Warning users about something that's almost certainly not dangerous is
| a huge security risk in itself, because you're inuring the users to
| warnings.  The user will answer "Yeah, whatever" when being bothered
| with these things, and then when Emacs asks the user "Are you sure you
| wish to do an rm -rf?" (or whatever the genuinely dangerous thing it
| is), they won't bother to read the warning. 
`----

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/





* Re: recognizing a file by scanning it
  2008-04-28 18:07           ` Reiner Steib
@ 2008-04-28 18:13             ` Jason Rumney
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Rumney @ 2008-04-28 18:13 UTC (permalink / raw)
  To: Jason Rumney, emacs-devel

Reiner Steib wrote:
> s/Lars Magnusson/Lars Magne Ingebrigtsen/ ;-)
>   

Sorry Lars.





