all messages for Emacs-related lists mirrored at yhetil.org
* Suggestions? Better filetype sniffing -- XHTML vs. HTML
@ 2004-02-24 14:56 D. D. Brierton
  2004-02-24 16:47 ` Kin Cho
  2004-02-24 17:11 ` Stefan Monnier
  0 siblings, 2 replies; 11+ messages in thread
From: D. D. Brierton @ 2004-02-24 14:56 UTC (permalink / raw)


I'd like to be able to have emacs autodetect whether a file is an HTML
file or an XHTML file. Standardly, file extension is not enough for this
as HTML and XHTML files tend to have the same file extensions. I have to
edit a lot of files created by other people, and they are often hopelessly
invalid, so I have no hope of perfectly differentiating XHTML from HTML.
However, there are some good clues to go on:

If a file ends in one of the following:

\.inc$
\.php[34]?$
\.[sjp]?html?$

Then (in my case) it is *either* HTML *or* XHTML.

If a file with one of the above extensions has very near the beginning one
or both of:

<?xml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML

then it is XHTML. Otherwise it is probably just HTML.

I know how to (add-to-list 'auto-mode-alist ... the file extensions, but I
don't know how to also check the first few lines of the file. Can anyone
offer any suggestions?

Further details:

I use psgml, and I define two derived modes:

(define-derived-mode xml-html-mode xml-mode "XHTML"
  "This version of html mode is just a wrapper around xml mode."
  (make-local-variable 'sgml-declaration)
  (make-local-variable 'sgml-default-doctype-name)
  (setq
   sgml-default-doctype-name    "html"
   sgml-declaration             "/usr/share/sgml/xml.dcl"
   sgml-always-quote-attributes t
   sgml-indent-step             2
   sgml-indent-data             t
   sgml-minimize-attributes     nil
   sgml-omittag                 nil
   sgml-shorttag                nil
   )
  )

(define-derived-mode sgml-html-mode sgml-mode "HTML"
  "This version of html mode is just a wrapper around sgml mode."
  (make-local-variable 'sgml-declaration)
  (make-local-variable 'sgml-default-doctype-name)
  (setq
   sgml-default-doctype-name    "html"
   sgml-declaration             "~/lib/DTD/html401/HTML4.decl"
   sgml-always-quote-attributes t
   sgml-indent-step             2
   sgml-indent-data             t
   sgml-minimize-attributes     nil
   sgml-omittag                 nil
   sgml-shorttag                nil
   )
  )

I also have the following:

; What files to invoke the new html-mode for?
(add-to-list 'auto-mode-alist '("\\.inc\\'" . sgml-html-mode))
(add-to-list 'auto-mode-alist '("\\.php[34]?\\'" . sgml-html-mode))
(add-to-list 'auto-mode-alist '("\\.[sj]?html?\\'" . sgml-html-mode))

So that basically I end up in sgml-html-mode when I open an (X)HTML file,
and then if it is an XHTML file I have to manually M-x xml-html-mode.

TIA, Darren

-- 
======================================================================
D. D. Brierton            darren@dzr-web.com           www.dzr-web.com
       Trying is the first step towards failure (Homer Simpson)
======================================================================

* Re: Suggestions? Better filetype sniffing -- XHTML vs. HTML
  2004-02-24 14:56 Suggestions? Better filetype sniffing -- XHTML vs. HTML D. D. Brierton
@ 2004-02-24 16:47 ` Kin Cho
  2004-02-24 17:16   ` D. D. Brierton
  2004-02-24 17:11 ` Stefan Monnier
  1 sibling, 1 reply; 11+ messages in thread
From: Kin Cho @ 2004-02-24 16:47 UTC (permalink / raw)


(add-hook 'find-file-hooks 'my-find-file-hooks t)

(defun my-find-file-hooks ()
  (when (save-excursion (search-forward-regexp "\?xml\\|XHTML" 80 t))
    ;; do whatever you need to do
    ))

-kin
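
One way the placeholder body above might be filled in, as a minimal sketch only. The names `my-xhtml-marker-p` and `my-find-file-hook` are hypothetical, the regexp is the one used later in this thread, and the 200-character limit is an arbitrary choice. `xml-html-mode` is the derived mode from the original message; the sketch assumes it has been defined. Newer Emacs versions spell the hook `find-file-hook`.

```elisp
;; Hypothetical helper, not part of Emacs or psgml.
(defun my-xhtml-marker-p (&optional limit)
  "Return non-nil if an XHTML marker appears near the start of the buffer.
LIMIT is a character count, not a line count; it defaults to 200."
  (save-excursion
    ;; Always start at the beginning, wherever point happens to be.
    (goto-char (point-min))
    (re-search-forward "<[?]xml\\|//W3C//DTD XHTML"
                       (min (point-max) (+ (point-min) (or limit 200)))
                       t)))

(defun my-find-file-hook ()
  "Switch to `xml-html-mode' when the visited file looks like XHTML."
  ;; `xml-html-mode' comes from the original message; skip if undefined.
  (when (and (fboundp 'xml-html-mode) (my-xhtml-marker-p))
    (xml-html-mode)))

(add-hook 'find-file-hook #'my-find-file-hook t)
```

Keeping the test in its own predicate makes it easy to try in a scratch buffer before wiring it into any hook.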

* Re: Suggestions? Better filetype sniffing -- XHTML vs. HTML
  2004-02-24 14:56 Suggestions? Better filetype sniffing -- XHTML vs. HTML D. D. Brierton
  2004-02-24 16:47 ` Kin Cho
@ 2004-02-24 17:11 ` Stefan Monnier
  1 sibling, 0 replies; 11+ messages in thread
From: Stefan Monnier @ 2004-02-24 17:11 UTC (permalink / raw)


> I'd like to be able to have emacs autodetect whether a file is an HTML
> file or an XHTML file. Standardly, file extension is not enough for this

The sgml-mode.el in the CVS version of Emacs does that.
You can browse the CVS repository from http://savannah.gnu.org/projects/emacs.


        Stefan
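
For readers who want content-based detection without a custom hook, here is a hedged sketch using `magic-mode-alist`, which to my knowledge is available in Emacs 22 and later (so after this thread started). It is not a description of what sgml-mode.el itself does. The predicate name `my-xhtml-buffer-p` is hypothetical, the extension regexp is adapted from the original message, and `xml-html-mode` is the poster's own derived mode.

```elisp
;; Hypothetical predicate, used as a MATCH-FUNCTION in `magic-mode-alist'.
(defun my-xhtml-buffer-p ()
  "Return non-nil if the visited file looks like XHTML."
  (and buffer-file-name
       ;; Only consider the extensions listed in the original message.
       (string-match-p "\\.\\(inc\\|php[34]?\\|[sjp]?html?\\)\\'"
                       buffer-file-name)
       (save-excursion
         (goto-char (point-min))
         ;; Look only near the start of the file (300 is arbitrary).
         (re-search-forward "<[?]xml\\|//W3C//DTD XHTML"
                            (min (point-max) (+ (point-min) 300))
                            t))))

;; Entries in `magic-mode-alist' are consulted before `auto-mode-alist'.
(add-to-list 'magic-mode-alist '(my-xhtml-buffer-p . xml-html-mode))
```

With this in place the `auto-mode-alist` entries for sgml-html-mode can stay as they are; they apply whenever the predicate returns nil.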

* Re: Suggestions? Better filetype sniffing -- XHTML vs. HTML
  2004-02-24 16:47 ` Kin Cho
@ 2004-02-24 17:16   ` D. D. Brierton
  2004-02-24 17:31     ` Kin Cho
  0 siblings, 1 reply; 11+ messages in thread
From: D. D. Brierton @ 2004-02-24 17:16 UTC (permalink / raw)


On Tue, 24 Feb 2004 08:47:18 -0800, Kin Cho wrote:

> (add-hook 'find-file-hooks 'my-find-file-hooks t)
> 
> (defun my-find-file-hooks ()
>   (when (save-excursion (search-forward-regexp "\?xml\\|XHTML" 80 t))
>     ;; do whatever you need to do
>     ))

Thanks for this suggestion, Kin. Following on from it, I guess that rather
than adding a hook to find-file-hooks, I could add it to
'sgml-html-mode-hook instead, so that .php or .html files etc. initially
open in sgml-html-mode, but then if a <?xml or //W3C//DTD XHTML string is
found the buffer switches into xml-html-mode. That would go something
like:

(add-hook 'sgml-html-mode-hook 'check-for-xhtml-hook t)

(defun check-for-xhtml-hook ()
  (when (save-excursion (search-forward-regexp "<[?]xml\\|//W3C//DTD XHTML" 80 t))
    'xml-html-mode ;; looks like this line isn't right
    ))

As you may be able to tell, though, my lisp is pretty crappy. The above
doesn't seem to work. It seems that 'xml-html-mode is not sufficient to
change the mode of the buffer. What am I doing wrong there?

Thanks for your help.

Best, Darren

-- 
======================================================================
D. D. Brierton            darren@dzr-web.com           www.dzr-web.com
       Trying is the first step towards failure (Homer Simpson)
======================================================================

* Re: Suggestions? Better filetype sniffing -- XHTML vs. HTML
  2004-02-24 17:16   ` D. D. Brierton
@ 2004-02-24 17:31     ` Kin Cho
  2004-02-24 17:46       ` D. D. Brierton
  0 siblings, 1 reply; 11+ messages in thread
From: Kin Cho @ 2004-02-24 17:31 UTC (permalink / raw)


"D. D. Brierton" <darren@dzr-web.com> writes:

> On Tue, 24 Feb 2004 08:47:18 -0800, Kin Cho wrote:
> 
> > (add-hook 'find-file-hooks 'my-find-file-hooks t)
> > 
> > (defun my-find-file-hooks ()
> >   (when (save-excursion (search-forward-regexp "\?xml\\|XHTML" 80 t))
> >     ;; do whatever you need to do
> >     ))
> 
> Thanks for this suggestion, Kin. Following on from it, I guess that rather
> than adding a hook to find-file-hooks, I could add it to
> 'sgml-html-mode-hook instead, so that .php or .html files etc. initially
> open in sgml-html-mode, but then if a <?xml or //W3C//DTD XHTML string is
> found the buffer switches into xml-html-mode. That would go something
> like:
> 
> (add-hook 'sgml-html-mode-hook 'check-for-xhtml-hook t)
> 
> (defun check-for-xhtml-hook ()
>   (when (save-excursion (search-forward-regexp "<[?]xml\\|//W3C//DTD XHTML" 80 t))
>     'xml-html-mode ;; looks like this line isn't right
>     ))

Change 'xml-html-mode to (xml-html-mode).

If you know C, 'xml-html-mode is like taking the address of a
function.

-kin
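
The difference between the quoted symbol and the call can be seen in a small sketch. `my-toy-mode` and `my-toy-mode-calls` are hypothetical stand-ins for `xml-html-mode`, used so the example runs without psgml.

```elisp
;; Hypothetical stand-in for a mode function; it only counts its calls.
(defvar my-toy-mode-calls 0
  "How many times `my-toy-mode' has been called.")

(defun my-toy-mode ()
  "Pretend to switch modes by incrementing `my-toy-mode-calls'."
  (setq my-toy-mode-calls (1+ my-toy-mode-calls)))

'my-toy-mode            ; evaluates to the symbol itself; nothing runs
(my-toy-mode)           ; calls the function
(funcall 'my-toy-mode)  ; also calls it, through the symbol
```

Inside a `when` body, a bare `'xml-html-mode` is evaluated and its value discarded, which is why the buffer never changed mode.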

* Re: Suggestions? Better filetype sniffing -- XHTML vs. HTML
  2004-02-24 17:31     ` Kin Cho
@ 2004-02-24 17:46       ` D. D. Brierton
  2005-05-27 14:29         ` slashdevslashnull
       [not found]         ` <mailman.2088.1117208718.25862.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 11+ messages in thread
From: D. D. Brierton @ 2004-02-24 17:46 UTC (permalink / raw)


On Tue, 24 Feb 2004 09:31:51 -0800, Kin Cho wrote:

> Change 'xml-html-mode to (xml-html-mode).
> 
> If you know C, 'xml-html-mode is like taking the address of a
> function.

That did the trick. Thanks a lot!

Best, Darren

-- 
======================================================================
D. D. Brierton            darren@dzr-web.com           www.dzr-web.com
       Trying is the first step towards failure (Homer Simpson)
======================================================================

* Re: Suggestions? Better filetype sniffing -- XHTML vs. HTML
  2004-02-24 17:46       ` D. D. Brierton
@ 2005-05-27 14:29         ` slashdevslashnull
       [not found]         ` <mailman.2088.1117208718.25862.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 11+ messages in thread
From: slashdevslashnull @ 2005-05-27 14:29 UTC (permalink / raw)


"D. D. Brierton" <darren@dzr-web.com> writes:

> On Tue, 24 Feb 2004 09:31:51 -0800, Kin Cho wrote:
>
>> Change 'xml-html-mode to (xml-html-mode).
>> 
>> If you know C, 'xml-html-mode is like taking the address of a
>> function.
>
> That did the trick. Thanks a lot!
>
> Best, Darren
>
> -- 
> ======================================================================
> D. D. Brierton            darren@dzr-web.com           www.dzr-web.com
>        Trying is the first step towards failure (Homer Simpson)
> ======================================================================

It still doesn't work for me. The hook prevents MMM-Mode from being
activated for plain HTML (non-XHTML) files. This may be due to
splitting your Emacs WebDev (thank you for that) across multiple files,
or to the MMM-Mode version I am using (0.4.8), or to something else.

I guess it's because MMM-Mode reruns the sgml-html-mode hooks while
activating, and the search somehow fails for HTML files when point is not
at the beginning of the buffer, which seems to be the case.

(defun guess-xhtml-hook ()
  "Guess whether the current buffer is XHTML."
  (when
      (save-excursion
        ;; Search from the start of the buffer, wherever point is now.
        (goto-char (point-min))
        (search-forward-regexp "<[?]xml\\|//W3C//DTD XHTML" 80 t))
    (xml-html-mode)))

works for me.
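
The effect of point's position can be reproduced in a temporary buffer. This is a sketch only: `my-xhtml-search` and `my-xhtml-search-demo` are hypothetical names, and the sample text is made up. As far as I know, `re-search-forward` also signals an error, rather than returning nil, when its bound lies before point, which may be part of what goes wrong when a hook runs with point far into the buffer.

```elisp
;; Hypothetical helper: the same search, with or without moving to the start.
(defun my-xhtml-search (from-start)
  "Search for an XHTML marker within the first 80 characters.
Start at `point-min' when FROM-START is non-nil, else at point."
  (save-excursion
    (when from-start
      (goto-char (point-min)))
    (re-search-forward "<[?]xml\\|//W3C//DTD XHTML" 80 t)))

(defun my-xhtml-search-demo ()
  "Return the two search results for a short sample buffer."
  (with-temp-buffer
    (insert "<?xml version=\"1.0\"?>\n<html></html>\n")
    ;; After `insert', point sits at the end of the buffer.
    (list (my-xhtml-search nil)          ; searching from point finds nothing
          (and (my-xhtml-search t) t)))) ; searching from the start succeeds
```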

* Re: Suggestions? Better filetype sniffing -- XHTML vs. HTML
       [not found]         ` <mailman.2088.1117208718.25862.help-gnu-emacs@gnu.org>
@ 2005-05-27 23:39           ` Thien-Thi Nguyen
  2005-05-31  6:52             ` don provan
  0 siblings, 1 reply; 11+ messages in thread
From: Thien-Thi Nguyen @ 2005-05-27 23:39 UTC (permalink / raw)


On Tue, 24 Feb 2004 09:31:51 -0800, Kin Cho wrote:

> If you know C, 'xml-html-mode is like taking the
> address of a function.

that's a stretch!

did anyone who:
      (a) did know C previously
  and (b) did NOT know emacs lisp previously
  and (c) learned something about emacs lisp eventually
find this comparison to be helpful in going from (b) to (c)?

i'd like to expand my repertoire of didactic methods, so
if this does indeed work (contrary to my intuition), please
let me know!  it won't be the first time i've been wrong...

thi

* Re: Suggestions? Better filetype sniffing -- XHTML vs. HTML
  2005-05-27 23:39           ` Thien-Thi Nguyen
@ 2005-05-31  6:52             ` don provan
  2005-09-12  7:59               ` Thien-Thi Nguyen
  0 siblings, 1 reply; 11+ messages in thread
From: don provan @ 2005-05-31  6:52 UTC (permalink / raw)


Thien-Thi Nguyen <ttn@glug.org> writes:

> On Tue, 24 Feb 2004 09:31:51 -0800, Kin Cho wrote:
> 
> > If you know C, 'xml-html-mode is like taking the
> > address of a function.
> 
> that's a stretch!
> 
> did anyone who:
>       (a) did know C previously
>   and (b) did NOT know emacs lisp previously
>   and (c) learned something about emacs lisp eventually
> find this comparison to be helpful in going from (b) to (c)?

It's hard to remember back that far, but I think I probably found that
comparison helpful way back when. I'm not sure why you think it's such
a stretch. Sure, there are significant differences, but none-the-less,

    (setq f 'function)
    (funcall f)

is how you accomplish in emacs the same feat as in C with

    f = function;   /* implicitly takes the address of function */
    (*f)();

I suppose not all C programmers are familiar with function pointers,
yet you really aren't an emacs programmer until you're familiar with
quoted function names, so it might be considered a stretch in the
sense that some C programmers wouldn't have the concept to begin with.
Is that what you meant?

-don provan
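
A runnable variant of the Emacs Lisp half of this comparison, as a sketch. The name `function` in the message is a placeholder (in Emacs Lisp it is also the name of a special form), so the built-in `upcase` is used here instead. `my-funcall-demo` is a hypothetical name.

```elisp
;; Hypothetical demo function; `upcase' is a standard built-in.
(defun my-funcall-demo ()
  "Call `upcase' through a quoted symbol and through a #' reference."
  (let ((f 'upcase)    ; the symbol that names the function
        (g #'upcase))  ; #'upcase is shorthand for (function upcase)
    (list (funcall f "xhtml")
          (funcall g "html"))))

;; (my-funcall-demo) returns ("XHTML" "HTML")
```

Both spellings work with `funcall`; `#'` additionally tells the byte compiler that the symbol is meant as a function.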

* Re: Suggestions? Better filetype sniffing -- XHTML vs. HTML
  2005-05-31  6:52             ` don provan
@ 2005-09-12  7:59               ` Thien-Thi Nguyen
  2005-09-15 16:25                 ` don provan
  0 siblings, 1 reply; 11+ messages in thread
From: Thien-Thi Nguyen @ 2005-09-12  7:59 UTC (permalink / raw)


don provan <dprovan@comcast.net> writes:

> It's hard to remember back that far, but I think I probably found that
> comparison helpful way back when. I'm not sure why you think it's such
> a stretch. Sure, there are significant differences, but none-the-less,
>
>     (setq f 'function)
>     (funcall f)
>
> is how you accomplish in emacs the same feat as in C with
>
>     f = function;   /* implicitly takes the address of function */
>     (*f)();
>
> I suppose not all C programmers are familiar with function pointers,
> yet you really aren't an emacs programmer until you're familiar with
> quoted function names, so it might be considered a stretch in the
> sense that some C programmers wouldn't have the concept to begin with.
> Is that what you meant?

adopting analogies is a great way to learn, but i have been burned by
adopting ones that i mistakenly took to be more insightful (general)
than they actually were.  in this case, i might have as a newbie
(conjecture because i have forgotten the precise steps of my learning
process, unfortunately), taken the above to also imply that:

  (setq f '(+ 1 2 3))
  (funcall f)

would also be "valid", which it is not.  it certainly is valid when the
quoted object is a function, i'm not arguing against that.  i'm just
pointing out how easily i confuse myself w/ a little imprecision.

thi

* Re: Suggestions? Better filetype sniffing -- XHTML vs. HTML
  2005-09-12  7:59               ` Thien-Thi Nguyen
@ 2005-09-15 16:25                 ` don provan
  0 siblings, 0 replies; 11+ messages in thread
From: don provan @ 2005-09-15 16:25 UTC (permalink / raw)


Thien-Thi Nguyen <ttn@glug.org> writes:

>   (setq f '(+ 1 2 3))
>   (funcall f)
>
> would also be "valid", which it is not.

I've kinda forgotten what we're talking about now, but I think the
question was something about using concepts about function addressing
learned from C as a way of explaining quoted function names when
learning lisp. So you would only think the above is valid if you
thought

    extern int func(int, int, int);
    int (*f)(int, int, int);
    f = func(1, 2, 3); /* or perhaps "&func(1, 2, 3)" */
    (*f)();

was valid in C, which it is not. Quoting an expression is really quite
a bit different than quoting a function name, so I don't really see
any reason to worry about the two being confused.

Granted, what is completely unexpected to a C programmer is that

    (setq f '(+ 1 2 3))
    (eval f)

is valid and does do just what it looks like it does.

-don
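
The three cases discussed in this exchange can be put side by side in a short sketch. `my-quoted-form-demo` is a hypothetical name; the `condition-case` is there only so the failing `funcall` reports its error symbol instead of aborting.

```elisp
;; Hypothetical demo function comparing eval and funcall on a quoted list.
(defun my-quoted-form-demo ()
  "Return results of `eval' and `funcall' applied to quoted data."
  (let ((f '(+ 1 2 3)))
    (list
     ;; A quoted list is data; `eval' treats it as an expression.
     (eval f)
     ;; `funcall' wants a function, so a plain list is rejected.
     (condition-case err
         (funcall f)
       (error (car err)))
     ;; Wrapping the expression in a lambda makes it callable.
     (funcall (lambda () (+ 1 2 3))))))

;; (my-quoted-form-demo) returns (6 invalid-function 6)
```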
