unofficial mirror of emacs-devel@gnu.org 
* auto-detecting encoding for XML
@ 2002-05-19  2:27 Colin Walters
  2002-05-19 23:13 ` Stefan Monnier
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Colin Walters @ 2002-05-19  2:27 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 1133 bytes --]

Hi,

Someone asked on the xml-resume-devel list about the problems they'd had
because Emacs decided that their XML file (encoded in utf-8) should be
encoded using `raw-text'.  

XML has an optional encoding="foo" parameter, used like:

<?xml version="1.0" encoding="UTF-8"?>

Wouldn't it be nice if Emacs could see that and automatically Do The
Right Thing?

The attached patch is an implementation.  I had to put
`sgml-xml-auto-coding-function' inside mule.el for two reasons: 1) we
can't autoload it from sgml-mode.el, since that would be more or less
equivalent to always loading sgml-mode.el, and 2) if we added it to
`auto-coding-functions' only when sgml-mode.el was loaded, then it
wouldn't work the first time a user visited an XML file, since the
encoding would already have been determined.  (Subsequently it would
work, though.)

2002-05-18  Colin Walters  <walters@gnu.org>

	* international/mule.el (make-coding-system): Doc fixes.

	* international/mule.el (auto-coding-functions): New variable.
	(auto-coding-from-file-contents): Use it.
	(set-auto-coding): Update docs.
	(sgml-xml-auto-coding-function): New function.



[-- Attachment #2: mule.patch --]
[-- Type: text/plain, Size: 6123 bytes --]

--- man/mule.texi.~1.60.~	Fri Apr  5 04:37:54 2002
+++ man/mule.texi	Sat May 18 21:35:48 2002
@@ -793,17 +793,19 @@
 
 @vindex auto-coding-alist
 @vindex auto-coding-regexp-alist
-  The variables @code{auto-coding-alist} and
-@code{auto-coding-regexp-alist} are the strongest way to specify the
-coding system for certain patterns of file names, or for files
-containing certain patterns; these variables even override
-@samp{-*-coding:-*-} tags in the file itself.  Emacs uses
-@code{auto-coding-alist} for tar and archive files, to prevent it
+@vindex auto-coding-functions
+  The variables @code{auto-coding-alist},
+@code{auto-coding-regexp-alist} and @code{auto-coding-functions} are
+the strongest way to specify the coding system for certain patterns of
+file names, or for files containing certain patterns; these variables
+even override @samp{-*-coding:-*-} tags in the file itself.  Emacs
+uses @code{auto-coding-alist} for tar and archive files, to prevent it
 from being confused by a @samp{-*-coding:-*-} tag in a member of the
 archive and thinking it applies to the archive file as a whole.
 Likewise, Emacs uses @code{auto-coding-regexp-alist} to ensure that
-RMAIL files, whose names in general don't match any particular pattern,
-are decoded correctly.
+RMAIL files, whose names in general don't match any particular
+pattern, are decoded correctly.  One of the builtin
+@code{auto-coding-functions} detects the encoding for XML files.
 
   If Emacs recognizes the encoding of a file incorrectly, you can
 reread the file using the correct coding system by typing @kbd{C-x
--- lisp/international/mule.el.~1.141.~	Tue Feb 26 11:27:47 2002
+++ lisp/international/mule.el	Sat May 18 22:16:00 2002
@@ -725,9 +725,9 @@
 
 TYPE is an integer value indicating the type of the coding system as follows:
   0: Emacs internal format,
-  1: Shift-JIS (or MS-Kanji) used mainly on Japanese PC,
+  1: Shift-JIS (or MS-Kanji) used mainly on Japanese PCs,
   2: ISO-2022 including many variants,
-  3: Big5 used mainly on Chinese PC,
+  3: Big5 used mainly on Chinese PCs,
   4: private, CCL programs provide encoding/decoding algorithm,
   5: Raw-text, which means that text contains random 8-bit codes.
 
@@ -822,7 +822,7 @@
  
   o mime-charset
  
-  The value is a symbol of which name is `MIME-charset' parameter of
+  The value is a symbol whose name is the `MIME-charset' parameter of
   the coding system.
  
   o valid-codes (meaningful only for a coding system based on CCL)
@@ -1488,6 +1488,22 @@
   :type '(repeat (cons (regexp :tag "Regexp")
 		       (symbol :tag "Coding system"))))
 
+;; See the bottom of this file for built-in auto coding functions.
+(defcustom auto-coding-functions '(sgml-xml-auto-coding-function)
+  "A list of functions which attempt to determine a coding system.
+
+Each function in this list should be written to operate on the current
+buffer, but should not modify it in any way.  It should take one
+argument SIZE, past which it should not search.  If a function
+succeeds in determining a coding system, it should return that coding
+system.  Otherwise, it should return nil.
+
+The functions in this list take priority over `coding:' tags in the
+file, just as for `auto-coding-regexp-alist'."
+  :group 'files
+  :group 'mule
+  :type '(repeat function))
+
 (defvar set-auto-coding-for-load nil
   "Non-nil means look for `load-coding' property instead of `coding'.
 This is used for loading and byte-compiling Emacs Lisp files.")
@@ -1503,21 +1519,25 @@
 	(setq alist (cdr alist))))
     coding-system))
 
-
 (defun auto-coding-from-file-contents (size)
   "Determine a coding system from the contents of the current buffer.
 The current buffer contains SIZE bytes starting at point.
 Value is either a coding system or nil."
   (save-excursion
     (let ((alist auto-coding-regexp-alist)
+	  (funcs auto-coding-functions)
 	  coding-system)
       (while (and alist (not coding-system))
 	(let ((regexp (car (car alist))))
 	  (when (re-search-forward regexp (+ (point) size) t)
 	    (setq coding-system (cdr (car alist)))))
 	(setq alist (cdr alist)))
+      (while (and funcs (not coding-system))
+	(setq coding-system (condition-case e
+				(save-excursion
+				  (funcall (pop funcs) size))
+			      (error nil))))
       coding-system)))
-		
 
 (defun set-auto-coding (filename size)
   "Return coding system for a file FILENAME of which SIZE bytes follow point.
@@ -1527,7 +1547,8 @@
 It checks FILENAME against the variable `auto-coding-alist'.  If
 FILENAME doesn't match any entries in the variable, it checks the
 contents of the current buffer following point against
-`auto-coding-regexp-alist'.  If no match is found, it checks for a
+`auto-coding-regexp-alist', and tries calling each function in
+`auto-coding-functions'.  If no match is found, it checks for a
 `coding:' tag in the first one or two lines following point.  If no
 `coding:' tag is found, it checks for local variables list in the last
 3K bytes out of the SIZE bytes.
@@ -1896,6 +1917,28 @@
 (put 'ignore-relative-composition 'char-table-extra-slots 0)
 (setq ignore-relative-composition
       (make-char-table 'ignore-relative-composition))
+
+
+;;; Built-in auto-coding-functions:
+
+(defun sgml-xml-auto-coding-function (size)
+  "Determine whether the buffer is XML, and if so, its encoding.
+This function is intended to be added to `auto-coding-functions'."
+  (when (re-search-forward "\\s-*<\\?xml" size t)
+    (let ((end (save-excursion
+		 ;; This is a hack.
+		 (search-forward "?>" size t))))
+      (when end
+	(if (re-search-forward "encoding=\"\\(.+?\\)\"" end t)
+	    (let ((match (downcase (match-string 1))))
+	      ;; FIXME: what other encodings are valid, and how can we
+	      ;; translate them to the names of coding systems?
+	      (cond ((string= match "utf-8")
+		     'utf-8)
+		    ((string-match "iso-8859-[[:digit:]]+" match)
+		     (intern match))
+		    (t nil)))
+	  'utf-8)))))
 
 ;;;
 (provide 'mule)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: auto-detecting encoding for XML
  2002-05-19  2:27 auto-detecting encoding for XML Colin Walters
@ 2002-05-19 23:13 ` Stefan Monnier
  2002-05-20  4:53   ` Eli Zaretskii
                     ` (2 more replies)
  2002-05-20  4:48 ` Eli Zaretskii
  2002-05-20 14:48 ` Richard Stallman
  2 siblings, 3 replies; 20+ messages in thread
From: Stefan Monnier @ 2002-05-19 23:13 UTC (permalink / raw)
  Cc: emacs-devel

> 	* international/mule.el (auto-coding-functions): New variable.

Why not extend auto-coding-regexp-alist so it can associate a regexp
to a function (rather than a coding-system) ?
Or why not do what po.el does (i.e. use file-coding-system-alist) ?
Admittedly, the file-coding-system-alist approach is pretty
hairy/heavy-weight.
In any case we should come up with some way to do those things conveniently,
because it applies to po-mode, to sgml-mode to tex-mode and probably
a lot more.  Note that these are always associated with a mode, so
it would be good if the implementation also was mode-specific so
that it automatically works if you open an xml file called
foo.myxmlextension (as long as "\\.myxmlextension\\'" is in the
auto-mode-alist).


	Stefan


* Re: auto-detecting encoding for XML
  2002-05-19  2:27 auto-detecting encoding for XML Colin Walters
  2002-05-19 23:13 ` Stefan Monnier
@ 2002-05-20  4:48 ` Eli Zaretskii
  2002-05-20  7:07   ` Colin Walters
  2002-05-20 14:48 ` Richard Stallman
  2 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2002-05-20  4:48 UTC (permalink / raw)
  Cc: emacs-devel


On 18 May 2002, Colin Walters wrote:

> +	(if (re-search-forward "encoding=\"\\(.+?\\)\"" end t)
> +	    (let ((match (downcase (match-string 1))))
> +	      ;; FIXME: what other encodings are valid, and how can we
> +	      ;; translate them to the names of coding systems?
> +	      (cond ((string= match "utf-8")
> +		     'utf-8)
> +		    ((string-match "iso-8859-[[:digit:]]+" match)
> +		     (intern match))
> +		    (t nil)))

Why didn't you use `intern' in all cases?  If you are bothered by the 
possibility that the resulting symbol is not a valid coding system, you 
can check that with coding-system-p.
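
A minimal sketch of that suggestion, assuming the declared name has
already been extracted as a string.  The function name
`my-xml-encoding-to-coding-system' is hypothetical and not part of the
patch; `intern' and `coding-system-p' are standard Emacs Lisp.

```elisp
(defun my-xml-encoding-to-coding-system (name)
  "Return the coding system named by the string NAME, or nil.
NAME is the value of an XML encoding declaration, e.g. \"UTF-8\".
This helper is hypothetical and not part of the patch."
  (let ((sym (intern (downcase name))))
    ;; `coding-system-p' also returns t for nil, so return SYM itself
    ;; rather than t: a NAME that interns to nil then still yields nil.
    (and (coding-system-p sym) sym)))
```

An encoding name that Emacs does not know under the same spelling
comes back as nil here, so the other detection mechanisms would still
get their turn.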

Btw, does this change honor "C-x RET c"?


* Re: auto-detecting encoding for XML
  2002-05-19 23:13 ` Stefan Monnier
@ 2002-05-20  4:53   ` Eli Zaretskii
  2002-05-20  7:04   ` Colin Walters
  2002-05-20 10:10   ` Kai Großjohann
  2 siblings, 0 replies; 20+ messages in thread
From: Eli Zaretskii @ 2002-05-20  4:53 UTC (permalink / raw)
  Cc: Colin Walters, emacs-devel


On Sun, 19 May 2002, Stefan Monnier wrote:

> In any case we should come up with some way to do those things conveniently,
> because it applies to po-mode, to sgml-mode to tex-mode and probably
> a lot more.

Yes, I agree.


* Re: auto-detecting encoding for XML
  2002-05-19 23:13 ` Stefan Monnier
  2002-05-20  4:53   ` Eli Zaretskii
@ 2002-05-20  7:04   ` Colin Walters
  2002-05-20 14:23     ` Stefan Monnier
  2002-05-20 10:10   ` Kai Großjohann
  2 siblings, 1 reply; 20+ messages in thread
From: Colin Walters @ 2002-05-20  7:04 UTC (permalink / raw)


On Sun, 2002-05-19 at 19:13, Stefan Monnier wrote:
> > 	* international/mule.el (auto-coding-functions): New variable.
> 
> Why not extend auto-coding-regexp-alist so it can associate a regexp
> to a function (rather than a coding-system) ?

Hm.  It seems cleaner to just have the function do the searching in the
first place, instead of in this case matching against a regexp, then
calling a function which will probably have to do the same searching...

> Or why not do what po.el does (i.e. use file-coding-system-alist) ?
> Admittedly, the file-coding-system-alist approach is pretty
> hairy/heavy-weight.

Well, it also has the disadvantage in this case that it depends on file
extensions; XML tends to be used as an encoding for other types of
files, which use their own extension.  So using file names as a way to
detect XML is probably a bad approach.

Just as a random sample on my system:

~/.gconf/* contains XML files, and their extension is .xml.
/etc/oglerc is an XML file, but doesn't have an extension at all.
~/local-cvs/resume/resume.fo is an XML file.
.nautilus-metafile.xml is XML.
/foreign-cvs/cvs.gnome.org/evolution/views/mail/Messages.galview is XML.
/foreign-cvs/cvs.gnome.org/evolution/views/mail/galview.xml is XML.

So only about 50% of the XML files have an "obvious" extension like
".xml".

> In any case we should come up with some way to do those things conveniently,
> because it applies to po-mode, to sgml-mode to tex-mode and probably
> a lot more.

auto-coding-functions should be able to handle those.

> Note that these are always associated with a mode, so
> it would be good if the implementation also was mode-specific so
> that it automatically works if you open an xml file called
> foo.myxmlextension (as long as "\\.myxmlextension\\'" is in the
> auto-mode-alist).

Yes.  It's very tricky though.  We can't possibly cover all the file
name extensions that would be used for XML.  I agree that it would be
great if we had a way to associate it with a mode.  The problem with
that though is that by the time the major mode function is called, the
file will have already been read from disk, and the only way to change
the coding system is to reread it from disk (as I understand things). 
And doing that in a major mode function is kind of a hack.  Maybe that's
the best solution, but auto-coding-functions certainly does the trick
here, and it seems to be extensible to handle the analogous po-mode and
tex-mode problems.


* Re: auto-detecting encoding for XML
  2002-05-20  4:48 ` Eli Zaretskii
@ 2002-05-20  7:07   ` Colin Walters
  0 siblings, 0 replies; 20+ messages in thread
From: Colin Walters @ 2002-05-20  7:07 UTC (permalink / raw)


On Mon, 2002-05-20 at 00:48, Eli Zaretskii wrote:

> Why didn't you use `intern' in all cases?  If you are bothered by the 
> possibility that the resulting symbol is not a valid coding system, you 
> can check that with coding-system-p.

Well, I guess I should have checked to be sure that it was a valid
coding system before returning it.  I will do that if the patch is
installed (along with figuring out what types of encodings can actually
appear in an XML encoding declaration).

> Btw, does this change honor "C-x RET c"?

Yes.


* Re: auto-detecting encoding for XML
  2002-05-19 23:13 ` Stefan Monnier
  2002-05-20  4:53   ` Eli Zaretskii
  2002-05-20  7:04   ` Colin Walters
@ 2002-05-20 10:10   ` Kai Großjohann
  2002-05-20 12:59     ` Eli Zaretskii
  2002-05-20 14:18     ` Stefan Monnier
  2 siblings, 2 replies; 20+ messages in thread
From: Kai Großjohann @ 2002-05-20 10:10 UTC (permalink / raw)
  Cc: Colin Walters, emacs-devel

"Stefan Monnier" <monnier+gnu/emacs@RUM.cs.yale.edu> writes:

> In any case we should come up with some way to do those things
> conveniently, because it applies to po-mode, to sgml-mode to
> tex-mode and probably a lot more.  Note that these are always
> associated with a mode, so it would be good if the implementation
> also was mode-specific so that it automatically works if you open an
> xml file called foo.myxmlextension (as long as
> "\\.myxmlextension\\'" is in the auto-mode-alist).

I think I want my XML files decoded correctly, even if I decide to
edit them in C mode, say.  (It is entirely reasonable to assume that
there might be specialized modes for special XML files.)

Opinions?

kai
-- 
Silence is foo!


* Re: auto-detecting encoding for XML
  2002-05-20 10:10   ` Kai Großjohann
@ 2002-05-20 12:59     ` Eli Zaretskii
  2002-05-20 14:31       ` Kai Großjohann
  2002-05-20 14:18     ` Stefan Monnier
  1 sibling, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2002-05-20 12:59 UTC (permalink / raw)
  Cc: emacs-devel


On Mon, 20 May 2002 Kai.Grossjohann@CS.Uni-Dortmund.DE wrote:

> I think I want my XML files decoded correctly, even if I decide to
> edit them in C mode, say.

You can always use "C-x RET c", or frob one of the auto-coding regexps 
locally, can't you?

I mean, editing XML files in C mode sounds a very strange requirement, 
one that should not happen frequently.  A PO file edited in C mode will 
also not decode correctly.


* Re: auto-detecting encoding for XML
  2002-05-20 10:10   ` Kai Großjohann
  2002-05-20 12:59     ` Eli Zaretskii
@ 2002-05-20 14:18     ` Stefan Monnier
  2002-05-20 14:29       ` Kai Großjohann
                         ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Stefan Monnier @ 2002-05-20 14:18 UTC (permalink / raw)
  Cc: Stefan Monnier, Colin Walters, emacs-devel

> > In any case we should come up with some way to do those things
> > conveniently, because it applies to po-mode, to sgml-mode to
> > tex-mode and probably a lot more.  Note that these are always
> > associated with a mode, so it would be good if the implementation
> > also was mode-specific so that it automatically works if you open an
> > xml file called foo.myxmlextension (as long as
> > "\\.myxmlextension\\'" is in the auto-mode-alist).
> 
> I think I want my XML files decoded correctly, even if I decide to
> edit them in C mode, say.  (It is entirely reasonable to assume that
> there might be specialized modes for special XML files.)

What about editing an elisp file containing an xml skeleton that
looks just like the tag matched by the auto-coding-regexp-alist ?


	Stefan


* Re: auto-detecting encoding for XML
  2002-05-20  7:04   ` Colin Walters
@ 2002-05-20 14:23     ` Stefan Monnier
  2002-05-20 22:32       ` Colin Walters
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Monnier @ 2002-05-20 14:23 UTC (permalink / raw)
  Cc: emacs-devel

> On Sun, 2002-05-19 at 19:13, Stefan Monnier wrote:
> > > 	* international/mule.el (auto-coding-functions): New variable.
> > 
> > Why not extend auto-coding-regexp-alist so it can associate a regexp
> > to a function (rather than a coding-system) ?
> 
> Hm.  It seems cleaner to just have the function do the searching in the
> first place, instead of in this case matching against a regexp, then
> calling a function which will probably have to do the same searching...

No, it could just use the match-data directly.
I was thinking of it the other way: the function will most likely need
to do a regexp search anyway, so why not include it with
the auto-coding-regexp-alist.  Also the "specify a coding-system
or a function returning a coding-system" thing is already used in
file-coding-system-alist so it seemed natural enough.
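
A rough sketch of that alternative, under the assumption that an alist
entry's cdr may be either a coding system or a function of no
arguments that reads the match data.  All `my-' names are hypothetical;
this is not how `auto-coding-regexp-alist' is defined in the patch.

```elisp
(defvar my-auto-coding-regexp-alist
  '(("<\\?xml[^>]*encoding=\"\\([^\"]+\\)\"" . my-coding-from-match-data))
  "Hypothetical alist of (REGEXP . CODING-SYSTEM-OR-FUNCTION).")

(defun my-coding-from-match-data ()
  "Return the coding system named by subgroup 1 of the last match, or nil."
  (let ((sym (intern (downcase (match-string 1)))))
    (and (coding-system-p sym) sym)))

(defun my-auto-coding-from-regexps (size)
  "Try each entry of `my-auto-coding-regexp-alist' on SIZE chars after point.
Return a coding system, or nil if no entry produces one."
  (save-excursion
    (let ((start (point))
          (limit (min (point-max) (+ (point) size)))
          (alist my-auto-coding-regexp-alist)
          coding)
      (while (and alist (not coding))
        (let ((regexp (car (car alist)))
              (target (cdr (car alist))))
          (goto-char start)
          (when (re-search-forward regexp limit t)
            ;; A function is called right after the match, so it can
            ;; read the match data without searching again.
            (setq coding (if (coding-system-p target)
                             target
                           (funcall target)))))
        (setq alist (cdr alist)))
      coding)))
```

The function sees only what the regexp captured, which is the
trade-off under discussion: less repeated searching, but the regexp
has to match before any custom logic runs.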

> > Or why not do what po.el does (i.e. use file-coding-system-alist) ?
> > Admittedly, the file-coding-system-alist approach is pretty
> > hairy/heavy-weight.
> 
> Well, it also has the disadvantage in this case that it depends on file
> extensions; XML tends to be used as an encoding for other types of
> files, which use their own extension.  So using file names as a way to
> detect XML is probably a bad approach.

That's why it would be better to link it to major-modes.

> > Note that these are always associated with a mode, so
> > it would be good if the implementation also was mode-specific so
> > that it automatically works if you open an xml file called
> > foo.myxmlextension (as long as "\\.myxmlextension\\'" is in the
> > auto-mode-alist).
> Yes.  It's very tricky though.

Yes, I know it's tricky.  But maybe we can come up with something clever.
In the meantime, I agree that extending auto-coding-regexp-alist is maybe
the best approach.


	Stefan


* Re: auto-detecting encoding for XML
  2002-05-20 14:18     ` Stefan Monnier
@ 2002-05-20 14:29       ` Kai Großjohann
  2002-05-20 14:32         ` Stefan Monnier
  2002-05-20 15:26       ` Kai Großjohann
  2002-05-20 22:09       ` Colin Walters
  2 siblings, 1 reply; 20+ messages in thread
From: Kai Großjohann @ 2002-05-20 14:29 UTC (permalink / raw)
  Cc: Colin Walters, emacs-devel

"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

> What about editing an elisp file containing an xml skeleton that
> looks just like the tag matched by the auto-coding-regexp-alist ?

I guess then you're screwed.  Consider the following text file:

<?xml version="1.0" encoding="utf-8"?><!-- -*- coding: emacs-mule; -*- --!>

It's easy to construct examples where things fail.

kai
-- 
Silence is foo!


* Re: auto-detecting encoding for XML
  2002-05-20 12:59     ` Eli Zaretskii
@ 2002-05-20 14:31       ` Kai Großjohann
  2002-05-20 14:32         ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Kai Großjohann @ 2002-05-20 14:31 UTC (permalink / raw)
  Cc: emacs-devel

Eli Zaretskii <eliz@is.elta.co.il> writes:

> I mean, editing XML files in C mode sounds a very strange requirement, 
> one that should not happen frequently.

Indeed.  I shouldn't have used such a strange example.  But I did say
that there might be special modes on top of XML which might wish to
use it.

Hm.  All things considered, maybe I should retract my suggestion.
You can always add those special modes to a list of `grok XML
processing instruction stuff' modes.

kai
-- 
Silence is foo!


* Re: auto-detecting encoding for XML
  2002-05-20 14:29       ` Kai Großjohann
@ 2002-05-20 14:32         ` Stefan Monnier
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Monnier @ 2002-05-20 14:32 UTC (permalink / raw)
  Cc: Stefan Monnier, Colin Walters, emacs-devel

> "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
> 
> > What about editing an elisp file containing an xml skeleton that
> > looks just like the tag matched by the auto-coding-regexp-alist ?
> 
> I guess then you're screwed.  Consider the following text file:
> 
> <?xml version="1.0" encoding="utf-8"?><!-- -*- coding: emacs-mule; -*- --!>
> 
> It's easy to construct examples where things fail.

Note that my example might be a very legitimate situation, whereas
I find yours to be fundamentally wrong.  It's OK if things "fail"
when the user contradicts herself.


	Stefan


* Re: auto-detecting encoding for XML
  2002-05-20 14:31       ` Kai Großjohann
@ 2002-05-20 14:32         ` Eli Zaretskii
  0 siblings, 0 replies; 20+ messages in thread
From: Eli Zaretskii @ 2002-05-20 14:32 UTC (permalink / raw)
  Cc: emacs-devel


On Mon, 20 May 2002 Kai.Grossjohann@CS.Uni-Dortmund.DE wrote:

> > I mean, editing XML files in C mode sounds a very strange requirement, 
> > one that should not happen frequently.
> 
> Indeed.  I shouldn't have used such a strange example.  But I did say
> that there might be special modes on top of XML which might wish to
> use it.
> 
> Hm.  All things considered, maybe I should retract my suggestion.
> You can always add those special modes to a list of `grok XML
> processing instruction stuff' modes.

If this is something special to XML files only (is it?), then perhaps XML 
mode should be able to edit C/Lisp/Ada/Pascal/whatever files embedded in 
XML.  Then you get both of the worlds for free (well, almost ;-).


* Re: auto-detecting encoding for XML
  2002-05-19  2:27 auto-detecting encoding for XML Colin Walters
  2002-05-19 23:13 ` Stefan Monnier
  2002-05-20  4:48 ` Eli Zaretskii
@ 2002-05-20 14:48 ` Richard Stallman
  2 siblings, 0 replies; 20+ messages in thread
From: Richard Stallman @ 2002-05-20 14:48 UTC (permalink / raw)
  Cc: emacs-devel

The change looks good to me.


* Re: auto-detecting encoding for XML
  2002-05-20 14:18     ` Stefan Monnier
  2002-05-20 14:29       ` Kai Großjohann
@ 2002-05-20 15:26       ` Kai Großjohann
  2002-05-20 22:18         ` Colin Walters
  2002-05-20 22:09       ` Colin Walters
  2 siblings, 1 reply; 20+ messages in thread
From: Kai Großjohann @ 2002-05-20 15:26 UTC (permalink / raw)
  Cc: Colin Walters, emacs-devel

"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

> What about editing an elisp file containing an xml skeleton that
> looks just like the tag matched by the auto-coding-regexp-alist ?

The <?xml ...?> stuff must be at the beginning of an XML file, I
think.  I don't think elisp files are likely to begin with the
characters "<?".

kai
-- 
Silence is foo!


* Re: auto-detecting encoding for XML
  2002-05-20 14:18     ` Stefan Monnier
  2002-05-20 14:29       ` Kai Großjohann
  2002-05-20 15:26       ` Kai Großjohann
@ 2002-05-20 22:09       ` Colin Walters
  2 siblings, 0 replies; 20+ messages in thread
From: Colin Walters @ 2002-05-20 22:09 UTC (permalink / raw)


[ CC's trimmed ]

On Mon, 2002-05-20 at 10:18, Stefan Monnier wrote:

> What about editing an elisp file containing an xml skeleton that
> looks just like the tag matched by the auto-coding-regexp-alist ?

That is a good reason for allowing coding: tags to be able to override
`auto-coding-functions'.  I will change the implementation to allow
that.
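
A sketch of what that precedence could look like, assuming the tag is
looked for only on the line at point (a simplification).  Both
function names are hypothetical, and FUNCTIONS stands in for the value
of `auto-coding-functions'.

```elisp
(defun my-coding-tag-on-first-line ()
  "Return the coding system from a coding: tag on the line at point, or nil.
The tag has the form -*- coding: NAME -*-."
  (save-excursion
    (when (re-search-forward
           "-\\*-.*coding:[ \t]*\\([^ \t;]+\\).*-\\*-"
           (line-end-position) t)
      (let ((sym (intern (match-string 1))))
        (and (coding-system-p sym) sym)))))

(defun my-detect-coding (size functions)
  "Return a coding system for SIZE chars after point, or nil.
An explicit coding: tag wins; otherwise call each of FUNCTIONS with
SIZE until one returns non-nil.  An error in a function counts as nil."
  (or (my-coding-tag-on-first-line)
      (let (coding)
        (while (and functions (not coding))
          (setq coding (condition-case nil
                           (save-excursion
                             (funcall (pop functions) size))
                         (error nil))))
        coding)))
```

With this ordering, an elisp file that carries its own coding: tag
keeps that coding system even if a detection function would have
recognized an XML header inside it.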


* Re: auto-detecting encoding for XML
  2002-05-20 15:26       ` Kai Großjohann
@ 2002-05-20 22:18         ` Colin Walters
  0 siblings, 0 replies; 20+ messages in thread
From: Colin Walters @ 2002-05-20 22:18 UTC (permalink / raw)


On Mon, 2002-05-20 at 11:26, Kai Großjohann wrote:

> The <?xml ...?> stuff must be at the beginning of an XML file, I
> think.  I don't think elisp files are likely to begin with the
> characters "<?".

Ah, must the XML declaration be at the very beginning of the file?  I
thought that it might be legal for it to be preceded by SGML comments
or other processing instructions, etc.

[ looks at XML standard ]

Hm, it seems to say that the "XMLdecl" section must come first.  That
makes things a lot easier.  Then I'll change
`sgml-xml-auto-coding-function' to only look at the first line, and we
won't have to worry about an elisp file which contains the XML header
somewhere embedded in it.
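
A minimal sketch of that stricter check, anchoring the match at point
with `looking-at' instead of searching forward.  The function name is
hypothetical, and SIZE is assumed to count characters from point, as
the `auto-coding-from-file-contents' docstring describes.

```elisp
(defun my-xml-declared-coding-system (size)
  "Return the coding system declared by an XML declaration at point, or nil.
The declaration must start exactly at point, and its closing \"?>\"
must lie within SIZE characters after point."
  (save-excursion
    (let ((limit (min (point-max) (+ (point) size))))
      ;; Anchored test: text before the declaration means no match.
      (when (looking-at "<\\?xml[ \t\r\n]")
        (let ((end (save-excursion (search-forward "?>" limit t))))
          (when (and end
                     (re-search-forward
                      "encoding[ \t\r\n]*=[ \t\r\n]*[\"']\\([^\"']+\\)[\"']"
                      end t))
            (let ((sym (intern (downcase (match-string 1)))))
              ;; Only return names Emacs knows as coding systems.
              (and (coding-system-p sym) sym))))))))
```

Unlike the patch, this sketch returns nil rather than `utf-8' when the
declaration has no encoding parameter; whether to keep that default is
a separate decision.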


* Re: auto-detecting encoding for XML
  2002-05-20 14:23     ` Stefan Monnier
@ 2002-05-20 22:32       ` Colin Walters
  2002-05-21 19:43         ` Stefan Monnier
  0 siblings, 1 reply; 20+ messages in thread
From: Colin Walters @ 2002-05-20 22:32 UTC (permalink / raw)


On Mon, 2002-05-20 at 10:23, Stefan Monnier wrote:

> No, it could just use the match-data directly.

Ugh.  Relying on `match-data' doesn't appeal to me at all.

> I was thinking of it the other way: the function will most likely need
> to do a regexp search anyway, so why not include it with
> the auto-coding-regexp-alist.

The main point of allowing arbitrary elisp functions is that you're
*not* limited to just doing a regexp search.  With your method, if
someone wanted to write a function which did some sort of minimal "real"
parsing, then they would have to add a null regexp or something to
`auto-coding-regexp-alist' just so their function would be called.

> Yes, I know it's tricky.  But maybe we can come up with something clever.
> In the meantime, I agree that extending auto-coding-regexp-alist is maybe
> the best approach.

Errr...I never said that extending `auto-coding-regexp-alist' was the
best solution; I think it's not as clean as having a separate
`auto-coding-functions'.

The best solution is something that links the coding detection functions
with the major modes, but that will be very difficult to implement
cleanly, while the `auto-coding-functions' solves at least one case in a
clean way.

But this isn't an important enough issue to spend time debating; it is
mostly an aesthetic issue.  If you are really adamant that extending
`auto-coding-regexp-alist' is better than `auto-coding-functions', go
ahead and install that instead of my patch.


* Re: auto-detecting encoding for XML
  2002-05-20 22:32       ` Colin Walters
@ 2002-05-21 19:43         ` Stefan Monnier
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Monnier @ 2002-05-21 19:43 UTC (permalink / raw)
  Cc: emacs-devel

> > No, it could just use the match-data directly.
> Ugh.  Relying on `match-data' doesn't appeal to me at all.

Why?  font-lock-keywords uses that all the time: the function is called
only if the regexp matches and it clearly is called right after matching
it, so you can use the match-data directly without having to re-match.

> > I was thinking of it the other way: the function will most likely need
> > to do a regexp search anyway, so why not include it with
> > the auto-coding-regexp-alist.
> 
> The main point of allowing arbitrary elisp functions is that you're
> *not* limited to just doing a regexp search.  With your method, if
> someone wanted to write a function which did some sort of minimal "real"
> parsing, then they would have to add a null regexp or something to
> `auto-coding-regexp-alist' just so their function would be called.

That's a nice philosophical argument.
I don't think you (or I) can win this argument based on technicalities.
The difference is really just a matter of aesthetics.
All the actual cases I know of use a regexp-search to start with
(these are: sgml-mode, po-mode, latex-mode, babyl).

> > Yes, I know it's tricky.  But maybe we can come up with something clever.
> > In the meantime, I agree that extending auto-coding-regexp-alist is maybe
> > the best approach.
> 
> Errr...I never said that extending `auto-coding-regexp-alist' was the
> best solution; I think it's not as clean as having a separate
> `auto-coding-functions'.

I just find extending auto-coding-regexp-alist more Emacs-like,
with no loss of generality.  If nobody else on this list cares
either way, your way is obviously the best way (cause it has
a patch, while mine doesn't and I don't care enough to write it).


	Stefan
