unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* [PATCH] xml-escape-region
@ 2009-10-07 18:56 Daniel Colascione
  2009-10-07 22:10 ` Stefan Monnier
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Colascione @ 2009-10-07 18:56 UTC (permalink / raw)
  To: emacs-devel

Index: xml.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/xml.el,v
retrieving revision 1.64
diff -u -r1.64 xml.el
--- xml.el	5 Jan 2009 03:19:57 -0000	1.64
+++ xml.el	7 Oct 2009 18:56:08 -0000
@@ -852,6 +852,13 @@
              ;; grabbing the string works here.
              string ""))
 
+;;;##autoload
+(defun xml-escape-region (beg end)
+  (interactive "*r")
+  (let ((escaped (xml-escape-string (buffer-substring beg end))))
+    (delete-region beg end)
+    (insert escaped)))
+
 (defun xml-debug-print-internal (xml indent-string)
   "Outputs the XML tree in the current buffer.
 The first line is indented with INDENT-STRING."





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xml-escape-region
  2009-10-07 18:56 [PATCH] xml-escape-region Daniel Colascione
@ 2009-10-07 22:10 ` Stefan Monnier
  2009-10-08  2:13   ` Daniel Colascione
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Monnier @ 2009-10-07 22:10 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: emacs-devel

> +;;;##autoload
> +(defun xml-escape-region (beg end)
> +  (interactive "*r")
> +  (let ((escaped (xml-escape-string (buffer-substring beg end))))
> +    (delete-region beg end)
> +    (insert escaped)))

I'd rather not autoload such a function.  But more importantly, this
implementation is very inefficient.  xml-escape-string itself is rather
inefficient except for short strings; this is OK for its current uses,
but for xml-escape-region it's definitely not good (i.e. only usable for
small regions).


        Stefan




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xml-escape-region
  2009-10-07 22:10 ` Stefan Monnier
@ 2009-10-08  2:13   ` Daniel Colascione
  2009-10-08  5:29     ` Stefan Monnier
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Colascione @ 2009-10-08  2:13 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Emacs-Devel devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On Oct 7, 2009, at 6:10 PM, Stefan Monnier wrote:

>> +;;;##autoload
>> +(defun xml-escape-region (beg end)
>> +  (interactive "*r")
>> +  (let ((escaped (xml-escape-string (buffer-substring beg end))))
>> +    (delete-region beg end)
>> +    (insert escaped)))
>
> I'd rather not autoload such a function.

Do you mean that it should be loaded all the time, or that the user  
should have to explicitly load xml.el before using the function? If  
the latter, then that would make binding it to a key less convenient.

>  But more importantly, this
> implementation is very inefficient.  xml-escape-string itself is  
> rather
> inefficient except for short strings; this is OK for its current uses,
> but for xml-escape-region it's definitely not good (i.e. only usable  
> for
> small regions).

How's this? It's O(N) in the amount of text escaped.

(defun xml-escape-region (beg end)
   "XML-escape text between BEG and END according to `xml-entity- 
alist`."
   (interactive "*r")

   (let ((search-re (mapconcat #'regexp-quote
                               (mapcar #'cdr xml-entity-alist)
                               "\\|"))
         (end (set-marker (make-marker) end)))

     (save-excursion
       (goto-char beg)
       (while (re-search-forward search-re end t)
         (replace-match (concat "&"
                                (car (rassoc (match-string 0)
                                             xml-entity-alist))
                                ";"))))))

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iEYEARECAAYFAkrNSscACgkQ17c2LVA10VvMXgCfcp9fGLBvgg2x/WvemODxe77H
o4kAoM/oz33aci08TnzboRm4GTY6zskD
=DnOn
-----END PGP SIGNATURE-----





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xml-escape-region
  2009-10-08  2:13   ` Daniel Colascione
@ 2009-10-08  5:29     ` Stefan Monnier
  2009-10-08  6:01       ` Daniel Colascione
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Monnier @ 2009-10-08  5:29 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: Emacs-Devel devel

>>> +;;;##autoload
>>> +(defun xml-escape-region (beg end)
>>> +  (interactive "*r")
>>> +  (let ((escaped (xml-escape-string (buffer-substring beg end))))
>>> +    (delete-region beg end)
>>> +    (insert escaped)))
>> 
>> I'd rather not autoload such a function.

> Do you mean that it should be loaded all the time, or that the user should
> have to explicitly load xml.el before using the function?

Yes.

> If the latter, then that would make binding it to a key
> less convenient.

Hmm... didn't notice you defined it as a command.  How often/when do you
need to use/bind such a command other than in an sgml/xml-related file
(where the major mode might decide to preload such a command)?

>> But more importantly, this implementation is very inefficient.
>> xml-escape-string itself is rather inefficient except for short
>> strings; this is OK for its current uses, but for xml-escape-region
>> it's definitely not good (i.e. only usable for small regions).

> How's this? It's O(N) in the amount of text escaped.

Much better, thank you.

>   (let ((search-re (mapconcat #'regexp-quote
>                               (mapcar #'cdr xml-entity-alist)
>                               "\\|"))

Rather than a big \| of single chars, why not make a [...] regexp?
If you use regexp-opt, it should happen automatically.

Actually, now that I look at it, xml-entity-alist is poorly defined.
Instead of being a list of pairs of string and string (where the second
string is always of size 1), it should be a list of pairs of string
and char.  Also this code is also applicable to sgml and there's related
code in sgml-mode.el.  If someone wants to consolidate, that would
be welcome.

>     (save-excursion
>       (goto-char beg)
>       (while (re-search-forward search-re end t)
>         (replace-match (concat "&"
>                                (car (rassoc (match-string 0)
>                                             xml-entity-alist))
>                                ";"))))))

If you use a backward-search, you don't need to turn `end' (nor `start')
into a marker.


        Stefan




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xml-escape-region
  2009-10-08  5:29     ` Stefan Monnier
@ 2009-10-08  6:01       ` Daniel Colascione
  2009-10-08  6:03         ` Daniel Colascione
  2009-10-09 19:11         ` Stefan Monnier
  0 siblings, 2 replies; 7+ messages in thread
From: Daniel Colascione @ 2009-10-08  6:01 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Emacs-Devel devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Oct 8, 2009, at 1:29 AM, Stefan Monnier wrote:

>>>> +;;;##autoload
>>>> +(defun xml-escape-region (beg end)
>>>> +  (interactive "*r")
>>>> +  (let ((escaped (xml-escape-string (buffer-substring beg end))))
>>>> +    (delete-region beg end)
>>>> +    (insert escaped)))
>>>
>>> I'd rather not autoload such a function.
>
>> Do you mean that it should be loaded all the time, or that the user  
>> should
>> have to explicitly load xml.el before using the function?
>
> Yes.
>
>> If the latter, then that would make binding it to a key
>> less convenient.
>
> Hmm... didn't notice you defined it as a command.  How often/when do  
> you
> need to use/bind such a command other than in an sgml/xml-related file
> (where the major mode might decide to preload such a command)?

Pretty often, actually. XML (or XML-like syntax) crops up in a lot of  
places, including
literal strings in many programming languages. Some basic XML-editing  
functionality being available everywhere would be useful.
>
>
>>  (let ((search-re (mapconcat #'regexp-quote
>>                              (mapcar #'cdr xml-entity-alist)
>>                              "\\|"))
>
> Rather than a big \| of single chars, why not make a [...] regexp?
> If you use regexp-opt, it should happen automatically.

I figured the constant-factor overhead of regexp-opt (and its  
autoloading) wasn't worth it for such a simple regexp.


> Actually, now that I look at it, xml-entity-alist is poorly defined.
> Instead of being a list of pairs of string and string (where the  
> second
> string is always of size 1), it should be a list of pairs of string
> and char.

I think the idea was to be able to replace multi-character strings  
with XML entities defined for the current document.

> Also this code is also applicable to sgml and there's related
> code in sgml-mode.el.  If someone wants to consolidate, that would
> be welcome.

Does anyone actually use the unquotep parameter? It seems like quoting  
and unquoting should be separate functions. Nevertheless, the patch  
below should preserve existing behavior. I've also renamed the XML  
functions to better match existing code, e.g., base64.

>
>>    (save-excursion
>>      (goto-char beg)
>>      (while (re-search-forward search-re end t)
>>        (replace-match (concat "&"
>>                               (car (rassoc (match-string 0)
>>                                            xml-entity-alist))
>>                               ";"))))))
>
> If you use a backward-search, you don't need to turn `end' (nor  
> `start')
> into a marker.

Good idea.

Index: xml.el
===================================================================
RCS file: /sources/emacs/emacs/lisp/xml.el,v
retrieving revision 1.64
diff -u -r1.64 xml.el
- --- xml.el	5 Jan 2009 03:19:57 -0000	1.64
+++ xml.el	8 Oct 2009 05:58:20 -0000
@@ -840,6 +840,40 @@

  (defalias 'xml-print 'xml-debug-print)

+
+;;;###autoload
+(defun xml-encode-region (start end)
+  "XML-escape text between START and END according to `xml-entity- 
alist`."
+  (interactive "*r")
+
+  (let ((search-re (mapconcat #'regexp-quote
+                              (mapcar #'cdr xml-entity-alist)
+                              "\\|")))
+    (save-excursion
+      (goto-char end)
+      (while (re-search-backward search-re start t)
+        (replace-match (concat "&"
+                               (car (rassoc (match-string 0)
+                                            xml-entity-alist))
+                               ";"))
+        (goto-char (match-beginning 0))))))
+
+;;;###autoload
+(defun xml-decode-region (start end)
+  "Decode XML entities between START and END according to `xml-entity- 
alist`."
+  (interactive "*r")
+  (let ((search-re (concat "&\\("
+                           (mapconcat #'regexp-quote
+                                      (mapcar #'car xml-entity-alist)
+                                      "\\|")
+                           "\\);")))
+
+    (save-excursion
+      (goto-char end)
+      (while (re-search-backward search-re start t)
+        (replace-match (cdr (assoc (match-string 1) xml-entity-alist)))
+        (goto-char (match-beginning 0))))))
+
  (defun xml-escape-string (string)
    "Return the string with entity substitutions made from
  xml-entity-alist."
Index: textmodes/sgml-mode.el
===================================================================
RCS file: /sources/emacs/emacs/lisp/textmodes/sgml-mode.el,v
retrieving revision 1.141
diff -u -r1.141 sgml-mode.el
- --- textmodes/sgml-mode.el	24 Sep 2009 23:22:20 -0000	1.141
+++ textmodes/sgml-mode.el	8 Oct 2009 05:58:21 -0000
@@ -1097,21 +1097,10 @@
  Only &, < and > are quoted, the rest is left untouched.
  With prefix argument UNQUOTEP, unquote the region."
    (interactive "r\nP")
- -  (save-restriction
- -    (narrow-to-region start end)
- -    (goto-char (point-min))
- -    (if unquotep
- -	;; FIXME: We should unquote other named character references as well.
- -	(while (re-search-forward
- -		"\\(&\\(amp\\|\\(l\\|\\(g\\)\\)t\\)\\)[][<>&;\n\t \"%!'(),/=?]"
- -		nil t)
- -	  (replace-match (if (match-end 4) ">" (if (match-end 3) "<" "&")) t t
- -			 nil (if (eq (char-before (match-end 0)) ?\;) 0 1)))
- -      (while (re-search-forward "[&<>]" nil t)
- -	(replace-match (cdr (assq (char-before) '((?& . "&amp;")
- -						  (?< . "&lt;")
- -						  (?> . "&gt;"))))
- -		       t t)))))
+  (if unquote
+      (xml-decode-region start end)
+    (xml-encode-region start end)))
+

  (defun sgml-pretty-print (beg end)
    "Simple-minded pretty printer for SGML.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iEYEARECAAYFAkrNgCEACgkQ17c2LVA10VuCmwCgpTFUg4oshpxAW+MZI1jDunWv
K4cAn2HqioVa34YnU63cMneytXV10Bby
=DYnh
-----END PGP SIGNATURE-----





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xml-escape-region
  2009-10-08  6:01       ` Daniel Colascione
@ 2009-10-08  6:03         ` Daniel Colascione
  2009-10-09 19:11         ` Stefan Monnier
  1 sibling, 0 replies; 7+ messages in thread
From: Daniel Colascione @ 2009-10-08  6:03 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: Stefan Monnier, Emacs-Devel devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Oct 8, 2009, at 2:01 AM, Daniel Colascione wrote:
>
> - -    (if unquotep
> - -	;; FIXME: We should unquote other named character references as  
> well.
> - -	(while (re-search-forward

My mail client really needs to stop doing that.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iEYEARECAAYFAkrNgLsACgkQ17c2LVA10Vs1bACgiD2RB5negFUpyfa13iUUbPG+
0uAAoMd8XjS1Q3UDzbpSsFlRvteB5tKa
=6Mnb
-----END PGP SIGNATURE-----





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xml-escape-region
  2009-10-08  6:01       ` Daniel Colascione
  2009-10-08  6:03         ` Daniel Colascione
@ 2009-10-09 19:11         ` Stefan Monnier
  1 sibling, 0 replies; 7+ messages in thread
From: Stefan Monnier @ 2009-10-09 19:11 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: Emacs-Devel devel

> Pretty often, actually. XML (or XML-like syntax) crops up in a lot of
> places, including
> literal strings in many programming languages. Some basic XML-editing
> functionality being available everywhere would be useful.

OK, then.

>>> (let ((search-re (mapconcat #'regexp-quote
>>> (mapcar #'cdr xml-entity-alist)
>>> "\\|"))
>> Rather than a big \| of single chars, why not make a [...] regexp?
>> If you use regexp-opt, it should happen automatically.
> I figured the constant-factor overhead of regexp-opt (and its autoloading)
> wasn't worth it for such a simple regexp.

Not sure if it makes a difference, indeed.

>> Actually, now that I look at it, xml-entity-alist is poorly defined.
>> Instead of being a list of pairs of string and string (where the second
>> string is always of size 1), it should be a list of pairs of string
>> and char.
> I think the idea was to be able to replace multi-character strings with XML
> entities defined for the current document.

All the examples I saw were single-char strings.  If indeed multi-char
strings can be used there, then the current definition is fine, indeed,
but then xml-escape-string is not.

> Does anyone actually use the unquotep parameter? It seems like quoting and
> unquoting should be separate functions.

Separate functions yes, but I think it makes sense to provide both under
the same key-binding, hence a single command.  This said, I never use
either, so I don't have a strong opinion.

> Nevertheless, the patch below should preserve existing behavior. I've
> also renamed the XML functions to better match existing code,
> e.g., base64.

Looks good,


        Stefan




^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-10-09 19:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-07 18:56 [PATCH] xml-escape-region Daniel Colascione
2009-10-07 22:10 ` Stefan Monnier
2009-10-08  2:13   ` Daniel Colascione
2009-10-08  5:29     ` Stefan Monnier
2009-10-08  6:01       ` Daniel Colascione
2009-10-08  6:03         ` Daniel Colascione
2009-10-09 19:11         ` Stefan Monnier

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).