unofficial mirror of emacs-devel@gnu.org 
From: Daniel Colascione <danc@merrillpress.com>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: Emacs-Devel devel <emacs-devel@gnu.org>
Subject: Re: [PATCH] xml-escape-region
Date: Thu, 8 Oct 2009 02:01:05 -0400
Message-ID: <76BA010B-EFE0-48CA-BD43-B3CB63CDDAFF@merrillpress.com>
In-Reply-To: <jwv1vlea2eq.fsf-monnier+emacs@gnu.org>

On Oct 8, 2009, at 1:29 AM, Stefan Monnier wrote:

>>>> +;;;###autoload
>>>> +(defun xml-escape-region (beg end)
>>>> +  (interactive "*r")
>>>> +  (let ((escaped (xml-escape-string (buffer-substring beg end))))
>>>> +    (delete-region beg end)
>>>> +    (insert escaped)))
>>>
>>> I'd rather not autoload such a function.
>
>> Do you mean that it should be loaded all the time, or that the user
>> should have to explicitly load xml.el before using the function?
>
> Yes.
>
>> If the latter, then that would make binding it to a key
>> less convenient.
>
> Hmm... didn't notice you defined it as a command.  How often/when do
> you need to use/bind such a command other than in an sgml/xml-related
> file (where the major mode might decide to preload such a command)?

Pretty often, actually. XML (or XML-like syntax) crops up in a lot of
places, including literal strings in many programming languages.
Having some basic XML-editing functionality available everywhere would
be useful.
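For the key-binding case, a user can get a binding without xml.el
being autoloaded by declaring the autoloads in an init file. This is a
minimal sketch that assumes the xml-encode-region and
xml-decode-region commands from the patch below exist in xml.el; the
key sequences are arbitrary choices, not anything Emacs defines.

```elisp
;; Sketch, assuming the patched xml.el defines these two commands.
;; `autoload' does nothing if the function is already defined, so this
;; is harmless once xml.el has been loaded.
(autoload 'xml-encode-region "xml" "XML-escape the region." t)
(autoload 'xml-decode-region "xml" "Decode XML entities in the region." t)

;; Hypothetical key choices: C-c followed by a letter is reserved for
;; users, and the intermediate C-c x prefix map is created as needed.
(global-set-key (kbd "C-c x e") #'xml-encode-region)
(global-set-key (kbd "C-c x d") #'xml-decode-region)
```

The first key press loads xml.el and then runs the command, so nothing
has to be preloaded outside sgml/xml buffers.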
>
>
>>  (let ((search-re (mapconcat #'regexp-quote
>>                              (mapcar #'cdr xml-entity-alist)
>>                              "\\|"))
>
> Rather than a big \| of single chars, why not make a [...] regexp?
> If you use regexp-opt, it should happen automatically.

I figured the constant-factor overhead of regexp-opt (and its  
autoloading) wasn't worth it for such a simple regexp.
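If one did want the [...] form without pulling in regexp-opt, the
bracket expression can be built by hand from the single-character
values. A sketch under stated assumptions: my-xml-entity-alist is a
hypothetical copy of the alist as discussed in this thread (names
paired with one-character strings), and no value may be ], ^ or -,
which would need special placement inside [...].

```elisp
;; Hypothetical copy of the alist discussed in this thread:
;; entity names paired with one-character strings.
(defconst my-xml-entity-alist
  '(("lt" . "<") ("gt" . ">") ("apos" . "'") ("quot" . "\"") ("amp" . "&"))
  "Entity names paired with the strings they stand for.")

(defun my-xml-entity-char-class (alist)
  "Return a bracket expression matching any single-character value in ALIST.
Assumes every value is one character and none is ], ^ or -, since
those would need special placement inside [...]."
  (concat "[" (apply #'concat (mapcar #'cdr alist)) "]"))
```

The result matches exactly the same single characters as the \\|
alternation passed to re-search-backward; what this shortcut gives up
is regexp-opt's handling of those special characters.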


> Actually, now that I look at it, xml-entity-alist is poorly defined.
> Instead of being a list of pairs of string and string (where the second
> string is always of size 1), it should be a list of pairs of string
> and char.

I think the idea was to be able to replace multi-character strings  
with XML entities defined for the current document.
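A sketch of the representation Stefan suggests, with names paired with
characters rather than one-character strings. my-xml-entity-char-alist
and my-xml-encode-string are hypothetical names, not part of xml.el;
note that this shape cannot express the multi-character replacements
mentioned just above.

```elisp
;; Hypothetical alist in the suggested shape: (NAME . CHAR).
(defconst my-xml-entity-char-alist
  '(("lt" . ?<) ("gt" . ?>) ("apos" . ?\') ("quot" . ?\") ("amp" . ?&))
  "Entity names paired with the characters they stand for.")

(defun my-xml-encode-string (string)
  "Return a copy of STRING with entity substitutions made.
Each character listed in `my-xml-entity-char-alist' is replaced by
the corresponding &NAME; reference; other characters are kept."
  (mapconcat (lambda (ch)
               ;; `rassq' compares with `eq', which is reliable for characters.
               (let ((entry (rassq ch my-xml-entity-char-alist)))
                 (if entry
                     (concat "&" (car entry) ";")
                   (string ch))))
             string ""))
```

Because the values are characters, the reverse lookup is a plain rassq
on each character and no regexp is needed to find candidates.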

> Also this code is also applicable to sgml and there's related
> code in sgml-mode.el.  If someone wants to consolidate, that would
> be welcome.

Does anyone actually use the unquotep parameter? It seems like quoting
and unquoting should be separate functions. Nevertheless, the patch
below should preserve existing behavior. I've also renamed the XML
functions to match the naming of existing code, e.g.,
base64-encode-region and base64-decode-region.

>
>>    (save-excursion
>>      (goto-char beg)
>>      (while (re-search-forward search-re end t)
>>        (replace-match (concat "&"
>>                               (car (rassoc (match-string 0)
>>                                            xml-entity-alist))
>>                               ";"))))))
>
> If you use a backward-search, you don't need to turn `end' (nor `start')
> into a marker.

Good idea.
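For contrast, this is roughly what a forward-search version needs: END
has to be copied into a marker, because each replacement lengthens the
text in front of it. my-xml-encode-region-forward is a hypothetical
name and takes the alist as an argument to stay self-contained.
Searching backward from END, as the patch below does, means every edit
happens after the part still to be searched, so neither START nor END
moves.

```elisp
;; Hypothetical forward-scanning variant, shown only to illustrate why
;; END must become a marker when searching forward.
(defun my-xml-encode-region-forward (start end alist)
  "Replace each value of ALIST between START and END with its entity.
ALIST maps entity names to the strings they stand for."
  (let ((search-re (mapconcat #'regexp-quote (mapcar #'cdr alist) "\\|"))
        ;; A marker moves with the text, so it still marks the end of
        ;; the region after earlier replacements have lengthened it.
        (end-marker (copy-marker end)))
    (unwind-protect
        (save-excursion
          (goto-char start)
          (while (re-search-forward search-re end-marker t)
            ;; FIXEDCASE and LITERAL are t: insert the entity as-is.
            (replace-match (concat "&" (car (rassoc (match-string 0) alist)) ";")
                           t t)))
      ;; Detach the marker so it no longer has to be updated on edits.
      (set-marker end-marker nil))))
```

With a plain integer END, the first replacement in "<x<" with END at 3
would leave the bound behind point and the next search would signal an
error; the marker instead follows the text to the new end of the region.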

Index: xml.el
===================================================================
RCS file: /sources/emacs/emacs/lisp/xml.el,v
retrieving revision 1.64
diff -u -r1.64 xml.el
--- xml.el	5 Jan 2009 03:19:57 -0000	1.64
+++ xml.el	8 Oct 2009 05:58:20 -0000
@@ -840,6 +840,40 @@

  (defalias 'xml-print 'xml-debug-print)

+
+;;;###autoload
+(defun xml-encode-region (start end)
+  "XML-escape text between START and END according to `xml-entity-alist'."
+  (interactive "*r")
+
+  (let ((search-re (mapconcat #'regexp-quote
+                              (mapcar #'cdr xml-entity-alist)
+                              "\\|")))
+    (save-excursion
+      (goto-char end)
+      (while (re-search-backward search-re start t)
+        (replace-match (concat "&"
+                               (car (rassoc (match-string 0)
+                                            xml-entity-alist))
+                               ";"))
+        (goto-char (match-beginning 0))))))
+
+;;;###autoload
+(defun xml-decode-region (start end)
+  "Decode XML entities between START and END according to `xml-entity-alist'."
+  (interactive "*r")
+  (let ((search-re (concat "&\\("
+                           (mapconcat #'regexp-quote
+                                      (mapcar #'car xml-entity-alist)
+                                      "\\|")
+                           "\\);")))
+
+    (save-excursion
+      (goto-char end)
+      (while (re-search-backward search-re start t)
+        (replace-match (cdr (assoc (match-string 1) xml-entity-alist)))
+        (goto-char (match-beginning 0))))))
+
  (defun xml-escape-string (string)
    "Return the string with entity substitutions made from
  xml-entity-alist."
Index: textmodes/sgml-mode.el
===================================================================
RCS file: /sources/emacs/emacs/lisp/textmodes/sgml-mode.el,v
retrieving revision 1.141
diff -u -r1.141 sgml-mode.el
--- textmodes/sgml-mode.el	24 Sep 2009 23:22:20 -0000	1.141
+++ textmodes/sgml-mode.el	8 Oct 2009 05:58:21 -0000
@@ -1097,21 +1097,10 @@
  Only &, < and > are quoted, the rest is left untouched.
  With prefix argument UNQUOTEP, unquote the region."
    (interactive "r\nP")
-  (save-restriction
-    (narrow-to-region start end)
-    (goto-char (point-min))
-    (if unquotep
-	;; FIXME: We should unquote other named character references as well.
-	(while (re-search-forward
-		"\\(&\\(amp\\|\\(l\\|\\(g\\)\\)t\\)\\)[][<>&;\n\t \"%!'(),/=?]"
-		nil t)
-	  (replace-match (if (match-end 4) ">" (if (match-end 3) "<" "&")) t t
-			 nil (if (eq (char-before (match-end 0)) ?\;) 0 1)))
-      (while (re-search-forward "[&<>]" nil t)
-	(replace-match (cdr (assq (char-before) '((?& . "&amp;")
-						  (?< . "&lt;")
-						  (?> . "&gt;"))))
-		       t t)))))
+  (if unquotep
+      (xml-decode-region start end)
+    (xml-encode-region start end)))
+

  (defun sgml-pretty-print (beg end)
    "Simple-minded pretty printer for SGML.



Thread overview: 7+ messages
2009-10-07 18:56 [PATCH] xml-escape-region Daniel Colascione
2009-10-07 22:10 ` Stefan Monnier
2009-10-08  2:13   ` Daniel Colascione
2009-10-08  5:29     ` Stefan Monnier
2009-10-08  6:01       ` Daniel Colascione [this message]
2009-10-08  6:03         ` Daniel Colascione
2009-10-09 19:11         ` Stefan Monnier
