From: Daniel Colascione <danc@merrillpress.com>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: Emacs-Devel devel <emacs-devel@gnu.org>
Subject: Re: [PATCH] xml-escape-region
Date: Thu, 8 Oct 2009 02:01:05 -0400
Message-ID: <76BA010B-EFE0-48CA-BD43-B3CB63CDDAFF@merrillpress.com>
In-Reply-To: <jwv1vlea2eq.fsf-monnier+emacs@gnu.org>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Oct 8, 2009, at 1:29 AM, Stefan Monnier wrote:
>>>> +;;;##autoload
>>>> +(defun xml-escape-region (beg end)
>>>> + (interactive "*r")
>>>> + (let ((escaped (xml-escape-string (buffer-substring beg end))))
>>>> + (delete-region beg end)
>>>> + (insert escaped)))
>>>
>>> I'd rather not autoload such a function.
>
>> Do you mean that it should be loaded all the time, or that the user
>> should have to explicitly load xml.el before using the function?
>
> Yes.
>
>> If the latter, then that would make binding it to a key
>> less convenient.
>
> Hmm... didn't notice you defined it as a command. How often/when do
> you need to use/bind such a command other than in an sgml/xml-related
> file (where the major mode might decide to preload such a command)?
Pretty often, actually. XML (or XML-like syntax) crops up in a lot of
places, including literal strings in many programming languages. Having
some basic XML-editing functionality available everywhere would be useful.
>
>
>> (let ((search-re (mapconcat #'regexp-quote
>> (mapcar #'cdr xml-entity-alist)
>> "\\|"))
>
> Rather than a big \| of single chars, why not make a [...] regexp?
> If you use regexp-opt, it should happen automatically.
I figured the constant-factor overhead of regexp-opt (and its
autoloading) wasn't worth it for such a simple regexp.
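
A minimal sketch of the two ways to build the search regexp, for
comparison. It assumes a hypothetical `my-entity-alist' shaped like
xml-entity-alist (name/string pairs); every `my-' name is illustrative
and not part of the patch.

```elisp
;; Hypothetical stand-in for `xml-entity-alist': pairs of
;; (ENTITY-NAME . REPLACEMENT-STRING).
(defvar my-entity-alist
  '(("lt" . "<") ("gt" . ">") ("apos" . "'") ("quot" . "\"") ("amp" . "&"))
  "Illustrative entity table; not the real `xml-entity-alist'.")

;; Alternation built by hand with `mapconcat', as in the patch.
(defvar my-alternation-re
  (mapconcat #'regexp-quote (mapcar #'cdr my-entity-alist) "\\|"))

;; The same set of strings handed to `regexp-opt'.
(defvar my-opt-re
  (regexp-opt (mapcar #'cdr my-entity-alist)))

(defun my-match-positions (regexp string)
  "Return the start positions of all matches of REGEXP in STRING."
  (let ((case-fold-search nil)
        (pos 0)
        (hits '()))
    (while (string-match regexp string pos)
      (push (match-beginning 0) hits)
      (setq pos (match-end 0)))
    (nreverse hits)))
```

Both regexps should find the same characters; the difference is only in
how the pattern is spelled and what it costs to build.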
> Actually, now that I look at it, xml-entity-alist is poorly defined.
> Instead of being a list of pairs of string and string (where the
> second string is always of size 1), it should be a list of pairs of
> string and char.
I think the idea was to be able to replace multi-character strings
with XML entities defined for the current document.
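
A small sketch of what string-valued pairs allow, assuming a
hypothetical table `my-doc-entity-alist' with one made-up
document-specific entity ("corp"); neither the table nor the helper
exists in xml.el.

```elisp
;; Hypothetical table: a few single-character entities plus one
;; document-specific entity whose expansion is several characters long.
(defvar my-doc-entity-alist
  '(("lt" . "<") ("gt" . ">") ("amp" . "&")
    ("corp" . "Example Corp"))
  "Illustrative only; not the real `xml-entity-alist'.")

(defun my-entity-for (text)
  "Return the entity reference for TEXT, or nil if the table has none."
  ;; `rassoc' compares with `equal', so string values of any length work.
  (let ((pair (rassoc text my-doc-entity-alist)))
    (and pair (concat "&" (car pair) ";"))))
```

With string/char pairs, the "corp" entry could not be represented at all.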
> Also this code is also applicable to sgml and there's related
> code in sgml-mode.el. If someone wants to consolidate, that would
> be welcome.
Does anyone actually use the unquotep parameter? It seems like quoting
and unquoting should be separate functions. Nevertheless, the patch
below should preserve existing behavior. I've also renamed the XML
functions to better match existing code, e.g., base64.
>
>> (save-excursion
>> (goto-char beg)
>> (while (re-search-forward search-re end t)
>> (replace-match (concat "&"
>> (car (rassoc (match-string 0)
>> xml-entity-alist))
>> ";"))))))
>
> If you use a backward-search, you don't need to turn `end' (nor
> `start') into a marker.
Good idea.
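
To illustrate the point, here is a minimal sketch that escapes only "&",
once forward with a marker and once backward with plain integers. Both
function names are hypothetical and exist only for this comparison.

```elisp
(defun my-escape-ampersands-forward (start end)
  "Replace each \"&\" between START and END with \"&amp;\", searching forward."
  (save-excursion
    ;; Each replacement inserts text before END, so END must be a marker
    ;; in order to keep tracking the end of the region.
    (let ((end-marker (copy-marker end)))
      (goto-char start)
      (while (search-forward "&" end-marker t)
        (replace-match "&amp;" t t))
      (set-marker end-marker nil))))

(defun my-escape-ampersands-backward (start end)
  "Replace each \"&\" between START and END with \"&amp;\", searching backward."
  (save-excursion
    (goto-char end)
    ;; START stays valid as a plain integer: every edit happens at or
    ;; after the match, so text before it never shifts.
    (while (search-backward "&" start t)
      (replace-match "&amp;" t t)
      ;; `replace-match' leaves point after the new text; step back so
      ;; the "&" just inserted is not found again.
      (goto-char (match-beginning 0)))))
```

The backward version needs the extra `goto-char', which is the price of
dropping the marker.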
Index: xml.el
===================================================================
RCS file: /sources/emacs/emacs/lisp/xml.el,v
retrieving revision 1.64
diff -u -r1.64 xml.el
- --- xml.el 5 Jan 2009 03:19:57 -0000 1.64
+++ xml.el 8 Oct 2009 05:58:20 -0000
@@ -840,6 +840,40 @@
(defalias 'xml-print 'xml-debug-print)
+
+;;;###autoload
+(defun xml-encode-region (start end)
+ "XML-escape text between START and END according to `xml-entity-alist'."
+ (interactive "*r")
+
+ (let ((search-re (mapconcat #'regexp-quote
+ (mapcar #'cdr xml-entity-alist)
+ "\\|")))
+ (save-excursion
+ (goto-char end)
+ (while (re-search-backward search-re start t)
+ (replace-match (concat "&"
+ (car (rassoc (match-string 0)
+ xml-entity-alist))
+ ";"))
+ (goto-char (match-beginning 0))))))
+
+;;;###autoload
+(defun xml-decode-region (start end)
+ "Decode XML entities between START and END according to `xml-entity-alist'."
+ (interactive "*r")
+ (let ((search-re (concat "&\\("
+ (mapconcat #'regexp-quote
+ (mapcar #'car xml-entity-alist)
+ "\\|")
+ "\\);")))
+
+ (save-excursion
+ (goto-char end)
+ (while (re-search-backward search-re start t)
+ (replace-match (cdr (assoc (match-string 1) xml-entity-alist)))
+ (goto-char (match-beginning 0))))))
+
(defun xml-escape-string (string)
"Return the string with entity substitutions made from
xml-entity-alist."
Index: textmodes/sgml-mode.el
===================================================================
RCS file: /sources/emacs/emacs/lisp/textmodes/sgml-mode.el,v
retrieving revision 1.141
diff -u -r1.141 sgml-mode.el
- --- textmodes/sgml-mode.el 24 Sep 2009 23:22:20 -0000 1.141
+++ textmodes/sgml-mode.el 8 Oct 2009 05:58:21 -0000
@@ -1097,21 +1097,10 @@
Only &, < and > are quoted, the rest is left untouched.
With prefix argument UNQUOTEP, unquote the region."
(interactive "r\nP")
- - (save-restriction
- - (narrow-to-region start end)
- - (goto-char (point-min))
- - (if unquotep
- - ;; FIXME: We should unquote other named character references as well.
- - (while (re-search-forward
- - "\\(&\\(amp\\|\\(l\\|\\(g\\)\\)t\\)\\)[][<>&;\n\t \"%!'(),/=?]"
- - nil t)
- - (replace-match (if (match-end 4) ">" (if (match-end 3) "<" "&")) t t
- - nil (if (eq (char-before (match-end 0)) ?\;) 0 1)))
- - (while (re-search-forward "[&<>]" nil t)
- - (replace-match (cdr (assq (char-before) '((?& . "&amp;")
- - (?< . "&lt;")
- - (?> . "&gt;"))))
- - t t)))))
+ (if unquotep
+ (xml-decode-region start end)
+ (xml-encode-region start end)))
+
(defun sgml-pretty-print (beg end)
"Simple-minded pretty printer for SGML.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)
iEYEARECAAYFAkrNgCEACgkQ17c2LVA10VuCmwCgpTFUg4oshpxAW+MZI1jDunWv
K4cAn2HqioVa34YnU63cMneytXV10Bby
=DYnh
-----END PGP SIGNATURE-----