From mboxrd@z Thu Jan  1 00:00:00 1970
From: Daniel Colascione
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] xml-escape-region
Date: Thu, 8 Oct 2009 02:01:05 -0400
Message-ID: <76BA010B-EFE0-48CA-BD43-B3CB63CDDAFF@merrillpress.com>
References: <200910071456.31966.danc@merrillprint.com> <18A0FD1E-DAFE-4058-B6FC-630750EBBCEA@merrillpress.com>
To: Stefan Monnier
Cc: Emacs-Devel devel
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:115979

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Oct 8, 2009, at 1:29 AM, Stefan Monnier wrote:

>>>> +;;;##autoload
>>>> +(defun xml-escape-region (beg end)
>>>> +  (interactive "*r")
>>>> +  (let ((escaped (xml-escape-string (buffer-substring beg end))))
>>>> +    (delete-region beg end)
>>>> +    (insert escaped)))
>>>
>>> I'd rather not autoload such a function.
>
>> Do you mean that it should be loaded all the time, or that the user should
>> have to explicitly load xml.el before using the function?
>
> Yes.
>
>> If the latter, then that would make binding it to a key
>> less convenient.
>
> Hmm... didn't notice you defined it as a command.  How often/when do you
> need to use/bind such a command other than in an sgml/xml-related file
> (where the major mode might decide to preload such a command)?

Pretty often, actually. XML (or XML-like syntax) crops up in a lot of
places, including literal strings in many programming languages. Some
basic XML-editing functionality being available everywhere would be
useful.

>
>
>> (let ((search-re (mapconcat #'regexp-quote
>>                             (mapcar #'cdr xml-entity-alist)
>>                             "\\|"))
>
> Rather than a big \| of single chars, why not make a [...] regexp?
> If you use regexp-opt, it should happen automatically.

I figured the constant-factor overhead of regexp-opt (and its
autoloading) wasn't worth it for such a simple regexp.

> Actually, now that I look at it, xml-entity-alist is poorly defined.
> Instead of being a list of pairs of string and string (where the second
> string is always of size 1), it should be a list of pairs of string
> and char.

I think the idea was to be able to replace multi-character strings
with XML entities defined for the current document.

> Also this code is also applicable to sgml and there's related
> code in sgml-mode.el.  If someone wants to consolidate, that would
> be welcome.

Does anyone actually use the unquotep parameter? It seems like quoting
and unquoting should be separate functions. Nevertheless, the patch
below should preserve existing behavior. I've also renamed the XML
functions to better match existing code, e.g., base64.

>
>> (save-excursion
>>   (goto-char beg)
>>   (while (re-search-forward search-re end t)
>>     (replace-match (concat "&"
>>                            (car (rassoc (match-string 0)
>>                                         xml-entity-alist))
>>                            ";"))))))
>
> If you use a backward-search, you don't need to turn `end' (nor `start')
> into a marker.

Good idea.

Index: xml.el
===================================================================
RCS file: /sources/emacs/emacs/lisp/xml.el,v
retrieving revision 1.64
diff -u -r1.64 xml.el
- --- xml.el	5 Jan 2009 03:19:57 -0000	1.64
+++ xml.el	8 Oct 2009 05:58:20 -0000
@@ -840,6 +840,40 @@

 (defalias 'xml-print 'xml-debug-print)

+
+;;;###autoload
+(defun xml-encode-region (start end)
+  "XML-escape text between START and END according to `xml-entity-alist'."
+  (interactive "*r")
+
+  (let ((search-re (mapconcat #'regexp-quote
+                              (mapcar #'cdr xml-entity-alist)
+                              "\\|")))
+    (save-excursion
+      (goto-char end)
+      (while (re-search-backward search-re start t)
+        (replace-match (concat "&"
+                               (car (rassoc (match-string 0)
+                                            xml-entity-alist))
+                               ";"))
+        (goto-char (match-beginning 0))))))
+
+;;;###autoload
+(defun xml-decode-region (start end)
+  "Decode XML entities between START and END according to `xml-entity-alist'."
+  (interactive "*r")
+  (let ((search-re (concat "&\\("
+                           (mapconcat #'regexp-quote
+                                      (mapcar #'car xml-entity-alist)
+                                      "\\|")
+                           "\\);")))
+
+    (save-excursion
+      (goto-char end)
+      (while (re-search-backward search-re start t)
+        (replace-match (cdr (assoc (match-string 1) xml-entity-alist)))
+        (goto-char (match-beginning 0))))))
+
 (defun xml-escape-string (string)
   "Return the string with entity substitutions made from
 xml-entity-alist."
Index: textmodes/sgml-mode.el
===================================================================
RCS file: /sources/emacs/emacs/lisp/textmodes/sgml-mode.el,v
retrieving revision 1.141
diff -u -r1.141 sgml-mode.el
- --- textmodes/sgml-mode.el	24 Sep 2009 23:22:20 -0000	1.141
+++ textmodes/sgml-mode.el	8 Oct 2009 05:58:21 -0000
@@ -1097,21 +1097,10 @@
 Only &, < and > are quoted, the rest is left untouched.
 With prefix argument UNQUOTEP, unquote the region."
   (interactive "r\nP")
- -  (save-restriction
- -    (narrow-to-region start end)
- -    (goto-char (point-min))
- -    (if unquotep
- -        ;; FIXME: We should unquote other named character references as well.
- -        (while (re-search-forward
- -                "\\(&\\(amp\\|\\(l\\|\\(g\\)\\)t\\)\\)[][<>&;\n\t \"%!'(),/=?]"
- -                nil t)
- -          (replace-match (if (match-end 4) ">" (if (match-end 3) "<" "&")) t t
- -                         nil (if (eq (char-before (match-end 0)) ?\;) 0 1)))
- -      (while (re-search-forward "[&<>]" nil t)
- -        (replace-match (cdr (assq (char-before) '((?& . "&amp;")
- -                                                  (?< . "&lt;")
- -                                                  (?> . "&gt;"))))
- -                       t t)))))
+  (if unquotep
+      (xml-decode-region start end)
+    (xml-encode-region start end)))
+

 (defun sgml-pretty-print (beg end)
   "Simple-minded pretty printer for SGML.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iEYEARECAAYFAkrNgCEACgkQ17c2LVA10VuCmwCgpTFUg4oshpxAW+MZI1jDunWv
K4cAn2HqioVa34YnU63cMneytXV10Bby
=DYnh
-----END PGP SIGNATURE-----
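
For reference, the two suggestions discussed in the thread (a character-class regexp via regexp-opt, and searching backward so that neither bound needs to be a marker) can be combined in a small standalone sketch. This is only an illustration under stated assumptions, not the code that was committed: `sketch-entity-alist' and `sketch-encode-region' are hypothetical names, and the alist merely mirrors the (NAME . TEXT) shape that the thread describes for `xml-entity-alist'.

```elisp
;; Hypothetical sketch; none of these names exist in xml.el.

(defvar sketch-entity-alist
  '(("lt" . "<") ("gt" . ">") ("amp" . "&") ("quot" . "\"") ("apos" . "'"))
  "Pairs of (ENTITY-NAME . TEXT), assumed to mirror `xml-entity-alist'.")

(defun sketch-encode-region (start end)
  "Replace text between START and END with entities from `sketch-entity-alist'."
  ;; For single-character strings, regexp-opt yields a bracket
  ;; expression instead of a chain of \\| alternatives.
  (let ((search-re (regexp-opt (mapcar #'cdr sketch-entity-alist)))
        (case-fold-search nil))
    (save-excursion
      (goto-char end)
      ;; Working from END toward START, every replacement only shifts
      ;; text that has already been scanned, so START stays valid as a
      ;; plain integer and END is never consulted again.
      (while (re-search-backward search-re start t)
        (replace-match (concat "&"
                               (car (rassoc (match-string 0)
                                            sketch-entity-alist))
                               ";")
                       t t)             ; fixed case, literal replacement
        ;; replace-match leaves point after the inserted text; step back
        ;; before it so the "&" just inserted is not matched again.
        (goto-char (match-beginning 0))))))

;; Usage, in a scratch buffer:
(with-temp-buffer
  (insert "if (a < b && c > d)")
  (sketch-encode-region (point-min) (point-max))
  (buffer-string))
;; => "if (a &lt; b &amp;&amp; c &gt; d)"
```

Binding `case-fold-search' to nil and passing `t t' to `replace-match' are additions of this sketch rather than something the patch does; they keep the lookup in the alist and the inserted text independent of the user's case settings.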