* [PATCH] xml-escape-region @ 2009-10-07 18:56 Daniel Colascione 2009-10-07 22:10 ` Stefan Monnier 0 siblings, 1 reply; 7+ messages in thread From: Daniel Colascione @ 2009-10-07 18:56 UTC (permalink / raw) To: emacs-devel Index: xml.el =================================================================== RCS file: /cvsroot/emacs/emacs/lisp/xml.el,v retrieving revision 1.64 diff -u -r1.64 xml.el --- xml.el 5 Jan 2009 03:19:57 -0000 1.64 +++ xml.el 7 Oct 2009 18:56:08 -0000 @@ -852,6 +852,13 @@ ;; grabbing the string works here. string "")) +;;;##autoload +(defun xml-escape-region (beg end) + (interactive "*r") + (let ((escaped (xml-escape-string (buffer-substring beg end)))) + (delete-region beg end) + (insert escaped))) + (defun xml-debug-print-internal (xml indent-string) "Outputs the XML tree in the current buffer. The first line is indented with INDENT-STRING." ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] xml-escape-region 2009-10-07 18:56 [PATCH] xml-escape-region Daniel Colascione @ 2009-10-07 22:10 ` Stefan Monnier 2009-10-08 2:13 ` Daniel Colascione 0 siblings, 1 reply; 7+ messages in thread From: Stefan Monnier @ 2009-10-07 22:10 UTC (permalink / raw) To: Daniel Colascione; +Cc: emacs-devel > +;;;##autoload > +(defun xml-escape-region (beg end) > + (interactive "*r") > + (let ((escaped (xml-escape-string (buffer-substring beg end)))) > + (delete-region beg end) > + (insert escaped))) I'd rather not autoload such a function. But more importantly, this implementation is very inefficient. xml-escape-string itself is rather inefficient except for short strings; this is OK for its current uses, but for xml-escape-region it's definitely not good (i.e. only usable for small regions). Stefan ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] xml-escape-region 2009-10-07 22:10 ` Stefan Monnier @ 2009-10-08 2:13 ` Daniel Colascione 2009-10-08 5:29 ` Stefan Monnier 0 siblings, 1 reply; 7+ messages in thread From: Daniel Colascione @ 2009-10-08 2:13 UTC (permalink / raw) To: Stefan Monnier; +Cc: Emacs-Devel devel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Oct 7, 2009, at 6:10 PM, Stefan Monnier wrote: >> +;;;##autoload >> +(defun xml-escape-region (beg end) >> + (interactive "*r") >> + (let ((escaped (xml-escape-string (buffer-substring beg end)))) >> + (delete-region beg end) >> + (insert escaped))) > > I'd rather not autoload such a function. Do you mean that it should be loaded all the time, or that the user should have to explicitly load xml.el before using the function? If the latter, then that would make binding it to a key less convenient. > But more importantly, this > implementation is very inefficient. xml-escape-string itself is > rather > inefficient except for short strings; this is OK for its current uses, > but for xml-escape-region it's definitely not good (i.e. only usable > for > small regions). How's this? It's O(N) in the amount of text escaped. (defun xml-escape-region (beg end) "XML-escape text between BEG and END according to `xml-entity- alist`." (interactive "*r") (let ((search-re (mapconcat #'regexp-quote (mapcar #'cdr xml-entity-alist) "\\|")) (end (set-marker (make-marker) end))) (save-excursion (goto-char beg) (while (re-search-forward search-re end t) (replace-match (concat "&" (car (rassoc (match-string 0) xml-entity-alist)) ";")))))) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iEYEARECAAYFAkrNSscACgkQ17c2LVA10VvMXgCfcp9fGLBvgg2x/WvemODxe77H o4kAoM/oz33aci08TnzboRm4GTY6zskD =DnOn -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] xml-escape-region 2009-10-08 2:13 ` Daniel Colascione @ 2009-10-08 5:29 ` Stefan Monnier 2009-10-08 6:01 ` Daniel Colascione 0 siblings, 1 reply; 7+ messages in thread From: Stefan Monnier @ 2009-10-08 5:29 UTC (permalink / raw) To: Daniel Colascione; +Cc: Emacs-Devel devel >>> +;;;##autoload >>> +(defun xml-escape-region (beg end) >>> + (interactive "*r") >>> + (let ((escaped (xml-escape-string (buffer-substring beg end)))) >>> + (delete-region beg end) >>> + (insert escaped))) >> >> I'd rather not autoload such a function. > Do you mean that it should be loaded all the time, or that the user should > have to explicitly load xml.el before using the function? Yes. > If the latter, then that would make binding it to a key > less convenient. Hmm... didn't notice you defined it as a command. How often/when do you need to use/bind such a command other than in an sgml/xml-related file (where the major mode might decide to preload such a command)? >> But more importantly, this implementation is very inefficient. >> xml-escape-string itself is rather inefficient except for short >> strings; this is OK for its current uses, but for xml-escape-region >> it's definitely not good (i.e. only usable for small regions). > How's this? It's O(N) in the amount of text escaped. Much better, thank you. > (let ((search-re (mapconcat #'regexp-quote > (mapcar #'cdr xml-entity-alist) > "\\|")) Rather than a big \| of single chars, why not make a [...] regexp? If you use regexp-opt, it should happen automatically. Actually, now that I look at it, xml-entity-alist is poorly defined. Instead of being a list of pairs of string and string (where the second string is always of size 1), it should be a list of pairs of string and char. Also this code is also applicable to sgml and there's related code in sgml-mode.el. If someone wants to consolidate, that would be welcome. > (save-excursion > (goto-char beg) > (while (re-search-forward search-re end t) > (replace-match (concat "&" > (car (rassoc (match-string 0) > xml-entity-alist)) > ";")))))) If you use a backward-search, you don't need to turn `end' (nor `start') into a marker. Stefan ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] xml-escape-region 2009-10-08 5:29 ` Stefan Monnier @ 2009-10-08 6:01 ` Daniel Colascione 2009-10-08 6:03 ` Daniel Colascione 2009-10-09 19:11 ` Stefan Monnier 0 siblings, 2 replies; 7+ messages in thread From: Daniel Colascione @ 2009-10-08 6:01 UTC (permalink / raw) To: Stefan Monnier; +Cc: Emacs-Devel devel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Oct 8, 2009, at 1:29 AM, Stefan Monnier wrote: >>>> +;;;##autoload >>>> +(defun xml-escape-region (beg end) >>>> + (interactive "*r") >>>> + (let ((escaped (xml-escape-string (buffer-substring beg end)))) >>>> + (delete-region beg end) >>>> + (insert escaped))) >>> >>> I'd rather not autoload such a function. > >> Do you mean that it should be loaded all the time, or that the user >> should >> have to explicitly load xml.el before using the function? > > Yes. > >> If the latter, then that would make binding it to a key >> less convenient. > > Hmm... didn't notice you defined it as a command. How often/when do > you > need to use/bind such a command other than in an sgml/xml-related file > (where the major mode might decide to preload such a command)? Pretty often, actually. XML (or XML-like syntax) crops up in a lot of places, including literal strings in many programming languages. Some basic XML-editing functionality being available everywhere would be useful. > > >> (let ((search-re (mapconcat #'regexp-quote >> (mapcar #'cdr xml-entity-alist) >> "\\|")) > > Rather than a big \| of single chars, why not make a [...] regexp? > If you use regexp-opt, it should happen automatically. I figured the constant-factor overhead of regexp-opt (and its autoloading) wasn't worth it for such a simple regexp. > Actually, now that I look at it, xml-entity-alist is poorly defined. > Instead of being a list of pairs of string and string (where the > second > string is always of size 1), it should be a list of pairs of string > and char. I think the idea was to be able to replace multi-character strings with XML entities defined for the current document. > Also this code is also applicable to sgml and there's related > code in sgml-mode.el. If someone wants to consolidate, that would > be welcome. Does anyone actually use the unquotep parameter? It seems like quoting and unquoting should be separate functions. Nevertheless, the patch below should preserve existing behavior. I've also renamed the XML functions to better match existing code, e.g., base64. > >> (save-excursion >> (goto-char beg) >> (while (re-search-forward search-re end t) >> (replace-match (concat "&" >> (car (rassoc (match-string 0) >> xml-entity-alist)) >> ";")))))) > > If you use a backward-search, you don't need to turn `end' (nor > `start') > into a marker. Good idea. Index: xml.el =================================================================== RCS file: /sources/emacs/emacs/lisp/xml.el,v retrieving revision 1.64 diff -u -r1.64 xml.el - --- xml.el 5 Jan 2009 03:19:57 -0000 1.64 +++ xml.el 8 Oct 2009 05:58:20 -0000 @@ -840,6 +840,40 @@ (defalias 'xml-print 'xml-debug-print) + +;;;###autoload +(defun xml-encode-region (start end) + "XML-escape text between START and END according to `xml-entity- alist`." + (interactive "*r") + + (let ((search-re (mapconcat #'regexp-quote + (mapcar #'cdr xml-entity-alist) + "\\|"))) + (save-excursion + (goto-char end) + (while (re-search-backward search-re start t) + (replace-match (concat "&" + (car (rassoc (match-string 0) + xml-entity-alist)) + ";")) + (goto-char (match-beginning 0)))))) + +;;;###autoload +(defun xml-decode-region (start end) + "Decode XML entities between START and END according to `xml-entity- alist`." + (interactive "*r") + (let ((search-re (concat "&\\(" + (mapconcat #'regexp-quote + (mapcar #'car xml-entity-alist) + "\\|") + "\\);"))) + + (save-excursion + (goto-char end) + (while (re-search-backward search-re start t) + (replace-match (cdr (assoc (match-string 1) xml-entity-alist))) + (goto-char (match-beginning 0)))))) + (defun xml-escape-string (string) "Return the string with entity substitutions made from xml-entity-alist." Index: textmodes/sgml-mode.el =================================================================== RCS file: /sources/emacs/emacs/lisp/textmodes/sgml-mode.el,v retrieving revision 1.141 diff -u -r1.141 sgml-mode.el - --- textmodes/sgml-mode.el 24 Sep 2009 23:22:20 -0000 1.141 +++ textmodes/sgml-mode.el 8 Oct 2009 05:58:21 -0000 @@ -1097,21 +1097,10 @@ Only &, < and > are quoted, the rest is left untouched. With prefix argument UNQUOTEP, unquote the region." (interactive "r\nP") - - (save-restriction - - (narrow-to-region start end) - - (goto-char (point-min)) - - (if unquotep - - ;; FIXME: We should unquote other named character references as well. - - (while (re-search-forward - - "\\(&\\(amp\\|\\(l\\|\\(g\\)\\)t\\)\\)[][<>&;\n\t \"%!'(),/=?]" - - nil t) - - (replace-match (if (match-end 4) ">" (if (match-end 3) "<" "&")) t t - - nil (if (eq (char-before (match-end 0)) ?\;) 0 1))) - - (while (re-search-forward "[&<>]" nil t) - - (replace-match (cdr (assq (char-before) '((?& . "&") - - (?< . "<") - - (?> . ">")))) - - t t))))) + (if unquote + (xml-decode-region start end) + (xml-encode-region start end))) + (defun sgml-pretty-print (beg end) "Simple-minded pretty printer for SGML. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iEYEARECAAYFAkrNgCEACgkQ17c2LVA10VuCmwCgpTFUg4oshpxAW+MZI1jDunWv K4cAn2HqioVa34YnU63cMneytXV10Bby =DYnh -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] xml-escape-region 2009-10-08 6:01 ` Daniel Colascione @ 2009-10-08 6:03 ` Daniel Colascione 2009-10-09 19:11 ` Stefan Monnier 1 sibling, 0 replies; 7+ messages in thread From: Daniel Colascione @ 2009-10-08 6:03 UTC (permalink / raw) To: Daniel Colascione; +Cc: Stefan Monnier, Emacs-Devel devel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Oct 8, 2009, at 2:01 AM, Daniel Colascione wrote: > > - - (if unquotep > - - ;; FIXME: We should unquote other named character references as > well. > - - (while (re-search-forward My mail client really needs to stop doing that. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iEYEARECAAYFAkrNgLsACgkQ17c2LVA10Vs1bACgiD2RB5negFUpyfa13iUUbPG+ 0uAAoMd8XjS1Q3UDzbpSsFlRvteB5tKa =6Mnb -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] xml-escape-region 2009-10-08 6:01 ` Daniel Colascione 2009-10-08 6:03 ` Daniel Colascione @ 2009-10-09 19:11 ` Stefan Monnier 1 sibling, 0 replies; 7+ messages in thread From: Stefan Monnier @ 2009-10-09 19:11 UTC (permalink / raw) To: Daniel Colascione; +Cc: Emacs-Devel devel > Pretty often, actually. XML (or XML-like syntax) crops up in a lot of > places, including > literal strings in many programming languages. Some basic XML-editing > functionality being available everywhere would be useful. OK, then. >>> (let ((search-re (mapconcat #'regexp-quote >>> (mapcar #'cdr xml-entity-alist) >>> "\\|")) >> Rather than a big \| of single chars, why not make a [...] regexp? >> If you use regexp-opt, it should happen automatically. > I figured the constant-factor overhead of regexp-opt (and its autoloading) > wasn't worth it for such a simple regexp. Not sure if it makes a difference, indeed. >> Actually, now that I look at it, xml-entity-alist is poorly defined. >> Instead of being a list of pairs of string and string (where the second >> string is always of size 1), it should be a list of pairs of string >> and char. > I think the idea was to be able to replace multi-character strings with XML > entities defined for the current document. All the examples I saw were single-char strings. If indeed multi-char strings can be used there, then the current definition is fine, indeed, but then xml-escape-string is not. > Does anyone actually use the unquotep parameter? It seems like quoting and > unquoting should be separate functions. Separate functions yes, but I think it makes sense to provide both under the same key-binding, hence a single command. This said, I never use either, so I don't have a strong opinion. > Nevertheless, the patch below should preserve existing behavior. I've > also renamed the XML functions to better match existing code, > e.g., base64. Looks good, Stefan ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2009-10-09 19:11 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-10-07 18:56 [PATCH] xml-escape-region Daniel Colascione 2009-10-07 22:10 ` Stefan Monnier 2009-10-08 2:13 ` Daniel Colascione 2009-10-08 5:29 ` Stefan Monnier 2009-10-08 6:01 ` Daniel Colascione 2009-10-08 6:03 ` Daniel Colascione 2009-10-09 19:11 ` Stefan Monnier
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).