From: "Martin Schwamberger"
Original-To: bug-gnu-emacs@gnu.org
Newsgroups: gmane.emacs.bugs
Subject: nested comments in sgml-mode are not properly quoted.
Date: Wed, 29 Jan 2003 23:19:38 +0100
Message-ID: <3E38618A.10115.30C31B@localhost>

Hi,

I frequently use comment-region, and I was really unhappy to find
that I couldn't use it safely in sgml/xml mode because of an already
known quoting problem. Since I couldn't find any way to avoid the
problem without changing the code, I decided to fix the bug in
newcomment.el as shipped with Emacs 21.2.1.

The original quoting algorithm inserts one or more backslashes
between the first and second characters of the comment markers.
This leads to

    <\!-- ..... -\->

for SGML/XML comments. Unfortunately, the resulting -- sequence is
not allowed within SGML comments
(see http://www.w3.org/TR/REC-xml#sec-comments).

My algorithm inserts backslashes after every character except the
last if the marker is longer than one character. This leads to

    <\!\-\- ..... -\-\>

which is allowed within comments.

I've tested it for SGML and C style comments. I've also played with
Pascal comments in order to see what happens with single-char
end-comment markers. Everything seems to work well.

When unquoting, it only requires the backslash(es) after the first
character, so it is also able to unquote comment markers quoted by
prior versions.

Here are my new versions of comment-quote-re and comment-quote-nested.
I left the original lines as comments.
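As a side illustration of the two quoting schemes described above (not
the patch itself), here is a minimal sketch. The helper names
sketch-quote-old and sketch-quote-new are hypothetical and exist only
for this example; they quote a bare marker string once, whereas the real
code operates on the buffer through regexps and may add several
backslashes.

```elisp
;; Illustration only: these two helpers are hypothetical and are
;; not part of newcomment.el.

(defun sketch-quote-old (marker)
  "Quote MARKER the old way: one backslash after its first character."
  (concat (substring marker 0 1) "\\" (substring marker 1)))

(defun sketch-quote-new (marker)
  "Quote MARKER the new way: a backslash after every character but the last.
A single-character MARKER is returned unchanged."
  (mapconcat #'char-to-string marker "\\"))

(sketch-quote-old "<!--")   ; => "<\\!--"      (displayed as <\!--)
(sketch-quote-old "-->")    ; => "-\\->"       (displayed as -\->)
(sketch-quote-new "<!--")   ; => "<\\!\\-\\-"  (displayed as <\!\-\-)
(sketch-quote-new "-->")    ; => "-\\-\\>"     (displayed as -\-\>)

;; The old result still contains the forbidden "--"; the new one does not.
(string-match "--" (sketch-quote-old "<!--"))   ; => 3
(string-match "--" (sketch-quote-new "<!--"))   ; => nil
```

The old form is a prefix-compatible subset of the new one (both have a
backslash after the first character), which is why an unquoting regexp
that insists only on that first backslash can handle both.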
Immediately after the commented-out original lines, my code starts with
;; MS:
and ends with
;; --------------------------------------------------------------------

(defun comment-quote-re (str unp)
;; --------------------------------------------------------------------
;;   (concat (regexp-quote (substring str 0 1))
;;           "\\\\" (if unp "+" "*")
;;           (regexp-quote (substring str 1))))
;; --------------------------------------------------------------------
;; MS:
  (let ((i 1)
        (len (length str))
        ;; Each backslash sequence is defined as a subexpression
        ;; in order to add or remove backslashes easily
        ;; (see comment-quote-nested).
        (qre (concat (regexp-quote (substring str 0 1))
                     "\\(\\\\" (if unp "+" "*") "\\)")))
    (while (< i len)
      (setq qre (concat qre
                        (regexp-quote (substring str i (1+ i)))
                        ;; No trailing backslash for strings longer than
                        ;; one char.  Even when UNP is true, the backslash
                        ;; is optional, to remain compatible.
                        (if (< (1+ i) len) "\\(\\\\*\\)")))
      (setq i (1+ i)))
    qre))
;; --------------------------------------------------------------------

(defun comment-quote-nested (cs ce unp)
  "Quote or unquote nested comments.
If UNP is non-nil, unquote nested comment markers."
  (setq cs (comment-string-strip cs t t))
  (setq ce (comment-string-strip ce t t))
  (when (and comment-quote-nested (> (length ce) 0))
    (let ((re (concat (comment-quote-re ce unp)
                      "\\|" (comment-quote-re cs unp))))
      (goto-char (point-min))
      (while (re-search-forward re nil t)
;; --------------------------------------------------------------------
;;      (goto-char (match-beginning 0))
;;      (forward-char 1)
;;      (if unp (delete-char 1) (insert "\\"))
;; --------------------------------------------------------------------
;; MS:
        (let ((i (regexp-opt-depth re)))
          ;; For each subexpression (sequence of backslashes)
          (while (> i 0)
            (when (match-beginning i)
              (goto-char (match-beginning i))
              (if unp
                  ;; quoted?
                  (if (> (match-end i) (match-beginning i))
                      (delete-char 1))
                (insert "\\")))
            (setq i (1- i))))
;; --------------------------------------------------------------------
        (when (= (length ce) 1)
          ;; If the comment-end is a single char, adding a \ after that
          ;; "first" char won't deactivate it, so we turn such a CE
          ;; into !CS.  I.e. for pascal, we turn } into !{
          (if (not unp)
              (when (string= (match-string 0) ce)
                (replace-match (concat "!" cs) t t))
            (when (and (< (point-min) (match-beginning 0))
                       (string= (buffer-substring (1- (match-beginning 0))
                                                  (1- (match-end 0)))
                                (concat "!" cs)))
;; --------------------------------------------------------------------
;;            (backward-char 2)
;; --------------------------------------------------------------------
;; MS:
              (goto-char (1- (match-beginning 0)))
;; --------------------------------------------------------------------
              (delete-char (- (match-end 0) (match-beginning 0)))
              (insert ce))))))))

I hope this gives you at least a few useful ideas,

Martin