From: Matthew Swift
Newsgroups: gmane.emacs.bugs
Subject: confusion over undocumented syntax-table features, font-lock and syntax-tables
Date: Tue, 11 Feb 2003 00:08:20 -0500
Message-ID: <200302110508.h1B58Kcu016866@beth.swift.xxx>
To: bug-gnu-emacs@gnu.org

This bug report will be sent to the Free Software Foundation,
not to your local site managers!
Please write in English, because the Emacs maintainers do not have
translators to read other languages for them.

Your bug report will be posted to the bug-gnu-emacs@gnu.org mailing list,
and to the gnu.emacs.bug news group.

In GNU Emacs 21.2.1 (i386-debian-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2002-11-06 on beth, modified by Debian
configured using `configure i386-debian-linux-gnu --prefix=/usr/local
 --sharedstatedir=/var/lib --libexecdir=/usr/local/lib
 --localstatedir=/var/lib --infodir=/usr/local/share/info
 --mandir=/usr/local/share/man --with-pop=yes --with-x=yes
 --with-x-toolkit=athena --without-gif'
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: nil
  locale-coding-system: nil
  default-enable-multibyte-characters: t

Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:

I was observing a strange behavior in `sh-mode', defined in
sh-script.el, where

  (re-search-forward "\\s<\\s<")

was failing even though it was passing over a buffer substring of two
characters whose syntax classes, as reported by
`(char-syntax (char-after N))' for N and N+1, were both "<".

I have not figured out why that happens, and it may not be a bug, but
in my experiments I have come across a barrel full of puzzles and
questions.  I am reporting as much as I have been able to distinguish.

The results of the following code completely baffle me.  Is
global-font-lock-mode changing the syntax classes?

-----cut here
(setq test "
hello () {
echo world.;
}
## boln is at buffer position 40
")

(defun test ()
  (sh-mode)
  (message "result is %S"
           (if (and (equal "<" (char-to-string (char-syntax ?#)))
                    (equal (char-after 40) ?#)
                    (equal (char-after 41) ?#)
                    (equal "<" (char-to-string (char-syntax (char-after 40))))
                    (equal "<" (char-to-string (char-syntax (char-after 41)))))
               (save-excursion
                 (goto-char (point-min))
                 (re-search-forward "\\s<\\s<"))
             "whoops!")))

(progn
  (global-font-lock-mode 0)
  ;; succeeds
  (test))

(progn
  (global-font-lock-mode 1)
  ;; `re-search-forward' fails the SECOND time, if not the first (no
  ;; pattern found)
  (test))

;;(sh-mode)
;;(emacs-lisp-mode)
;;(global-font-lock-mode)
;;(test)
---- end of test file

The facility for matching chars in syntax descriptors is either not
fully documented or has some other problems.  Looking into it further
would take more time than I have at the moment.

sh-script.el says:

  (defvar sh-mode-syntax-table
    '((sh eval sh-mode-syntax-table ()
          ?\# "<"
          ?\n ">#"
          ?\" "\"\""
          ?\' "\"'"
          ?\` "\"`"
          ?! "_"
          ?% "_"
          ?: "_"
          ?. "_"
          ?^ "_"
          ?~ "_"
          ?< "."
          ?> ".")
      (csh eval identity sh)
      (rc eval identity sh))
    "Syntax-table used in Shell-Script mode.  See `sh-feature'.")

Consider the second entry in the table, which is the equivalent of

  (modify-syntax-entry ?\n ">#")

The documentation for syntax descriptors says (both in Texinfo and in
functions' docstrings) that the second character, the matching
character, is "used" only when the syntax class is "(" or ")" (open or
close parentheses).

The declaration above assigns a matching character to a character with
the endcomment syntax class.  The documentation does not say that doing
this is an error.  But from here, all possibilities imply one or more
problems.  (I should also note that several major modes appear to
assign matching characters to chars in the string delimiter (") class,
usually the same character, e.g., " with " and ' with '; this usage is
likewise problematic.)

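One way to see what the `">#"` descriptor actually stores is to compare its raw form with that of plain `">"`. The following is only a minimal sketch, assuming an Emacs in which `string-to-syntax` and `with-syntax-table` are available; the helper name `syntax-descriptor-report` is made up for illustration and is not part of Emacs or sh-script.el.

```elisp
;; Hypothetical helper for illustration; not part of Emacs or sh-script.el.
(defun syntax-descriptor-report ()
  "Return a plist comparing the syntax descriptors \">#\" and \">\"."
  (let ((with-match (string-to-syntax ">#"))   ; endcomment class plus ?#
        (without-match (string-to-syntax ">")) ; endcomment class only
        (table (make-syntax-table)))
    ;; Install the sh-script.el style entry in a scratch table only,
    ;; so no buffer's real syntax table is modified.
    (modify-syntax-entry ?\n ">#" table)
    (list
     ;; Do both descriptors carry the same class code?
     :same-class (eq (car with-match) (car without-match))
     ;; Matching-character slot (the cdr of the raw descriptor).
     :match-with (cdr with-match)
     :match-without (cdr without-match)
     ;; Class that `char-syntax' reports for newline under the scratch table.
     :class-in-table (with-syntax-table table
                       (char-to-string (char-syntax ?\n))))))
```

Evaluating `(syntax-descriptor-report)` shows only what is stored in the descriptor. It does not show whether any primitive consults the stored matching character for the endcomment class, which is the question the cases below depend on.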
If the declaration of ">#" is equivalent to ">", with respect to all
Emacs primitives and distributed Lisp code, then

  + sh-script.el should use simply ">" for clarity.

It may be desirable to leave in a facility for assigning matching chars
to non-paren classes, so that programmers can do something with it.  If
so, brief mention should be made in the Texinfo documentation, if not
the docstrings.  If not, then

  + it should be documented that matching chars are ignored except for
    the "(" and ")" classes;

  + `modify-syntax-entry' should decline to install ignored matching
    chars by either signalling an error or by silently deleting the
    matching char;

  + `describe-syntax' should decline to report matching chars that do
    not have any significance, because reporting them is confusing
    (`describe-syntax' will report that ?\n matches ?#, and likewise if
    you assign matching chars to chars in other syntax classes for
    which matching seems irrelevant).

If the declaration of ">#" is not equivalent to ">", then either the
behavior is undefined or it is well-defined but not documented.  If it
is undefined, then sh-script.el should not be using it.  If it is
undocumented, then it should be documented.

Recent input:
M-x r e p o r t - e m a c s - b u g

Recent messages:
1 <- require: gnus-group
1 -> require: gnus-start
1 <- require: gnus-start
1 -> require: gnus-util
1 <- require: gnus-util
Loading gnus-topic...done
Loading emacsbug...
1 -> require: sendmail
1 <- require: sendmail
Loading emacsbug...done