all messages for Emacs-related lists mirrored at yhetil.org
From: Matthew Swift <swift@alum.mit.edu>
Subject: confusion over undocumented syntax-table features, font-lock and syntax-tables
Date: Tue, 11 Feb 2003 00:08:20 -0500
Message-ID: <200302110508.h1B58Kcu016866@beth.swift.xxx>

This bug report will be sent to the Free Software Foundation,
not to your local site managers!
Please write in English, because the Emacs maintainers do not have
translators to read other languages for them.

Your bug report will be posted to the bug-gnu-emacs@gnu.org mailing list,
and to the gnu.emacs.bug news group.

In GNU Emacs 21.2.1 (i386-debian-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2002-11-06 on beth, modified by Debian
configured using `configure  i386-debian-linux-gnu --prefix=/usr/local --sharedstatedir=/var/lib --libexecdir=/usr/local/lib --localstatedir=/var/lib --infodir=/usr/local/share/info --mandir=/usr/local/share/man --with-pop=yes --with-x=yes --with-x-toolkit=athena --without-gif'
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: nil
  locale-coding-system: nil
  default-enable-multibyte-characters: t

Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:

I observed strange behavior in `sh-mode', defined in sh-script.el:
(re-search-forward "\\s<\\s<") failed even though it passed over a buffer
substring of two characters whose syntax classes, as reported by
`(char-syntax (char-after N))' and `(char-syntax (char-after (1+ N)))', were
both "<".

I have not figured out why that happens, and it may not be a bug, but in my
experiments, I have come across a barrel full of puzzles and questions.  I am
reporting as much as I have been able to distinguish.

The results of the following code completely baffle me.  Is
global-font-lock-mode changing the syntax classes?

-----cut here
    (setq test "
    hello () { echo world.; }
    ## boln is at buffer position 40
    ")
    (defun test ()
      (sh-mode)
      (message "result is %S"
               (if (and 
                    (equal "<" (char-to-string (char-syntax ?#)))
                    (equal (char-after 40) ?#)
                    (equal (char-after 41) ?#)
                    (equal "<" (char-to-string (char-syntax (char-after 40))))
                    (equal "<" (char-to-string (char-syntax (char-after 41))))
                    )
                   (save-excursion
                     (goto-char (point-min))
                     (re-search-forward "\\s<\\s<"))
                 "whoops!")))
    (progn
      (global-font-lock-mode 0)
      ;; succeeds
      (test))
    (progn
      (global-font-lock-mode 1)
      ;; `re-search-forward' fails the SECOND time, if not the first (no
      ;; pattern found)
      (test))

    ;;(sh-mode)
    ;;(emacs-lisp-mode)
    ;;(global-font-lock-mode)
    ;;(test)
---- end of test file
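
One mechanism that can produce this kind of discrepancy is offered here as a
hypothesis, not a diagnosis: `char-syntax' consults only the buffer's syntax
table, whereas regexp syntax classes such as \s< also honor `syntax-table'
text properties when `parse-sexp-lookup-properties' is non-nil, and
font-lock's syntactic fontification can install such properties.  The sketch
below reproduces the discrepancy in a temporary buffer without sh-mode or
font-lock.  The function name `my-syntax-property-demo' and the choice to
mark the second `#' as punctuation are illustrative assumptions; whether
sh-mode's font-lock setup actually puts such a property on the `##' in the
test file is not established here.

```elisp
;; Hypothetical demo (not code from sh-script.el or font-lock.el): a
;; `syntax-table' text property makes the regexp search and `char-syntax'
;; disagree about the same character.
(defun my-syntax-property-demo ()
  "Return (CHAR-SYNTAX-RESULT SEARCH-RESULT) for a buffer containing \"##\"."
  (with-temp-buffer
    (insert "## comment\n")
    (set-syntax-table (make-syntax-table))
    (modify-syntax-entry ?# "<")
    (modify-syntax-entry ?\n ">")
    ;; Override the syntax of the second `#' (position 2) with punctuation.
    (put-text-property 2 3 'syntax-table (string-to-syntax "."))
    ;; Regexp syntax classes honor the property only when this is non-nil.
    (set (make-local-variable 'parse-sexp-lookup-properties) t)
    (list
     ;; `char-syntax' ignores text properties, so this is still ?<.
     (char-syntax (char-after 2))
     ;; The search sees the property, finds no two adjacent comment
     ;; starters, and returns nil instead of signalling an error.
     (progn (goto-char (point-min))
            (re-search-forward "\\s<\\s<" nil t)))))

(my-syntax-property-demo)  ; => (60 nil), i.e. (?< nil)
```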

The facility for matching chars in syntax descriptors is either not fully
documented or has some other problems.  Looking into it further would take more
time than I have at the moment.

sh-script.el says:

    (defvar sh-mode-syntax-table
      '((sh eval sh-mode-syntax-table ()
            ?\# "<"
            ?\n ">#"
            ?\" "\"\""
            ?\' "\"'"
            ?\` "\"`"
            ?! "_"
            ?% "_"
            ?: "_"
            ?. "_"
            ?^ "_"
            ?~ "_"
            ?< "."
            ?> ".")
        (csh eval identity sh)
        (rc eval identity sh))

      "Syntax-table used in Shell-Script mode.  See `sh-feature'.")

Consider the second entry in the table, which is the equivalent of

         (modify-syntax-entry ?\n ">#")

The documentation for syntax descriptors says (both in Texinfo and in
functions' docstrings) that the second character, the matching character, is
"used" only when the syntax class is "(" or ")" (open or close parentheses).

The declaration above assigns a matching character to a character in the
endcomment syntax class.  The documentation does not say that doing this is an
error.  But from here, every possibility implies at least one problem.  (It
also appears that several major modes assign matching characters to chars in
the string-delimiter (") class, usually the character itself, e.g., " with "
and ' with '; this usage is likewise problematic.)
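
To see what a descriptor such as ">#" actually installs, one can apply some of
the same CHAR DESCRIPTOR pairs to a fresh table and read the raw entries back
with `aref'.  This is a sketch under assumptions: the helper
`my-syntax-table-from-pairs' is hypothetical (sh-script.el builds its tables
via `sh-feature', which is not reproduced here), and only three pairs are
used.  Each raw entry is a cons of the class code (11 for comment starter,
12 for endcomment, 7 for string) and the matching character, so the `#' in
">#" is stored in the table, which is presumably what `describe-syntax'
reports; whether any primitive ever reads it for this class is exactly the
open question.

```elisp
;; Hypothetical helper: build a syntax table from a flat list of
;; CHAR DESCRIPTOR pairs, one `modify-syntax-entry' call per pair.
(defun my-syntax-table-from-pairs (pairs)
  "Return a new syntax table with each CHAR DESCRIPTOR pair in PAIRS applied."
  (let ((table (make-syntax-table)))
    (while pairs
      (modify-syntax-entry (car pairs) (car (cdr pairs)) table)
      (setq pairs (cdr (cdr pairs))))
    table))

(let ((table (my-syntax-table-from-pairs '(?\# "<" ?\n ">#" ?\" "\"\""))))
  ;; Raw entries are (CLASS-CODE . MATCHING-CHAR); the cdr is nil
  ;; when no matching character was given.
  (list (aref table ?#)     ; => (11): comment starter, no match
        (aref table ?\n)    ; => (12 . 35): endcomment, matching char ?#
        (aref table ?\")))  ; => (7 . 34): string, matching char ?\"
```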

If the declaration of ">#" is equivalent to ">", with respect to all Emacs
primitives and distributed Lisp code, then

   + sh-script.el should simply use ">" for clarity.

   It may be desirable to keep a facility for assigning matching chars to
   non-paren classes, so that programmers can do something with it.  If so,
   it should be mentioned briefly in the Texinfo documentation, if not in the
   docstrings.  If it is not kept, then

       + it should be documented that matching chars are ignored except
         for the "(" and ")" classes;

       + `modify-syntax-entry' should decline to install ignored matching chars
         by either signalling an error or by silently deleting the matching
         char;

       + `describe-syntax' should decline to report matching chars that do not
         have any significance, because reporting them is confusing
         (`describe-syntax' will report that ?\n matches ?#, and likewise if
         you assign matching chars to chars in other syntax classes for which
         matching seems irrelevant).
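
The error-signalling variant of the second proposal could be prototyped in
Lisp before touching the C primitive.  The wrapper below is purely
hypothetical (`my-checked-modify-syntax-entry' is not part of Emacs): it
decodes DESCRIPTOR with `string-to-syntax', masks off the flag bits (stored
from bit 16 upward in the class code), and signals an error when a matching
character accompanies any class other than open (4) or close (5)
parenthesis.  Silently dropping the character would be the other option.

```elisp
;; Hypothetical wrapper illustrating the "signal an error" option; it is
;; not an existing Emacs function.
(defun my-checked-modify-syntax-entry (char descriptor &optional table)
  "Like `modify-syntax-entry', but reject DESCRIPTOR when it supplies a
matching character for a class other than open or close parenthesis."
  (let* ((raw (string-to-syntax descriptor))
         ;; Syntax flags live from bit 16 upward; keep only the class code.
         (class (logand (car raw) 65535))
         (match (cdr raw)))
    (when (and match (not (memq class '(4 5))))
      (error "Matching character `%c' would be ignored in %S" match descriptor))
    (modify-syntax-entry char descriptor table)))
```

With a fresh table, (my-checked-modify-syntax-entry ?\( "()" (make-syntax-table))
succeeds, while the sh-script.el descriptor ">#" for ?\n, or "\"'" for ?\',
signals an error.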

If the declaration of ">#" is not equivalent to ">", then either the behavior
is undefined or it is well-defined but not documented.  If it is undefined,
then sh-script.el should not be using it.  If it is undocumented, then it
should be documented.

Recent input:
M-x r e p o r t - e m a c s - b u g <return>

Recent messages:
1 <- require: gnus-group
1 -> require: gnus-start
1 <- require: gnus-start
1 -> require: gnus-util
1 <- require: gnus-util
Loading gnus-topic...done
Loading emacsbug...
1 -> require: sendmail
1 <- require: sendmail
Loading emacsbug...done

