From: Eric Abrahamsen <eric@ericabrahamsen.net>
To: emacs-devel@gnu.org
Subject: Using syntax tables to parse buffer content
Date: Mon, 24 May 2021 13:21:00 -0700
Message-ID: <87o8czopcj.fsf@ericabrahamsen.net>

[I sent this to emacs.help a few days ago, but am hoping I'll have
better luck over here, sorry...]

Hi!

I often find myself parsing buffer or file contents using regular
expressions, and would much rather be using lower-level character syntax
to do it, both for reasons of speed and correctness. I've been looking
into using syntax tables to assign certain classes to characters, and
using either basic stuff like `skip-syntax-forward', or maybe
`parse-partial-sexp', to pull substrings out of a buffer.

My main problem now is escaping: I don't know how to treat escaped
special characters as non-special. The simplest example is in vCard
parsing. A property line might look like this:

URL;TYPE=homepage:https\://mygreatpage.com/
   ^    ^        ^

I've indicated the significant characters above: they include semicolon,
colon, equals, and comma. The colon in the URL is escaped, and
shouldn't be treated specially. These characters don't seem to fit the
existing syntax classes, so I've considered defining my own categories
for them.

The manual mentions escape syntax characters (the "\" class), but
doesn't quite make it clear *what* it escapes: I'm guessing only
open/close parentheses, and string delimiters? Then there's character
quote (the "/" class), which is documented as making the following
character "lose its normal syntactic meaning", but I can't get that to
*do* anything.

For example, in a text-mode test buffer, I add the "/" syntax class to
?*, then put that character before a space character, thinking it might
negate the space's whitespace class. That doesn't happen, though, as
(skip-syntax-forward "^ ") still stops at the space.
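
A minimal sketch of that test, for anyone who wants to reproduce it. The
function name `my-char-quote-demo' and the sample text "foo* bar" are
made up for illustration; the rest follows the description above.

```emacs-lisp
;; Hypothetical reproduction of the test described above.
(defun my-char-quote-demo ()
  "Return the text that `skip-syntax-forward' moves over."
  (with-temp-buffer
    ;; Derive a table from text-mode's so the original is left untouched.
    (let ((table (make-syntax-table text-mode-syntax-table)))
      ;; Give ?* the character-quote class ("/").
      (modify-syntax-entry ?* "/" table)
      (set-syntax-table table)
      (insert "foo* bar")
      (goto-char (point-min))
      ;; Skip forward over every character that lacks whitespace syntax.
      (skip-syntax-forward "^ ")
      ;; Point stops before the space, even though ?* precedes it.
      (buffer-substring-no-properties (point-min) (point)))))
```

Evaluating (my-char-quote-demo) gives "foo*" here, i.e. the space still
counts as whitespace.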

What am I missing, and is this kind of custom escaping possible? I can
peek back at the previous character, but at that point it's not too
different from regexp parsing.

Thanks in advance!
Eric



