unofficial mirror of help-gnu-emacs@gnu.org
From: Thorsten Jolitz <tjolitz@gmail.com>
To: help-gnu-emacs@gnu.org
Subject: Re: Low level trickery for changing character syntax?
Date: Wed, 09 Apr 2014 09:44:39 +0200
Message-ID: <87d2grhrq0.fsf@gmail.com>
In-Reply-To: 5344E1A7.50900@easy-emacs.de

Andreas Röhler <andreas.roehler@easy-emacs.de> writes:

> Am 08.04.2014 19:00, schrieb Thorsten Jolitz:
>>
>> Hi List,
>>
>> assume an imaginary elisp library gro.el I cannot (or don't want to)
>> change that is used on files of type A, with functions matching these
>> kinds of strings:
>>
>> #+begin_src emacs-lisp
>>    (defconst rgxp-1 "^[*] [*]Fat[*]$")
>>
>>    (defun foo (strg)
>>      (and (string-match "^\\*+[ \t]* \\*.+\\*" strg)
>>           (string-match rgxp-1 strg)))
>> #+end_src
>>
>> #+results:
>> : foo
>>
>> #+begin_src emacs-lisp
>> (foo "* *Fat*")
>> #+end_src
>>
>> #+results:
>> : 0
>>
>> #+begin_src emacs-lisp
>> (foo "+ *Fat*")
>> #+end_src
>>
>> #+results:
>>
>> Now assume I want to use gro.el functionality on files of type B
>> such that it matches strings like this:
>>
>> #+begin_src emacs-lisp
>> (foo "// # *Fat*//" )
>> #+end_src
>>
>> In short, when called from file.type-B, I want foo to match "// #
>> *Fat*//", while it should only match "* *Fat*" when called from
>> file.type-A (without changing foo or rgxp-1).
>>
>> Thus in rgxp-1 and in foo, "^" would need to be replaced with "^// ",
>> the first "*" would need to be replaced with "#" (but not the other
>> occurrences), and "$" would need to be replaced with "//$".
>>
>> Now I wonder what would be the best way (or at least a possible way) to
>> achieve this with Emacs low-level trickery (almost) without touching
>> gro.el. I don't know enough about low-level syntax-table stuff beyond
>> reading the manual, so these are only vague speculations:
>>
>>   1. Change the syntax-table of gro.el whenever it is applied to files of
>>   type B such that "^" is seen as "^// ", "*" as "#" etc.?
>>
>>   2. Define new categories and put "^" "*" and "$" in them, and somehow
>>   load/activate these categories conditional on the type of file gro.el
>>   functionality is called upon. These categories should then achieve that
>>   "^" is seen as "^// " etc when the categories are loaded?
>>
>>   3. Define "^" and "$", when found at beg/end of a string, as 'generic
>>   comment delimiter, and define "/" as generic comment delimiter too, such
>>   that "^//" and "//$" are matched by "^" and "$"?
>>
>> I know that these ideas do not and cannot work as described, but I'm
>> looking for a hint which idea could possibly work? What would be the way
>> to go?
>>
>> Or is this completely unrealistic and the only way to achieve it is to
>> change the hardcoded regexps in (imaginary) library gro.el?
>>
>
> You could define different syntax-tables and then call functions
>
> if type-A
> (with-syntax-table type-A ...

That looks like a promising approach, but I have never worked with
syntax-tables, so I ask myself:

Is it possible to redefine characters "^", "$" and "*" in a syntax-table
in such a way that the same hardcoded regexp, e.g.

  ,------------------
  | "^[*] [*]Fat[*]$"
  `------------------

matches "* *Fat*" when called (with-syntax-table type-A ...), but
matches e.g. "// # *Fat*//" when called (with-syntax-table type-B ...)? 

* First approach

(from the elisp manual)
,---------------------------------------------------------------------
| A syntax descriptor is a Lisp string that describes the syntax class
| and other syntactic properties of a character. When you want to
| modify the syntax of a character, that is done by calling the
| function modify-syntax-entry and passing a syntax descriptor as one
| of its arguments (see Syntax Table Functions).
| 
| The first character in a syntax descriptor must be a syntax class
| designator character. The second character, if present, specifies a
| matching character (e.g., in Lisp, the matching character for '(' is
| ')'); a space specifies that there is no matching character. Then
| come characters specifying additional syntax properties (see Syntax
| Flags).
| 
| If no matching character or flags are needed, only one character
| (specifying the syntax class) is sufficient.
| 
| For example, the syntax descriptor for the character '*' in C mode
| is ". 23" (i.e., punctuation, matching character slot unused, second
| character of a comment-starter, first character of a comment-ender),
| and the entry for '/' is '. 14' (i.e., punctuation, matching
| character slot unused, first character of a comment-starter, second
| character of a comment-ender).
`---------------------------------------------------------------------

I can see from this quote how to give e.g. "^" a different syntax class,
maybe make it a comment-starter, but I cannot see how to make it match
the combination of itself, two comment-starters and a space if and only
if it follows a \", i.e. how to make

,------
| (looking-at "^")
`------

match e.g.

,-------
| "// "
`-------

at the beginning of a line when called (with-syntax-table type-B ...)?
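
To make the question concrete, here is a minimal sketch of what a syntax
table does change when matching regexps. The names
my-type-b-syntax-table and my-match-under are hypothetical (they are not
part of gro.el or Emacs), and giving "/" and "*" word syntax is only an
assumption chosen for the demonstration. As far as I can tell, only
syntax-class constructs such as "\\w" react to the table, while literal
characters and the anchors "^" and "$" keep their usual meaning:

```emacs-lisp
;; Sketch only: `my-type-b-syntax-table' and `my-match-under' are
;; hypothetical names used for illustration.
(defvar my-type-b-syntax-table
  (let ((table (make-syntax-table)))
    ;; Assumption for the demo: give "/" and "*" word syntax.
    (modify-syntax-entry ?/ "w" table)
    (modify-syntax-entry ?* "w" table)
    table)
  "Syntax table in which \"/\" and \"*\" are word constituents.")

(defun my-match-under (table regexp text)
  "Return the part of TEXT matched by REGEXP under syntax TABLE, or nil."
  (with-syntax-table table
    (and (string-match regexp text)
         (match-string 0 text))))

;; A syntax-class construct follows the active table ...
(my-match-under (standard-syntax-table) "\\w+" "// # *Fat*//")  ; "Fat"
(my-match-under my-type-b-syntax-table  "\\w+" "// # *Fat*//")  ; "//"

;; ... but the hardcoded regexp behaves the same under both tables.
(my-match-under (standard-syntax-table) "^[*] [*]Fat[*]$" "// # *Fat*//")  ; nil
(my-match-under my-type-b-syntax-table  "^[*] [*]Fat[*]$" "// # *Fat*//")  ; nil
```

So a different syntax table alone does not seem to let "^" stand for
"^// " in a regexp that spells out its characters literally.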

* Second approach

(from the elisp manual)
,---------------------------------------------------------------------
| When the syntax table is not flexible enough to specify the syntax
| of a language, you can override the syntax table for specific
| character occurrences in the buffer, by applying a syntax-table text
| property. See Text Properties, for how to apply text properties.
`---------------------------------------------------------------------

where I find:
,-------------------------------------------------------------------
| Properties with Special Meanings
| 
| Here is a table of text property names that have special built-in
| meanings.
| 
| syntax-table
|     The syntax-table property overrides what the syntax table says
|     about this particular character. See Syntax Properties.
`-------------------------------------------------------------------

So I could assign "^" some special value for its special text property
'syntax-table, but without an example of how to achieve my goal this way
I'm a bit lost here.
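
For reference, a minimal sketch of how such a property could be applied
to one character occurrence. The helper name
my-syntax-class-of-first-char is hypothetical, and the choice of the
"<" (comment-start) descriptor is only an assumption for the example.
The sketch reads the syntax class back with syntax-after, which consults
the text property only while parse-sexp-lookup-properties is non-nil:

```emacs-lisp
;; Sketch only: `my-syntax-class-of-first-char' is a hypothetical helper.
(defun my-syntax-class-of-first-char (text &optional descriptor)
  "Return the syntax class code of the first character of TEXT.
If DESCRIPTOR is non-nil, first override that character's syntax
with a `syntax-table' text property built from DESCRIPTOR."
  (with-temp-buffer
    (insert text)
    (when descriptor
      (put-text-property (point-min) (1+ (point-min))
                         'syntax-table (string-to-syntax descriptor)))
    ;; `syntax-after' honors the property only when this is non-nil.
    (let ((parse-sexp-lookup-properties t))
      (syntax-class (syntax-after (point-min))))))

;; Class code taken from the buffer's ordinary syntax table:
(my-syntax-class-of-first-char "// # *Fat*//")

;; Class code overridden for this one occurrence of "/":
(my-syntax-class-of-first-char "// # *Fat*//" "<")  ; 11 (comment start)
```

This changes how one "/" in the buffer is classified, but it says
nothing about how "^" in a regexp is interpreted.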

* Third approach

(from the elisp manual)
,--------------------------------------------------------------------
| Categories
| 
| Categories provide an alternate way of classifying characters
| syntactically. You can define several categories as needed, then
| independently assign each character to one or more categories.
| Unlike syntax classes, categories are not mutually exclusive; it is
| normal for one character to belong to several categories.
`--------------------------------------------------------------------

category-tables are buffer-local like syntax-tables, which is useful in
my case. Say I define category-table "B" buffer-local in buffers of
type-B files. But what then? Would I have to put "^", "/" (or more
generally 'comment-start') and " " in that category, such that a single

,------
| (looking-at "^")
`------

matches

,-------
| "// "
`-------

when called from a buffer with buffer-local category "B"? I cannot see
how this should work.
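
A minimal sketch of what defining such a category could look like. All
my- names are hypothetical, and the category character is picked with
get-unused-category rather than hardcoding "B", on the assumption that
a fixed letter might already be taken in the standard category table.
According to the manual, the regexp construct "\\cC" matches one
character that belongs to category C:

```emacs-lisp
;; Sketch only: all `my-' names are hypothetical.
(defvar my-type-b-category-table (copy-category-table)
  "Category table intended for buffers visiting type-B files.")

(defvar my-type-b-category
  (let ((cat (get-unused-category my-type-b-category-table)))
    (define-category cat "Type-B line prefix characters"
      my-type-b-category-table)
    ;; Put "/" and space into the new category.
    (modify-category-entry ?/ cat my-type-b-category-table)
    (modify-category-entry ?\s cat my-type-b-category-table)
    cat)
  "Category character assigned to \"/\" and space.")

(defun my-type-b-prefix-length (text)
  "Return the length of the run of type-B category characters starting TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (with-category-table my-type-b-category-table
      ;; "\\cC+" matches characters of category C; the regexp has to
      ;; ask for the category explicitly.
      (if (looking-at (format "\\c%c+" my-type-b-category))
          (- (match-end 0) (match-beginning 0))
        0))))

(my-type-b-prefix-length "// # *Fat*//")
(my-type-b-prefix-length "* *Fat*")
```

Even so, the regexp itself has to contain "\\c...", so a hardcoded "^"
would still not pick up the "// " prefix.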

-- 
cheers,
Thorsten




