From: Theodor Thornhill via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 60623@debbugs.gnu.org, juri@linkov.net, casouri@gmail.com,
	monnier@iro.umontreal.ca, mardani29@yahoo.es
Subject: bug#60623: 30.0.50; Add forward-sentence with tree sitter support
Date: Tue, 10 Jan 2023 20:33:52 +0100
Message-ID: <87zgaqmbfj.fsf@thornhill.no>
In-Reply-To: <83358io2cd.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 8059 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Theodor Thornhill <theo@thornhill.no>
>> Cc: mardani29@yahoo.es, 60623@debbugs.gnu.org, casouri@gmail.com,
>>  monnier@iro.umontreal.ca, juri@linkov.net
>> Date: Tue, 10 Jan 2023 09:37:26 +0100
>> 
>> >> > No, because these are not really sentences in some human-readable
>> >> > language, these are program parts.  As such they should be somewhere
>> >> > under "27 Programs", possibly in "Defuns".
>> >> >
>> >> > However, "Sentences" might mention that programming modes have their
>> >> > own interpretation of "sentence" and corresponding movement commands.
>> >> 
>> >> Yeah, that makes sense.  Should I make an attempt at such formulations,
>> >> or will you do it at a later time?
>> >
>> > It is better that you try, if only to gain experience ;-)
>> 
>> How about this for starter?
>
> Very good, thank you very much.  A few comments below.
>
>> --- a/doc/emacs/programs.texi
>> +++ b/doc/emacs/programs.texi
>> @@ -163,6 +163,7 @@ Defuns
>>  * Left Margin Paren::   An open-paren or similar opening delimiter
>>                            starts a defun if it is at the left margin.
>>  * Moving by Defuns::    Commands to move over or mark a major definition.
>> +* Moving by Sentences:: Commands to move over certain definitions in code.
>                                                          ^^^^^^^^^^^
> I'd use "code units" or "units of code" here.

Done.

>
> Also, should we perhaps name the section "Moving by Statements"? or
> would it be too inaccurate?
>

I'm not sure.  Given the commands involved, and the ones that will be
implicitly affected, such as kill-sentence and friends, maybe it is
best to stay with Sentences?  But "statement" is of course the better
term wrt programming languages.  I hold no strong opinions here.
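
To illustrate the "implicitly impacted" point, here is a rough sketch
(not part of the patch).  It assumes the patch is applied, so that
forward-sentence dispatches through forward-sentence-function;
my-line-as-sentence is a made-up stand-in for whatever a mode would
install:

```elisp
;; Sketch only; assumes the patch in this message is applied, so that
;; `forward-sentence' calls `forward-sentence-function'.
;; `my-line-as-sentence' is a hypothetical stand-in, not part of the patch.
(defun my-line-as-sentence (&optional arg)
  "Treat each line as a \"sentence\"; ARG is as in `forward-sentence'."
  (forward-line (or arg 1)))

(with-temp-buffer
  (insert "int x = 5;\nint y = 6;\n")
  (goto-char (point-min))
  (setq-local forward-sentence-function #'my-line-as-sentence)
  ;; `kill-sentence' kills from point to wherever `forward-sentence'
  ;; lands, so it follows the buffer-local definition as well.
  (kill-sentence 1)
  (buffer-string))
```

Since kill-sentence is defined in terms of forward-sentence, it picks
up the replacement without being touched itself, which is one argument
for keeping the "sentence" naming.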

>> +  These commands move point or set up the region based on definitions,
>> +also called @dfn{sentences}.  Even though sentences is usually
>
> Each @dfn in a manual should have an index entry, so that readers
> could easily find it.  in this case, the index entry should qualify
> the "sentences" term by the fact that we are talking about units of
> code.  So:
>
>   @cindex sentences, in programming languages
>

Done.

>> +considered when writing human languages, Emacs can use the same
>> +commands to move over certain constructs in programming languages
>> +(@pxref{Sentences}, @pxref{Moving by Defuns}).  In a programming
>> +language a sentence is usually a complete language construct smaller
>> +than defuns, but larger than sexps (@pxref{List Motion,,, elisp, The
>> +Emacs Lisp Reference Manual}).
>
> A couple of examples from two different languages could be a great
> help here.  Otherwise this text sounds a bit too abstract.
>

Something like this?

>> +@kindex M-a
>> +@kindex M-e
>
> Since we already have M-e elsewhere in the manual, I suggest to
> qualify the key bindings here:
>
>   @kindex M-a @r{(programming modes)}
>
> and similarly for M-e.  The @r{..} thingy is necessary to reset to the
> default typeface, since key index is implicitly typeset in @code.
>
>> +@findex backward-sentence
>> +@findex forward-sentence
>
> Likewise with these two @findex entries: qualify them, since we have
> the same commands documented elsewhere under "Sentences".
>

Done.

>> --- a/doc/emacs/text.texi
>> +++ b/doc/emacs/text.texi
>> @@ -253,6 +253,14 @@ Sentences
>>  of a sentence.  Set the variable @code{sentence-end-without-period} to
>>  @code{t} in such cases.
>>  
>> +  Even though the above mentioned sentence movement commands are based
>> +on human languages, other Emacs modes can set these command to get
>> +similar functionality.  What exactly a sentence is in a non-human
>> +language is dependent on the target language, but usually it is
>> +complete statements, such as a variable definition and initialization,
>> +or a conditional statement (@pxref{Moving by Sentences,,, emacs, The
>> +extensible self-documenting text editor}).
>
> The last sentence should be in "Moving by Sentences", since it
> describes the commands documented there.  Also, please add a
> cross-reference here to "Moving by Sentences", since you mention that
> in the text (and rightfully so).
>

Is something like this what you meant?

>> +@defvar treesit-sentence-type-regexp
>> +The value of this variable is a regexp matching the node type of sentence
>> +nodes.  (For ``node'' and ``node type'', @pxref{Parsing Program Source}.)
>> +
>> +@findex treesit-forward-sentence
>> +@findex forward-sentence
>> +@findex backward-sentence
>> +If Emacs is compiled with tree-sitter, it can use the tree-sitter
>> +parser information to move across syntax constructs.  Since what
>> +exactly is considered a sentence varies between languages, a major mode
>> +should set @code{treesit-sentence-type-regexp} to determine that.  Then
>> +the mode can get navigation-by-sentence functionality for free, by using
>> +@code{forward-sentence} and @code{backward-sentence}.
>
> Here please also add a cross-reference to the "Moving by Sentences"
> node in the Emacs manual, so that people could understand what kind of
> "sentences" this is talking about.
>
>> +** New defvar forward-sentence-function.
>       ^^^^^^^^^^
> "New variable"
>
>> +Emacs now can set this variable to customize the behavior of the
>> +'forward-sentence' function.
>
> Not "Emacs", but "major modes".
>
>> +** New defun forward-sentence-default-function.
>       ^^^^^^^^^
> "New function"
>
>> +The previous implementation of 'forward-sentence' is moved into its
>> +own function, to be bound by 'forward-sentence-function'.
>> +
>> +** New defvar-local 'treesit-sentence-type-regexp.
>> +Similarly to 'treesit-defun-type-regexp', this variable is used to
>> +navigate sentences in Tree-sitter enabled modes.
>> +
>> +** New function 'treesit-forward-sentence'.
>> +treesit.el now conditionally sets 'forward-sentence-function' for all
>> +Tree-sitter modes that sets 'treesit-sentence-type-regexp'.
>
> Please make these related items sub-headings of a common heading,
> something like "Commands and variables to move by program statements".
>

Done.

>> +
>>  \f
>>  * Changes in Specialized Modes and Packages in Emacs 30.1
>>  ---
>> diff --git a/lisp/textmodes/paragraphs.el b/lisp/textmodes/paragraphs.el
>> index 73abb155aaa..fd2d83eeebf 100644
>> --- a/lisp/textmodes/paragraphs.el
>> +++ b/lisp/textmodes/paragraphs.el
>> @@ -441,13 +441,12 @@ end-of-paragraph-text
>>  	  (if (< (point) (point-max))
>>  	      (end-of-paragraph-text))))))
>>  
>> -(defun forward-sentence (&optional arg)
>> +(defun forward-sentence-default-function (&optional arg)
>>    "Move forward to next end of sentence.  With argument, repeat.
>>  When ARG is negative, move backward repeatedly to start of sentence.
>>  
>>  The variable `sentence-end' is a regular expression that matches ends of
>>  sentences.  Also, every paragraph boundary terminates sentences as well."
>> -  (interactive "^p")
>>    (or arg (setq arg 1))
>>    (let ((opoint (point))
>>          (sentence-end (sentence-end)))
>> @@ -480,6 +479,18 @@ forward-sentence
>>      (let ((npoint (constrain-to-field nil opoint t)))
>>        (not (= npoint opoint)))))
>>  
>> +(defvar forward-sentence-function #'forward-sentence-default-function
>> +  "Function to be used to calculate sentence movements.
>> +See `forward-sentence' for a description of its behavior.")
>> +
>> +(defun forward-sentence (&optional arg)
>> +  "Move forward to next end of sentence.  With argument, repeat.
>                                              ^^^^^^^^^^^^^^^^^^^^^
> "With argument ARG, repeat."  The doc string should reference the
> arguments where possible.
>

Thanks, done.

>> +When ARG is negative, move backward repeatedly to start of sentence.
>    ^^^^
> "If", not "When".
>

Done

>> +(defvar-local treesit-sentence-type-regexp ""
>> +  "A regexp that matches the node type of sentence nodes.
>
> Why is the default an empty regexp? wouldn't nil be better?

Indeed it would; done.
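
With nil as the default, a mode opts in simply by setting the variable
before calling treesit-major-mode-setup.  A minimal sketch of what I
have in mind; the mode name, the grammar symbol and the node type names
are all hypothetical, since real node types depend on the grammar:

```elisp
;; Sketch only: `my-ts-mode', the `my-lang' grammar and the node type
;; names below are hypothetical and not part of the patch.
(define-derived-mode my-ts-mode prog-mode "My"
  "Hypothetical tree-sitter major mode."
  (when (treesit-ready-p 'my-lang)
    (treesit-parser-create 'my-lang)
    ;; Leaving this nil (the new default) keeps the regular
    ;; text-based `forward-sentence' behavior.
    (setq-local treesit-sentence-type-regexp
                (regexp-opt '("expression_statement"
                              "if_statement"
                              "return_statement")))
    ;; With the patch, this installs `treesit-forward-sentence' as
    ;; `forward-sentence-function' because the regexp is non-nil.
    (treesit-major-mode-setup)))
```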

How about this?

Theo


[-- Attachment #2: 0001-Add-forward-sentence-with-tree-sitter-support-bug-60.patch --]
[-- Type: text/x-patch, Size: 10322 bytes --]

From 05b880680f70825d56b624b55f39b9114ba22c8d Mon Sep 17 00:00:00 2001
From: Theodor Thornhill <theo@thornhill.no>
Date: Sun, 8 Jan 2023 20:28:02 +0100
Subject: [PATCH] Add forward-sentence with tree sitter support (bug#60623)

* etc/NEWS: Mention the new changes.
* lisp/textmodes/paragraphs.el (forward-sentence-default-function):
Move old implementation to its own function.
(forward-sentence-function): New defvar defaulting to old behavior.
(forward-sentence): Use the variable in this function unconditionally.
* lisp/treesit.el (treesit-sentence-type-regexp): New defvar.
(treesit-forward-sentence): New defun.
(treesit-major-mode-setup): Conditionally set
forward-sentence-function.
* doc/emacs/programs.texi (Defuns): Add new subsection.
(Moving by Sentences): Add some documentation with xrefs to the elisp
manual and related nodes.
* doc/lispref/positions.texi (List Motion): Mention
treesit-sentence-type-regexp and describe how to enable this
functionality.
---
 doc/emacs/programs.texi      | 56 ++++++++++++++++++++++++++++++++++++
 doc/emacs/text.texi          |  5 ++++
 doc/lispref/positions.texi   | 17 +++++++++++
 etc/NEWS                     | 18 ++++++++++++
 lisp/textmodes/paragraphs.el | 15 ++++++++--
 lisp/treesit.el              | 27 +++++++++++++++++
 6 files changed, 136 insertions(+), 2 deletions(-)

diff --git a/doc/emacs/programs.texi b/doc/emacs/programs.texi
index 44cad5a148e..a2cdf6c6eb9 100644
--- a/doc/emacs/programs.texi
+++ b/doc/emacs/programs.texi
@@ -163,6 +163,7 @@ Defuns
 * Left Margin Paren::   An open-paren or similar opening delimiter
                           starts a defun if it is at the left margin.
 * Moving by Defuns::    Commands to move over or mark a major definition.
+* Moving by Sentences:: Commands to move over certain code units.
 * Imenu::               Making buffer indexes as menus.
 * Which Function::      Which Function mode shows which function you are in.
 @end menu
@@ -254,6 +255,61 @@ Moving by Defuns
 language.  Other major modes may replace any or all of these key
 bindings for that purpose.
 
+@node Moving by Sentences
+@subsection Moving by Sentences
+@cindex sentences, in programming languages
+
+  These commands move point or set up the region based on units of
+code, also called @dfn{sentences}.  Even though sentences are usually
+considered when writing human languages, Emacs can use the same
+commands to move over certain constructs in programming languages
+(@pxref{Sentences}, @pxref{Moving by Defuns}).  In a programming
+language a sentence is usually a complete language construct smaller
+than defuns, but larger than sexps (@pxref{List Motion,,, elisp, The
+Emacs Lisp Reference Manual}).  What exactly a sentence is in a
+non-human language depends on the target language, but usually it
+is a complete statement, such as a variable definition and
+initialization, or a conditional statement.  An example of a sentence
+in the C language could be
+
+@example
+int x = 5;
+@end example
+
+or in the JavaScript language it could look like
+
+@example
+const thing = () => console.log("Hi");
+
+const foo = [1] == '1'
+  ? "No way"
+  : "...";
+@end example
+
+@table @kbd
+@item M-a
+Move to beginning of current or preceding sentence
+(@code{backward-sentence}).
+@item M-e
+Move to end of current or following sentence (@code{forward-sentence}).
+@end table
+
+@cindex move to beginning or end of sentence
+@cindex sentence, move to beginning or end
+@kindex M-a @r{(programming modes)}
+@kindex M-e @r{(programming modes)}
+@findex backward-sentence @r{(programming modes)}
+@findex forward-sentence @r{(programming modes)}
+  The commands to move to the beginning and end of the current
+sentence are @kbd{M-a} (@code{backward-sentence}) and @kbd{M-e}
+(@code{forward-sentence}).  If you repeat one of these commands, or
+use a positive numeric argument, each repetition moves to the next
+sentence in the direction of motion.
+
+  @kbd{M-a} with a negative argument @minus{}@var{n} moves forward
+@var{n} times to the next end of a sentence.  Likewise, @kbd{M-e} with
+a negative argument moves back to a start of a sentence.
+
 @node Imenu
 @subsection Imenu
 @cindex index of buffer definitions
diff --git a/doc/emacs/text.texi b/doc/emacs/text.texi
index 8fbf731a4f7..acd3bb21c29 100644
--- a/doc/emacs/text.texi
+++ b/doc/emacs/text.texi
@@ -253,6 +253,11 @@ Sentences
 of a sentence.  Set the variable @code{sentence-end-without-period} to
 @code{t} in such cases.
 
+  Even though the above-mentioned sentence movement commands are based
+on human languages, other Emacs modes can set these commands to get
+similar functionality (@pxref{Moving by Sentences,,, emacs, The
+extensible self-documenting text editor}).
+
 @node Paragraphs
 @section Paragraphs
 @cindex paragraphs
diff --git a/doc/lispref/positions.texi b/doc/lispref/positions.texi
index f3824436246..8d95ecee7ab 100644
--- a/doc/lispref/positions.texi
+++ b/doc/lispref/positions.texi
@@ -858,6 +858,23 @@ List Motion
 recognize nested defuns.
 @end defvar
 
+@defvar treesit-sentence-type-regexp
+The value of this variable is a regexp matching the node type of sentence
+nodes.  (For ``node'' and ``node type'', @pxref{Parsing Program Source}.)
+@end defvar
+
+@findex treesit-forward-sentence
+@findex forward-sentence
+@findex backward-sentence
+If Emacs is compiled with tree-sitter, it can use the tree-sitter
+parser information to move across syntax constructs.  Since what
+exactly is considered a sentence varies between languages, a major
+mode should set @code{treesit-sentence-type-regexp} to determine that.
+Then the mode can get navigation-by-sentence functionality for free,
+by using @code{forward-sentence} and
+@code{backward-sentence} (@pxref{Moving by Sentences,,, emacs, The
+extensible self-documenting text editor}).
+
 @node Skipping Characters
 @subsection Skipping Characters
 @cindex skipping characters
diff --git a/etc/NEWS b/etc/NEWS
index 3aa8f2abb77..0c782eeaee8 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -66,6 +66,24 @@ treesit.el now unconditionally sets 'transpose-sexps-function' for all
 Tree-sitter modes.  This functionality utilizes the new
 'transpose-sexps-function'.
 
+** Commands and variables to move by program statements
+
+*** New variable 'forward-sentence-function'.
+Major modes can now set this variable to customize the behavior of the
+'forward-sentence' function.
+
+*** New function 'forward-sentence-default-function'.
+The previous implementation of 'forward-sentence' is moved into its
+own function, to be bound by 'forward-sentence-function'.
+
+*** New variable 'treesit-sentence-type-regexp'.
+Similarly to 'treesit-defun-type-regexp', this variable is used to
+navigate sentences in Tree-sitter enabled modes.
+
+*** New function 'treesit-forward-sentence'.
+treesit.el now conditionally sets 'forward-sentence-function' for all
+Tree-sitter modes that set 'treesit-sentence-type-regexp'.
+
 \f
 * Changes in Specialized Modes and Packages in Emacs 30.1
 ---
diff --git a/lisp/textmodes/paragraphs.el b/lisp/textmodes/paragraphs.el
index 73abb155aaa..bf249fdcdfb 100644
--- a/lisp/textmodes/paragraphs.el
+++ b/lisp/textmodes/paragraphs.el
@@ -441,13 +441,12 @@ end-of-paragraph-text
 	  (if (< (point) (point-max))
 	      (end-of-paragraph-text))))))
 
-(defun forward-sentence (&optional arg)
+(defun forward-sentence-default-function (&optional arg)
   "Move forward to next end of sentence.  With argument, repeat.
 When ARG is negative, move backward repeatedly to start of sentence.
 
 The variable `sentence-end' is a regular expression that matches ends of
 sentences.  Also, every paragraph boundary terminates sentences as well."
-  (interactive "^p")
   (or arg (setq arg 1))
   (let ((opoint (point))
         (sentence-end (sentence-end)))
@@ -480,6 +479,18 @@ forward-sentence
     (let ((npoint (constrain-to-field nil opoint t)))
       (not (= npoint opoint)))))
 
+(defvar forward-sentence-function #'forward-sentence-default-function
+  "Function to be used to calculate sentence movements.
+See `forward-sentence' for a description of its behavior.")
+
+(defun forward-sentence (&optional arg)
+  "Move forward to next end of sentence.  With argument ARG, repeat.
+If ARG is negative, move backward repeatedly to start of
+sentence.  Delegates its work to `forward-sentence-function'."
+  (interactive "^p")
+  (or arg (setq arg 1))
+  (funcall forward-sentence-function arg))
+
 (defun count-sentences (start end)
   "Count sentences in current buffer from START to END."
   (let ((sentences 0)
diff --git a/lisp/treesit.el b/lisp/treesit.el
index a7f453a8899..4c01a8db281 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -1792,6 +1792,31 @@ treesit-text-type-regexp
 \"text_block\" in the case of a string.  This is used by
 `prog-fill-reindent-defun' and friends.")
 
+(defvar-local treesit-sentence-type-regexp nil
+  "A regexp that matches the node type of sentence nodes.
+
+A sentence node is a node that is bigger than a sexp, and
+delimits larger statements in the source code.  It is, however,
+smaller in scope than defuns.  This is used by
+`treesit-forward-sentence' and friends.")
+
+(defun treesit-forward-sentence (&optional arg)
+  "Tree-sitter `forward-sentence-function' function.
+
+ARG is the same as in `forward-sentence'.
+
+If inside comment or other nodes described in
+`treesit-text-type-regexp', use
+`forward-sentence-default-function', else move across nodes as
+described by `treesit-sentence-type-regexp'."
+  (if (string-match-p
+       treesit-text-type-regexp
+       (treesit-node-type (treesit-node-at (point))))
+      (funcall #'forward-sentence-default-function arg)
+    (funcall
+     (if (> arg 0) #'treesit-end-of-thing #'treesit-beginning-of-thing)
+     treesit-sentence-type-regexp (abs arg))))
+
 (defun treesit-default-defun-skipper ()
   "Skips spaces after navigating a defun.
 This function tries to move to the beginning of a line, either by
@@ -2256,6 +2281,8 @@ treesit-major-mode-setup
                 #'treesit-add-log-current-defun))
 
   (setq-local transpose-sexps-function #'treesit-transpose-sexps)
+  (when treesit-sentence-type-regexp
+    (setq-local forward-sentence-function #'treesit-forward-sentence))
 
   ;; Imenu.
   (when treesit-simple-imenu-settings
-- 
2.34.1

