unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#31204: 25.3; Make word motion more customizable
@ 2018-04-18  8:55 Yuri Khan
  2022-05-18 12:03 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 3+ messages in thread
From: Yuri Khan @ 2018-04-18  8:55 UTC
  To: 31204

While trying to make the word motion commands (Ctrl+left/right, M-f/b)
behave more like their counterparts in other editors:

http://lists.gnu.org/archive/html/help-gnu-emacs/2018-04/msg00230.html

I encountered a difficulty.

The function ‘forward-word’ behaves as follows:

1. Skip forward to the next character with word-constituent syntax.
2. Skip forward to the next word boundary.

Step 2, by default, finds:

* a non-word-constituent character, OR
* a transition between two adjacent characters of different scripts
  (subject to exceptions controlled by ‘word-combining-categories’ and
  ‘word-separating-categories’),

whichever comes first.
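For illustration, the two steps can be modeled in C over a plain ASCII string. This is a simplified sketch, not Emacs's actual ‘scan_words’: the name ‘model_forward_word’ is hypothetical, "word constituent" is approximated with ‘isalnum’ instead of a syntax table, and the script-transition rule is omitted because only ASCII text is considered.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* ASCII-only stand-in for "word-constituent syntax"; real Emacs
   consults the buffer's syntax table instead. */
static int is_word_char(char c)
{
    return isalnum((unsigned char) c);
}

/* One forward-word step over BUF[FROM, LIMIT).  Returns the new
   position, or -1 if no word starts before LIMIT.  Boundaries between
   scripts are not modeled (ASCII only). */
static ptrdiff_t model_forward_word(const char *buf, ptrdiff_t from,
                                    ptrdiff_t limit)
{
    assert(0 <= from && from <= limit);

    /* Step 1: skip to the next word-constituent character. */
    while (from < limit && !is_word_char(buf[from]))
        from++;
    if (from == limit)
        return -1;

    /* Step 2: skip to the next word boundary, i.e. the first
       non-word-constituent character. */
    while (from < limit && is_word_char(buf[from]))
        from++;
    return from;
}
```

On the example line ‘foo *** +++ (bar)’ used later in this report, calling it at position 3 (just after ‘foo’) returns 16 (just after ‘bar’).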

Step 2 can also be customized by modifying
‘find-word-boundary-function-table’. This enables various useful
behaviors such as ‘subword-mode’, ‘superword-mode’, and possibly CJK
word breaking rules.
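The step 2 customization point can be sketched as a per-character dispatch table. In this C model, which is an assumption and not Emacs's actual implementation, the table is indexed by the word's first character, and each entry receives POS (at that character) and LIMIT and returns the position after the word. ‘camel_boundary’ is a hypothetical, much-simplified rule in the spirit of ‘subword-mode’, not the rule subword.el actually uses.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry of find-word-boundary-function-table:
   POS is at the first character of a word; return the position after it. */
typedef ptrdiff_t (*boundary_fn)(const char *buf, ptrdiff_t pos,
                                 ptrdiff_t limit);

static int is_word_char(char c)
{
    return isalnum((unsigned char) c);
}

/* Default step 2: stop at the first non-word character. */
static ptrdiff_t default_boundary(const char *buf, ptrdiff_t pos,
                                  ptrdiff_t limit)
{
    for (pos++; pos < limit && is_word_char(buf[pos]); pos++)
        ;
    return pos;
}

/* Simplified subword-like rule: also stop before an uppercase letter
   that directly follows a lowercase one. */
static ptrdiff_t camel_boundary(const char *buf, ptrdiff_t pos,
                                ptrdiff_t limit)
{
    for (pos++; pos < limit && is_word_char(buf[pos]); pos++)
        if (islower((unsigned char) buf[pos - 1])
            && isupper((unsigned char) buf[pos]))
            break;
    return pos;
}

/* Table routing every ASCII letter to camel_boundary; NULL entries
   (digits, everything else) fall back to the default. */
static const boundary_fn *camel_table(void)
{
    static boundary_fn table[256];
    for (int c = 'a'; c <= 'z'; c++)
        table[c] = camel_boundary;
    for (int c = 'A'; c <= 'Z'; c++)
        table[c] = camel_boundary;
    return table;
}

/* Step 2 with dispatch on the word's first character (TABLE may be NULL). */
static ptrdiff_t find_word_end(const char *buf, ptrdiff_t pos,
                               ptrdiff_t limit, const boundary_fn *table)
{
    assert(0 <= pos && pos < limit && is_word_char(buf[pos]));
    boundary_fn f = table ? table[(unsigned char) buf[pos]] : NULL;
    return (f ? f : default_boundary)(buf, pos, limit);
}
```

With no table, ‘fooBar baz’ scanned from position 0 ends at 6; with ‘camel_table()’ it ends at 3, just before ‘B’. Note that nothing in this table can influence step 1.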

Step 1, on the other hand, is not customizable at all.


The specific behavior that I was trying to implement was to find the
nearest transition:

* from a word character to a non-word character, OR
* from a non-word non-whitespace character to a word character, OR
* from a non-word non-whitespace character to a whitespace character.

As an illustration (‘|’ marks the word motion stops when moving left to
right, and ‘^’ marks the cursor position):

    foo| ***| +++| (|bar|)|
       ^

When the cursor is after ‘foo’, step 1 of ‘forward-word’ skips all the
way to just before ‘bar’, missing the two stops after ‘***’ and ‘+++’.
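The three transition rules above can be sketched in C and checked against the illustration. This is a hypothetical model, not existing Emacs code: it classifies ASCII characters as word (‘isalnum’), whitespace (‘isspace’) or other, and treats LIMIT itself as a final stop, which matches the ‘|’ after ‘)’ at the end of the line.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* ASCII-only character classes; a real implementation would use
   syntax classes instead. */
enum char_class { CLASS_WORD, CLASS_SPACE, CLASS_PUNCT };

static enum char_class classify(char c)
{
    if (isalnum((unsigned char) c))
        return CLASS_WORD;
    if (isspace((unsigned char) c))
        return CLASS_SPACE;
    return CLASS_PUNCT;
}

/* Is there a stop between A and B, where A immediately precedes B? */
static int is_stop(char a, char b)
{
    enum char_class ca = classify(a), cb = classify(b);
    return (ca == CLASS_WORD && cb != CLASS_WORD)      /* word -> non-word */
        || (ca == CLASS_PUNCT && cb == CLASS_WORD)     /* punct -> word */
        || (ca == CLASS_PUNCT && cb == CLASS_SPACE);   /* punct -> space */
}

/* Return the first stop strictly after POS, or LIMIT if there is none. */
static ptrdiff_t next_stop(const char *buf, ptrdiff_t pos, ptrdiff_t limit)
{
    assert(0 <= pos && pos < limit);
    for (ptrdiff_t p = pos + 1; p < limit; p++)
        if (is_stop(buf[p - 1], buf[p]))
            return p;
    return limit;
}
```

Starting at 3 (after ‘foo’) in ‘foo *** +++ (bar)’, repeated calls return 7, 11, 13, 16 and 17, the five stops marked with ‘|’ after the cursor.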

As a result, implementing the desired behavior requires either:

* defining separate functions ‘my-forward-word’, ‘my-backward-word’,
  ‘my-left-word’, ‘my-right-word’, ‘my-kill-word’,
  ‘my-backward-kill-word’, and possibly more, and remapping their key
  bindings; OR

* advising ‘forward-word’ with an :override.


Perhaps it would be nice to have an optional hook for step 1 of
‘forward-word’: a function that would take two arguments, POS and LIMIT,
and return the word start position from which step 2 would then
proceed.
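A hypothetical C sketch of such a hook, in the same ASCII-only style, follows. The names ‘word_start_fn’, ‘forward_word_with_hook’ and ‘skip_spaces_hook’ are illustrative only. The hook replaces step 1 and returns the position of a word start (or -1 for none); step 2 is left unchanged and, as in the default scan, first moves past the start character and then over word constituents.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical step-1 hook: given POS and LIMIT, return where a word
   starts, or -1 if there is none before LIMIT. */
typedef ptrdiff_t (*word_start_fn)(const char *buf, ptrdiff_t pos,
                                   ptrdiff_t limit);

static int is_word_char(char c)
{
    return isalnum((unsigned char) c);
}

/* Default step 1: skip to the next word-constituent character. */
static ptrdiff_t default_word_start(const char *buf, ptrdiff_t pos,
                                    ptrdiff_t limit)
{
    while (pos < limit && !is_word_char(buf[pos]))
        pos++;
    return pos < limit ? pos : -1;
}

/* Example hook: treat any non-whitespace character as a word start. */
static ptrdiff_t skip_spaces_hook(const char *buf, ptrdiff_t pos,
                                  ptrdiff_t limit)
{
    while (pos < limit && isspace((unsigned char) buf[pos]))
        pos++;
    return pos < limit ? pos : -1;
}

/* One forward-word step; HOOK == NULL selects the default step 1. */
static ptrdiff_t forward_word_with_hook(const char *buf, ptrdiff_t from,
                                        ptrdiff_t limit, word_start_fn hook)
{
    assert(0 <= from && from <= limit);
    ptrdiff_t start = (hook ? hook : default_word_start)(buf, from, limit);
    if (start < 0)
        return -1;
    /* Unchanged step 2: move past the start character, then over
       word constituents. */
    ptrdiff_t p = start + 1;
    while (p < limit && is_word_char(buf[p]))
        p++;
    return p;
}
```

In this simplified model the hook alone is not enough: with ‘skip_spaces_hook’, motion from position 3 in ‘foo *** +++ (bar)’ stops at 5, after the first ‘*’, because the unchanged step 2 only continues over word constituents. Reaching the intended stop at 7 would also require changing step 2.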


In GNU Emacs 25.3.2 (x86_64-pc-linux-gnu, GTK+ Version 3.18.9)
 of 2017-09-13 built on lcy01-32
Windowing system distributor 'The X.Org Foundation', version 11.0.11905000
System Description:    Ubuntu 16.04.4 LTS

Configured using:
 'configure --build=x86_64-linux-gnu --prefix=/usr
 '--includedir=${prefix}/include' '--mandir=${prefix}/share/man'
 '--infodir=${prefix}/share/info' --sysconfdir=/etc --localstatedir=/var
 --disable-silent-rules '--libdir=${prefix}/lib/x86_64-linux-gnu'
 '--libexecdir=${prefix}/lib/x86_64-linux-gnu' --disable-maintainer-mode
 --disable-dependency-tracking --prefix=/usr --sharedstatedir=/var/lib
 --program-suffix=25 --with-modules --with-x=yes --with-x-toolkit=gtk3
 'CFLAGS=-g -O2 -fstack-protector-strong -Wformat
 -Werror=format-security' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro''

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GCONF GSETTINGS
NOTIFY LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 MODULES

Important settings:
  value of $LC_MONETARY: en_US.UTF-8
  value of $LC_NUMERIC: en_US.UTF-8
  value of $LC_TIME: en_DK.utf8
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message dired format-spec rfc822 mml
mml-sec password-cache epg epg-config gnus-util mm-decode mm-bodies
mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail
rfc2047 rfc2045 ietf-drums mm-util help-fns help-mode easymenu
cl-loaddefs pcase cl-lib mail-prsvr mail-utils time-date mule-util
tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type
mwheel x-win term/common-win x-dnd tool-bar dnd fontset image regexp-opt
fringe tabulated-list newcomment elisp-mode lisp-mode prog-mode register
page menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock
font-lock syntax facemenu font-core frame cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese charscript case-table epa-hook
jka-cmpr-hook help simple abbrev minibuffer cl-preloaded nadvice
loaddefs button faces cus-face macroexp files text-properties overlay
sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote dbusbind inotify dynamic-setting
system-font-setting font-render-setting move-toolbar gtk x-toolkit x
multi-tty make-network-process emacs)

Memory information:
((conses 16 86338 5928)
 (symbols 48 19769 0)
 (miscs 40 49 121)
 (strings 32 14363 4733)
 (string-bytes 1 409522)
 (vectors 16 11755)
 (vector-slots 8 430899 3852)
 (floats 8 166 64)
 (intervals 56 231 0)
 (buffers 976 18)
 (heap 1024 33279 1050))


From: Lars Ingebrigtsen @ 2022-05-18 12:03 UTC
  To: Yuri Khan; +Cc: 31204

Yuri Khan <yuri.v.khan@gmail.com> writes:

> While trying to make word motion commands (Ctrl+left/right, M-f/b) more
> similar to that implemented in other editors:
>
> http://lists.gnu.org/archive/html/help-gnu-emacs/2018-04/msg00230.html
>
> I encountered a difficulty.

[...]

> Step 1, on the other hand, is not customizable at all.
>
> The specific behavior that I was trying to implement was to find the
> nearest transition:
>
> * from a word character to a non-word character, OR
> * from a non-word non-whitespace character to a word character, OR
> * from a non-word non-whitespace character to a whitespace character.
>
> As an illustration (where ‘|’ specifies word motion stops when going
> left to right):
>
>     foo| ***| +++| (|bar|)|
>        ^

[...]

> Perhaps it would be nice to have an optional hook for step 1 of
> ‘forward-word’, a function that would take two arguments POS and LIMIT,
> and returning the starting word boundary position from which step 2 would
> then work.

(I'm going through old bug reports that unfortunately weren't resolved
at the time.)

I think this sounds like it could be useful.  If we added such a hook to
`forward-word', what would the rest of the code look like to make
`C-<right>' work this way?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no


From: Lars Ingebrigtsen @ 2022-06-15 15:03 UTC
  To: Yuri Khan; +Cc: 31204

Lars Ingebrigtsen <larsi@gnus.org> writes:

>> As an illustration (where ‘|’ specifies word motion stops when going
>> left to right):
>>
>>     foo| ***| +++| (|bar|)|

[...]

> I think this sounds like it could be useful.  If we added such a hook to
> `forward-word', what would the rest of the code look like to make
> `C-<right>' work this way?

I've played a bit with the patch below, but I tend to think that this is
going about things the wrong way.  That is, for something like this to
work meaningfully, it would require a lot of setup, because
Vfind_word_boundary_function_table would also have to be altered in
conjunction with this.

I.e., it's really about changing the definition of what a "word" is, and
in that case, I think it would be easier to just do that in a syntax
table, and then everything would work automatically.

(Or by advising the functions here.)
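The syntax-table alternative, redefining what counts as a word and leaving the scan itself alone, can be modeled with a toy per-character class table that drives an unmodified two-step scan. This is a C sketch under simplifying assumptions (ASCII only, three classes); ‘struct syntax_table’, ‘table_with_word_chars’ and ‘table_forward_word’ are hypothetical and are not Emacs's internal API. In Emacs Lisp the corresponding change would be made with ‘modify-syntax-entry’.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Toy syntax table: one class per byte (ASCII-oriented model). */
enum syntax_class { SYN_WHITESPACE, SYN_PUNCT, SYN_WORD };

struct syntax_table {
    enum syntax_class cls[256];
};

/* Build a standard-ish table, then give word syntax to every character
   in EXTRA_WORD_CHARS.  Returns a pointer to static storage that the
   next call overwrites (kept simple for the example). */
static const struct syntax_table *table_with_word_chars(const char *extra_word_chars)
{
    static struct syntax_table t;
    for (int c = 0; c < 256; c++)
        t.cls[c] = isalnum(c) ? SYN_WORD
                 : isspace(c) ? SYN_WHITESPACE
                 : SYN_PUNCT;
    for (const char *p = extra_word_chars; *p; p++)
        t.cls[(unsigned char) *p] = SYN_WORD;
    return &t;
}

static int is_word(const struct syntax_table *t, char c)
{
    return t->cls[(unsigned char) c] == SYN_WORD;
}

/* Unchanged two-step forward-word; only the table decides what a word
   is.  Returns -1 if no word starts before LIMIT. */
static ptrdiff_t table_forward_word(const struct syntax_table *t,
                                    const char *buf, ptrdiff_t from,
                                    ptrdiff_t limit)
{
    assert(0 <= from && from <= limit);
    while (from < limit && !is_word(t, buf[from]))   /* step 1 */
        from++;
    if (from == limit)
        return -1;
    while (from < limit && is_word(t, buf[from]))    /* step 2 */
        from++;
    return from;
}
```

In this model, giving ‘*’ and ‘+’ word syntax produces the stops after ‘***’ and ‘+++’ from the original report, while the stop after ‘(’ is still skipped (from 11 the scan goes straight to 16).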

So I don't think it'd be worth it to proceed with something like the
below, and I'm therefore closing this bug report.

diff --git a/src/syntax.c b/src/syntax.c
index f9022d18d2..02d4dd4b9a 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -1462,20 +1462,33 @@ scan_words (ptrdiff_t from, EMACS_INT count)
 
   while (count > 0)
     {
-      while (true)
+      if (!NILP (Vfind_word_start_function))
 	{
-	  if (from == end)
+	  Lisp_Object np = call2 (Vfind_word_start_function,
+				  make_fixnum (from), make_fixnum (end));
+	  if (!FIXNUMP (np))
 	    return 0;
-	  UPDATE_SYNTAX_TABLE_FORWARD (from);
+	  from = XFIXNUM (np);
+	  from_byte = CHAR_TO_BYTE (from);
 	  ch0 = FETCH_CHAR_AS_MULTIBYTE (from_byte);
-	  code = SYNTAX (ch0);
-	  inc_both (&from, &from_byte);
-	  if (words_include_escapes
-	      && (code == Sescape || code == Scharquote))
-	    break;
-	  if (code == Sword)
-	    break;
-	  rarely_quit (from);
+	}
+      else
+	{
+	  while (true)
+	    {
+	      if (from == end)
+		return 0;
+	      UPDATE_SYNTAX_TABLE_FORWARD (from);
+	      ch0 = FETCH_CHAR_AS_MULTIBYTE (from_byte);
+	      code = SYNTAX (ch0);
+	      inc_both (&from, &from_byte);
+	      if (words_include_escapes
+		  && (code == Sescape || code == Scharquote))
+		break;
+	      if (code == Sword)
+		break;
+	      rarely_quit (from);
+	    }
 	}
       /* Now CH0 is a character which begins a word and FROM is the
          position of the next character.  */
@@ -3792,6 +3805,12 @@ syms_of_syntax (void)
 In both cases, LIMIT bounds the search. */);
   Vfind_word_boundary_function_table = Fmake_char_table (Qnil, Qnil);
 
+  DEFVAR_LISP ("find-word-start-function",
+	       Vfind_word_start_function,
+	       doc: /* Function called to find the start of a word.
+It's called with two parameters, POS and LIMIT.  */);
+  Vfind_word_start_function = Qnil;
+
   DEFVAR_BOOL ("comment-end-can-be-escaped", comment_end_can_be_escaped,
                doc: /* Non-nil means an escaped ender inside a comment doesn't end the comment.  */);
   comment_end_can_be_escaped = false;


Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git