Re: Understanding Word Boundaries

unofficial mirror of help-gnu-emacs@gnu.org

From: Xah Lee <xahlee@gmail.com>
To: help-gnu-emacs@gnu.org
Subject: Re: Understanding Word Boundaries
Date: Sun, 27 Jun 2010 07:58:10 -0700 (PDT)
Message-ID: <775f8ff3-370c-43a6-a7fb-59000826e0c6@k1g2000prl.googlegroups.com>
In-Reply-To: mailman.2.1277549613.3306.help-gnu-emacs@gnu.org

On Jun 26, 3:53 am, Paul Drummond <paul.drumm...@iode.co.uk> wrote:
> Thanks for the responses guys.
>
> I think the point I am trying to make here is that it's a *big* task to fix
> word boundaries for every case (every word-related key binding multiplied by
> each language/major mode I use!).
>
> I presume that Emacs hackers either a) put up with it or b) spend a lot of
> time fixing each case until they are happy.
>
> I suspect the answer is b. ;-)
>
> I wish there was a single minor-mode that fixes all the word boundary issues
> for every major-mode I use!  I can but dream.   Or maybe I will get round to
> doing it myself one day!  ;)

Here's the answer again, in case you missed it.

• Text Editor's Cursor Movement Behavior (emacs, vi, Notepad++)
  http://xahlee.org/emacs/text_editor_cursor_behavior.html

plain text version follows.
-------------------------------------
Text Editor's Cursor Movement Behavior (emacs, vi, Notepad++)

Xah Lee, 2010-06-17

This article discusses some differences in cursor movement behavior
among editors. That is, when you press “Ctrl+→” on a line of
programming language code with many different sequences of symbols,
where exactly does the cursor stop?

--------------------------------------------------
Always End at Beginning of Word?

Type the following in your favorite text editor.

something in the water does not compute

Now, you can try the word movement in different editors.

I tested this on Notepad, Notepad++, vim, emacs, Mac's TextEdit.

In Notepad, Notepad++, vim, the cursor always ends at the beginning of
each word.

In emacs, TextEdit, and Xcode, the cursor ends at the beginning of the
word if you are moving backward, but at the end of the word if you are
moving forward.

That's the first major difference.
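
To make the difference concrete, here is a minimal sketch of a command
that always lands at the beginning of the next word, next to the
built-in `forward-word'. The name `my-forward-word-start' is
hypothetical (it is not part of Emacs), and the sketch only
approximates the Notepad/vim behavior for plain words: it does not
stop at symbol clusters the way vim does.

```elisp
;; Hypothetical helper, not part of Emacs: land on the start of the next word.
(defun my-forward-word-start ()
  "Move point forward to the beginning of the next word."
  (interactive)
  (skip-syntax-forward "w")    ; leave the word point is in, if any
  (skip-syntax-forward "^w"))  ; skip everything up to the next word char

;; Compare with the built-in `forward-word' on the sample line.
(with-temp-buffer
  (insert "something in the water does not compute")
  (goto-char (point-min))
  (forward-word 1)
  (message "forward-word stops at %d" (point))          ; end of "something"
  (goto-char (point-min))
  (my-forward-word-start)
  (message "my-forward-word-start stops at %d" (point))) ; start of "in"
```

Both commands rely on the “word” syntax class, so the sketch is still
mode dependent in the same way `forward-word' is.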

--------------------------------------------------
Does Movement Depend on the Language Mode?

Now, try this line:

something !! in @@ the ## water $$ does %% not ^^ compute

Here, the behavior of vim and Notepad++ is identical, and as simple as
before: they put the cursor at the beginning of each string sequence,
no matter what the characters are. Notepad is similar, except that it
will also stop between the two % characters.

Emacs and TextEdit behave similarly to each other. Emacs (in
text-mode) skips the symbol clusters !!, @@, ##, ^^ entirely, while
stopping at the boundaries of $$ and %%. TextEdit stops in the middle
of $$ and ^^, but skips the other symbol clusters entirely.

I don't know about other editors, but i understand the behavior of
emacs well. Emacs has a syntax table concept. Each and every character
is classified into one of “whitespace”, “word”, “symbol”,
“punctuation”, and others. When you use backward-word, it simply skips
any non-word chars before the cursor, then moves until it reaches a
char that's not in the “word” group.

Each major mode usually has its own syntax table. So, depending on
which mode you are in, it'll either skip a sequence of identical chars
entirely, or stop at their boundary.

(info "(elisp) Syntax Tables")
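
As a rough illustration, the built-in `char-syntax' reports the class
of a character under the syntax table in effect, and
`modify-syntax-entry' can reclassify a character in a private copy of
a table. The two `my-...' helpers below are hypothetical names made up
for this sketch. If i read the standard syntax table right, $ and %
are classed as word constituents there, which would explain why emacs
stops at $$ and %% in text-mode while skipping the other clusters.

```elisp
;; Hypothetical helper: show the syntax class of each char in STRING.
(defun my-syntax-classes (string)
  "Return a list of (CHAR . CLASS) pairs for STRING.
Classes are taken from the syntax table currently in effect:
\"w\" is word, \"_\" is symbol, \".\" is punctuation, \" \" is whitespace."
  (mapcar (lambda (c) (cons (string c) (string (char-syntax c))))
          string))

;; Hypothetical helper: a private table in which $ is punctuation.
(defun my-dollar-as-punctuation-table ()
  "Return a copy of the standard syntax table with $ as punctuation."
  (let ((table (copy-syntax-table (standard-syntax-table))))
    (modify-syntax-entry ?$ "." table)  ; changes the copy only
    table))

;; Inspect the test characters as text-mode sees them.
(with-temp-buffer
  (text-mode)
  (message "%S" (my-syntax-classes "a !@#$%^")))

;; Word motion now treats $ as a non-word char, inside this form only.
(with-syntax-table (my-dollar-as-punctuation-table)
  (message "%S" (my-syntax-classes "$")))
```

Working on a copy keeps the experiment from changing the behavior of
every buffer that shares the mode's table.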

The question is whether other editors' word movement behavior changes
depending on what language mode they are currently in. And if so, how
does the behavior change? Do they use a concept similar to emacs's
syntax table?

In Notepad++, cursor word-motion behavior does not change with respect
to what language mode you are in. A quick 5-minute test suggests the
same for vim.

--------------------------------------------------
More Tests

Now, create a file with this content for more tests.

something in the water does not compute
something !! in @@ the ## water $$ does %% not ^^ compute
something!!in@@the##water$$does%%not^^compute
(defun insert-p-tag () "Insert <p></p> at cursor point."
  (interactive) (insert "<p></p>") (backward-char 4))
for (my $i = 0; $i < 9; $i++) { print "done!";}
<a><b>a b c</b> d e</a>

Answer this:

    * Do the positions where the cursor stops depend on whether you
      are moving left or right?
    * Does the word motion behavior change depending on what language
      mode you are in?
    * What is your editor? On what OS?

--------------------------------------------------
Which is More Efficient?

Now, the interesting question is which model is more efficient for
general everyday coding of different languages.

First question is: is it more efficient in general for left/right word
motions to always land on the left boundary of the word, as in vim,
Notepad, Notepad++ ?

Certainly i think it is more intuitive that way. But otherwise i don't
know.

The second question is: whether it is good to have the movement change
depending on the language mode.

I don't know. But again it seems more intuitive to have it NOT change,
because then users have a good expectation of where the cursor will
stop regardless of what language they are coding in. Though, of course
it MAY be less efficient, because logically one'd think that it might
be better to have word motion behavior adapt to different languages.
But am not sure about this in real world situations.

Though, i do find emacs syntax table annoying from my experience of
working with it a bit in the past few years... from the little i know,
i felt that it doesn't do much, its power to model syntax is quite
weak, and very complicated to use... but i don't know for sure.

This article is inspired by Paul Drummond's question in gnu.emacs.help

--------------------------------------------------
2010-06-18

On 2010-06-17, Elena <egarr...@gmail.com> wrote:

    is there some elisp code to move by tokens when a programming mode
    is active? For instance, in the following C code:

    double value = f ();

    the point - represented by | - would move like this:

    |double value = f ();
    double |value = f ();
    double value |= f ();
    double value = |f ();
    double value = f |();
    double value = f (|);
    double value = f ()|;

cc-mode has functions c-forward-token-1 and c-forward-token-2. (thanks
to Andreas Politz)

It is easy to write elisp code to do what you want, though it might be
tedious depending on what you mean by token, and whether you really
want the cursor to move by token. (that might be too many stops)
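
For illustration, here is a minimal sketch of such a command. It
assumes a token is either a run of letters, digits, and underscores,
or any other single non-whitespace char. That definition and the name
`my-forward-token' are made up for this example (they are not from
cc-mode), and strings and comments get no special treatment.

```elisp
;; Hypothetical helper, independent of the mode's syntax table.
(defun my-forward-token ()
  "Move point to the start of the next token.
A token is a run of letters, digits and underscores, or any other
single non-whitespace character."
  (interactive)
  (unless (eobp)
    ;; Step over the token at point: an identifier-like run, else one char.
    (when (zerop (skip-chars-forward "[:alnum:]_"))
      (unless (memq (char-after) '(?\s ?\t ?\n))
        (forward-char 1)))
    ;; Then skip the whitespace that separates it from the next token.
    (skip-chars-forward " \t\n")))

;; Collect the stops on the C line from the question.
(with-temp-buffer
  (insert "double value = f ();")
  (goto-char (point-min))
  (let (stops)
    (while (not (eobp))
      (my-forward-token)
      (push (point) stops))
    (message "%S" (nreverse stops))))  ; => (8 14 16 18 19 20 21)
```

Because it uses `skip-chars-forward' with an explicit char set rather
than syntax classes, the stops are the same in every major mode, and
something like 1+1+8 gets a stop at each operator.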

Here's a pair of functions i wrote and have been using for a couple of
years. You can modify them to get what you want. Basically that's the
idea. But depending on what you mean by token, it might be tedious to
get right.

(defun forward-block ()
  "Move cursor forward to next occurrence of double newline char.
In most major modes, this is the same as `forward-paragraph'. However,
this function behaves the same in any mode.
`forward-paragraph' is mode dependent, because it depends on the
variables `paragraph-start' and `paragraph-separate', which major
modes set differently."
  (interactive)
  (skip-chars-forward "\n")
  (unless (search-forward-regexp "\n[[:blank:]]*\n" nil t)
    (goto-char (point-max))))

(defun backward-block ()
  "Move cursor backward to previous occurrence of double newline char.
See: `forward-block'."
  (interactive)
  (skip-chars-backward "\n")
  (unless (search-backward-regexp "\n[[:blank:]]*\n" nil t)
    (goto-char (point-min))))

actually, you can just modify it so that it only skips over chars in
the whitespace syntax class and stops after them... but then if you
have 1+1+8 with no spaces in it, that'll skip the whole thing...
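
A minimal sketch of that whitespace-delimited variant, to show the
limitation. The name `my-forward-whitespace-chunk' is hypothetical,
and the sketch uses a literal set of space, tab, and newline instead
of the syntax table's whitespace class, so that it behaves the same in
any mode.

```elisp
;; Hypothetical helper: treat every whitespace-delimited chunk as one unit.
(defun my-forward-whitespace-chunk ()
  "Move point to the start of the next whitespace-delimited chunk."
  (interactive)
  (skip-chars-forward "^ \t\n")  ; step over the chunk at point
  (skip-chars-forward " \t\n"))  ; then over the whitespace after it

;; "1+1+8" contains no whitespace, so it is passed in a single step.
(with-temp-buffer
  (insert "x = 1+1+8 ;")
  (goto-char (point-min))
  (let (stops)
    (while (not (eobp))
      (my-forward-whitespace-chunk)
      (push (point) stops))
    (message "%S" (nreverse stops))))  ; => (3 5 11 12)
```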

  Xah
∑ http://xahlee.org/

☄
