From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Xah Lee <xahlee@gmail.com>
Newsgroups: gmane.emacs.help
Subject: Re: Understanding Word Boundaries
Date: Thu, 17 Jun 2010 22:30:32 -0700 (PDT)
Organization: http://groups.google.com
Message-ID: <296085af-7772-47f1-a030-18c33f4435b1@a39g2000prb.googlegroups.com>
References: <mailman.1.1276717938.15244.help-gnu-emacs@gnu.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Trace: dough.gmane.org 1291843444 6664 80.91.229.12 (8 Dec 2010 21:24:04 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Wed, 8 Dec 2010 21:24:04 +0000 (UTC)
To: help-gnu-emacs@gnu.org
Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Wed Dec 08 22:24:00 2010
Return-path: <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org>
Envelope-to: geh-help-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.69)
	(envelope-from <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org>)
	id 1PQRUV-0008Bq-RZ
	for geh-help-gnu-emacs@m.gmane.org; Wed, 08 Dec 2010 22:24:00 +0100
Original-Received: from localhost ([127.0.0.1]:60801 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1PQRUV-0005st-6m
	for geh-help-gnu-emacs@m.gmane.org; Wed, 08 Dec 2010 16:23:59 -0500
Original-Path: usenet.stanford.edu!postnews.google.com!a39g2000prb.googlegroups.com!not-for-mail
Original-Newsgroups: gnu.emacs.help,comp.emacs
Original-Lines: 226
Original-NNTP-Posting-Host: 67.180.85.8
Original-X-Trace: posting.google.com 1276839033 19620 127.0.0.1 (18 Jun 2010 05:30:33
	GMT)
Original-X-Complaints-To: groups-abuse@google.com
Original-NNTP-Posting-Date: Fri, 18 Jun 2010 05:30:33 +0000 (UTC)
Complaints-To: groups-abuse@google.com
Injection-Info: a39g2000prb.googlegroups.com; posting-host=67.180.85.8; 
	posting-account=bRPKjQoAAACxZsR8_VPXCX27T2YcsyMA
User-Agent: G2/1.0
X-HTTP-UserAgent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) 
	AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.70 Safari/533.4,
	gzip(gfe)
Original-Xref: usenet.stanford.edu gnu.emacs.help:179085 comp.emacs:100087
X-BeenThere: help-gnu-emacs@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Users list for the GNU Emacs text editor <help-gnu-emacs.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/help-gnu-emacs>,
	<mailto:help-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/help-gnu-emacs>
List-Post: <mailto:help-gnu-emacs@gnu.org>
List-Help: <mailto:help-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/help-gnu-emacs>,
	<mailto:help-gnu-emacs-request@gnu.org?subject=subscribe>
Original-Sender: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org
Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.help:76040
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/76040>

On Jun 16, 3:44 am, Paul Drummond <paul.drumm...@iode.co.uk> wrote:
> I have been an Emacs user for a few years now so definitely still a
> newbie! While initially I struggled to control its power, I eventually came
> round. Every issue I've had so far I've been able to fix by a quick search
> in EmacsWiki, except for one frustrating and re-occurring problem that has
> plagued me for years - word boundaries.
>
> Before Emacs I used Vim exclusively and the word boundary behaviour in Vim
> *just worked* - I didn't even have to think about it. No matter what
> language I used I could navigate and manipulate words without thinking about
> it. The way word boundaries work in Vim is elegant and I have spent a lot
> of time trying to find some elisp to replicate the behaviour in Emacs but to
> no avail.
>
> I could write some elisp myself but I am still very new to it so it will
> take a while - it's something I would like to do but I don't have time at
> the moment. Regardless, an elisp solution to the problem is not the point
> of this post. I want to understand why word boundaries behave the way they
> do in Vanilla Emacs and I would greatly appreciate some views on this from
> some Emacs Gurus!
>
> Every time I notice the word boundary behaviour when hacking in Emacs I
> wonder to myself - "I must be missing something here. Surely, experienced
> Emacs users don't just *put up* with this!" Yet every forum response, blog
> post, mailing-list post I have read suggests they do. This is atypical of
> the Emacs community in my experience. Usually when something behaves wrong
> in Emacs, it's easy to find some elisp that just fixes the problem full
> stop. Yet with word-boundaries all I can find is suggestions that fix a
> particular gripe but nothing that provides a general solution.
>
> I have loads of examples but I will mention just a few here to hopefully
> kick-start further discussion.
>
> ** Example 1
>
> I use org-mode for my journal and today I hit the word-boundary problem
> while entering my morning journal entry - here's a contrived example of what
> I entered:
>
> ** [10:27] Understanding Word Boundaries in Emacs
>                         ^
> With point at the end of the word "Understanding" I hit C-w (which I bind to
> backward-kill-word) and the word "Understanding" is killed as expected. But
> when I hit C-w again, the point kills to the colon. Why? Why is colon a
> word-boundary but the closing square bracket isn't?
>
> ** Example 2
>
> When editing C++ files I often need to delete the "ClassName::" part when
> declaring functions in the header:
>
> void ClassName::function();
>      ^
>
> With point at the start of ClassName I want to press M-d twice to delete
> ClassName and :: but "::" isn't recognised as a word. In Vim I just type
> "dw" twice and it *just works*.
>
> ** Example 3
>
> I have loads of problems when deleting and navigating words over multiple
> lines. In the following C++ code for instance:
>
>     Page *page = new _Page(this);
>     page.load();
>         ^
>
> When point is after "page", before the dot on the second line and I hit M-b
> (backward-word) point ends up at the first opening bracket of "Page(" !!!
>
> Again, vim does the right thing here - pressing 'b' takes the point to the
> closing bracket of Page(this) so it doesn't recognise the semi-colon as a
> bracket which is intuitive and what I would expect. This is really the
> point I am trying to make. I have never taken the time to understand the
> behaviour of word boundaries in Vim because *it just works*. In Emacs I am
> forced to think about word boundaries because Emacs keeps surprising me with
> its weird behaviour!
>
> Note: My examples happen to be C++ but I use lots of other languages too
> including elisp, Clojure, JavaScript, Python and Java and the
> word-boundaries seem to be wrong for all of them.
>
> I have tried several different elisp solutions but each one has at least one
> feature that isn't quite right. Here are some links I kept, I've tried many
> other solutions but don't have the links to hand:
>
> http://stackoverflow.com/questions/2078855/about-the-forward-and-back...
> http://stackoverflow.com/questions/1771102/changing-emacs-forward-wor...
>
> So to wrap up, the point of this post is to kick-start a discussion about
> why the word boundaries in Vanilla Emacs (specifically GNU Emacs 23.1.50.1
> in my case) seem to be so awkward and unintuitive.
>
> Regards,
> Paul Drummond


Good point.

I remember I felt something similar some 5 or 7 years ago and was
annoyed. But now I can't remember any details... I just got used to
emacs and can't say I find it a problem at all.

Actually, I think the point is a valid one, and a bit technically
involved in its details.

I'll have to study this in detail some other day, but here are some
points.

For testing, save a file with this line as content:
something in the water does not compute

Now, you can try the word movement in different editors.

I tested this on Notepad, Notepad++, vim, emacs, and Mac's TextEdit.

In short, the editors each behave a bit differently. Here, Notepad,
Notepad++, and vim behave the same as one another, while emacs and
TextEdit behave like each other.

In Notepad, Notepad++, and vim, the cursor always ends up at the
beginning of each word.

In emacs and TextEdit, the cursor ends up at the beginning of the word
if you are using backward-word, but at the end of the word if you are
using forward-word.

That's the first major difference.
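
To make the two landing conventions concrete, here is a rough Python
sketch. It is only a model for this test line, not how any of these
editors is implemented: it assumes a "word" is any run of
non-whitespace characters, and the names next_word_start, next_word_end
and stops are made up for the illustration.

```python
def next_word_start(text, pos):
    """Model of the Notepad/vim convention: land on the start of the next word."""
    n = len(text)
    # Skip the rest of the current word, if any.
    while pos < n and not text[pos].isspace():
        pos += 1
    # Skip the whitespace that follows it.
    while pos < n and text[pos].isspace():
        pos += 1
    return pos


def next_word_end(text, pos):
    """Model of the emacs/TextEdit convention: land just past the end of a word."""
    n = len(text)
    # Skip any whitespace in front of the next word.
    while pos < n and text[pos].isspace():
        pos += 1
    # Then move over the word itself.
    while pos < n and not text[pos].isspace():
        pos += 1
    return pos


def stops(step, text):
    """Collect every cursor position reached by repeating a forward motion."""
    pos, out = 0, []
    while pos < len(text):
        pos = step(text, pos)
        out.append(pos)
    return out


line = "something in the water does not compute"
print(stops(next_word_start, line))  # [10, 13, 17, 23, 28, 32, 39]
print(stops(next_word_end, line))    # [9, 12, 16, 22, 27, 31, 39]
```

In this model the two conventions differ by exactly the one space
between words, until both reach the end of the line.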

--------------------------------------------------
Now, try this line:

something !! in @@ the ## water $$ does %% not ^^ compute

Now, vim and Notepad++ behave identically. Their behavior is pretty
simple and like before: they simply put the cursor at the beginning of
each sequence of characters, no matter what the characters are.
Notepad is similar, except that it also stops between the two
characters of %%.

emacs and TextEdit behave similarly to each other.
Emacs skips the symbol clusters entirely, except %% (this depends
on what mode you are in).
TextEdit also stops in the middle of $$ and ^^, but otherwise skips
the symbol clusters entirely.

So, from this, it is clear that different editors have different
concepts of syntax groups, or no such concept at all.

I understand the emacs case well. Emacs has a syntax table concept
that groups characters into classes such as “whitespace”, “word”,
“symbol”, “punctuation”, etc. When you use backward-word, it first
skips any chars next to the cursor that are not in the “word” group,
then simply moves until it reaches a char that's not in the “word”
group. So, depending on which mode you are in, it'll either skip a
character sequence of identical chars entirely, or stop at their
boundary. And if the char sequence is made of different symbols, such
as !@#$%&*(), then emacs may stop in the middle of them.
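
Here is a small Python sketch of that idea. It is a simplified model
under stated assumptions, not the real emacs code: the class
assignments in make_table are invented for illustration (letters and
digits are "word", everything else is "punctuation" or "whitespace"),
and the override that puts % in the "word" class is hypothetical,
standing in for whatever a particular mode's syntax table does.

```python
WORD, PUNCT, SPACE = "word", "punctuation", "whitespace"


def make_table(overrides=None):
    """Build a toy 'syntax table': a function mapping a char to its class."""
    overrides = overrides or {}

    def classify(ch):
        if ch in overrides:          # per-mode exceptions (hypothetical)
            return overrides[ch]
        if ch.isalnum():
            return WORD
        if ch.isspace():
            return SPACE
        return PUNCT

    return classify


def forward_word(text, pos, classify):
    """Skip non-word chars, then move to the end of the following word."""
    n = len(text)
    while pos < n and classify(text[pos]) != WORD:
        pos += 1
    while pos < n and classify(text[pos]) == WORD:
        pos += 1
    return pos


def backward_word(text, pos, classify):
    """Skip non-word chars backward, then move to the start of the word."""
    while pos > 0 and classify(text[pos - 1]) != WORD:
        pos -= 1
    while pos > 0 and classify(text[pos - 1]) == WORD:
        pos -= 1
    return pos


def forward_stops(text, classify):
    """Collect every position reached by repeating forward_word from 0."""
    pos, out = 0, []
    while pos < len(text):
        pos = forward_word(text, pos, classify)
        out.append(pos)
    return out


line = "something !! in @@ the ## water $$ does %% not ^^ compute"
plain = make_table()
percent_is_word = make_table({"%": WORD})  # hypothetical mode setting

print(forward_stops(line, plain))            # [9, 15, 22, 31, 39, 46, 57]
print(forward_stops(line, percent_is_word))  # [9, 15, 22, 31, 39, 42, 46, 57]
print(backward_word("** [10:27] ", 11, plain))  # 7
```

With the plain table every symbol cluster is skipped; once % counts as
a word char, %% becomes a stop of its own (position 42), so the same
motion behaves differently per "mode". The last call models the second
backward-kill-word from the "[10:27]" example in the quoted post: the
space and "]" are skipped as non-word chars, "27" is the word, and the
motion stops right after the colon.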

The question is whether other editors have a notion of syntax groups,
and whether their word movement behavior depends on the language mode
at all.

--------------------------------------------------

Now, the interesting question is which model is more efficient for
general everyday coding in different languages.

The first question is: is it more efficient in general for forward/
backward word motions to always land in front of the word, as in vim,
Notepad, Notepad++?

Certainly I think it is more intuitive that way. But otherwise I don't
know. I'll have to do research on this some day.

The second question is whether it is good to have the movement
dependent on the language mode. Again, I don't know.

Though I do find the emacs syntax table annoying, from my experience
of working with it a bit in the past few years... from the little I
know, I feel that it doesn't do much, its power to model syntax is
quite weak, and it is very complicated to use... but I don't know for
sure.

Btw, one of your examples, this one:

Page *page = new _Page(this);
page.load();

I cannot reproduce.

  Xah
∑ http://xahlee.org/

☄