Understanding Word Boundaries

unofficial mirror of help-gnu-emacs@gnu.org

* Understanding Word Boundaries
@ 2010-06-16 10:44 Paul Drummond
  2010-06-16 20:07 ` Karan Bathla
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Paul Drummond @ 2010-06-16 10:44 UTC
  To: help-gnu-emacs

[-- Attachment #1: Type: text/plain, Size: 4326 bytes --]

I have been an Emacs user for a few years now, so definitely still a
newbie!  While initially I struggled to control its power, I eventually came
round.  Every issue I've had so far I've been able to fix by a quick search
in EmacsWiki, except for one frustrating and recurring problem that has
plagued me for years - word boundaries.

Before Emacs I used Vim exclusively and the word boundary behaviour in Vim
*just worked* - I didn't even have to think about it.  No matter what
language I used I could navigate and manipulate words without thinking about
it.  The way word boundaries work in Vim is elegant and I have spent a lot
of time trying to find some elisp to replicate the behaviour in Emacs but to
no avail.

I could write some elisp myself but I am still very new to it so it will
take a while - it's something I would like to do but I don't have time at
the moment.  Regardless, an elisp solution to the problem is not the point
of this post.  I want to understand why word boundaries behave the way they
do in Vanilla Emacs and I would greatly appreciate some views on this from
some Emacs Gurus!

Every time I notice the word boundary behaviour when hacking in Emacs I
wonder to myself, "I must be missing something here.  Surely, experienced
Emacs users don't just *put up* with this!"  Yet every forum response, blog
post and mailing-list post I have read suggests they do.  This is atypical of
the Emacs community in my experience.  Usually when something behaves wrongly
in Emacs, it's easy to find some elisp that simply fixes the problem, full
stop.  Yet with word boundaries all I can find are suggestions that fix a
particular gripe, but nothing that provides a general solution.

I have loads of examples, but I will mention just a few here to hopefully
kick-start further discussion.

** Example 1

I use org-mode for my journal and today I hit the word-boundary problem
while entering my morning journal entry - here's a contrived example of what
I entered:

** [10:27] Understanding Word Boundaries in Emacs
                        ^
With point at the end of the word "Understanding" I hit C-w (which I bind to
backward-kill-word) and the word "Understanding" is killed as expected.  But
when I hit C-w again, it kills back to the colon.  Why?  Why is the colon a
word boundary but the closing square bracket isn't?

** Example 2

When editing C++ files I often need to delete the "ClassName::" part when
declaring functions in the header:

void ClassName::function();
     ^

With point at the start of ClassName I want to press M-d twice to delete
ClassName and :: but "::" isn't recognised as a word.  In Vim I just type
"dw" twice and it *just works*.

** Example 3

I have loads of problems when deleting and navigating words over multiple
lines.  In the following C++ code for instance:

    Page *page = new _Page(this);
    page.load();
           ^

When point is after "page", before the dot on the second line and I hit M-b
(backward-word) point ends up at the first opening bracket of "Page(" !!!

Again, Vim does the right thing here - pressing 'b' takes point to the
closing bracket of Page(this), so it treats ");" as a word of its own instead
of skipping over it, which is intuitive and what I would expect.  This is
really the point I am trying to make.  I have never taken the time to
understand the behaviour of word boundaries in Vim because *it just works*.
In Emacs I am forced to think about word boundaries because Emacs keeps
surprising me with its weird behaviour!
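For comparison, the Vim behaviour described in Examples 2 and 3 can be approximated in Emacs Lisp. This is a minimal sketch, not an existing package. The `my/' function names are hypothetical, and a "word" is assumed to be either a run of letters, digits and underscores or a run of other non-blank characters (roughly Vim's default). Vim's special treatment of empty lines is ignored.

```elisp
;; Hypothetical sketch of Vim-like word motion for Emacs.  A "word" is
;; either a run of letters/digits/underscores or a run of other
;; non-blank characters, roughly like Vim's default 'iskeyword'.
(defun my/vim-word-char-p (char)
  "Return non-nil if CHAR is a letter, digit or underscore."
  (and char (string-match-p "[[:alnum:]_]" (string char))))

(defun my/vim-forward-word ()
  "Move over one Vim-style word plus following whitespace, like Vim's w."
  (interactive)
  (let ((c (char-after)))
    (cond ((my/vim-word-char-p c)
           (skip-chars-forward "[:alnum:]_"))
          ((and c (not (memq c '(?\s ?\t ?\n))))
           ;; A run of punctuation such as "::" counts as one word.
           (skip-chars-forward "^[:alnum:]_ \t\n"))))
  (skip-chars-forward " \t\n"))

(defun my/vim-backward-word ()
  "Move to the start of the previous Vim-style word, like Vim's b."
  (interactive)
  (skip-chars-backward " \t\n")
  (let ((c (char-before)))
    (cond ((my/vim-word-char-p c)
           (skip-chars-backward "[:alnum:]_"))
          (c
           (skip-chars-backward "^[:alnum:]_ \t\n")))))

(defun my/vim-kill-word ()
  "Kill from point to where `my/vim-forward-word' would move, like Vim's dw."
  (interactive)
  (kill-region (point) (progn (my/vim-forward-word) (point))))
```

Suppose `my/vim-kill-word' is bound to M-d. Two presses at the start of ClassName in "void ClassName::function();" should then delete "ClassName" and then "::". From the beginning of "page" on the second line of Example 3, `my/vim-backward-word' should stop on the ")" of "(this);".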

Note: My examples happen to be C++ but I use lots of other languages too
including elisp, Clojure, JavaScript, Python and Java and the
word-boundaries seem to be wrong for all of them.

I have tried several different elisp solutions, but each one has at least one
feature that isn't quite right.  Here are some links I kept; I've tried many
other solutions but don't have the links to hand:

http://stackoverflow.com/questions/2078855/about-the-forward-and-backward-a-word-behaviour-in-emacs
http://stackoverflow.com/questions/1771102/changing-emacs-forward-word-behaviour/1772365#1772365

So to wrap up, the point of this post is to kick-start a discussion about
why the word boundaries in Vanilla Emacs (specifically GNU Emacs 23.1.50.1
in my case) seem to be so awkward and unintuitive.

Regards,
Paul Drummond

[-- Attachment #2: Type: text/html, Size: 4922 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Understanding Word Boundaries
  2010-06-16 10:44 Understanding Word Boundaries Paul Drummond
@ 2010-06-16 20:07 ` Karan Bathla
  2010-06-17 13:37   ` Deniz Dogan
  2010-06-23  9:02 ` Gary
  2010-06-25 10:33 ` andreas.roehler
  2 siblings, 1 reply; 20+ messages in thread
From: Karan Bathla @ 2010-06-16 20:07 UTC
  To: help-gnu-emacs, Paul Drummond

[-- Attachment #1: Type: text/plain, Size: 5170 bytes --]

I don't know about Vim's word boundaries or elisp code for them, but the behaviour of backward-kill-word is simple: kill the previous word, where a word is a run of alphanumeric characters. Any non-alphanumeric characters, such as : and (, that lie between point and that word are deleted along with it. There is no concept here of : or ( being word boundaries.

So if you press M-d at the start of ":67a", the whole thing gets deleted, whereas in "67a:" the ":" remains (with point at the beginning of the string).
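Karan's description can be checked in a temporary buffer. The helper below is hypothetical. It deletes exactly the text `kill-word' would kill, without touching the kill ring, using the standard syntax table. In that table, letters and digits are word constituents and ":", "[" and "]" are not. A negative count starts at the end and works backwards, like repeated C-w bound to backward-kill-word, which reproduces Paul's Example 1. Real major modes such as org-mode install their own syntax tables, so results there may differ.

```elisp
;; Hypothetical helper: delete the text `kill-word' would kill (without
;; touching the kill ring) and return what is left.  A negative N starts
;; at the end of TEXT and works backwards, like `backward-kill-word'.
(defun my/after-kill-word (text n)
  "Return TEXT with N words deleted the way `kill-word' would."
  (with-temp-buffer                 ; fundamental-mode, standard syntax table
    (insert text)
    (goto-char (if (< n 0) (point-max) (point-min)))
    ;; For each word, `forward-word' first skips non-word characters,
    ;; then moves over one run of word characters.
    (delete-region (point) (progn (forward-word n) (point)))
    (buffer-string)))

(my/after-kill-word ":67a" 1)                      ; => ""
(my/after-kill-word "67a:" 1)                      ; => ":"
(my/after-kill-word "** [10:27] Understanding" -2) ; => "** [10:"
```

Consider the last case. Once "Understanding" is gone, moving backwards skips the space and the "]" (neither is a word character) and then takes "27" as the word. The ":" is not a boundary as such; it is simply the first non-word character before "27".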

[-- Attachment #2: Type: text/html, Size: 6301 bytes --]

* Re: Understanding Word Boundaries
  2010-06-16 20:07 ` Karan Bathla
@ 2010-06-17 13:37   ` Deniz Dogan
  0 siblings, 0 replies; 20+ messages in thread
From: Deniz Dogan @ 2010-06-17 13:37 UTC
  To: Karan Bathla; +Cc: help-gnu-emacs

2010/6/16 Karan Bathla <karan_goku@yahoo.com>
>
> I don't know about the word boundary thing in vim and elisp code for that but the behaviour of backward-kill-word is simple : kill the last word; where a word is something alphanumeric. Any non alphanumeric characters like : and ( are deleted automatically if between point and last word. There is no concept here of : or ( being word boundaries.
>
> So if you do M-d on ":67a" whole thing gets deleted and in "67a:", : remains (with point at beginning of string).
>

To be more specific, I think it depends on what the syntax table of
the active mode looks like. You can make your own syntax table to
change the behavior of "word commands" to some extent.

--
Deniz Dogan
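To illustrate Deniz's point, the sketch below runs `forward-word' over the same text twice. The helper name is hypothetical. The first run uses the standard syntax table, where "_" is a symbol constituent. The second run uses a fresh table created with `make-syntax-table', in which `modify-syntax-entry' gives "_" word syntax, so the change stays local to the temporary buffer.

```elisp
(defun my/first-word (text &optional underscore-is-word)
  "Return the part of TEXT that `forward-word' moves over from its start.
If UNDERSCORE-IS-WORD is non-nil, give \"_\" word syntax first."
  (with-temp-buffer
    (when underscore-is-word
      ;; `make-syntax-table' returns a new table that inherits from the
      ;; standard one; `modify-syntax-entry' with no TABLE argument then
      ;; changes only this buffer's table.
      (set-syntax-table (make-syntax-table))
      (modify-syntax-entry ?_ "w"))
    (insert text)
    (goto-char (point-min))
    (forward-word 1)
    (buffer-substring (point-min) (point))))

(my/first-word "foo_bar baz")    ; => "foo"
(my/first-word "foo_bar baz" t)  ; => "foo_bar"
```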

* Re: Understanding Word Boundaries
  2010-06-16 10:44 Understanding Word Boundaries Paul Drummond
  2010-06-16 20:07 ` Karan Bathla
@ 2010-06-23  9:02 ` Gary
  2010-06-26 10:46   ` Paul Drummond
  2010-06-25 10:33 ` andreas.roehler
  2 siblings, 1 reply; 20+ messages in thread
From: Gary @ 2010-06-23  9:02 UTC
  To: help-gnu-emacs

Paul Drummond writes:
> ** Example 2
>
> When editing C++ files I often need to delete the "ClassName::" part when
> declaring functions in the header:
>
> void ClassName::function();
>        ^
>
> With point at the start of ClassName I want to press M-d twice to delete
> ClassName and :: but "::" isn't recognised as a word.  In Vim I just

Twice? Three times, shirley? Class and Name are both words...

(As you might guess, my pet peeve about the word boundary recognition is
when programming using camelcase.)

> So to wrap up, the point of this post is to kick-start a discussion about why
> the word boundaries in Vanilla Emacs (specifically GNU Emacs 23.1.50.1 in my
> case) seem to be so awkward and unintuitive.

Because it needs to be defined somewhat differently for natural
languages and different programming languages, at a guess. What a word
is depends entirely on the context you (and I) decide, and they may well
be different (see two versus three key presses above).

* Re: Understanding Word Boundaries
  2010-06-23  9:02 ` Gary
@ 2010-06-26 10:46   ` Paul Drummond
  2010-06-26 10:53     ` Paul Drummond
       [not found]     ` <mailman.2.1277549613.3306.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 20+ messages in thread
From: Paul Drummond @ 2010-06-26 10:46 UTC
  To: Gary; +Cc: help-gnu-emacs

[-- Attachment #1: Type: text/plain, Size: 997 bytes --]

On 23 June 2010 10:02, Gary <help-gnu-emacs@garydjones.name> wrote:

> Paul Drummond writes:
> > ** Example 2
> >
> > When editing C++ files I often need to delete the "ClassName::" part when
> > declaring functions in the header:
> >
> > void ClassName::function();
> >        ^
> >
> > With point at the start of ClassName I want to press M-d twice to delete
> > ClassName and :: but "::" isn't recognised as a word.  In Vim I just
>
> Twice? Three times, shirley? Class and Name are both words...
>

Yeah, I agree about CamelCase but I wanted to keep the example simple ;)

> Because it needs to be defined somewhat differently for natural
> languages and different programming languages, at a guess. What a word
> is depends entirely on the context you (and I) decide, and they may well
> be different (see two versus three key presses above).
>

But each context I use has a major mode, and I would expect each major mode
to have sensible default word boundaries, but they don't.

Paul Drummond.

[-- Attachment #2: Type: text/html, Size: 1590 bytes --]

* Re: Understanding Word Boundaries
  2010-06-26 10:46   ` Paul Drummond
@ 2010-06-26 10:53     ` Paul Drummond
  2010-06-26 11:22       ` Thien-Thi Nguyen
                         ` (2 more replies)
       [not found]     ` <mailman.2.1277549613.3306.help-gnu-emacs@gnu.org>
  1 sibling, 3 replies; 20+ messages in thread
From: Paul Drummond @ 2010-06-26 10:53 UTC
  To: help-gnu-emacs

[-- Attachment #1: Type: text/plain, Size: 578 bytes --]

Thanks for the responses, guys.

I think the point I am trying to make here is that it's a *big* task to fix
word boundaries for every case (every word-related key binding multiplied by
each language/major mode I use!).

I presume that Emacs hackers either a) put up with it or b) spend a lot of
time fixing each case until they are happy.

I suspect the answer is b. ;-)

I wish there was a single minor-mode that fixes all the word boundary issues
for every major-mode I use!  I can but dream.   Or maybe I will get round to
doing it myself one day!  ;)

Cheers,
Paul Drummond

[-- Attachment #2: Type: text/html, Size: 629 bytes --]

* Re: Understanding Word Boundaries
  2010-06-26 10:53     ` Paul Drummond
@ 2010-06-26 11:22       ` Thien-Thi Nguyen
  2010-06-26 23:49       ` ken
  2012-12-11  2:11       ` Samuel Wales
  2 siblings, 0 replies; 20+ messages in thread
From: Thien-Thi Nguyen @ 2010-06-26 11:22 UTC
  To: Paul Drummond; +Cc: help-gnu-emacs

() Paul Drummond <paul.drummond@iode.co.uk>
() Sat, 26 Jun 2010 11:53:08 +0100

   I suspect the answer is b. ;-)

There is another answer: (c) looking at sexps instead of words.

thi
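The difference between words and sexps is easy to see side by side. With the same text and the standard syntax table, `forward-word' stops inside foo_bar and swallows a leading "(". By contrast, `forward-sexp' takes foo_bar as one symbol and (this) as one balanced group. The interactive equivalents are C-M-f and C-M-b for motion and C-M-k (`kill-sexp') for killing. The helper name below is hypothetical.

```elisp
(defun my/first-unit (text mover)
  "Return the part of TEXT that motion command MOVER skips from its start."
  (with-temp-buffer                 ; fundamental-mode, standard syntax table
    (insert text)
    (goto-char (point-min))
    (funcall mover 1)
    (buffer-substring (point-min) (point))))

(my/first-unit "foo_bar(baz);" #'forward-word)  ; => "foo"
(my/first-unit "foo_bar(baz);" #'forward-sexp)  ; => "foo_bar"
(my/first-unit "(this);" #'forward-word)        ; => "(this"
(my/first-unit "(this);" #'forward-sexp)        ; => "(this)"
```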

* Re: Understanding Word Boundaries
  2010-06-26 10:53     ` Paul Drummond
  2010-06-26 11:22       ` Thien-Thi Nguyen
@ 2010-06-26 23:49       ` ken
  2010-06-27  3:05         ` Deniz Dogan
       [not found]         ` <mailman.7.1277607983.30403.help-gnu-emacs@gnu.org>
  2012-12-11  2:11       ` Samuel Wales
  2 siblings, 2 replies; 20+ messages in thread
From: ken @ 2010-06-26 23:49 UTC
  To: Paul Drummond; +Cc: help-gnu-emacs


On 06/26/2010 06:53 AM Paul Drummond wrote:
> Thanks for the responses guys.
> 
> I think the point I am trying to make here is that it's a *big* task to
> fix word boundaries for every case (every word-related key binding
> multiplied by each language/major mode I use!).
> 
> I presume that Emacs hackers either a) put up with it or b) spend a lot
> of time fixing each case until they are happy.
> 
> I suspect the answer is b. ;-)
> 
> I wish there was a single minor-mode that fixes all the word boundary
> issues for every major-mode I use!  I can but dream.   Or maybe I will
> get round to doing it myself one day!  ;)
> 
> Cheers,
> Paul Drummond


Is it possible to specify word boundaries for a particular mode?

* Re: Understanding Word Boundaries
  2010-06-26 23:49       ` ken
@ 2010-06-27  3:05         ` Deniz Dogan
  2012-12-11 11:18           ` Understanding Word and Sentence Boundaries ken
       [not found]         ` <mailman.7.1277607983.30403.help-gnu-emacs@gnu.org>
  1 sibling, 1 reply; 20+ messages in thread
From: Deniz Dogan @ 2010-06-27  3:05 UTC
  To: gebser; +Cc: help-gnu-emacs

2010/6/27 ken <gebser@mousecar.com>:
>
> Is it possible to specify word boundaries for a particular mode?
>

Yes, it's part of the syntax table. See e.g. `modify-syntax-entry'.

Regarding camel case word jumping, see subword-mode (previously known
as c-subword-mode) which is part of Emacs.

-- 
Deniz Dogan
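Combining Deniz's two pointers, a per-mode setup for C-like modes might look like this sketch. Whether "_" should count as part of a word is a matter of taste and is assumed here. Note that `modify-syntax-entry' with no table argument changes the current buffer's syntax table, which in CC Mode is shared by all buffers of that mode. `c-mode-common-hook' runs for C, C++, Java and the other CC Mode modes. Older Emacsen call the camel-case mode `c-subword-mode', as Deniz notes.

```elisp
;; In CC Mode buffers: treat "_" as part of a word, and move by
;; camel-case sub-words ("Class" and "Name" in "ClassName").
(add-hook 'c-mode-common-hook
          (lambda ()
            ;; Changes the mode's syntax table, which is shared by every
            ;; buffer in that mode.
            (modify-syntax-entry ?_ "w")
            ;; Pass 1 explicitly so the hook enables rather than toggles.
            (subword-mode 1)))
```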

* Re: Understanding Word and Sentence Boundaries
  2010-06-27  3:05         ` Deniz Dogan
@ 2012-12-11 11:18           ` ken
  2012-12-11 12:03             ` Eric Abrahamsen
  0 siblings, 1 reply; 20+ messages in thread
From: ken @ 2012-12-11 11:18 UTC
  To: Deniz Dogan; +Cc: help-gnu-emacs

On 06/26/2010 11:05 PM Deniz Dogan wrote:
> 2010/6/27 ken<gebser@mousecar.com>:
>>
>> Is it possible to specify word boundaries for a particular mode?
>>
>
> Yes, it's part of the syntax table. See e.g. `modify-syntax-entry'.

Thanks for the pointer to that function.

The behavior I see in need of repair is the role of so-called "comments"
in sentence syntax.</tag>  For instance, immediately before this 
sentence are two spaces... which should signify the end of the previous 
sentence.  But functions like "forward-sentence" and "fill-paragraph" 
and "backward-sentence" don't recognize it.

Said another way, the "</tag>" string obscures the relationship between 
the period before it and the two spaces after it and so fails to see 
that one sentence ends and another starts.  This occurs in text-mode and 
seems to be inherited by other modes.

If I'm reading "modify-syntax-entry" correctly, the default meanings of 
'<' and '>' are, respectively, beginning and end of comment, so 
modifying them wouldn't fix this problem.  Or can this be remedied by a 
change in the syntax table?  Or is this a bug?
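On the `modify-syntax-entry' question: in its documentation, "<" and ">" are syntax class designators (comment starter and comment ender) used in the descriptor string. They do not say that the characters < and > are comment delimiters by default. `char-syntax' reports the class a character actually has in the current buffer, and C-h s (`describe-syntax') lists the whole table. Sentence motion is driven mainly by the `sentence-end' regexp rather than by syntax classes, so a syntax-table change is unlikely to help here.

```elisp
;; Inspect the syntax class of "<" and ">" in a buffer.  `char-syntax'
;; returns the class designator: ?_ for a symbol constituent, ?< for a
;; comment starter, ?. for punctuation, and so on.
(with-temp-buffer            ; fundamental-mode, standard syntax table
  (list (string (char-syntax ?<))
        (string (char-syntax ?>))))
;; => ("_" "_")
```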

* Re: Understanding Word and Sentence Boundaries
  2012-12-11 11:18           ` Understanding Word and Sentence Boundaries ken
@ 2012-12-11 12:03             ` Eric Abrahamsen
  2012-12-11 15:17               ` ken
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Abrahamsen @ 2012-12-11 12:03 UTC
  To: help-gnu-emacs

ken <gebser@mousecar.com> writes:

> On 06/26/2010 11:05 PM Deniz Dogan wrote:
>> Yes, it's part of the syntax table. See e.g. `modify-syntax-entry'.
>
> Thanks for the pointer to that function.
>
> The behavior I see in need of repair is the role of so-called "comments"
> in sentence syntax.</tag>  For instance, immediately before this
> sentence are two spaces... which should signify the end of the
> previous sentence.  But functions like "forward-sentence" and
> "fill-paragraph" and "backward-sentence" don't recognize it.
>
> Said another way, the "</tag>" string obscures the relationship
> between the period before it and the two spaces after it and so fails
> to see that one sentence ends and another starts.  This occurs in
> text-mode and seems to be inherited by other modes.
>
> If I'm reading "modify-syntax-entry" correctly, the default meanings
> of '<' and '>' are, respectively, beginning and end of comment, so
> modifying them wouldn't fix this problem.  Or can this be remedied by
> a change in the syntax table?  Or is this a bug?

For this particular case, I think you can modify the value of the
`sentence-end' variable (which is returned by the `sentence-end'
function? The whole thing is a little confusing). You'd probably be best
off starting with the docstring for the sentence-end function, and
working back from there.

I think the `sentence-end' variable is automatically buffer-local, which
means if you change it in a mode-hook it ought to work the way you want.
I agree that the whole syntax thing feels like a very well-polished
hack.

E
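A minimal sketch of Eric's suggestion, with one substitution. Rather than replacing the whole `sentence-end' regexp, it extends `sentence-end-base'. The `sentence-end' function builds its regexp from `sentence-end-base' whenever the `sentence-end' variable is nil (its default). The tag pattern and helper names are illustrative assumptions. Point then stops after the tag rather than right after the period, which is the second problem ken raises in his reply.

```elisp
;; Hypothetical hook function: let tags such as </b> or <br> sit between
;; the sentence-ending punctuation and the following spaces.  Uses a
;; buffer-local value so other buffers are unaffected.  In a real setup:
;;   (add-hook 'text-mode-hook #'my/allow-tags-after-sentence-end)
(defun my/allow-tags-after-sentence-end ()
  "Allow HTML-style tags after sentence-ending punctuation in this buffer."
  (set (make-local-variable 'sentence-end-base)
       (concat sentence-end-base "\\(?:</?[[:alpha:]]+>\\)*")))

(defun my/first-sentence (text &optional allow-tags)
  "Return the part of TEXT that `forward-sentence' moves over from its start."
  (with-temp-buffer
    (set (make-local-variable 'sentence-end-double-space) t)
    (when allow-tags
      (my/allow-tags-after-sentence-end))
    (insert text)
    (goto-char (point-min))
    (forward-sentence 1)
    (buffer-substring (point-min) (point))))

(my/first-sentence "One.</b>  Two.")    ; => "One.</b>  Two."
(my/first-sentence "One.</b>  Two." t)  ; => "One.</b>"
```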

* Re: Understanding Word and Sentence Boundaries
  2012-12-11 12:03             ` Eric Abrahamsen
@ 2012-12-11 15:17               ` ken
  2012-12-12  7:02                 ` Eric Abrahamsen
  0 siblings, 1 reply; 20+ messages in thread
From: ken @ 2012-12-11 15:17 UTC
  To: Eric Abrahamsen, GNU Emacs List

On 12/11/2012 07:03 AM Eric Abrahamsen wrote:
> For this particular case, I think you can modify the value of the
> `sentence-end' variable (which is returned by the `sentence-end'
> function? The whole thing is a little confusing). You'd probably be best
> off starting with the docstring for the sentence-end function, and
> working back from there.
>
> I think the `sentence-end' variable is automatically buffer-local, which
> means if you change it in a mode-hook it ought to work the way you want.
> I agree that the whole syntax thing feels like a very well-polished
> hack.
>
> E

Eric,

Yes, that would be the variable to adjust.  I took a hard look at it and 
discussed it (I believe) on this list years ago, but never came up with 
a fix.  As I see it, there are two problems:

First, "one" of the items in that RE would need to be "zero or more 
consecutive instances of '<' followed by any number of other characters 
up until the next '>' is found."  E.g., the RE would need to be able to 
find the end of this sentence</b></i>.)</q></p></span></div>  Though 
I've used REs successfully in quite a few instances and so with a small 
bit of help could probably figure that part out, there's a second issue.

My considered opinion is that in the above and similar examples, the end 
of the sentence is immediately after the period ('.')... or question 
mark, exclamation mark, etc. and not after the </div>.  That is where 
the point should go when forward-sentence is executed.  This means that 
no RE would work because, once it finds the RE-defined sentence-end, it 
then needs to go backwards within the found string until it encounters 
[.!?]+ and then search forward again to the first character after.  IOW, 
unless I'm missing some capability of REs, "sentence-end" needs to be a 
function rather than an RE and would be a different function than one 
which finds the beginning of a sentence.
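One way to get the behaviour ken describes is a small command that uses a regexp only to locate the sentence ending. It then places point at the end of the sub-group that matched the punctuation (via the match data), rather than at the end of the whole match. This is a hypothetical sketch, not part of Emacs. The two-space rule, the set of closing characters and the tag pattern are assumptions.

```elisp
;; Hypothetical sketch: find a sentence ending with a regexp, but leave
;; point right after the punctuation instead of after the trailing
;; brackets and tags.  Assumes sentences end with [.?!] followed by
;; optional closers/tags and then two spaces or the end of a line.
(defconst my/sentence-end-with-tags-re
  (concat "\\([.?!]+\\)"                         ; group 1: the punctuation
          "\\(?:[]\"')}]\\|</?[[:alpha:]]+>\\)*" ; closing brackets, quotes, tags
          "\\(?:  \\|$\\)")                      ; two spaces or end of line
  "Regexp for a sentence ending; group 1 matches the punctuation.")

(defun my/forward-sentence-to-punctuation (&optional n)
  "Move forward N sentences, leaving point right after the punctuation."
  (interactive "p")
  (dotimes (_ (or n 1))
    (if (re-search-forward my/sentence-end-with-tags-re nil t)
        ;; End of group 1, not end of the whole match.
        (goto-char (match-end 1))
      (goto-char (point-max)))))
```

Run at the start of "End of this sentence</b></i>.)</q></p></span></div>  Next one.", this should leave point just after the ".", before ")</q>".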

* Re: Understanding Word and Sentence Boundaries
  2012-12-11 15:17               ` ken
@ 2012-12-12  7:02                 ` Eric Abrahamsen
  2012-12-12 14:32                   ` Finding end of sentence[ was Re: Understanding ... Sentence Boundaries] ken
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Abrahamsen @ 2012-12-12  7:02 UTC
  To: help-gnu-emacs

ken <gebser@mousecar.com> writes:

> Eric,
>
> Yes, that would be the variable to adjust.  I took a hard look at it
> and discussed it (I believe) on this list years ago, but never came up
> with a fix.  As I see it, there are two problems:
>
> First, "one" of the items in that RE would need to be "zero or more
> consecutive instances of '<' followed by any number of other
> characters up until the next '>' is found."  E.g., the RE would need
> to be able to find the end of this
> sentence</b></i>.)</q></p></span></div>  Though I've used REs
> successfully in quite a few instances and so with a small bit of help
> could probably figure that part out, there's a second issue.
>
> My considered opinion is that in the above and similar examples, the
> end of the sentence is immediately after the period ('.')... or
> question mark, exclamation mark, etc. and not after the </div>.  That
> is where the point should go when forward-sentence is executed.  This
> means that no RE would work because, once it finds the RE-defined
> sentence-end, it then needs to go backwards within the found string
> until it encounters [.!?]+ and then search forward again to the first
> character after.  IOW, unless I'm missing some capability of REs,
> "sentence-end" needs to be a function rather than an RE and would be a
> different function than one which finds the beginning of a sentence.

I'm getting way out of my depth here, both regarding regexps and Emacs's
sentence-related shenanigans, but you could consider advising the
`sentence-end' function so that it checks the current major mode and
delegates to a different sentence-end function depending on the mode (or
declines to handle it and falls back to the built-in sentence-end).

The individual mode-specific sentence-end functions would look at the
text after point and return a different regexp each time, one
specifically tailored to this particular sentence in this particular
mode. `forward-sentence' (or whatever the caller is) would then happily
use a different regexp every time it is called.

Feels hacky, but I guess `sentence-end' is already doing this in a
sense -- potentially returning a different regexp every time.
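For ken's first problem (closing tags between the punctuation and the following spaces), advice may not be needed at all. In recent Emacs versions the `sentence-end' function, when the variable `sentence-end' is nil (the default), builds its regexp from `sentence-end-base' plus the spacing rules. So a buffer-local `sentence-end-base' that also accepts closing tags should be enough. This is a minimal sketch under those assumptions: the my-* names are hypothetical, the regexp only knows about . ? ! and ASCII closing quotes/brackets (the stock value covers more characters), and `html-mode-hook' is just an example of where to install it. It does not address the second problem: point still ends up after the last tag.

```elisp
;; Sketch only: the my-* names are hypothetical, and html-mode-hook is
;; just an example of where to install the buffer-local setting.

(defun my-html-sentence-end-base ()
  "Return a regexp for sentence-final punctuation plus trailing closing tags.
It matches one of . ? !, then any closing quotes or brackets, then zero
or more closing tags such as </div>, each optionally followed by more
closing quotes or brackets."
  (concat "[.?!]"                          ; the punctuation itself
          "[]\"')}]*"                      ; e.g. the ) in ".)"
          "\\(?:</[^<>]*>[]\"')}]*\\)*"))  ; e.g. </q></p></span></div>

(defun my-use-html-sentence-end ()
  "Let sentence motion in the current buffer step over closing tags."
  ;; The function `sentence-end' builds its regexp from
  ;; `sentence-end-base' when the variable `sentence-end' is nil, so a
  ;; buffer-local value here affects only this buffer.
  (set (make-local-variable 'sentence-end-base)
       (my-html-sentence-end-base)))

(add-hook 'html-mode-hook #'my-use-html-sentence-end)
```

On ken's example the regexp should match ".)</q></p></span></div>", so `forward-sentence' stops after the </div> instead of running on to a later sentence end.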

My brain is exhausted!

E





* Finding end of sentence[ was Re: Understanding ... Sentence Boundaries]
  2012-12-12  7:02                 ` Eric Abrahamsen
@ 2012-12-12 14:32                   ` ken
  2012-12-13  4:27                     ` Eric Abrahamsen
  0 siblings, 1 reply; 20+ messages in thread
From: ken @ 2012-12-12 14:32 UTC (permalink / raw)
  To: GNU Emacs List

On 12/12/2012 02:02 AM Eric Abrahamsen wrote:
> ken<gebser@mousecar.com>  writes:
>
>> On 12/11/2012 07:03 AM Eric Abrahamsen wrote:
>>> ken<gebser@mousecar.com>   writes:
>>>
>>>> On 06/26/2010 11:05 PM Deniz Dogan wrote:
>>>>> 2010/6/27 ken<gebser@mousecar.com>:
>>>>>>
>>>>>> On 06/26/2010 06:53 AM Paul Drummond wrote:
>>>>>>> Thanks for the responses guys.
>>>>>>>
>>>>>>> ....
>>>>>>>
>>>>>> Is it possible to specify word boundaries for a particular mode?
>>>>>>
>>>>>
>>>>> Yes, it's part of the syntax table. See e.g. `modify-syntax-entry'.
>>>>
>>>> Thanks for the pointer to that function.
>>>>
>>>> The behavior I see in need of repair is the role of so-called "comments"
>>>> in sentence syntax.</tag>    For instance, immediately before this
>>>> sentence are two spaces... which should signify the end of the
>>>> previous sentence.  But functions like "forward-sentence" and
>>>> "fill-paragraph" and "backward-sentence" don't recognize it.
>>>>
>>>> Said another way, the "</tag>" string obscures the relationship
>>>> between the period before it and the two spaces after it and so fails
>>>> to see that one sentence ends and another starts.  This occurs in
>>>> text-mode and seems to be inherited by other modes.
>>>>
>>>> If I'm reading "modify-syntax-entry" correctly, the default meanings
>>>> of '<' and'>' are, respectively, beginning and end of comment, so
>>>> modifying them wouldn't fix this problem.  Or can this be remedied by
>>>> a change in the syntax table?  Or is this a bug?
>>>
>>> For this particular case, I think you can modify the value of the
>>> `sentence-end' variable (which is returned by the `sentence-end'
>>> function? The whole thing is a little confusing). You'd probably be best
>>> off starting with the docstring for the sentence-end function, and
>>> working back from there.
>>>
>>> I think the `sentence-end' variable is automatically buffer-local, which
>>> means if you change it in a mode-hook it ought to work the way you want.
>>> I agree that the whole syntax thing feels like a very well-polished
>>> hack.
>>>
>>> E
>>
>> Eric,
>>
>> Yes, that would be the variable to adjust.  I took a hard look at it
>> and discussed it (I believe) on this list years ago, but never came up
>> with a fix.  As I see it, there are two problems:
>>
>> First, "one" of the items in that RE would need to be "zero or more
>> consecutive instances of '<' followed by any number of other
>> characters up until the next '>' is found."  E.g., the RE would need
>> to be able to find the end of this
>> sentence</b></i>.)</q></p></span></div>   Though I've used REs
>> successfully in quite a few instances and so with a small bit of help
>> could probably figure that part out, there's a second issue.
>>

[In my original post the paragraph below was unclear, so I have changed it.]

>> My considered opinion is that in the above and similar examples, the
>> end of the sentence is immediately after the period ('.')... or
>> question mark, exclamation mark, etc. and not after the </div>.  That
>> is where the point should go when forward-sentence is executed.  This
>> means that no RE would work because, once it finds the RE-defined
>> sentence-end, it then needs to go backwards within the found string
>> until it encounters [.!?]+ and then move point one char forward to the
>> character after.  IOW, unless I'm missing some capability of REs,
>> "sentence-end" needs to be a function rather than an RE and would be a
>> different function than one which finds the beginning of a sentence.
>
> I'm getting way out of my depth here, both regarding regexps and emacs'
> sentence-related shenanigans, but you could consider advising the
> `sentence-end' function so that it checks current the major mode, and
> delegates to a different sentence-end function depending on the mode (or
> declines to handle and bails to the built-in sentence-end).
>
> The individual mode-specific sentence-end functions look at the text
> after point, and return a different regexp every time, one specifically
> tailored to this particular sentence in this particular mode. The call to
> `forward-sentence' or whatever happily uses a different regexp every
> time it is called.
>
> Feels hacky, but I guess `sentence-end' is already doing this in a
> sense -- potentially returning a different regexp every time.
>
> My brain is exhausted!
>
> E

If one were to write a mode-specific replacement for the existing
"forward-sentence" and "sentence-end", what are some ways in elisp to
ensure that they're invoked when working in that mode?  Would it be
enough to include (the recoded) "forward-sentence" and "sentence-end" in
the code for that mode, or would some kind of specific hook language
need to be included in ~/.emacs?
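One common pattern, sketched here with hypothetical names, is to leave `forward-sentence' itself alone, write a wrapper command, and bind it in the mode's keymap through a [remap forward-sentence] entry, set up from ~/.emacs with `eval-after-load'. The wrapper also tries ken's second point: after `forward-sentence' it steps back over closing tags, brackets and quotes, so point lands right after the punctuation. It assumes `sentence-end' already recognises the tagged sentence end (for example through a buffer-local `sentence-end-base'), that the closing tags sit on the same line as the punctuation, and that `html-mode-map' from sgml-mode is the keymap you care about.

```elisp
;; Sketch only: the my-* names are hypothetical.

(defun my-skip-back-over-closing-tags ()
  "Move point back over closing tags, brackets and quotes.
Meant to be called right after a sentence end such as
\"text.)</q></div>\", leaving point just after the period."
  (let ((done nil))
    (while (not done)
      (cond ((looking-back "</[^<>]*>" (line-beginning-position))
             ;; Point is just after a closing tag such as </div>.
             (goto-char (match-beginning 0)))
            ((memq (char-before) '(?\) ?\] ?\} ?\" ?\'))
             ;; Point is just after a closing bracket or quote.
             (backward-char 1))
            (t (setq done t))))))

(defun my-forward-sentence-before-tags (&optional arg)
  "Like `forward-sentence', but stop before trailing closing tags.
With a negative ARG this just calls `forward-sentence'."
  (interactive "^p")
  (forward-sentence arg)
  (when (> (or arg 1) 0)
    (my-skip-back-over-closing-tags)))

;; In ~/.emacs: once sgml-mode is loaded, make M-e (and anything else
;; bound to `forward-sentence') use the wrapper in html-mode buffers.
(eval-after-load 'sgml-mode
  '(define-key html-mode-map [remap forward-sentence]
     #'my-forward-sentence-before-tags))
```

A remap in the mode's keymap only affects buffers using that keymap, so other modes keep the stock command.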




* Re: Finding end of sentence[ was Re: Understanding ... Sentence Boundaries]
  2012-12-12 14:32                   ` Finding end of sentence[ was Re: Understanding ... Sentence Boundaries] ken
@ 2012-12-13  4:27                     ` Eric Abrahamsen
  2012-12-13  5:59                       ` Eric Abrahamsen
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Abrahamsen @ 2012-12-13  4:27 UTC (permalink / raw)
  To: help-gnu-emacs

ken <gebser@mousecar.com> writes:

> On 12/12/2012 02:02 AM Eric Abrahamsen wrote:
>> ken<gebser@mousecar.com>  writes:
>>
>>> On 12/11/2012 07:03 AM Eric Abrahamsen wrote:
>>>> ken<gebser@mousecar.com>   writes:
>>>>
>>>>> On 06/26/2010 11:05 PM Deniz Dogan wrote:
>>>>>> 2010/6/27 ken<gebser@mousecar.com>:
>>>>>>>
>>>>>>> On 06/26/2010 06:53 AM Paul Drummond wrote:
>>>>>>>> Thanks for the responses guys.
>>>>>>>>
>>>>>>>> ....
>>>>>>>>
>>>>>>> Is it possible to specify word boundaries for a particular mode?
>>>>>>>
>>>>>>
>>>>>> Yes, it's part of the syntax table. See e.g. `modify-syntax-entry'.
>>>>>
>>>>> Thanks for the pointer to that function.
>>>>>
>>>>> The behavior I see in need of repair is the role of so-called "comments"
>>>>> in sentence syntax.</tag>    For instance, immediately before this
>>>>> sentence are two spaces... which should signify the end of the
>>>>> previous sentence.  But functions like "forward-sentence" and
>>>>> "fill-paragraph" and "backward-sentence" don't recognize it.
>>>>>
>>>>> Said another way, the "</tag>" string obscures the relationship
>>>>> between the period before it and the two spaces after it and so fails
>>>>> to see that one sentence ends and another starts.  This occurs in
>>>>> text-mode and seems to be inherited by other modes.
>>>>>
>>>>> If I'm reading "modify-syntax-entry" correctly, the default meanings
>>>>> of '<' and'>' are, respectively, beginning and end of comment, so
>>>>> modifying them wouldn't fix this problem.  Or can this be remedied by
>>>>> a change in the syntax table?  Or is this a bug?
>>>>
>>>> For this particular case, I think you can modify the value of the
>>>> `sentence-end' variable (which is returned by the `sentence-end'
>>>> function? The whole thing is a little confusing). You'd probably be best
>>>> off starting with the docstring for the sentence-end function, and
>>>> working back from there.
>>>>
>>>> I think the `sentence-end' variable is automatically buffer-local, which
>>>> means if you change it in a mode-hook it ought to work the way you want.
>>>> I agree that the whole syntax thing feels like a very well-polished
>>>> hack.
>>>>
>>>> E
>>>
>>> Eric,
>>>
>>> Yes, that would be the variable to adjust.  I took a hard look at it
>>> and discussed it (I believe) on this list years ago, but never came up
>>> with a fix.  As I see it, there are two problems:
>>>
>>> First, "one" of the items in that RE would need to be "zero or more
>>> consecutive instances of '<' followed by any number of other
>>> characters up until the next '>' is found."  E.g., the RE would need
>>> to be able to find the end of this
>>> sentence</b></i>.)</q></p></span></div>   Though I've used REs
>>> successfully in quite a few instances and so with a small bit of help
>>> could probably figure that part out, there's a second issue.
>>>
>
> [In my original post the paragraph below was unclear.  So changed it.]
>
>>> My considered opinion is that in the above and similar examples, the
>>> end of the sentence is immediately after the period ('.')... or
>>> question mark, exclamation mark, etc. and not after the</div>.  That
>>> is where the point should go when forward-sentence is executed.  This
>>> means that no RE would work because, once it finds the RE-defined
>>> sentence-end, it then needs to go backwards within the found string
>>> until it encounters [.!?]+ and then move the mark one char forward to the
>>> character after.  IOW, unless I'm missing some capability of REs,
>>> "sentence-end" needs to be a function rather than an RE and would be a
>>> different function than one which finds the beginning of a sentence.
>>
>> I'm getting way out of my depth here, both regarding regexps and emacs'
>> sentence-related shenanigans, but you could consider advising the
>> `sentence-end' function so that it checks current the major mode, and
>> delegates to a different sentence-end function depending on the mode (or
>> declines to handle and bails to the built-in sentence-end).
>>
>> The individual mode-specific sentence-end functions look at the text
>> after point, and return a different regexp every time, one specifically
>> tailored to this particular sentence in this particular mode. The call to
>> `forward-sentence' or whatever happily uses a different regexp every
>> time it is called.
>>
>> Feels hacky, but I guess `sentence-end' is already doing this in a
>> sense -- potentially returning a different regexp every time.
>>
>> My brain is exhausted!
>>
>> E
>
> If one were to write a mode-specific replacement for the existing
> "forward-sentence" and "sentence-end", what are some ways in elisp to
> ensure that they're invoked when working in that mode?  Would it be
> enough to include (the recoded) "forward-sentence" and "sentence-end"
> in the code for that mode...?  or would some kind of specific hook
> language need to be included in ~/.emacs?

I was considering overloading the `sentence-end' function in a
mode-hook, but since a function definition is global, I think it's
highly likely that you'd end up polluting other modes. So probably the
safest thing to do is to advise it at the top level, i.e. in your
~/.emacs file, and then check the current mode from there. Something
like the following totally untested code:

--8<---------------cut here---------------start------------->8---
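(defadvice sentence-end (around my-check-sentence-end activate)
  "Possibly short-circuit the `sentence-end' function."
  ;; `ad-do-it' only works in `around' advice.  The mode-specific
  ;; branches store their regexp in `ad-return-value' instead of
  ;; running the original function.
  (cond ((derived-mode-p 'emacs-lisp-mode)
         (setq ad-return-value (emacs-lisp-sentence-end)))
        ((derived-mode-p 'some-other-mode)
         (setq ad-return-value (other-mode-sentence-end)))
        (t ad-do-it)))

(defun emacs-lisp-sentence-end ()
  "Return a sentence-end regexp for Emacs Lisp buffers."
  ;; examine text around point and return an appropriate regexp;
  ;; this placeholder ignores the context
  "[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*")

(defun other-mode-sentence-end ()
  "Return a sentence-end regexp for `some-other-mode' buffers."
  ;; return a different regexp; placeholder again
  "[.?!]\\($\\|[ \t]\\)[ \t\n]*")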
--8<---------------cut here---------------end--------------->8---

That ought to work, but I'm not guaranteeing that this is the best
approach!
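In newer Emacsen (24.4 and later) the same dispatch can be written with `advice-add' instead of `defadvice', which avoids `ad-do-it' and `ad-return-value' altogether. A sketch with hypothetical names: it looks the exact major mode up in an alist and registers a placeholder regexp for Emacs Lisp buffers.

```elisp
;; Requires nadvice (Emacs 24.4 or later).  The my-* names and the
;; regexp below are hypothetical placeholders.

(defvar my-sentence-end-functions nil
  "Alist of (MAJOR-MODE . FUNCTION) consulted by `my-sentence-end-dispatch'.
FUNCTION takes no arguments and returns a sentence-end regexp.")

(defun my-sentence-end-dispatch (orig-fun &rest args)
  "Around advice for `sentence-end'.
Call the function registered for the exact current major mode, if
any; otherwise call ORIG-FUN, the original `sentence-end'."
  (let ((fn (cdr (assq major-mode my-sentence-end-functions))))
    (if fn
        (funcall fn)
      (apply orig-fun args))))

(advice-add 'sentence-end :around #'my-sentence-end-dispatch)

(defconst my-elisp-sentence-end-regexp
  "[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*"
  "Placeholder sentence-end regexp for Emacs Lisp buffers.")

(defun my-elisp-sentence-end ()
  "Return the sentence-end regexp to use in Emacs Lisp buffers."
  my-elisp-sentence-end-regexp)

(push '(emacs-lisp-mode . my-elisp-sentence-end) my-sentence-end-functions)
```

Supporting another mode is one more entry in the alist. Because the lookup uses `assq' on `major-mode', derived modes do not inherit an entry.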

E





* Re: Finding end of sentence[ was Re: Understanding ... Sentence Boundaries]
  2012-12-13  4:27                     ` Eric Abrahamsen
@ 2012-12-13  5:59                       ` Eric Abrahamsen
  0 siblings, 0 replies; 20+ messages in thread
From: Eric Abrahamsen @ 2012-12-13  5:59 UTC (permalink / raw)
  To: help-gnu-emacs

Eric Abrahamsen <eric@ericabrahamsen.net> writes:

> ken <gebser@mousecar.com> writes:
>
>> On 12/12/2012 02:02 AM Eric Abrahamsen wrote:
>>> ken<gebser@mousecar.com>  writes:
>>>
>>>> On 12/11/2012 07:03 AM Eric Abrahamsen wrote:
>>>>> ken<gebser@mousecar.com>   writes:
>>>>>
>>>>>> On 06/26/2010 11:05 PM Deniz Dogan wrote:
>>>>>>> 2010/6/27 ken<gebser@mousecar.com>:
>>>>>>>>
>>>>>>>> On 06/26/2010 06:53 AM Paul Drummond wrote:
>>>>>>>>> Thanks for the responses guys.
>>>>>>>>>
>>>>>>>>> ....
>>>>>>>>>
>>>>>>>> Is it possible to specify word boundaries for a particular mode?
>>>>>>>>
>>>>>>>
>>>>>>> Yes, it's part of the syntax table. See e.g. `modify-syntax-entry'.
>>>>>>
>>>>>> Thanks for the pointer to that function.
>>>>>>
>>>>>> The behavior I see in need of repair is the role of so-called "comments"
>>>>>> in sentence syntax.</tag>    For instance, immediately before this
>>>>>> sentence are two spaces... which should signify the end of the
>>>>>> previous sentence.  But functions like "forward-sentence" and
>>>>>> "fill-paragraph" and "backward-sentence" don't recognize it.
>>>>>>
>>>>>> Said another way, the "</tag>" string obscures the relationship
>>>>>> between the period before it and the two spaces after it and so fails
>>>>>> to see that one sentence ends and another starts.  This occurs in
>>>>>> text-mode and seems to be inherited by other modes.
>>>>>>
>>>>>> If I'm reading "modify-syntax-entry" correctly, the default meanings
>>>>>> of '<' and'>' are, respectively, beginning and end of comment, so
>>>>>> modifying them wouldn't fix this problem.  Or can this be remedied by
>>>>>> a change in the syntax table?  Or is this a bug?
>>>>>
>>>>> For this particular case, I think you can modify the value of the
>>>>> `sentence-end' variable (which is returned by the `sentence-end'
>>>>> function? The whole thing is a little confusing). You'd probably be best
>>>>> off starting with the docstring for the sentence-end function, and
>>>>> working back from there.
>>>>>
>>>>> I think the `sentence-end' variable is automatically buffer-local, which
>>>>> means if you change it in a mode-hook it ought to work the way you want.
>>>>> I agree that the whole syntax thing feels like a very well-polished
>>>>> hack.
>>>>>
>>>>> E
>>>>
>>>> Eric,
>>>>
>>>> Yes, that would be the variable to adjust.  I took a hard look at it
>>>> and discussed it (I believe) on this list years ago, but never came up
>>>> with a fix.  As I see it, there are two problems:
>>>>
>>>> First, "one" of the items in that RE would need to be "zero or more
>>>> consecutive instances of '<' followed by any number of other
>>>> characters up until the next '>' is found."  E.g., the RE would need
>>>> to be able to find the end of this
>>>> sentence</b></i>.)</q></p></span></div>   Though I've used REs
>>>> successfully in quite a few instances and so with a small bit of help
>>>> could probably figure that part out, there's a second issue.
>>>>
>>
>> [In my original post the paragraph below was unclear.  So changed it.]
>>
>>>> My considered opinion is that in the above and similar examples, the
>>>> end of the sentence is immediately after the period ('.')... or
>>>> question mark, exclamation mark, etc. and not after the</div>.  That
>>>> is where the point should go when forward-sentence is executed.  This
>>>> means that no RE would work because, once it finds the RE-defined
>>>> sentence-end, it then needs to go backwards within the found string
>>>> until it encounters [.!?]+ and then move the mark one char forward to the
>>>> character after.  IOW, unless I'm missing some capability of REs,
>>>> "sentence-end" needs to be a function rather than an RE and would be a
>>>> different function than one which finds the beginning of a sentence.
>>>
>>> I'm getting way out of my depth here, both regarding regexps and emacs'
>>> sentence-related shenanigans, but you could consider advising the
>>> `sentence-end' function so that it checks current the major mode, and
>>> delegates to a different sentence-end function depending on the mode (or
>>> declines to handle and bails to the built-in sentence-end).
>>>
>>> The individual mode-specific sentence-end functions look at the text
>>> after point, and return a different regexp every time, one specifically
>>> tailored to this particular sentence in this particular mode. The call to
>>> `forward-sentence' or whatever happily uses a different regexp every
>>> time it is called.
>>>
>>> Feels hacky, but I guess `sentence-end' is already doing this in a
>>> sense -- potentially returning a different regexp every time.
>>>
>>> My brain is exhausted!
>>>
>>> E
>>
>> If one were to write a mode-specific replacement for the existing
>> "forward-sentence" and "sentence-end", what are some ways in elisp to
>> ensure that they're invoked when working in that mode?  Would it be
>> enough to include (the recoded) "forward-sentence" and "sentence-end"
>> in the code for that mode...?  or would some kind of specific hook
>> language need to be included in ~/.emacs?
>
> I was considering overloading the `sentence-end' function in a
> mode-hook, but I think it's highly likely that you'd end up polluting
> other modes. So probably the safest thing to do is to advise it at the
> top level, ie in your ~/.emacs file, and then check current mode from
> there. Something like the following totally untested code:
>
> (defadvice sentence-end (before my-check-sentence-end activate)
>   "Possibly short-circuit the `sentence-end' function."
>   (cond ((derived-mode-p 'emacs-lisp-mode)
> 	 (emacs-lisp-sentence-end))
> 	((derived-mode-p 'some-other-mode)
> 	 (other-mode-sentence-end))
> 	(t ad-do-it)))

I'm in the habit of using `derived-mode-p', but on second thought you'll
probably just want to go with the simpler, but more exacting:

  (eq major-mode 'emacs-lisp-mode)
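The difference shows up in modes derived from the one you test for. For example, `lisp-interaction-mode' (the mode of *scratch*) is derived from `emacs-lisp-mode', so `derived-mode-p' accepts it while the `eq' test does not. A small check; the function name is hypothetical:

```elisp
(defun my-mode-check-demo ()
  "Return (DERIVED EXACT) for a `lisp-interaction-mode' buffer.
DERIVED is the result of the `derived-mode-p' test (as t or nil),
EXACT the result of comparing `major-mode' with `eq'."
  (with-temp-buffer
    (lisp-interaction-mode)
    (list (and (derived-mode-p 'emacs-lisp-mode) t)   ; derived modes count
          (eq major-mode 'emacs-lisp-mode))))          ; exact match only
```

`(my-mode-check-demo)' should return (t nil). If the dispatcher ought to apply in *scratch* too, `derived-mode-p' is the better test.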





* Re: Understanding Word Boundaries
       [not found]         ` <mailman.7.1277607983.30403.help-gnu-emacs@gnu.org>
@ 2010-06-27 15:02           ` Xah Lee
  0 siblings, 0 replies; 20+ messages in thread
From: Xah Lee @ 2010-06-27 15:02 UTC (permalink / raw)
  To: help-gnu-emacs

On Jun 26, 8:05 pm, Deniz Dogan <deniz.a.m.do...@gmail.com> wrote:

> Regarding camel case word jumping, see subword-mode (previously known
> as c-subword-mode) which is part of Emacs.

Thanks for the info on subword-mode!

Great discovery. A few years ago I searched the web and found one or two
camelCase modes; I installed one and it works, but a bundled package is
much better!
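For anyone wanting to see what subword-mode changes: its motion commands treat each capitalised part of a camelCase name as a separate word. A small sketch that records where `subword-forward' stops; `my-subword-stops' is a hypothetical helper, and the hook line assumes Emacs 24 or later, where `prog-mode-hook' exists.

```elisp
(require 'subword)

(defun my-subword-stops (text)
  "Return the buffer positions where `subword-forward' stops in TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((stops '()))
      (while (< (point) (point-max))
        (subword-forward 1)
        (push (point) stops))
      (nreverse stops))))

;; To get subword motion in every programming buffer (Emacs 24+):
(add-hook 'prog-mode-hook #'subword-mode)
```

`(my-subword-stops "fooBarBaz")' should give (4 7 10), i.e. after "foo", "Bar" and "Baz", whereas a plain `forward-word' goes straight to the end.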

thanks.

 Xah


* Re: Understanding Word Boundaries
  2010-06-26 10:53     ` Paul Drummond
  2010-06-26 11:22       ` Thien-Thi Nguyen
  2010-06-26 23:49       ` ken
@ 2012-12-11  2:11       ` Samuel Wales
  2 siblings, 0 replies; 20+ messages in thread
From: Samuel Wales @ 2012-12-11  2:11 UTC (permalink / raw)
  To: Paul Drummond; +Cc: help-gnu-emacs

On 6/26/10, Paul Drummond <paul.drummond@iode.co.uk> wrote:
> I wish there was a single minor-mode that fixes all the word boundary issues
> for every major-mode I use!  I can but dream.   Or maybe I will get round to
> doing it myself one day!  ;)

I have been using Emacs for decades, but I have not gotten used to its
navigation, killing, and marking boundary assumptions yet.

I'm always fixing up whitespace, going back and deleting less so as
not to delete punctuation, wanting the whole word or only part of it,
etc.  I think Emacs does the wrong thing somewhat more than it does
the right thing in this case.  Or maybe that is because it is more
noticeable when it does the wrong thing.

I keep thinking I should have gotten used to it by now.  :)

Given the great libraries out there for other things (e.g. scrolling),
you'd think there might be a customizable library for different
preferences for all syntax levels,  perhaps based on thingatpt.
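thingatpt already distinguishes at least two of those levels: 'word follows the syntax table's word class, while 'symbol also takes in symbol constituents, such as - in the standard syntax table. A small sketch; `my-things-at' is a hypothetical helper.

```elisp
(require 'thingatpt)

(defun my-things-at (text pos)
  "Return a list (WORD SYMBOL) of the things at position POS in TEXT.
Uses a temporary buffer with the standard syntax table, in which `-'
is a symbol constituent rather than a word constituent."
  (with-temp-buffer
    (with-syntax-table (standard-syntax-table)
      (insert text)
      (goto-char pos)
      (list (thing-at-point 'word)
            (thing-at-point 'symbol)))))
```

`(my-things-at "foo-bar baz" 2)' should return ("foo" "foo-bar"): the same point, two different ideas of "the thing here".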

Did you find anything, Paul?

Samuel

-- 
The Kafka Pandemic: http://thekafkapandemic.blogspot.com

The disease DOES progress.  MANY people have died from it.  ANYBODY
can get it.  There is no hope without action.


* Re: Understanding Word Boundaries
       [not found]     ` <mailman.2.1277549613.3306.help-gnu-emacs@gnu.org>
@ 2010-06-27 14:58       ` Xah Lee
  0 siblings, 0 replies; 20+ messages in thread
From: Xah Lee @ 2010-06-27 14:58 UTC (permalink / raw)
  To: help-gnu-emacs

On Jun 26, 3:53 am, Paul Drummond <paul.drumm...@iode.co.uk> wrote:
> Thanks for the responses guys.
>
> I think the point I am trying to make here is that it's a *big* task to fix
> word boundaries for every case (every word-related key binding multiplied by
> each language/major mode I use!).
>
> I presume that Emacs hackers either a) put up with it or b) spend a lot of
> time fixing each case until they are happy.
>
> I suspect the answer is b. ;-)
>
> I wish there was a single minor-mode that fixes all the word boundary issues
> for every major-mode I use!  I can but dream.   Or maybe I will get round to
> doing it myself one day!  ;)

Here's the answer again in case you missed it.

• Text Editor's Cursor Movement Behavior (emacs, vi, Notepad++)
  http://xahlee.org/emacs/text_editor_cursor_behavior.html

plain text version follows.
-------------------------------------
Text Editor's Cursor Movement Behavior (emacs, vi, Notepad++)

Xah Lee, 2010-06-17

This article discusses some differences in cursor movement behavior
among editors. That is, when you press “Ctrl+→” on a line of
programming-language code with many different sequences of symbols,
where exactly does the cursor stop?

--------------------------------------------------
Always End at Beginning of Word?

Type the following in your favorite text editor.

something in the water does not compute

Now, you can try the word movement in different editors.

I tested this in Notepad, Notepad++, vim, emacs and Mac's TextEdit.

In Notepad, Notepad++, vim, the cursor always ends at the beginning of
each word.

In emacs, TextEdit and Xcode, the cursor ends at the beginning of the
word if you are moving backward, but at the end of the word if you are
moving forward.

That's the first major difference.

--------------------------------------------------
Does Movement Depend on the Language Mode?

Now, try this line:

something !! in @@ the ## water $$ does %% not ^^ compute

Now, vim's and Notepad++'s behaviors are identical. Their behavior is
simple and the same as before: they put the cursor at the beginning of
each run of characters, no matter what the characters are. Notepad is
similar, except that it will also stop between the two characters of %%.

Emacs and TextEdit behaved similarly. In text-mode, Emacs skips the
symbol clusters !!, @@, ## and ^^ entirely, while stopping at the
boundaries of $$ and %%. TextEdit stops in the middle of $$ and ^^, but
skips the other symbol clusters entirely.

I don't know about other editors, but I understand the behavior of
emacs well. Emacs has a syntax table concept. Each and every character
is classified into one of “whitespace”, “word”, “symbol”,
“punctuation”, and others. When you use backward-word, it first skips
any characters that are not in the “word” class, then moves until it
reaches a character that is not in the “word” class.

Each major mode usually has its own syntax table. So, depending on
which mode you are in, word motion will either skip a sequence of
identical characters entirely or stop at its boundaries.

(info "(elisp) Syntax Tables")
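The classification can be inspected directly with `char-syntax'. In the standard syntax table (which text-mode's table inherits from), $ and % are word constituents while !, @, # and ^ are punctuation, which is consistent with the text-mode behavior described above. A sketch with hypothetical helper names, forcing the standard table so the result does not depend on the current mode:

```elisp
(defun my-syntax-classes (chars)
  "Return an alist of (CHAR . CLASS) for each character in the string CHARS.
CLASS is the designator `char-syntax' returns under the standard syntax
table, e.g. ?w for a word constituent or ?. for punctuation."
  (with-syntax-table (standard-syntax-table)
    (mapcar (lambda (c) (cons c (char-syntax c)))
            (string-to-list chars))))

(defun my-forward-word-stops (text)
  "Return the positions where repeated `forward-word' stops in TEXT."
  (with-temp-buffer
    (with-syntax-table (standard-syntax-table)
      (insert text)
      (goto-char (point-min))
      (let ((stops '()))
        ;; `forward-word' returns nil once it runs into the buffer edge.
        (while (forward-word 1)
          (push (point) stops))
        (nreverse stops)))))
```

`(my-syntax-classes "$%!@#")' should report $ and % as ?w and the rest as ?., and `(my-forward-word-stops "water $$ does %% not ^^ compute")' should give (6 9 14 17 21 32): it stops after $$ and %% but jumps over ^^.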

The question is whether other editors' word movement behavior changes
depending on what language mode they are in, and if so, how the
behavior changes. Do they use a concept similar to emacs's syntax
table?

In Notepad++, cursor word-motion behavior does not change with respect
to what language mode you are in. Some 5-minute testing suggests the
same for vim.

--------------------------------------------------
More Test

Now, create a file with this content for more testing.

something in the water does not compute
something !! in @@ the ## water $$ does %% not ^^ compute
something!!in@@the##water$$does%%not^^compute
(defun insert-p-tag () "Insert <p></p> at cursor point."
  (interactive) (insert "<p></p>") (backward-char 4))
for (my $i = 0; $i < 9; $i++) { print "done!";}
<a><b>a b c</b> d e</a>

Answer this:

    * Do the positions where the cursor stops depend on whether you are
      moving left or right?
    * Does the word motion behavior change depending on what language
      mode you are in?
    * What is your editor, and on what OS?

--------------------------------------------------
Which is More Efficient?

Now, the interesting question is which model is more efficient for
general everyday coding of different languages.

The first question is: is it more efficient in general for left/right
word motions to always land on the left boundary of the word, as in
vim, Notepad and Notepad++?

I certainly think it is more intuitive that way, but otherwise I don't
know.

The second question is whether it is good to have the movement change
depending on the language mode.

I don't know. But again, it seems more intuitive not to, because users
then have a good expectation of where the cursor will stop regardless
of what language they are coding in. Of course, that MAY be less
efficient, because logically one would think it better for word motion
behavior to adapt to each language. But I am not sure about this in
real-world situations.

That said, I do find emacs's syntax table annoying from my experience of
working with it a bit over the past few years. From the little I know,
I feel that it doesn't do much: its power to model syntax is quite
weak, and it is complicated to use... but I don't know for sure.

This article was inspired by Paul Drummond's question in gnu.emacs.help.

--------------------------------------------------
2010-06-18

On 2010-06-17, Elena <egarr...@gmail.com> wrote:

    is there some elisp code to move by tokens when a programming mode
    is active? For instance, in the following C code:

    double value = f ();

    the point - represented by | - would move like this:

    |double value = f ();
    double |value = f ();
    double value |= f ();
    double value = |f ();
    double value = f |();
    double value = f (|);
    double value = f ()|;

cc-mode has the functions c-forward-token-1 and c-forward-token-2
(thanks to Andreas Politz).

It is easy to write elisp code to do what you want, though it might be
tedious depending on what you mean by token, and whether you really
want the cursor to move by token (there might be too many stops).
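As one concrete reading of “token”: move over the run of characters sharing the syntax class of the character at point, then over blanks. Against the standard syntax table this happens to reproduce the stops in Elena's example. It is only a sketch: `my-forward-syntax-run' is a hypothetical name, and it is unrelated to cc-mode's c-forward-token-2.

```elisp
(defun my-forward-syntax-run ()
  "Move over one run of same-syntax characters, then over blanks.
The run is the characters sharing the syntax class of the character
after point; blanks are spaces, tabs and newlines."
  (interactive)
  (unless (eobp)
    ;; `char-syntax' returns a class designator such as ?w or ?. and
    ;; `skip-syntax-forward' accepts the same designators in a string.
    (skip-syntax-forward (string (char-syntax (char-after))))
    (skip-chars-forward " \t\n")))
```

Called six times from the start of "double value = f ();" in a fundamental-mode buffer, it should stop at positions 8, 14, 16, 18, 19 and 20, i.e. before value, =, f, (, ) and ;. Runs of the same class are not split, so "))" or "&&" count as one stop.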

Here's a function I wrote and have been using for a couple of years.
You can modify it to get what you want; that's the basic idea. But
depending on what you mean by token, it might be tedious to get right.

(defun forward-block ()
  "Move cursor forward to next occurrence of double newline char.
In most major modes, this is the same as `forward-paragraph'; however,
this function behaves the same in any mode.
`forward-paragraph' is mode dependent, because it depends on the
variables `paragraph-start' and `paragraph-separate', which each
major mode may set differently."
  (interactive)
  (skip-chars-forward "\n")
  (unless (search-forward-regexp "\n[[:blank:]]*\n" nil t)
    (goto-char (point-max))))

(defun backward-block ()
  "Move cursor backward to previous occurrence of double newline char.
See: `forward-block'"
  (interactive)
  (skip-chars-backward "\n")
  (unless (search-backward-regexp "\n[[:blank:]]*\n" nil t)
    (goto-char (point-min))))

Actually, you can just modify it so that it always skips to the next
character whose syntax class is whitespace... but then if you have
1+1+8, that'll skip the whole thing...

  Xah
∑ http://xahlee.org/

☄


* Re: Understanding Word Boundaries
  2010-06-16 10:44 Understanding Word Boundaries Paul Drummond
  2010-06-16 20:07 ` Karan Bathla
  2010-06-23  9:02 ` Gary
@ 2010-06-25 10:33 ` andreas.roehler
  2 siblings, 0 replies; 20+ messages in thread
From: andreas.roehler @ 2010-06-25 10:33 UTC (permalink / raw)
  To: help-gnu-emacs

Am 16.06.2010 12:44, schrieb Paul Drummond:
> I have been an Emacs users for a few years now so definitely still a
> newbie!  While initially I struggled to control its power, I eventually came
> round.  Every issue I've had so far I've been able to fix by a quick search
> in EmacsWiki, except for one frustrating and re-occurring problem that has
> plagued me for years - word boundaries.
>
> Before Emacs I used Vim exclusively and the word boundary behaviour in Vim
> *just worked* - I didn't even have to think about it.  No matter what
> language I used I could navigate and manipulate words without thinking about
> it.  The way word boundaries work in Vim is elegant and I have spent a lot
> of time trying to find some elisp to replicate the behaviour in Emacs but to
> no avail.
>
> I could write some elisp myself but I am still very new to it so it will
> take a while - it's something I would like to do but I don't have time at
> the moment.  Regardless, an elisp solution to the problem is not the point
> of this post.  I want to understand why word boundaries behave the way they
> do in Vanilla Emacs and I would greatly appropriate some views on this from
> some Emacs Gurus!
>
> Every time I notice the word boundary behaviour when hacking in Emacs I
> wonder to myself - "I must be missing something here.  Surely, experienced
> Emacs users don't just *put up* with this!  Yet every forum response, blog
> post, mailing-list post I have read suggests they do.  This is atypical of
> the Emacs community in my experience.  Usually when something behaves wrong
> in Emacs, it's easy to find some elisp that just fixes the problem full
> stop.  Yet with word-boundaries all I can find is suggestions that fix a
> particular gripe but nothing that provides a general solution.
>
> I have loads of examples but I will mentioned just a few here to hopefully
> kick-start further discussion.
>
> ** Example 1
>
> I use org-mode for my journal and today I hit the word-boundary problem
> while entering my morning journal entry - here's a contrived example of what
> I entered:
>
> ** [10:27] Understanding Word Boundaries in Emacs
>                                     ^
> With point at the end of the word "Understanding" I hit C-w (which I bind to
> backward-kill-word) and the word "Understanding" is killed as expected.  But
> when I hit C-w again, the point kills to the colon.  Why?  Why is colon a
> word-boundary but the closing square bracket isn't?
>
> ** Example 2
>
> When editing C++ files I often need to delete the "ClassName::" part when
> declaring functions in the header:
>
> void ClassName::function();
>         ^
>
> With point at the start of ClassName I want to press M-d twice to delete
> ClassName and :: but "::" isn't recognised as a word.  In Vim I just type
> "dw" twice and it *just works*.
>

Hi,


This doesn't seem to be a question of word boundaries, but of a feature:

As you describe it, Vim's rule is: when word characters are under the
cursor, kill them; when non-word characters are there, kill up to the
next word.

Interesting.
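
The rule above can be modelled roughly as follows. This is a hypothetical
Python sketch, not Vim's source: it assumes Vim's default keyword
characters are approximated by ASCII letters, digits and "_", works on a
single line only, and ignores Vim's special cases for line ends and empty
lines. The names char_class, next_word_start and delete_word are made up
for illustration. The key point it shows is that a run of punctuation
such as "::" counts as a word of its own.

```python
import string

# Assumption: Vim's default 'iskeyword' approximated as ASCII letters, digits and "_".
KEYWORD_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def char_class(c):
    """0 = blank, 1 = other non-blank (punctuation), 2 = keyword character."""
    if c.isspace():
        return 0
    return 2 if c in KEYWORD_CHARS else 1


def next_word_start(text, pos):
    """Rough model of Vim's `w` on one line: index where the next word starts."""
    n = len(text)
    if pos < n and char_class(text[pos]) != 0:
        cls = char_class(text[pos])
        # Skip the rest of the current run (a keyword run or a punctuation run).
        while pos < n and char_class(text[pos]) == cls:
            pos += 1
    # Then skip any blanks that follow.
    while pos < n and char_class(text[pos]) == 0:
        pos += 1
    return pos


def delete_word(text, pos):
    """Rough model of Vim's `dw`: delete from pos up to the start of the next word."""
    return text[:pos] + text[next_word_start(text, pos):]


if __name__ == "__main__":
    line = "void ClassName::function();"
    pos = line.index("ClassName")
    once = delete_word(line, pos)
    twice = delete_word(once, pos)
    print(once)   # void ::function();
    print(twice)  # void function();
```

Under these assumptions, the first `dw` stops at "::" because the
character class changes there, and the second removes "::" because it is
itself a punctuation word, which is the Example 2 behaviour.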


> ** Example 3
>
> I have loads of problems when deleting and navigating words over multiple
> lines.  In the following C++ code for instance:
>
>      Page *page = new _Page(this);
>      page.load();
>             ^
>
> When point is after "page", before the dot on the second line and I hit M-b
> (backward-word) point ends up at the first opening bracket of "Page(" !!!
>
> Again, vim does the right thing here - pressing 'b' takes the point to the
> closing bracket of Page(this) so it doesn't recognise the semi-colon as a
> bracket which is intuitive and what I would expect.  This is really the
> point I am trying to make.  I have never taken the time to understand the
> behaviour of word boundaries in Vim because *it just works*.  In Emacs I am
> forced to think about word boundaries because Emacs keeps surprising me with
> its weird behaviour!


Forward moves stop after the object, backward moves before it.
When a mode defines '()' as word characters, M-x backward-word will stop
at the semicolon in your example.
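
For comparison, the Emacs side can be sketched the same way. This is a
hypothetical Python model, not Emacs's C implementation: forward_word
skips non-word characters and then word characters, so it stops after
the word; backward_word does the mirror image and stops at the word's
start. The word_chars parameter is an assumed stand-in for the mode's
syntax table, here just ASCII letters and digits.

```python
import string

# Assumption: word constituents are ASCII letters and digits only,
# a stand-in for whatever the current syntax table says.
DEFAULT_WORD_CHARS = frozenset(string.ascii_letters + string.digits)


def forward_word(text, pos, word_chars=DEFAULT_WORD_CHARS):
    """Model of forward-word: skip non-word chars, then word chars; stop after the word."""
    n = len(text)
    while pos < n and text[pos] not in word_chars:
        pos += 1
    while pos < n and text[pos] in word_chars:
        pos += 1
    return pos


def backward_word(text, pos, word_chars=DEFAULT_WORD_CHARS):
    """Model of backward-word: same two skips, done backwards; stop at the word's start."""
    while pos > 0 and text[pos - 1] not in word_chars:
        pos -= 1
    while pos > 0 and text[pos - 1] in word_chars:
        pos -= 1
    return pos


if __name__ == "__main__":
    line = "void ClassName::function();"
    start = line.index("ClassName")
    end = forward_word(line, forward_word(line, start))
    # What two M-d presses would kill in this model.
    print(repr(line[start:end]))  # prints 'ClassName::function'
```

In this model the "::" is never a stopping point of its own: it is just
non-word text skipped on the way to the next word, which is why two
presses of M-d kill "ClassName" and then "::function" in Example 2.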


Andreas

>
> Note: My examples happen to be C++ but I use lots of other languages too
> including elisp, Clojure, JavaScript, Python and Java and the
> word-boundaries seem to be wrong for all of them.
>
> I have tried several different elisp solutions but each one has at least one
> feature that isn't quite right.  Here are some links I kept, I've tried many
> other solutions but don't have the links to hand:
>
> http://stackoverflow.com/questions/2078855/about-the-forward-and-backward-a-word-behaviour-in-emacs
> http://stackoverflow.com/questions/1771102/changing-emacs-forward-word-behaviour/1772365#1772365
>
> So to wrap up, the point of this post is to kick-start a discussion about
> why the word boundaries in Vanilla Emacs (specifically GNU Emacs 23.1.50.1
> in my case) seem to be so awkward and unintuitive.
>
> Regards,
> Paul Drummond
>






Thread overview: 20+ messages
2010-06-16 10:44 Understanding Word Boundaries Paul Drummond
2010-06-16 20:07 ` Karan Bathla
2010-06-17 13:37   ` Deniz Dogan
2010-06-23  9:02 ` Gary
2010-06-26 10:46   ` Paul Drummond
2010-06-26 10:53     ` Paul Drummond
2010-06-26 11:22       ` Thien-Thi Nguyen
2010-06-26 23:49       ` ken
2010-06-27  3:05         ` Deniz Dogan
2012-12-11 11:18           ` Understanding Word and Sentence Boundaries ken
2012-12-11 12:03             ` Eric Abrahamsen
2012-12-11 15:17               ` ken
2012-12-12  7:02                 ` Eric Abrahamsen
2012-12-12 14:32                   ` Finding end of sentence[ was Re: Understanding ... Sentence Boundaries] ken
2012-12-13  4:27                     ` Eric Abrahamsen
2012-12-13  5:59                       ` Eric Abrahamsen
     [not found]         ` <mailman.7.1277607983.30403.help-gnu-emacs@gnu.org>
2010-06-27 15:02           ` Understanding Word Boundaries Xah Lee
2012-12-11  2:11       ` Samuel Wales
     [not found]     ` <mailman.2.1277549613.3306.help-gnu-emacs@gnu.org>
2010-06-27 14:58       ` Xah Lee
2010-06-25 10:33 ` andreas.roehler
