* Understanding Word Boundaries
From: Paul Drummond @ 2010-06-16 10:44 UTC
To: help-gnu-emacs

I have been an Emacs user for a few years now so definitely still a
newbie! While initially I struggled to control its power, I eventually
came round. Every issue I've had so far I've been able to fix by a
quick search in EmacsWiki, except for one frustrating and recurring
problem that has plagued me for years - word boundaries.

Before Emacs I used Vim exclusively and the word boundary behaviour in
Vim *just worked* - I didn't even have to think about it. No matter
what language I used I could navigate and manipulate words without
thinking about it. The way word boundaries work in Vim is elegant and I
have spent a lot of time trying to find some elisp to replicate the
behaviour in Emacs but to no avail.

I could write some elisp myself but I am still very new to it so it
will take a while - it's something I would like to do but I don't have
time at the moment. Regardless, an elisp solution to the problem is not
the point of this post. I want to understand why word boundaries behave
the way they do in Vanilla Emacs and I would greatly appreciate some
views on this from some Emacs Gurus!

Every time I notice the word boundary behaviour when hacking in Emacs I
wonder to myself - "I must be missing something here. Surely,
experienced Emacs users don't just *put up* with this!" Yet every forum
response, blog post and mailing-list post I have read suggests they do.

This is atypical of the Emacs community in my experience. Usually when
something behaves wrong in Emacs, it's easy to find some elisp that
just fixes the problem full stop. Yet with word boundaries all I can
find is suggestions that fix a particular gripe but nothing that
provides a general solution.

I have loads of examples but I will mention just a few here to
hopefully kick-start further discussion.

** Example 1

I use org-mode for my journal and today I hit the word-boundary problem
while entering my morning journal entry - here's a contrived example of
what I entered:

    ** [10:27] Understanding Word Boundaries in Emacs
                            ^

With point at the end of the word "Understanding" I hit C-w (which I
bind to backward-kill-word) and the word "Understanding" is killed as
expected. But when I hit C-w again, it kills back to the colon. Why?
Why is colon a word boundary but the closing square bracket isn't?

** Example 2

When editing C++ files I often need to delete the "ClassName::" part
when declaring functions in the header:

    void ClassName::function();
         ^

With point at the start of ClassName I want to press M-d twice to
delete ClassName and :: but "::" isn't recognised as a word. In Vim I
just type "dw" twice and it *just works*.

** Example 3

I have loads of problems when deleting and navigating words over
multiple lines. In the following C++ code for instance:

    Page *page = new _Page(this);
    page.load();
        ^

When point is after "page", before the dot on the second line and I hit
M-b (backward-word) point ends up at the first opening bracket of
"Page(" !!! Again, vim does the right thing here - pressing 'b' takes
the point to the closing bracket of Page(this) so it doesn't recognise
the semi-colon as a bracket which is intuitive and what I would expect.

This is really the point I am trying to make. I have never taken the
time to understand the behaviour of word boundaries in Vim because *it
just works*. In Emacs I am forced to think about word boundaries
because Emacs keeps surprising me with its weird behaviour!

Note: My examples happen to be C++ but I use lots of other languages
too including elisp, Clojure, JavaScript, Python and Java and the word
boundaries seem to be wrong for all of them.

I have tried several different elisp solutions but each one has at
least one feature that isn't quite right. Here are some links I kept;
I've tried many other solutions but don't have the links to hand:

http://stackoverflow.com/questions/2078855/about-the-forward-and-backward-a-word-behaviour-in-emacs
http://stackoverflow.com/questions/1771102/changing-emacs-forward-word-behaviour/1772365#1772365

So to wrap up, the point of this post is to kick-start a discussion
about why the word boundaries in Vanilla Emacs (specifically GNU Emacs
23.1.50.1 in my case) seem to be so awkward and unintuitive.

Regards,
Paul Drummond

* Re: Understanding Word Boundaries
From: Karan Bathla @ 2010-06-16 20:07 UTC
To: help-gnu-emacs, Paul Drummond

I don't know about the word-boundary behaviour in Vim, or about elisp
code to replicate it, but the behaviour of backward-kill-word is
simple: kill the last word, where a word is something alphanumeric.
Any non-alphanumeric characters like : and ( are deleted along with it
if they lie between point and that word. There is no concept here of
: or ( being word boundaries.

So if you do M-d on ":67a" the whole thing gets deleted, and in "67a:"
the : remains (with point at beginning of string).

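The behaviour described in this reply can be checked from Lisp. What follows is only a minimal sketch, assuming a temporary buffer in fundamental-mode with the standard syntax table (other major modes may classify characters differently). The helper name `my-kill-word-demo' is made up for this illustration and is not part of Emacs.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-kill-word-demo (text direction)
  "Insert TEXT in a temp buffer, kill one word, return what is left.
DIRECTION is `forward' (like M-d from the start of TEXT) or
`backward' (like `backward-kill-word' from the end of TEXT).
Note that this pushes the killed text onto the kill ring."
  (with-temp-buffer
    (insert text)
    (if (eq direction 'forward)
        (progn (goto-char (point-min))
               (kill-word 1))
      (backward-kill-word 1))
    (buffer-string)))

(my-kill-word-demo ":67a" 'forward)      ; => ""
(my-kill-word-demo "67a:" 'forward)      ; => ":"
(my-kill-word-demo "[10:27] " 'backward) ; => "[10:"
```

The last call mirrors Example 1 from the original post: the kill stops after the colon because the word "27" begins there, not because the colon is treated as a special boundary that "]" is not.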
* Re: Understanding Word Boundaries
From: Deniz Dogan @ 2010-06-17 13:37 UTC
To: Karan Bathla; +Cc: help-gnu-emacs

2010/6/16 Karan Bathla <karan_goku@yahoo.com>
>
> I don't know about the word boundary thing in vim and elisp code for
> that but the behaviour of backward-kill-word is simple : kill the last
> word; where a word is something alphanumeric. Any non alphanumeric
> characters like : and ( are deleted automatically if between point and
> last word. There is no concept here of : or ( being word boundaries.
>
> So if you do M-d on ":67a" whole thing gets deleted and in "67a:", :
> remains (with point at beginning of string).
>

To be more specific, I think it depends on what the syntax table of the
active mode looks like. You can make your own syntax table to change
the behavior of "word commands" to some extent.

--
Deniz Dogan

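As a rough illustration of the syntax-table point: a character's syntax class can be inspected with `char-syntax' and changed in the current buffer's syntax table with `modify-syntax-entry'. The sketch below assumes, purely as an example, that you want `_' to count as part of a word in CC Mode buffers. The function name `my-underscore-as-word' is hypothetical, and whether the change is desirable depends on the mode and on taste.

```elisp
;; Inspect: the syntax class of `:' in the current buffer, as a string,
;; e.g. "." for punctuation or "w" for a word constituent.
(string (char-syntax ?:))

;; Hypothetical hook function: make `_' a word constituent.
(defun my-underscore-as-word ()
  "Treat `_' as a word constituent in the current buffer's syntax table."
  (modify-syntax-entry ?_ "w"))

;; Run it in every CC Mode buffer (C, C++, Java, ...).
(add-hook 'c-mode-common-hook #'my-underscore-as-word)
```

`C-h s' (`describe-syntax') shows the complete syntax table of the current buffer, which helps when working out why a word command stopped where it did.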
* Re: Understanding Word Boundaries
From: Gary @ 2010-06-23 9:02 UTC
To: help-gnu-emacs

Paul Drummond writes:
> ** Example 2
>
> When editing C++ files I often need to delete the "ClassName::" part when
> declaring functions in the header:
>
> void ClassName::function();
>      ^
>
> With point at the start of ClassName I want to press M-d twice to delete
> ClassName and :: but "::" isn't recognised as a word. In Vim I just

Twice? Three times, shirley? Class and Name are both words...

(As you might guess, my pet peeve about the word boundary recognition
is when programming using camelcase.)

> So to wrap up, the point of this post is to kick-start a discussion about why
> the word boundaries in Vanilla Emacs (specifically GNU Emacs 23.1.50.1 in my
> case) seem to be so awkward and unintuitive.

Because it needs to be defined somewhat differently for natural
languages and different programming languages, at a guess. What a word
is depends entirely on the context you (and I) decide, and they may
well be different (see two versus three key presses above).

* Re: Understanding Word Boundaries
From: Paul Drummond @ 2010-06-26 10:46 UTC
To: Gary; +Cc: help-gnu-emacs

On 23 June 2010 10:02, Gary <help-gnu-emacs@garydjones.name> wrote:
> Paul Drummond writes:
> > With point at the start of ClassName I want to press M-d twice to delete
> > ClassName and :: but "::" isn't recognised as a word. In Vim I just
>
> Twice? Three times, shirley? Class and Name are both words...
>

Yeah, I agree about CamelCase but I wanted to keep the example simple ;)

> Because it needs to be defined somewhat differently for natural
> languages and different programming languages, at a guess. What a word
> is depends entirely on the context you (and I) decide, and they may well
> be different (see two versus three key presses above).
>

But each context I use has a major mode, and I would expect each major
mode to have sensible default word boundaries, but they don't.

Paul Drummond.

* Re: Understanding Word Boundaries
From: Paul Drummond @ 2010-06-26 10:53 UTC
To: help-gnu-emacs

Thanks for the responses guys.

I think the point I am trying to make here is that it's a *big* task to
fix word boundaries for every case (every word-related key binding
multiplied by each language/major mode I use!).

I presume that Emacs hackers either a) put up with it or b) spend a lot
of time fixing each case until they are happy.

I suspect the answer is b. ;-)

I wish there was a single minor-mode that fixes all the word boundary
issues for every major-mode I use! I can but dream. Or maybe I will
get round to doing it myself one day! ;)

Cheers,
Paul Drummond

* Re: Understanding Word Boundaries
From: Thien-Thi Nguyen @ 2010-06-26 11:22 UTC
To: Paul Drummond; +Cc: help-gnu-emacs

() Paul Drummond <paul.drummond@iode.co.uk>
() Sat, 26 Jun 2010 11:53:08 +0100

   I suspect the answer is b. ;-)

There is another answer: (c) looking at sexps instead of words.

thi

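To make option (c) concrete, here is a small sketch contrasting word motion with sexp motion. It assumes a temporary buffer in emacs-lisp-mode, where `-' is a symbol constituent rather than a word constituent; other modes will split the text differently.

```elisp
(with-temp-buffer
  (emacs-lisp-mode)
  (insert "foo-bar baz")
  (goto-char (point-min))
  (list
   ;; Word motion stops at the hyphen, which is not a word constituent.
   (save-excursion
     (forward-word 1)
     (buffer-substring (point-min) (point)))
   ;; Sexp motion treats the whole symbol as one unit.
   (save-excursion
     (forward-sexp 1)
     (buffer-substring (point-min) (point)))))
;; => ("foo" "foo-bar")
```

Interactively the sexp commands live on `C-M-f' (`forward-sexp'), `C-M-b' (`backward-sexp') and `C-M-k' (`kill-sexp'), so they can be used alongside `M-f', `M-b' and `M-d' rather than replacing them.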
* Re: Understanding Word Boundaries
From: ken @ 2010-06-26 23:49 UTC
To: Paul Drummond; +Cc: help-gnu-emacs

On 06/26/2010 06:53 AM Paul Drummond wrote:
> Thanks for the responses guys.
>
> I think the point I am trying to make here is that it's a *big* task to
> fix word boundaries for every case (every word-related key binding
> multiplied by each language/major mode I use!).
>
> I presume that Emacs hackers either a) put up with it or b) spend a lot
> of time fixing each case until they are happy.
>
> I suspect the answer is b. ;-)
>
> I wish there was a single minor-mode that fixes all the word boundary
> issues for every major-mode I use! I can but dream. Or maybe I will
> get round to doing it myself one day! ;)
>
> Cheers,
> Paul Drummond

Is it possible to specify word boundaries for a particular mode?

* Re: Understanding Word Boundaries
From: Deniz Dogan @ 2010-06-27 3:05 UTC
To: gebser; +Cc: help-gnu-emacs

2010/6/27 ken <gebser@mousecar.com>:
>
> Is it possible to specify word boundaries for a particular mode?
>

Yes, it's part of the syntax table. See e.g. `modify-syntax-entry'.

Regarding camel case word jumping, see subword-mode (previously known
as c-subword-mode) which is part of Emacs.

--
Deniz Dogan

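For the camel-case part, a minimal configuration sketch follows. It assumes you want subword movement in all CC Mode buffers. The function name `my-enable-subword' is hypothetical, and the fallback branch is only there because, as noted above, older releases shipped the mode under the name c-subword-mode.

```elisp
;; Hypothetical hook function: enable subword movement under whichever
;; name this Emacs provides.
(defun my-enable-subword ()
  "Turn on subword movement in the current buffer."
  (cond ((fboundp 'subword-mode) (subword-mode 1))
        ((fboundp 'c-subword-mode) (c-subword-mode 1))))

(add-hook 'c-mode-common-hook #'my-enable-subword)
```

With the mode on, `M-f' and `M-d' on "ClassName" should stop after "Class", which is the three-keypress behaviour discussed earlier in the thread.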
* Re: Understanding Word and Sentence Boundaries
From: ken @ 2012-12-11 11:18 UTC
To: Deniz Dogan; +Cc: help-gnu-emacs

On 06/26/2010 11:05 PM Deniz Dogan wrote:
> 2010/6/27 ken <gebser@mousecar.com>:
>>
>> Is it possible to specify word boundaries for a particular mode?
>>
>
> Yes, it's part of the syntax table. See e.g. `modify-syntax-entry'.

Thanks for the pointer to that function.

The behavior I see in need of repair is the role of so-called
"comments" in sentence syntax.</tag>  For instance, immediately before
this sentence are two spaces... which should signify the end of the
previous sentence. But functions like "forward-sentence" and
"fill-paragraph" and "backward-sentence" don't recognize it.

Said another way, the "</tag>" string obscures the relationship between
the period before it and the two spaces after it and so fails to see
that one sentence ends and another starts. This occurs in text-mode
and seems to be inherited by other modes.

If I'm reading "modify-syntax-entry" correctly, the default meanings of
'<' and '>' are, respectively, beginning and end of comment, so
modifying them wouldn't fix this problem. Or can this be remedied by a
change in the syntax table? Or is this a bug?

* Re: Understanding Word and Sentence Boundaries
From: Eric Abrahamsen @ 2012-12-11 12:03 UTC
To: help-gnu-emacs

ken <gebser@mousecar.com> writes:

> Said another way, the "</tag>" string obscures the relationship
> between the period before it and the two spaces after it and so fails
> to see that one sentence ends and another starts. This occurs in
> text-mode and seems to be inherited by other modes.
>
> If I'm reading "modify-syntax-entry" correctly, the default meanings
> of '<' and '>' are, respectively, beginning and end of comment, so
> modifying them wouldn't fix this problem. Or can this be remedied by
> a change in the syntax table? Or is this a bug?

For this particular case, I think you can modify the value of the
`sentence-end' variable (which is returned by the `sentence-end'
function? The whole thing is a little confusing). You'd probably be
best off starting with the docstring for the sentence-end function, and
working back from there.

I think the `sentence-end' variable is automatically buffer-local,
which means if you change it in a mode-hook it ought to work the way
you want. I agree that the whole syntax thing feels like a very
well-polished hack.

E

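A sketch of what such a mode-hook change might look like is below. It is an untested illustration rather than a recommended setting: the regexp is an adaptation of the traditional two-spaces `sentence-end' pattern with an optional run of tags added, the tag pattern is deliberately naive (anything between `<' and `>' on one line), and the function name `my-tag-aware-sentence-end' is hypothetical. It uses `make-local-variable' explicitly, so it does not depend on whether the variable is automatically buffer-local.

```elisp
;; Hypothetical hook function, for illustration only.
(defun my-tag-aware-sentence-end ()
  "Allow markup tags between sentence-ending punctuation and whitespace."
  (set (make-local-variable 'sentence-end)
       (concat "[.?!][]\"')}]*"          ; end punctuation, closing quotes
               "\\(?:</?[^>\n]+>\\)*"    ; any run of <tag> or </tag>
               "\\($\\| $\\|\t\\|  \\)"  ; end of line, tab, or two spaces
               "[ \t\n]*")))

(add-hook 'text-mode-hook #'my-tag-aware-sentence-end)
```

With this value, `forward-sentence' should stop in text like "syntax.</tag>  For instance", but point lands after the closing tag rather than directly after the period, because the tags are part of what the regexp matches.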
* Re: Understanding Word and Sentence Boundaries
From: ken @ 2012-12-11 15:17 UTC
To: Eric Abrahamsen, GNU Emacs List

On 12/11/2012 07:03 AM Eric Abrahamsen wrote:
> For this particular case, I think you can modify the value of the
> `sentence-end' variable (which is returned by the `sentence-end'
> function? The whole thing is a little confusing). You'd probably be best
> off starting with the docstring for the sentence-end function, and
> working back from there.
>
> I think the `sentence-end' variable is automatically buffer-local, which
> means if you change it in a mode-hook it ought to work the way you want.
> I agree that the whole syntax thing feels like a very well-polished
> hack.
>
> E

Eric,

Yes, that would be the variable to adjust. I took a hard look at it
and discussed it (I believe) on this list years ago, but never came up
with a fix. As I see it, there are two problems:

First, "one" of the items in that RE would need to be "zero or more
consecutive instances of '<' followed by any number of other characters
up until the next '>' is found." E.g., the RE would need to be able to
find the end of this sentence</b></i>.)</q></p></span></div>  Though
I've used REs successfully in quite a few instances and so with a small
bit of help could probably figure that part out, there's a second
issue.

My considered opinion is that in the above and similar examples, the
end of the sentence is immediately after the period ('.')... or
question mark, exclamation mark, etc. and not after the </div>. That
is where the point should go when forward-sentence is executed. This
means that no RE would work because, once it finds the RE-defined
sentence-end, it then needs to go backwards within the found string
until it encounters [.!?]+ and then search forward again to the first
character after. IOW, unless I'm missing some capability of REs,
"sentence-end" needs to be a function rather than an RE and would be a
different function than one which finds the beginning of a sentence.

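The "function rather than an RE" idea can be sketched as a separate command that lets `forward-sentence' find the end using a tag-tolerant regexp and then walks point back over any trailing tags. Everything here is an assumption made for illustration: the command name `my-forward-sentence-before-tags' is hypothetical, the regexp is an adaptation of the traditional two-spaces pattern, and tags are matched naively as `<...>' on a single line.

```elisp
;; Hypothetical command, for illustration only.
(defun my-forward-sentence-before-tags ()
  "Move to the end of the sentence, stopping before trailing tags."
  (interactive)
  ;; Temporarily allow a run of tags between the punctuation and the
  ;; whitespace that ends the sentence.
  (let ((sentence-end
         (concat "[.?!][]\"')}]*"
                 "\\(?:</?[^>\n]+>\\)*"
                 "\\($\\| $\\|\t\\|  \\)[ \t\n]*")))
    (forward-sentence 1))
  ;; `forward-sentence' leaves point after the tags; step back over
  ;; them one at a time so point ends up just after the punctuation.
  (while (looking-back "</?[^>\n]+>" (line-beginning-position))
    (goto-char (match-beginning 0))))
```

On the example "sentence</b></i>.)</q></p></span></div>" this should leave point just after ".)", since the closing parenthesis is not a tag. A matching backward command and integration with `fill-paragraph' are not covered by this sketch.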
* Re: Understanding Word and Sentence Boundaries
From: Eric Abrahamsen @ 2012-12-12 7:02 UTC
To: help-gnu-emacs

ken <gebser@mousecar.com> writes:

> My considered opinion is that in the above and similar examples, the
> end of the sentence is immediately after the period ('.')... or
> question mark, exclamation mark, etc. and not after the </div>. That
> is where the point should go when forward-sentence is executed. This
> means that no RE would work because, once it finds the RE-defined
> sentence-end, it then needs to go backwards within the found string
> until it encounters [.!?]+ and then search forward again to the first
> character after. IOW, unless I'm missing some capability of REs,
> "sentence-end" needs to be a function rather than an RE and would be a
> different function than one which finds the beginning of a sentence.

I'm getting way out of my depth here, both regarding regexps and emacs'
sentence-related shenanigans, but you could consider advising the
`sentence-end' function so that it checks the current major mode, and
delegates to a different sentence-end function depending on the mode
(or declines to handle and bails to the built-in sentence-end). The
individual mode-specific sentence-end functions look at the text after
point, and return a different regexp every time, one specifically
tailored to this particular sentence in this particular mode. The call
to `forward-sentence' or whatever happily uses a different regexp every
time it is called.

Feels hacky, but I guess `sentence-end' is already doing this in a
sense -- potentially returning a different regexp every time.

My brain is exhausted!

E

* Finding end of sentence[ was Re: Understanding ... Sentence Boundaries] 2012-12-12 7:02 ` Eric Abrahamsen @ 2012-12-12 14:32 ` ken 2012-12-13 4:27 ` Eric Abrahamsen 0 siblings, 1 reply; 26+ messages in thread From: ken @ 2012-12-12 14:32 UTC (permalink / raw) To: GNU Emacs List On 12/12/2012 02:02 AM Eric Abrahamsen wrote: > ken<gebser@mousecar.com> writes: > >> On 12/11/2012 07:03 AM Eric Abrahamsen wrote: >>> ken<gebser@mousecar.com> writes: >>> >>>> On 06/26/2010 11:05 PM Deniz Dogan wrote: >>>>> 2010/6/27 ken<gebser@mousecar.com>: >>>>>> >>>>>> On 06/26/2010 06:53 AM Paul Drummond wrote: >>>>>>> Thanks for the responses guys. >>>>>>> >>>>>>> .... >>>>>>> >>>>>> Is it possible to specify word boundaries for a particular mode? >>>>>> >>>>> >>>>> Yes, it's part of the syntax table. See e.g. `modify-syntax-entry'. >>>> >>>> Thanks for the pointer to that function. >>>> >>>> The behavior I see in need of repair is the role of so-called "comments" >>>> in sentence syntax.</tag> For instance, immediately before this >>>> sentence are two spaces... which should signify the end of the >>>> previous sentence. But functions like "forward-sentence" and >>>> "fill-paragraph" and "backward-sentence" don't recognize it. >>>> >>>> Said another way, the "</tag>" string obscures the relationship >>>> between the period before it and the two spaces after it and so fails >>>> to see that one sentence ends and another starts. This occurs in >>>> text-mode and seems to be inherited by other modes. >>>> >>>> If I'm reading "modify-syntax-entry" correctly, the default meanings >>>> of '<' and'>' are, respectively, beginning and end of comment, so >>>> modifying them wouldn't fix this problem. Or can this be remedied by >>>> a change in the syntax table? Or is this a bug? >>> >>> For this particular case, I think you can modify the value of the >>> `sentence-end' variable (which is returned by the `sentence-end' >>> function? The whole thing is a little confusing). 
You'd probably be best >>> off starting with the docstring for the sentence-end function, and >>> working back from there. >>> >>> I think the `sentence-end' variable is automatically buffer-local, which >>> means if you change it in a mode-hook it ought to work the way you want. >>> I agree that the whole syntax thing feels like a very well-polished >>> hack. >>> >>> E >> >> Eric, >> >> Yes, that would be the variable to adjust. I took a hard look at it >> and discussed it (I believe) on this list years ago, but never came up >> with a fix. As I see it, there are two problems: >> >> First, "one" of the items in that RE would need to be "zero or more >> consecutive instances of '<' followed by any number of other >> characters up until the next '>' is found." E.g., the RE would need >> to be able to find the end of this >> sentence</b></i>.)</q></p></span></div> Though I've used REs >> successfully in quite a few instances and so with a small bit of help >> could probably figure that part out, there's a second issue. >> [In my original post the paragraph below was unclear. So changed it.] >> My considered opinion is that in the above and similar examples, the >> end of the sentence is immediately after the period ('.')... or >> question mark, exclamation mark, etc. and not after the</div>. That >> is where the point should go when forward-sentence is executed. This >> means that no RE would work because, once it finds the RE-defined >> sentence-end, it then needs to go backwards within the found string >> until it encounters [.!?]+ and then move the mark one char forward to the >> character after. IOW, unless I'm missing some capability of REs, >> "sentence-end" needs to be a function rather than an RE and would be a >> different function than one which finds the beginning of a sentence. 
> > I'm getting way out of my depth here, both regarding regexps and emacs' > sentence-related shenanigans, but you could consider advising the > `sentence-end' function so that it checks the current major mode, and > delegates to a different sentence-end function depending on the mode (or > declines to handle and bails to the built-in sentence-end). > > The individual mode-specific sentence-end functions look at the text > after point, and return a different regexp every time, one specifically > tailored to this particular sentence in this particular mode. The call to > `forward-sentence' or whatever happily uses a different regexp every > time it is called. > > Feels hacky, but I guess `sentence-end' is already doing this in a > sense -- potentially returning a different regexp every time. > > My brain is exhausted! > > E If one were to write a mode-specific replacement for the existing "forward-sentence" and "sentence-end", what are some ways in elisp to ensure that they're invoked when working in that mode? Would it be enough to include (the recoded) "forward-sentence" and "sentence-end" in the code for that mode...? Or would some kind of specific hook language need to be included in ~/.emacs? ^ permalink raw reply [flat|nested] 26+ messages in thread
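A minimal sketch of the function-based approach ken describes, not a drop-in replacement for `forward-sentence'. It assumes a sentence ends at a run of [.!?] that may be followed by closing quotes or brackets and any number of tags before whitespace or the end of the buffer. The names `my-html-sentence-end-regexp' and `my-html-forward-sentence' are hypothetical. The regexp records the punctuation in a match group, so the command can search once and still leave point right after the period instead of after the trailing tags.

```elisp
;; Hypothetical names throughout; this is an illustration, not part of Emacs.
(defvar my-html-sentence-end-regexp
  (concat "\\([.!?]+\\)"                  ; group 1: the closing punctuation
          "\\(?:[])\"']\\|<[^<>]*>\\)*"   ; closing quotes/brackets and tags
          "\\(?:[ \t\n]\\|\\'\\)")        ; then whitespace or end of buffer
  "Assumed shape of a sentence end in tag-laden text.")

(defun my-html-forward-sentence ()
  "Move point to just after the next sentence-ending punctuation.
If no sentence end is found, move to the end of the buffer."
  (interactive)
  (if (re-search-forward my-html-sentence-end-regexp nil t)
      ;; Step back from the end of the whole match to the end of the
      ;; punctuation itself, skipping the tags that followed it.
      (goto-char (match-end 1))
    (goto-char (point-max))))
```

To have this invoked only in one mode, one option is a buffer-local key binding set from that mode's hook, for example `(local-set-key (kbd "M-e") #'my-html-forward-sentence)' inside a function added to `html-mode-hook'. Abbreviations such as "e.g." would still be treated as sentence ends by this sketch.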
* Re: Finding end of sentence[ was Re: Understanding ... Sentence Boundaries] 2012-12-12 14:32 ` Finding end of sentence[ was Re: Understanding ... Sentence Boundaries] ken @ 2012-12-13 4:27 ` Eric Abrahamsen 2012-12-13 5:59 ` Eric Abrahamsen 0 siblings, 1 reply; 26+ messages in thread From: Eric Abrahamsen @ 2012-12-13 4:27 UTC (permalink / raw) To: help-gnu-emacs ken <gebser@mousecar.com> writes: > On 12/12/2012 02:02 AM Eric Abrahamsen wrote: >> ken<gebser@mousecar.com> writes: >> >>> On 12/11/2012 07:03 AM Eric Abrahamsen wrote: >>>> ken<gebser@mousecar.com> writes: >>>> >>>>> On 06/26/2010 11:05 PM Deniz Dogan wrote: >>>>>> 2010/6/27 ken<gebser@mousecar.com>: >>>>>>> >>>>>>> On 06/26/2010 06:53 AM Paul Drummond wrote: >>>>>>>> Thanks for the responses guys. >>>>>>>> >>>>>>>> .... >>>>>>>> >>>>>>> Is it possible to specify word boundaries for a particular mode? >>>>>>> >>>>>> >>>>>> Yes, it's part of the syntax table. See e.g. `modify-syntax-entry'. >>>>> >>>>> Thanks for the pointer to that function. >>>>> >>>>> The behavior I see in need of repair is the role of so-called "comments" >>>>> in sentence syntax.</tag> For instance, immediately before this >>>>> sentence are two spaces... which should signify the end of the >>>>> previous sentence. But functions like "forward-sentence" and >>>>> "fill-paragraph" and "backward-sentence" don't recognize it. >>>>> >>>>> Said another way, the "</tag>" string obscures the relationship >>>>> between the period before it and the two spaces after it and so fails >>>>> to see that one sentence ends and another starts. This occurs in >>>>> text-mode and seems to be inherited by other modes. >>>>> >>>>> If I'm reading "modify-syntax-entry" correctly, the default meanings >>>>> of '<' and'>' are, respectively, beginning and end of comment, so >>>>> modifying them wouldn't fix this problem. Or can this be remedied by >>>>> a change in the syntax table? Or is this a bug? 
>>>> >>>> For this particular case, I think you can modify the value of the >>>> `sentence-end' variable (which is returned by the `sentence-end' >>>> function? The whole thing is a little confusing). You'd probably be best >>>> off starting with the docstring for the sentence-end function, and >>>> working back from there. >>>> >>>> I think the `sentence-end' variable is automatically buffer-local, which >>>> means if you change it in a mode-hook it ought to work the way you want. >>>> I agree that the whole syntax thing feels like a very well-polished >>>> hack. >>>> >>>> E >>> >>> Eric, >>> >>> Yes, that would be the variable to adjust. I took a hard look at it >>> and discussed it (I believe) on this list years ago, but never came up >>> with a fix. As I see it, there are two problems: >>> >>> First, "one" of the items in that RE would need to be "zero or more >>> consecutive instances of '<' followed by any number of other >>> characters up until the next '>' is found." E.g., the RE would need >>> to be able to find the end of this >>> sentence</b></i>.)</q></p></span></div> Though I've used REs >>> successfully in quite a few instances and so with a small bit of help >>> could probably figure that part out, there's a second issue. >>> > > [In my original post the paragraph below was unclear. So changed it.] > >>> My considered opinion is that in the above and similar examples, the >>> end of the sentence is immediately after the period ('.')... or >>> question mark, exclamation mark, etc. and not after the</div>. That >>> is where the point should go when forward-sentence is executed. This >>> means that no RE would work because, once it finds the RE-defined >>> sentence-end, it then needs to go backwards within the found string >>> until it encounters [.!?]+ and then move the mark one char forward to the >>> character after. 
IOW, unless I'm missing some capability of REs, >>> "sentence-end" needs to be a function rather than an RE and would be a >>> different function than one which finds the beginning of a sentence. >> >> I'm getting way out of my depth here, both regarding regexps and emacs' >> sentence-related shenanigans, but you could consider advising the >> `sentence-end' function so that it checks the current major mode, and >> delegates to a different sentence-end function depending on the mode (or >> declines to handle and bails to the built-in sentence-end). >> >> The individual mode-specific sentence-end functions look at the text >> after point, and return a different regexp every time, one specifically >> tailored to this particular sentence in this particular mode. The call to >> `forward-sentence' or whatever happily uses a different regexp every >> time it is called. >> >> Feels hacky, but I guess `sentence-end' is already doing this in a >> sense -- potentially returning a different regexp every time. >> >> My brain is exhausted! >> >> E > > If one were to write a mode-specific replacement for the existing > "forward-sentence" and "sentence-end", what are some ways in elisp to > ensure that they're invoked when working in that mode? Would it be > enough to include (the recoded) "forward-sentence" and "sentence-end" > in the code for that mode...? or would some kind of specific hook > language need to be included in ~/.emacs? I was considering overloading the `sentence-end' function in a mode-hook, but I think it's highly likely that you'd end up polluting other modes. So probably the safest thing to do is to advise it at the top level, i.e. in your ~/.emacs file, and then check the current mode from there. Something like the following totally untested code:

--8<---------------cut here---------------start------------->8---
(defadvice sentence-end (around my-check-sentence-end activate)
  "Possibly short-circuit the `sentence-end' function."
  ;; `ad-do-it' only works in around advice; a mode-specific result
  ;; is returned by setting `ad-return-value'.
  (cond ((derived-mode-p 'emacs-lisp-mode)
         (setq ad-return-value (emacs-lisp-sentence-end)))
        ((derived-mode-p 'some-other-mode)
         (setq ad-return-value (other-mode-sentence-end)))
        (t ad-do-it)))

(defun emacs-lisp-sentence-end ()
  ;; examine text around point and return an appropriate regexp
  )

(defun other-mode-sentence-end ()
  ;; return a different regexp
  )
--8<---------------cut here---------------end--------------->8---

That ought to work, but I'm not guaranteeing that this is the best approach! E ^ permalink raw reply [flat|nested] 26+ messages in thread
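On Emacs 24.4 and later, the same per-mode dispatch can also be sketched with `advice-add' rather than `defadvice'. The function names and the placeholder regexp below are hypothetical and only illustrate the shape of the approach; a real mode-specific function would inspect the text around point as described above.

```elisp
;; Hypothetical helper: a stand-in regexp for sentence ends in Lisp buffers.
(defun my-emacs-lisp-sentence-end ()
  "Return a placeholder sentence-end regexp for Emacs Lisp buffers."
  "\\.\\(?:  \\|$\\)[ \t\n]*")

(defun my-sentence-end-dispatch (orig-fun &rest args)
  "Around advice for `sentence-end': pick a regexp by major mode.
Fall back to ORIG-FUN (the built-in behavior) in all other modes."
  (if (derived-mode-p 'emacs-lisp-mode)
      (my-emacs-lisp-sentence-end)
    (apply orig-fun args)))

;; Install globally; the mode check happens each time the function runs.
(advice-add 'sentence-end :around #'my-sentence-end-dispatch)
```

The advice can be taken out again with `(advice-remove 'sentence-end #'my-sentence-end-dispatch)'. Because the mode test runs on every call, nothing leaks into buffers in other modes.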
* Re: Finding end of sentence[ was Re: Understanding ... Sentence Boundaries] 2012-12-13 4:27 ` Eric Abrahamsen @ 2012-12-13 5:59 ` Eric Abrahamsen 0 siblings, 0 replies; 26+ messages in thread From: Eric Abrahamsen @ 2012-12-13 5:59 UTC (permalink / raw) To: help-gnu-emacs Eric Abrahamsen <eric@ericabrahamsen.net> writes: > ken <gebser@mousecar.com> writes: > >> On 12/12/2012 02:02 AM Eric Abrahamsen wrote: >>> ken<gebser@mousecar.com> writes: >>> >>>> On 12/11/2012 07:03 AM Eric Abrahamsen wrote: >>>>> ken<gebser@mousecar.com> writes: >>>>> >>>>>> On 06/26/2010 11:05 PM Deniz Dogan wrote: >>>>>>> 2010/6/27 ken<gebser@mousecar.com>: >>>>>>>> >>>>>>>> On 06/26/2010 06:53 AM Paul Drummond wrote: >>>>>>>>> Thanks for the responses guys. >>>>>>>>> >>>>>>>>> .... >>>>>>>>> >>>>>>>> Is it possible to specify word boundaries for a particular mode? >>>>>>>> >>>>>>> >>>>>>> Yes, it's part of the syntax table. See e.g. `modify-syntax-entry'. >>>>>> >>>>>> Thanks for the pointer to that function. >>>>>> >>>>>> The behavior I see in need of repair is the role of so-called "comments" >>>>>> in sentence syntax.</tag> For instance, immediately before this >>>>>> sentence are two spaces... which should signify the end of the >>>>>> previous sentence. But functions like "forward-sentence" and >>>>>> "fill-paragraph" and "backward-sentence" don't recognize it. >>>>>> >>>>>> Said another way, the "</tag>" string obscures the relationship >>>>>> between the period before it and the two spaces after it and so fails >>>>>> to see that one sentence ends and another starts. This occurs in >>>>>> text-mode and seems to be inherited by other modes. >>>>>> >>>>>> If I'm reading "modify-syntax-entry" correctly, the default meanings >>>>>> of '<' and'>' are, respectively, beginning and end of comment, so >>>>>> modifying them wouldn't fix this problem. Or can this be remedied by >>>>>> a change in the syntax table? Or is this a bug? 
>>>>> >>>>> For this particular case, I think you can modify the value of the >>>>> `sentence-end' variable (which is returned by the `sentence-end' >>>>> function? The whole thing is a little confusing). You'd probably be best >>>>> off starting with the docstring for the sentence-end function, and >>>>> working back from there. >>>>> >>>>> I think the `sentence-end' variable is automatically buffer-local, which >>>>> means if you change it in a mode-hook it ought to work the way you want. >>>>> I agree that the whole syntax thing feels like a very well-polished >>>>> hack. >>>>> >>>>> E >>>> >>>> Eric, >>>> >>>> Yes, that would be the variable to adjust. I took a hard look at it >>>> and discussed it (I believe) on this list years ago, but never came up >>>> with a fix. As I see it, there are two problems: >>>> >>>> First, "one" of the items in that RE would need to be "zero or more >>>> consecutive instances of '<' followed by any number of other >>>> characters up until the next '>' is found." E.g., the RE would need >>>> to be able to find the end of this >>>> sentence</b></i>.)</q></p></span></div> Though I've used REs >>>> successfully in quite a few instances and so with a small bit of help >>>> could probably figure that part out, there's a second issue. >>>> >> >> [In my original post the paragraph below was unclear. So changed it.] >> >>>> My considered opinion is that in the above and similar examples, the >>>> end of the sentence is immediately after the period ('.')... or >>>> question mark, exclamation mark, etc. and not after the</div>. That >>>> is where the point should go when forward-sentence is executed. This >>>> means that no RE would work because, once it finds the RE-defined >>>> sentence-end, it then needs to go backwards within the found string >>>> until it encounters [.!?]+ and then move the mark one char forward to the >>>> character after. 
IOW, unless I'm missing some capability of REs, >>>> "sentence-end" needs to be a function rather than an RE and would be a >>>> different function than one which finds the beginning of a sentence. >>> >>> I'm getting way out of my depth here, both regarding regexps and emacs' >>> sentence-related shenanigans, but you could consider advising the >>> `sentence-end' function so that it checks the current major mode, and >>> delegates to a different sentence-end function depending on the mode (or >>> declines to handle and bails to the built-in sentence-end). >>> >>> The individual mode-specific sentence-end functions look at the text >>> after point, and return a different regexp every time, one specifically >>> tailored to this particular sentence in this particular mode. The call to >>> `forward-sentence' or whatever happily uses a different regexp every >>> time it is called. >>> >>> Feels hacky, but I guess `sentence-end' is already doing this in a >>> sense -- potentially returning a different regexp every time. >>> >>> My brain is exhausted! >>> >>> E >> >> If one were to write a mode-specific replacement for the existing >> "forward-sentence" and "sentence-end", what are some ways in elisp to >> ensure that they're invoked when working in that mode? Would it be >> enough to include (the recoded) "forward-sentence" and "sentence-end" >> in the code for that mode...? or would some kind of specific hook >> language need to be included in ~/.emacs? > > I was considering overloading the `sentence-end' function in a > mode-hook, but I think it's highly likely that you'd end up polluting > other modes. So probably the safest thing to do is to advise it at the > top level, i.e. in your ~/.emacs file, and then check the current mode from > there. Something like the following totally untested code:
>
> (defadvice sentence-end (around my-check-sentence-end activate)
>   "Possibly short-circuit the `sentence-end' function."
>   (cond ((derived-mode-p 'emacs-lisp-mode)
>          (setq ad-return-value (emacs-lisp-sentence-end)))
>         ((derived-mode-p 'some-other-mode)
>          (setq ad-return-value (other-mode-sentence-end)))
>         (t ad-do-it)))

I'm in the habit of using `derived-mode-p' but on second thought, you'll probably just want to go with the simpler, but more exacting: (eq major-mode 'emacs-lisp-mode) ^ permalink raw reply [flat|nested] 26+ messages in thread
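The difference between the two mode tests mentioned above can be seen in a small sketch. `my-mode-checks' is a hypothetical helper; `lisp-interaction-mode' (the mode of the *scratch* buffer) is used because it is derived from `emacs-lisp-mode' without being identical to it.

```elisp
(defun my-mode-checks ()
  "Return (DERIVED EXACT) for the current buffer.
DERIVED is non-nil when the major mode is `emacs-lisp-mode' or derives
from it; EXACT is non-nil only when it is `emacs-lisp-mode' itself."
  (list (and (derived-mode-p 'emacs-lisp-mode) t)
        (eq major-mode 'emacs-lisp-mode)))

;; In a derived mode only the first test succeeds.
(with-temp-buffer
  (lisp-interaction-mode)
  (my-mode-checks))                     ; => (t nil)

;; In the parent mode itself both succeed.
(with-temp-buffer
  (emacs-lisp-mode)
  (my-mode-checks))                     ; => (t t)
```

So `derived-mode-p' is the wider net (the advice also fires in child modes), while the `eq' test confines the change to exactly one mode.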
[parent not found: <mailman.7.1277607983.30403.help-gnu-emacs@gnu.org>]
* Re: Understanding Word Boundaries [not found] ` <mailman.7.1277607983.30403.help-gnu-emacs@gnu.org> @ 2010-06-27 15:02 ` Xah Lee 0 siblings, 0 replies; 26+ messages in thread From: Xah Lee @ 2010-06-27 15:02 UTC (permalink / raw) To: help-gnu-emacs On Jun 26, 8:05 pm, Deniz Dogan <deniz.a.m.do...@gmail.com> wrote: > Regarding camel case word jumping, see subword-mode (previously known > as c-subword-mode) which is part of Emacs. Thanks for the info on subword-mode! Great discovery. A few years ago I searched the web and found one or two camelCase modes; I installed one and it worked, but a bundled package is much better. Thanks. Xah ^ permalink raw reply [flat|nested] 26+ messages in thread
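A small sketch of what the subword library mentioned above does with a camelCase identifier. It calls `subword-forward' from subword.el directly, so it does not depend on how `subword-mode' hooks into the word commands in a given Emacs version; `my-subword-stops' is a hypothetical helper written for this illustration.

```elisp
(require 'subword)

(defun my-subword-stops (text)
  "Return the list of positions where `subword-forward' stops in TEXT.
Positions are 1-based buffer positions, starting from the beginning."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((stops '())
          (prev 0))
      ;; Keep moving until the end of the buffer, or until no progress
      ;; is made (a guard against looping forever on odd input).
      (while (and (< (point) (point-max)) (/= (point) prev))
        (setq prev (point))
        (subword-forward)
        (push (point) stops))
      (nreverse stops))))

(my-subword-stops "fooBarBaz")          ; stops after "foo", "Bar", "Baz"
```

With plain `forward-word' the whole identifier would be one stop; turning on `subword-mode' in a buffer makes the usual word commands behave like this.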
* Re: Understanding Word Boundaries 2010-06-26 10:53 ` Paul Drummond 2010-06-26 11:22 ` Thien-Thi Nguyen 2010-06-26 23:49 ` ken @ 2012-12-11 2:11 ` Samuel Wales 2 siblings, 0 replies; 26+ messages in thread From: Samuel Wales @ 2012-12-11 2:11 UTC (permalink / raw) To: Paul Drummond; +Cc: help-gnu-emacs On 6/26/10, Paul Drummond <paul.drummond@iode.co.uk> wrote: > I wish there was a single minor-mode that fixes all the word boundary issues > for every major-mode I use! I can but dream. Or maybe I will get round to > doing it myself one day! ;) I have been using Emacs for decades, but I have not gotten used to its navigation, killing, and marking boundary assumptions yet. I'm always fixing up whitespace, going back and deleting less so as not to delete punctuation, wanting the whole word or only part of it, etc. I think Emacs does the wrong thing somewhat more than it does the right thing in this case. Or maybe that is because it is more noticeable when it does the wrong thing. I keep thinking I should have gotten used to it by now. :) Given the great libraries out there for other things (e.g. scrolling), you'd think there might be a customizable library for different preferences for all syntax levels, perhaps based on thingatpt. Did you find anything, Paul? Samuel -- The Kafka Pandemic: http://thekafkapandemic.blogspot.com The disease DOES progress. MANY people have died from it. ANYBODY can get it. There is no hope without action. ^ permalink raw reply [flat|nested] 26+ messages in thread
[parent not found: <mailman.2.1277549613.3306.help-gnu-emacs@gnu.org>]
* Re: Understanding Word Boundaries [not found] ` <mailman.2.1277549613.3306.help-gnu-emacs@gnu.org> @ 2010-06-27 14:58 ` Xah Lee 0 siblings, 0 replies; 26+ messages in thread From: Xah Lee @ 2010-06-27 14:58 UTC (permalink / raw) To: help-gnu-emacs On Jun 26, 3:53 am, Paul Drummond <paul.drumm...@iode.co.uk> wrote: > Thanks for the responses guys. > > I think the point I am trying to make here is that it's a *big* task to fix > word boundaries for every case (every word-related key binding multiplied by > each language/major mode I use!). > > I presume that Emacs hackers either a) put up with it or b) spend a lot of > time fixing each case until they are happy. > > I suspect the answer is b. ;-) > > I wish there was a single minor-mode that fixes all the word boundary issues > for every major-mode I use! I can but dream. Or maybe I will get round to > doing it myself one day! ;)

Here's the answer again in case you missed it.

• Text Editor's Cursor Movement Behavior (emacs, vi, Notepad++) http://xahlee.org/emacs/text_editor_cursor_behavior.html

Plain text version follows.

-------------------------------------
Text Editor's Cursor Movement Behavior (emacs, vi, Notepad++)

Xah Lee, 2010-06-17

This article discusses some differences in cursor movement behavior among editors. That is, when you press “Ctrl+→” on a line of programming language code with lots of different sequences of symbols, where exactly does the cursor stop?

--------------------------------------------------
Always End at Beginning of Word?

Type the following in your favorite text editor.

something in the water does not compute

Now, you can try the word movement in different editors. I tested this on Notepad, Notepad++, vim, emacs, Mac's TextEdit. In Notepad, Notepad++ and vim, the cursor always ends at the beginning of each word. In emacs, TextEdit and Xcode, the cursor ends at the beginning of the word if you are moving backward, but at the end of the word if you are moving forward.
That's the first major difference.

--------------------------------------------------
Does Movement Depend on the Language Mode?

Now, try this line:

something !! in @@ the ## water $$ does %% not ^^ compute

Now, vim's and Notepad++'s behavior is identical. Their behavior is pretty simple and like before: they simply put the cursor at the beginning of each string sequence, no matter what the characters are. Notepad is similar, except that it will move in between %%.

Emacs and TextEdit behave similarly to each other. Emacs will skip the symbol clusters !!, @@, ##, ^^ entirely, while stopping at the boundaries of $$ and %% (when emacs is in text-mode). TextEdit will stop in the middle of $$ and ^^, but skip the other symbol clusters entirely.

I don't know about other editors, but I understand the behavior of emacs well. Emacs has a syntax table concept. Each and every character is classified into one of “whitespace”, “word”, “symbol”, “punctuation”, and others. When you use backward-word, it first skips any chars that are not in the “word” group, then simply moves until it reaches a char that's not in the “word” group. Each major mode's syntax table is usually different. So, depending on which mode you are in, it'll either skip a sequence of identical chars entirely, or stop at their boundary. (info "(elisp) Syntax Tables")

The question is whether other editors' word movement behavior changes depending on what language mode they are currently in. And if so, how does the behavior change? Do they use a concept similar to emacs's syntax table?

In Notepad++, cursor word-motion behavior does not change with respect to what language mode you are in. A 5-minute test suggests the same for vim.

--------------------------------------------------
More Test

Now, create a file with this content for more tests.

something in the water does not compute
something !! in @@ the ## water $$ does %% not ^^ compute
something!!in@@the##water$$does%%not^^compute
(defun insert-p-tag ()
  "Insert <p></p> at cursor point."
  (interactive)
  (insert "<p></p>")
  (backward-char 4))
for (my $i = 0; $i < 9; $i++) { print "done!";}
<a><b>a b c</b> d e</a>

Answer this:

* Do the positions where the cursor stops depend on whether you are moving left or right?
* Does the word motion behavior change depending on what language mode you are in?
* What is your editor? On what OS?

--------------------------------------------------
Which is More Efficient?

Now, the interesting question is which model is more efficient for general everyday coding in different languages.

First question: is it more efficient in general for left/right word motions to always land on the left boundary of the word, as in vim, Notepad, Notepad++? Certainly I think it is more intuitive that way. But otherwise I don't know.

The second question is whether it is good to have the movement change depending on the language mode. I don't know. But again it seems more intuitive if it does not change, because users then have a good expectation of where the cursor will stop regardless of what language they are coding in. Though, of course, it MAY be less efficient, because logically one would think that it might be better to have word motion behavior adapt to different languages. But I am not sure about this in real-world situations.

Though, I do find emacs's syntax table annoying from my experience of working with it a bit in the past few years... from the little I know, I felt that it doesn't do much, its power to model syntax is quite weak, and it is very complicated to use... but I don't know for sure.

This article is inspired by Paul Drummond's question in gnu.emacs.help.

--------------------------------------------------
2010-06-18

On 2010-06-17, Elena <egarr...@gmail.com> wrote: is there some elisp code to move by tokens when a programming mode is active?
For instance, in the following C code:

double value = f ();

the point - represented by | - would move like this:

|double value = f ();
double |value = f ();
double value |= f ();
double value = |f ();
double value = f |();
double value = f (|);
double value = f ()|;

cc-mode has functions c-forward-token-1 and c-forward-token-2. (thanks to Andreas Politz)

It is easy to write elisp code to do what you want, though it might be tedious depending on what you mean by token, and whether you really want the cursor to move by token (there might be too many stops).

Here's a function I wrote and have been using for a couple of years. You can modify it to get what you want. Basically that's the idea. But depending on what you mean by token, it might be tedious to get it right.

(defun forward-block ()
  "Move cursor forward to next occurrence of double newline char.
In most major modes, this is the same as `forward-paragraph'; however,
this function behaves the same in any mode.  `forward-paragraph' is
mode dependent, because it depends on `paragraph-start' and
`paragraph-separate', which major modes set differently."
  (interactive)
  (skip-chars-forward "\n")
  (unless (search-forward-regexp "\n[[:blank:]]*\n" nil t)
    (goto-char (point-max))))

(defun backward-block ()
  "Move cursor backward to previous occurrence of double newline char.
See: `forward-block'"
  (interactive)
  (skip-chars-backward "\n")
  (unless (search-backward-regexp "\n[[:blank:]]*\n" nil t)
    (goto-char (point-min))))

Actually, you can just modify it so that it always skips only the whitespace syntax class... but then if you have 1+1+8 it'll skip the whole thing...

Xah ∑ http://xahlee.org/ ☄ ^ permalink raw reply [flat|nested] 26+ messages in thread
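The syntax-table mechanism described in the article above can be tried out with a short sketch. `my-forward-word-stop' is a hypothetical helper; it works in a temporary buffer with a private syntax table, so the global standard table is left alone. It assumes the standard syntax table classifies `_' as a symbol constituent rather than a word constituent, which is the default as far as I know.

```elisp
(defun my-forward-word-stop (text &optional word-chars)
  "Return the position where `forward-word' stops from the start of TEXT.
WORD-CHARS is an optional string of characters to reclassify as word
constituents before moving."
  (with-temp-buffer
    ;; A fresh table that inherits from the standard one; changes made
    ;; below affect only this temporary buffer.
    (set-syntax-table (make-syntax-table))
    (dolist (ch (string-to-list (or word-chars "")))
      (modify-syntax-entry ch "w"))
    (insert text)
    (goto-char (point-min))
    (forward-word 1)
    (point)))

(my-forward-word-stop "foo_bar baz")      ; stops after "foo"
(my-forward-word-stop "foo_bar baz" "_")  ; stops after "foo_bar"
```

A major mode does the same thing on its own syntax table, which is why word commands stop in different places from mode to mode. Evaluating `(string (char-syntax ?$))' in a buffer shows how that buffer's table classifies a given character.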
* Re: Understanding Word Boundaries 2010-06-16 10:44 Understanding Word Boundaries Paul Drummond 2010-06-16 20:07 ` Karan Bathla 2010-06-23 9:02 ` Gary @ 2010-06-25 10:33 ` andreas.roehler 2 siblings, 0 replies; 26+ messages in thread From: andreas.roehler @ 2010-06-25 10:33 UTC (permalink / raw) To: help-gnu-emacs Am 16.06.2010 12:44, schrieb Paul Drummond: > I have been an Emacs users for a few years now so definitely still a > newbie! While initially I struggled to control its power, I eventually came > round. Every issue I've had so far I've been able to fix by a quick search > in EmacsWiki, except for one frustrating and re-occurring problem that has > plagued me for years - word boundaries. > > Before Emacs I used Vim exclusively and the word boundary behaviour in Vim > *just worked* - I didn't even have to think about it. No matter what > language I used I could navigate and manipulate words without thinking about > it. The way word boundaries work in Vim is elegant and I have spent a lot > of time trying to find some elisp to replicate the behaviour in Emacs but to > no avail. > > I could write some elisp myself but I am still very new to it so it will > take a while - it's something I would like to do but I don't have time at > the moment. Regardless, an elisp solution to the problem is not the point > of this post. I want to understand why word boundaries behave the way they > do in Vanilla Emacs and I would greatly appropriate some views on this from > some Emacs Gurus! > > Every time I notice the word boundary behaviour when hacking in Emacs I > wonder to myself - "I must be missing something here. Surely, experienced > Emacs users don't just *put up* with this! Yet every forum response, blog > post, mailing-list post I have read suggests they do. This is atypical of > the Emacs community in my experience. Usually when something behaves wrong > in Emacs, it's easy to find some elisp that just fixes the problem full > stop. 
Yet with word-boundaries all I can find is suggestions that fix a > particular gripe but nothing that provides a general solution. > > I have loads of examples but I will mentioned just a few here to hopefully > kick-start further discussion. > > ** Example 1 > > I use org-mode for my journal and today I hit the word-boundary problem > while entering my morning journal entry - here's a contrived example of what > I entered: > > ** [10:27] Understanding Word Boundaries in Emacs > ^ > With point at the end of the word "Understanding" I hit C-w (which I bind to > backward-kill-word) and the word "Understanding" is killed as expected. But > when I hit C-w again, the point kills to the colon. Why? Why is colon a > word-boundary but the closing square bracket isn't? > > ** Example 2 > > When editing C++ files I often need to delete the "ClassName::" part when > declaring functions in the header: > > void ClassName::function(); > ^ > > With point at the start of ClassName I want to press M-d twice to delete > ClassName and :: but "::" isn't recognised as a word. In Vim I just type > "dw" twice and it *just works*. > Hi, seems not a question of word-boundaries, but a feature: as you describe, Vim says: when word-chars are under cursor, kill them. When non-word chars are there, kill until next word. Interesting. > ** Example 3 > > I have loads of problems when deleting and navigating words over multiple > lines. In the following C++ code for instance: > > Page *page = new _Page(this); > page.load(); > ^ > > When point is after "page", before the dot on the second line and I hit M-b > (backward-word) point ends up at the first opening bracket of "Page(" !!! > > Again, vim does the right thing here - pressing 'b' takes the point to the > closing bracket of Page(this) so it doesn't recognise the semi-colon as a > bracket which is intuitive and what I would expect. This is really the > point I am trying to make. 
I have never taken the time to understand the > behaviour of word boundaries in Vim because *it just works*. In Emacs I am > forced to think about word boundaries because Emacs keeps surprising me with > its weird behaviour! Forward-moves stop after the object, backward-moves before. When a mode defines '()' as word-characters, M-x backward-word will stop at the semi-colon at your example. Andreas > > Note: My examples happen to be C++ but I use lots of other languages too > including elisp, Clojure, JavaScript, Python and Java and the > word-boundaries seem to be wrong for all of them. > > I have tried several different elisp solutions but each one has at least one > feature that isn't quite right. Here are some links I kept, I've tried many > other solutions but don't have the links to hand: > > http://stackoverflow.com/questions/2078855/about-the-forward-and-backward-a-word-behaviour-in-emacs > http://stackoverflow.com/questions/1771102/changing-emacs-forward-word-behaviour/1772365#1772365 > > So to wrap up, the point of this post is to kick-start a discussion about > why the word boundaries in Vanilla Emacs (specifically GNU Emacs 23.1.50.1 > in my case) seem to be so awkward and unintuitive. > > Regards, > Paul Drummond > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Understanding Word Boundaries
From: Stefan Monnier @ 2010-06-17 2:20 UTC
To: help-gnu-emacs

> I have been an Emacs users for a few years now so definitely still a
> newbie! While initially I struggled to control its power, I eventually came
> round. Every issue I've had so far I've been able to fix by a quick search
> in EmacsWiki, except for one frustrating and re-occurring problem that has
> plagued me for years - word boundaries.

Emacs doesn't so much care about word boundaries as about words.
So when you forward-word, it just skips to the end of the next word,
where "abc" is a word, but ";-( )" is not.
So in many cases, it ends up doing in one step what VI would do in two:
it first skips over the non-word chars and then over the next few
word chars, whereas VI would stop after the run of non-word chars and
stop again after the subsequent run of word chars.

I don't think there is a very good reason for doing it like Emacs vs doing
it like VI. Each one has its advantages. VI's approach stops more
often, so there's less chance that it'll skip the position in which
you're interested, which is why you like it. In Emacs's approach OTOH
you'll often get away with fewer operations.

Stefan
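The one-step versus two-step difference Stefan describes can be sketched in a few lines of Emacs Lisp. This is only an illustration under stated assumptions: `my/step-by-runs` and `my/count-steps` are made-up names, not part of Emacs, and the "run at a time" motion models the description given in this message rather than Vim's actual `w` command. The sketch relies only on `forward-word` and `skip-syntax-forward`, evaluated in a temporary buffer with the standard syntax table.

```elisp
;; Hypothetical helpers for illustration; neither is part of Emacs.

(defun my/step-by-runs ()
  "Move over the run of word chars at point, else over the run of non-word chars.
This models the two-stop behaviour described above, not Vim's real `w' motion."
  (when (zerop (skip-syntax-forward "w"))
    (skip-syntax-forward "^w")))

(defun my/count-steps (text step)
  "Return how many calls of STEP it takes to move from start to end of TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((n 0))
      (while (not (eobp))
        (funcall step)
        (setq n (1+ n)))
      n)))

;; "abc" and "def" are words; " ;-( ) " is a run of non-word chars.
(my/count-steps "abc ;-( ) def" (lambda () (forward-word 1))) ; => 2
(my/count-steps "abc ;-( ) def" #'my/step-by-runs)            ; => 3
```

The word-at-a-time motion crosses the text in two steps because the punctuation is swallowed on the way to "def"; the run-at-a-time motion needs three, but it offers a stop on either side of the punctuation.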
* Re: Understanding Word Boundaries
From: Uday S Reddy @ 2010-06-18 7:24 UTC
To: help-gnu-emacs

On 6/17/2010 3:20 AM, Stefan Monnier wrote:
>
> Emacs doesn't so much care about word-boundaries as about words.
> So when you forward-word, it just skip until the end of the next word,
> where "abc" is a word, but ";-( )" is not.
> So in many cases, it ends up doing in one step what VI would do in [two]:
> first skip over the non-word chars, and then skip the next few
> word-chars, whereas VI would stop after the run of non-word chars and
> stop again after the subsequent run of word chars.

Indeed, reducing two down to one is an advantage. But if I have "abs;-()" and I want to delete the whole jing bang, Emacs loses big time!

Cheers,
Uday
* Re: Understanding Word Boundaries
From: Uday S Reddy @ 2010-06-17 10:43 UTC
To: help-gnu-emacs

On 6/16/2010 11:44 AM, Paul Drummond wrote:

> Again, vim does the right thing here - pressing 'b' takes the point to
> the closing bracket of Page(this) so it doesn't recognise the semi-colon
> as a bracket which is intuitive and what I would expect. This is really
> the point I am trying to make. I have never taken the time to
> understand the behaviour of word boundaries in Vim because *it just
> works*. In Emacs I am forced to think about word boundaries because
> Emacs keeps surprising me with its weird behaviour!

I never thought about this issue actively. I do have a vague recollection of facing it when I first moved back from vi to Emacs.

Separating words and word boundaries feels more semantic and less mechanical. And it seems that you can get more done with the same key binding than we currently can. Seems like a good idea to implement it: forward-word-or-boundary, kill-word-or-boundary, ...

My example would be, say "apples, oranges and peaches". Now think of deleting "apples, ".

Cheers,
Uday
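The commands Uday proposes do not exist in Emacs; what follows is one possible sketch of them, not an established implementation. The `my/` prefix marks the names as hypothetical, and "boundary" is assumed here to mean "a run of characters that do not have word syntax", which may differ from what Vim or any particular user would want. Only `skip-syntax-forward` and `kill-region` are used.

```elisp
;; Hypothetical commands, sketched from the names proposed above.

(defun my/forward-word-or-boundary ()
  "Move over the word at point, or else over the run of non-word chars at point."
  (interactive)
  (when (zerop (skip-syntax-forward "w"))
    (skip-syntax-forward "^w")))

(defun my/kill-word-or-boundary ()
  "Kill the word at point, or else the run of non-word chars at point."
  (interactive)
  (kill-region (point)
               (progn (my/forward-word-or-boundary) (point))))
```

With point at the start of "apples, oranges and peaches", calling `my/kill-word-or-boundary` twice kills "apples" and then ", ", leaving "oranges and peaches". The stock M-d pressed twice would kill "apples" and then ", oranges".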
* Re: Understanding Word Boundaries
From: Elena @ 2010-06-17 20:16 UTC
To: help-gnu-emacs

You may be interested in Emacs' standard library "thingatpt" and related contributed libraries:

http://www.emacswiki.org/emacs/ThingAtPoint
* Re: Understanding Word Boundaries
From: Xah Lee @ 2010-06-18 5:30 UTC
To: help-gnu-emacs

On Jun 16, 3:44 am, Paul Drummond <paul.drumm...@iode.co.uk> wrote:
> [...]

Good point. I remember I felt something similar some 5 or 7 years ago and was annoyed. But now I can't remember any detail... I just got used to emacs and can't say I find it a problem at all.

Actually, I think the point is a valid one and a bit technically involved. I'll have to study this in detail some other day, but here are some points.

For testing, save a file with this line as content:

something in the water does not compute

Now you can try the word movement in different editors. I tested this on Notepad, Notepad++, vim, emacs, and Mac's TextEdit.

In short, the text editors all behave a bit differently. Here, Notepad, Notepad++ and vim have the same behavior, while emacs and TextEdit behave similarly to each other.

In Notepad, Notepad++ and vim, the cursor always ends at the beginning of each word.

In emacs and TextEdit, the cursor ends at the beginning of the word if you are using backward-word, but at the end of the word if you are using forward-word.

That's the first major difference.

--------------------------------------------------

Now, try this line:

something !! in @@ the ## water $$ does %% not ^^ compute

Now, vim's and Notepad++'s behavior is identical. It is pretty simple and like before: they put the cursor at the beginning of each string sequence, no matter what the characters are. Notepad is similar, except that it moves in between %%.

emacs and TextEdit behave similarly. Emacs will skip the symbol clusters entirely, except %% (this depends on what mode you are in). TextEdit will also stop in the middle of $$ and ^^, but otherwise skips the symbol clusters entirely.

So, from this, it is clear that different editors have different concepts of syntax group, or no such concept at all.

I understand the emacs case well. Emacs has a syntax table concept that groups chars into classes such as “whitespace”, “word”, “symbol”, “punctuation”, etc. When you use backward-word, it skips back over any non-word chars and then simply moves until it reaches a char that's not in the “word” group.

So, depending on which mode you are in, it'll either skip a character sequence of identical chars entirely, or stop at their boundary. And if the char sequence is of different symbols such as !@#$%&*() then emacs may go into the middle of them.

The question is whether other editors have a syntax group notion, or whether their word movement behavior depends on the language mode at all.

--------------------------------------------------

Now, the interesting question is which model is more efficient for general everyday coding in different languages.

First question: is it more efficient in general for forward/backward word motions to always land in front of the word, as in vim, Notepad, Notepad++? Certainly I think it is more intuitive that way. But otherwise I don't know. I'll have to do research on this some day.

The second question is whether it is good to have the movement dependent on the language mode. Again, I don't know. Though I do find the emacs syntax table annoying from my experience of working with it a bit in the past few years... from the little I know, I felt that it doesn't do much, its power to model syntax is quite weak, and it is very complicated to use... but I don't know for sure.

Btw, one of your examples, this one:

Page *page = new _Page(this);
page.load();

I cannot duplicate.

Xah
∑ http://xahlee.org/

☄
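The dependence on syntax classes described in this message can be checked directly. The sketch below is an illustration only: `my/forward-stops-with-table` is a made-up helper, and the two syntax tables are built by hand (inheriting from the standard table) rather than taken from any real major mode. It records where `forward-word` stops in the same text when "!" has punctuation syntax and when it has word syntax.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.

(defun my/forward-stops-with-table (text table)
  "Return the positions where `forward-word' stops in TEXT under syntax TABLE."
  (with-temp-buffer
    (insert text)
    (set-syntax-table table)
    (goto-char (point-min))
    (let (stops)
      (while (not (eobp))
        (forward-word 1)
        (push (point) stops))
      (nreverse stops))))

(let ((bang-is-punct (make-syntax-table))
      (bang-is-word  (make-syntax-table)))
  (modify-syntax-entry ?! "." bang-is-punct) ; "." is the punctuation class
  (modify-syntax-entry ?! "w" bang-is-word)  ; "w" is the word class
  (list (my/forward-stops-with-table "something !! in" bang-is-punct)
        (my/forward-stops-with-table "something !! in" bang-is-word)))
;; => ((10 16) (10 13 16))
```

When "!" is punctuation, the cluster is skipped on the way to "in". When "!" is a word constituent, "!!" counts as a word of its own and point stops after it, which is the kind of mode-dependent difference reported above for %%.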
* Re: Understanding Word Boundaries
From: Xah Lee @ 2010-06-18 7:06 UTC
To: help-gnu-emacs

Did some more study on this. Wrote up a cleaned-up version here:

http://xahlee.blogspot.com/2010/06/text-editors-cursor-movement-behavior.html

Here's an excerpt of the question:

-------------------------

Now, create a file with this content for more tests.

something in the water does not compute
something !! in @@ the ## water $$ does %% not ^^ compute
something!!in@@the##water$$does%%not^^compute
(defun insert-p-tag ()
  "Insert <p></p> at cursor point."
  (interactive)
  (insert "<p></p>")
  (backward-char 4))
for (my $i = 0; $i < 9; $i++) { print "done!";}
<a><b>a b c</b> d e</a>

Answer this:

* Do the positions where the cursor stops depend on whether you are moving left or right?
* Does the word motion behavior change depending on what language mode you are in?
* What is your editor? On what OS?

Thanks.

Xah
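For Emacs, the first question can be answered mechanically. The following is a small sketch, assuming a temporary buffer with the standard syntax table; `my/word-stops-both-ways` is a made-up name, not an Emacs function. It walks a text forward with `forward-word` and then back with `backward-word`, recording where point stops each way.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.

(defun my/word-stops-both-ways (text)
  "Return (FORWARD-STOPS BACKWARD-STOPS) for word motion over TEXT."
  (with-temp-buffer
    (insert text)
    (let (forward backward)
      (goto-char (point-min))
      (while (not (eobp))
        (forward-word 1)
        (push (point) forward))
      ;; Point is now at the end of the buffer; walk back again.
      (while (not (bobp))
        (backward-word 1)
        (push (point) backward))
      (list (nreverse forward) (nreverse backward)))))

(my/word-stops-both-ways "in the water")
;; => ((3 7 13) (8 4 1))
```

Moving forward, point stops after each word (3, 7, 13); moving backward, it stops before each word (8, 4, 1). So in Emacs the stopping positions do depend on the direction of travel.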