From: Eric Abrahamsen <eric@ericabrahamsen.net>
To: help-gnu-emacs@gnu.org
Subject: Re: Finding end of sentence[ was Re: Understanding ... Sentence Boundaries]
Date: Thu, 13 Dec 2012 13:59:57 +0800
Message-ID: <87wqwm7aqq.fsf@ericabrahamsen.net>
In-Reply-To: 87obhyh8zi.fsf@ericabrahamsen.net

Eric Abrahamsen <eric@ericabrahamsen.net> writes:
> ken <gebser@mousecar.com> writes:
>
>> On 12/12/2012 02:02 AM Eric Abrahamsen wrote:
>>> ken<gebser@mousecar.com> writes:
>>>
>>>> On 12/11/2012 07:03 AM Eric Abrahamsen wrote:
>>>>> ken<gebser@mousecar.com> writes:
>>>>>
>>>>>> On 06/26/2010 11:05 PM Deniz Dogan wrote:
>>>>>>> 2010/6/27 ken<gebser@mousecar.com>:
>>>>>>>>
>>>>>>>> On 06/26/2010 06:53 AM Paul Drummond wrote:
>>>>>>>>> Thanks for the responses guys.
>>>>>>>>>
>>>>>>>>> ....
>>>>>>>>>
>>>>>>>> Is it possible to specify word boundaries for a particular mode?
>>>>>>>>
>>>>>>>
>>>>>>> Yes, it's part of the syntax table. See e.g. `modify-syntax-entry'.
>>>>>>
>>>>>> Thanks for the pointer to that function.
>>>>>>
>>>>>> The behavior I see in need of repair is the role of so-called "comments"
>>>>>> in sentence syntax.</tag> For instance, immediately before this
>>>>>> sentence are two spaces... which should signify the end of the
>>>>>> previous sentence. But functions like "forward-sentence" and
>>>>>> "fill-paragraph" and "backward-sentence" don't recognize it.
>>>>>>
>>>>>> Said another way, the "</tag>" string obscures the relationship
>>>>>> between the period before it and the two spaces after it and so fails
>>>>>> to see that one sentence ends and another starts. This occurs in
>>>>>> text-mode and seems to be inherited by other modes.
>>>>>>
>>>>>> If I'm reading "modify-syntax-entry" correctly, the default meanings
>>>>>> of '<' and '>' are, respectively, beginning and end of comment, so
>>>>>> modifying them wouldn't fix this problem. Or can this be remedied by
>>>>>> a change in the syntax table? Or is this a bug?
>>>>>
>>>>> For this particular case, I think you can modify the value of the
>>>>> `sentence-end' variable (which is returned by the `sentence-end'
>>>>> function? The whole thing is a little confusing). You'd probably be best
>>>>> off starting with the docstring for the sentence-end function, and
>>>>> working back from there.
>>>>>
>>>>> I think the `sentence-end' variable is automatically buffer-local, which
>>>>> means if you change it in a mode-hook it ought to work the way you want.
>>>>> I agree that the whole syntax thing feels like a very well-polished
>>>>> hack.
>>>>>
>>>>> E
>>>>
>>>> Eric,
>>>>
>>>> Yes, that would be the variable to adjust. I took a hard look at it
>>>> and discussed it (I believe) on this list years ago, but never came up
>>>> with a fix. As I see it, there are two problems:
>>>>
>>>> First, "one" of the items in that RE would need to be "zero or more
>>>> consecutive instances of '<' followed by any number of other
>>>> characters up until the next '>' is found." E.g., the RE would need
>>>> to be able to find the end of this
>>>> sentence</b></i>.)</q></p></span></div> Though I've used REs
>>>> successfully in quite a few instances and so with a small bit of help
>>>> could probably figure that part out, there's a second issue.
>>>>
>>
>> [In my original post the paragraph below was unclear. So changed it.]
>>
>>>> My considered opinion is that in the above and similar examples, the
>>>> end of the sentence is immediately after the period ('.')... or
>>>> question mark, exclamation mark, etc. and not after the </div>. That
>>>> is where the point should go when forward-sentence is executed. This
>>>> means that no RE would work because, once it finds the RE-defined
>>>> sentence-end, it then needs to go backwards within the found string
>>>> until it encounters [.!?]+ and then move the mark one char forward to the
>>>> character after. IOW, unless I'm missing some capability of REs,
>>>> "sentence-end" needs to be a function rather than an RE and would be a
>>>> different function than one which finds the beginning of a sentence.
>>>
>>> I'm getting way out of my depth here, both regarding regexps and emacs'
>>> sentence-related shenanigans, but you could consider advising the
>>> `sentence-end' function so that it checks the current major mode, and
>>> delegates to a different sentence-end function depending on the mode (or
>>> declines to handle and bails to the built-in sentence-end).
>>>
>>> The individual mode-specific sentence-end functions look at the text
>>> after point, and return a different regexp every time, one specifically
>>> tailored to this particular sentence in this particular mode. The call to
>>> `forward-sentence' or whatever happily uses a different regexp every
>>> time it is called.
>>>
>>> Feels hacky, but I guess `sentence-end' is already doing this in a
>>> sense -- potentially returning a different regexp every time.
>>>
>>> My brain is exhausted!
>>>
>>> E
>>
>> If one were to write a mode-specific replacement for the existing
>> "forward-sentence" and "sentence-end", what are some ways in elisp to
>> ensure that they're invoked when working in that mode? Would it be
>> enough to include (the recoded) "forward-sentence" and "sentence-end"
>> in the code for that mode...? or would some kind of specific hook
>> language need to be included in ~/.emacs?
>
> I was considering overloading the `sentence-end' function in a
> mode-hook, but I think it's highly likely that you'd end up polluting
> other modes. So probably the safest thing to do is to advise it at the
> top level, i.e. in your ~/.emacs file, and then check the current mode
> from there. Something like the following totally untested code:
>
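> (defadvice sentence-end (around my-check-sentence-end activate)
>   "Possibly short-circuit the `sentence-end' function."
>   ;; `ad-do-it' is only valid in `around' advice; the other branches
>   ;; must set `ad-return-value' to replace the original result.
>   (cond ((derived-mode-p 'emacs-lisp-mode)
>          (setq ad-return-value (emacs-lisp-sentence-end)))
>         ((derived-mode-p 'some-other-mode)
>          (setq ad-return-value (other-mode-sentence-end)))
>         (t ad-do-it)))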

I'm in the habit of using `derived-mode-p', but on second thought you'll
probably just want the simpler but stricter test, which matches only that
exact mode and none of the modes derived from it:

(eq major-mode 'emacs-lisp-mode)
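
To make the difference between the two tests concrete, here is a small
sketch. The helper name `my-mode-tests' is made up for illustration, and
the example assumes that `lisp-interaction-mode' is derived from
`emacs-lisp-mode', as it is in stock Emacs:

```elisp
;; Hypothetical helper, for illustration only: report how the two
;; mode tests behave in the current buffer.
(defun my-mode-tests ()
  "Return a list (DERIVED EXACT) for the current buffer.
DERIVED is t if the major mode is `emacs-lisp-mode' or derived from it.
EXACT is t only if the major mode is exactly `emacs-lisp-mode'."
  (list (and (derived-mode-p 'emacs-lisp-mode) t)
        (eq major-mode 'emacs-lisp-mode)))

;; In a buffer whose mode is derived from `emacs-lisp-mode', only the
;; first test should succeed.
(with-temp-buffer
  (lisp-interaction-mode)
  (my-mode-tests))
```

If that assumption holds, the `derived-mode-p' test succeeds in the
*scratch* buffer's mode and the `eq' test does not, so an advice keyed
on `eq' would leave derived modes alone.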