Finding end of sentence[ was Re: Understanding ... Sentence Boundaries]


From: ken <gebser@mousecar.com>
To: GNU Emacs List <help-gnu-emacs@gnu.org>
Subject: Finding end of sentence[ was Re: Understanding ... Sentence Boundaries]
Date: Wed, 12 Dec 2012 09:32:31 -0500
Message-ID: <50C8957F.6060103@mousecar.com>
In-Reply-To: <87ip8792j8.fsf@ericabrahamsen.net>

On 12/12/2012 02:02 AM Eric Abrahamsen wrote:
> ken <gebser@mousecar.com> writes:
>
>> On 12/11/2012 07:03 AM Eric Abrahamsen wrote:
>>> ken <gebser@mousecar.com> writes:
>>>
>>>> On 06/26/2010 11:05 PM Deniz Dogan wrote:
>>>>> 2010/6/27 ken <gebser@mousecar.com>:
>>>>>>
>>>>>> On 06/26/2010 06:53 AM Paul Drummond wrote:
>>>>>>> Thanks for the responses guys.
>>>>>>>
>>>>>>> ....
>>>>>>>
>>>>>> Is it possible to specify word boundaries for a particular mode?
>>>>>>
>>>>>
>>>>> Yes, it's part of the syntax table. See e.g. `modify-syntax-entry'.
>>>>
>>>> Thanks for the pointer to that function.
>>>>
>>>> The behavior I see in need of repair is the role of so-called "comments"
>>>> in sentence syntax.</tag>    For instance, immediately before this
>>>> sentence are two spaces... which should signify the end of the
>>>> previous sentence.  But functions like "forward-sentence" and
>>>> "fill-paragraph" and "backward-sentence" don't recognize it.
>>>>
>>>> Said another way, the "</tag>" string obscures the relationship
>>>> between the period before it and the two spaces after it and so fails
>>>> to see that one sentence ends and another starts.  This occurs in
>>>> text-mode and seems to be inherited by other modes.
>>>>
>>>> If I'm reading "modify-syntax-entry" correctly, the default meanings
>>>> of '<' and '>' are, respectively, beginning and end of comment, so
>>>> modifying them wouldn't fix this problem.  Or can this be remedied by
>>>> a change in the syntax table?  Or is this a bug?
>>>
>>> For this particular case, I think you can modify the value of the
>>> `sentence-end' variable (which is returned by the `sentence-end'
>>> function? The whole thing is a little confusing). You'd probably be best
>>> off starting with the docstring for the sentence-end function, and
>>> working back from there.
>>>
>>> I think the `sentence-end' variable is automatically buffer-local, which
>>> means if you change it in a mode-hook it ought to work the way you want.
>>> I agree that the whole syntax thing feels like a very well-polished
>>> hack.
>>>
>>> E
>>
>> Eric,
>>
>> Yes, that would be the variable to adjust.  I took a hard look at it
>> and discussed it (I believe) on this list years ago, but never came up
>> with a fix.  As I see it, there are two problems:
>>
>> First, "one" of the items in that RE would need to be "zero or more
>> consecutive instances of '<' followed by any number of other
>> characters up until the next '>' is found."  E.g., the RE would need
>> to be able to find the end of this
>> sentence</b></i>.)</q></p></span></div>   Though I've used REs
>> successfully in quite a few instances and so with a small bit of help
>> could probably figure that part out, there's a second issue.
>>

[In my original post the paragraph below was unclear.  So changed it.]

>> My considered opinion is that in the above and similar examples, the
>> end of the sentence is immediately after the period ('.')... or
>> question mark, exclamation mark, etc. and not after the </div>.  That
>> is where the point should go when forward-sentence is executed.  This
>> means that no RE would work because, once it finds the RE-defined
>> sentence-end, it then needs to go backwards within the found string
>> until it encounters [.!?]+ and then move point one char forward to the
>> character after.  IOW, unless I'm missing some capability of REs,
>> "sentence-end" needs to be a function rather than an RE and would be a
>> different function than one which finds the beginning of a sentence.
>
> I'm getting way out of my depth here, both regarding regexps and emacs'
> sentence-related shenanigans, but you could consider advising the
> `sentence-end' function so that it checks the current major mode, and
> delegates to a different sentence-end function depending on the mode (or
> declines to handle and bails to the built-in sentence-end).
>
> The individual mode-specific sentence-end functions look at the text
> after point, and return a different regexp every time, one specifically
> tailored to this particular sentence in this particular mode. The call to
> `forward-sentence' or whatever happily uses a different regexp every
> time it is called.
>
> Feels hacky, but I guess `sentence-end' is already doing this in a
> sense -- potentially returning a different regexp every time.
>
> My brain is exhausted!
>
> E

If one were to write mode-specific replacements for the existing
"forward-sentence" and "sentence-end", what are some ways in elisp to
ensure that they're invoked when working in that mode?  Would it be
enough to include the recoded "forward-sentence" and "sentence-end" in
the code for that mode, or would some kind of hook need to be added
in ~/.emacs?
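
For illustration, here is a minimal sketch of one way this might be
wired up.  Rather than redefining "forward-sentence" itself (which
would change it in every mode), it remaps the command to a replacement
from a mode hook.  Every name starting with "my-html-" is hypothetical,
as is the choice of html-mode-hook; only the Emacs primitives are real.
The sketch also suggests that a regexp can cope with the trailing-tag
case if the punctuation is put in its own group: the search matches the
whole run, and point is then placed at the end of group 1.  It handles
forward motion only and assumes sentences end with two spaces or at the
end of a line.

```elisp
;; Hypothetical sketch: the my-html-* names are made up for illustration.

(defvar my-html-sentence-end-regexp
  (concat "\\([.?!]+\\)"                 ; group 1: terminal punctuation
          "\\(?:[]\"')}]\\|<[^<>]*>\\)*" ; closing quotes/brackets and tags
          "\\(?:  \\|[ \t]*$\\)")        ; two spaces, or end of line
  "Regexp matching a sentence end followed by optional closers and tags.
Group 1 covers only the terminal punctuation.")

(defun my-html-forward-sentence (&optional arg)
  "Move point just past the next sentence-ending punctuation.
Repeat ARG times (default 1).  Tags that follow the punctuation take
part in the match, but point is left before them.  If no sentence end
is found, move to the end of the buffer."
  (interactive "p")
  (dotimes (_ (or arg 1))
    (if (re-search-forward my-html-sentence-end-regexp nil t)
        (goto-char (match-end 1))
      (goto-char (point-max)))))

(defun my-html-sentence-setup ()
  "Make keys bound to `forward-sentence' run the replacement instead."
  (local-set-key [remap forward-sentence] #'my-html-forward-sentence))

;; Assumption: the target mode is html-mode; substitute the hook of
;; whichever mode is actually in use.
(add-hook 'html-mode-hook #'my-html-sentence-setup)
```

With this in ~/.emacs, M-e in an html-mode buffer would run the
replacement, while other modes keep the stock command.  Filling is a
separate matter: the remapping affects only the key binding, so code
that consults the "sentence-end" function directly is unchanged by it.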
