From: <tomas@tuxteam.de>
To: help-gnu-emacs@gnu.org
Subject: Re: "split-sentences"?
Date: Sat, 23 Jan 2021 14:10:56 +0100
Message-ID: <20210123131055.GA11154@tuxteam.de>
In-Reply-To: <87a6t09ers.fsf@zoho.eu>
On Sat, Jan 23, 2021 at 10:35:51AM +0100, moasenwood--- via Users list for the GNU Emacs text editor wrote:
> tomas wrote:
>
> > Not exactly your result, but this comes close:
[...]
> > You can adjust the results by tweaking the regexp (try word
> > boundaries like '\<' and '\>'
>
> *scratches my head*
A candidate for a sentence boundary is a word boundary
(plus some other conditions). This was at least my thought
process leading to that suggestion. It might be a bad
suggestion, though.
> > if you want to keep punctuation) or the other split-string's
> > optional params (e.g. drop the empty matches, etc.).
>
> Well, that's a start, for sure. Thanks :)
You're welcome. Note that [:punct:] may be too broad a category:
does a sentence end with a comma? A semicolon? A colon? What
about question and exclamation marks? And what about those in
a language like Spanish, where they come in opening and closing
pairs, like parentheses: "Ella me preguntó ¿qué quieres?" (the
paired marks make it much easier to embed a question or an
exclamation into something else).
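A rough sketch of the difference, in case it helps. The helper names
and the sample sentence are made up for illustration (this is not the
snippet from earlier in the thread), and the narrow variant simply
assumes that only '.', '?' and '!' end a sentence:

```elisp
;; Hypothetical helpers, for illustration only.
(defun my-split-broad (text)
  "Split TEXT at every run of punctuation, commas included."
  (split-string text "[[:punct:]]+ *" t))

(defun my-split-narrow (text)
  "Split TEXT only after a run of `.', `?' or `!'."
  ;; The third argument t drops the empty string left after the
  ;; final terminator.
  (split-string text "[.?!]+[ \n]*" t))

(my-split-broad "Well, it is a start. Is it enough? Surely not!")
;; => ("Well" "it is a start" "Is it enough" "Surely not")

(my-split-narrow "Well, it is a start. Is it enough? Surely not!")
;; => ("Well, it is a start" "Is it enough" "Surely not")
```

Neither variant knows about abbreviations, decimal numbers or the
Spanish opening marks; those would each need their own rule.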
As always, the really interesting questions are left as exercises to
the reader... until you end up with Natural Language Processing :-)
Possibly this is the danger Tomas Hlavaty is hinting at elsethread.
> Silly me, I already used `split-string' 10 times...
C'mon. Wetware caches are like that. Mine too.
Cheers
- t