unofficial mirror of help-gnu-emacs@gnu.org
From: <tomas@tuxteam.de>
To: help-gnu-emacs@gnu.org
Subject: Re: "split-sentences"?
Date: Sat, 23 Jan 2021 14:10:56 +0100
Message-ID: <20210123131055.GA11154@tuxteam.de>
In-Reply-To: <87a6t09ers.fsf@zoho.eu>

On Sat, Jan 23, 2021 at 10:35:51AM +0100, moasenwood--- via Users list for the GNU Emacs text editor wrote:
> tomas wrote:
> 
> > Not exactly your result, but this comes close:

[...]

> > You can adjust the results by tweaking the regexp (try word
> > boundaries like '\<' and '\>'
> 
> *scratches my head*

Every sentence boundary is also a word boundary, so a word
boundary is a candidate for a sentence boundary (subject to
some other conditions). At least that was my thought process
behind the suggestion. It might be a bad suggestion, though.

> > if you want to keep punctuation) or the other split-string's
> > optional params (e.g. drop the empty matches, etc.).
> 
> Well, that's a start, for sure. Thanks :)

You're welcome. Note that [:punct:] may be too broad a category:
does a sentence end with a comma? A semi-colon? A colon? What
about question and exclamation marks? What about the latter in
a language like Spanish, where they're parenthetical: "Ella
me preguntó ¿qué quieres?" (the parenthetical things make it
much easier to embed a question or an exclamation into something
else).
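
To make the "too broad" point concrete, here is a minimal sketch in
Python rather than Emacs Lisp. It is only an illustration under stated
assumptions: Python's re module has no [:punct:] class, so
string.punctuation (ASCII only) stands in for it, and the helper names
and the English part of the sample text are made up for this example.

```python
import re
import string

# Hypothetical sample text; the Spanish sentence is the one quoted above.
TEXT = "Wait, what? It works; really: it does! Ella me preguntó ¿qué quieres?"


def split_on_any_punct(text):
    """Split at every run of ASCII punctuation (a stand-in for [:punct:])."""
    pattern = "[" + re.escape(string.punctuation) + "]+"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


def split_on_terminators(text, terminators=".?!"):
    """Split only at whitespace that follows a terminator, keeping the terminator."""
    pattern = r"(?<=[" + re.escape(terminators) + r"])\s+"
    return [part for part in re.split(pattern, text) if part]


for piece in split_on_any_punct(TEXT):
    print("any punct :", piece)
for piece in split_on_terminators(TEXT):
    print("terminator:", piece)
```

The first function cuts at commas, semicolons and colons too, so it
returns six fragments; the second returns three sentences. Neither
handles an embedded Spanish question followed by more text in the same
sentence: the closing "?" would still be taken as a sentence end.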

As always, the really interesting questions are left as exercises to
the reader... until you end up with Natural Language Processing :-)

Possibly this is the danger Tomas Hlavaty is hinting at elsethread.

> Silly me, I already used `split-string' 10 times...

C'mon. Wetware caches are like that. Mine too.

Cheers
 - t
