From: <tomas@tuxteam.de>
To: help-gnu-emacs@gnu.org
Subject: Re: "split-sentences"?
Date: Sat, 23 Jan 2021 09:41:37 +0100
Message-ID: <20210123084136.GA2306@tuxteam.de>
In-Reply-To: <87v9bo9myu.fsf@zoho.eu>

On Sat, Jan 23, 2021 at 07:38:49AM +0100, moasenwood--- via Users list for the GNU Emacs text editor wrote:
> moasenwood--- via Users list for the GNU Emacs text editor wrote:
> 
> > Can I parse/split a string into sentences based on
> > human-language punctuation?
> >
> > Did anyone do that already?
> 
> I mean very mechanically is fine, no linguistics or anything.
> 
> So this
> 
> "'This sentence is spoken by Mr. W. E. B Dubois, Esq.!' played
> through amazon.com alexa speakers?"
>
> would be
> 
> ("'" "This sentence is spoken by Mr" "." "W" "." "E" "." "B
> Dubois" "," "Esq" "." "!" "'" "played through amazon" "."
> "com alexa speakers" "?")

Not exactly your result, but this comes close:

  (split-string
    "'This sentence is spoken by Mr. W. E. B Dubois, Esq.!' played through amazon.com alexa speakers?"
    "[[:punct:]][[:space:]]*")

=>

  (""
   "This sentence is spoken by Mr"
   "W"
   "E"
   "B Dubois"
   "Esq"
   ""
   ""
   "played through amazon"
   "com alexa speakers"
   "")

You can adjust the results by tweaking the regexp (try word
boundaries like '\<' and '\>' if you want to keep punctuation)
or by using split-string's other optional parameters (e.g.
OMIT-NULLS to drop the empty strings).
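
To make both suggestions concrete, here is a minimal sketch. The
first form only adds a non-nil OMIT-NULLS argument to the call above.
The second is one possible way to keep the punctuation as separate
elements; it uses a string-match loop rather than word boundaries.
The names my-sample-text and my-split-keeping-punct are made up for
illustration and are not part of Emacs.

```elisp
(require 'subr-x)  ; string-trim, string-empty-p on older Emacsen

;; Hypothetical sample variable, holding the string from the question.
(defvar my-sample-text
  "'This sentence is spoken by Mr. W. E. B Dubois, Esq.!' played through amazon.com alexa speakers?")

;; 1. Same split as before, but with OMIT-NULLS set, so the empty
;;    strings disappear from the result.
(split-string my-sample-text "[[:punct:]][[:space:]]*" t)
;; => ("This sentence is spoken by Mr" "W" "E" "B Dubois" "Esq"
;;     "played through amazon" "com alexa speakers")

;; 2. Hypothetical helper: keep every punctuation character as its
;;    own element, with the text runs in between trimmed.
(defun my-split-keeping-punct (string)
  "Split STRING at punctuation characters, keeping them in the result."
  (let ((pos 0)
        (tokens '()))
    (while (string-match "[[:punct:]]" string pos)
      ;; Text between the previous match and this one, if any.
      (let ((text (string-trim (substring string pos (match-beginning 0)))))
        (unless (string-empty-p text)
          (push text tokens)))
      ;; The punctuation character itself.
      (push (match-string 0 string) tokens)
      (setq pos (match-end 0)))
    ;; Whatever follows the last punctuation character.
    (let ((rest (string-trim (substring string pos))))
      (unless (string-empty-p rest)
        (push rest tokens)))
    (nreverse tokens)))

(my-split-keeping-punct my-sample-text)
;; => ("'" "This sentence is spoken by Mr" "." "W" "." "E" "." "B Dubois"
;;     "," "Esq" "." "!" "'" "played through amazon" "."
;;     "com alexa speakers" "?")
```

Note that this is still purely mechanical: "Mr." and "amazon.com" are
split like any other punctuation, as in the example you gave.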

Cheers
 - t
