Newsgroups: gmane.emacs.help
Subject: Re: "split-sentences"?
Date: Sat, 23 Jan 2021 14:10:56 +0100
Message-ID: <20210123131055.GA11154@tuxteam.de>
In-Reply-To: <87a6t09ers.fsf@zoho.eu>
References: <87zh109r2d.fsf@zoho.eu> <87v9bo9myu.fsf@zoho.eu> <20210123084136.GA2306@tuxteam.de> <87a6t09ers.fsf@zoho.eu>
To: help-gnu-emacs@gnu.org

On Sat, Jan 23, 2021 at 10:35:51AM +0100, moasenwood--- via Users list for the GNU Emacs text editor wrote:
> tomas wrote:
>
> > Not exactly your result, but this comes close: [...]
> > You can adjust the results by tweaking the regexp (try word
> > boundaries like '\<' and '\>'
>
> *scratches my head*

A candidate for a sentence boundary is a word boundary (plus some
other conditions).
This was at least my thought process leading to that suggestion.
It might be a bad suggestion, though.

> > if you want to keep punctuation) or the other split-string's
> > optional params (e.g. drop the empty matches, etc.).
>
> Well, that's a start, for sure. Thanks :)

You're welcome. Note that [:punct:] may be too broad a category:
does a sentence end with a comma? A semi-colon? A colon? What about
question and exclamation marks? What about the latter in a language
like Spanish, where they're parenthetical: "Ella me preguntó ¿qué
quieres?" (the parenthetical things make it much easier to embed
a question or an exclamation into something else).

As always, the really interesting questions are left as exercises
to the reader... until you end with Natural Language Processing :-)

Possibly this is the danger Tomas Hlavaty is hinting at elsethread.

> Silly me, I already used `split-string' 10 times...

C'm on. Wetware caches are like that. Mine too.

Cheers
 - t
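
To make the [:punct:] concern concrete, here is a minimal sketch. It is in Python rather than Emacs Lisp and is not code from this thread. The names split_sentences and split_on_any_punct are hypothetical, and the "any punctuation" class is only a rough stand-in for [:punct:] (ASCII punctuation plus the Spanish inverted marks). It contrasts splitting after '.', '!' or '?' followed by whitespace with splitting on every punctuation character.

```python
import re
import string

# Assumption for illustration: a sentence ends at '.', '!' or '?'
# followed by whitespace. The lookbehind keeps the mark with its sentence.
TERMINATORS = re.compile(r"(?<=[.!?])\s+")

# Rough stand-in for a [:punct:]-style class: ASCII punctuation plus the
# Spanish inverted marks. This is an approximation, not Emacs' definition.
ANY_PUNCT = re.compile("[" + re.escape(string.punctuation) + "¿¡]+")


def split_sentences(text):
    """Split only after terminal punctuation that is followed by whitespace."""
    return [s for s in TERMINATORS.split(text.strip()) if s]


def split_on_any_punct(text):
    """Split on every run of punctuation, dropping empty pieces."""
    pieces = (p.strip() for p in ANY_PUNCT.split(text))
    return [p for p in pieces if p]


if __name__ == "__main__":
    english = "Wait, what? It works; really: it does!"
    spanish = "Ella me preguntó ¿qué quieres?"
    print(split_sentences(english))
    print(split_on_any_punct(english))
    print(split_sentences(spanish))
    print(split_on_any_punct(spanish))
```

With the broad class, commas, semicolons and colons all cut the English text into fragments, and the inverted question mark cuts the Spanish sentence in the middle. The narrower rule keeps both intact, but it still knows nothing about abbreviations or embedded questions, which is where the Natural Language Processing part begins.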