From: Drew Adams
Newsgroups: gmane.emacs.devel
Subject: Proposed enhancement for `split-string'
Date: Mon, 14 Jul 2014 15:51:24 -0700 (PDT)
Message-ID: <7025b422-78b5-4b17-b199-70cbb1f6de93@default>
To: emacs-devel@gnu.org

Function `split-string' currently has this signature, where SEPARATORS
is a regexp that defines (by matching) the separators used to split
the STRING:

 (split-string STRING &optional SEPARATORS OMIT-NULLS TRIM)

The STRING parts returned are the non-matches for regexp SEPARATORS.

I have an enhancement of `split-string' to propose, which lets you
alternatively split the string based on a character predicate or a
text property, instead of based on matching a regexp.

Code: http://www.emacswiki.org/emacs-en/download/subr%2b.el
Description: http://www.emacswiki.org/emacs/SplittingStrings

I can submit the enhancement as a patch of subr.el, if there is
interest.
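
For reference, the current regexp-based behavior described at the top
of this message can be seen with the stock `split-string'. This is
only an illustration using the existing signature (the TRIM argument
assumes Emacs 24.4 or later); the sample strings are made up.

```elisp
;; Stock `split-string': SEPARATORS is a regexp, and the returned
;; parts are the text between its matches.

;; Split on commas, keeping empty strings.
(split-string "a,b,,c" ",")                    ; => ("a" "b" "" "c")

;; Same, but drop empty strings (OMIT-NULLS is non-nil).
(split-string "a,b,,c" "," t)                  ; => ("a" "b" "c")

;; TRIM is a regexp matched at the start and end of each part.
(split-string " a , b ,, c " "," t "[ \t]+")   ; => ("a" "b" "c")
```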
---

This would be the new (compatible) signature of `split-string':

 (split-string STRING &optional HOW OMIT-NULLS TRIM FLIP TEST)
                                ^^^                 ^^^^ ^^^^

The second arg, HOW, can be a regexp, giving the same behavior as now.
Alternatively, HOW can be (1) a character predicate or (2) a doubleton
plist (PROPERTY VALUE), where PROPERTY is a text property and VALUE is
one of its possible values.

1. If HOW is a predicate then it must accept a character argument.
   Substrings whose chars satisfy the predicate are used as
   separators, so the return value is a list of substrings whose
   chars do *not* satisfy predicate HOW.

2. If HOW is (PROPERTY VALUE) then STRING is split into substrings
   whose chars do *not* have text property PROPERTY with value VALUE.

   If VALUE is nil then any non-nil value of PROPERTY matches; that
   is, only the presence of PROPERTY is tested. Characters that have
   PROPERTY belong to the separators, which are excluded.

   If VALUE is non-nil then a match occurs when the actual value of
   PROPERTY is `eq' to VALUE; that is, characters that have a
   PROPERTY of VALUE are those that are excluded.

   Non-nil optional arg TEST is a binary predicate that is applied to
   each char in STRING and to VALUE. If it returns non-nil for a
   given character occurrence then that occurrence is part of a
   substring that is excluded from the result (i.e., the char is part
   of a separator).

IOW, there are 3 ways to define the separator strings for splitting:
regexp matching, char-predicate satisfying, and text-property
matching.

By providing non-nil TEST you can test, for example:

* Whether the actual value of text property `invisible' belongs to
  the current `buffer-invisibility-spec'.

* Whether a particular face is among the faces that are the value of
  property `face'.

Non-nil optional arg FLIP simply swaps the separators and the kept
substrings - regardless of HOW the separating is defined.
The substrings that would be returned if FLIP were nil are treated as
the separators, and the substrings that would be treated as separators
if FLIP were nil are returned as the result of splitting.

The code I have also defines the following functions (in addition to a
few helper functions).

First, 3 specializations of `split-string', corresponding to the 3
kinds of HOW:

* `split-string-by-regexp' - `split-string' specialized for a regexp
  HOW. That is, split by separator regexp matching. This is the
  behavior of today's `split-string'.

* `split-string-by-property' - `split-string' specialized for a
  property-value HOW. That is, split by separator property-value
  matching.

* `split-string-by-predicate' - `split-string' specialized for a
  char-predicate HOW. That is, split by separator predicate
  satisfying.

Second, functions similar to `buffer-substring', which return the
region as a string, but which exclude or include only certain string
parts:

* `buffer-substring-of-propertied' - Return the parts that have a
  given PROPERTY.

* `buffer-substring-of-unpropertied' - Return the parts that do not
  have a given PROPERTY.

* `buffer-substring-of-visible' - Return the visible parts.

* `buffer-substring-of-invisible' - Return the invisible parts.

* `buffer-substring-of-faced' - Return the parts that have property
  `face'.

* `buffer-substring-of-unfaced' - Return the parts that do not have
  property `face'.

Example use case:

I use `buffer-substring-of-visible' in a function that I bind to
`filter-buffer-substring-function', to remove invisible text from the
region string (which I use as part of an indirect buffer name):

 (lambda (beg end _delete) ; Remove invisible text.
   (let ((strg  (buffer-substring-of-visible beg end)))
     (set-text-properties 0 (length strg) () strg)
     strg))
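
To make the predicate and property cases of HOW, and the effect of
FLIP, more concrete, here is a minimal sketch. It is not the code
from subr+.el: every `my-' name is hypothetical, and it makes
simplifying assumptions (no OMIT-NULLS, TRIM or TEST handling, only
non-empty runs are returned, and property values are compared with
`eq').

```elisp
;;; -*- lexical-binding: t; -*-
;; Illustrative sketch only.  All `my-' functions are hypothetical
;; and are NOT the implementations proposed in subr+.el.

(defun my-split-string-by-position (string sep-p &optional flip)
  "Return the maximal runs of STRING whose positions fail SEP-P.
SEP-P is called with a zero-based index into STRING.
Non-nil FLIP returns the runs whose positions satisfy SEP-P instead."
  (let ((len (length string))
        (pos 0)
        (run-start nil)                 ; start index of the current kept run
        (parts '()))
    (while (< pos len)
      ;; Keep a char when it is not a separator, or the reverse if FLIP.
      (let ((keep (if (funcall sep-p pos) flip (not flip))))
        (cond ((and keep (not run-start))
               (setq run-start pos))
              ((and (not keep) run-start)
               (push (substring string run-start pos) parts)
               (setq run-start nil))))
      (setq pos (1+ pos)))
    (when run-start                     ; close a run that reaches the end
      (push (substring string run-start) parts))
    (nreverse parts)))

(defun my-split-string-by-predicate (string predicate &optional flip)
  "Split STRING at runs of chars that satisfy PREDICATE."
  (my-split-string-by-position
   string
   (lambda (i) (funcall predicate (aref string i)))
   flip))

(defun my-split-string-by-property (string property &optional value flip)
  "Split STRING at runs of chars that have text property PROPERTY.
If VALUE is nil, any non-nil value of PROPERTY marks a separator.
Otherwise the actual value must be `eq' to VALUE."
  (my-split-string-by-position
   string
   (lambda (i)
     (let ((actual (get-text-property i property string)))
       (if value (eq actual value) actual)))
   flip))

(defun my-digit-p (char)
  "Return non-nil if CHAR is an ASCII digit."
  (and (>= char ?0) (<= char ?9)))

(my-split-string-by-predicate "abc123def45" #'my-digit-p)    ; => ("abc" "def")
(my-split-string-by-predicate "abc123def45" #'my-digit-p t)  ; => ("123" "45")
```

With a string such as (concat "ab" (propertize "XY" 'invisible t) "cd"),
the property variant in this sketch returns the "ab" and "cd" parts, or
only the "XY" part when FLIP is non-nil, which is the same kind of
filtering that `buffer-substring-of-visible' and
`buffer-substring-of-invisible' are described as doing for a region.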