unofficial mirror of emacs-devel@gnu.org 
* Proposed enhancement for `split-string'
From: Drew Adams @ 2014-07-14 22:51 UTC (permalink / raw)
  To: emacs-devel

Function `split-string' currently has this signature, where SEPARATORS
is a regexp that defines (by matching) the separators used to split
the STRING:

(split-string STRING &optional SEPARATORS OMIT-NULLS TRIM)

The STRING parts returned are the non-matches for regexp SEPARATORS.
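
For reference, a few calls showing the current behavior.  The sample
strings are made up for illustration:

```elisp
;; With no SEPARATORS, split on whitespace and drop empty substrings.
(split-string "  two words ")            ; => ("two" "words")
;; With an explicit regexp, empty substrings are kept by default.
(split-string "a,b,,c" ",")              ; => ("a" "b" "" "c")
;; Non-nil OMIT-NULLS drops the empty substrings.
(split-string "a,b,,c" "," t)            ; => ("a" "b" "c")
;; TRIM is a regexp whose matches are removed from both ends of each part.
(split-string " a , b " "," t "[ \t]+")  ; => ("a" "b")
```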


I have an enhancement of `split-string' to propose, which lets you
alternatively split the string based on a character predicate or a
text property, instead of based on matching a regexp.

Code:        http://www.emacswiki.org/emacs-en/download/subr%2b.el
Description: http://www.emacswiki.org/emacs/SplittingStrings

I can submit the enhancement as a patch to subr.el, if there is
interest.

---

This would be the new (compatible) signature of `split-string':

(split-string STRING &optional HOW OMIT-NULLS TRIM FLIP TEST)
                               ^^^                 ^^^^ ^^^^

The second arg, HOW, can be a regexp, giving the same behavior as now.
Alternatively, HOW can be (1) a character predicate or (2) a doubleton
plist (PROPERTY VALUE), where PROPERTY is a text property and VALUE is
one of its possible values.

1. If HOW is a predicate then it must accept a character argument.
   Substrings whose chars satisfy the predicate are used as
   separators, so the return value is a list of substrings whose chars
   do *not* satisfy predicate HOW.

2. If HOW is (PROPERTY VALUE) then STRING is split into substrings
   whose chars do *not* have text property PROPERTY with value VALUE.

If VALUE is nil then any non-nil value of PROPERTY matches; that is,
only the presence of PROPERTY is tested.  Characters that have
PROPERTY belong to the separators, which are excluded.

If VALUE is non-nil then a match occurs when the actual value of
PROPERTY is `eq' to VALUE; that is, characters that have a PROPERTY of
VALUE are those that are excluded.

Non-nil optional arg TEST is a binary predicate that is applied to
each char in STRING and to VALUE.  If it returns non-nil for a given
character occurrence then that occurrence is part of a substring that
is excluded from the result (i.e., the char is part of a separator).

IOW, there are 3 ways to define the separator strings for splitting:
regexp matching, char-predicate satisfying, and text-property
matching.
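
Splitting by a character predicate can be sketched as follows.  This
is only an illustration, not the code from subr+.el: the function name
`my-split-string-by-predicate' is hypothetical, and the sketch always
drops empty substrings (as if OMIT-NULLS were non-nil).

```elisp
(defun my-split-string-by-predicate (string predicate)
  "Return the non-empty substrings of STRING whose chars fail PREDICATE.
Maximal runs of chars that satisfy PREDICATE act as separators.
Hypothetical helper for illustration; not the subr+.el code."
  (let ((parts ())
        (start nil))                    ; start index of the open kept run
    (dotimes (i (length string))
      (if (funcall predicate (aref string i))
          ;; Separator char: close the kept run, if one is open.
          (when start
            (push (substring string start i) parts)
            (setq start nil))
        ;; Kept char: open a run if none is open.
        (unless start (setq start i))))
    ;; Close a kept run that extends to the end of STRING.
    (when start (push (substring string start) parts))
    (nreverse parts)))

;; Digits act as separators here.
(my-split-string-by-predicate "ab12cd3" (lambda (c) (<= ?0 c ?9)))
;; => ("ab" "cd")
```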

By providing non-nil TEST you can test, for example:

* Whether the actual value of text property `invisible' belongs to the
  current `buffer-invisibility-spec'.

* Whether a particular face is among the faces that are the value of
  property `face'.
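
A rough sketch of property-based splitting with an optional TEST.  The
function name `my-split-string-by-property' is hypothetical, and the
sketch assumes that TEST receives the actual value of PROPERTY at a
character and VALUE, which is one reading of the two examples above.
It also always drops empty substrings.

```elisp
(defun my-split-string-by-property (string property value &optional test)
  "Return the non-empty parts of STRING that are not separators.
A char is a separator if its PROPERTY matches VALUE: any non-nil
property value when VALUE is nil, else `eq' to VALUE.  Non-nil TEST
replaces that check and is called as (TEST ACTUAL VALUE) (an assumption).
Hypothetical helper for illustration; not the subr+.el code."
  (let ((parts ())
        (start nil))                    ; start index of the open kept run
    (dotimes (i (length string))
      (let* ((actual (get-text-property i property string))
             (separator-p (cond (test  (funcall test actual value))
                                (value (eq actual value))
                                (t     actual))))
        (if separator-p
            (when start
              (push (substring string start i) parts)
              (setq start nil))
          (unless start (setq start i)))))
    (when start (push (substring string start) parts))
    (nreverse parts)))

;; Presence test: chars with any non-nil `invisible' value are separators.
(my-split-string-by-property
 (concat "ab" (propertize "XY" 'invisible t) "cd") 'invisible nil)

;; TEST example: is face `bold' among the faces in property `face'?
(my-split-string-by-property
 (concat "ab" (propertize "XY" 'face '(bold italic)) "cd")
 'face 'bold
 (lambda (actual wanted)
   (memq wanted (if (listp actual) actual (list actual)))))
```

Both calls return two substrings, "ab" and "cd".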

Non-nil optional arg FLIP simply swaps the separators and the kept
substrings - regardless of HOW the separating is defined.  The
substrings that would be returned if FLIP were nil are treated as the
separators, and the substrings that would be treated as separators if
FLIP were nil are returned as the result of splitting.
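
For the regexp case, the effect of FLIP can be approximated with
today's primitives by collecting the separator matches instead of the
text between them.  The helper name `my-regexp-matches' is
hypothetical; this is a sketch of the idea, not the proposed
implementation.

```elisp
(defun my-regexp-matches (string regexp)
  "Return the substrings of STRING that match REGEXP, in order.
Hypothetical helper for illustration; not the subr+.el code."
  (let ((start 0)
        (matches ()))
    (while (and (< start (length string))
                (string-match regexp string start))
      (push (match-string 0 string) matches)
      ;; Continue after the match; step one char past an empty match
      ;; so that the loop always makes progress.
      (setq start (if (= (match-beginning 0) (match-end 0))
                      (1+ (match-end 0))
                    (match-end 0))))
    (nreverse matches)))

(split-string "a, b;c" "[,; ]+")       ; => ("a" "b" "c")
(my-regexp-matches "a, b;c" "[,; ]+")  ; => (", " ";")
```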

The code I have also defines the following functions (in addition to a
few helper functions).

First, 3 specializations of `split-string', corresponding to the 3
kinds of HOW:

* `split-string-by-regexp' - `split-string' specialized for a regexp
  HOW.  That is, split by separator regexp matching.  This is the
  behavior of today's `split-string'.

* `split-string-by-property' - `split-string' specialized for a
  property-value HOW.  That is, split by separator property-value
  matching.

* `split-string-by-predicate' - `split-string' specialized for a
  char-predicate HOW.  That is, split by separator predicate
  satisfying.

Second, functions similar to `buffer-substring', which return the
region as a string, but which exclude or include only certain string
parts:

* `buffer-substring-of-propertied' - Return the parts that have a
  given PROPERTY.

* `buffer-substring-of-unpropertied' - Return the parts that do not
  have a given PROPERTY.

* `buffer-substring-of-visible' - Return the visible parts.

* `buffer-substring-of-invisible' - Return the invisible parts.

* `buffer-substring-of-faced' - Return the parts that have property
  `face'.

* `buffer-substring-of-unfaced' - Return the parts that do not have
  property `face'.

Example use case:

I use `buffer-substring-of-visible' in a function that I bind to
`filter-buffer-substring-function', to remove invisible text from the
region string (which I use as part of an indirect buffer name):

(lambda (beg end _delete)  ; Remove invisible text.
  (let ((strg  (buffer-substring-of-visible beg end)))
    (set-text-properties 0 (length strg) () strg)
    strg))
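
A function like `buffer-substring-of-visible' could be approximated
with standard primitives roughly as below.  The name
`my-buffer-substring-of-visible' is hypothetical and this is not the
subr+.el definition; it relies on `invisible-p', so visibility is
judged against the current `buffer-invisibility-spec'.

```elisp
(defun my-buffer-substring-of-visible (beg end)
  "Return the text between BEG and END, without its invisible parts.
Hypothetical helper for illustration; not the subr+.el code."
  (let ((parts ())
        (pos beg))
    (while (< pos end)
      ;; NEXT is where the `invisible' property next changes, or END.
      (let ((next (next-single-char-property-change pos 'invisible nil end)))
        (unless (invisible-p pos)
          (push (buffer-substring pos next) parts))
        (setq pos next)))
    (apply #'concat (nreverse parts))))

(with-temp-buffer
  (insert "ab" (propertize "XY" 'invisible t) "cd")
  (my-buffer-substring-of-visible (point-min) (point-max)))
;; => "abcd"
```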




* Proposed enhancement for `split-string'
From: Stephen J. Turnbull @ 2014-07-15  0:03 UTC (permalink / raw)
  To: Drew Adams; +Cc: emacs-devel

Drew Adams writes:

 > The second arg, HOW, can be a regexp, giving the same behavior as now.
 > Alternatively, HOW can be (1) a character predicate or (2) a doubleton
 > plist (PROPERTY VALUE), where PROPERTY is a text property and VALUE is
 > one of its possible values.

Why not just allow it to be any function returning an interval (with
implicit argument = (point)), and provide appropriate functions to
accomplish the tasks you propose?

 > By providing non-nil TEST you can test, for example:
 > 
 > * Whether the actual value of text property `invisible' belongs to the
 >   current `buffer-invisibility-spec'.
 > 
 > * Whether a particular face is among the faces that are the value of
 >   property `face'.

A general predicate for HOW could do this, too.

 > Non-nil optional arg FLIP simply swaps the separators and the kept
 > substrings - regardless of HOW the separating is defined.

This can be done for the "standard" functions by providing an optional
FLIP argument, and using (lambda () (how-func 'flip-me)) as the HOW.
Alternatively you could provide flipped standard HOW functions.

I have no objection to a new function `split-string-à-la-drew' with
any signature you like, but `split-string' should keep as simple a
signature as possible.




* Re: Proposed enhancement for `split-string'
From: Bozhidar Batsov @ 2014-07-18 12:24 UTC (permalink / raw)
  To: Stephen J. Turnbull, Drew Adams; +Cc: emacs-devel


Drew’s suggestion can be implemented as a different function in subr-x, I guess.

—
Cheers, 
Bozhidar

