unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#17814: 24.3.91; better string manipulation in subr-x
@ 2014-06-19 19:10 Shigeru Fukaya
  2014-06-19 20:44 ` Stefan Monnier
  0 siblings, 1 reply; 6+ messages in thread
From: Shigeru Fukaya @ 2014-06-19 19:10 UTC
  To: 17814

[-- Attachment #1: Type: text/plain, Size: 671 bytes --]


Some string manipulation functions in subr-x have room for optimization.


string-trim-left, string-trim-right -- use `substring' and
`match-beginning'/`match-end' instead of `replace-match'.  The former
have bytecodes and the latter does not.

string-trim -- calling string-trim-left first would be more cost
effective.  However, changing the code to trim both sides of the string
at once might be better.

string-remove-suffix -- changing the last argument of substring
shortens the code.
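
As a small illustration of the string-remove-suffix item: `substring'
accepts a negative end index that counts from the end of the string, so
the explicit length subtraction is not needed.  The sketch below uses a
hypothetical name, `my-remove-suffix', which is not part of subr-x.  It
also guards against one edge case that the shortened form appears to
introduce: with an empty suffix, `(- (length suffix))' is 0, and
`(substring string 0 0)' is the empty string rather than the whole
string.

```elisp
;; A negative END counts back from the end of the string.
(substring "foobar" 0 -3)               ; => "foo"

;; Hypothetical helper (not part of subr-x): the shortened form with an
;; extra guard so that an empty SUFFIX leaves STRING untouched.
(defun my-remove-suffix (suffix string)
  "Remove SUFFIX from STRING if present."
  (if (and (> (length suffix) 0)
           (string-suffix-p suffix string))
      (substring string 0 (- (length suffix)))
    string))
```

Without the guard, `(substring "foobar" 0 (- (length "")))' evaluates to
"" because the end index becomes 0.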


I changed string-trim from defsubst to defun, as its string literal is
somewhat big (actually, I suspect most of the other functions would also
be better defined with defun).


Regards,
Shigeru

[-- Attachment #2: subr-x.diff --]
[-- Type: application/octet-stream, Size: 1366 bytes --]

--- orig/subr-x.el	2014-03-21 14:34:40.000000000 +0900
+++ ./subr-x.el	2014-06-20 03:43:54.627390700 +0900
@@ -59,18 +59,23 @@
 (defsubst string-trim-left (string)
   "Remove leading whitespace from STRING."
   (if (string-match "\\`[ \t\n\r]+" string)
-      (replace-match "" t t string)
+      (substring string (match-end 0))
     string))
 
 (defsubst string-trim-right (string)
   "Remove trailing whitespace from STRING."
   (if (string-match "[ \t\n\r]+\\'" string)
-      (replace-match "" t t string)
+      (substring string 0 (match-beginning 0))
     string))
 
-(defsubst string-trim (string)
+(defun string-trim (string)
   "Remove leading and trailing whitespace from STRING."
-  (string-trim-left (string-trim-right string)))
+  ;;(string-trim-right (string-trim-left string))
+  (if (string-match (concat "\\`\\(?:[\s\t\n\r]+\\(?1:.*?\\)[\s\t\n\r]*"
+			    "\\|\\(?1:.*?\\)[\s\t\n\r]+\\)\\'")
+		    string)
+      (match-string 1 string)
+    string))
 
 (defsubst string-blank-p (string)
   "Check whether STRING is either empty or only whitespace."
@@ -85,7 +90,7 @@
 (defsubst string-remove-suffix (suffix string)
   "Remove SUFFIX from STRING if present."
   (if (string-suffix-p suffix string)
-      (substring string 0 (- (length string) (length suffix)))
+      (substring string 0 (- (length suffix)))
     string))
 
 (provide 'subr-x)


* bug#17814: 24.3.91; better string manipulation in subr-x
  2014-06-19 19:10 bug#17814: 24.3.91; better string manipulation in subr-x Shigeru Fukaya
@ 2014-06-19 20:44 ` Stefan Monnier
  2014-06-20 17:14   ` Shigeru Fukaya
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Monnier @ 2014-06-19 20:44 UTC
  To: Shigeru Fukaya; +Cc: 17814

> I changed string-trim from defsubst to defun, as its string literal is
> somewhat big (actually, I suspect most of the other functions would also
> be better defined with defun).

The use of `defsubst' is so that subr-x.el is not needed at run-time.
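
A minimal sketch of what this looks like from a caller's side, assuming
the subr-x.el discussed in this thread, where `string-trim' is a
defsubst.  The function name `my-pkg-clean' is hypothetical.  Because a
defsubst is inlined by the byte compiler, requiring the library inside
`eval-when-compile' should be enough for the compiled file:

```elisp
;; Load subr-x only while compiling (or when evaluating this source
;; directly); the inlined body ends up in the compiled code.
(eval-when-compile (require 'subr-x))

;; Hypothetical caller, for illustration only.
(defun my-pkg-clean (s)
  "Return S without surrounding whitespace."
  (string-trim s))
```

If `string-trim' were an ordinary defun in subr-x.el, the compiled
caller would instead need a plain `(require 'subr-x)' at run time.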


        Stefan






* bug#17814: 24.3.91; better string manipulation in subr-x
  2014-06-19 20:44 ` Stefan Monnier
@ 2014-06-20 17:14   ` Shigeru Fukaya
  2014-06-20 19:14     ` Stefan Monnier
  0 siblings, 1 reply; 6+ messages in thread
From: Shigeru Fukaya @ 2014-06-20 17:14 UTC
  To: Stefan Monnier, 17814

>> I changed string-trim from defsubst to defun, as its string literal is
>> somewhat big (actually, I suspect most of the other functions would also
>> be better defined with defun).
>
>The use of `defsubst' is so that subr-x.el is not needed at run-time.

I see.  I didn't know that, and it's a very good point.
Then the code would be

(defsubst string-trim (string)
  "Remove leading and trailing whitespace from STRING."
  (string-trim-right (string-trim-left string)))

or

(defsubst string-trim (string)
  "Remove leading and trailing whitespace from STRING."
  (string-match "\\`[\s\t\n\r]*\\(.*?\\)[\s\t\n\r]*\\'" string)
  (if (or (< 0 (match-beginning 1)) (< (match-end 1) (match-end 0)))
      (match-string 1 string)
    string))

The latter is shorter in byte-compiled code and calls string-match
only once.  The literal string is seemingly larger, but I think the
overhead of a string object will offset that.


Regards,
Shigeru






* bug#17814: 24.3.91; better string manipulation in subr-x
  2014-06-20 17:14   ` Shigeru Fukaya
@ 2014-06-20 19:14     ` Stefan Monnier
  2014-06-21  4:07       ` Shigeru Fukaya
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Monnier @ 2014-06-20 19:14 UTC
  To: Shigeru Fukaya; +Cc: 17814

>   (string-match "\\`[\s\t\n\r]*\\(.*?\\)[\s\t\n\r]*\\'" string)
>   (if (or (< 0 (match-beginning 1)) (< (match-end 1) (match-end 0)))
>       (match-string 1 string)
>     string))

The above string-match will fail on a string that has a newline, and the
subsequent code will use whatever was the old match-data, resulting in
broken behavior.
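
The failure mode can be reproduced with a short sketch.  The names
`my-trim-re' and `my-trim-guarded' are hypothetical; the regexp is the
one quoted above, written with a literal space instead of `\s'.  Because
`.' does not match a newline, `string-match' returns nil for multi-line
input, and any `match-beginning' or `match-end' call made afterwards
would read whatever match data an earlier search left behind.

```elisp
;; The regexp from the quoted proposal (hypothetical variable name).
(defconst my-trim-re "\\`[ \t\n\r]*\\(.*?\\)[ \t\n\r]*\\'")

(string-match my-trim-re " ab ")        ; => 0 (matches)
(string-match my-trim-re "a\nb")        ; => nil (`.' stops at the newline)

;; Hypothetical guarded variant: only consult the match data when
;; `string-match' actually succeeded.
(defun my-trim-guarded (string)
  "Trim STRING if `my-trim-re' matches; otherwise return it unchanged."
  (if (string-match my-trim-re string)
      (match-string 1 string)
    string))
```

The guard only removes the stale match-data problem.  A multi-line
string such as " a\nb " still comes back untrimmed, so the regexp itself
also needs fixing.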

Other than that, I don't have any opinion on such changes (I've never
heard anyone complain about code size or cpu-time of any of those
functions, so I think it largely doesn't matter either way).

        Stefan






* bug#17814: 24.3.91; better string manipulation in subr-x
  2014-06-20 19:14     ` Stefan Monnier
@ 2014-06-21  4:07       ` Shigeru Fukaya
  2018-09-19  1:37         ` Noam Postavsky
  0 siblings, 1 reply; 6+ messages in thread
From: Shigeru Fukaya @ 2014-06-21  4:07 UTC
  To: Stefan Monnier; +Cc: 17814

>The above string-match will fail on a string that has a newline, and the
>subsequent code will use whatever was the old match-data, resulting in
>broken behavior.

"." in "\\`[\s\t\n\r]*\\(.*?\\)[\s\t\n\r]*\\'" must be "\\(.\\|\n\\)", sorry.
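
With that correction applied, a single-regexp trim might look like the
sketch below.  The name `my-string-trim-1' is hypothetical, and a shy
group `\\(?:...\\)' is used for the alternation so that group 1 still
captures the whole trimmed text.  This is an illustration only, not the
code that was eventually committed.

```elisp
;; Hypothetical single-regexp trim using the corrected pattern.
(defun my-string-trim-1 (string)
  "Remove leading and trailing whitespace from STRING."
  (if (string-match
       "\\`[ \t\n\r]*\\(\\(?:.\\|\n\\)*?\\)[ \t\n\r]*\\'"
       string)
      (match-string 1 string)
    string))
```

One caveat, stated as an assumption rather than a measurement: a
backtracking pattern of this shape may hit the regexp matcher's stack
limit on very long strings.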

>Other than that, I don't have any opinion on such changes (I've never
>heard anyone complain about code size or cpu-time of any of those
>functions, so I think it largely doesn't matter either way).

Using string-trim-right and string-trim-left creates an unnecessary
temporary string when both sides need trimming; might that matter, then?
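
To make the temporary-string point concrete, here is a minimal sketch
that finds both boundaries first and then calls `substring' once.  The
name `my-trim-single-copy' is hypothetical.  It still runs two regexp
searches, and it makes no claim about actual speed or code size.

```elisp
;; Hypothetical helper: locate both boundaries, then copy only once.
(defun my-trim-single-copy (string)
  "Return STRING without leading and trailing whitespace."
  (let* ((beg (if (string-match "\\`[ \t\n\r]+" string)
                  (match-end 0)
                0))
         ;; Start of the trailing whitespace, or nil if there is none;
         ;; a nil END makes `substring' run to the end of STRING.
         (end (string-match "[ \t\n\r]+\\'" string beg)))
    (substring string beg end)))
```

By contrast, nesting string-trim-left around string-trim-right builds an
intermediate string whenever both ends carry whitespace.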

Anyway, I think I'm just sending a small proposal.  I don't care much
if you throw it away.  Thank you for spending your time.


Regards,
Shigeru






* bug#17814: 24.3.91; better string manipulation in subr-x
  2014-06-21  4:07       ` Shigeru Fukaya
@ 2018-09-19  1:37         ` Noam Postavsky
  0 siblings, 0 replies; 6+ messages in thread
From: Noam Postavsky @ 2018-09-19  1:37 UTC
  To: Shigeru Fukaya; +Cc: 17814, Stefan Monnier

close 17814
quit

Shigeru Fukaya <shigeru.fukaya@gmail.com> writes:

>>The above string-match will fail on a string that has a newline, and the
>>subsequent code will use whatever was the old match-data, resulting in
>>broken behavior.
>
> "." in "\\`[\s\t\n\r]*\\(.*?\\)[\s\t\n\r]*\\'" must be "\\(.\\|\n\\)", sorry.
>
>>Other than that, I don't have any opinion on such changes (I've never
>>heard anyone complain about code size or cpu-time of any of those
>>functions, so I think it largely doesn't matter either way).
>
> Using string-trim-right and string-trim-left creates an unnecessary
> temporary string when both sides need trimming; might that matter, then?
>
> Anyway, I think I'm just sending a small proposal.  I don't care much
> if you throw it away.  Thank you for spending your time.

An optimization similar to the one proposed here was done for
string-trim-{left,right} in [1: 1013e0392b].  I think the string-trim
change isn't worth the extra complexity (especially since it's not even
entirely clear whether it would be faster/smaller), so I'm closing the
bug.

[1: 1013e0392b]: 2018-07-13 11:28:16 -0400
  Tweak subr-x.el substring functions
  https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=1013e0392b78ee0e2199fb51859dc9e939315f9b




