unofficial mirror of help-gnu-emacs@gnu.org
* How to circumvent warning in batch mode
@ 2009-10-08 23:44 Decebal
  2009-10-09 13:43 ` Kevin Rodgers
       [not found] ` <mailman.8407.1255095844.2239.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 5+ messages in thread
From: Decebal @ 2009-10-08 23:44 UTC (permalink / raw)
  To: help-gnu-emacs

I have the following code:
emacs -batch -nw --eval='
  (let (
        (match-length)
        (reg-exp "^ +")
        (substitute-str "@")
        )
    (find-file "input")
    (goto-char (point-min))
    (while (re-search-forward "^ +" nil t)
      (setq match-length (- (point) (match-beginning 0)))
      (while (> match-length (length substitute-str))
        (setq substitute-str (concat substitute-str substitute-str)))
      (replace-match (substring substitute-str 0 match-length))
    )
    (write-file "outputEmacs")
  )
'
I have several questions about it.
The input file is quite big and I get:
    File input is large (31MB), really open? (y or n)
Is there a way to circumvent this?
Is there a way to do this more efficiently? This script needs about 20
seconds. When doing it with a Perl script, it takes about 6 seconds.
Instead of the '@' or chr$(64) I would like to use a nbsp or chr
$(160). But then the script needs almost 3 minutes. Also every space
is replaced by two characters chr$(194) + chr$(160).
What is going wrong here?



* Re: How to circumvent warning in batch mode
  2009-10-08 23:44 How to circumvent warning in batch mode Decebal
@ 2009-10-09 13:43 ` Kevin Rodgers
  2009-10-09 14:42   ` Andreas Politz
       [not found]   ` <mailman.8415.1255099400.2239.help-gnu-emacs@gnu.org>
       [not found] ` <mailman.8407.1255095844.2239.help-gnu-emacs@gnu.org>
  1 sibling, 2 replies; 5+ messages in thread
From: Kevin Rodgers @ 2009-10-09 13:43 UTC (permalink / raw)
  To: help-gnu-emacs

Decebal wrote:
> I have the following code:
> emacs -batch -nw --eval='
>   (let (
>         (match-length)
>         (reg-exp "^ +")
>         (substitute-str "@")
>         )
>     (find-file "input")
>     (goto-char (point-min))
>     (while (re-search-forward "^ +" nil t)
>       (setq match-length (- (point) (match-beginning 0)))
>       (while (> match-length (length substitute-str))
>         (setq substitute-str (concat substitute-str substitute-str)))
>       (replace-match (substring substitute-str 0 match-length))
>     )
>     (write-file "outputEmacs")
>   )
> '
> I have several questions about it.
> The input file is quite big and I get:
>     File input is large (31MB), really open? (y or n)
> Is there a way to circumvent this?

let-bind large-file-warning-threshold to nil around the call to find-file.
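
A minimal sketch of that suggestion. The helper name
my-open-without-size-prompt is made up for illustration;
large-file-warning-threshold is the standard variable, and binding it to
nil switches the size check off for the duration of the let.

```elisp
(defun my-open-without-size-prompt (file)
  "Visit FILE without the \"File ... is large\" confirmation.
Hypothetical helper, not part of Emacs."
  ;; nil means: never ask for confirmation, whatever the file size.
  (let ((large-file-warning-threshold nil))
    (find-file file)))
```

In the original one-liner this amounts to wrapping (find-file "input") in
the same let form.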

> Is there a way to do this more efficiently? This script needs about 20
> seconds. When doing it with a Perl script, it takes about 6 seconds.

1. Put the code in a file (FILE.el) and byte-compile it.  Then instead of
    --eval 'CODE' on the command line, use --load FILE.elc

2. It looks like you are doing a lot of unnecessary string allocation with
    concat and substring:

    For every character after the first character in the match, you double the
    length of the replacement string until it is at least as long as the length
    of the match string, then you only use the number of characters that were in
    the match string anyway.  Change the loop to:

     (while (re-search-forward "^ +" nil t)
       (setq match-length (- (point) (match-beginning 0)))
       (if (> match-length 1)
           (replace-match (make-string match-length ?@))
         (replace-match "@")))

    That could be improved further by caching each replacement string of length
    > 1, so it is only allocated once... But now, I can see that my version
    using make-string does the same amount of string allocation as yours using
    substring, and that your use of concat is infrequent (only needed when the
    match string jumps to a larger length than has been seen so far).  So caching
    the replacement string (in an array, indexed by its length) is the way to go.
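
The caching idea could look roughly like the sketch below. It swaps the
array for a hash table keyed by length, since the longest run is not known
in advance; the function name and the FILL-CHAR parameter are assumptions
made for the example.

```elisp
(defun my-replace-leading-spaces (fill-char)
  "Replace each run of leading spaces in the current buffer with FILL-CHAR.
Replacement strings are cached by length, so each length is allocated once.
Hypothetical helper, not part of Emacs."
  (let ((cache (make-hash-table :test 'eql)))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward "^ +" nil t)
        (let* ((len (- (match-end 0) (match-beginning 0)))
               ;; Reuse the cached string, or build and remember it.
               (str (or (gethash len cache)
                        (puthash len (make-string len fill-char) cache))))
          ;; FIXEDCASE and LITERAL are t: insert STR exactly as given.
          (replace-match str t t))))))
```

Whether this helps in practice depends on where the time really goes, so
it is worth timing before and after.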

> Instead of the '@' or chr$(64) I would like to use a nbsp or chr
> $(160). But then the script needs almost 3 minutes. Also every space
> is replaced by two characters chr$(194) + chr$(160).
> What is going wrong here?

In UTF-8, NBSP is 2 bytes: decimal 194 160, i.e. hex C2 A0 (the character
itself is code point U+00A0).
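
This can be checked from Lisp. The sketch below uses the standard
encode-coding-string; the helper name my-utf8-bytes is an assumption for
the example.

```elisp
(defun my-utf8-bytes (string)
  "Return the UTF-8 encoding of STRING as a list of byte values.
Hypothetical helper, not part of Emacs."
  ;; encode-coding-string returns a unibyte string; appending it to
  ;; nil turns it into a list of its bytes.
  (append (encode-coding-string string 'utf-8) nil))

;; One character in, two bytes out for NBSP:
(my-utf8-bytes "\u00A0")
```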

-- 
Kevin Rodgers
Denver, Colorado, USA






* Re: How to circumvent warning in batch mode
  2009-10-09 13:43 ` Kevin Rodgers
@ 2009-10-09 14:42   ` Andreas Politz
       [not found]   ` <mailman.8415.1255099400.2239.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 5+ messages in thread
From: Andreas Politz @ 2009-10-09 14:42 UTC (permalink / raw)
  To: help-gnu-emacs

Kevin Rodgers <kevin.d.rodgers@gmail.com> writes:

> Decebal wrote:
>> I have the following code:
>> emacs -batch -nw --eval='
>>   (let (
>>         (match-length)
>>         (reg-exp "^ +")
>>         (substitute-str "@")
>>         )
>>     (find-file "input")
>>     (goto-char (point-min))
>>     (while (re-search-forward "^ +" nil t)
>>       (setq match-length (- (point) (match-beginning 0)))
>>       (while (> match-length (length substitute-str))
>>         (setq substitute-str (concat substitute-str substitute-str)))
>>       (replace-match (substring substitute-str 0 match-length))
>>     )
>>     (write-file "outputEmacs")
>>   )
>> '
>> I have several questions about it.
>> The input file is quite big and I get:
>>     File input is large (31MB), really open? (y or n)
>> Is there a way to circumvent this?
>
> let-bind large-file-warning-threshold to nil around the call to find-file.
>
>> Is there a way to do this more efficiently? This script needs about 20
>> seconds. When doing it with a Perl script, it takes about 6 seconds.
>
> 1. Put the code in a file (FILE.el) and byte-compile it.  Then instead of
>    --eval 'CODE' on the command line, use --load FILE.elc
>
> 2. It looks like you are doing a lot of unnecessary string allocation with
>    concat and substring:
>

I would suggest removing the body of the while-loop, in order to see
whether a significant amount of time is actually spent there.

Depending on the file, a great deal probably goes into the
initialization of the major-mode.  Maybe you can use
`find-file-literally' or some other means, I don't know.
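
One way to take that measurement, as a sketch: time the read plus the bare
search loop with benchmark-run.  This uses insert-file-contents-literally
in a temporary buffer rather than `find-file-literally', which is a
substitution made here so that no major mode or size prompt is involved;
the function name is made up.

```elisp
(require 'benchmark)

(defun my-time-search-only (file)
  "Return the seconds needed to read FILE and run the search with no body.
Hypothetical helper, not part of Emacs."
  ;; benchmark-run returns (ELAPSED GC-RUNS GC-ELAPSED); keep ELAPSED.
  (car (benchmark-run 1
         (with-temp-buffer
           (insert-file-contents-literally file)
           (goto-char (point-min))
           ;; Empty loop body: only the regexp search is timed.
           (while (re-search-forward "^ +" nil t))))))
```

Comparing this number with the time for the full script shows how much the
replacement step itself costs.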

-ap






* Re: How to circumvent warning in batch mode
       [not found]   ` <mailman.8415.1255099400.2239.help-gnu-emacs@gnu.org>
@ 2009-10-10  8:23     ` Decebal
  0 siblings, 0 replies; 5+ messages in thread
From: Decebal @ 2009-10-10  8:23 UTC (permalink / raw)
  To: help-gnu-emacs

On Oct 9, 4:42 pm, Andreas Politz <poli...@fh-trier.de> wrote:
> I would suggest removing the body of the while-loop, in order to see if
> there is actually a significant amount of time spent there.

That is where most of the time is spent. Without the inner loop it took
5 seconds.
Without the search for the regexp, 3.5 seconds.
And without the write, half a second.
The complete script takes 17.5 seconds.

When the inner loop has only the setq for match-length it takes 5.5
seconds.
When I also have the loop to increase substitute-str it takes 6.5
seconds.
When I change the code to:
    (while (re-search-forward reg-exp nil t)
       (replace-match substitute-str)
    )
it takes 15 seconds.
So it looks like replace-match is very expensive. A candidate for
optimization?


> Depending on the file, a great deal probably goes into the
> initialization of the major-mode.  Maybe you can use
> `find-file-literally' or some other means, I don't know.

I already changed to:
    (switch-to-buffer (find-file-noselect input-file t t))




* Re: How to circumvent warning in batch mode
       [not found] ` <mailman.8407.1255095844.2239.help-gnu-emacs@gnu.org>
@ 2009-10-10  8:50   ` Decebal
  0 siblings, 0 replies; 5+ messages in thread
From: Decebal @ 2009-10-10  8:50 UTC (permalink / raw)
  To: help-gnu-emacs

On Oct 9, 3:43 pm, Kevin Rodgers <kevin.d.rodg...@gmail.com> wrote:
> > The input file is quite big and I get:
> >     File input is large (31MB), really open? (y or n)
> > Is there a way to circumvent this?
>
> let-bind large-file-warning-threshold to nil around the call to find-file.

I already use:
    (switch-to-buffer (find-file-noselect input-file t t))

> > Is there a way to do this more efficiently? This script needs about 20
> > seconds. When doing it with a Perl script, it takes about 6 seconds.
>
> 1. Put the code in a file (FILE.el) and byte-compile it.  Then instead of
>     --eval 'CODE' on the command line, use --load FILE.elc

It is part of a script. So I think the compilation would be faster than
a load from disk. Also: how can I pass parameters to an .elc file?
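
On the parameter question, one common idiom (a sketch, not the only way)
is to load the file and then call an entry point with -f, letting that
function consume the remaining arguments from command-line-args-left.
The function name my-batch-main and the INPUT/OUTPUT convention are
assumptions for the example.

```elisp
;; Compile once:  emacs -batch -f batch-byte-compile FILE.el
;; Run:           emacs -batch --load FILE.elc -f my-batch-main INPUT OUTPUT

(defun my-batch-main ()
  "Replace leading spaces in INPUT with @ and write the result to OUTPUT.
INPUT and OUTPUT are taken from the command line.  Hypothetical entry point."
  ;; Popping the arguments keeps Emacs from treating them as files to visit.
  (let ((input (pop command-line-args-left))
        (output (pop command-line-args-left)))
    (unless (and input output)
      (error "Usage: ... -f my-batch-main INPUT OUTPUT"))
    (with-temp-buffer
      ;; Work on raw bytes: no decoding, no major mode, no size prompt.
      (set-buffer-multibyte nil)
      (insert-file-contents-literally input)
      (goto-char (point-min))
      (while (re-search-forward "^ +" nil t)
        (replace-match
         (make-string (- (match-end 0) (match-beginning 0)) ?@) t t))
      (let ((coding-system-for-write 'no-conversion))
        (write-region (point-min) (point-max) output)))))
```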

> 2. It looks like you are doing a lot of unnecessary string allocation with
>     concat and substring:
>
>     For every character after the first character in the match, you double the
>     length of the replacement string until it is at least as long as the length
>     of the match string, then you only use the number of characters that were in
>     the match string anyway.  Change the loop to:
>
>      (while (re-search-forward "^ +" nil t)
>        (setq match-length (- (point) (match-beginning 0)))
>        (if (> match-length 1)
>           (replace-match (make-string match-length ?@))
>         (replace-match "@")))

That will not work in my case. In the example the replacement string is
only one character long, but it could also be, for example, '1234567890'.
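
For a multi-character pattern, the doubling-and-truncating step from the
original script can be isolated as below. This is only a sketch of that
step; the name my-repeat-to-length is made up.

```elisp
(defun my-repeat-to-length (pattern len)
  "Return the first LEN characters of PATTERN repeated as often as needed.
PATTERN must be a non-empty string.  Hypothetical helper, not part of Emacs."
  (when (string= pattern "")
    (error "PATTERN must not be empty"))
  (let ((s pattern))
    ;; Double the string until it is long enough, then cut it to size.
    (while (< (length s) len)
      (setq s (concat s s)))
    (substring s 0 len)))

;; (my-repeat-to-length "1234567890" 13) returns "1234567890123"
```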


>     That could be improved further by caching each replacement string of length
>     > 1, so it is only allocated once... But now, I can see that my version
>     using make-string does the same amount of string allocation as yours using
>     substring, and that your use of concat is infrequent (only needed when the
>     match string jumps to a larger length than has been seen so far).  So caching
>     the replacement string (in an array, indexed by its length) is the way to go.

Making the replacement string longer takes only about a second. The
real work is in the replace-match. Only the coders of Emacs can change
that.


> > Instead of the '@' or chr$(64) I would like to use a nbsp or chr
> > $(160). But then the script needs almost 3 minutes. Also every space
> > is replaced by two characters chr$(194) + chr$(160).
> > What is going wrong here?
>
> In UTF-8, NBSP is 2 bytes: decimal 194 160 aka hex C2 A0.

That explains the two characters, but why does it take so long?

Because I now use
    (switch-to-buffer (find-file-noselect input-file t t))
I do not have this problem anymore.
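
For reference, the two t arguments in that call are NOWARN and RAWFILE:
the first suppresses the warnings find-file-noselect would otherwise
raise, the second reads the file without decoding and without running a
major mode. A sketch that works in the buffer directly, assuming nothing
needs it displayed in batch mode (the helper name is made up):

```elisp
(defun my-visit-raw (file)
  "Return a buffer visiting FILE literally, without warnings.
Hypothetical helper, not part of Emacs."
  ;; Arguments after FILE: NOWARN = t, RAWFILE = t.
  (find-file-noselect file t t))

;; Typical use in a batch script:
;; (with-current-buffer (my-visit-raw "input")
;;   (goto-char (point-min))
;;   ...)
```

That the slowdown with NBSP disappeared is presumably connected to
RAWFILE, since the buffer then holds raw bytes, but that is a guess and
not something measured in this thread.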


