all messages for Emacs-related lists mirrored at yhetil.org
* iterating over a list while removing elements
From: lee @ 2014-03-19 11:39 UTC
  To: help-gnu-emacs

Hi,

what is the defined behaviour when you iterate over a list and remove
elements from that very list?  For example:


(defsubst multisearch-directory-ref-p (dots)
  "Return t when the string DOTS ends in a directory reference."
  (or
   (string-match "\\.$" dots)
   (string-match "\\.\\.$" dots)))

(defun multisearch-make-files-list (directory)
  "Return a list of files in DIRECTORY, with directory references
and directories removed."
  (let ((files-list (directory-files directory t)))
    (dolist (entry files-list files-list)
      (unless (and
	       (not (multisearch-directory-ref-p entry))
	       (file-directory-p entry)
	       (file-readable-p entry))
	(setq files-list (delete entry files-list))))))


Surprisingly, this /appears/ to work.  Can I take that for granted, or
is this a stupid thing to do?  It's like someone pulling the chair
you're about to sit on from underneath you ...


-- 
Knowledge is volatile and fluid.  Software is power.



* Re: iterating over a list while removing elements
From: Stefan @ 2014-03-19 12:39 UTC
  To: help-gnu-emacs

>    (string-match "\\.$" dots)
>    (string-match "\\.\\.$" dots)))

You meant

    (string-match "\\.\\'" dots)
    (string-match "\\.\\.\\'" dots)))

> Surprisingly, this /appears/ to work.  Can I take that for granted, or
> is this a stupid thing to do?  It's like someone pulling the chair
> you're about to sit on from underneath you ...

This is undocumented, so better not rely on the details of the behavior.
You can rely on the fact that dolist will behave sanely, though: it should
not go berserk, it should go through at least all elements still
remaining in the list, and at most all elements that have been in
the list.

But you had better not assume that dolist will skip the "entry" you
just removed.  E.g. you could do the following, which should be somewhat
faster (since `delete' is O(n)):

(defun multisearch-make-files-list (directory)
  "Return a list of files in DIRECTORY, with directory references
and directories removed."
  (let ((files-list (directory-files directory t))
        (newlist '()))
    (dolist (entry files-list (nreverse newlist))
      (when (and
             (not (multisearch-directory-ref-p entry))
             (file-directory-p entry)
             (file-readable-p entry))
        (push entry newlist)))))



        Stefan




* Re: iterating over a list while removing elements
From: Michael Albinus @ 2014-03-19 13:11 UTC
  To: help-gnu-emacs

lee <lee@yun.yagibdah.de> writes:

> Hi,

Hi,

> what is the defined behaviour when you iterate over a list and remove
> elements from that very list?  For example:
>
>
> (defsubst multisearch-directory-ref-p (dots)
>   "Return t when the string DOTS ends in a directory reference."
>   (or
>    (string-match "\\.$" dots)
>    (string-match "\\.\\.$" dots)))
>
> (defun multisearch-make-files-list (directory)
>   "Return a list of files in DIRECTORY, with directory references
> and directories removed."
>   (let ((files-list (directory-files directory t)))
>     (dolist (entry files-list files-list)
>       (unless (and
> 	       (not (multisearch-directory-ref-p entry))
> 	       (file-directory-p entry)
> 	       (file-readable-p entry))
> 	(setq files-list (delete entry files-list))))))

Side remark: You don't need `multisearch-directory-ref-p' when applying

  (let ((files-list (directory-files directory t directory-files-no-dot-files-regexp)))
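
A minimal sketch of that call, run against a throwaway directory (the
function name, the scratch directory and the file names are made up for
illustration).  `directory-files' matches MATCH against the file name
without its directory part, so the predefined regexp drops "." and ".."
before any per-entry test has to run:

```elisp
;; Sketch only: the function name and the scratch files are hypothetical.
(defun demo-list-without-dot-refs ()
  "Create a scratch directory, list it without \".\" and \"..\", clean up."
  (let ((dir (make-temp-file "multisearch-demo" t))) ; t => make a directory
    (unwind-protect
        (progn
          ;; Create two empty files; 'silent suppresses the "Wrote" message.
          (write-region "" nil (expand-file-name "a.txt" dir) nil 'silent)
          (write-region "" nil (expand-file-name ".hidden" dir) nil 'silent)
          ;; FULL is nil here so that only bare names come back.
          (directory-files dir nil directory-files-no-dot-files-regexp))
      (delete-directory dir t))))

(demo-list-without-dot-refs)   ; => (".hidden" "a.txt")
```

Hidden files such as ".hidden" are kept; only the two directory
references are filtered out.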

Best regards, Michael.



* Re: iterating over a list while removing elements
From: Pascal J. Bourguignon @ 2014-03-19 13:12 UTC
  To: help-gnu-emacs

lee <lee@yun.yagibdah.de> writes:

> Hi,
>
> what is the defined behaviour when you iterate over a list and remove
> elements from that very list?  For example:
>
>
> (defsubst multisearch-directory-ref-p (dots)
>   "Return t when the string DOTS ends in a directory reference."
>   (or
>    (string-match "\\.$" dots)
>    (string-match "\\.\\.$" dots)))
>
> (defun multisearch-make-files-list (directory)
>   "Return a list of files in DIRECTORY, with directory references
> and directories removed."
>   (let ((files-list (directory-files directory t)))
>     (dolist (entry files-list files-list)
>       (unless (and
> 	       (not (multisearch-directory-ref-p entry))
> 	       (file-directory-p entry)
> 	       (file-readable-p entry))
> 	(setq files-list (delete entry files-list))))))
>
>
> Surprisingly, this /appears/ to work.  Can I take that for granted, or
> is this a stupid thing to do?  It's like someone pulling the chair
> you're about to sit on from underneath you ...


(require 'cl)

(defun multisearch-make-files-list (directory)
  "Return a list of files in DIRECTORY, with directory references
and directories removed."
  (remove-if (lambda (entry)
               (and (not (multisearch-directory-ref-p entry))
                    (file-directory-p entry)
                    (file-readable-p entry)))
              (directory-files directory t)))

However, your test conditions look strange to me, compared to the
docstring.  In natural language, AND means OR, in general.
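
To make the AND/OR point concrete, here is a sketch of the test the
docstring seems to describe: an entry is removed if it is a directory
OR is not readable.  It uses `cl-remove-if' from cl-lib, the prefixed
spelling of `remove-if' (in current Emacs, (require 'cl) is
deprecated).  The `-sketch' name is an assumption for illustration and
is not the code from the repository.

```elisp
(require 'cl-lib)

(defun multisearch-make-files-list-sketch (directory)
  "Return the readable non-directory files in DIRECTORY.
Illustrative sketch, not the original implementation."
  (cl-remove-if (lambda (entry)
                  ;; Remove when ANY condition holds (OR, not AND).
                  ;; "." and ".." are directories, so the first
                  ;; test already covers them.
                  (or (file-directory-p entry)
                      (not (file-readable-p entry))))
                (directory-files directory t)))
```

Calling it on a directory that holds one file and one subdirectory
should return a one-element list with the full name of the file.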

-- 
__Pascal Bourguignon__
http://www.informatimago.com/
"Le mercure monte ?  C'est le moment d'acheter !"


* Re: iterating over a list while removing elements
From: Joost Kremers @ 2014-03-19 18:28 UTC
  To: help-gnu-emacs

Pascal J. Bourguignon wrote:
> (require 'cl)
>
> (defun multisearch-make-files-list (directory)
>   "Return a list of files in DIRECTORY, with directory references
> and directories removed."
>   (remove-if (lambda (entry)
>                (and (not (multisearch-directory-ref-p entry))
>                     (file-directory-p entry)
>                     (file-readable-p entry)))
>               (directory-files directory t)))

Or use --remove from the dash library. No need for lambda:

(require 'dash)

(defun multisearch-make-files-list (directory)
  "Return a list of files in DIRECTORY, with directory references
and directories removed."
  (--remove (or (multisearch-directory-ref-p it) ; or seems to better express the intention of the doc string.
                (file-directory-p it)
                (not (file-readable-p it)))
            (directory-files directory t)))

It's less portable, though, because dash doesn't come with Emacs.

Note, BTW, that file-directory-p returns t for "." and "..". It seems to
me that the only two names that directory-files could return that you
really want to exclude are those two,[1] so there's no need for
multisearch-directory-ref-p, I think. (Or is there?)

Joost



[1] Files can have dots in their names, so what do you want to do with a
file whose name ends in a dot? Or two? Unlikely, for sure, but not
impossible.

-- 
Joost Kremers                                   joostkremers@fastmail.fm
Selbst in die Unterwelt dringt durch Spalten Licht
EN:SiS(9)


* Re: iterating over a list while removing elements
From: lee @ 2014-03-20 16:02 UTC
  To: help-gnu-emacs

Stefan <monnier@iro.umontreal.ca> writes:

>>    (string-match "\\.$" dots)
>>    (string-match "\\.\\.$" dots)))
>
> You meant
>
>     (string-match "\\.\\'" dots)
>     (string-match "\\.\\.\\'" dots)))

Hm, what's the difference?
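
A small sketch of the difference, for what it is worth: in Emacs
regexps `$' matches at the end of a line, which inside a string also
means just before a newline, whereas \\' matches only at the very end
of the string.  File names may contain newlines, so the two can
disagree.  The sample strings are made up.

```elisp
;; "$" matches before the embedded newline, so the dot at index 3 is found.
(string-match "\\.$" "foo.\nbar")      ; => 3
;; "\\'" anchors at the end of the string only, so nothing matches.
(string-match "\\.\\'" "foo.\nbar")    ; => nil
;; On a string without newlines both forms behave the same.
(string-match "\\.\\'" "foo.")         ; => 3
```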

>> Surprisingly, this /appears/ to work.  Can I take that for granted, or
>> is this a stupid thing to do?  It's like someone pulling the chair
>> you're about to sit on from underneath you ...
>
> This is undocumented, so better not rely on the details of the behavior.
> You can rely on the fact that dolist will behave sanely, though: it should
> not go berserk, it should go through at least all elements still
> remaining in the list, and at most all elements that have been in
> the list.
>
> But you had better not assume that dolist will skip the "entry" you
> just removed.  E.g. you could do the following, which should be somewhat
> faster (since `delete' is O(n)):

Duplicating the list is an approach I used in other places, and I was
wondering if there's some way to do it without duplication.  The list
can be fairly large, and I'm using several, so saving some memory would
be nice, even if it usually doesn't really matter.
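
If avoiding the extra list is the goal, one option is a destructive
filter such as `cl-delete-if' from cl-lib, which reuses the storage of
its argument where it can instead of building a new list.  A sketch
with made-up data and a hypothetical function name follows.  The
result has to be assigned back, because the head of the list can
change.  Note also that a copying filter only duplicates the cons
cells; the file name strings themselves are shared either way.

```elisp
(require 'cl-lib)

;; Hypothetical helper for illustration only.
(defun demo-drop-dot-refs (entries)
  "Destructively remove names ending in \"/.\" or \"/..\" from ENTRIES."
  (cl-delete-if (lambda (name)
                  (string-match-p "/\\.\\.?\\'" name))
                entries))

(let ((entries (list "/tmp/a" "/tmp/." "/tmp/b" "/tmp/..")))
  ;; Always use the return value, never the old variable.
  (setq entries (demo-drop-dot-refs entries))
  entries)                              ; => ("/tmp/a" "/tmp/b")
```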

Anyway, redundancy through `delete' isn't good, leaving aside that it
may yield unexpected results.

> (defun multisearch-make-files-list (directory)
>   "Return a list of files in DIRECTORY, with directory references
> and directories removed."
>   (let ((files-list (directory-files directory t))
>         (newlist '()))

Why not (newlist nil)?
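
For reference, the two spellings denote the same object, so this is
purely a readability convention: '() is commonly written where the
value is meant as an empty list, nil where it is meant as false.

```elisp
(eq '() nil)          ; => t

(let ((newlist '()))  ; same as (newlist nil) or just newlist
  (push "x" newlist)
  newlist)            ; => ("x")
```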

I've got it working nicely now, you can find the whole thing at
https://github.com/lee-/emacs/tree/master/multisearch


-- 
Knowledge is volatile and fluid.  Software is power.



* Re: iterating over a list while removing elements
From: lee @ 2014-03-20 16:10 UTC
  To: help-gnu-emacs

Michael Albinus <michael.albinus@gmx.de> writes:

> lee <lee@yun.yagibdah.de> writes:
>>
>> (defsubst multisearch-directory-ref-p (dots)
>>   "Return t when the string DOTS ends in a directory reference."
>>   (or
>>    (string-match "\\.$" dots)
>>    (string-match "\\.\\.$" dots)))
>>
>> (defun multisearch-make-files-list (directory)
>>   "Return a list of files in DIRECTORY, with directory references
>> and directories removed."
>>   (let ((files-list (directory-files directory t)))
>>     (dolist (entry files-list files-list)
>>       (unless (and
>> 	       (not (multisearch-directory-ref-p entry))
>> 	       (file-directory-p entry)
>> 	       (file-readable-p entry))
>> 	(setq files-list (delete entry files-list))))))
>
> Side remark: You don't need `multisearch-directory-ref-p' when applying
>
>   (let ((files-list (directory-files directory t directory-files-no-dot-files-regexp)))

True --- and (not (file-directory-p entry)) returns nil for directory
refs.  I've come to use the MATCH argument of `directory-files', though,
and `multisearch-directory-ref-p' is easier because then I don't need to
figure out a regexp that matches MATCH but not "." or "..".  And a
string match is less expensive than a file look-up ...

Which reminds me that I should put something in to save all file look-ups
for files which already are in buffers because they are not needed then
...


-- 
Knowledge is volatile and fluid.  Software is power.



* Re: iterating over a list while removing elements
From: lee @ 2014-03-20 16:33 UTC
  To: help-gnu-emacs

"Pascal J. Bourguignon" <pjb@informatimago.com> writes:

> lee <lee@yun.yagibdah.de> writes:
>
>> Hi,
>>
>> what is the defined behaviour when you iterate over a list and remove
>> elements from that very list?  For example:
>>
>>
>> (defsubst multisearch-directory-ref-p (dots)
>>   "Return t when the string DOTS ends in a directory reference."
>>   (or
>>    (string-match "\\.$" dots)
>>    (string-match "\\.\\.$" dots)))
>>
>> (defun multisearch-make-files-list (directory)
>>   "Return a list of files in DIRECTORY, with directory references
>> and directories removed."
>>   (let ((files-list (directory-files directory t)))
>>     (dolist (entry files-list files-list)
>>       (unless (and
>> 	       (not (multisearch-directory-ref-p entry))
>> 	       (file-directory-p entry)
>> 	       (file-readable-p entry))
>> 	(setq files-list (delete entry files-list))))))
>>
>>
>> Surprisingly, this /appears/ to work.  Can I take that for granted, or
>> is this a stupid thing to do?  It's like someone pulling the chair
>> you're about to sit on from underneath you ...
>
>
> (require 'cl)
>
> (defun multisearch-make-files-list (directory)
>   "Return a list of files in DIRECTORY, with directory references
> and directories removed."
>   (remove-if (lambda (entry)
>                (and (not (multisearch-directory-ref-p entry))
>                     (file-directory-p entry)
>                     (file-readable-p entry)))
>               (directory-files directory t)))

Interesting :)  Looking at the docstring for `remove-if', I wonder how
many copies this makes?  ... Hm, does that even work?  How and why would
each list member be fed to the lambda thing?  And the lambda thing would
return either nil or t, which is not what I would want to remove or not
...  And no members of the list are nil: So would that remove anything
at all?
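
A sketch of what happens, using numbers so the effect is easy to see:
`remove-if' (spelled `cl-remove-if' in cl-lib) calls the predicate
once per element and drops exactly those elements for which it returns
non-nil.  The predicate's return value is only a yes/no answer; it is
never compared with the elements.  The original list is left
untouched, and copying happens only as far as needed.

```elisp
(require 'cl-lib)

(let* ((numbers (list 1 2 3 4 5))
       ;; The lambda is called with 1, then 2, and so on.
       ;; It answers "remove this one?"; even numbers get a yes.
       (odds (cl-remove-if (lambda (n) (= 0 (% n 2))) numbers)))
  (list odds numbers))    ; => ((1 3 5) (1 2 3 4 5))
```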

> However, your test conditions look strange to me, compared to the
> docstring.  In natural language, AND means OR, in general.

Good catch --- I found my function returned the directory names instead
of the file names, that was a bug ...


(defun multisearch-make-files-list (directory &optional match)
  "Return a list of files in DIRECTORY, with directory references and
directories removed.

When MATCH is non-nil, only files that match the regexp MATCH are
included in the list."
  (let ((files-list (directory-files directory t match t))
	(clean-list nil))
    (dolist (entry files-list clean-list)
      (when (and
	     (not (multisearch-directory-ref-p entry))
	     (not (file-directory-p entry))
	     (file-readable-p entry))
	(setq clean-list (cons entry clean-list))))))


see https://github.com/lee-/emacs/tree/master/multisearch


-- 
Knowledge is volatile and fluid.  Software is power.



* Re: iterating over a list while removing elements
From: lee @ 2014-03-20 17:34 UTC
  To: help-gnu-emacs

Joost Kremers <joost.m.kremers@gmail.com> writes:

> Note, BTW, that file-directory-p returns t for "." and "..". It seems to
> me that the only two names that directory-files could return that you
> really want to exclude are those two,[1] so there's no need for
> multisearch-directory-ref-p, I think. (Or is there?)

The idea is that (file-directory-p "..") may cause a file look-up which
can be avoided by string-matching.  String-matching is orders of magnitude
faster than file look-ups.
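
Whether that holds on a given system can be measured rather than
guessed.  A rough sketch using `benchmark-run' from Emacs's benchmark
library; the function name, the repetition count and the path are
arbitrary choices, and the timings depend on the file system and the
cache state, so none are shown here.

```elisp
(require 'benchmark)

;; Hypothetical helper for illustration only.
(defun demo-compare-dot-checks (name repetitions)
  "Time a regexp test and `file-directory-p' on NAME, REPETITIONS times.
Return two lists of the form (ELAPSED-SECONDS GC-RUNS GC-SECONDS)."
  (list
   (benchmark-run repetitions (string-match "/\\.\\.?\\'" name))
   (benchmark-run repetitions (file-directory-p name))))

;; `concat' is used on purpose: `expand-file-name' would resolve the "..".
(demo-compare-dot-checks
 (concat (file-name-as-directory temporary-file-directory) "..")
 10000)
```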

When I try multisearch with a source file from a somewhat large
application that has a bunch of #includes, it takes some 15 seconds or
so to create the list of files, and it creates 123 additional
buffers.  When you look at the source of multisearch[1], you'll see that
there can be a huge number of look-ups, many of them on non-existing
files.

How does the disk cache deal with non-existing files?  The
meta-information is probably in the cache (more or less), yet there can
be no information for non-existing files.  Creating the list of files is
actually what takes most of the time.  Visiting them is really fast;
searching doesn't take long, either.


[1]: https://github.com/lee-/emacs/tree/master/multisearch

> [1] Files can have dots in their names, so what do you want to do with a
> file whose name ends in a dot? Or two? Unlikely, for sure, but not
> impossible.

Hm.  I haven't considered this possibility ...  I'll have to change the
regexps used in `multisearch-directory-ref-p' ...
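
One possible shape for such a check, as a sketch: compare the last
path component against the two names exactly, so that a file called
"notes." or "a.." is not mistaken for a directory reference.  The
`-sketch' name is hypothetical and not taken from the multisearch
repository.

```elisp
(defun multisearch-directory-ref-p-sketch (name)
  "Return non-nil if the last component of NAME is \".\" or \"..\".
Illustrative sketch only."
  (member (file-name-nondirectory name) '("." "..")))

(multisearch-directory-ref-p-sketch "/src/..")     ; => ("..")
(multisearch-directory-ref-p-sketch "/src/notes.") ; => nil
```

This stays a pure string operation, so it does not touch the file
system.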

Hmmmm ... I'm testing this again, and there seems to be a bug
somewhere.  I'll fix that tomorrow or so ...


-- 
Knowledge is volatile and fluid.  Software is power.



* Re: iterating over a list while removing elements
From: Eli Zaretskii @ 2014-03-20 20:16 UTC
  To: help-gnu-emacs

> From: lee <lee@yun.yagibdah.de>
> Date: Thu, 20 Mar 2014 18:34:58 +0100
> 
> Joost Kremers <joost.m.kremers@gmail.com> writes:
> 
> > Note, BTW, that file-directory-p returns t for "." and "..". It seems to
> > me that the only two names that directory-files could return that you
> > really want to exclude are those two,[1] so there's no need for
> > multisearch-directory-ref-p, I think. (Or is there?)
> 
> The idea is that (file-directory-p "..") may cause a file look-up which
> can be avoided by string-matching.

After you've read the directory, its entries, including "..", are in
the cache, so (file-directory-p "..") should not need to hit the disk.

> How does the disk cache deal with non-existing files?  The
> meta-information is probably in the cache (more or less), yet there can
> be no information for non-existing files.

Non-existing files are missing entries in their parent directories,
which are files by themselves, and thus cached.  So dealing with
non-existing files just means reading their directory (from memory,
not from disk) and noticing that a file by that name is not there.



* Re: iterating over a list while removing elements
From: lee @ 2014-03-21  5:25 UTC
  To: help-gnu-emacs

Eli Zaretskii <eliz@gnu.org> writes:

>> From: lee <lee@yun.yagibdah.de>
>> Date: Thu, 20 Mar 2014 18:34:58 +0100
>> 
>> Joost Kremers <joost.m.kremers@gmail.com> writes:
>> 
>> > Note, BTW, that file-directory-p returns t for "." and "..". It seems to
>> > me that the only two names that directory-files could return that you
>> > really want to exclude are those two,[1] so there's no need for
>> > multisearch-directory-ref-p, I think. (Or is there?)
>> 
>> The idea is that (file-directory-p "..") may cause a file look-up which
>> can be avoided by string-matching.
>
> After you've read the directory, its entries, including "..", are in
> the cache, so (file-directory-p "..") should not need to hit the disk.
>
>> How does the disk cache deal with non-existing files?  The
>> meta-information is probably in the cache (more or less), yet there can
>> be no information for non-existing files.
>
> Non-existing files are missing entries in their parent directories,
> which are files by themselves, and thus cached.  So dealing with
> non-existing files just means reading their directory (from memory,
> not from disk) and noticing that a file by that name is not there.

That's probably all true --- yet I think a simple string match may be
faster.  And if the cache is small, without the check it might yet cause
file-lookups ...


-- 
Knowledge is volatile and fluid.  Software is power.


