Michael Heerdegen <michael_heerdegen@web.de>
Wed, 11 Oct 2017 19:56:26 +0200

   Robert Weiner <rsw@gnu.org> writes:

   > This seems incredibly complicated.  It would help if you
   > would state the general problem you are trying to solve and
   > the performance characteristics you need.  It certainly is
   > not a generic duplicate removal library.  Why can't you
   > flatten your list and then just apply a sequence of
   > predicate matches as needed or use hashing as mentioned in
   > the commentary?

   I guess the name is misleading, I'll try to find a better one.

How about "multi-pass-dups", then use/document "pass" everywhere
in the code/discussion?  (Currently, you use "stage" in the code,
and "step" in this thread.)

   Look at the example of finding files with equal contents in
   your file system: [...]

Can you think of another use-case?  That exercise will help
highlight the general (factorable) concepts to document well.
Conversely, if you cannot, maybe that's a hint that the
abstraction level is too high; some opportunity exists for
specialization (and thus optimization).

   In a second step, we have less many files.

This is the key motivation for multi-pass...

   Do you need a mathematical formulation of the abstract
   problem that the algorithm solves, and how it works?

...so briefly explaining how iteration mitigates the suffering
due to the (irreducible) N^2 might be good to do early on (in
Commentary), before giving examples.  Leading w/ a small bit of
theory caters to those readers already disposed to that style.

To help the rest of the readers, a common technique is to label
components of the theory (e.g., [1], [2], ...) and refer to
these later, in the concrete examples.  Those readers might
gloss over the theory at first (being indisposed) but the back
references invite them to make connections at their own pace.

In short, "it is wise" to show how "it is wise" and avoid saying
"it is wise" (according to this practiced "wise"ass :-D).
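
To make the N^2 point concrete, here is a back-of-envelope
sketch.  The function names and the group sizes are made up for
illustration; they are not taken from the library.  Comparing N
items pairwise costs N(N-1)/2 comparisons, while a cheap first
pass that splits the items into small groups leaves only the
within-group pairs for the expensive later passes:

#+begin_src emacs-lisp
;; Hypothetical helpers, for illustration only.

(defun my-pair-count (n)
  "Return the number of unordered pairs among N items."
  (/ (* n (1- n)) 2))

(defun my-pairs-after-pass (group-sizes)
  "Return the pairs left to compare once items are split into groups.
GROUP-SIZES is a list of group sizes; singleton groups add nothing."
  (apply #'+ (mapcar #'my-pair-count group-sizes)))

;; 1000 files compared pairwise by contents:
(my-pair-count 1000)                ; => 499500

;; Assume a cheap first pass leaves three candidate groups of
;; 10, 10 and 5 files, and every other file is alone in its group:
(my-pairs-after-pass '(10 10 5))    ; => 100
#+end_src

The quadratic term is still there, but it now applies to each
small group rather than to the whole input.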

   (find-dups my-sequence-of-file-names
              (list (list (lambda (file) ...)
                          #'eq)
                    (list (lambda (file) ...)
                          #'equal)
                    (list (lambda (file) ...)
                          #'equal)))

IIUC the 2nd level ‘list’ is to associate each characterization
func w/ a comparison func.  I wonder if there is another way.
Too bad Emacs Lisp has no "object properties" like Guile, eh?

OTOH, the 1st level ‘list’ seems like a gratuitous hoop (read:
source of latent PEBKAC/complaints/redesign).  Why not move that
down-chain (say, into a ‘&rest’ parameter), so the caller need
not worry?  Something like:

#+begin_src emacs-lisp
(multi-pass-dups
 MY-SEQUENCE-OF-FILE-NAMES
 (list (lambda (file) ...)
       #'eq)
 (list (lambda (file) ...)
       #'equal)
 (list (lambda (file) ...)
       #'equal))
#+end_src

(I also use the all-caps-for-metavariables convention, here.)
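
For what it's worth, here is a minimal sketch of how such a
signature could be implemented.  It is not the code under
discussion: the name ‘my-multi-pass-dups’ and the bucketing
strategy are assumptions of mine, and it assumes each pass is a
list (KEY-FUNC TEST-FUNC).  The ‘&rest’ parameter collects the
passes, so the caller never builds the outer list:

#+begin_src emacs-lisp
(require 'cl-lib)

(defun my-multi-pass-dups (sequence &rest passes)
  "Return groups of elements of SEQUENCE that agree under every pass.
Each element of PASSES is a list (KEY-FUNC TEST-FUNC).  KEY-FUNC
maps an element to a key; TEST-FUNC compares two keys.  Illustrative
sketch only, not the implementation discussed in this thread."
  ;; Start with one group holding everything; ‘append’ with a
  ;; final nil copies any sequence into a fresh list.
  (let ((groups (list (append sequence nil))))
    (dolist (pass passes)
      (let ((key-func (car pass))
            (test-func (cadr pass))
            (next '()))
        (dolist (group groups)
          ;; BUCKETS is an alist of (KEY . ELEMENTS).
          (let ((buckets '()))
            (dolist (elt group)
              (let* ((key (funcall key-func elt))
                     (bucket (cl-assoc key buckets :test test-func)))
                (if bucket
                    (push elt (cdr bucket))
                  (push (list key elt) buckets))))
            ;; Only buckets with two or more elements survive.
            (dolist (bucket buckets)
              (when (cddr bucket)
                (push (nreverse (cdr bucket)) next)))))
        (setq groups next)))
    groups))

;; Toy usage: group strings by length first, then by contents.
(my-multi-pass-dups '("a" "bb" "cc" "bb" "d")
                    (list #'length #'eq)
                    (list #'identity #'equal))
;; => (("bb" "bb"))
#+end_src

The alist lookup is itself quadratic within a group; a hash table
would be the obvious refinement when TEST-FUNC is one of ‘eq’,
‘eql’ or ‘equal’.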

-- 
Thien-Thi Nguyen -----------------------------------------------
 (defun responsep (query)
   (pcase (context query)
     (`(technical ,ml) (correctp ml))
     ...))                              748E A0E8 1CB8 A748 9BFA
--------------------------------------- 6CE4 6703 2224 4C80 7502