unofficial mirror of help-gnu-emacs@gnu.org
 help / color / mirror / Atom feed
* Comparing lists
@ 2023-09-11 18:53 Petteri Hintsanen
  2023-09-12  7:26 ` Emanuel Berg
  2023-09-13 10:34 ` Philip Kaludercic
  0 siblings, 2 replies; 7+ messages in thread
From: Petteri Hintsanen @ 2023-09-11 18:53 UTC (permalink / raw)
  To: help-gnu-emacs

Hello,

I'm writing tests for Emms.  There we have metadata structures like
this:

  '((identification-header
     (opus-head . "OpusHead")
     (channel-count . 1)
     (opus-version . 1))
    (comment-header
     (opus-tags . "OpusTags")
     (vendor-length . 13)
     (user-comments
      ((length . 38) (user-comment . "ENCODER=opusenc from opus-tools 0.1.10"))
      ((length . 7) (user-comment . "foo=bar"))
      ((length . 5) (user-comment . "Key=x")))
     (vendor-string . "libopus 1.3.1")
     (user-comments-list-length . 3)))

These come from bindat-unpack function.

The problem is that sometimes the same data can be in a reversed or
otherwise different order.  For example, the data above could be
arranged like this:

  '((comment-header
     (opus-tags . "OpusTags")
     (user-comments
      ((length . 7) (user-comment . "foo=bar"))
      ((length . 38) (user-comment . "ENCODER=opusenc from opus-tools 0.1.10"))
      ((length . 5) (user-comment . "Key=x")))
     (vendor-string . "libopus 1.3.1")
     (user-comments-list-length . 3)
     (vendor-length . 13))
    (identification-header
     (channel-count . 1)
     (opus-head . "OpusHead")
     (opus-version . 1)))

So the "siblings" in the hierarchy can switch places, but otherwise it
is the same data.

How to compare such lists for equality?
I devised this recursive func:

  (defun my-equal (x y)
    (cond ((and (not (proper-list-p x))
                (not (proper-list-p y)))
           (equal x y))
          ((and (proper-list-p x)
                (proper-list-p y))
           (seq-every-p (lambda (elt-x)
                          (seq-find (lambda (elt-y)
                                      (my-equal elt-x elt-y))
                                    y))
                        x))))

where the first cond is for comparing anything else but lists, and the
second cond is for comparing (nested) lists.  (Nevermind the complexity
or the fact that the function is not symmetrical.)

I'm not adept with lisp structures so I'm not particularly confident
about this.  Do you see any obvious problems?  Are there better options?

Thanks,
Petteri




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Comparing lists
  2023-09-11 18:53 Comparing lists Petteri Hintsanen
@ 2023-09-12  7:26 ` Emanuel Berg
  2023-09-12 12:19   ` Rudolf Schlatte
  2023-09-13 15:32   ` Petteri Hintsanen
  2023-09-13 10:34 ` Philip Kaludercic
  1 sibling, 2 replies; 7+ messages in thread
From: Emanuel Berg @ 2023-09-12  7:26 UTC (permalink / raw)
  To: help-gnu-emacs

Petteri Hintsanen wrote:

> I'm writing tests for Emms. There we have metadata
> structures like this:
>
>   '((identification-header
>      (opus-head . "OpusHead")
>      (channel-count . 1)
>      (opus-version . 1))
>     (comment-header
>      (opus-tags . "OpusTags")
>      (vendor-length . 13)
>      (user-comments
>       ((length . 38) (user-comment . "ENCODER=opusenc from opus-tools 0.1.10"))
>       ((length . 7) (user-comment . "foo=bar"))
>       ((length . 5) (user-comment . "Key=x")))
>      (vendor-string . "libopus 1.3.1")
>      (user-comments-list-length . 3)))
>
> These come from bindat-unpack function.
>
> The problem is that sometimes the same data can be in
> a reversed or otherwise different order [...]

So it is, basically, that data can be (1 2) or (2 1), and they
should be considered the same?

The solution seems to be, either you write your own function
to check for equality (as you did), _or_ you normalize the
data and then use a standard function, in this case `equal'.

So with normalization it could look like this:

(require 'cl-lib)

(cl-labels ((sort-list (lst) (sort lst #'<)))
  (equal (sort-list '(1 2))
         (sort-list '(2 1)) )) ; t

The more complex the data structure, the more complex the
normalization, or conversely, the more complex the custom
function to check for equality.

You can decide to have the data normalized all the time, or
normalize it for the comparison occasion only.

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Comparing lists
  2023-09-12  7:26 ` Emanuel Berg
@ 2023-09-12 12:19   ` Rudolf Schlatte
  2023-09-13 15:32   ` Petteri Hintsanen
  1 sibling, 0 replies; 7+ messages in thread
From: Rudolf Schlatte @ 2023-09-12 12:19 UTC (permalink / raw)
  To: help-gnu-emacs

Emanuel Berg <incal@dataswamp.org> writes:

> Petteri Hintsanen wrote:
>
>> I'm writing tests for Emms. There we have metadata
>> structures like this:
>>
>>   '((identification-header
>>      (opus-head . "OpusHead")
>>      (channel-count . 1)
>>      (opus-version . 1))
>>     (comment-header
>>      (opus-tags . "OpusTags")
>>      (vendor-length . 13)
>>      (user-comments
>>       ((length . 38) (user-comment . "ENCODER=opusenc from opus-tools 0.1.10"))
>>       ((length . 7) (user-comment . "foo=bar"))
>>       ((length . 5) (user-comment . "Key=x")))
>>      (vendor-string . "libopus 1.3.1")
>>      (user-comments-list-length . 3)))
>>
>> These come from bindat-unpack function.
>>
>> The problem is that sometimes the same data can be in
>> a reversed or otherwise different order [...]
>
> So it is, basically, that data can be (1 2) or (2 1), and they
> should be considered the same?
>
> The solution seems to be, either you write your own function
> to check for equality (as you did), _or_ you normalize the
> data and then use a standard function, in this case `equal'.
>
> So with normalization it could look like this:
>
> (require 'cl-lib)
>
> (cl-labels ((sort-list (lst) (sort lst #'<)))
>   (equal (sort-list '(1 2))
>          (sort-list '(2 1)) )) ; t

Mind the documentation of sort, especially the sentence "SEQ is
modified by side effects.".  So, the above code should rather be:

(cl-labels ((sort-list (list) (sort (copy-list list) #'<)))
  (equal (sort-list '(1 2))
         (sort-list '(2 1)) )) ; t




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Comparing lists
  2023-09-11 18:53 Comparing lists Petteri Hintsanen
  2023-09-12  7:26 ` Emanuel Berg
@ 2023-09-13 10:34 ` Philip Kaludercic
  2023-09-13 15:41   ` Petteri Hintsanen
  1 sibling, 1 reply; 7+ messages in thread
From: Philip Kaludercic @ 2023-09-13 10:34 UTC (permalink / raw)
  To: Petteri Hintsanen; +Cc: help-gnu-emacs

Petteri Hintsanen <petterih@iki.fi> writes:


[...]

> The problem is that sometimes the same data can be in a reversed or
> otherwise different order.  For example, the data above could be
> arranged like this:

[...]

> I'm not adept with lisp structures so I'm not particularly confident
> about this.  Do you see any obvious problems?  Are there better options?

Have you considered using cl-set-difference or seq-set-equal-p?

> Thanks,
> Petteri



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Comparing lists
  2023-09-12  7:26 ` Emanuel Berg
  2023-09-12 12:19   ` Rudolf Schlatte
@ 2023-09-13 15:32   ` Petteri Hintsanen
  1 sibling, 0 replies; 7+ messages in thread
From: Petteri Hintsanen @ 2023-09-13 15:32 UTC (permalink / raw)
  To: help-gnu-emacs

Emanuel Berg <incal@dataswamp.org> writes:

> So it is, basically, that data can be (1 2) or (2 1), and they
> should be considered the same?

Yes.

> The more complex the data structure, the more complex the
> normalization, or conversely, the more complex the custom
> function to check for equality.

Right, some kind of sorting or normalization, is typically another way
to deal with problems like this.  In this case it is seems unappealing,
though.

Thanks,
Petteri



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Comparing lists
  2023-09-13 10:34 ` Philip Kaludercic
@ 2023-09-13 15:41   ` Petteri Hintsanen
  2023-09-15  2:12     ` Michael Heerdegen
  0 siblings, 1 reply; 7+ messages in thread
From: Petteri Hintsanen @ 2023-09-13 15:41 UTC (permalink / raw)
  To: Philip Kaludercic; +Cc: help-gnu-emacs

Philip Kaludercic <philipk@posteo.net> writes:

> Have you considered using cl-set-difference or seq-set-equal-p?

No.  I tried to search for set functions like those, but somehow I
missed them 🤦

seq-set-equal-p would fit the bill but does not seem to be recursive.
For example:

  (seq-set-equal-p '(a b (c d)) '(a b (d c))) ⇒ nil

Nonetheless it is probably very useful in other contexts.  Same for cl
set functions.

Thanks,
Petteri

(btw. C-x 8 e e is the best Emacs feature ever)



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Comparing lists
  2023-09-13 15:41   ` Petteri Hintsanen
@ 2023-09-15  2:12     ` Michael Heerdegen
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Heerdegen @ 2023-09-15  2:12 UTC (permalink / raw)
  To: Petteri Hintsanen; +Cc: Philip Kaludercic, help-gnu-emacs

Petteri Hintsanen <petterih@iki.fi> writes:

> seq-set-equal-p would fit the bill but does not seem to be recursive.
> For example:
>
>   (seq-set-equal-p '(a b (c d)) '(a b (d c))) ⇒ nil

I think there is nothing in Emacs that matches your case exactly.  The
map.el library comes close: your data has the form of nested
alists, i.e. nested maps, but there is no equality test for maps
implemented in that library.

Developers want to provide a "set.el" library in the future, but nobody
has started to write one so far.


If your data is always a nested alist with symbols as keys, I would try
to compare two data objects A and B efficiently like this:

 - if not both A and B are `consp', compare using `equal' (no need for a
  `proper-list-p' test I think)

 - if they are both `consp' sort the elements in A and B using the
   symbol names in the `car's as keys

 - then you can sequentially compare the alist elements: all `cars'
   should be `eq' and all `cdr's should be recursively equal, and no
   alist should run out of elements before the other also runs out of
   elements

If the data objects are really big, using hash tables might be better
than sorting.

Michael.



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-09-15  2:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-11 18:53 Comparing lists Petteri Hintsanen
2023-09-12  7:26 ` Emanuel Berg
2023-09-12 12:19   ` Rudolf Schlatte
2023-09-13 15:32   ` Petteri Hintsanen
2023-09-13 10:34 ` Philip Kaludercic
2023-09-13 15:41   ` Petteri Hintsanen
2023-09-15  2:12     ` Michael Heerdegen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).