all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Xinglu Chen <public@yoctocell.xyz>
To: Sarah Morgensen <iskarian@mgsn.dev>
Cc: "Ludovic Courtès" <ludo@gnu.org>, 50359@debbugs.gnu.org
Subject: [bug#50359] [PATCH 3/3] import: Add 'generic-git' updater.
Date: Thu, 16 Sep 2021 14:48:23 +0200	[thread overview]
Message-ID: <87h7ekr8iw.fsf@yoctocell.xyz> (raw)
In-Reply-To: <86czp8j38z.fsf@mgsn.dev>

[-- Attachment #1: Type: text/plain, Size: 18647 bytes --]

On Thu, Sep 16 2021, Sarah Morgensen wrote:

> Hi,
>
> Xinglu Chen <public@yoctocell.xyz> writes:
>
>> On Wed, Sep 15 2021, iskarian@mgsn.dev wrote:
>>
>>> Hi,
>>>
>>> September 10, 2021 9:21 AM, "Xinglu Chen" <public@yoctocell.xyz> wrote:
>>>
>>>> * guix/git.scm (ls-remote-refs): New procedure.
>>>> * tests/git.scm ("remote-refs" "remote-refs: only tags"): New tests.
>>>> * guix/import/git.scm: New file.
>>>> * doc/guix.texi (Invoking guix refresh): Document it.
>>>> * tests/import-git.scm: New test file.
>>>> * Makefile.am (MODULES, SCM_TESTS): Register the new files.
>>>> 
>>>> Co-authored-by: Sarah Morgensen <iskarian@mgsn.dev>
>>>
>>> Overall this is looking good.  Thank you for adding tests (for
>>> remote-refs as well!), much appreciated.  It looks like you've done
>>> some good polishing.  I see a few nits, which I'll point out in a
>>> separate email when I'm not on mobile.  I'll also give it a good test.
>>>
>>> But... I've been thinking about the overall approach for a couple
>>> days, because I'm not very happy with the complexity of my heuristic.
>>>
>>> There can be a lot of weird tags in a repository--look at the one for
>>> xf86-video-intel for example.  My heuristic attempts to capture the
>>> assumption that repostories tend to move from using "_" or "-" to "."
>>> but it does fail to account for moving to or from dates (because dates
>>> don't compare with normal versions).
>>
>> But if a repo moved from using versions to tags, or vice-versa, we still
>> wouldn’t know if say “3.0.1” is newer than “2021.03.02”.  We would have
>> to know when the “3.0.1” tag was created.
>
> You're right; I thought of that afterwards.
>
>> Maybe we could have a ‘release-tag-date-scheme?’ property, that way we
>> could just try to match dates?
>
> That seems like it might be the only way to handle it in some cases (if
> they have both versions and dates with a "." delimiter).

It doesn’t have to be “.” delimiter though; if they both have the same
delimiter it would be difficult to distinguish a version from a date,
e.g., “1-2-3” vs “2021-03-23”.

> (Though, we are actually interested in the *lack* of a date scheme.
> If they use a date scheme now, other versions will be disregarded, so
> we're fine; but if they use versions now and used a date scheme
> before, the versions will be discarded.)

I am not sure what you are trying to say, could you elaborate?

>>> I also realized that we are not using a very useful piece of
>>> information--the previous version/tag combo.  I expect that in the
>>> vast majority of cases, the version delimiter for the newest version
>>> will be the same as the version delimiter for the last known version.
>>> (Perhaps the prefix as well?)  Can we use this information to make our
>>> guesses better?  What do you think?
>>
>> That sounds like a good idea.  What should happen if the delimiter from
>> the previous version/tag combo is different from the one that the
>> ‘guess-delimiter’ procedure returns?  Should the one from the previous
>> version/tag combo take precedence.
>
> Consider:
>
>   prefix := 'tag-prefix or guess-prefix-from-current-version+tag or default
>   delim := 'tag-delim or guess-delim-from-current-version+tag or guess-delimiter
>   suffix := 'tag-suffix or default
>
> This should cover:
>
>   1. Format stayed the same (including date formats)
>   2. Format changed from (git-version ...) to proper version
>
> This does not otherwise cover a complete change in format, such as "_"
> -> ".", date(-) -> version, or version -> date(.), for which I could
> argue requiring a manual update is reasonable.

Yeah, it’s not really possible to automatically detect those kind of
changes.

> It also does not cover when the tags have both versions and dates with
> the same delimiter.
>
> Though it would be nice to see when such updates are available, is it
> worth some bogus results?  Are false positives better or false negatives
> better?

Hmm, good question!  If in the future we have some kind of bot that
automatically runs ‘guix refresh -u’, builds the updated package, and
send a patch to the mailing list, not having false positives might be
more important.  We could also have a ‘disable-tag-updater?’ property to
disable the updater for packages which gives false positive, or maybe
that will result in to many properties.

> Unless you/we want to pursue one or both of the above changes now, the
> latest patch LGTM (modulo my nits).

I would prefer to wait a bit with the improvements mentioned above.  The
current patch has been in the works a week or two already, so it’s
probably a good idea to get it merged, and try to solve the less
important issues later.  :-)

> Thanks for your work,

You are welcome!  And thanks for taking the time to test and review the
work.


On Thu, Sep 16 2021, Sarah Morgensen wrote (again):

> Hello,
>
> Here are my promised nits.
>
> Xinglu Chen <public@yoctocell.xyz> writes:
>
>> * guix/git.scm (ls-remote-refs): New procedure.
>> * tests/git.scm ("remote-refs" "remote-refs: only tags"): New tests.
>> * guix/import/git.scm: New file.
>> * doc/guix.texi (Invoking guix refresh): Document it.
>> * tests/import-git.scm: New test file.
>> * Makefile.am (MODULES, SCM_TESTS): Register the new files.
>>
>> Co-authored-by: Sarah Morgensen <iskarian@mgsn.dev>
>
> Again, much thanks for writing tests.
>
>> +@item generic-git
>> +a generic updater for packages hosted on Git repositories.  It tries to
>> +be smart about parsing Git tag names, but if it is not able to parse the
>> +tag name and compare tags correctly, users can define the following
>> +properties for a package.
>> +
>> +@itemize
>> +@item @code{release-tag-prefix}: a regular expression for matching a prefix of
>> +the tag name.
>> +
>> +@item @code{release-tag-suffix}: a regular expression for matching a suffix of
>> +the tag name.
>> +
>> +@item @code{release-tag-version-delimiter}: a string used as the delimiter in
>> +the tag name for separating the numbers of the version.
>> +@end itemize
>> +
>> +@lisp
>> +(package
>> +  (name "foo")
>> +  ;; ...
>> +  (properties
>> +    '((release-tag-prefix . "^release0-")
>> +      (release-tag-suffix . "[a-z]?$")
>> +      (release-tag-version-delimiter . ":"))))
>> +@end lisp
>> +
>> +By default, the updater will ignore pre-releases; to make it also look
>> +for pre-releases, set the @code{accept-pre-releases?} property to
>> +@code{#t}.
>
> Should this be itemized above?

That’s probably a good idea, since it is related to how the tag will be
parsed.

>> +\f
>> +;;
>> +;;; Remote operations.
>> +;;;
>> +
>> +(define* (remote-refs url #:key tags?)
>> +  "Return the list of references advertised at Git repository URL.  If TAGS?
>> +is true, limit to only refs/tags."
>> +  (define (ref? ref)
>> +    ;; Like `git ls-remote --refs', only show actual references.
>> +    (and (string-prefix? "refs/" ref)
>> +         (not (string-suffix? "^{}" ref))))
>> +
>> +  (define (tag? ref)
>> +    (string-prefix? "refs/tags/" ref))
>> +
>> +  (define (include? ref)
>> +    (and (ref? ref)
>> +         (or (not tags?) (tag? ref))))
>> +
>> +  (with-libgit2
>> +   (call-with-temporary-directory
>> +    (lambda (cache-directory)
>> +      (let* ((repository (repository-init cache-directory))
>> +             ;; Create an in-memory remote so we don't touch disk.
>> +             (remote (remote-create-anonymous repository url)))
>> +        (remote-connect remote)
>> +        (remote-disconnect remote)
>> +        (repository-close! repository)
>> +
>> +        (filter-map (lambda (remote)
>> +                      (let ((name (remote-head-name remote)))
>> +                        (and (include? name)
>> +                             name)))
>> +                    (remote-ls remote)))))))
>
> I discovered that this can segfault unless 'remote-disconnect' and
> possibly 'repository-close!' are called *after* copying the data out.
> I've attached a diff for this.

I don’t see a diff attached; maybe you forgot?  :-)

>> +\f
>> +;;; Updater
>> +
>> +(define %pre-release-words
>> +  '("alpha" "beta" "rc" "dev" "test"))
>
> I found a few packages that use "pre" as well.

Good catch, I noticed that as well when doing some more testing.

>> +
>> +(define %pre-release-rx
>> +  (map (cut make-regexp <> regexp/icase) %pre-release-words))
>> +
>> +(define* (version-mapping tags #:key prefix suffix delim pre-releases?)
>> +  "Given a list of Git TAGS, return a association list where the car is the
>                                        ^ an
>
>> +version corresponding to the tag, and the cdr is the name of the tag."
>> +  (define (guess-delimiter)
>> +    (let ((total (length tags))
>> +          (dots (reduce + 0 (map (cut string-count <> #\.) tags)))
>> +          (dashes (reduce + 0 (map (cut string-count <> #\-) tags)))
>> +          (underscores (reduce + 0 (map (cut string-count <> #\_) tags))))
>> +      (cond
>> +       ((>= dots (* total 0.35)) ".")
>> +       ((>= dashes (* total 0.8)) "-")
>> +       ((>= underscores (* total 0.8)) "_")
>> +       (else ""))))
>> +
>> +  (define delim-rx (regexp-quote (or delim (guess-delimiter))))
>> +  (define suffix-rx  (string-append (or suffix "") "$"))
>> +
>> +  
>> +  (define prefix-rx (string-append "^" (or prefix "[^[:digit:]]*")))
>> +  (define pre-release-rx
>> +    (if pre-releases?
>> +        (string-append ".*(" (string-join %pre-release-words "|") ").*")
>> +        ""))
>> +
>> +  (define tag-rx
>> +    (string-append prefix-rx "([[:digit:]][^" delim-rx "[:punct:]]*"
>> +                   "(" delim-rx "[^[:punct:]" delim-rx "]+)"
>> +                   ;; If there is are no delimiters, it could mean that the
>                                   ^ no "is"
>
>> +                   ;; version just contains one number (e.g., "2"), thus, use
>> +                   ;; "*" instead of "+" to match zero or more numbers.
>> +                   (if (string=? delim-rx "") "*" "+")
>
> Good catch.
>
>> +                   pre-release-rx ")" suffix-rx))
>> +
>> +  (define (get-version tag)
>> +    (let ((tag-match (regexp-exec (make-regexp tag-rx) tag)))
>> +      (and tag-match
>> +           (regexp-substitute/global
>> +            #f delim-rx (match:substring tag-match 1)
>> +            ;; Don't insert "." if there aren't any delimiters in the first
>
> Nit: "if there were no delimiters", to be consistent with above comment.

Noted.

>> +            ;; place.
>> +            'pre (if (string=? delim-rx "") "" ".") 'post))))
>
> One issue with returning a different delimiter than the package
> currently uses is that the automatic updater won't really work as-is.

Good point;, the tag name would be incorrect in those cases.

> Hmmm.  When things are modified so the updater gets both the version and
> the git-reference, it should be able to reverse-engineer things well
> enough there.

Ah, looks like (guix upstream) needs some work.  :-)

> I imagine this is really only going to be an issue with dates currently
> written as "2017-01-01", anyway.  I'll put my comments on that in reply
> to the other email.
>
>> +
>> +  (define (entry<? a b)
>> +    (eq? (version-compare (car a) (car b)) '<))
>> +
>> +  (stable-sort (filter-map (lambda (tag)
>> +                             (let ((version (get-version tag)))
>> +                               (and version (cons version tag))))
>> +                           tags)
>> +               entry<?))
>> +
>> +(define* (latest-tag url #:key prefix suffix delim pre-releases?)
>> +  "Return the latest tag available from the Git repository at URL."
>
> This returns two values (in preparation for the above-mentioned switch),
> so maybe something like "Return the latest version and corresponding tag
> available from..."

Good catch.

>> +  (define (pre-release? tag)
>> +    (any (cut regexp-exec <> tag)
>> +         %pre-release-rx))
>> +
>> +  (let* ((tags (map (cut string-drop <> (string-length "refs/tags/"))
>
> Should be "cute" so string-length is only evaluated once -- though it's
> probably optimized like that anyway.

Good catch, I keep forgetting that ‘cute’ exists.  :-)

>> +                    (remote-refs url #:tags? #t)))
>> +         (versions->tags
>> +          (version-mapping (if pre-releases?
>> +                               tags
>> +                               (filter (negate pre-release?) tags))
>> +                           #:prefix prefix
>> +                           #:suffix suffix
>> +                           #:delim delim
>> +                           #:pre-releases? pre-releases?)))
>> +    (cond
>> +     ((null? tags)
>> +      (git-no-tags-error))
>> +     ((null? versions->tags)
>> +      (git-no-valid-tags-error))
>> +     (else
>> +      (match (last versions->tags)
>> +        ((version . tag)
>> +         (values version tag)))))))
>> +
>> +(define (latest-git-tag-version package)
>> +  "Given a PACKAGE, return the latest version of it, or #f if the latest version
>> +could not be determined."
>> +  (guard (c ((or (git-no-tags-error? c) (git-no-valid-tags-error? c))
>> +             (warning (or (package-field-location package 'source)
>> +                          (package-location package))
>> +                      (G_ "~a for ~a~%")
>> +                      (condition-message c)
>> +                      (package-name package))
>> +             #f)
>> +            ((eq? (exception-kind c) 'git-error)
>> +             (warning (or (package-field-location package 'source)
>> +                          (package-location package))
>> +                      (G_ "failed to fetch Git repository for ~a~%")
>> +                      (package-name package))
>> +             #f))
>> +    (let* ((source (package-source package))
>> +           (url (git-reference-url (origin-uri source)))
>> +           (properties (package-properties package))
>> +           (tag-prefix (assq-ref properties 'release-tag-prefix))
>> +           (tag-suffix (assq-ref properties 'release-tag-suffix))
>> +           (tag-version-delimiter (assq-ref properties 'release-tag-version-delimiter))
>> +           (refresh-pre-releases? (assq-ref properties 'accept-pre-releases?)))
>> +      (latest-tag url
>> +                  #:prefix tag-prefix
>> +                  #:suffix tag-suffix
>> +                  #:delim tag-version-delimiter
>> +                  #:pre-releases? refresh-pre-releases?))))
>
> This is entirely a style preference, so only take this suggestion if you
> like it :)
>
>     (let* ((source (package-source package))
>            (url (git-reference-url (origin-uri source)))
>            (property (cute assq-ref (package-properties package) <>)))
>       (latest-tag url
>                   #:prefix (property 'release-tag-prefix)
>                   #:suffix (property 'release-tag-suffix)
>                   #:delim (property 'release-tag-version-delimiter)
>                   #:pre-releases? (property 'accept-pre-releases?)))))

That does look cleaner, thanks for the suggestion!

>> +
>> +(define (git-package? package)
>> +  "Whether the origin of PACKAGE is a Git repostiory."
>
> "Return true if PACKAGE is..."

“PACKAGE is a Git repository.” doesn’t really sound right, maybe “if
PACKAGE is hosted on a Git repository”?

>> +  (match (package-source package)
>> +    ((? origin? origin)
>> +     (and (eq? (origin-method origin) git-fetch)
>> +          (git-reference? (origin-uri origin))))
>> +    (_ #f)))
>> +
>> +(define (latest-git-release package)
>> +  "Return the latest release of PACKAGE."
>
> "Return an <upstream-source> for the latest...", to match the other
> updaters.
>
>> +  (let* ((name (package-name package))
>> +         (old-version (package-version package))
>> +         (url (git-reference-url (origin-uri (package-source package))))
>> +         (new-version (latest-git-tag-version package)))
>> +
>> +    (and new-version
>> +         (upstream-source
>> +          (package name)
>> +          (version new-version)
>> +          (urls (list url))))))
>> +
>> +(define %generic-git-updater
>> +  (upstream-updater
>> +   (name 'generic-git)
>> +   (description "Updater for packages hosted on Git repositories")
>> +   (pred git-package?)
>> +   (latest latest-git-release)))
>
> I tested this updater on all packages in .scm files starting with f
> through z, and I found the following packages with possibly bogus
> updates:
>
> --8<---------------cut here---------------start------------->8---
> javaxom

I assume you meant ‘java-xom’  :-)

That’s a weird scheme; setting the delimiter to “.” doesn’t help since
it thinks that “127” is greater than “1.3.7”.

> luakit
> ocproxy
> pitivi

‘pitivi’ has a pretty weird version string to begin with; it may be
better to change it to the date: “0.999.0-2021-05.0” -> “2021-05.0”.

> eid-mw
> libhomfly
> gnuradio
> welle-io

Setting the delimiter to "." fixes the issue.

> racket-minimal

Setting the prefix to "v" fixes this.

> milkytracker
> cl-portal
> kodi-cli
> openjdk
> java-bouncycastle
> hurd
> opencsg

Setting the suffix to "-release" fixes this.

> povray
> gpsbabel

Setting the prefix to "gpsbabel_" fixes this.

> go
> stepmania
> ocaml-mcl
>
> many minetest packages (minetest will have its own updater, though)
>
> ocaml4.07-core-kernel, ocamlbuild and many other ocaml packages
> (they seem to be covered by the github updater)
> --8<---------------cut here---------------end--------------->8---

Hmm, ‘guix refresh’ says that ‘ocamlbuild’ is already the latest
version.  But you are right, many of the packages are already taken care
of by the ‘github’ updater.

> The following packages suggest a version -> date update, which may or
> may not be bogus:
>
> --8<---------------cut here---------------start------------->8---
> cataclysm-dda
> autotrace
> lbalgtk
> nheko
> libqalculate
> cl-antik
> cl-antik-base
> cl-hu.dwim.stefil
> cl-stefil
> cl-gsll
> sbcl-cl-gserver
> --8<---------------cut here---------------end--------------->8---

Thanks for taking the time to find these false positive!

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 861 bytes --]

  reply	other threads:[~2021-09-16 12:50 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-03 15:50 [bug#50359] [PATCH] import: Add 'generic-git' updater Xinglu Chen
2021-09-05  0:19 ` Sarah Morgensen
2021-09-05  1:03   ` Sarah Morgensen
2021-09-05 10:36   ` Xinglu Chen
2021-09-06  5:40     ` Sarah Morgensen
2021-09-06 12:20       ` Xinglu Chen
2021-09-07  1:00         ` Sarah Morgensen
2021-09-07 19:13           ` Xinglu Chen
2021-09-08 18:28             ` Xinglu Chen
2021-09-10  8:36               ` Ludovic Courtès
2021-09-10 13:23                 ` Xinglu Chen
2021-09-05 13:11   ` Xinglu Chen
2021-09-06  3:14     ` Sarah Morgensen
2021-09-10 16:20 ` [bug#50359] [PATCH 0/3] " Xinglu Chen
2021-09-10 16:21   ` [bug#50359] [PATCH 1/3] tests: git: Don't read from the users global Git config file Xinglu Chen
2021-09-10 16:21   ` [bug#50359] [PATCH 2/3] tests: git: Make 'tag' directive non-interactive Xinglu Chen
2021-09-13  8:03     ` Ludovic Courtès
2021-09-10 16:21   ` [bug#50359] [PATCH 3/3] import: Add 'generic-git' updater Xinglu Chen
2021-09-13  8:07     ` Ludovic Courtès
2021-09-16  9:09     ` Sarah Morgensen
2021-09-16 12:48       ` Xinglu Chen [this message]
2021-09-16 23:42         ` Sarah Morgensen
2021-09-17  7:48           ` Xinglu Chen
2021-09-17  8:04   ` [bug#50359] [PATCH v3 0/3] " Xinglu Chen
2021-09-17  8:04     ` [bug#50359] [PATCH v3 1/3] tests: git: Don't read from the users global Git config file Xinglu Chen
2021-09-17  8:04     ` [bug#50359] [PATCH v3 2/3] tests: git: Make 'tag' directive non-interactive Xinglu Chen
2021-09-17  8:04     ` [bug#50359] [PATCH v3 3/3] import: Add 'generic-git' updater Xinglu Chen
2021-09-18 17:47     ` bug#50359: [PATCH v3 0/3] " Ludovic Courtès
2021-09-15  8:44 ` [bug#50359] [PATCH 3/3] import: " iskarian
2021-09-15 11:59   ` Xinglu Chen
2021-09-16  9:46     ` Sarah Morgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h7ekr8iw.fsf@yoctocell.xyz \
    --to=public@yoctocell.xyz \
    --cc=50359@debbugs.gnu.org \
    --cc=iskarian@mgsn.dev \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.