On Thu, Sep 16 2021, Sarah Morgensen wrote: > Hi, > > Xinglu Chen writes: > >> On Wed, Sep 15 2021, iskarian@mgsn.dev wrote: >> >>> Hi, >>> >>> September 10, 2021 9:21 AM, "Xinglu Chen" wrote: >>> >>>> * guix/git.scm (ls-remote-refs): New procedure. >>>> * tests/git.scm ("remote-refs" "remote-refs: only tags"): New tests. >>>> * guix/import/git.scm: New file. >>>> * doc/guix.texi (Invoking guix refresh): Document it. >>>> * tests/import-git.scm: New test file. >>>> * Makefile.am (MODULES, SCM_TESTS): Register the new files. >>>> >>>> Co-authored-by: Sarah Morgensen >>> >>> Overall this is looking good. Thank you for adding tests (for >>> remote-refs as well!), much appreciated. It looks like you've done >>> some good polishing. I see a few nits, which I'll point out in a >>> separate email when I'm not on mobile. I'll also give it a good test. >>> >>> But... I've been thinking about the overall approach for a couple >>> days, because I'm not very happy with the complexity of my heuristic. >>> >>> There can be a lot of weird tags in a repository--look at the one for >>> xf86-video-intel for example. My heuristic attempts to capture the >>> assumption that repostories tend to move from using "_" or "-" to "." >>> but it does fail to account for moving to or from dates (because dates >>> don't compare with normal versions). >> >> But if a repo moved from using versions to tags, or vice-versa, we still >> wouldn’t know if say “3.0.1” is newer than “2021.03.02”. We would have >> to know when the “3.0.1” tag was created. > > You're right; I thought of that afterwards. > >> Maybe we could have a ‘release-tag-date-scheme?’ property, that way we >> could just try to match dates? > > That seems like it might be the only way to handle it in some cases (if > they have both versions and dates with a "." delimiter). It doesn’t have to be “.” delimiter though; if they both have the same delimiter it would be difficult to distinguish a version from a date, e.g., “1-2-3” vs “2021-03-23”. > (Though, we are actually interested in the *lack* of a date scheme. > If they use a date scheme now, other versions will be disregarded, so > we're fine; but if they use versions now and used a date scheme > before, the versions will be discarded.) I am not sure what you are trying to say, could you elaborate? >>> I also realized that we are not using a very useful piece of >>> information--the previous version/tag combo. I expect that in the >>> vast majority of cases, the version delimiter for the newest version >>> will be the same as the version delimiter for the last known version. >>> (Perhaps the prefix as well?) Can we use this information to make our >>> guesses better? What do you think? >> >> That sounds like a good idea. What should happen if the delimiter from >> the previous version/tag combo is different from the one that the >> ‘guess-delimiter’ procedure returns? Should the one from the previous >> version/tag combo take precedence. > > Consider: > > prefix := 'tag-prefix or guess-prefix-from-current-version+tag or default > delim := 'tag-delim or guess-delim-from-current-version+tag or guess-delimiter > suffix := 'tag-suffix or default > > This should cover: > > 1. Format stayed the same (including date formats) > 2. Format changed from (git-version ...) to proper version > > This does not otherwise cover a complete change in format, such as "_" > -> ".", date(-) -> version, or version -> date(.), for which I could > argue requiring a manual update is reasonable. Yeah, it’s not really possible to automatically detect those kind of changes. > It also does not cover when the tags have both versions and dates with > the same delimiter. > > Though it would be nice to see when such updates are available, is it > worth some bogus results? Are false positives better or false negatives > better? Hmm, good question! If in the future we have some kind of bot that automatically runs ‘guix refresh -u’, builds the updated package, and send a patch to the mailing list, not having false positives might be more important. We could also have a ‘disable-tag-updater?’ property to disable the updater for packages which gives false positive, or maybe that will result in to many properties. > Unless you/we want to pursue one or both of the above changes now, the > latest patch LGTM (modulo my nits). I would prefer to wait a bit with the improvements mentioned above. The current patch has been in the works a week or two already, so it’s probably a good idea to get it merged, and try to solve the less important issues later. :-) > Thanks for your work, You are welcome! And thanks for taking the time to test and review the work. On Thu, Sep 16 2021, Sarah Morgensen wrote (again): > Hello, > > Here are my promised nits. > > Xinglu Chen writes: > >> * guix/git.scm (ls-remote-refs): New procedure. >> * tests/git.scm ("remote-refs" "remote-refs: only tags"): New tests. >> * guix/import/git.scm: New file. >> * doc/guix.texi (Invoking guix refresh): Document it. >> * tests/import-git.scm: New test file. >> * Makefile.am (MODULES, SCM_TESTS): Register the new files. >> >> Co-authored-by: Sarah Morgensen > > Again, much thanks for writing tests. > >> +@item generic-git >> +a generic updater for packages hosted on Git repositories. It tries to >> +be smart about parsing Git tag names, but if it is not able to parse the >> +tag name and compare tags correctly, users can define the following >> +properties for a package. >> + >> +@itemize >> +@item @code{release-tag-prefix}: a regular expression for matching a prefix of >> +the tag name. >> + >> +@item @code{release-tag-suffix}: a regular expression for matching a suffix of >> +the tag name. >> + >> +@item @code{release-tag-version-delimiter}: a string used as the delimiter in >> +the tag name for separating the numbers of the version. >> +@end itemize >> + >> +@lisp >> +(package >> + (name "foo") >> + ;; ... >> + (properties >> + '((release-tag-prefix . "^release0-") >> + (release-tag-suffix . "[a-z]?$") >> + (release-tag-version-delimiter . ":")))) >> +@end lisp >> + >> +By default, the updater will ignore pre-releases; to make it also look >> +for pre-releases, set the @code{accept-pre-releases?} property to >> +@code{#t}. > > Should this be itemized above? That’s probably a good idea, since it is related to how the tag will be parsed. >> + >> +;; >> +;;; Remote operations. >> +;;; >> + >> +(define* (remote-refs url #:key tags?) >> + "Return the list of references advertised at Git repository URL. If TAGS? >> +is true, limit to only refs/tags." >> + (define (ref? ref) >> + ;; Like `git ls-remote --refs', only show actual references. >> + (and (string-prefix? "refs/" ref) >> + (not (string-suffix? "^{}" ref)))) >> + >> + (define (tag? ref) >> + (string-prefix? "refs/tags/" ref)) >> + >> + (define (include? ref) >> + (and (ref? ref) >> + (or (not tags?) (tag? ref)))) >> + >> + (with-libgit2 >> + (call-with-temporary-directory >> + (lambda (cache-directory) >> + (let* ((repository (repository-init cache-directory)) >> + ;; Create an in-memory remote so we don't touch disk. >> + (remote (remote-create-anonymous repository url))) >> + (remote-connect remote) >> + (remote-disconnect remote) >> + (repository-close! repository) >> + >> + (filter-map (lambda (remote) >> + (let ((name (remote-head-name remote))) >> + (and (include? name) >> + name))) >> + (remote-ls remote))))))) > > I discovered that this can segfault unless 'remote-disconnect' and > possibly 'repository-close!' are called *after* copying the data out. > I've attached a diff for this. I don’t see a diff attached; maybe you forgot? :-) >> + >> +;;; Updater >> + >> +(define %pre-release-words >> + '("alpha" "beta" "rc" "dev" "test")) > > I found a few packages that use "pre" as well. Good catch, I noticed that as well when doing some more testing. >> + >> +(define %pre-release-rx >> + (map (cut make-regexp <> regexp/icase) %pre-release-words)) >> + >> +(define* (version-mapping tags #:key prefix suffix delim pre-releases?) >> + "Given a list of Git TAGS, return a association list where the car is the > ^ an > >> +version corresponding to the tag, and the cdr is the name of the tag." >> + (define (guess-delimiter) >> + (let ((total (length tags)) >> + (dots (reduce + 0 (map (cut string-count <> #\.) tags))) >> + (dashes (reduce + 0 (map (cut string-count <> #\-) tags))) >> + (underscores (reduce + 0 (map (cut string-count <> #\_) tags)))) >> + (cond >> + ((>= dots (* total 0.35)) ".") >> + ((>= dashes (* total 0.8)) "-") >> + ((>= underscores (* total 0.8)) "_") >> + (else "")))) >> + >> + (define delim-rx (regexp-quote (or delim (guess-delimiter)))) >> + (define suffix-rx (string-append (or suffix "") "$")) >> + >> + >> + (define prefix-rx (string-append "^" (or prefix "[^[:digit:]]*"))) >> + (define pre-release-rx >> + (if pre-releases? >> + (string-append ".*(" (string-join %pre-release-words "|") ").*") >> + "")) >> + >> + (define tag-rx >> + (string-append prefix-rx "([[:digit:]][^" delim-rx "[:punct:]]*" >> + "(" delim-rx "[^[:punct:]" delim-rx "]+)" >> + ;; If there is are no delimiters, it could mean that the > ^ no "is" > >> + ;; version just contains one number (e.g., "2"), thus, use >> + ;; "*" instead of "+" to match zero or more numbers. >> + (if (string=? delim-rx "") "*" "+") > > Good catch. > >> + pre-release-rx ")" suffix-rx)) >> + >> + (define (get-version tag) >> + (let ((tag-match (regexp-exec (make-regexp tag-rx) tag))) >> + (and tag-match >> + (regexp-substitute/global >> + #f delim-rx (match:substring tag-match 1) >> + ;; Don't insert "." if there aren't any delimiters in the first > > Nit: "if there were no delimiters", to be consistent with above comment. Noted. >> + ;; place. >> + 'pre (if (string=? delim-rx "") "" ".") 'post)))) > > One issue with returning a different delimiter than the package > currently uses is that the automatic updater won't really work as-is. Good point;, the tag name would be incorrect in those cases. > Hmmm. When things are modified so the updater gets both the version and > the git-reference, it should be able to reverse-engineer things well > enough there. Ah, looks like (guix upstream) needs some work. :-) > I imagine this is really only going to be an issue with dates currently > written as "2017-01-01", anyway. I'll put my comments on that in reply > to the other email. > >> + >> + (define (entry> + (eq? (version-compare (car a) (car b)) '<)) >> + >> + (stable-sort (filter-map (lambda (tag) >> + (let ((version (get-version tag))) >> + (and version (cons version tag)))) >> + tags) >> + entry> + >> +(define* (latest-tag url #:key prefix suffix delim pre-releases?) >> + "Return the latest tag available from the Git repository at URL." > > This returns two values (in preparation for the above-mentioned switch), > so maybe something like "Return the latest version and corresponding tag > available from..." Good catch. >> + (define (pre-release? tag) >> + (any (cut regexp-exec <> tag) >> + %pre-release-rx)) >> + >> + (let* ((tags (map (cut string-drop <> (string-length "refs/tags/")) > > Should be "cute" so string-length is only evaluated once -- though it's > probably optimized like that anyway. Good catch, I keep forgetting that ‘cute’ exists. :-) >> + (remote-refs url #:tags? #t))) >> + (versions->tags >> + (version-mapping (if pre-releases? >> + tags >> + (filter (negate pre-release?) tags)) >> + #:prefix prefix >> + #:suffix suffix >> + #:delim delim >> + #:pre-releases? pre-releases?))) >> + (cond >> + ((null? tags) >> + (git-no-tags-error)) >> + ((null? versions->tags) >> + (git-no-valid-tags-error)) >> + (else >> + (match (last versions->tags) >> + ((version . tag) >> + (values version tag))))))) >> + >> +(define (latest-git-tag-version package) >> + "Given a PACKAGE, return the latest version of it, or #f if the latest version >> +could not be determined." >> + (guard (c ((or (git-no-tags-error? c) (git-no-valid-tags-error? c)) >> + (warning (or (package-field-location package 'source) >> + (package-location package)) >> + (G_ "~a for ~a~%") >> + (condition-message c) >> + (package-name package)) >> + #f) >> + ((eq? (exception-kind c) 'git-error) >> + (warning (or (package-field-location package 'source) >> + (package-location package)) >> + (G_ "failed to fetch Git repository for ~a~%") >> + (package-name package)) >> + #f)) >> + (let* ((source (package-source package)) >> + (url (git-reference-url (origin-uri source))) >> + (properties (package-properties package)) >> + (tag-prefix (assq-ref properties 'release-tag-prefix)) >> + (tag-suffix (assq-ref properties 'release-tag-suffix)) >> + (tag-version-delimiter (assq-ref properties 'release-tag-version-delimiter)) >> + (refresh-pre-releases? (assq-ref properties 'accept-pre-releases?))) >> + (latest-tag url >> + #:prefix tag-prefix >> + #:suffix tag-suffix >> + #:delim tag-version-delimiter >> + #:pre-releases? refresh-pre-releases?)))) > > This is entirely a style preference, so only take this suggestion if you > like it :) > > (let* ((source (package-source package)) > (url (git-reference-url (origin-uri source))) > (property (cute assq-ref (package-properties package) <>))) > (latest-tag url > #:prefix (property 'release-tag-prefix) > #:suffix (property 'release-tag-suffix) > #:delim (property 'release-tag-version-delimiter) > #:pre-releases? (property 'accept-pre-releases?))))) That does look cleaner, thanks for the suggestion! >> + >> +(define (git-package? package) >> + "Whether the origin of PACKAGE is a Git repostiory." > > "Return true if PACKAGE is..." “PACKAGE is a Git repository.” doesn’t really sound right, maybe “if PACKAGE is hosted on a Git repository”? >> + (match (package-source package) >> + ((? origin? origin) >> + (and (eq? (origin-method origin) git-fetch) >> + (git-reference? (origin-uri origin)))) >> + (_ #f))) >> + >> +(define (latest-git-release package) >> + "Return the latest release of PACKAGE." > > "Return an for the latest...", to match the other > updaters. > >> + (let* ((name (package-name package)) >> + (old-version (package-version package)) >> + (url (git-reference-url (origin-uri (package-source package)))) >> + (new-version (latest-git-tag-version package))) >> + >> + (and new-version >> + (upstream-source >> + (package name) >> + (version new-version) >> + (urls (list url)))))) >> + >> +(define %generic-git-updater >> + (upstream-updater >> + (name 'generic-git) >> + (description "Updater for packages hosted on Git repositories") >> + (pred git-package?) >> + (latest latest-git-release))) > > I tested this updater on all packages in .scm files starting with f > through z, and I found the following packages with possibly bogus > updates: > > --8<---------------cut here---------------start------------->8--- > javaxom I assume you meant ‘java-xom’ :-) That’s a weird scheme; setting the delimiter to “.” doesn’t help since it thinks that “127” is greater than “1.3.7”. > luakit > ocproxy > pitivi ‘pitivi’ has a pretty weird version string to begin with; it may be better to change it to the date: “0.999.0-2021-05.0” -> “2021-05.0”. > eid-mw > libhomfly > gnuradio > welle-io Setting the delimiter to "." fixes the issue. > racket-minimal Setting the prefix to "v" fixes this. > milkytracker > cl-portal > kodi-cli > openjdk > java-bouncycastle > hurd > opencsg Setting the suffix to "-release" fixes this. > povray > gpsbabel Setting the prefix to "gpsbabel_" fixes this. > go > stepmania > ocaml-mcl > > many minetest packages (minetest will have its own updater, though) > > ocaml4.07-core-kernel, ocamlbuild and many other ocaml packages > (they seem to be covered by the github updater) > --8<---------------cut here---------------end--------------->8--- Hmm, ‘guix refresh’ says that ‘ocamlbuild’ is already the latest version. But you are right, many of the packages are already taken care of by the ‘github’ updater. > The following packages suggest a version -> date update, which may or > may not be bogus: > > --8<---------------cut here---------------start------------->8--- > cataclysm-dda > autotrace > lbalgtk > nheko > libqalculate > cl-antik > cl-antik-base > cl-hu.dwim.stefil > cl-stefil > cl-gsll > sbcl-cl-gserver > --8<---------------cut here---------------end--------------->8--- Thanks for taking the time to find these false positive!