Subject: bug#62422: IRC channel log search results are not chronological for recent logs
To: 62422@debbugs.gnu.org, rekado@elephly.net
From: Hugo Buddelmeijer
Date: Fri, 24 Mar 2023 16:38:44 +0100

Hi all, Ricardo,

Searching through the IRC channel logs on https://logs.guix.gnu.org/ shows a list of matches sorted by date in descending order, except for matches from this February or March: those are at the bottom, often beyond the 100-match limit.

For example, 'vdirsyncer' gives 31 matches (at the time of writing):
https://logs.guix.gnu.org/guix/search?query=vdirsyncer

> 2023-01-10 [15:09:09] <elb> this machine has installed emacs, emacs-guix, ...
> 2023-01-10 [15:12:26] <nckx> For context, ‘guix size emacs emacs-guix ...
> 2022-01-18 [04:43:24] <lfam> At least, vdirsyncer builds when you simply ...
> 2022-01-17 [16:29:41] <johnhamelink> Hey there :) I'm currently an Arch ...
> 2020-11-30 [23:29:57] <lfam> jonsger: No, I'm not using radicale. It was ...
> 2020-11-30 [23:31:08] <lfam> sneek: later tell jonsger: No, I'm not using ...
> 2020-04-29 [09:34:26] <efraim> it also came up in vdirsyncer on ...
> ...
> 2016-01-24 [22:45:28] <lfam> I don't even think you can run vdirsyncer ...
> 2015-12-10 [00:10:51] <lfam> All that and vdirsyncer doesn't even build ...
> 2015-12-09 [22:39:51] <lfam> https://github.com/untitaker/vdirsyncer/ ...
> 2023-02-25 [03:03:54] <fruit-loops> "#61557 - vdirsyncer fails to verify ...
> 2023-02-25 [03:08:01] <fruit-loops> "vdirsyncer fails to verify ...
> 2023-02-25 [03:09:41] <elb> nckx: hmmm when I searched mobile ...
> 2023-02-25 [03:10:49] <elb> ok yeah, it's just not tagged or ...
> 2023-02-25 [03:36:53] <fruit-loops> "vdirsyncer fails to verify ...
> 2023-02-25 [03:38:16] <elb> lechner: no, against vdirsyncer
> 2023-02-25 [03:46:06] <fruit-loops> "vdirsyncer fails to verify certificates"

All hits from February and March of this year are at the bottom of the list, while the rest are in reverse chronological order. (I chose 'vdirsyncer' because it comes up regularly, but not too often.) The list is cut off after about 100 matches, so it is impossible to find recent matches for more popular terms. The most recent chats are usually the most interesting ones, for example when debugging an issue that appeared recently. E.g. a search for Python shows nothing after 2023-01-31:
https://logs.guix.gnu.org/guix/search?query=python

So my question is: can we improve the sort order of the IRC log search results?

I did a bit of investigating myself and discovered the hydra directory in the maintenance repository. There is so much to learn from that directory. However, I could not really figure out what the problem could be. My hypothesis, which is more like a wild guess:

- It seems the sorting is done implicitly by Xapian, which just returns the matching lines in whatever order they were inserted.
- Something went wrong at the transition from January 31st to February 1st that required manual cleanup. Evidence: there are logs with a tilde in the filename, 2023-01-31.log~ and 2023-02-01.log~.
- The database was emptied and repopulated to prevent entries from early in the morning of 2023-02-01 from being counted as after-midnight entries of 2023-01-31. This put all the lines in the correct order, hence the correct sorting up to then.
- Subsequent lines are added by the mcron job and therefore end up at the end of the database, and thus at the end of the result set (beyond the limit of 100).

Side note: the ~ files cause some lines to show up three times, e.g.
https://logs.guix.gnu.org/guix/search?query=557816d497d3e9d25901370903d512d6f6991aa3

> 2023-01-31 [04:52:19] dcunit3d: here's another great config: https://github.com/jsoo1/dotfiles/blob/557816d497d3e9d25901370903d512d6f6991aa3/emacs/init.el
> 2023-02-01.log~ [04:52:19] dcunit3d: here's another great config: https://github.com/jsoo1/dotfiles/blob/557816d497d3e9d25901370903d512d6f6991aa3/emacs/init.el
> 2023-01-31.log~ [04:52:19] dcunit3d: here's another great config: https://github.com/jsoo1/dotfiles/blob/557816d497d3e9d25901370903d512d6f6991aa3/emacs/init.el

Side side note: those ~ entries cannot be clicked on, because (define stamp (basename file-name ".log")) only strips an exact ".log" suffix, so goggles treats the ".log~" as part of the date.

What I don't understand is why the matches are not sorted correctly. It seems to me that (Enquire-set-sort-by-value enq 0 #f) would sort by the value in slot 0, which seems to be the date stamp. But I don't really have a good mental model of how Xapian works or what value slots actually are. (Maybe value slots start at 1, and selecting 0 means not using any of them?)

I tried to compare the results of #guix with those of other channels, but it seems that the logs of most other channels are either not indexed at all, or indexed inconsistently.
For example, searching for ACTION (which seems to mark a "/me" command) in #spritely shows only 11 matches spread over 5 days, while it is a very common occurrence:
https://logs.guix.gnu.org/spritely/search?query=ACTION

Cheers,
Hugo
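To make the ordering question concrete, here is a toy model in plain Python. It is not Xapian and not goggles' code; the class ToyIndex and its fields are hypothetical. Documents get IDs in insertion order, each may carry a date stamp as its "slot 0" value, and search sorts by that value, newest first, keeping insertion order for ties. It also assumes, as a labelled guess, that a document without a slot-0 value sorts like an empty string. Under those assumptions, lines appended later without a stamp land at the bottom in ascending time order, the same shape as the vdirsyncer results; whether the real index behaves this way would have to be checked against the database itself.

```python
"""Toy model of value-sorted search results (a sketch, not Xapian or goggles)."""
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    docid: int            # hypothetical: IDs increase in insertion order
    line: str             # the log line shown on the results page
    slot0: Optional[str]  # hypothetical "slot 0" sort value: a YYYY-MM-DD stamp


class ToyIndex:
    def __init__(self) -> None:
        self.docs: List[Doc] = []

    def add(self, line: str, stamp: Optional[str]) -> None:
        self.docs.append(Doc(len(self.docs) + 1, line, stamp))

    def search(self, term: str) -> List[str]:
        hits = [d for d in self.docs if term in d.line]  # already in docid order
        # Assumption: a missing value compares like "", so it sorts last when
        # sorting newest first. Python's sort is stable (also with reverse=True),
        # so documents with equal values keep ascending docid (insertion) order.
        hits.sort(key=lambda d: d.slot0 or "", reverse=True)
        return [d.line for d in hits]


index = ToyIndex()
# A full rebuild indexes the old logs, each line with its date stamp.
index.add("2015-12-09 vdirsyncer link", "2015-12-09")
index.add("2022-01-18 vdirsyncer builds", "2022-01-18")
index.add("2023-01-10 emacs-guix and vdirsyncer", "2023-01-10")
# Hypothetical: lines appended later by the periodic job carry no stamp.
index.add("2023-02-25 03:03 vdirsyncer fails to verify", None)
index.add("2023-02-25 03:46 vdirsyncer fails to verify", None)

for result in index.search("vdirsyncer"):
    print(result)
```

Giving the two appended lines a stamp such as "2023-02-25" moves them to the top of the toy results, so one way to test the guess against the real index would be to check whether the newer documents have a value in slot 0 at all.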
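The side notes about the ~ files can be sketched the same way. Guile's basename strips the given suffix only when the name ends with exactly that suffix, so "2023-01-31.log~" keeps its ".log~". The Python sketch below mimics that with str.removesuffix (Python 3.9+) and contrasts it with a hypothetical stricter rule that only accepts names of the form YYYY-MM-DD.log. The logs/ directory, the function names, and the assumption that the backups hold the same line are all made up for illustration.

```python
"""Sketch: deriving date stamps from log file names (hypothetical names/paths)."""
import os
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical rule: only files named exactly YYYY-MM-DD.log are daily logs.
DAILY_LOG = re.compile(r"(\d{4}-\d{2}-\d{2})\.log")


def naive_stamp(file_name: str) -> Optional[str]:
    """Like Guile's (basename file-name ".log"): drop ".log" only if it is the exact suffix."""
    return os.path.basename(file_name).removesuffix(".log")


def strict_stamp(file_name: str) -> Optional[str]:
    """Return the date for YYYY-MM-DD.log, or None for anything else (e.g. backups ending in ~)."""
    match = DAILY_LOG.fullmatch(os.path.basename(file_name))
    return match.group(1) if match else None


def index_lines(
    files: Dict[str, List[str]], stamp_of: Callable[[str], Optional[str]]
) -> List[Tuple[str, str]]:
    """Collect (stamp, line) pairs, skipping files for which stamp_of returns None."""
    entries: List[Tuple[str, str]] = []
    for name in sorted(files):
        stamp = stamp_of(name)
        if stamp is None:
            continue
        entries.extend((stamp, line) for line in files[name])
    return entries


sample = "[04:52:19] dcunit3d: here's another great config"
files = {
    "logs/2023-01-31.log": [sample],
    "logs/2023-01-31.log~": [sample],  # hypothetical backup copies of the same line
    "logs/2023-02-01.log~": [sample],
}

print(index_lines(files, naive_stamp))   # three entries, two with ".log~" stamps
print(index_lines(files, strict_stamp))  # one entry, stamped "2023-01-31"
```

With the naive rule the same line is indexed three times, twice under a stamp that is not a date; with the strict rule the backup copies are skipped and the line is indexed once.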