unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
* bug#71238: Installer image consistently fails to run system init due to TLS error
@ 2024-05-28  0:56 Richard Sent
  2024-05-28  2:31 ` Richard Sent
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Richard Sent @ 2024-05-28  0:56 UTC (permalink / raw)
  To: 71238

Hi Guix!

The dump was uploaded as installer-dump-2e7b5f8f. Here's the installer's
error message:

--8<---------------cut here---------------start------------->8---
May 27 22:15:49 localhost installer[708]: substitute: ^[[Kupdating substitutes from 'https://bordeaux.guix.gnu.org'...   0.0%guix substitute: error: TLS error in procedure 'write_to_session_record_port': Error in the push function. 
May 27 22:15:49 localhost installer[708]: guix system: ^[[1;31merror: ^[[0m`/gnu/store/rdjjpch9m9xv4rdhhr6sv044qd322pj8-guix-command substitute' died unexpectedly 
--8<---------------cut here---------------end--------------->8---

A Guix system installation generated via the following method
consistently fails to install due to a TLS error when updating
substitutes.

--8<---------------cut here---------------start------------->8---
gibraltar :( guix$ guix time-machine -q -- describe
  guix 00384ae
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: 00384aedbc6a371aaf90ca344a446952fdd5a6b3
gibraltar :) guix$ guix time-machine -q -- system image --image-type=iso9660 -e '(@ (gnu system install) installation-os)'
guix system: warning: Consider running 'guix pull' followed by
'guix system reconfigure' to get up-to-date packages and security updates.

Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
Computing Guix derivation for 'x86_64-linux'... \
/gnu/store/x9a2nflpnfpr8i3n1759hw7wg2qv5mm8-image.iso
--8<---------------cut here---------------end--------------->8---

Curiously this does NOT occur the first time substitutes are fetched. I
observed it occur three times when substitutes were fetched after Ruby
was substituted. I don't mean to say that Ruby is the problem (that'd be
crazy), just that the TLS error occurs at the same stage in the
installation every time. NOT randomly.

I've never encountered this error before on any other Guix machine on my
network. The installation machine was using a wired connection.

Possibly related: [1], [2], [3], [4]

Notably [3] claims to have a fix that was merged into Guix, but 00384ae
is after said fix was merged (and yes, $ guix describe on the installer
image does report 00384ae)

[1]: https://issues.guix.gnu.org/66786
[2]: https://issues.guix.gnu.org/48903
[3]: https://lists.gnu.org/archive/html/guix-devel/2024-03/msg00150.html
[4]: https://issues.guix.gnu.org/70244

-- 
Take it easy,
Richard Sent
Making my computer weirder one commit at a time.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#71238: Installer image consistently fails to run system init due to TLS error
  2024-05-28  0:56 bug#71238: Installer image consistently fails to run system init due to TLS error Richard Sent
@ 2024-05-28  2:31 ` Richard Sent
  2024-05-28  4:44   ` Richard Sent
  2024-08-13 11:38 ` Jonathan Brielmaier via Bug reports for GNU Guix
  2024-09-04 22:01 ` Roman Riabenko via Bug reports for GNU Guix
  2 siblings, 1 reply; 8+ messages in thread
From: Richard Sent @ 2024-05-28  2:31 UTC (permalink / raw)
  To: 71238

> Curiously this does NOT occur the first time substitutes are fetched. I
> observed it occur three times when substitutes were fetched after Ruby
> was substituted. I don't mean to say that Ruby is the problem (that'd be
> crazy), just that the TLS error occurs at the same stage in the
> installation every time. NOT randomly.
>
> I've never encountered this error before on any other Guix machine on my
> network. The installation machine was using a wired connection.

I have now tried the v1.4.0 installer and get a failure at (seemingly)
the same point in the install, although the actual error is different.
See installer-dump-22e789d5.

What the heck is going on here? Those two images are wildly different
and are downloading wildly different sets of substitutes.

-- 
Take it easy,
Richard Sent
Making my computer weirder one commit at a time.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#71238: Installer image consistently fails to run system init due to TLS error
  2024-05-28  2:31 ` Richard Sent
@ 2024-05-28  4:44   ` Richard Sent
  2024-05-28  5:36     ` adanskana
  2024-05-29  1:44     ` Richard Sent
  0 siblings, 2 replies; 8+ messages in thread
From: Richard Sent @ 2024-05-28  4:44 UTC (permalink / raw)
  To: 71238; +Cc: adanskana, lars.bilke, ludo, ekaitz

Richard Sent <richard@freakingpenguin.com> writes:

> What the heck is going on here? Those two images are wildly different
> and are downloading wildly different sets of substitutes.

Bad news. I connected my device to a different network with just an
ordinary consumer router and the installation succeeded (using the guix
00384aed media). Ordinary my devices are behind a opnsense router with a
/very/ lightly-customized firewall. To me, this means there are three
possibilities, none of which is particularly comforting:

1. There was a transient network issue for ~3 hours when I attempted to
install Guix ~4 times using different installation media that caused a
specific TLS handshake to fail.

2. A specific TLS handshake Guix undertakes during the installation
process fails to pass one of the built-in firewall rules shipped with
opnsense.

3. Some other odd aspect of my network messes things up for a specific
TLS handshake.

My money is on 2 given how this is a seemingly common issue on
enterprise networks [1] and the rules I have added seem irrelevant. (I'd
rather not talk openly about my firewall rules in an archived public
forum, but can discuss off-list). However, there is another comment in
that thread that says IT didn't notice any firewall blocking.

>> Sometimes, usually when I'm on an enterprise network like my
>> university's of library's wifi, the `guix substitute` process dies
>> with a "TLS error in procedure 'write_to_session_record_port': Error
>> in the push function" error message. My connection is rock-solid
>> otherwise, and sometimes it doesn't happen at all.
> I get the same error on guix pull almost always when I am on my
> enterprise network. Re-running guix pull a second time also almost
> always then runs fine. I checked with our IT: nothing suspicious on
> the network, i.e. no firewall blocking.

Running Guix pull to work around the problem is great...... unless
you're trying to install Guix via the guided installer! :) In my case it
also wasn't guix pull that was failing.

I want to emphasize that the error occured in the same phase of the
installer every time, it was not the first handshake, no other machine
has ever had this issue, and the installer was (3/4 times) on a commit
that should include the fix described in [1].

I'm happy to assist with debugging this, although I'm not some TLS
networking genius so trying to solve it outright is probably beyond me.
I'd also LOVE to hear if other people using a largely stock opnsense or
other firewall software encountered this issue, particularly with the
installation media.

At some point I'll attempt to gradually "de-enterprise" parts of my
network and see exactly when (if ever) the problem is resolved. Due to
the nature of the problem, reliably reproducing it in the future will be
a challenge.

CC'ing people involved in [1] because this is just so weird and I don't
want it to be consigned to the dustbins of history. I don't think we
heard anyone with the issue explicitly say the fix resolved or at least
mitigated the problem.

[1]: https://lists.gnu.org/archive/html/guix-devel/2024-03/msg00150.html

-- 
Take it easy,
Richard Sent
Making my computer weirder one commit at a time.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#71238: Installer image consistently fails to run system init due to TLS error
  2024-05-28  4:44   ` Richard Sent
@ 2024-05-28  5:36     ` adanskana
  2024-05-29  1:44     ` Richard Sent
  1 sibling, 0 replies; 8+ messages in thread
From: adanskana @ 2024-05-28  5:36 UTC (permalink / raw)
  To: Richard Sent, 71238, ekaitz, lars.bilke, ludo

Hi Richard

On 5/28/24 4:44 AM, Richard Sent <richard@freakingpenguin.com> wrote:
> Richard Sent <richard@freakingpenguin.com> writes:
> 
> > What the heck is going on here? Those two images are wildly different
> > and are downloading wildly different sets of substitutes.
> 
> Bad news. I connected my device to a different network with just an
> ordinary consumer router and the installation succeeded (using the guix
> 00384aed media). Ordinary my devices are behind a opnsense router with a
> /very/ lightly-customized firewall. To me, this means there are three
> possibilities, none of which is particularly comforting:
> 
> 1. There was a transient network issue for ~3 hours when I attempted to
> install Guix ~4 times using different installation media that caused a
> specific TLS handshake to fail.
> 
> 2. A specific TLS handshake Guix undertakes during the installation
> process fails to pass one of the built-in firewall rules shipped with
> opnsense.
> 
> 3. Some other odd aspect of my network messes things up for a specific
> TLS handshake.
> 
> My money is on 2 given how this is a seemingly common issue on
> enterprise networks [1] and the rules I have added seem irrelevant. (I'd
> rather not talk openly about my firewall rules in an archived public
> forum, but can discuss off-list). However, there is another comment in
> that thread that says IT didn't notice any firewall blocking.
> 
> >> Sometimes, usually when I'm on an enterprise network like my
> >> university's of library's wifi, the `guix substitute` process dies
> >> with a "TLS error in procedure 'write_to_session_record_port': Error
> >> in the push function" error message. My connection is rock-solid
> >> otherwise, and sometimes it doesn't happen at all.
I was actually going to reopen this issue, as I'm still encountering this bug in the exact same scenarios. Nothing has changed at all. 
> > I get the same error on guix pull almost always when I am on my
> > enterprise network. Re-running guix pull a second time also almost
> > always then runs fine. I checked with our IT: nothing suspicious on
> > the network, i.e. no firewall blocking.
> 
> Running Guix pull to work around the problem is great...... unless
> you're trying to install Guix via the guided installer! :) In my case it
> also wasn't guix pull that was failing.
> 
> I want to emphasize that the error occured in the same phase of the
> installer every time, it was not the first handshake, no other machine
> has ever had this issue, and the installer was (3/4 times) on a commit
> that should include the fix described in [1].
> 
> I'm happy to assist with debugging this, although I'm not some TLS
> networking genius so trying to solve it outright is probably beyond me.
> I'd also LOVE to hear if other people using a largely stock opnsense or
> other firewall software encountered this issue, particularly with the
> installation media.
Same, I'm happy to assist. The test that Ludo' provided to try and reproduce the bug doesn't work as referenced in previous emails. Is there some way I can attatch a debugger to a guile process running `guix upgrade` or something like that? 
> 
> At some point I'll attempt to gradually "de-enterprise" parts of my
> network and see exactly when (if ever) the problem is resolved. Due to
> the nature of the problem, reliably reproducing it in the future will be
> a challenge.
> 
> CC'ing people involved in [1] because this is just so weird and I don't
> want it to be consigned to the dustbins of history. I don't think we
> heard anyone with the issue explicitly say the fix resolved or at least
> mitigated the problem.
Thanks for CC'ing me. Yes, the problem was never resolved. For someone just upgrading their system, it's annoying, but can be mitigated pretty easily. For someone trying to install Guix, on the other hand, this is a intensely annoying problem. After my exams are finished in a couple weeks I want to try and fix this problem and also upgrade GRUB to fix issues with it recognising ext4 partitons with certain features enabled properly. 
> 
> [1]: https://lists.gnu.org/archive/html/guix-devel/2024-03/msg00150.html
> 
> 
Anyway, please let me know how I can help. If someone could help me attaching some sort of debugger, I can reproduce the error fairly easily on my uni's wifi if I do a `guix gc -d 2w && guix upgrade && sudo guix system reconfigure config.scm`. The sheer number of substitutes downloaded seems to be enough for it to happen at least once.

Warmly, 
Ada




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#71238: Installer image consistently fails to run system init due to TLS error
  2024-05-28  4:44   ` Richard Sent
  2024-05-28  5:36     ` adanskana
@ 2024-05-29  1:44     ` Richard Sent
  2024-06-10  5:33       ` adanskana
  1 sibling, 1 reply; 8+ messages in thread
From: Richard Sent @ 2024-05-29  1:44 UTC (permalink / raw)
  To: 71238; +Cc: adanskana, lars.bilke, ludo, ekaitz

Richard Sent <richard@freakingpenguin.com> writes:

> 1. There was a transient network issue for ~3 hours when I attempted to
> install Guix ~4 times using different installation media that caused a
> specific TLS handshake to fail.
>
> 2. A specific TLS handshake Guix undertakes during the installation
> process fails to pass one of the built-in firewall rules shipped with
> opnsense.
>
> 3. Some other odd aspect of my network messes things up for a specific
> TLS handshake.
>
> My money is on 2 given how this is a seemingly common issue on
> enterprise networks [1] and the rules I have added seem irrelevant. (I'd
> rather not talk openly about my firewall rules in an archived public
> forum, but can discuss off-list). However, there is another comment in
> that thread that says IT didn't notice any firewall blocking.

I ran the 1.4.0 installer again today behind my opnsense router and it
completed successfully, which is horrifying. I was hoping starting from
a constant image would make the error reproducible but that doesn't seem
to be the case. Even with a consistent system image and network, it's
only reproducible for somewhere between a few hours and one day. Perhaps
server load plays a part?

(Technically my process was a little bit different. Instead of fully
completing the graphical installer I swapped to a TTY after activating
the wired connection, mounted the root fs, and run $ guix system build
/mnt/etc/config.scm, where config.scm was unmodified since initial
installation. I'd be stunned if this caused the change in behavior but
figured I'd mention for completeness.)

-- 
Take it easy,
Richard Sent
Making my computer weirder one commit at a time.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#71238: Installer image consistently fails to run system init due to TLS error
  2024-05-29  1:44     ` Richard Sent
@ 2024-06-10  5:33       ` adanskana
  0 siblings, 0 replies; 8+ messages in thread
From: adanskana @ 2024-06-10  5:33 UTC (permalink / raw)
  To: Richard Sent, 71238, lars.bilke, ludo, ekaitz

Hi all, 

On 29/05/2024 01:44, Richard Sent <richard@freakingpenguin.com> wrote:
> Richard Sent <richard@freakingpenguin.com> writes:
> 
> > 1. There was a transient network issue for ~3 hours when I attempted to
> > install Guix ~4 times using different installation media that caused a
> > specific TLS handshake to fail.
> >
> > 2. A specific TLS handshake Guix undertakes during the installation
> > process fails to pass one of the built-in firewall rules shipped with
> > opnsense.
> >
> > 3. Some other odd aspect of my network messes things up for a specific
> > TLS handshake.
> >
> > My money is on 2 given how this is a seemingly common issue on
> > enterprise networks [1] and the rules I have added seem irrelevant. (I'd
> > rather not talk openly about my firewall rules in an archived public
> > forum, but can discuss off-list). However, there is another comment in
> > that thread that says IT didn't notice any firewall blocking.
> 
> I ran the 1.4.0 installer again today behind my opnsense router and it
> completed successfully, which is horrifying. I was hoping starting from
> a constant image would make the error reproducible but that doesn't seem
> to be the case. Even with a consistent system image and network, it's
> only reproducible for somewhere between a few hours and one day. Perhaps
> server load plays a part?
> 
> (Technically my process was a little bit different. Instead of fully
> completing the graphical installer I swapped to a TTY after activating
> the wired connection, mounted the root fs, and run $ guix system build
> /mnt/etc/config.scm, where config.scm was unmodified since initial
> installation. I'd be stunned if this caused the change in behavior but
> figured I'd mention for completeness.)
> 
> 

I've mananged to reproduce this bug. First, I run `sudo guix gc delete-generations && guix gc -d 2w` to clear my store. Then I run `guix upgrade && sudo guix system -L /home/ada/dotfiles/guix/ reconfigure --fallback /home/ada/dotfiles/guix/ada/system/kissakoira.scm` to redownload all of those deleted store items. The process 9/10 will fail halfway through the upgrade process. Then, a retry will work without a hitch. Even re-gc-ing my system will not let me reproduce the bug - I need to restart my system. Then, the likelyhood it works is 7/10 until the next day (just my perception). By the way, this is on my university's network.

I managed to capture the problem happening under strace using this command `strace -ff -tt -o log_up.strace -s 500 guix upgrade && sudo strace -ff -tt -o log_sr.strace -s 500 sudo guix system -L /home/ada/dotfiles/guix/ reconfigure --fallback /home/ada/dotfiles/guix/ada/system/kissakoira.scm`. I've uploaded the logs to my Google Drive[1]. You can use `strace-log-merge log_up.strace` to view to merged logs. 

As I can reproduce this error fairly consistently now, please let me know if you want me to run any more tools to capture more data.

Warmly,
Ada

[1] https://drive.google.com/file/d/104DVqyMLGRi4imWzvFQ6TahAiRRKdR4_/view?usp=drive_link




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#71238: Installer image consistently fails to run system init due to TLS error
  2024-05-28  0:56 bug#71238: Installer image consistently fails to run system init due to TLS error Richard Sent
  2024-05-28  2:31 ` Richard Sent
@ 2024-08-13 11:38 ` Jonathan Brielmaier via Bug reports for GNU Guix
  2024-09-04 22:01 ` Roman Riabenko via Bug reports for GNU Guix
  2 siblings, 0 replies; 8+ messages in thread
From: Jonathan Brielmaier via Bug reports for GNU Guix @ 2024-08-13 11:38 UTC (permalink / raw)
  To: 71238; +Cc: Christopher Baines

[CC'ing Chris as bordeaux expert]

I'm affected by this issue as well. On both my desktop machines with
Guix System installed for quite a while (so not in the installer).

It also happens at different ISPs:
* Telefonica O2 DSL, Germany
* NetCom BW (ASN 47297)

On one machine I removed the bordeaux server from the substitutes
servers as this is super annoying. On the other I'm in the proccess of
removing it...

~Jonathan




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#71238: Installer image consistently fails to run system init due to TLS error
  2024-05-28  0:56 bug#71238: Installer image consistently fails to run system init due to TLS error Richard Sent
  2024-05-28  2:31 ` Richard Sent
  2024-08-13 11:38 ` Jonathan Brielmaier via Bug reports for GNU Guix
@ 2024-09-04 22:01 ` Roman Riabenko via Bug reports for GNU Guix
  2 siblings, 0 replies; 8+ messages in thread
From: Roman Riabenko via Bug reports for GNU Guix @ 2024-09-04 22:01 UTC (permalink / raw)
  To: 71238

[-- Attachment #1: Type: text/plain, Size: 1645 bytes --]

Hello

I had a similar issue with a 6to4 tunnel but I guess that the same
cause may be relevant to a VPN or some other type of connection that
affects the MTU.

The pull operation would fail sporadically, usually on the same package
— most of the time that would be a relatively large package like the
kernel but not necessarily. The failure would often be a TLS failure
but I found it to be a common failure when there is a connectivity
issue because it is the TLS that tries to establish or restore the
connection.

I admit that I had similar issues with connecting to just a few
seemingly random websites over the tunnel, but the download of
substitutes from the bordeaux server seemed to be the most affected.
The tunnel customer support suggested that it is an MTU issue. I tried
a lot of different MTU values. Some of the values I tried fixed all my
connection issues except for the pull operation until I found how to
set up the tunnel in a way that automatically discovers and changes the
MTU dynamically.

I thought it was too specific to my router, so I posted my findings
elsewhere: "How can I set up 6to4 on MikroTik and configure MTU to fix
connection failures and update guix?"
https://superuser.com/q/1808662/1203531

I recently followed a link to this bug report from a help mailing list
thread:
"guix pull/guix upgrade often fails over VPN with TLS error message"
https://lists.gnu.org/archive/html/help-guix/2024-09/msg00014.html
I found the descriptions of the issue familiar, so I am sharing an idea
that it might be related to the MTU of the connection to the substitute
server.

Roman

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-09-04 22:02 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-28  0:56 bug#71238: Installer image consistently fails to run system init due to TLS error Richard Sent
2024-05-28  2:31 ` Richard Sent
2024-05-28  4:44   ` Richard Sent
2024-05-28  5:36     ` adanskana
2024-05-29  1:44     ` Richard Sent
2024-06-10  5:33       ` adanskana
2024-08-13 11:38 ` Jonathan Brielmaier via Bug reports for GNU Guix
2024-09-04 22:01 ` Roman Riabenko via Bug reports for GNU Guix

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).