unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
* bug#36402: installation error
@ 2019-06-27  1:47 Juan
  2019-06-27 21:08 ` Ludovic Courtès
  0 siblings, 1 reply; 13+ messages in thread
From: Juan @ 2019-06-27  1:47 UTC (permalink / raw)
  To: 36402

[-- Attachment #1: Type: text/plain, Size: 3011 bytes --]

Hi!

I ran into some trouble while attempting to install Guix SD (1.0.1.x86_64). It happens when I try to do the guided graphical installation, I'll transcript the whole text here:

"
The installer has encountered an unexpected problem. The backtrace is displayed below. Please report it by email to <bug-guix@gnu.org>.

In ice-9/boot-9.scm:
    829:9 19 (catch srfi-34 #<procedure 2636000 at ./gnu/installer/steps.scm:144:7 ()> #<procedure 25db4b0 at ./gnu/installer/steps.scm:144:7 (keyc)> _)
    829:9 18 (catch srfi-34 #<procedure 2695e00 at ./gnu/installer/steps.scm:144:7 ()> #<procedure 25db460 at ./gnu/installer/steps.scm:144:7 (keyc)> _)
    829:9 17 (catch srfi-34 #<procedure 2695c00 at ./gnu/installer/steps.scm:144:7 ()> #<procedure 25db410 at ./gnu/installer/steps.scm:144:7 (keyc)> _)
    829:9 16 (catch srfi-34 #<procedure 1079dc0 at ./gnu/installer/steps.scm:144:7 ()> #<procedure 107df00 at ./gnu/installer/steps.scm:144:7 (keyc)> _)
In ./gnu/installer/steps.scm:
    182:21 15(_)
In ./gnu/installer/newt/partition.scm:
    755:33 14 (run-partitioning-page)
In ./gnu/installer/parted.scm:
   1010:14 13 (auto-partition! #<<disk> bytestructure: #<bytestructure 0x106d840>> #:scheme _)
    870:21 12 (loop _ _ _)
    863:17 11 (loop _ 2617712816 1289318400)
    771:25 10 (mkpart #<<disk> bytestructure: #<bytestructure 0x106d840>> _ #:previous-partition _)
In parted/structs.scm:
    552:19 9 (pointer->partition _)
     132:3 8 (pointer->bytestructure #<pointer 0x0> #<bytestructure-descriptor 0x29f4740>)
In unknown file:
    7 (pointer->bytevector #<pointer 0x0> 88 #<undefined> #<undefined>)
In ice-9/boot.scm:
    751:25 6 (dispatch-exception 5 null-pointer-error ("pointer->bytevector" "null pointer dereference" () ()))
In ice-9/eval.scm:
    619:8 5 (_ #(#(#<directory (guile-user) b67140> #<<installer> name: newt init: #<procedure init ()> exit: #procedure exit ()> exit-error:
#<procedure exit-error (file key args> final-p...>) ...))

    619:8 4 (_ #(#(#(#<directory (guile-user) b67140> #<<installer> name: newt init: #<procedure init ()> exit: #procedure exit ()> exit-error:
#<procedure exit-error (file key args> fi...>) ...) #))
In ice-9/ports.scm:
    462:17 3 (call-with-output-file _ _ #:binary _ #:encoding _)
In ice-9/eval.scm:
    619:8 2 (_ #(#(#<directory (guile-user) b67140> null-pointer-error ("pointer->bytevector" "null pointer dereference" () ())) #<output: /tmp/last-installer-error 12>))
    159:9 1 (_ #(#(#<directory (guile-user) b67140> null-pointer-error ("pointer->bytevector" "null pointer dereference" () ())) #<output: /tmp/last-installer-error 12>))
In unknown file:
    0 (make-stack #t)
ice-9/eval.scm:159:9: In procedure pointer->bytevector: null pointer dereference
"

Here are the specifics of my computer:
ASUS MAXIMUS VI IMPACT ACPI BIOS Revision 1301
CPU: Intel Core 15-4440 CPU @ 3.19GHz

I'm not sure if it's a hardware compatibility problem, a bug in the guided graphical installation, or something else.

Thanks in advance, kind regards.

[-- Attachment #2: Type: text/html, Size: 4292 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#36402: installation error
  2019-06-27  1:47 bug#36402: installation error Juan
@ 2019-06-27 21:08 ` Ludovic Courtès
  2019-06-29 15:47   ` Mathieu Othacehe
  2019-08-31 18:43   ` Mathieu Othacehe
  0 siblings, 2 replies; 13+ messages in thread
From: Ludovic Courtès @ 2019-06-27 21:08 UTC (permalink / raw)
  To: Juan; +Cc: Mathieu Othacehe, 36402

Hi Juan,

Juan <r5jm@protonmail.com> skribis:

> I ran into some trouble while attempting to install Guix SD (1.0.1.x86_64). It happens when I try to do the guided graphical installation, I'll transcript the whole text here:

[...]

>     755:33 14 (run-partitioning-page)
> In ./gnu/installer/parted.scm:
>    1010:14 13 (auto-partition! #<<disk> bytestructure: #<bytestructure 0x106d840>> #:scheme _)
>     870:21 12 (loop _ _ _)
>     863:17 11 (loop _ 2617712816 1289318400)
>     771:25 10 (mkpart #<<disk> bytestructure: #<bytestructure 0x106d840>> _ #:previous-partition _)
> In parted/structs.scm:
>     552:19 9 (pointer->partition _)
>      132:3 8 (pointer->bytestructure #<pointer 0x0> #<bytestructure-descriptor 0x29f4740>)
> In unknown file:
>     7 (pointer->bytevector #<pointer 0x0> 88 #<undefined> #<undefined>)
> In ice-9/boot.scm:
>     751:25 6 (dispatch-exception 5 null-pointer-error ("pointer->bytevector" "null pointer dereference" () ()))

That looks like what was reported at
<https://issues.guix.gnu.org/issue/35858>, so I’ve merged both.  Thanks
for the report, Juan!

Mathieu, in the same spirit as
<https://issues.guix.gnu.org/issue/35783>, I think we have an object
life cycle and memory management issue.

I hadn’t noticed but we’re doing manual memory management by calling
things like ‘disk-destroy’ in the installer.  That’s crash-prone and
best avoided.

The usual way to handle it in bindings is by:

  1. Adding pointer finalizers.  So for example the pointer object
     associated with a <disk> record would have a finalizer that calls
     ‘ped_disk_destroy’.

  2. Having a weak-key hash table to track object dependencies when
     needed.  So, if a <device> aggregates a <disk>, there must be an
     entry in the hash table that maps the <device> to the <disk>.  That
     way, we ensure that the <disk> object remains live as long as the
     <device> is live.

We can expose “close” functions that free OS resources such as file
descriptors, but we should not expose deallocation functions like
‘ped_disk_destroy’; instead, we let the GC call them when the objects
become unreachable.

Does that make sense?

I think we should audit and adjust Guile-Parted in that spirit.  WDYT?

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#36402: installation error
  2019-06-27 21:08 ` Ludovic Courtès
@ 2019-06-29 15:47   ` Mathieu Othacehe
  2019-07-02 13:25     ` Ludovic Courtès
  2019-08-31 18:43   ` Mathieu Othacehe
  1 sibling, 1 reply; 13+ messages in thread
From: Mathieu Othacehe @ 2019-06-29 15:47 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 36402, Juan


Hey Ludo,

> Does that make sense?
>
> I think we should audit and adjust Guile-Parted in that spirit.  WDYT?

Yes, it seems like the right thing to do. I'll try to apply those
changes to Guile-Parted next week. However, as we cannot reproduce those
null-pointer issues, we won't be sure if we fixed them for sure.

In the meantime, and completely unrelated, I created a new
wip-cross-system branch to fix most of the cross-compilation issues
preventing from cross-building a Guix System.

Thanks,

Mathieu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#36402: installation error
  2019-06-29 15:47   ` Mathieu Othacehe
@ 2019-07-02 13:25     ` Ludovic Courtès
  0 siblings, 0 replies; 13+ messages in thread
From: Ludovic Courtès @ 2019-07-02 13:25 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 36402, Juan

Hi,

Mathieu Othacehe <m.othacehe@gmail.com> skribis:

>> Does that make sense?
>>
>> I think we should audit and adjust Guile-Parted in that spirit.  WDYT?
>
> Yes, it seems like the right thing to do. I'll try to apply those
> changes to Guile-Parted next week. However, as we cannot reproduce those
> null-pointer issues, we won't be sure if we fixed them for sure.

Perhaps we can reproduce them by adding a bunch of calls to ‘gc’ in the
code here and there.  That’s often a good way to stress-test memory
management.

> In the meantime, and completely unrelated, I created a new
> wip-cross-system branch to fix most of the cross-compilation issues
> preventing from cross-building a Guix System.

Neat!  Consider opening a new issue for this.  :-)

Thank you,
Ludo’.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#36402: installation error
  2019-06-27 21:08 ` Ludovic Courtès
  2019-06-29 15:47   ` Mathieu Othacehe
@ 2019-08-31 18:43   ` Mathieu Othacehe
  2019-09-01 19:55     ` Ludovic Courtès
  1 sibling, 1 reply; 13+ messages in thread
From: Mathieu Othacehe @ 2019-08-31 18:43 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 36402, Juan


Hey Ludo,

> Does that make sense?
>
> I think we should audit and adjust Guile-Parted in that spirit.  WDYT?

Sorry for the delay!

I followed your advice and hid all destroy related functions behind
pointer finalizers. I also added some unit tests to Guile-Parted.

I pushed everything but feel free to comment! Then, I'll make a new
release and try to adapt the installer to those changes.

Thanks,

Mathieu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#36402: installation error
  2019-08-31 18:43   ` Mathieu Othacehe
@ 2019-09-01 19:55     ` Ludovic Courtès
  2019-09-02  9:50       ` Mathieu Othacehe
  0 siblings, 1 reply; 13+ messages in thread
From: Ludovic Courtès @ 2019-09-01 19:55 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 36402, Juan

Howdy,

Mathieu Othacehe <m.othacehe@gmail.com> skribis:

> I followed your advice and hid all destroy related functions behind
> pointer finalizers. I also added some unit tests to Guile-Parted.

Nice!

I tried “guix build guile-parted --with-branch=guile-parted=master”, but
that fails because ‘build-aux/test-driver.scm’ is missing from the repo,
I think.

It might be useful to add calls to ‘gc’ here and there in the tests to
stress-test memory management.

> I pushed everything but feel free to comment! Then, I'll make a new
> release and try to adapt the installer to those changes.

Awesome, thanks a lot!

Ludo’.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#36402: installation error
  2019-09-01 19:55     ` Ludovic Courtès
@ 2019-09-02  9:50       ` Mathieu Othacehe
  2019-09-03  9:13         ` Ludovic Courtès
  0 siblings, 1 reply; 13+ messages in thread
From: Mathieu Othacehe @ 2019-09-02  9:50 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 36402, Juan


Hey,

I pushed the missing file :).

> It might be useful to add calls to ‘gc’ here and there in the tests to
> stress-test memory management.

Inserting gc calls here:

--8<---------------cut here---------------start------------->8---
(test-assert "partition-remove extended"
  (with-tmp-device
   "device-extended.iso"
   (lambda (new-device)
     (let* ((device (get-device new-device))
            (disk (disk-new device))
            (partitions (disk-partitions disk))
            (extended-partition (find extended-partition? partitions)))
       (gc) ; <-- Try to destroy disk?
       (disk-remove-partition* disk extended-partition)
       (gc)
       (equal? (extended-partition-count disk) 0)))))
--8<---------------cut here---------------end--------------->8---

causes a segfault. Is it legal to call GC here? Do you have any clue on
how to investigate what the GC is doing?

Thanks,

Mathieu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#36402: installation error
  2019-09-02  9:50       ` Mathieu Othacehe
@ 2019-09-03  9:13         ` Ludovic Courtès
  2019-09-04 12:31           ` Mathieu Othacehe
  0 siblings, 1 reply; 13+ messages in thread
From: Ludovic Courtès @ 2019-09-03  9:13 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 36402, Juan

Hello,

Mathieu Othacehe <m.othacehe@gmail.com> skribis:

>> It might be useful to add calls to ‘gc’ here and there in the tests to
>> stress-test memory management.
>
> Inserting gc calls here:
>
> (test-assert "partition-remove extended"
>   (with-tmp-device
>    "device-extended.iso"
>    (lambda (new-device)
>      (let* ((device (get-device new-device))
>             (disk (disk-new device))
>             (partitions (disk-partitions disk))
>             (extended-partition (find extended-partition? partitions)))
>        (gc) ; <-- Try to destroy disk?
>        (disk-remove-partition* disk extended-partition)
>        (gc)
>        (equal? (extended-partition-count disk) 0)))))
>
> causes a segfault. Is it legal to call GC here? Do you have any clue on
> how to investigate what the GC is doing?

GC might run at any time, so yes, it’s valid to insert calls to ‘gc’
anywhere.  So this is good, this is kind of issue we want to catch.  :-)

To investigate, I would recommend re-reading how memory management works
in Parted.  Questions such as:

  1. Can Parted free a C object (disk, partition, etc.) behind your
     back?  Is there a way to prevent it?

  2. When a Parted object aggregates another object, how’s memory
     managed?  For example, if a “disk” aggregates (refers to) a
     “partition”, who’s responsible for freeing that partition?

  3. Relatedly, if, say, a “disk” aggregates a “partition”, do you make
     sure on the Scheme side that you do not free the partition while
     the disk is still alive?

     You can make sure this doesn’t happen by using a weak-key hash
     table, as discussed before, where the key is the disk and the value
     is the list of partitions it aggregates.

If you can get a backtrace from the core dump, that might give clues.

Setting the environment variable:

  export GLIBC_TUNABLES=glibc.malloc.check=1

might tell you if it’s a double-free error or something.

You can also use Valgrind though libgc creates a lot of noise there.

Please share whatever you gather before you get depressed.  ;-)

HTH!

Ludo’.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#36402: installation error
  2019-09-03  9:13         ` Ludovic Courtès
@ 2019-09-04 12:31           ` Mathieu Othacehe
  2019-09-05  8:32             ` Ludovic Courtès
  0 siblings, 1 reply; 13+ messages in thread
From: Mathieu Othacehe @ 2019-09-04 12:31 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 36402, Juan


Hey Ludo,

> Please share whatever you gather before you get depressed.  ;-)

Thanks a lot for your help, it was really useful! I used valgrind a lot
to understand what was happening.

Ok so here's a little summary:

* Using ped_device_get Parted returns new devices. It may return already
existing devices if the given path is already known. Users are
responsible for their destruction calling ped_device_destroy.

* Using ped_disk_new Parted returns new disks from devices. Users are
responsible for their destruction calling ped_disk_destroy.

* A disk contains partitions. A user can remove partitions without
deleting them. It is also possible to delete them calling
ped_partition_destroy. On disk destruction, all the partitions
associated are destroyed.

Here's how memory is managed in Guile-Parted:

* ped_device_destroy is set a finalizer for device pointers.

* ped_disk_destroy is set a finalizer for disk pointers. Device object
associated to disks are recorded in a weak key hash table, to ensure
that the lifetime of a disk is shorted than that of its device.

* No finalizer is set for partition pointers. However, disk associated
to partitions are recorded in a weak key hash table, to ensure that the
lifetime of a partition is shorter that that of its disk. The user can
access ped_disk_remove_partition function from Parted but cannot acces
ped_partition_destroy function. Partition destruction is done by Parted
of disk destruction.

And here is what was going wrong:

ped_device_get and ped_device_get_next can return pointers to already
existing device object. So set-pointer-finalizer! was possibly called
multiple times on the same device pointer, resulting in calling
ped_device_destroy multiple times on the same device pointer.

To prevent that, I created a weak value hash table to make sure that one
<device> object maps to exactly one device pointer, and that the pointer
finalizer is set only once. See commit b35839b.

I also added (gc) calls at various locations in the tests in commit
728fd01.

WDYT?

Thanks,

Mathieu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#36402: installation error
  2019-09-04 12:31           ` Mathieu Othacehe
@ 2019-09-05  8:32             ` Ludovic Courtès
  2019-09-05 13:53               ` Mathieu Othacehe
  0 siblings, 1 reply; 13+ messages in thread
From: Ludovic Courtès @ 2019-09-05  8:32 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 36402, Juan

Hello!

Mathieu Othacehe <m.othacehe@gmail.com> skribis:

> And here is what was going wrong:
>
> ped_device_get and ped_device_get_next can return pointers to already
> existing device object. So set-pointer-finalizer! was possibly called
> multiple times on the same device pointer, resulting in calling
> ped_device_destroy multiple times on the same device pointer.
>
> To prevent that, I created a weak value hash table to make sure that one
> <device> object maps to exactly one device pointer, and that the pointer
> finalizer is set only once. See commit b35839b.

Good catch!

I confirm that:

  guix build guile-parted --with-branch=guile-parted=master --check

passed several times in a row.  :-)

b35839b LGTM!

(‘define-wrapped-pointer-type’ takes care of this, but we can’t use it
while we use bytestructures (info "(guile) Void Pointers and Byte
Access").)

It seems to me that the fix should be not just for ‘pointer->device!’
but for all the ‘pointer->RECORD!’ procedures, where we potentially have
similar scenarios, and where we’d rather have:

  (eq? (pointer->X ptr) (pointer->X ptr))

So perhaps you should define your own ‘define-wrapped-type’ macro that
does ‘define-record-type’ + the weak hash table thing, and replace all
‘define-record-type’ instances in structs.scm with
‘define-wrapped-type’.  How does that sound?

Thank you!

Ludo’.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#36402: installation error
  2019-09-05  8:32             ` Ludovic Courtès
@ 2019-09-05 13:53               ` Mathieu Othacehe
  2019-09-08 19:35                 ` Ludovic Courtès
  0 siblings, 1 reply; 13+ messages in thread
From: Mathieu Othacehe @ 2019-09-05 13:53 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 36402, Juan


Hey,

> So perhaps you should define your own ‘define-wrapped-type’ macro that
> does ‘define-record-type’ + the weak hash table thing, and replace all
> ‘define-record-type’ instances in structs.scm with
> ‘define-wrapped-type’.  How does that sound?

Seems like the right thing to do :) However, I had a look to all Parted
functions which result is passed to a pointer->X! function, and except
ped_device_get, they always return newly allocated objects. So I guess
we are safe for now.

Thanks,

Mathieu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#36402: installation error
  2019-09-05 13:53               ` Mathieu Othacehe
@ 2019-09-08 19:35                 ` Ludovic Courtès
  2020-03-18 17:56                   ` Mathieu Othacehe
  0 siblings, 1 reply; 13+ messages in thread
From: Ludovic Courtès @ 2019-09-08 19:35 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 36402, Juan

Hello,

Mathieu Othacehe <m.othacehe@gmail.com> skribis:

>> So perhaps you should define your own ‘define-wrapped-type’ macro that
>> does ‘define-record-type’ + the weak hash table thing, and replace all
>> ‘define-record-type’ instances in structs.scm with
>> ‘define-wrapped-type’.  How does that sound?
>
> Seems like the right thing to do :) However, I had a look to all Parted
> functions which result is passed to a pointer->X! function, and except
> ped_device_get, they always return newly allocated objects. So I guess
> we are safe for now.

OK, sounds good!

Ludo’.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#36402: installation error
  2019-09-08 19:35                 ` Ludovic Courtès
@ 2020-03-18 17:56                   ` Mathieu Othacehe
  0 siblings, 0 replies; 13+ messages in thread
From: Mathieu Othacehe @ 2020-03-18 17:56 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 36402-done, Juan


Hello,

This has hopefully been resolved by Guile-Parted 0.0.2 update, so
closing!

Thanks,

Mathieu

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-03-18 17:58 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-27  1:47 bug#36402: installation error Juan
2019-06-27 21:08 ` Ludovic Courtès
2019-06-29 15:47   ` Mathieu Othacehe
2019-07-02 13:25     ` Ludovic Courtès
2019-08-31 18:43   ` Mathieu Othacehe
2019-09-01 19:55     ` Ludovic Courtès
2019-09-02  9:50       ` Mathieu Othacehe
2019-09-03  9:13         ` Ludovic Courtès
2019-09-04 12:31           ` Mathieu Othacehe
2019-09-05  8:32             ` Ludovic Courtès
2019-09-05 13:53               ` Mathieu Othacehe
2019-09-08 19:35                 ` Ludovic Courtès
2020-03-18 17:56                   ` Mathieu Othacehe

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).