From: Chris Marusich
Subject: Using GDB to debug Guix-installed software (e.g. virsh)
Date: Mon, 16 Jul 2018 23:35:53 -0700
Message-ID: <87in5ez7w6.fsf@gmail.com>
To: help-guix@gnu.org

Hi,

I sometimes want to debug Guix-installed software using GDB.
Unfortunately, I've only been successful with trivial programs like GNU
Hello.  All of my attempts to debug actual problems have failed because
I can't seem to get GDB to behave.  It's a bit frustrating to bang my
head on stuff like this by myself, so I'm hoping somebody with more
experience can offer some advice.

Let's start with what might be a real bug.
I've noticed that Guix's virsh command (from the libvirt package) emits
a suspicious error when you try to list devices:

--8<---------------cut here---------------start------------->8---
$ virsh nodedev-list
error: Failed to count node devices
error: this function is not supported by the connection driver: virNodeNumOfDevices
--8<---------------cut here---------------end--------------->8---

Apparently, because this function is "not supported", it is also not
possible to use virt-manager to assign PCI devices to a libvirt domain.
That's what I was trying to do when I stumbled across this issue.

Anyway, this virsh problem occurs even when I invoke the command as
root, so it probably isn't a permissions issue.  I searched the Internet
for errors like this, but I didn't find anything helpful.  Every guide
I've read so far seems to suggest that this invocation should just work.
But it doesn't.  Why?

If you want to try reproducing this issue on GuixSD, make sure you have
a libvirt-service-type service and a virtlog-service-type service in
your operating system configuration declaration:

--8<---------------cut here---------------start------------->8---
(service libvirt-service-type
         (libvirt-configuration
          (unix-sock-group "libvirt")))
(service virtlog-service-type)
--8<---------------cut here---------------end--------------->8---

For good measure, make sure your user is in the "libvirt" group, too:

--8<---------------cut here---------------start------------->8---
(user-account
 (name "marusich")
 (comment "Chris Marusich")
 (group "users")
 (supplementary-groups '("wheel" "netdev" "video" "libvirt"))
 (home-directory "/home/marusich"))
--8<---------------cut here---------------end--------------->8---

Reconfigure and restart if necessary.
Then run virsh:

--8<---------------cut here---------------start------------->8---
$ virsh nodedev-list
error: Failed to count node devices
error: this function is not supported by the connection driver: virNodeNumOfDevices
--8<---------------cut here---------------end--------------->8---

At this point, there are two possibilities: either everything is fine,
and this error is expected, or something is wrong.  If somebody knows
that this is expected, I'd love to hear about it.  However, let's
operate on the assumption that something is wrong.  How might we debug
it?

One way to debug it is to use GDB to investigate precisely why this
failure occurred.  There are probably other ways to debug the issue, but
I want to focus on using GDB because this email is more about the
problems I've had with GDB than the virsh issue.

To begin, I create a directory where I'll do my debugging:

--8<---------------cut here---------------start------------->8---
$ mkdir ~/debug
$ cd ~/debug
--8<---------------cut here---------------end--------------->8---

Let's get the virsh source so we can get GDB to tell us where we are in
the code as we debug it:

--8<---------------cut here---------------start------------->8---
$ tar -xf $(guix build -S libvirt)
--8<---------------cut here---------------end--------------->8---

For me, this unpacks the source to:

  /home/marusich/debug/libvirt-4.3.0

Note that the function virNodeNumOfDevices is defined in
/home/marusich/debug/libvirt-4.3.0/src/libvirt-nodedev.c and called on
line 254 of /home/marusich/debug/libvirt-4.3.0/tools/virsh-nodedev.c in
the virshNodeDeviceListCollect function.  I'd like to debug the code for
virNodeNumOfDevices using GDB to see what's going on.

To do this, I'm going to need the debug symbols, but the libvirt package
doesn't have a debug output.  Let's define a version of it that does.
I put the following package definition into the file
/home/marusich/debug/my-libvirt.scm:

--8<---------------cut here---------------start------------->8---
(define-module (my-libvirt)
  #:use-module (guix packages)
  #:use-module (gnu packages virtualization))

(define-public my-libvirt
  (package
    (inherit libvirt)
    (name "my-libvirt")
    (outputs '("out" "debug"))))
--8<---------------cut here---------------end--------------->8---

Let's build it and install both outputs into a new profile:

--8<---------------cut here---------------start------------->8---
$ GUIX_PACKAGE_PATH=/home/marusich/debug guix package -p /home/marusich/debug/profile -i my-libvirt my-libvirt:debug
--8<---------------cut here---------------end--------------->8---

Let's make sure the new virsh still reports the same error:

--8<---------------cut here---------------start------------->8---
$ /home/marusich/debug/profile/bin/virsh nodedev-list
error: Failed to count node devices
error: this function is not supported by the connection driver: virNodeNumOfDevices
--8<---------------cut here---------------end--------------->8---

Great!  Let's debug it with GDB.  First, make sure your ~/.gdbinit
doesn't exist, otherwise your results might be different from mine.
Then let's start GDB:

--8<---------------cut here---------------start------------->8---
$ gdb
--8<---------------cut here---------------end--------------->8---

Tell it where the debug files live:

--8<---------------cut here---------------start------------->8---
(gdb) set debug-file-directory /home/marusich/debug/profile/lib/debug
--8<---------------cut here---------------end--------------->8---

Tell it where the source lives:

--8<---------------cut here---------------start------------->8---
(gdb) directory /home/marusich/debug/libvirt-4.3.0/src
Source directories searched: /home/marusich/debug/libvirt-4.3.0/src:$cdir:$cwd
(gdb) directory /home/marusich/debug/libvirt-4.3.0/tools
Source directories searched: /home/marusich/debug/libvirt-4.3.0/tools:/home/marusich/debug/libvirt-4.3.0/src:$cdir:$cwd
--8<---------------cut here---------------end--------------->8---

Tell it to use the file and read the symbols:

--8<---------------cut here---------------start------------->8---
(gdb) file /home/marusich/debug/profile/bin/virsh
Reading symbols from /home/marusich/debug/profile/bin/virsh...Reading symbols from /home/marusich/debug/profile/lib/debug//gnu/store/mx3rmbpg6lhl0yxl9djbx49nfps9lwqi-my-libvirt-4.3.0/bin/virsh.debug...done.
done.
--8<---------------cut here---------------end--------------->8---

Set the program's arguments:

--8<---------------cut here---------------start------------->8---
(gdb) set args nodedev-list
--8<---------------cut here---------------end--------------->8---

Set a breakpoint on the function virNodeNumOfDevices:

--8<---------------cut here---------------start------------->8---
(gdb) break virNodeNumOfDevices
Breakpoint 1 at 0x28610
--8<---------------cut here---------------end--------------->8---

Uh oh.  This is our first sign of a problem: The breakpoint is
associated with some sort of memory address, rather than a location in a
file.
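As an aside, the setup commands so far could be captured in a GDB
command file, so that anyone trying to reproduce this starts from the
same state.  The following is only a sketch: the function name
write_gdb_session and the file name session.gdb are made up for
illustration, and the paths are the ones from my machine.  GDB's -nx
option skips any .gdbinit file, and -x reads commands from a file.

```shell
# Sketch: write the GDB setup commands from this message to a file.
# write_gdb_session is a hypothetical helper name, not part of GDB or Guix.
write_gdb_session() {
    # $1 = path of the command file to create
    cat > "$1" <<'EOF'
set debug-file-directory /home/marusich/debug/profile/lib/debug
directory /home/marusich/debug/libvirt-4.3.0/src
directory /home/marusich/debug/libvirt-4.3.0/tools
file /home/marusich/debug/profile/bin/virsh
set args nodedev-list
break virNodeNumOfDevices
EOF
}

# Usage (run by hand; session.gdb is an arbitrary file name):
#   write_gdb_session session.gdb
#   gdb -nx -x session.gdb
```

GDB stays interactive after reading the file, so the remaining steps can
still be typed at the prompt.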
Anyway, let's run the program:

--8<---------------cut here---------------start------------->8---
(gdb) run
Starting program: /gnu/store/mx3rmbpg6lhl0yxl9djbx49nfps9lwqi-my-libvirt-4.3.0/bin/virsh nodedev-list
warning: the debug information found in "/home/marusich/debug/profile/lib/debug//gnu/store/mx3rmbpg6lhl0yxl9djbx49nfps9lwqi-my-libvirt-4.3.0/lib/libvirt.so.0.4003.0.debug" does not match "/gnu/store/mx3rmbpg6lhl0yxl9djbx49nfps9lwqi-my-libvirt-4.3.0/lib/libvirt.so.0" (CRC mismatch).

warning: the debug information found in "/home/marusich/debug/profile/lib/debug//gnu/store/mx3rmbpg6lhl0yxl9djbx49nfps9lwqi-my-libvirt-4.3.0/lib/libvirt.so.0.4003.0.debug" does not match "/gnu/store/mx3rmbpg6lhl0yxl9djbx49nfps9lwqi-my-libvirt-4.3.0/lib/libvirt.so.0" (CRC mismatch).

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/gnu/store/l4lr0f5cjd0nbsaaf8b5dmcw1a1yypr3-glibc-2.27/lib/libthread_db.so.1".
[New Thread 0x7ffff2219700 (LWP 16097)]

Thread 1 "virsh" hit Breakpoint 1, 0x00007ffff768cdc0 in virNodeNumOfDevices ()
   from /gnu/store/mx3rmbpg6lhl0yxl9djbx49nfps9lwqi-my-libvirt-4.3.0/lib/libvirt.so.0
--8<---------------cut here---------------end--------------->8---

We hit the breakpoint - great!  However, it seems GDB did not load the
debug information for libvirt because of a CRC mismatch.
Indeed, the backtrace seems to suggest that GDB knows about some of the
source files, but not all of them:

--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  0x00007ffff768cdc0 in virNodeNumOfDevices ()
   from /gnu/store/mx3rmbpg6lhl0yxl9djbx49nfps9lwqi-my-libvirt-4.3.0/lib/libvirt.so.0
#1  0x00005555555a816e in virshNodeDeviceListCollect (flags=0, 
    ncapnames=, capnames=0x0, ctl=0x7fffffffb460)
    at virsh-nodedev.c:254
#2  cmdNodeListDevices (ctl=0x7fffffffb460, cmd=)
    at virsh-nodedev.c:472
#3  0x00005555555b8911 in vshCommandRun (ctl=0x7fffffffb460, 
    cmd=0x55555583d850) at vsh.c:1318
#4  0x000055555557ea65 in main (argc=2, argv=0x7fffffffb7f8) at virsh.c:932
--8<---------------cut here---------------end--------------->8---

I wanted to see what was happening in the virNodeNumOfDevices function,
which came from libvirt.so.0.  Unfortunately, that's the library with
the CRC mismatch.  This means I'm totally blocked from investigating any
further using GDB.  I could set step-mode to "on" to step through the
machine code without debug symbols, but as they say: "That is an
exercise left to the reader."

I have seen this CRC mismatch problem twice now when trying to debug
issues with Guix-installed software.  The other time was while
attempting to debug a segfault in vinagre:

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=30591

What is wrong?  Am I using GDB wrong?  Is there a bug in the part of the
gnu-build-system that creates the debug files which might be causing the
CRC mismatch?  I'm aware that the gnu-build-system relies on the
.gnu_debuglink mechanism (see "(gdb) Separate Debug Files"), but to be
honest I haven't done a lot of GDB debugging, so part of me wonders if
this is just a case of "user error".  If so, please help me understand
what I'm doing wrong.
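One check that might help narrow this down is to compute the CRC of the
separate debug file by hand and compare it with the value recorded in
the library's .gnu_debuglink section.  What follows is a sketch, not
something I have verified against this build.  It assumes, based on my
reading of the GDB manual, that the debug link checksum is the same
standard CRC-32 that gzip stores in its trailer.  The helper name
file_crc32 is made up.

```shell
# Sketch: print the CRC-32 of a file as 8 hex digits.
# Assumption: the .gnu_debuglink checksum is the standard CRC-32, which
# gzip records (little-endian) in the first 4 of its last 8 bytes.
file_crc32() {
    # $1 = file to checksum
    gzip -c "$1" | tail -c 8 | od -An -tx1 -N4 |
        awk '{ print $4 $3 $2 $1 }'   # reverse the 4 bytes to read as a number
}

# Usage (run by hand; substitute the real store paths):
#   file_crc32 path/to/libvirt.so.0.4003.0.debug
#   readelf -x .gnu_debuglink path/to/libvirt.so.0
```

The last four bytes in the readelf hex dump should be the stored CRC in
the file's own byte order, so on x86_64 they would appear reversed
relative to what the helper prints.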
Thank you,

-- 
Chris