From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: The emacs_backtrace "feature"
Date: Fri, 21 Sep 2012 12:49:17 +0300
Message-ID: <83lig3yaci.fsf@gnu.org>
To: emacs-devel@gnu.org

Based on my experience, I expect this "feature" to be hated, by users and Emacs maintainers alike.

My experience is based on years of working with the DJGPP development environment. DJGPP (www.delorie.com/djgpp/) is a Posix-compliant development environment, based on ported GNU tools and an independently written standard C library, for developing 32-bit protected-mode programs that run on MS-DOS and compatible systems. In particular, the MS-DOS build of Emacs uses DJGPP.

In DJGPP, displaying the backtrace on fatal errors is the default, because core files are not supported. So, when a DJGPP-compiled program crashes, it displays a register dump and a backtrace.
Here's a typical example (I deliberately truncated the backtrace at the end, which was much longer in reality):

  Exiting due to signal SIGABRT
  Raised at eip=0012f2a6
  eax=002ee7fc ebx=00000120 ecx=00000000 edx=00000000 esi=003a533d edi=002f4cc0
  ebp=002ee8a8 esp=002ee7f8 program=H:\test\emacs-djgpp\emacs\src\temacs.exe
  cs: sel=0257  base=02c30000  limit=0104ffff
  ds: sel=025f  base=02c30000  limit=0104ffff
  es: sel=025f  base=02c30000  limit=0104ffff
  fs: sel=022f  base=0001d580  limit=0000ffff
  gs: sel=027f  base=00000000  limit=0010ffff
  ss: sel=025f  base=02c30000  limit=0104ffff
  App stack: [002eed94..002d5d94]  Exceptn stack: [002d5c68..002d3d28]

  Call frame traceback EIPs:
    0x0012f1c4
    0x0012f2a6
    0x00118377
    0x0011191d
    0x00068cff
    0x00068c41

A companion utility program captures the addresses and the executable file name from the screen, and adds the corresponding function name plus offset to each line (if the executable was not stripped), and also the source file/line information, if that info is found. Example:

  Call frame traceback EIPs:
    0x0001039f execute_builtin+191, file c:/djgpp/gnu/bash-2.03/execute_cmd.c, line 2878
    0x00010840 execute_builtin_or_function+176, file c:/djgpp/gnu/bash-2.03/execute_cmd.c, line 3173
    0x0001011b execute_simple_command+659, file c:/djgpp/gnu/bash-2.03/execute_cmd.c, line 2745
    0x0000de00 execute_command_internal+1876, file c:/djgpp/gnu/bash-2.03/execute_cmd.c, line 824
    0x0000d459 execute_command+69, file c:/djgpp/gnu/bash-2.03/execute_cmd.c, line 314

As nice as this looks, it has several disadvantages:

 . Many real-life backtraces are long and quickly scroll off the screen. If you didn't make a point of setting up very large screen buffers of your shell windows, or redirect standard error to a file, you'll lose precious information. Since these precautions are only taken when one expects a crash, guess how many times these measures are in place when they are needed.

 . Many calls to emacs_backtrace in the current sources limit the number of backtrace frames to 10, but that is an arbitrary limitation which will be too small in most, if not all, situations. Check out the crash backtraces posted to the bug tracker. As an extreme (but quite frequent) data point, crashes in GC tend to have many hundreds, and sometimes many thousands, of frames in them. In reality, there's no way of knowing how many frames will be there, and how many of them will be needed to get enough useful information for finding the problem. I predict that more often than not we will be looking at useless backtraces, while users who reported those backtraces will rightfully expect us to find the bug and fix it.

 . The backtrace is written to the standard error file handle. Is that handle always guaranteed to be available and connected to a screen or a disk file that the user can find afterwards? E.g., if Emacs is invoked from an environment which redirects that handle to the null device, the information will be lost. (On MS-Windows, GUI applications launched by clicking a desktop icon have this handle closed, so anything written to it disappears without a trace; I don't know if Posix desktops have something similar.)

 . Last, but not least, even if the drawbacks described above are not an issue in some particular crash report, using the limited information it provides can be quite difficult, especially if the crash happened in a binary compiled by a different compiler version than yours, let alone on an architecture different from the one used by the person who tries to get some sense out of it.
Here's an example of what emacs_backtrace will produce (slightly edited from what you see on http://linux.die.net/man/3/backtrace_symbols_fd):

  Backtrace:
  ./emacs(myfunc4+0x5c) [0x80487f0]
  ./emacs [0x8048871]
  ./emacs(myfunc3+0x21) [0x8048894]
  ./emacs(myfunc2+0x1a) [0x804888d]
  ./emacs(myfunc1+0x1a) [0x804888d]
  ./emacs(main+0x65) [0x80488fb]
  /lib/libc.so.6(__libc_start_main+0xdc) [0xb7e38f9c]
  ./emacs [0x8048711]

It doesn't even show the source line info, like DJGPP did. Translating myfunc1+0x1a etc. into source-level info is not an easy task, unless you are lucky and there's only one place where it calls myfunc2. If not, you are left with guesswork.

Making sense of the backtrace without being able to get at the corresponding source lines is not for the faint of heart. More often than not, the Emacs maintainers will be tempted to ignore such a report, and ask for a GDB backtrace instead.

So given all of the above, I'm asking: why do we want this feature? Why not use the good old core dump files? They have all the information that is needed for debugging the crash, while the above falls short of that mark by a large measure. It seems like a step backward. I always thought that the lack of core files in DJGPP was a serious limitation, so I'm amazed to see modern environments actually _wanting_ that limited debug feature in favor of core dumps and real debuggability.

Until now, the only uses I saw for the 'backtrace' function were when a debugger couldn't be used at all, or the core file couldn't be produced due to system-level requirements, such as limited disk space or some stringent time constraints. But here we do that voluntarily and by default. Why?

Having said all that, I'm not really interested in disputing these points. I wanted to communicate my own, mostly negative, experience of many years using a similar feature.
If more information is required, in particular about DJGPP and how it created and used the backtraces, I will gladly provide answers to any questions. Otherwise, I guess we will find out soon enough whether this is a great feature or not.
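
For readers unfamiliar with the mechanism discussed above, here is a minimal sketch of a frame-limited backtrace written to a file descriptor. It assumes glibc's backtrace and backtrace_symbols_fd from <execinfo.h>; the names dump_backtrace and MAX_FRAMES_CAP are hypothetical and are not taken from the Emacs sources, so this should not be read as how emacs_backtrace is actually implemented.

```c
/* Sketch only: write at most MAX_FRAMES frames of the current call
   stack to FD.  Assumes glibc's <execinfo.h>.  dump_backtrace and
   MAX_FRAMES_CAP are hypothetical names, not Emacs code.  */
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES_CAP 64	/* arbitrary upper bound for this sketch */

/* Return the number of frames written.  Frames deeper than the
   limit are silently dropped, which is the truncation problem
   described in the message.  */
int
dump_backtrace (int fd, int max_frames)
{
  void *frames[MAX_FRAMES_CAP];
  int n;

  if (max_frames < 1)
    return 0;
  if (max_frames > MAX_FRAMES_CAP)
    max_frames = MAX_FRAMES_CAP;

  /* Collect up to max_frames return addresses.  */
  n = backtrace (frames, max_frames);

  /* Print one line per frame straight to FD; unlike
     backtrace_symbols, this variant does not call malloc.  */
  backtrace_symbols_fd (frames, n, fd);
  return n;
}
```

A call such as dump_backtrace (STDERR_FILENO, 10) prints at most 10 lines in the "function+offset [address]" form shown earlier, and the output goes wherever file descriptor 2 happens to point. Function names for the main executable typically appear only if it was linked with -rdynamic; otherwise those frames show bare addresses.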