From: Sheng Yang (杨圣)
Newsgroups: gmane.emacs.bugs
Subject: bug#31995: 26.1; Condition-case failed to catch error
Date: Thu, 12 Jul 2018 17:29:44 -0700
Message-ID: <6be07045-d79a-26a9-cd63-e2c294cd0187@gmail.com>
In-Reply-To: <53dc622c-b09f-2251-0a9f-854f55a5642d@gmail.com>
To: Noam Postavsky
Cc: Paul Eggert, 31995@debbugs.gnu.org

@Paul Eggert: I am cc-ing you because you are the author of commit f0a1e9ec and may be more familiar with this topic.

Please ignore my previous email. I thought condition-case WAS able to catch C stack overflow before commit f0a1e9ec, but that does not seem to be the case, or at least it is not related to this bug.

After some code reading and debugging, I found the problem: commit f0a1e9ec moved read1's read_buffer from a static variable to a stack-allocated array, stackbuf, of size MAX_ALLOCA. MAX_ALLOCA is defined as 16 * 1024, so every recursive call of read1 consumes at least 16 KiB of stack, and thousands of recursions (not uncommon for a deeply nested structure) quickly exhaust the stack and cause a stack overflow.
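
To make the per-call cost concrete, here is a small standalone C sketch (not Emacs code; the function names and SKETCH_MAX_ALLOCA are hypothetical stand-ins). It recurses a few levels with a MAX_ALLOCA-sized local buffer, as read1 does after f0a1e9ec, and with a single static buffer, as before it, and estimates stack use per level from the addresses of locals in the outermost and innermost frames. Comparing addresses from different frames is implementation-defined and assumes an ordinary contiguous stack, so treat the numbers as approximate.

```c
#include <stdint.h>

/* Hypothetical stand-in for MAX_ALLOCA (16 * 1024 in src/lisp.h). */
enum { SKETCH_MAX_ALLOCA = 16 * 1024 };

/* Shape of read1 after f0a1e9ec (hypothetical, not Emacs code): every
   level of recursion owns its own MAX_ALLOCA-byte buffer on the stack.
   Returns the address of the innermost level's buffer. */
static uintptr_t deepest_with_stack_buffer(int depth)
{
  volatile char stackbuf[SKETCH_MAX_ALLOCA];
  stackbuf[0] = (char) depth;          /* touch it so it must exist */
  uintptr_t addr = depth == 0 ? (uintptr_t) &stackbuf[0]
                              : deepest_with_stack_buffer(depth - 1);
  stackbuf[1] = 0;                     /* used after the call: no tail call */
  return addr;
}

/* Shape of read1 before f0a1e9ec: one static buffer shared by all levels. */
static char shared_buf[SKETCH_MAX_ALLOCA];

/* Returns the address of a small local in the innermost level. */
static uintptr_t deepest_with_static_buffer(int depth)
{
  volatile int marker = depth;         /* small local that locates the frame */
  shared_buf[0] = (char) depth;
  uintptr_t addr = depth == 0 ? (uintptr_t) &marker
                              : deepest_with_static_buffer(depth - 1);
  marker = 0;                          /* used after the call: no tail call */
  return addr;
}

/* Average stack bytes per level: distance between the outermost and the
   innermost frame, divided by the number of levels in between. */
static long bytes_per_level(uintptr_t (*deepest)(int), int depth)
{
  uintptr_t shallow = deepest(0);
  uintptr_t deep = deepest(depth);
  uintptr_t diff = shallow > deep ? shallow - deep : deep - shallow;
  return (long) (diff / (uintptr_t) depth);
}
```

Assuming an 8 MiB stack (a common Linux default; the actual limit for Emacs may differ), stackbuf alone would cap nesting at about 8 MiB / 16 KiB = 512 levels, before counting the rest of each frame.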

One solution is to make stackbuf much smaller. I set its size to 16, and the bug disappeared. Although 16 may be too aggressive, 16 * 1024 is far too big for a stack buffer in a function that may recurse thousands of times. Worse, the buffer is pure waste whenever read1 is handling anything ("[", "]", "(", ")", "#", "=", numbers, etc.) other than a symbol name (usually tens of characters) or a string, the only cases that really need a long buffer. A conservative choice would be a size above 40 or 80 bytes, enough to hold practically any symbol name, since people rarely use symbol names longer than half the width of a terminal. A more aggressive choice is to remove the stack buffer entirely and allocate only on the heap, at the cost of a possible slowdown, since heap allocation is usually slower than stack allocation. None of this was a problem before commit f0a1e9ec, because the buffer was then a single static variable reused by every recursive call of read1.
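
A middle ground between these options, a small per-call buffer that moves to the heap only when a symbol name or string outgrows it, could look roughly like the standalone sketch below. It is not read1: read_symbol_name, SKETCH_SMALL_BUF, the 128-byte size, the delimiter set, and the doubling growth policy are all hypothetical choices for illustration.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-call buffer size: above the 40-80 bytes suggested. */
enum { SKETCH_SMALL_BUF = 128 };

/* Reads one symbol name starting at *p (it stops at NUL, whitespace, or a
   parenthesis), advances *p past it, and returns a malloc'd copy, or NULL
   if memory runs out.  Short names are assembled in the small stack
   buffer; longer ones are moved to a growing heap buffer.  (A real reader
   would intern the name straight from the buffer; the final copy only
   keeps this sketch self-contained.) */
static char *read_symbol_name(const char **p)
{
  char stackbuf[SKETCH_SMALL_BUF];
  char *buf = stackbuf;
  size_t cap = sizeof stackbuf, len = 0;

  for (; **p && !isspace((unsigned char) **p) && **p != '(' && **p != ')';
       (*p)++) {
    if (len + 1 == cap) {              /* keep room for the final NUL */
      size_t newcap = 2 * cap;
      char *grown = buf == stackbuf ? malloc(newcap) : realloc(buf, newcap);
      if (!grown) {
        if (buf != stackbuf)
          free(buf);
        return NULL;
      }
      if (buf == stackbuf)             /* first spill: copy what we have */
        memcpy(grown, stackbuf, len);
      buf = grown;
      cap = newcap;
    }
    buf[len++] = **p;
  }
  buf[len] = '\0';

  if (buf != stackbuf)
    return buf;                        /* long name: already on the heap */
  char *copy = malloc(len + 1);        /* short name: copy out of this frame */
  if (copy)
    memcpy(copy, stackbuf, len + 1);
  return copy;
}
```

Each recursive caller then pays about 128 bytes of stack for the buffer instead of 16 KiB, while arbitrarily long names still work through the heap path.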

For reference, MAX_ALLOCA is defined in src/lisp.h for use by SAFE_ALLOCA, which allocates memory on the stack if the requested size is less than MAX_ALLOCA and on the heap otherwise. SAFE_ALLOCA and its companion setup macro USE_SAFE_ALLOCA seem fairly complicated to use, and I was not able to figure them out.
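
A rough reading of src/lisp.h suggests the intended pattern is: write USE_SAFE_ALLOCA; among a function's declarations, obtain memory with SAFE_ALLOCA (size), and call SAFE_FREE (); before returning; heap blocks appear to be registered with the unwind mechanism so that they are also released on a non-local exit. The standalone sketch below imitates only the stack-or-heap decision described above. The SKETCH_* names and lands_on_heap are hypothetical, it tracks a single heap block, and it relies on alloca from <alloca.h>, which is a common extension (glibc, BSD) rather than ISO C.

```c
#include <alloca.h>   /* alloca: a glibc/BSD extension, not ISO C */
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold mirroring MAX_ALLOCA (16 * 1024). */
enum { SKETCH_MAX_ALLOCA = 16 * 1024 };

/* Put once among a function's declarations; remembers the heap block.
   Unlike the real macros, this sketch tracks only ONE heap block. */
#define SKETCH_USE_SAFE_ALLOCA void *sketch_heap = NULL

/* Small requests come from the calling function's stack frame (alloca
   memory lives until that function returns, which is why this has to be
   a macro rather than a helper function); large requests go to the heap. */
#define SKETCH_SAFE_ALLOCA(size)                    \
  ((size_t) (size) < (size_t) SKETCH_MAX_ALLOCA     \
   ? alloca (size)                                  \
   : (sketch_heap = malloc (size)))

/* Frees the heap block, if any; stack memory needs no freeing. */
#define SKETCH_SAFE_FREE() (free (sketch_heap), sketch_heap = NULL)

/* Usage demo: returns 1 if an n-byte buffer was placed on the heap. */
static int lands_on_heap(size_t n)
{
  SKETCH_USE_SAFE_ALLOCA;
  char *buf = SKETCH_SAFE_ALLOCA (n);
  int on_heap = sketch_heap != NULL;
  if (buf)
    memset (buf, 'x', n);              /* caller uses it the same way in both cases */
  SKETCH_SAFE_FREE ();
  return on_heap;
}
```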

On 07/11/2018 10:46 PM, Sheng Yang (杨圣) wrote:
> condition-case was able to catch C stack overflow before commit f0a1e9ec. I understand that recovering from C stack overflow is magical and can be tricky, but Emacs is capable of it thanks to all of your efforts. The only missing part is re-signaling the overflow as a Lisp error, which should not be as hard as recovering from the C stack overflow itself.
>
> Here is why this feature matters. When we open a file, find-file-hook calls many functions, including but not limited to undo-tree. These functions read additional files (undo-tree history, project files, dir-locals, etc.) and perform tasks. To guard against file corruption and other problems, all of these reads are wrapped in some try-catch construct. However, these constructs cannot be trusted: a single corrupted file (or any file that triggers a C stack overflow) ruins the whole file-loading process with the mysterious message "Recovered from C stack overflow". I don't think this is acceptable.
>
> From a Lisp programmer's perspective, if errors occur, they should be caught. This is exactly the behavior that condition-case and other try-catch constructs promise.
>
> I am not an expert in C, and debugging the C part of Emacs can be painful for me. Therefore I bisected and found the offending commits (see my original bug report). I hope this helps you pinpoint the problem and fix the bug.
>
> On 07/11/2018 02:48 PM, Noam Postavsky wrote:
>> retitle 31995 Condition-case can't catch C stack overflow
>> tags 31995 + wontfix
>> quit
>>
>> Sheng Yang (杨圣) <yangsheng6810@gmail.com> writes:
>>
>>> It seems that the function call ~(read (current-buffer))~ causes C stack
>>> overflow. Though I personally believe the undo-tree file is not
>>> corrupted, I assume this error should be caught by condition-case even
>>> if the file to read is indeed corrupted.
>> The file is not corrupted, it's just that the recursion goes too deep
>> during reading.  However, I don't think condition-case can reasonably
>> catch C stack overflow.  As it is, recovering from C stack overflow at
>> all is a bit controversial, which is why we have the
>> attempt-stack-overflow-recovery variable which you can set to nil in
>> order to reliably segfault instead.

-- 
Sheng Yang(杨圣)
PhD student
Computer Science Department
University of Maryland, College Park
E-mail:yangsheng6810@gmail.com
