bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault

all messages for Emacs-related lists mirrored at yhetil.org

From: Paul Eggert <eggert@cs.ucla.edu>
To: Eli Zaretskii <eliz@gnu.org>
Cc: johnw@gnu.org, 24206@debbugs.gnu.org
Subject: bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault
Date: Wed, 17 Aug 2016 10:41:52 -0700
Message-ID: <11031c1e-c784-0ba2-4b6c-4fab0cb92354@cs.ucla.edu>
In-Reply-To: <8360qzfmz3.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 5924 bytes --]

Eli Zaretskii wrote:

>> The changes were motivated by bug fixes, not style.
>
> That's not what I see.  E.g., this hunk simply replaces valid code by
> an equivalently valid code:
>
>   -	  if (multibyte)
>   -	    {
>   -	      int len;
>   -
>   -	      STRING_CHAR_AND_LENGTH (strp, len);
>   -	      if (len == 1)
>   -		*bufp = *strp;
>   -	      else
>   -		memcpy (bufp, strp, len);
>   -	      strp += len;
>   -	      bufp += len;
>   -	      nchars++;
>   -	    }
>   -	  else
>   -	    *bufp++ = *strp++, nchars++;
>   +	  /* Fall through to copy one char.  */

Some change in this area was needed because the 'multibyte' flag went away.
While doing that, I noticed that discarding all that code made this
somewhat tricky area easier to follow. It's not merely that the old multibyte
code is unnecessarily long and hard to follow; it's that the old code does
something fairly typical (copy a multibyte character) in an unusual way, which
is too likely to lead the reader into incorrectly thinking that there is
something actually unusual about the action. Misleading code like this really
cries out to be rewritten, particularly if the rewriting simply involves
deleting it.

In short, the main motivation here was clarity, not merely style.
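To make the point concrete, here is a minimal C sketch (not code from
src/doc.c) in which one ordinary code path copies a character of any byte
length, so a multibyte character needs no special handling. It assumes
well-formed UTF-8 input, whereas Emacs's internal encoding is a superset of
UTF-8; the helpers utf8_char_bytes and copy_one_char are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte length of the UTF-8 character whose lead
   byte is C.  Assumes well-formed UTF-8 and that C is a lead byte.  */
static int
utf8_char_bytes (unsigned char c)
{
  if (c < 0x80)
    return 1;
  if (c < 0xE0)
    return 2;
  if (c < 0xF0)
    return 3;
  return 4;
}

/* Hypothetical helper: copy one character from *SRC to *DST and
   advance both pointers.  The same code handles one-byte and
   multibyte characters; there is no separate "multibyte" branch.  */
static void
copy_one_char (unsigned char **dst, unsigned char const **src)
{
  int len = utf8_char_bytes (**src);
  memcpy (*dst, *src, len);
  *dst += len;
  *src += len;
}
```

Copying "a" and then "é" (two bytes in UTF-8) goes through the identical
path; only the length differs.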

(I hope I don't have to go into such detail to defend every code change I
install! I'm finding it difficult enough now to find time to improve Emacs.)

> Same here:
>
>   -      else if (strp[0] == '\\' && strp[1] == '[')
>   +      else if (strp[0] == '\\' && strp[1] == '['
>   +	       && (close_bracket
>   +		   = memchr (strp + 2, ']',
>   +			     SDATA (str) + strbytes - (strp + 2))))
> 	  {
>   -	  ptrdiff_t start_idx;
> 	    bool follow_remap = 1;
>
>   -	  strp += 2;		/* skip \[ */
>   -	  start = strp;
>   -	  start_idx = start - SDATA (string);
>   -
>   -	  while ((strp - SDATA (string)
>   -		  < SBYTES (string))
>   -		 && *strp != ']')
>   -	    strp++;
>   -	  length_byte = strp - start;
>   -
>   -	  strp++;		/* skip ] */

This one is not merely a style change. The old code matched \[ even if it was
not followed by ]; the new code does not. This is an intended improvement. I
plead guilty to the charge that the new code is also shorter and clearer.
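A minimal sketch of that bounded match, assuming a byte string delimited by a
start pointer and an END pointer one past its last byte. The helper
match_key_ref is hypothetical; it only mirrors the memchr idea in the quoted
hunk, where \[ counts as the start of a key reference only when a ] follows it
within the string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if P starts with "\[" and a later "]" exists
   before END, store the address of that "]" in *CLOSE and return true.
   Otherwise return false, so a lone "\[" is treated as ordinary text.  */
static bool
match_key_ref (unsigned char const *p, unsigned char const *end,
               unsigned char const **close)
{
  if (end - p < 2 || p[0] != '\\' || p[1] != '[')
    return false;
  /* Search only within the string, never past END.  */
  unsigned char const *c = memchr (p + 2, ']', end - (p + 2));
  if (!c)
    return false;
  *close = c;
  return true;
}
```

For "\[foo] bar" it finds the ] at offset 5; for "\[foo" it returns false,
so the caller would copy the text unchanged.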

> and here (which, for some reason, loses part of a comment, and IMO
> makes it half a riddle for the uninitiated):
>
>   -	  /* Note the Fwhere_is_internal can GC, so we have to take
>   -	     relocation of string contents into account.  */
>   -	  strp = SDATA (string) + idx;
>   -	  start = SDATA (string) + start_idx;
>   +	  /* Take relocation of string contents into account.  */
>   +	  strp = SDATA (str) + idx;
>   +	  start = strp - length_byte - 1;

The new comment came about because I copied it from elsewhere in the interest
of consistency. You're right, I omitted some commentary in the process. I
thought the omitted info was obvious, but evidently you think otherwise. It's
obviously no big deal, so I brought it back by applying the attached patch to
master.
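The relocation issue can be sketched in plain C, with realloc standing in for
GC: a raw pointer into a block that may move goes stale, so the code saves an
index first and rebuilds the pointer afterward, in the spirit of
"strp = SDATA (str) + idx". The function grow_and_resume is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogy for GC relocation: realloc may move *BASE, so
   CURSOR (a pointer into it) becomes invalid.  Save CURSOR's offset
   before the call that can move the data, then rebuild the pointer
   from the new base.  Returns the new cursor, or NULL on failure.  */
static char *
grow_and_resume (char **base, size_t newsize, char *cursor)
{
  ptrdiff_t idx = cursor - *base;   /* the offset survives relocation */
  char *p = realloc (*base, newsize);
  if (!p)
    return NULL;
  *base = p;
  return p + idx;                   /* recompute; never reuse CURSOR */
}
```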

> What code generated bogus null bytes?

For example, (substitute-command-keys "\\=") generated "\0".
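A hypothetical sketch of the kind of bounds check involved, assuming a byte
string of known length in which \= quotes the next character. Without the
i == len test, a trailing \= would copy whatever byte follows the string,
which is how an output like "\0" can arise. Emitting nothing for a trailing
\= is an assumption of this sketch, not necessarily what Emacs does, and
expand_quotes is not an Emacs function.

```c
#include <stddef.h>

/* Hypothetical: copy SRC[0..LEN) to DST, where "\=" quotes the next
   character.  DST must have room for LEN bytes.  Returns the number of
   bytes written.  A trailing "\=" quotes nothing and produces no
   output (an assumed choice for this sketch).  */
static size_t
expand_quotes (char *dst, char const *src, size_t len)
{
  size_t i = 0, n = 0;
  while (i < len)
    {
      if (src[i] == '\\' && i + 1 < len && src[i + 1] == '=')
        {
          i += 2;               /* skip "\=" */
          if (i == len)
            break;              /* nothing left to quote: don't read past end */
        }
      dst[n++] = src[i++];
    }
  return n;
}
```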

> I'm not saying it isn't fine to make such changes, I'm urging you and
> the others to resist the temptation of doing so unless really
> necessary.  We are operating in the area of diminishing returns, and
> too many times introduce regressions into code that was working
> properly for decades.

This particular code has been buggy for decades in unusual areas. There is no 
harm in simplifying it when fixing the bugs. On the contrary, we should 
encourage bug fixes that simplify code.

> Where's the O(N**2) performance

Whenever the buffer needed to grow, it was reallocated to be only slightly
bigger and the old data was copied to the new buffer; this is an O(N**2)
algorithm, where N is the final buffer size. The new approach grows the buffer
by a constant factor instead (1.5 rather than 2, but that's good enough to
bring worst-case behavior down to O(N)). This sort of thing is standard
programming practice when growing a buffer whose eventual size is not yet known.
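A minimal C sketch of that growth policy, assuming a simple byte buffer;
struct growbuf and growbuf_append are hypothetical, not the code now in
doc.c. Because each reallocation multiplies the capacity by 1.5, the bytes
copied across all reallocations form a geometric series bounded by a constant
times the final size N, so the appends cost O(N) overall. Size-overflow checks
are omitted for brevity.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer.  */
struct growbuf
{
  char *data;
  size_t len, cap;
};

/* Append N bytes from SRC, growing capacity by a factor of 1.5 when
   needed.  Returns 0 on success, -1 on allocation failure.
   Ignores size_t overflow for brevity.  */
static int
growbuf_append (struct growbuf *b, char const *src, size_t n)
{
  if (b->cap - b->len < n)
    {
      size_t cap = b->cap ? b->cap : 16;
      while (cap - b->len < n)
        cap += cap / 2;         /* multiply by 1.5, not "just enough" */
      char *p = realloc (b->data, cap);
      if (!p)
        return -1;
      b->data = p;
      b->cap = cap;
    }
  memcpy (b->data + b->len, src, n);
  b->len += n;
  return 0;
}
```

Growing by "just enough" on each append would instead copy the whole buffer
on nearly every call, which is the quadratic pattern described above.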

> and why does performance matter in this function anyway?

It usually doesn't, but it might in the worst case, so I figured I might as well 
fix the O(N**2) problem while I was fixing related bugs. This is a good thing to 
do in master.

> Unlike at that time, I now think
> this was a bad move, because Emacs 25.1 will have the disabled
> conversion in it, so by the time we release the code in master, it
> would be an incompatible change.

If that's the main objection, then let's change Emacs 25 to behave similarly.
This would be a simple and conservative change to Emacs 25. But even if you
don't want to change Emacs 25 (and thus you want Emacs 25 to continue to be
less compatible with Emacs 24), it's OK to change this minor detail back to the
way Emacs 24 does things.

> (I also don't see how it is related to the
> original bug report, which AFAIU was about (message "`foo'") that
> still behaves as in the bug report.)

Alan wanted something that he could put into his .emacs that would cause 
(message PERCENTLESS) to output the string PERCENTLESS as-is, assuming 
PERCENTLESS lacks %. This was the point of his original bug report; his original 
example involved ` and ' but he wanted the same behavior for ‘ and ’, a point 
that became clear during the discussion of Bug#23425. In Message #95 of that bug 
report I proposed the change in question, and in Message #104 you said it 
sounded good to you.

This is a contentious area, and unless there's good reason I'd rather let 
sleeping dogs lie and stick with master's current behavior here.

> (Mumbles something about Emacs maintenance being a lonely business...)

But we have all these nice conversations! :-)

[-- Attachment #2: 0001-src-doc.c-Fsubstitute_command_keys-Clarify-GC-commen.patch --]
[-- Type: text/x-diff; name="0001-src-doc.c-Fsubstitute_command_keys-Clarify-GC-commen.patch", Size: 1092 bytes --]

From 70a5e67e9a072b6f22343fc0c7eed91dfdaf8025 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Wed, 17 Aug 2016 10:15:53 -0700
Subject: [PATCH] * src/doc.c (Fsubstitute_command_keys): Clarify GC comments.

---
 src/doc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/doc.c b/src/doc.c
index 4b91831..6376398 100644
--- a/src/doc.c
+++ b/src/doc.c
@@ -821,7 +821,8 @@ Otherwise, return a new string.  */)
 	      goto do_remap;
 	    }

-	  /* Take relocation of string contents into account.  */
+	  /* Fwhere_is_internal can GC, so take relocation of string
+	     contents into account.  */
 	  strp = SDATA (str) + idx;
 	  start = strp - length_byte - 1;

@@ -936,7 +937,8 @@ Otherwise, return a new string.  */)
 	    bufp += length_byte;
 	    nchars += length;

-	    /* Take relocation of string contents into account.  */
+	    /* Some of the previous code can GC, so take relocation of
+	       string contents into account.  */
 	    strp = SDATA (str) + idx;

 	    continue;
-- 
2.5.5

Thread overview: 47+ messages
2016-08-11 18:55 bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault Phil
2016-08-11 20:05 ` Eli Zaretskii
2016-08-11 23:51   ` Philipp Stephani
2016-08-13  8:32     ` Eli Zaretskii
2016-08-13 12:25       ` Nicolas Petton
2016-08-14  6:33         ` John Wiegley
2016-08-14  4:54 ` Paul Eggert
2016-08-14 14:27   ` Eli Zaretskii
2016-08-14 14:51     ` Paul Eggert
2016-08-14 17:18       ` Eli Zaretskii
2016-08-15  2:04         ` Paul Eggert
2016-08-15 16:09           ` Eli Zaretskii
2016-08-15 16:46             ` Andreas Schwab
2016-08-15 18:43               ` Paul Eggert
2016-08-15 19:04                 ` Eli Zaretskii
2016-08-15 18:51               ` Eli Zaretskii
2016-08-15 19:05                 ` John Wiegley
2016-08-15 20:41                 ` Paul Eggert
2016-08-16 14:38                   ` Eli Zaretskii
2016-08-16 15:25                     ` John Wiegley
2016-08-16 16:09                       ` Nicolas Petton
2016-08-18 16:30                       ` Nicolas Petton
2016-08-18 16:41                         ` John Wiegley
2016-08-18 17:35                           ` Eli Zaretskii
2016-08-16 17:37                     ` Paul Eggert
2016-08-16 17:45                       ` John Wiegley
2016-08-16 17:55                         ` Paul Eggert
2016-08-16 17:57                           ` John Wiegley
2016-08-16 18:44                           ` Dmitry Gutov
2016-08-16 18:31                       ` Eli Zaretskii
2016-08-16 14:52                   ` Eli Zaretskii
2016-08-16 21:07                     ` Paul Eggert
2016-08-17 15:12                       ` Eli Zaretskii
2016-08-17 17:41                         ` Paul Eggert [this message]
2016-08-17 18:06                           ` Eli Zaretskii
2016-08-17 20:52                             ` Paul Eggert
2016-08-18 14:30                               ` Eli Zaretskii
2016-08-18 18:33                                 ` Paul Eggert
2016-08-18 18:58                                   ` Eli Zaretskii
2016-08-17 17:50                       ` Dmitry Gutov
2016-08-14 15:21   ` Dmitry Gutov
2016-08-15  1:53     ` Paul Eggert
2016-08-15  1:57       ` Dmitry Gutov
2016-08-15  2:05         ` Paul Eggert
2016-08-14 17:21   ` Eli Zaretskii
2016-08-14 20:16     ` Paul Eggert
2016-08-15  1:12       ` Paul Eggert