all messages for Emacs-related lists mirrored at yhetil.org
From: Eli Zaretskii <eliz@gnu.org>
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: p.stephani2@gmail.com, johnw@gnu.org, nicolas@petton.fr,
	24206@debbugs.gnu.org
Subject: bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault
Date: Mon, 15 Aug 2016 19:09:40 +0300
Message-ID: <83popaf1yz.fsf@gnu.org>
In-Reply-To: <b98070aa-10db-dfb9-a698-2786caf478cd@cs.ucla.edu> (message from Paul Eggert on Sun, 14 Aug 2016 19:04:42 -0700)

> Cc: p.stephani2@gmail.com, 24206@debbugs.gnu.org, johnw@gnu.org,
>  nicolas@petton.fr
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sun, 14 Aug 2016 19:04:42 -0700
> 
> Eli Zaretskii wrote:
> > Its multibyteness is entirely in Emacs's imagination.
> 
> Sure, but Emacs should not substitute "\342\200\230" for "`". The point of 
> text-quoting-style is to substitute quotes, not byte string encodings of quotes.

I'm not sure.  We never discussed what Emacs should do when
substitute-command-keys is called on a unibyte non-ASCII string which
requires quote substitution.  Other substitutions, including those
that produce ASCII quote characters, previously would leave the
unibyte string unibyte.  But with your changes, any substitution
converts the string into multibyte:

  (multibyte-string-p (substitute-command-keys "\200\\[goto-char]"))
    => t

I think this might be a subtle regression, because some code might
just find itself mixing multibyte and unibyte strings where previously
there were only unibyte strings.

> >> > More generally, Fsubstitute_command_keys is quite confused about unibyte
> >> > versus multibyte issues. It merges together a number of strings, and
> >> > assumes that they are all multibyte iff the original string is
> >> > multibyte, which is obviously not true in general.
> > Could you please point out the specific places where this is done?
> 
> OK, here's a contrived example. Run this code in emacs-25:
> 
> (progn
>    (setq km (make-keymap))
>    (define-key km "≠" 'global-set-key)
>    (substitute-command-keys "\200\\<km>\\[global-set-key]"))
> 
> This should return a 2-character string equal to "\200≠".

I'm not sure your expectations are correct: as the original string is
unibyte, the output of "\200≠", which is multibyte, might not be what
the users expect.  They might expect "\200\342\211\240" instead.
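The two candidate results differ only in how the UTF-8 bytes of "≠" are counted. As a standalone illustration in plain C, not Emacs internals, U+2260 NOT EQUAL TO encodes as the three bytes \342\211\240. U+2018 LEFT SINGLE QUOTATION MARK encodes as \342\200\230, the byte string quoted above. The helper `utf8_encode_bmp` is hypothetical, written only for this sketch. Read as unibyte, "\200\342\211\240" is 4 characters. Read as multibyte, "\200≠" is 2.

```c
#include <assert.h>

/* Hypothetical helper (not an Emacs function): encode code point CP,
   which must lie in the Basic Multilingual Plane and not be a
   surrogate, as UTF-8 into BUF.  Returns the number of bytes written.  */
static int
utf8_encode_bmp (unsigned cp, unsigned char buf[3])
{
  assert (cp <= 0xFFFF && !(0xD800 <= cp && cp <= 0xDFFF));
  if (cp < 0x80)
    {
      buf[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      buf[0] = (unsigned char) (0xC0 | (cp >> 6));
      buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  buf[0] = (unsigned char) (0xE0 | (cp >> 12));
  buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
  return 3;
}
```

In a unibyte string each of those bytes is one character. That is why the unibyte reading of the result is 4 characters long and the multibyte reading is 2.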

> But in Emacs 25 it dumps core, at least on my platform (Fedora 23
> x86-64). And in Emacs 24 on my platform it returns a malformed
> string that prints as "\242\1340" but has length 2. I suppose we
> could make Emacs 24 dump core too, though I haven't tried hard to do
> that.

The errors are easily fixed, though.  Below are two patches.  The
first one should go to master (after reverting yours), and IMO is also
safe enough for emacs-25.  But if it is deemed not safe enough for the
release, the second patch is safer.  The second patch doesn't produce
"\200≠" in your test case, but neither did Emacs 24, so this is not a
regression.

Comments?  Let's decide on what to do with emacs-25 first, since that
blocks the release, and then discuss master if needed.

Thanks.

--- src/doc.c~0	2016-06-20 08:49:44.000000000 +0300
+++ src/doc.c	2016-08-15 11:24:07.894579900 +0300
@@ -738,8 +738,9 @@ Otherwise, return a new string.  */)
   unsigned char const *start;
   ptrdiff_t length, length_byte;
   Lisp_Object name;
-  bool multibyte;
+  bool multibyte, pure_ascii;
   ptrdiff_t nchars;
+  Lisp_Object orig_string = Qnil;
 
   if (NILP (string))
     return Qnil;
@@ -752,6 +753,20 @@ Otherwise, return a new string.  */)
   enum text_quoting_style quoting_style = text_quoting_style ();
 
   multibyte = STRING_MULTIBYTE (string);
+  /* Pure-ASCII unibyte input strings should produce unibyte strings
+     if substitution doesn't yield non-ASCII bytes, otherwise they
+     should produce multibyte strings.  */
+  pure_ascii = SBYTES (string) == count_size_as_multibyte (SDATA (string),
+							   SCHARS (string));
+  /* If the input string is unibyte and includes non-ASCII characters,
+     make a multibyte copy, so as to be able to return the original
+     unibyte string if no substitution eventually happens.  */
+  if (!multibyte && !pure_ascii)
+    {
+      orig_string = string;
+      string = Fstring_make_multibyte (Fcopy_sequence (string));
+      multibyte = true;
+    }
   nchars = 0;
 
   /* KEYMAP is either nil (which means search all the active keymaps)
@@ -933,8 +948,8 @@ Otherwise, return a new string.  */)
 
 	subst_string:
 	  start = SDATA (tem);
-	  length = SCHARS (tem);
 	  length_byte = SBYTES (tem);
+	  length = SCHARS (tem);
 	subst:
 	  nonquotes_changed = true;
 	subst_quote:
@@ -956,8 +971,8 @@ Otherwise, return a new string.  */)
 	       && quoting_style == CURVE_QUOTING_STYLE)
 	{
 	  start = (unsigned char const *) (strp[0] == '`' ? uLSQM : uRSQM);
-	  length = 1;
 	  length_byte = sizeof uLSQM - 1;
+	  length = 1;
 	  idx = strp - SDATA (string) + 1;
 	  goto subst_quote;
 	}
@@ -995,6 +1010,8 @@ Otherwise, return a new string.  */)
 	    }
 	}
     }
+  else if (!NILP (orig_string))
+    tem = orig_string;
   else
     tem = string;
   xfree (buf);
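The pure_ascii test in the patch above compares SBYTES with count_size_as_multibyte. Below is a minimal sketch of why that comparison detects pure-ASCII unibyte strings. It assumes, as Emacs's character.c does, that conversion to the internal multibyte form keeps ASCII bytes at 1 byte and turns each raw byte 0x80..0xFF into a 2-byte sequence. Both function names are hypothetical stand-ins, not Emacs APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for count_size_as_multibyte: the number of bytes
   NBYTES raw bytes at P would occupy in multibyte form, ASSUMING ASCII
   stays 1 byte and each byte 0x80..0xFF becomes 2 bytes.  */
static ptrdiff_t
size_as_multibyte_sketch (const unsigned char *p, ptrdiff_t nbytes)
{
  assert (nbytes >= 0);
  ptrdiff_t size = 0;
  for (ptrdiff_t i = 0; i < nbytes; i++)
    size += p[i] < 0x80 ? 1 : 2;
  return size;
}

/* The PURE_ASCII test specialized to a unibyte string, where the byte
   count and the character count are both NBYTES: the two sizes agree
   exactly when no byte is 0x80 or above.  */
static bool
pure_ascii_sketch (const unsigned char *p, ptrdiff_t nbytes)
{
  return nbytes == size_as_multibyte_sketch (p, nbytes);
}
```

Consider a unibyte input like "\200\\[goto-char]". It fails this test, so the first patch copies it and makes the copy multibyte up front. If nothing ends up being substituted, the original unibyte string is returned instead.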


--- src/doc.c~0	2016-06-20 08:49:44.000000000 +0300
+++ src/doc.c	2016-08-15 11:13:15.132137200 +0300
@@ -738,7 +738,7 @@ Otherwise, return a new string.  */)
   unsigned char const *start;
   ptrdiff_t length, length_byte;
   Lisp_Object name;
-  bool multibyte;
+  bool multibyte, pure_ascii;
   ptrdiff_t nchars;
 
   if (NILP (string))
@@ -752,6 +752,11 @@ Otherwise, return a new string.  */)
   enum text_quoting_style quoting_style = text_quoting_style ();
 
   multibyte = STRING_MULTIBYTE (string);
+  /* Pure-ASCII unibyte input strings should produce unibyte strings
+     if substitution doesn't yield non-ASCII bytes, otherwise they
+     should produce multibyte strings.  */
+  pure_ascii = SBYTES (string) == count_size_as_multibyte (SDATA (string),
+							   SCHARS (string));
   nchars = 0;
 
   /* KEYMAP is either nil (which means search all the active keymaps)
@@ -933,8 +938,11 @@ Otherwise, return a new string.  */)
 
 	subst_string:
 	  start = SDATA (tem);
-	  length = SCHARS (tem);
 	  length_byte = SBYTES (tem);
+	  if (multibyte || pure_ascii)
+	    length = SCHARS (tem);
+	  else
+	    length = length_byte;
 	subst:
 	  nonquotes_changed = true;
 	subst_quote:
@@ -956,8 +964,11 @@ Otherwise, return a new string.  */)
 	       && quoting_style == CURVE_QUOTING_STYLE)
 	{
 	  start = (unsigned char const *) (strp[0] == '`' ? uLSQM : uRSQM);
-	  length = 1;
 	  length_byte = sizeof uLSQM - 1;
+	  if (multibyte || pure_ascii)
+	    length = 1;
+	  else
+	    length = length_byte;
 	  idx = strp - SDATA (string) + 1;
 	  goto subst_quote;
 	}
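With the second patch, a unibyte non-ASCII input keeps producing a unibyte result. Every substituted byte must therefore also count as one character. Counting a 3-byte curly quote or "≠" as a single character would leave the character and byte totals inconsistent, which appears to be how the malformed strings in this bug arise. The bookkeeping can be sketched as follows. The helper names are hypothetical. The sketch also assumes that the finished string is unibyte exactly when its character and byte totals agree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper mirroring how the second patch sets LENGTH, the
   number of characters a substituted chunk adds to the result.  MB_CHARS
   is the chunk's character count when read as multibyte text (1 for a
   curly quote), LENGTH_BYTE its size in bytes (3 for a curly quote).  */
static ptrdiff_t
chunk_char_count (bool multibyte, bool pure_ascii,
                  ptrdiff_t mb_chars, ptrdiff_t length_byte)
{
  assert (0 <= mb_chars && mb_chars <= length_byte);
  if (multibyte || pure_ascii)
    return mb_chars;     /* Result is, or may become, multibyte.  */
  return length_byte;    /* Result stays unibyte: one char per byte.  */
}

/* ASSUMPTION: the finished string is treated as unibyte exactly when
   its character count equals its byte count.  */
static bool
result_is_unibyte (ptrdiff_t nchars, ptrdiff_t nbytes)
{
  return nchars == nbytes;
}
```

Apply this to the "\200\\<km>\\[global-set-key]" example. The result is "\200" followed by the three bytes of "≠", a 4-character unibyte string rather than the 2-character "\200≠". A pure-ASCII input such as "a" plus a curly quote ends up with 2 characters in 4 bytes and so becomes multibyte.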





Thread overview: 47+ messages
2016-08-11 18:55 bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault Phil
2016-08-11 20:05 ` Eli Zaretskii
2016-08-11 23:51   ` Philipp Stephani
2016-08-13  8:32     ` Eli Zaretskii
2016-08-13 12:25       ` Nicolas Petton
2016-08-14  6:33         ` John Wiegley
2016-08-14  4:54 ` Paul Eggert
2016-08-14 14:27   ` Eli Zaretskii
2016-08-14 14:51     ` Paul Eggert
2016-08-14 17:18       ` Eli Zaretskii
2016-08-15  2:04         ` Paul Eggert
2016-08-15 16:09           ` Eli Zaretskii [this message]
2016-08-15 16:46             ` Andreas Schwab
2016-08-15 18:43               ` Paul Eggert
2016-08-15 19:04                 ` Eli Zaretskii
2016-08-15 18:51               ` Eli Zaretskii
2016-08-15 19:05                 ` John Wiegley
2016-08-15 20:41                 ` Paul Eggert
2016-08-16 14:38                   ` Eli Zaretskii
2016-08-16 15:25                     ` John Wiegley
2016-08-16 16:09                       ` Nicolas Petton
2016-08-18 16:30                       ` Nicolas Petton
2016-08-18 16:41                         ` John Wiegley
2016-08-18 17:35                           ` Eli Zaretskii
2016-08-16 17:37                     ` Paul Eggert
2016-08-16 17:45                       ` John Wiegley
2016-08-16 17:55                         ` Paul Eggert
2016-08-16 17:57                           ` John Wiegley
2016-08-16 18:44                           ` Dmitry Gutov
2016-08-16 18:31                       ` Eli Zaretskii
2016-08-16 14:52                   ` Eli Zaretskii
2016-08-16 21:07                     ` Paul Eggert
2016-08-17 15:12                       ` Eli Zaretskii
2016-08-17 17:41                         ` Paul Eggert
2016-08-17 18:06                           ` Eli Zaretskii
2016-08-17 20:52                             ` Paul Eggert
2016-08-18 14:30                               ` Eli Zaretskii
2016-08-18 18:33                                 ` Paul Eggert
2016-08-18 18:58                                   ` Eli Zaretskii
2016-08-17 17:50                       ` Dmitry Gutov
2016-08-14 15:21   ` Dmitry Gutov
2016-08-15  1:53     ` Paul Eggert
2016-08-15  1:57       ` Dmitry Gutov
2016-08-15  2:05         ` Paul Eggert
2016-08-14 17:21   ` Eli Zaretskii
2016-08-14 20:16     ` Paul Eggert
2016-08-15  1:12       ` Paul Eggert
