bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault

all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed

From: Paul Eggert <eggert@cs.ucla.edu>
To: Eli Zaretskii <eliz@gnu.org>
Cc: johnw@gnu.org, 24206@debbugs.gnu.org
Subject: bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault
Date: Thu, 18 Aug 2016 11:33:36 -0700	[thread overview]
Message-ID: <7296d150-48aa-0c39-9421-63fffa499d2e@cs.ucla.edu> (raw)
In-Reply-To: <83lgzudu97.fsf@gnu.org>

Eli Zaretskii wrote:

> The code that got removed was the easy and intuitive part: it dealt
> with processing single-byte strings one byte at a time.  The
> hard-to-read part of the code is still with us.  We have less 'if'
> conditionals, but that's hardly the main complication in the original
> code.

Sure, but removing unnecessary easy stuff lets the reader see the hard stuff 
more clearly.

> You are missing my point: the code on master now processes a string,
> that could be either unibyte or multibyte, using only multibyte
> methods.  With the flag in place, each kind of string would have used
> the method that's natural with it.  The way things are now, one has to
> think hard about what the code does to convince oneself it's valid.

The way things were before it was even harder, because one had to worry not only 
about processing Emacs-encoded text, one also had to worry about processing 
unibyte text containing non-ASCII bytes. The code is simpler now, because it 
needs only to process Emacs-encoded text.

The old code might have flown despite its problems, if all the input data were 
consistent (i.e., either all unibyte, or all with Emacs-encoded text). But 
inputs need not be consistent, so the old approach simply did not work.

As I take it, your principal objection to the new code is not to its internals: 
it's that substitute-command-keys can now return a multibyte string even when 
all the input data is unibyte. I don't think that's a big deal, but if this is 
the primary reason for our lengthy conversation, I can move things forward by 
changing the code so that it instead returns a unibyte string when all the input 
data are unibyte. Would that suffice?

>>     I don't see why it is tricky, we do that in Emacs in other places.
>>
>> Really? A call to STRING_CHAR_AND_LENGTH followed by a length test followed by a call to memcpy for length > 1 and a special case inline copy for length == 1? When copying multibyte data? Where else does Emacs do that?
>
> What exactly confuses you in that snippet?

Nothing confuses me in that snippet. I know what the snippet does, now that I've 
read and understood it. It is a longwinded and unnecessarily tricky way of doing 
something simple.

> The call to
> STRING_CHAR_AND_LENGTH itself? we have that in umpteen other places.
> The single-byte optimization of not calling memcpy?  That's standard
> practice in C.

I'm not talking about each individual line of code in that snippet. I am talking 
about the entire construction. A reader must look at all 14 lines to deduce what 
it does, deduce that it's unnecessarily complicated, and deduce that the 
unnecessary complication is not a sign that something unusual is going on. No 
place else in Emacs has this construction.

> If you need an example for using
> STRING_CHAR_AND_LENGTH while copying text, you can find it in
> copy_text, for example.

copy_text does something quite different. When copying multibyte text, it does a 
single memcpy for the entire string. copy_text does not call memcpy for each 
multibyte character, and nothing in copy_text is particularly close to the 
snippet in question.

> you replaced it with a
> fall-through, which is harder to follow and easier to introduce subtle
> bugs.

True, but the fall-through is clearly marked in comments. The new way is a bit 
more efficent (smaller code space, more likely to fit in cache); the old way 
duplicated character-copying code, a practice that also introduces subtle bugs 
-- in addition to temporarily mystifying the reader because the old code 
duplicates were not identical (itself a sign of cruft). With all that in mind, 
it was reasonable to switch from the old way to the new, despite the 
disadvantage you mention.

>>         Alan wanted something that he could put into his .emacs that would cause
>>         (message PERCENTLESS) to output the string PERCENTLESS as-is, assuming
>>         PERCENTLESS lacks %. This was the point of his original bug report; his original
>>         example involved ` and ' but he wanted the same behavior for ‘ and ’, a point
>>         that became clear during the discussion of Bug#23425.
>>
>>     Then why not for '..' as well?  How is that different from ‘..’?
>>
>> It's not different. Alan wanted the same behavior for '..', and he got that too.
>
> But the behavior is not the same:

I was referring to Alan's desire to treat all quotes the same (i.e., to not 
substitute for any of them), which is now supported by setting 
text-quoting-style to grave.

>   (let ((text-quoting-style 'curve))
>     (substitute-command-keys "'foo'"))
>       => ’foo’
>
> but
>
>   (let ((text-quoting-style 'grave))
>     (substitute-command-keys "‘foo’"))
>       => ‘foo’
>
> I would have expected the first example to yield 'foo'

No, substitute-command-keys works on each grave accent and apostrophe 
separately, without looking at the others. As I recall it's worked that way and 
has been documented that way, in both master and emacs-25, ever since the 
feature was installed. One could posit a "smarter" form of substitution, which 
leaves 'foo' alone but which translates `foo'. Although we considered that 
possibility during design, we rejected it because it is more complicated and has 
more problems and quirks that are a pain to document and would surprise users in 
other ways.

next prev parent reply	other threads:[~2016-08-18 18:33 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-11 18:55 bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault Phil
2016-08-11 20:05 ` Eli Zaretskii
2016-08-11 23:51   ` Philipp Stephani
2016-08-13  8:32     ` Eli Zaretskii
2016-08-13 12:25       ` Nicolas Petton
2016-08-14  6:33         ` John Wiegley
2016-08-14  4:54 ` Paul Eggert
2016-08-14 14:27   ` Eli Zaretskii
2016-08-14 14:51     ` Paul Eggert
2016-08-14 17:18       ` Eli Zaretskii
2016-08-15  2:04         ` Paul Eggert
2016-08-15 16:09           ` Eli Zaretskii
2016-08-15 16:46             ` Andreas Schwab
2016-08-15 18:43               ` Paul Eggert
2016-08-15 19:04                 ` Eli Zaretskii
2016-08-15 18:51               ` Eli Zaretskii
2016-08-15 19:05                 ` John Wiegley
2016-08-15 20:41                 ` Paul Eggert
2016-08-16 14:38                   ` Eli Zaretskii
2016-08-16 15:25                     ` John Wiegley
2016-08-16 16:09                       ` Nicolas Petton
2016-08-18 16:30                       ` Nicolas Petton
2016-08-18 16:41                         ` John Wiegley
2016-08-18 17:35                           ` Eli Zaretskii
2016-08-16 17:37                     ` Paul Eggert
2016-08-16 17:45                       ` John Wiegley
2016-08-16 17:55                         ` Paul Eggert
2016-08-16 17:57                           ` John Wiegley
2016-08-16 18:44                           ` Dmitry Gutov
2016-08-16 18:31                       ` Eli Zaretskii
2016-08-16 14:52                   ` Eli Zaretskii
2016-08-16 21:07                     ` Paul Eggert
2016-08-17 15:12                       ` Eli Zaretskii
2016-08-17 17:41                         ` Paul Eggert
2016-08-17 18:06                           ` Eli Zaretskii
2016-08-17 20:52                             ` Paul Eggert
2016-08-18 14:30                               ` Eli Zaretskii
2016-08-18 18:33                                 ` Paul Eggert [this message]
2016-08-18 18:58                                   ` Eli Zaretskii
2016-08-17 17:50                       ` Dmitry Gutov
2016-08-14 15:21   ` Dmitry Gutov
2016-08-15  1:53     ` Paul Eggert
2016-08-15  1:57       ` Dmitry Gutov
2016-08-15  2:05         ` Paul Eggert
2016-08-14 17:21   ` Eli Zaretskii
2016-08-14 20:16     ` Paul Eggert
2016-08-15  1:12       ` Paul Eggert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7296d150-48aa-0c39-9421-63fffa499d2e@cs.ucla.edu \
    --to=eggert@cs.ucla.edu \
    --cc=24206@debbugs.gnu.org \
    --cc=eliz@gnu.org \
    --cc=johnw@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.