bug#8601: * 2 -> * 4 typo fix in detect_coding

unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed

* bug#8601: * 2 -> * 4 typo fix in detect_coding_charset
@ 2011-05-01 18:19 Paul Eggert
  2011-05-01 19:06 ` Andreas Schwab
  2011-05-06  7:29 ` bug#8601: Merged fixes for 8600, 8601, 8602, and (partially) for 8545 Paul Eggert
  0 siblings, 2 replies; 56+ messages in thread
From: Paul Eggert @ 2011-05-01 18:19 UTC (permalink / raw)
  To: 8601; +Cc: Miles Bader

By code inspection it appears that there's a typo in
detect_coding_charset: an index is multiplied by 2
when it should be multiplied by 4.  By bisecting it
appears that this typo was introduced here:

	revno: 84043 [merge]
	committer: Miles Bader <miles@gnu.org>
	timestamp: Fri 2008-02-01 16:01:31 +0000
	message:
	  Merge unicode branch

	  Revision: emacs@sv.gnu.org/emacs--devo--0--patch-1037

so I'll CC: this to Miles.

Here's a proposed patch.  I haven't tested this, as I don't
have the time to fully understand this code.  But from the
description and other uses of code_space, the "* 2" must be
wrong.

=== modified file 'src/coding.c'
--- src/coding.c	2011-04-29 19:47:29 +0000
+++ src/coding.c	2011-05-01 18:05:21 +0000
@@ -5368,8 +5368,8 @@ detect_coding_charset (struct coding_sys
 	      if (src == src_end)
 		goto too_short;
 	      ONE_MORE_BYTE (c);
-	      if (c < charset->code_space[(dim - 1 - idx) * 2]
-		  || c > charset->code_space[(dim - 1 - idx) * 2 + 1])
+	      if (c < charset->code_space[(dim - 1 - idx) * 4]
+		  || c > charset->code_space[(dim - 1 - idx) * 4 + 1])
 		break;
 	    }
 	  if (idx < dim)







^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8601: * 2 -> * 4 typo fix in detect_coding_charset
  2011-05-01 18:19 bug#8601: * 2 -> * 4 typo fix in detect_coding_charset Paul Eggert
@ 2011-05-01 19:06 ` Andreas Schwab
  2011-05-01 19:25   ` Paul Eggert
  2011-05-06  7:29 ` bug#8601: Merged fixes for 8600, 8601, 8602, and (partially) for 8545 Paul Eggert
  1 sibling, 1 reply; 56+ messages in thread
From: Andreas Schwab @ 2011-05-01 19:06 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 8601, Miles Bader

Paul Eggert <eggert@cs.ucla.edu> writes:

> By code inspection it appears that there's a typo in
> detect_coding_charset: an index is multiplied by 2
> when it should be multiplied by 4.  By bisecting it
> appears that this typo was introduced here:
>
> 	revno: 84043 [merge]

Except that a merge commit is never adequate for infering blame.  See
e209e347 for the real source.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8601: * 2 -> * 4 typo fix in detect_coding_charset
  2011-05-01 19:06 ` Andreas Schwab
@ 2011-05-01 19:25   ` Paul Eggert
  0 siblings, 0 replies; 56+ messages in thread
From: Paul Eggert @ 2011-05-01 19:25 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: 8601, Miles Bader

On 05/01/11 12:06, Andreas Schwab wrote:
> Except that a merge commit is never adequate for infering blame.  See
> e209e347 for the real source.

OK, thanks, I'll CC: this to Kenichi Handa, who made the
original change back on 2006-11-08 04:28:29 +0000
(revno 52413.1.1170 in the trunk).  Kenichi, can you please
take a look at
<http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8601>?
Thanks.

PS.  Sorry, I don't know what "e209e347" means; is there some
documentation for that notation?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8601: Merged fixes for 8600, 8601, 8602, and (partially) for 8545
  2011-05-01 18:19 bug#8601: * 2 -> * 4 typo fix in detect_coding_charset Paul Eggert
  2011-05-01 19:06 ` Andreas Schwab
@ 2011-05-06  7:29 ` Paul Eggert
  2020-09-14 12:37   ` bug#8545: " Lars Ingebrigtsen
  1 sibling, 1 reply; 56+ messages in thread
From: Paul Eggert @ 2011-05-06  7:29 UTC (permalink / raw)
  To: 8545, 8600-done, 8601-done, 8602-done

I committed to the Emacs trunk a merge (bzr 104134) that has fixes for
the following bugs:

* Bug#8600 - The fix removes the garbage element of code_space.

* Bug#8601 - Here I assumed that the "* 2" is a typo.

* Bug#8602 - This fixes some large-int-to-float screwups in
             the Lisp reader.

* Bug#8545 - This fixes the bug where the code should have called
             va_copy, but didn't.  Also, I changed a limit so that
	     the MOST_POSITIVE_FIXNUM limit for strings applies to
	     their length, i.e., does not include the null termination
	     byte.  Stefan hasn't had time to chime in, but if this
             last change turns out to be incorrect I will back it out.

This merge doesn't entirely fix Bug#8545, so I'll leave that bug open;
the others I'll close.





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: Merged fixes for 8600, 8601, 8602, and (partially) for 8545
  2011-05-06  7:29 ` bug#8601: Merged fixes for 8600, 8601, 8602, and (partially) for 8545 Paul Eggert
@ 2020-09-14 12:37   ` Lars Ingebrigtsen
  2020-09-14 18:41     ` Eli Zaretskii
  0 siblings, 1 reply; 56+ messages in thread
From: Lars Ingebrigtsen @ 2020-09-14 12:37 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 8545

Paul Eggert <eggert@cs.ucla.edu> writes:

> * Bug#8545 - This fixes the bug where the code should have called
>              va_copy, but didn't.  Also, I changed a limit so that
> 	     the MOST_POSITIVE_FIXNUM limit for strings applies to
> 	     their length, i.e., does not include the null termination
> 	     byte.  Stefan hasn't had time to chime in, but if this
>              last change turns out to be incorrect I will back it out.
>
> This merge doesn't entirely fix Bug#8545, so I'll leave that bug open;
> the others I'll close.

This was a quite long thread, and had a bunch of related bugs discussed.
Skimming this thread, I'm not sure exactly what the remaining problem in
this bug report is -- the discussion kinda petered out...

This is nine years old -- is there anything more to be done here?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: Merged fixes for 8600, 8601, 8602, and (partially) for 8545
  2020-09-14 12:37   ` bug#8545: " Lars Ingebrigtsen
@ 2020-09-14 18:41     ` Eli Zaretskii
  2020-09-16  2:01       ` Paul Eggert
  0 siblings, 1 reply; 56+ messages in thread
From: Eli Zaretskii @ 2020-09-14 18:41 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: eggert, 8545-done

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Mon, 14 Sep 2020 14:37:44 +0200
> Cc: 8545@debbugs.gnu.org
> 
> This was a quite long thread, and had a bunch of related bugs discussed.
> Skimming this thread, I'm not sure exactly what the remaining problem in
> this bug report is -- the discussion kinda petered out...
> 
> This is nine years old -- is there anything more to be done here?

I don't think so, so I'm closing this bug.





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: Merged fixes for 8600, 8601, 8602, and (partially) for 8545
  2020-09-14 18:41     ` Eli Zaretskii
@ 2020-09-16  2:01       ` Paul Eggert
  0 siblings, 0 replies; 56+ messages in thread
From: Paul Eggert @ 2020-09-16  2:01 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 8545

Several issues from Bug#8545 still remain but its commentary had gotten so long 
(and our collective memory so hazy :-) that they were hard to see, so I started 
a new bug report (Bug#43439) with a proposed patch that I hope clarifies and 
addresses most of the issues.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
@ 2011-04-25  5:46 Paul Eggert
  2011-04-25  9:00 ` Eli Zaretskii
  0 siblings, 1 reply; 56+ messages in thread
From: Paul Eggert @ 2011-04-25  5:46 UTC (permalink / raw)
  To: 8545

This is a followup to Bug#8435.  Eli invited me to review the recent
doprnt-related changes, so here's a quick review:

* doprnt returns size_t.  But Stefan wrote that he prefers sizes to be
  signed values, and doprnt always returns a value that can fit in
  EMACS_INT.  So shouldn't doprnt return EMACS_INT, as it did before?

* doprnt supports only a small subset of the standard printf formats,
  but this subset is not documented.  It's unclear what the subset is.
  Or it's a superset of the subset, with %S and %l?  Anyway, this
  should be documented clearly in the lead comment.

* I suggest that every feature in doprnt be a feature that is actually
  needed and used; this will simplify maintainance.

* Format strings never include embedded null bytes, so there's
  no need for doprnt to support that.

* If the format string is too long, the alloca inside doprnt will
  crash Emacs on some hosts.  I suggest removing the alloca,
  instituting a fixed size limit on format specifiers, and documenting
  that limit.  Since user code cannot ever supply one of these
  formats, that should be good enough.

* The width features of doprnt (e.g., %25s) are never used.  That part
  of the code is still buggy; please see some comments below.  I
  suggest removing it entirely; this will simplify things.  But if not:

  - doprnt mishandles format specifications such as %0.0.0d.
    It passes them off to printf, and this results in undefined
    behavior, near as I can tell.

  - doprnt uses atoi (&fmtcpy[1]), but surely this isn't right if
    there are flags such as '-'.

  - Quite possibly there are other problems in this area, but I
    didn't want to spend further time reviewing a never-used feature.

* In this code, in verror:

      used = doprnt (buffer, size, m, m + mlen, ap);

      /* Note: the -1 below is because `doprnt' returns the number of bytes     
         excluding the terminating null byte, and it always terminates with a   
         null byte, even when producing a truncated message.  */
      if (used < size - 1)
        break;

  I don't see the reason for the "- 1".  If you replace this with:

      used = doprnt (buffer, size, m, m + mlen, ap);

      if (used < size)
        break;

  the code should still work, because, when used < size, the buffer
  should be properly null-terminated.  If it isn't then there's something
  wrong with doprnt, no?

* In this code, in verror:

      else if (size < size_max - 1)
        size = size_max - 1;

  there's no need for the "- 1"s.  Just use this:

      else if (size < size_max)
        size = size_max;

* This code in verror:

      if (buffer == buf)
        buffer = (char *) xmalloc (size);
      else
        buffer = (char *) xrealloc (buffer, size);

  uses xrealloc, which is unnecessarily expensive, as it may copy the
  buffer's contents even though they are entirely garbage here.  Use
  this instead, to avoid the useless copy:

      if (buffer != buf)
        xfree (buffer);
      buffer = (char *) xmalloc (size);

^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-25  5:46 bug#8545: issues with recent doprnt-related changes Paul Eggert
@ 2011-04-25  9:00 ` Eli Zaretskii
  2011-04-25 13:37   ` Stefan Monnier
  2011-04-26  6:02   ` Paul Eggert
  0 siblings, 2 replies; 56+ messages in thread
From: Eli Zaretskii @ 2011-04-25  9:00 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 8545

> Date: Sun, 24 Apr 2011 22:46:33 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: Eli Zaretskii <eliz@gnu.org>
> 
> This is a followup to Bug#8435.  Eli invited me to review the recent
> doprnt-related changes, so here's a quick review:

Thanks for the review and the comments.

> * doprnt returns size_t.  But Stefan wrote that he prefers sizes to be
>   signed values, and doprnt always returns a value that can fit in
>   EMACS_INT.  So shouldn't doprnt return EMACS_INT, as it did before?

I made it return size_t because all the related variables in verror
are size_t, and I didn't want to mix signed with unsigned.  AFAIU, the
preference to use signed is for those values that come from Lisp or go
back to the Lisp level, which is not the case here.

But I will let Stefan comment on this.  Changing doprnt to return a
signed value, and making the respective changes in verror, would be
trivial, and I won't mind doing that, if that's the verdict.

> * doprnt supports only a small subset of the standard printf formats,
>   but this subset is not documented.  It's unclear what the subset is.
>   Or it's a superset of the subset, with %S and %l?  Anyway, this
>   should be documented clearly in the lead comment.

I added such a documentation.

> * I suggest that every feature in doprnt be a feature that is actually
>   needed and used; this will simplify maintainance.

I agree, but I didn't add any features, except the support for %ld,
which is surely needed for error messages that show EMACS_INT values.
All the rest was already there in the original code of doprnt.  I see
no reason to remove that code just because no error message currently
uses some of those features.  According to "bzr annotate", most of the
related code in doprnt survived largely untouched since the 1990s,
except some cleanup.  This is no guarantee of being bug-free, of
course, but it does have _some_ weight in my eyes.

> * Format strings never include embedded null bytes, so there's
>   no need for doprnt to support that.

Potentially, someone could call `error' with its first argument taken
from a Lisp string, which could include null characters.  But again,
this feature was there to begin with, and I see no particular need to
remove it.

> * If the format string is too long, the alloca inside doprnt will
>   crash Emacs on some hosts.

You are right.  I modified doprnt to use SAFE_ALLOCA instead.

>                               I suggest removing the alloca,
>   instituting a fixed size limit on format specifiers, and documenting
>   that limit.  Since user code cannot ever supply one of these
>   formats, that should be good enough.

GNU coding standards frown on arbitrary limits, so I didn't want to
take that route, what with SAFE_ALLOCA readily available and easy to
use.

> * The width features of doprnt (e.g., %25s) are never used.

Again, an old feature that I see no reasons to remove.  And, since
doprnt produces error messages meant to be displayed, I find that
this feature actually makes sense.

>   - doprnt mishandles format specifications such as %0.0.0d.
>     It passes them off to printf, and this results in undefined
>     behavior, near as I can tell.

Since both error and verror are now marked as ATTRIBUTE_FORMAT_PRINTF,
the compiler will detect such invalid formats and flag them.  If the
warning is disregarded, the result of such a format is just a somewhat
illegible message.  In any case, vsnprintf would do the same, right?

>   - doprnt uses atoi (&fmtcpy[1]), but surely this isn't right if
>     there are flags such as '-'.

Why not?  In that case, atoi will produce a negative value for
`width', which is already handled by the code.  If I'm missing
something, please point out the specific problems with that.

>   - Quite possibly there are other problems in this area, but I
>     didn't want to spend further time reviewing a never-used feature.

I did read that code.  It looked solid to me, but if you or someone
else see specific problems, please point them out.

> * In this code, in verror:
> 
>       used = doprnt (buffer, size, m, m + mlen, ap);
> 
>       /* Note: the -1 below is because `doprnt' returns the number of bytes     
>          excluding the terminating null byte, and it always terminates with a   
>          null byte, even when producing a truncated message.  */
>       if (used < size - 1)
>         break;
> 
>   I don't see the reason for the "- 1".  If you replace this with:
> 
>       used = doprnt (buffer, size, m, m + mlen, ap);
> 
>       if (used < size)
>         break;
> 
>   the code should still work, because, when used < size, the buffer
>   should be properly null-terminated.  If it isn't then there's something
>   wrong with doprnt, no?

As the comment says, doprnt always null-terminates the result, even if
the result is truncated, and it never returns a value larger than the
buffer size it was given.  (In that, it differs from vsnprintf, which
can return larger values.)  When doprnt does truncate the output
string, it returns `size - 1'; if we compare against `size', we will
happily bail out of the loop, and never try to enlarge the buffer.

I saw no reason to enhance doprnt to continue processing the format
string and the arguments once the buffer is exhausted.  So I modified
verror instead to DTRT.

> * In this code, in verror:
> 
>       else if (size < size_max - 1)
>         size = size_max - 1;
> 
>   there's no need for the "- 1"s.  Just use this:
> 
>       else if (size < size_max)
>         size = size_max;

I made that change, thanks.

The reason I originally limited to `size_max - 1' is that the games
you play with the maximum size, viz.:

  size_t size_max =
    min (MOST_POSITIVE_FIXNUM, min (INT_MAX, SIZE_MAX - 1)) + 1;

are neither clear nor commented.  E.g., why the second `min'? could
INT_MAX be ever larger than SIZE_MAX-1? if so, what does that mean in
terms of relation between `int' and `size_t' on such a platform?

I'm only familiar with a very small number of architectures, where all
these tricks are unnecessary.  When I see such code, it makes me dizzy
and unsure of what I may be missing.  So I opted for a safer way out
(since error messages as long as SIZE_MAX are only theoretically
possible), that would not risk overflowing signed values into the sign
bit.  Perhaps in the future you could comment such obscure code to
make it understandable by mere mortals such as myself.

> * This code in verror:
> 
>       if (buffer == buf)
>         buffer = (char *) xmalloc (size);
>       else
>         buffer = (char *) xrealloc (buffer, size);
> 
>   uses xrealloc, which is unnecessarily expensive, as it may copy the
>   buffer's contents even though they are entirely garbage here.  Use
>   this instead, to avoid the useless copy:
> 
>       if (buffer != buf)
>         xfree (buffer);
>       buffer = (char *) xmalloc (size);

You are right, I made that change.

I believe this takes care of all the imminent problems you discovered
in your review.  Nevertheless, I will leave this bug report open for a
while, to allow you and others to come up with more problems and
suggestions for improvements.

Thanks again for taking the time to review and comment.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-25  9:00 ` Eli Zaretskii
@ 2011-04-25 13:37   ` Stefan Monnier
  2011-04-26 20:25     ` Paul Eggert
  2011-04-26  6:02   ` Paul Eggert
  1 sibling, 1 reply; 56+ messages in thread
From: Stefan Monnier @ 2011-04-25 13:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 8545, Paul Eggert

>> * doprnt returns size_t.  But Stefan wrote that he prefers sizes to be
>> signed values, and doprnt always returns a value that can fit in
>> EMACS_INT.  So shouldn't doprnt return EMACS_INT, as it did before?
> I made it return size_t because all the related variables in verror
> are size_t, and I didn't want to mix signed with unsigned.  AFAIU, the
> preference to use signed is for those values that come from Lisp or go
> back to the Lisp level, which is not the case here.

Mixing the two is what I find problematic, so if it's size_t all the
way, that's OK.


        Stefan





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-25 13:37   ` Stefan Monnier
@ 2011-04-26 20:25     ` Paul Eggert
  2011-04-27  1:14       ` Stefan Monnier
  0 siblings, 1 reply; 56+ messages in thread
From: Paul Eggert @ 2011-04-26 20:25 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 8545

On 04/25/11 06:37, Stefan Monnier wrote:
>> AFAIU, the
>> > preference to use signed is for those values that come from Lisp or go
>> > back to the Lisp level, which is not the case here.
> Mixing the two is what I find problematic, so if it's size_t all the
> way, that's OK.

Sorry, but I don't see the general principle.  Earlier, it was
thought that emacs_write should return a signed value, because there's
code like (emacs_write (...) != n) in fileio.c, where 'n' is
signed, and signed-versus-unsigned comparison is problematic.
I can certainly understand this point of view.

With doprnt returning size_t, though, we still have this problem.
In eval.c's verror we see this:

  size_t size_max =
    min (MOST_POSITIVE_FIXNUM, min (INT_MAX, SIZE_MAX - 1)) + 1;
  size_t used = ..., size = ...;
  ...
  while (1)
    {
      ...
      if (used < size - 1)
	break;
      if (size <= size_max / 2)
        size *= 2;
      else if (size < size_max)
        size = size_max;
      else
        break;  /* and leave the message truncated */
      ...
    }

Here, the code is carefully comparing a signed value
MOST_POSITIVE_FIXNUM to a possibly-different-width
unsigned value SIZE_MAX - 1, storing the result into an
unsigned variable, and using that unsigned variable.
This comparison happens to be safe, but one has to stare
at it a bit to make sure that the
unsigned-versus-signed comparison isn't bogus.  Why is
this unsigned-versus-signed comparison OK, but the one
with emacs_write problematic?

I'm not saying this to be difficult: I'm just trying to
understand the general principle here.

I thought the point of preferring signed was so that we
didn't have to worry about stuff like the above.  Also I assumed
the idea is that one should be able to compile GCC with -ftrapv
and catch overflow errors.  But if the above code is OK as-is,
then clearly I'm misunderstanding the overall goal here.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-26 20:25     ` Paul Eggert
@ 2011-04-27  1:14       ` Stefan Monnier
  0 siblings, 0 replies; 56+ messages in thread
From: Stefan Monnier @ 2011-04-27  1:14 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 8545

> Sorry, but I don't see the general principle.  Earlier, it was
> thought that emacs_write should return a signed value, because there's
> code like (emacs_write (...) != n) in fileio.c, where 'n' is
> signed, and signed-versus-unsigned comparison is problematic.
> I can certainly understand this point of view.

That's the point of view, yes.

> With doprnt returning size_t, though, we still have this problem.

I haven't looked at the code.  I only commented based on Eli's
description, who said that all the relevant code used size_t.
Personally, I'd use `int' for such things, since any message larger than
2GB should be a sign that something went very wrong long before.


        Stefan





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-25  9:00 ` Eli Zaretskii
  2011-04-25 13:37   ` Stefan Monnier
@ 2011-04-26  6:02   ` Paul Eggert
  2011-04-27 19:34     ` Eli Zaretskii
  1 sibling, 1 reply; 56+ messages in thread
From: Paul Eggert @ 2011-04-26  6:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 8545

On 04/25/11 02:00, Eli Zaretskii wrote:

>> * Format strings never include embedded null bytes, so there's
>>   no need for doprnt to support that.
> 
> Potentially, someone could call `error' with its first argument taken
> from a Lisp string, which could include null characters.  But again,
> this feature was there to begin with, and I see no particular need to
> remove it.

The feature is buggy, because the code does not check
fmt versus fmt_end every time it increases fmt; it checks
only sometimes.  Hence one can construct examples where
doprnt will overrun the format, e.g., by having '%l' at the
end of the format.

If doprnt were written to expect a null-terminated string, which
is what all its callers pass anyway, it would be simpler and
easier to maintain and would not have this problem.

"%l" is a strange case anyway, since one cannot reliably use
"%l" as an alias for "%d".  For example, the format "%dx" prints
an integer followed by an 'x', but if you try to use "%lx" instead,
it doesn't work.  At least, we should remove "%l" as a format
specifier, as it's a rightly-unused feature and it's just asking
for trouble to try to support it.  This should also fix the
format-overrun bug mentioned earlier.

>> * If the format string is too long, the alloca inside doprnt will
>>   crash Emacs on some hosts.
> 
> You are right.  I modified doprnt to use SAFE_ALLOCA instead.

There's no need for alloca or SAFE_ALLOCA or xmalloc or any
dynamic allocator.  Instead, convert any width and precision
values to integers, and use "*".  For example, if the caller
specifies this:

	"%012345.6789g", 3.14

pass this to sprintf:

	"%0*.*g", 12345, 6789, 3.14

That way, the format string itself has easily-bounded size and
the code never needs to use alloca or xmalloc or whatever.

> Since both error and verror are now marked as ATTRIBUTE_FORMAT_PRINTF,
> the compiler will detect such invalid formats and flag them.

Sure, but in that case why maintain code to implement the two
formats (%S and %l) that are flagged invalid and are never used and
should never be used?

>>   - doprnt uses atoi (&fmtcpy[1]), but surely this isn't right if
>>     there are flags such as '-'.
> 
> Why not?  In that case, atoi will produce a negative value for
> `width', which is already handled by the code.  If I'm missing
> something, please point out the specific problems with that.

I don't see how the negative value is handled correctly.
%-10s means to print a string right-justified, but the code
surely treats it as if it were %0s.  And other flags
are possible, e.g., atoi will parse "%0-3d" as if the
width were zero, but the width is 3 (the "0" is a flag).

A quick second scan found a minor bug in size parsing: the
expression "n >= SIZE_MAX / 10" should be "n > SIZE_MAX / 10".

> The reason I originally limited to `size_max - 1' is that the games
> you play with the maximum size, viz.:
> 
>   size_t size_max =
>     min (MOST_POSITIVE_FIXNUM, min (INT_MAX, SIZE_MAX - 1)) + 1;
> 
> are neither clear nor commented.  E.g., why the second `min'? could
> INT_MAX be ever larger than SIZE_MAX-1? if so, what does that mean in
> terms of relation between `int' and `size_t' on such a platform?

The C Standard allows INT_MAX to be larger than SIZE_MAX - 1, yes.
I don't know of any current targets with that property, but it didn't
hurt to be safe.  The second 'min' was needed because vsnprintf
can't create a string longer than INT_MAX bytes.  Since doprnt doesn't
have that silly limit, the above line should be changed to something
like the following (this time with a comment :-):

  /* Limit the string to sizes that both Emacs and size_t can represent.  */
  size_t size_max = min (MOST_POSITIVE_FIXNUM + 1, SIZE_MAX);

> You are right, I made that change.

Thanks, can you make a similar change inside doprint?  It
also uses xrealloc where xfree+xmalloc would be better.

One other thing, the documentation says that lower-case l
is a flag, but it's a length modifer and not a flag.  It
must be specified after the precision (if given) and
immediately before the conversion specifier character,
and it cannot be intermixed with flags like 0 and - and +.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-26  6:02   ` Paul Eggert
@ 2011-04-27 19:34     ` Eli Zaretskii
  2011-04-27 23:51       ` Paul Eggert
  0 siblings, 1 reply; 56+ messages in thread
From: Eli Zaretskii @ 2011-04-27 19:34 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 8545

> Date: Mon, 25 Apr 2011 23:02:25 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: 8545@debbugs.gnu.org
> 
> On 04/25/11 02:00, Eli Zaretskii wrote:
> 
> >> * Format strings never include embedded null bytes, so there's
> >>   no need for doprnt to support that.
> > 
> > Potentially, someone could call `error' with its first argument taken
> > from a Lisp string, which could include null characters.  But again,
> > this feature was there to begin with, and I see no particular need to
> > remove it.
> 
> The feature is buggy, because the code does not check
> fmt versus fmt_end every time it increases fmt; it checks
> only sometimes.

I added more checks, thanks for pointing this out.

> "%l" is a strange case anyway, since one cannot reliably use
> "%l" as an alias for "%d".  For example, the format "%dx" prints
> an integer followed by an 'x', but if you try to use "%lx" instead,
> it doesn't work.  At least, we should remove "%l" as a format
> specifier, as it's a rightly-unused feature and it's just asking
> for trouble to try to support it.

You convinced me, so I removed %l.

> >> * If the format string is too long, the alloca inside doprnt will
> >>   crash Emacs on some hosts.
> > 
> > You are right.  I modified doprnt to use SAFE_ALLOCA instead.
> 
> There's no need for alloca or SAFE_ALLOCA or xmalloc or any
> dynamic allocator.  Instead, convert any width and precision
> values to integers, and use "*".  For example, if the caller
> specifies this:
> 
> 	"%012345.6789g", 3.14
> 
> pass this to sprintf:
> 
> 	"%0*.*g", 12345, 6789, 3.14

I see no reason for such complexity, just to avoid SAFE_ALLOCA.  But
feel free to make this change, if you think it's important enough.

> >>   - doprnt uses atoi (&fmtcpy[1]), but surely this isn't right if
> >>     there are flags such as '-'.
> > 
> > Why not?  In that case, atoi will produce a negative value for
> > `width', which is already handled by the code.  If I'm missing
> > something, please point out the specific problems with that.
> 
> I don't see how the negative value is handled correctly.
> %-10s means to print a string right-justified, but the code
> surely treats it as if it were %0s.

??? %-10s means to print a string LEFT-justified, and the code handles
that in this loop (which runs after the string was copied to its
destination):

	      if (minlen < 0)
		{
		  while (minlen < - width && bufsize > 0)
		    {
		      *bufptr++ = ' ';
		      bufsize--;
		      minlen++;
		    }
		  minlen = 0;
		}

I actually tried using %-30s, and it did work correctly (as did %30s).

>                                       And other flags
> are possible, e.g., atoi will parse "%0-3d" as if the
> width were zero, but the width is 3 (the "0" is a flag).

The code doesn't call atoi for numeric arguments.  It delegates that
case to sprintf, which will handle the likes of %0-3d correctly.  And
for %s and %c the "0" flag is not supported anyway (as stated in the
comments) and GCC flags that with a warning.  So I see no problem
here.

> A quick second scan found a minor bug in size parsing: the
> expression "n >= SIZE_MAX / 10" should be "n > SIZE_MAX / 10".

When they get to messages as long as SIZE_MAX, let them sue me for
taking away one byte.  verror will reject SIZE_MAX-long messages
anyway, so I see no reason to squeeze one more byte here just to throw
it away there.

>   /* Limit the string to sizes that both Emacs and size_t can represent.  */
>   size_t size_max = min (MOST_POSITIVE_FIXNUM + 1, SIZE_MAX);

"MOST_POSITIVE_FIXNUM + 1" is too much, since MOST_POSITIVE_FIXNUM
should be able to cover the terminating null character in Emacs.  So I
used this:

   size_t size_max = min (MOST_POSITIVE_FIXNUM, SIZE_MAX);

> Thanks, can you make a similar change inside doprint?  It
> also uses xrealloc where xfree+xmalloc would be better.

Done.

> One other thing, the documentation says that lower-case l
> is a flag, but it's a length modifer and not a flag.

I fixed the doc on that account.





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-27 19:34     ` Eli Zaretskii
@ 2011-04-27 23:51       ` Paul Eggert
  2011-04-28  1:32         ` Juanma Barranquero
  2011-04-28  5:50         ` Eli Zaretskii
  0 siblings, 2 replies; 56+ messages in thread
From: Paul Eggert @ 2011-04-27 23:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 8545

On 04/27/11 12:34, Eli Zaretskii wrote:

> I added more checks, thanks for pointing this out.

Thanks, but I don't see the need for this newly-added check:

          if (fmt > format_end)                                                 
            fmt = format_end;                                                   

If fmt is actually greater than format_end, it's pointing past the end
of an object, so the C code is relying on undefined behavior and the
check therefore isn't portable.  But how can fmt ever be greater than
format_end here?  I suggest removing the check.

> ??? %-10s means to print a string LEFT-justified, and the code handles
> that in this loop...
> for %s and %c the "0" flag is not supported anyway (as stated in the
> comments) and GCC flags that with a warning.

Good points both; thanks.

>> A quick second scan found a minor bug in size parsing: the
>> expression "n >= SIZE_MAX / 10" should be "n > SIZE_MAX / 10".
> 
> When they get to messages as long as SIZE_MAX, let them sue me for
> taking away one byte.

It's not a question of saving space at run-time.  It's a question of
helping the reader.  The reader is left wondering: why is that ">="
there?  And why is there another test "n * 10 > SIZE_MAX - (fmt[1] -
'0')" that always returns 0, no matter what?  Code like that will
puzzle future maintainers, who may well mess it up later because they
don't know what's going on.  Also, most printf implementations will
mess up if given a width or precision greater than INT_MAX, so I
suggest not allowing widths or precisions greater than that.  In summary,
I suggest replacing this:

  if (n >= SIZE_MAX / 10
       || n * 10 > SIZE_MAX - (fmt[1] - '0'))
     error ("Format width or precision too large");

with this:

  /* Avoid int overflow, because many sprintfs seriously mess up
     with widths or precisions greater than INT_MAX.  Avoid size_t
     overflow, since our counters use size_t.  This test is slightly
     conservative, for speed and simplicity.  */
  if (n >= min (INT_MAX, SIZE_MAX) / 10)
     error ("Format width or precision too large");

> "MOST_POSITIVE_FIXNUM + 1" is too much, since MOST_POSITIVE_FIXNUM
> should be able to cover the terminating null character in Emacs.

Why?  Emacs size fields count the bytes in the string, and does not
count the terminating null byte (which is not part of the string).

In briefly reviewing the new version, I found that doprnt
doesn't support the "ll" modifier for "long long", and thus wouldn't
port to platforms that use long long for EMACS_INT.  This is easy to
fix, so I installed a fix for that.  I also found three more problems:

* doprnt invokes strlen to find the length of the format.  The
  vsnprintf code didn't need to do that: it traversed the format once.
  Surely it shouldn't be hard to change doprnt so that it traverses
  the format once rather than twice.

* Sometimes verror will incorrectly truncate a string, even when there
  is plenty of memory.  verror might call doprnt (buffer, SIZE, m, m +
  mlen, ap), and doprnt might discover that a multibyte character is
  chopped in half at the end of the output buffer, and might return
  (say) SIZE - 2.  verror will incorrectly conclude that the output
  was just fine, but it wasn't complete.

* verror might invoke doprnt two or more times, which means that
  doprnt will traverse ap twice.  This does not work in general; the C
  standard is quite clear that the behavior is undefined in this case.
  One way to fix this would be to modify doprnt so that it always
  walks through the format completely, and generates all the output
  that it is going to generate, and that it reallocates the output
  buffer as needed as it goes.  This would require an API change.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-27 23:51       ` Paul Eggert
@ 2011-04-28  1:32         ` Juanma Barranquero
  2011-04-28  3:11           ` Paul Eggert
  2011-04-28  5:02           ` Eli Zaretskii
  2011-04-28  5:50         ` Eli Zaretskii
  1 sibling, 2 replies; 56+ messages in thread
From: Juanma Barranquero @ 2011-04-28  1:32 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 8545

On Thu, Apr 28, 2011 at 01:51, Paul Eggert <eggert@cs.ucla.edu> wrote:

> If fmt is actually greater than format_end, it's pointing past the end
> of an object, so the C code is relying on undefined behavior and the
> check therefore isn't portable.

I'm no expert on the C standard, but would it be undefined behavior,
as long as the pointer has not been dereferenced? A cursory look
suggests that fmt == format_end + 1 is possible, but fmt is not
dereferenced in that case.

    Juanma

^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-28  1:32         ` Juanma Barranquero
@ 2011-04-28  3:11           ` Paul Eggert
  2011-04-28  3:42             ` Juanma Barranquero
                               ` (2 more replies)
  2011-04-28  5:02           ` Eli Zaretskii
  1 sibling, 3 replies; 56+ messages in thread
From: Paul Eggert @ 2011-04-28  3:11 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: 8545

On 04/27/11 18:32, Juanma Barranquero wrote:

> A cursory look suggests that fmt == format_end + 1 is possible

Thanks, I had missed that possibility.  (Evidently your cursory looks
are better than mine. :-)  A possible patch is below.

> would it be undefined behavior,
> as long as the pointer has not been dereferenced?

Yes.  A portable C program is not allowed to create a pointer that
doesn't point to an object, with the two exceptions of a null pointer
and a pointer to the address immediately after an object.  On
some architectures, attempting to point to random addresses can cause
exceptions or other undefined behavior.

=== modified file 'src/doprnt.c'
--- src/doprnt.c	2011-04-27 23:04:20 +0000
+++ src/doprnt.c	2011-04-28 03:00:59 +0000
@@ -194,22 +194,21 @@ doprnt (char *buffer, register size_t bu
 		     This might be a field width or a precision; e.g.
 		     %1.1000f and %1000.1f both might need 1000+ bytes.
 		     Parse the width or precision, checking for overflow.  */
-		  size_t n = *fmt - '0';
-		  while (fmt < format_end
-			 && '0' <= fmt[1] && fmt[1] <= '9')
+		  size_t n = *fmt++ - '0';
+		  while (fmt < format_end && '0' <= *fmt && *fmt <= '9')
 		    {
 		      if (n >= SIZE_MAX / 10
 			  || n * 10 > SIZE_MAX - (fmt[1] - '0'))
 			error ("Format width or precision too large");
-		      n = n * 10 + fmt[1] - '0';
-		      *string++ = *++fmt;
+		      n = n * 10 + *fmt - '0';
+		      *string++ = *fmt++;
 		    }
 
 		  if (size_bound < n)
 		    size_bound = n;
 		}
 	      else if (*fmt == '-' || *fmt == ' ' || *fmt == '.' || *fmt == '+')
-		;
+		fmt++;
 	      else if (*fmt == 'l')
 		{
 		  long_flag = 1 + (fmt + 1 < format_end && fmt[1] == 'l');
@@ -218,10 +217,7 @@ doprnt (char *buffer, register size_t bu
 		}
 	      else
 		break;
-	      fmt++;
 	    }
-	  if (fmt > format_end)
-	    fmt = format_end;
 	  *string = 0;
 
 	  /* Make the size bound large enough to handle floating point formats






^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-28  3:11           ` Paul Eggert
@ 2011-04-28  3:42             ` Juanma Barranquero
  2011-04-28  5:06               ` Paul Eggert
  2011-04-28  5:15             ` Eli Zaretskii
  2011-04-29 12:28             ` Richard Stallman
  2 siblings, 1 reply; 56+ messages in thread
From: Juanma Barranquero @ 2011-04-28  3:42 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 8545

On Thu, Apr 28, 2011 at 05:11, Paul Eggert <eggert@cs.ucla.edu> wrote:

>> would it be undefined behavior,
>> as long as the pointer has not been dereferenced?
>
> Yes.  A portable C program is not allowed to create a pointer that
> doesn't point to an object, with the two exceptions of a null pointer
> and a pointer to the address immediately after an object.

That's weird, because it would mean that every pointer variable must
be initialized (either explicitly to some value, or implicitly to the
null pointer), or else the program will have undefined behavior.

Anyway, in this case fmt == format_end + 1 would point to the address
immediately after an object, wouldn't it?

> On
> some architectures, attempting to point to random addresses can cause
> exceptions or other undefined behavior.

On dereferencing, sure. But just on assignment to the pointer variable?

    Juanma

^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-28  3:42             ` Juanma Barranquero
@ 2011-04-28  5:06               ` Paul Eggert
  0 siblings, 0 replies; 56+ messages in thread
From: Paul Eggert @ 2011-04-28  5:06 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: 8545

On 04/27/11 20:42, Juanma Barranquero wrote:

> in this case fmt == format_end + 1 would point to the address
> immediately after an object, wouldn't it?

No, format_end is already pointing after the object;
the object's size is format_end - format.  So
format_end + 1 might not be a valid pointer.

> That's weird, because it would mean that every pointer variable must
> be initialized (either explicitly to some value, or implicitly to the
> null pointer), or else the program will have undefined behavior.

No, undefined behavior occurs only when an (invalid)
pointer value is created (e.g., by casting from integer, or by
adding to another pointer variable), or copied.  It doesn't occur
merely because storage is allocated for a pointer variable.

In this respect, it's like creating an (invalid) integer value.
If you assign i = INT_MAX + 1, the resulting behavior is undefined.
It's the same if you assign p = &x + 2.  That doesn't mean
"char *p;" has undefined behavior, any more than "int i;" does.

> On dereferencing, sure. But just on assignment to the pointer variable?

Yes.  To take an extreme example, some architectures can compute
a pointer only by using a special pointer register, and the register's
contents are always checked for validity, even if you don't dereference the
pointer.  I don't know whether Emacs has been ported to these machines,
but there are also problems with pointers wrapping around even on
more-conventional architectures.

This issue is covered by one of the questions in the C FAQ; see
<http://www.c-faq.com/aryptr/non0based.html>.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-28  3:11           ` Paul Eggert
  2011-04-28  3:42             ` Juanma Barranquero
@ 2011-04-28  5:15             ` Eli Zaretskii
  2011-04-28  5:29               ` Paul Eggert
  2011-04-29 12:28             ` Richard Stallman
  2 siblings, 1 reply; 56+ messages in thread
From: Eli Zaretskii @ 2011-04-28  5:15 UTC (permalink / raw)
  To: Paul Eggert; +Cc: lekktu, 8545

> Date: Wed, 27 Apr 2011 20:11:52 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: Eli Zaretskii <eliz@gnu.org>, 8545@debbugs.gnu.org
> 
> On 04/27/11 18:32, Juanma Barranquero wrote:
> 
> > A cursory look suggests that fmt == format_end + 1 is possible
> 
> Thanks, I had missed that possibility.  (Evidently your cursory looks
> are better than mine. :-)  A possible patch is below.

I strenuously object to that patch, see below.  Please don't install
it.

> > would it be undefined behavior,
> > as long as the pointer has not been dereferenced?
> 
> Yes.  A portable C program is not allowed to create a pointer that
> doesn't point to an object, with the two exceptions of a null pointer
> and a pointer to the address immediately after an object.  On
> some architectures, attempting to point to random addresses can cause
> exceptions or other undefined behavior.

As I explain in another message, we _can_ dereference this invalid
pointer.  Which is why that test was added in the first place.

> -		  size_t n = *fmt - '0';
> -		  while (fmt < format_end
> -			 && '0' <= fmt[1] && fmt[1] <= '9')
> +		  size_t n = *fmt++ - '0';
> +		  while (fmt < format_end && '0' <= *fmt && *fmt <= '9')
>  		    {
>  		      if (n >= SIZE_MAX / 10
>  			  || n * 10 > SIZE_MAX - (fmt[1] - '0'))
>  			error ("Format width or precision too large");
> -		      n = n * 10 + fmt[1] - '0';
> -		      *string++ = *++fmt;
> +		      n = n * 10 + *fmt - '0';
> +		      *string++ = *fmt++;
>  		    }
>  
>  		  if (size_bound < n)
>  		    size_bound = n;
>  		}
>  	      else if (*fmt == '-' || *fmt == ' ' || *fmt == '.' || *fmt == '+')
> -		;
> +		fmt++;
>  	      else if (*fmt == 'l')
>  		{
>  		  long_flag = 1 + (fmt + 1 < format_end && fmt[1] == 'l');
> @@ -218,10 +217,7 @@ doprnt (char *buffer, register size_t bu
>  		}
>  	      else
>  		break;
> -	      fmt++;
>  	    }
> -	  if (fmt > format_end)
> -	    fmt = format_end;

I don't see how this is a better idea.  Instead of a simple two-liner,
which could be commented if its intent isn't clear enough, and which
makes the code 100% verifiable to not dereference anything beyond
format_end, how is it better to sprinkle weird post-increments in
several places?  This totally obfuscates the intent, does NOT allow to
comment it in any reasonable way (because the increments are no longer
in a single place, and serve more than one purpose), makes the code
much harder to grasp and analyze, and makes it almost impossible to
convince ourselves that it will never get past format_end without
unduly complicated analysis.  All that for getting rid of a simple and
clearly correct line??  No, thanks!





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-28  5:15             ` Eli Zaretskii
@ 2011-04-28  5:29               ` Paul Eggert
  2011-04-28  6:10                 ` Eli Zaretskii
  0 siblings, 1 reply; 56+ messages in thread
From: Paul Eggert @ 2011-04-28  5:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lekktu, 8545

On 04/27/11 22:15, Eli Zaretskii wrote:
> As I explain in another message, we _can_ dereference this invalid
> pointer.

Sorry, I'm not quite following, since I'm not sure what
the "another message" refers to.

Hmm, perhaps you're talking about this pattern in the code?

        while (fmt < format_end)
	  { ... fmt++ ... }
        switch (*fmt++)

Here, the code is dereferencing *format_end,
which means it's dereferencing one past the end of the
format string that is passed to it.  This is normally
not how buffers are used in C: normally, the pointer to
the end of a buffer is intended to point "one past" the
last byte of the buffer, and is not intended to be dereferenced.

If the intent here is that one should call doprnt with
the pattern (doprnt (A, ASIZE, B, B + BSIZE - 1, AP)) then
I suggest that the point be made clearly in doprnt's comment,
as part of doprnt's API, to prevent future confusion in
this area.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-28  5:29               ` Paul Eggert
@ 2011-04-28  6:10                 ` Eli Zaretskii
  2011-04-28  6:42                   ` Paul Eggert
  0 siblings, 1 reply; 56+ messages in thread
From: Eli Zaretskii @ 2011-04-28  6:10 UTC (permalink / raw)
  To: Paul Eggert; +Cc: lekktu, 8545

> Date: Wed, 27 Apr 2011 22:29:25 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: lekktu@gmail.com, 8545@debbugs.gnu.org
> 
> On 04/27/11 22:15, Eli Zaretskii wrote:
> > As I explain in another message, we _can_ dereference this invalid
> > pointer.
> 
> Sorry, I'm not quite following, since I'm not sure what
> the "another message" refers to.

If you didn't receive it, you will find it filed in the bug tracker.

> Hmm, perhaps you're talking about this pattern in the code?
> 
>         while (fmt < format_end)
> 	  { ... fmt++ ... }
>         switch (*fmt++)

Yes, the loop (which increments the pointer more than once), the
reference with postincrement in the switch statement, and the
following dereference in fmt[-1] in the call to `error'.

> Here, the code is dereferencing *format_end,
> which means it's dereferencing one past the end of the
> format string that is passed to it.

No, it can dereference *(format_end+1).

> If the intent here is that one should call doprnt with
> the pattern (doprnt (A, ASIZE, B, B + BSIZE - 1, AP)) then
> I suggest that the point be made clearly in doprnt's comment,
> as part of doprnt's API, to prevent future confusion in
> this area.

No, it should be called as B+BSIZE.





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-28  6:10                 ` Eli Zaretskii
@ 2011-04-28  6:42                   ` Paul Eggert
  2011-04-28  7:26                     ` Eli Zaretskii
  0 siblings, 1 reply; 56+ messages in thread
From: Paul Eggert @ 2011-04-28  6:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lekktu, 8545

On 04/27/11 23:10, Eli Zaretskii wrote:

> No, it can dereference *(format_end+1).
> 
>> If the intent here is that one should call doprnt with
>> the pattern (doprnt (A, ASIZE, B, B + BSIZE - 1, AP)) then
>> I suggest that the point be made clearly in doprnt's comment,
>> as part of doprnt's API, to prevent future confusion in
>> this area.
> 
> No, it should be called as B+BSIZE.

OK, but format_end == B + BSIZE.
So if doprnt (A, ASIZE, B, B + BSIZE, AP) can dereference format_end + 1,
this means doprnt can access B[BSIZE + 1], which means that
B should point to a char array of at least BSIZE + 2 bytes.

Normally, B is a C-language string literal such as "abc%d",
and BSIZE is the length of the string, which means
there is potential trouble because normally code
should not try to read the byte that follows the null
byte at the end of the string.

I expect that the cases where doprnt actually accesses B[BSIZE + 1]
are rare, and don't currently happen in practice; still, this is a confusing
area and whatever constraints are actually placed on doprnt's caller
should be made clear in the doprnt documentation, so that others
are warned about the situation and don't make the mistake
of passing formats that could cause problems.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-28  6:42                   ` Paul Eggert
@ 2011-04-28  7:26                     ` Eli Zaretskii
  2011-04-28  7:54                       ` Paul Eggert
  0 siblings, 1 reply; 56+ messages in thread
From: Eli Zaretskii @ 2011-04-28  7:26 UTC (permalink / raw)
  To: Paul Eggert; +Cc: lekktu, 8545

> Date: Wed, 27 Apr 2011 23:42:57 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: lekktu@gmail.com, 8545@debbugs.gnu.org
> 
> OK, but format_end == B + BSIZE.
> So if doprnt (A, ASIZE, B, B + BSIZE, AP) can dereference format_end + 1,
> this means doprnt can access B[BSIZE + 1], which means that
> B should point to a char array of at least BSIZE + 2 bytes.

With the original code, that was the case, yes.  But that is why I
forcibly reset fmt to point to format_end: to avoid dereferencing past
the end of the array.

If you are saying that such invalid dereferencing can still happen,
please show how is that possible, with the code that is now in the
repository.

> Normally, B is a C-language string literal such as "abc%d",
> and BSIZE is the length of the string, which means
> there is potential trouble because normally code
> should not try to read the byte that follows the null
> byte at the end of the string.

That trouble shouldn't happen with the code in the repository.





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-28  7:26                     ` Eli Zaretskii
@ 2011-04-28  7:54                       ` Paul Eggert
  2011-04-28 11:14                         ` Eli Zaretskii
  0 siblings, 1 reply; 56+ messages in thread
From: Paul Eggert @ 2011-04-28  7:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lekktu, 8545

On 04/28/11 00:26, Eli Zaretskii wrote:
> If you are saying that such invalid dereferencing can still happen,
> please show how is that possible, with the code that is now in the
> repository.

Sorry, I misunderstood your earlier comment to mean that
doprnt can now compute *(format_end+1).

If all that doprnt does is compute *format_end (or format_end+1
without dereferencing format_end+1), and if the documentation
notes that format_end must point to a character
(format_end cannot point to one-past-the-buffer-end,
which is what I expected), then that part's OK.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-28  7:54                       ` Paul Eggert
@ 2011-04-28 11:14                         ` Eli Zaretskii
  0 siblings, 0 replies; 56+ messages in thread
From: Eli Zaretskii @ 2011-04-28 11:14 UTC (permalink / raw)
  To: Paul Eggert; +Cc: lekktu, 8545

> Date: Thu, 28 Apr 2011 00:54:21 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: lekktu@gmail.com, 8545@debbugs.gnu.org
> 
> If all that doprnt does is compute *format_end (or format_end+1
> without dereferencing format_end+1), and if the documentation
> notes that format_end must point to a character
> (format_end cannot point to one-past-the-buffer-end,
> which is what I expected), then that part's OK.

Yes, that's how things are.  format_end points to the last byte of the
buffer passed to doprnt.





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-28  3:11           ` Paul Eggert
  2011-04-28  3:42             ` Juanma Barranquero
  2011-04-28  5:15             ` Eli Zaretskii
@ 2011-04-29 12:28             ` Richard Stallman
  2011-04-29 19:56               ` Eli Zaretskii
  2011-04-29 23:49               ` Paul Eggert
  2 siblings, 2 replies; 56+ messages in thread
From: Richard Stallman @ 2011-04-29 12:28 UTC (permalink / raw)
  To: Paul Eggert; +Cc: lekktu, 8545

    No, format_end is already pointing after the object;
    the object's size is format_end - format.  So
    format_end + 1 might not be a valid pointer.

20 years ago,, Emacs allocated an extra byte with a null character
after the end of every string or buffer.  If that is still true
then this pointer actually is valid.

But it doesn't matter anyway, for reasons described below.

    Yes.  A portable C program is not allowed to create a pointer that
    doesn't point to an object,

Our C programs are allowed to create any sort of pointer we want them
to.  ISO has no authority over us.  We are concerned with standards
insofar as they matter in practice for the convenience and reliability
of our software.

Thus, the question that matters to us is whether a construct is going
to cause a problem on the plausibke platforms we will want to support.
That does not include all theoretical ISO C implementations.

    Yes.  To take an extreme example, some architectures can compute
    a pointer only by using a special pointer register, and the register's
    contents are always checked for validity, even if you don't dereference the
    pointer.

They are outside GNU's design range of targets.

    If you assign i = INT_MAX + 1, the resulting behavior is undefined.

The result is INT_MIN.  We don't try to support any theoretical machine
where this would not be so.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org, www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use free telephony http://directory.fsf.org/category/tel/

^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-29 12:28             ` Richard Stallman
@ 2011-04-29 19:56               ` Eli Zaretskii
  2011-04-29 23:49               ` Paul Eggert
  1 sibling, 0 replies; 56+ messages in thread
From: Eli Zaretskii @ 2011-04-29 19:56 UTC (permalink / raw)
  To: rms; +Cc: lekktu, 8545, eggert

> Date: Fri, 29 Apr 2011 08:28:35 -0400
> From: Richard Stallman <rms@gnu.org>
> Cc: lekktu@gmail.com, 8545@debbugs.gnu.org
> 
>     No, format_end is already pointing after the object;
>     the object's size is format_end - format.  So
>     format_end + 1 might not be a valid pointer.
> 
> 20 years ago, Emacs allocated an extra byte with a null character
> after the end of every string or buffer.

20 years later, we still do that for strings, but (AFAICS) not for
buffers.





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-29 12:28             ` Richard Stallman
  2011-04-29 19:56               ` Eli Zaretskii
@ 2011-04-29 23:49               ` Paul Eggert
  2011-04-30 21:03                 ` Richard Stallman
  2011-05-01  4:25                 ` Jason Rumney
  1 sibling, 2 replies; 56+ messages in thread
From: Paul Eggert @ 2011-04-29 23:49 UTC (permalink / raw)
  To: rms; +Cc: lekktu, 8545

On 04/29/11 05:28, Richard Stallman wrote:

> We are concerned with standards insofar as they matter in practice
> for the convenience and reliability of our software.

Yes, of course, I should have made that clearer.  Standards are our
tools, not our masters.

>> If you assign i = INT_MAX + 1, the resulting behavior is undefined.
>
> The result is INT_MIN.  We don't try to support any theoretical machine
> where this would not be so.

Those machines used to be theoretical, but they're in common
use now.  Practical C code can no longer assume that integers
always wrap around when doing integer arithmetic.  For example:

   long
   foo (char *p, int i)
   {
     return &p[i + 1] - &p[i];
   }

On typical hosts where int is 32 bits, and long and char * are
both 64 bits, most compilers optimize that "return" statement
to "return 1;", even when I is INT_MAX and I + 1 therefore
overflows.  These compilers are therefore rejecting the notion
that INT_MAX + 1 must always equal INT_MIN.

Although FOO is contrived, a lot of complicated code in Emacs
mixes int and long and pointer arithmetic, and it's inevitable
that compilers are doing optimizations like the above when they
compile Emacs.  We cannot simply declare that INT_MAX + 1 must
always be INT_MIN, because that's not how compilers actually
work these days.

What we need is good advice for programmers in this area, so
that they can write C code that is portable in practice.  This
advice shouldn't be too conservative, because that would
discourage useful programs.  Nor should it be too
loosey-goosey, because that would encourage buggy programs.
Nor should it be complicated, because that would confuse
people and waste their time.  It is not easy to come up with
advice that satisfies all three constraints.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-29 23:49               ` Paul Eggert
@ 2011-04-30 21:03                 ` Richard Stallman
  2011-05-01  5:41                   ` Paul Eggert
  2011-05-01  4:25                 ` Jason Rumney
  1 sibling, 1 reply; 56+ messages in thread
From: Richard Stallman @ 2011-04-30 21:03 UTC (permalink / raw)
  To: Paul Eggert; +Cc: lekktu, 8545

    >> If you assign i = INT_MAX + 1, the resulting behavior is undefined.
    >
    > The result is INT_MIN.  We don't try to support any theoretical machine
    > where this would not be so.

    Those machines used to be theoretical, but they're in common
    use now.

I assumed we were talking about type `int', but you did not explicitly
say so.  Touché -- but that just means we are talking at cross
purposes.  What I said about addition on type int is still valid.

     printf ("%d", INT_MAX+1);

will output INT_MIN.

      Practical C code can no longer assume that integers
    always wrap around when doing integer arithmetic.

I think that is the wrong interpretation of the facts.

       long
       foo (char *p, int i)
       {
	 return &p[i + 1] - &p[i];
       }
    On typical hosts where int is 32 bits, and long and char * are
    both 64 bits, most compilers optimize that "return" statement
    to "return 1;", even when I is INT_MAX and I + 1 therefore
    overflows.  These compilers are therefore rejecting the notion
    that INT_MAX + 1 must always equal INT_MIN.

i+1 is computed as an integer, but then it gets converted to a long.
What happens here seems to be an issue about type conversion combined
with addition -- not addition itself.

These compilers are taking a strange liberty.
Why isn't that a bug?

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org, www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use free telephony http://directory.fsf.org/category/tel/





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-30 21:03                 ` Richard Stallman
@ 2011-05-01  5:41                   ` Paul Eggert
  2011-05-01 23:59                     ` Richard Stallman
  0 siblings, 1 reply; 56+ messages in thread
From: Paul Eggert @ 2011-05-01  5:41 UTC (permalink / raw)
  To: rms; +Cc: lekktu, 8545

On 04/30/11 14:03, Richard Stallman wrote:
>        long
>        foo (char *p, int i)
>        {
>          return &p[i + 1] - &p[i];
>        }
> ...
> i+1 is computed as an integer, but then it gets converted to a long.

Unfortunately that doesn't explain FOO's behavior.

If i+1 is computed as an int and if wraparound is required, then
when i is INT_MAX, i+1 must be INT_MIN.  (FOO does not convert
i+1 to long; but even if it did, INT_MIN's value would be unchanged by
that conversion.)  If p is pointing into a large array, &p[INT_MIN]
and &p[INT_MAX] can both be valid addresses, and in that case
foo (p, INT_MAX) would have to yield -2**32 + 1 on a typical 64-bit host
where signed integer arithmetic wrapped around.

But in my experience no compiler does it that way.  FOO always returns 1.

> What happens here seems to be an issue about type conversion combined
> with addition -- not addition itself.

I'm afraid not.  First, FOO doesn't have any type conversions.
If I is an int, A[I] doesn't convert I to any other type; it
simply uses I's value without conversion.

Second, it's easy to construct an example that involves only "int":

   int
   bar (int i)
   {
     return i < i + 1;
   }

With many compilers, BAR always returns 1, even when i == INT_MAX.

> These compilers are taking a strange liberty.
> Why isn't that a bug?

Well, for starters, most programmers *expect* FOO to return 1,
and similarly for BAR.  Why would a programmer file a bug report
when the program is behaving as expected?

>     printf ("%d", INT_MAX+1);
> will output INT_MIN.

That's true for all systems I have ready access to, yes.  And I
expect there are other cases where int arithmetic wraps around
reliably.  But there are many practical cases where it doesn't.
We cannot simply advise programmers to assume that adding 1
to INT_MAX always results in INT_MIN, as that assumption is
often incorrect nowadays.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-05-01  5:41                   ` Paul Eggert
@ 2011-05-01 23:59                     ` Richard Stallman
  2011-05-02  0:23                       ` Paul Eggert
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Stallman @ 2011-05-01 23:59 UTC (permalink / raw)
  To: Paul Eggert; +Cc: lekktu, 8545

    Second, it's easy to construct an example that involves only "int":

       int
       bar (int i)
       {
	 return i < i + 1;
       }
    With many compilers, BAR always returns 1, even when i == INT_MAX.

I see.

This would break many constructions intended to test whether an
operation did overflow.  Is there any reliable way to test for that?

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org, www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use free telephony http://directory.fsf.org/category/tel/





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-05-01 23:59                     ` Richard Stallman
@ 2011-05-02  0:23                       ` Paul Eggert
       [not found]                         ` <E1QH37h-0001yM-HR@fencepost.gnu.org>
  0 siblings, 1 reply; 56+ messages in thread
From: Paul Eggert @ 2011-05-02  0:23 UTC (permalink / raw)
  To: rms; +Cc: lekktu, 8545

On 05/01/11 16:59, Richard Stallman wrote:
> This would break many constructions intended to test whether an
> operation did overflow.  Is there any reliable way to test for that?

Yes.  For example:

   int
   add_overflow (int a, int b)
   {
     if (b < 0)
       return a < INT_MIN - b;
     else
       return INT_MAX - b < a;
   }

add_overflow (a, b) returns 1 if a + b would overflow.
There are similar reliable tests for the other arithmetic operations.





^ permalink raw reply	[flat|nested] 56+ messages in thread

[parent not found: <E1QH37h-0001yM-HR@fencepost.gnu.org>]

* bug#8545: issues with recent doprnt-related changes
       [not found]                         ` <E1QH37h-0001yM-HR@fencepost.gnu.org>
@ 2011-05-03 20:24                           ` Paul Eggert
  0 siblings, 0 replies; 56+ messages in thread
From: Paul Eggert @ 2011-05-03 20:24 UTC (permalink / raw)
  To: rms; +Cc: 8545

>>     There are similar reliable tests for the other arithmetic operations.
> 
> Is this documented somewhere?  Is there a list of the standard ways?

CERT has something, here:

https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure+that+operations+on+signed+integers+do+not+result+in+overflow

Although the principles in that memo are OK, the actual code is
hard to read and its multiplication overflow checking is buggy.

Here's something better, which I just now wrote.  Also, please see
Emacs Bug#8611 <http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8611>;
its patch uses code like the following.


#include <limits.h>

int
add_overflow (int a, int b)
{
  return (b < 0
	  ? a < INT_MIN - b
	  : INT_MAX - b < a);
}

int
subtract_overflow (int a, int b)
{
  return (b < 0
	  ? INT_MAX + b < a
	  : a < INT_MIN + b);
}

int
unary_minus_overflow (int a)
{
  return a < -INT_MAX;
}

int
multiply_overflow (int a, int b)
{
  return (b < 0
	  ? (a < 0
	     ? a < INT_MAX / b
	     : b != -1 && INT_MIN / b < a)
	  : (b != 0
	     && (a < 0
		 ? a < INT_MIN / b
		 : INT_MAX / b < a)));
}

int
quotient_overflow (int a, int b)
{
  /* This does not check for division by zero.  Add that if you like.  */
  return a < -INT_MAX && b == -1;
}

int
remainder_overflow (int a, int b)
{
  /* Mathematically the remainder should never overflow, but on x86-like
     hosts INT_MIN % -1 traps, and the C standard permits this.  */
  return quotient_overflow (a, b);
}





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-29 23:49               ` Paul Eggert
  2011-04-30 21:03                 ` Richard Stallman
@ 2011-05-01  4:25                 ` Jason Rumney
  2011-05-01  5:56                   ` Paul Eggert
  1 sibling, 1 reply; 56+ messages in thread
From: Jason Rumney @ 2011-05-01  4:25 UTC (permalink / raw)
  To: Paul Eggert; +Cc: lekktu, 8545, rms

Paul Eggert <eggert@cs.ucla.edu> writes:

> On 04/29/11 05:28, Richard Stallman wrote:
>
>> We are concerned with standards insofar as they matter in practice
>> for the convenience and reliability of our software.
>
> Yes, of course, I should have made that clearer.  Standards are our
> tools, not our masters.
>
>>> If you assign i = INT_MAX + 1, the resulting behavior is undefined.
>>
>> The result is INT_MIN.  We don't try to support any theoretical machine
>> where this would not be so.
>
>    long
>    foo (char *p, int i)
>    {
>      return &p[i + 1] - &p[i];
>    }
>
> On typical hosts where int is 32 bits, and long and char * are
> both 64 bits, most compilers optimize that "return" statement
> to "return 1;", even when I is INT_MAX and I + 1 therefore
> overflows.

That does not mean that INT_MAX + 1 is undefined.  You are also
involving implicit casts here. Exactly where those casts happen is what
is undefined.





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-05-01  4:25                 ` Jason Rumney
@ 2011-05-01  5:56                   ` Paul Eggert
  2011-05-01  8:12                     ` Jason Rumney
  0 siblings, 1 reply; 56+ messages in thread
From: Paul Eggert @ 2011-05-01  5:56 UTC (permalink / raw)
  To: Jason Rumney; +Cc: lekktu, 8545, rms

On 04/30/11 21:25, Jason Rumney wrote:
> You are also
> involving implicit casts here. Exactly where those casts happen is what
> is undefined.

I'm not sure what is meant here, as the program in question
doesn't have any implicit arithmetic conversions (or at least,
it doesn't on the typical 64-bit host I was talking about).

> That does not mean that INT_MAX + 1 is undefined.

It could well be that the exact expression "INT_MAX + 1"
has well-defined behavior on all platforms we care about,
and that it always returns INT_MIN.  But I'm concerned
about the more general issue, which is whether Emacs code
can always assume that signed integer arithmetic
wraps around.  Unfortunately, there are many practical
Emacs targets where it's incorrect to assume that.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-05-01  5:56                   ` Paul Eggert
@ 2011-05-01  8:12                     ` Jason Rumney
  2011-05-01 11:02                       ` Andreas Schwab
  0 siblings, 1 reply; 56+ messages in thread
From: Jason Rumney @ 2011-05-01  8:12 UTC (permalink / raw)
  To: Paul Eggert; +Cc: lekktu, 8545, rms

Paul Eggert <eggert@cs.ucla.edu> writes:

> I'm not sure what is meant here, as the program in question
> doesn't have any implicit arithmetic conversions (or at least,
> it doesn't on the typical 64-bit host I was talking about).

It has implicit type conversion from int to char* and then to long.
Whether those types are different from each other depends on the host
machine, OS and compiler. That is what is undeterministic about your
example, not the fact that in int arithmetic INT_MAX + 1 == INT_MIN.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-05-01  8:12                     ` Jason Rumney
@ 2011-05-01 11:02                       ` Andreas Schwab
  0 siblings, 0 replies; 56+ messages in thread
From: Andreas Schwab @ 2011-05-01 11:02 UTC (permalink / raw)
  To: Jason Rumney; +Cc: lekktu, 8545, Paul Eggert, rms

Jason Rumney <jasonr@gnu.org> writes:

> It has implicit type conversion from int to char* and then to long.

The only implicit conversion in this function is from ptrdiff_t to long.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-28  1:32         ` Juanma Barranquero
  2011-04-28  3:11           ` Paul Eggert
@ 2011-04-28  5:02           ` Eli Zaretskii
  1 sibling, 0 replies; 56+ messages in thread
From: Eli Zaretskii @ 2011-04-28  5:02 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: 8545, eggert

> From: Juanma Barranquero <lekktu@gmail.com>
> Date: Thu, 28 Apr 2011 03:32:23 +0200
> Cc: Eli Zaretskii <eliz@gnu.org>, 8545@debbugs.gnu.org
> 
> On Thu, Apr 28, 2011 at 01:51, Paul Eggert <eggert@cs.ucla.edu> wrote:
> 
> > If fmt is actually greater than format_end, it's pointing past the end
> > of an object, so the C code is relying on undefined behavior and the
> > check therefore isn't portable.
> 
> I'm no expert on the C standard, but would it be undefined behavior,
> as long as the pointer has not been dereferenced? A cursory look
> suggests that fmt == format_end + 1 is possible, but fmt is not
> dereferenced in that case.

My (not-so cursory) look at the code suggests that we do dereference
it, in this fragment:

	  switch (*fmt++)
	    {
	    default:
	      error ("Invalid format operation %%%s%c",
		     long_flag ? "l" : "", fmt[-1]);

If fmt > format_end, this will dereference the address beyond
format_end.  I thought showing the last character of the format string
itself is a better idea.  It is also exactly equivalent to what the
code will do and display when *format_end == '\0'.





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-27 23:51       ` Paul Eggert
  2011-04-28  1:32         ` Juanma Barranquero
@ 2011-04-28  5:50         ` Eli Zaretskii
       [not found]           ` <4DB9146D.2040702@cs.ucla.edu>
  1 sibling, 1 reply; 56+ messages in thread
From: Eli Zaretskii @ 2011-04-28  5:50 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 8545-done

> Date: Wed, 27 Apr 2011 16:51:06 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: 8545@debbugs.gnu.org
> 
> >> A quick second scan found a minor bug in size parsing: the
> >> expression "n >= SIZE_MAX / 10" should be "n > SIZE_MAX / 10".
> > 
> > When they get to messages as long as SIZE_MAX, let them sue me for
> > taking away one byte.
> 
> It's not a question of saving space at run-time.  It's a question of
> helping the reader.  The reader is left wondering: why is that ">="
> there?

The reader will be wondering with ">" as well.  There's a comment
about checking for overflow which should be a good hint, especially
since SIZE_MAX is compared against.

>  And why is there another test "n * 10 > SIZE_MAX - (fmt[1] -
> '0')" that always returns 0, no matter what?

??? What happens if n*10 is SIZE_MAX-1 and fmt[1] is '2'?  Is the
result still zero?

>   /* Avoid int overflow, because many sprintfs seriously mess up
>      with widths or precisions greater than INT_MAX.  Avoid size_t
>      overflow, since our counters use size_t.  This test is slightly
>      conservative, for speed and simplicity.  */
>   if (n >= min (INT_MAX, SIZE_MAX) / 10)
>      error ("Format width or precision too large");

Sorry, I don't see how this is clearer.  The current code after the
test is built out of the same building blocks as the test, and
therefore the intent and the details of the test are easier to
understand than with your variant, which perhaps is mathematically and
numerically equivalent, but makes the code reading _harder_ because it
severs the syntactical connection between the two.

> > "MOST_POSITIVE_FIXNUM + 1" is too much, since MOST_POSITIVE_FIXNUM
> > should be able to cover the terminating null character in Emacs.
> 
> Why?  Emacs size fields count the bytes in the string, and does not
> count the terminating null byte (which is not part of the string).

That's not what I know.  String positions are zero-based and extend to
include the terminating null character.  See the relevant parts of the
display engine code.

> * doprnt invokes strlen to find the length of the format.  The
>   vsnprintf code didn't need to do that: it traversed the format once.
>   Surely it shouldn't be hard to change doprnt so that it traverses
>   the format once rather than twice.

doprnt is invoked in the context of displaying an error message that
throws to top level, and so it doesn't need to be optimized (which
will surely make the code more complex and error-prone, and its
maintenance harder).

> * Sometimes verror will incorrectly truncate a string, even when there
>   is plenty of memory.  verror might call doprnt (buffer, SIZE, m, m +
>   mlen, ap), and doprnt might discover that a multibyte character is
>   chopped in half at the end of the output buffer, and might return
>   (say) SIZE - 2.  verror will incorrectly conclude that the output
>   was just fine, but it wasn't complete.

Not an issue, what with the initial buffer size you enlarged to 4000.
I needed to artificially lower it to just 2 dozen bytes, just to see
the recovery code in action.  If someone wants to display a 4001-byte
message that ends with a multibyte non-ASCII character, let them be
punished for not knowing how to write concisely.

> * verror might invoke doprnt two or more times, which means that
>   doprnt will traverse ap twice.  This does not work in general; the C
>   standard is quite clear that the behavior is undefined in this case.

Are there any platforms supported by Emacs where this actually
happens?  If not, let's forget about this issue until it hits us.

I'm closing this bug.  We are already well past any real problems, and
invested too much energy and efforts of two busy people on this tiny
function, all because of your stubborn insistence on using a library
function where it doesn't fit the bill.  I hope you now have more
respect for views and code of others in general, and mine in
particular, so we won't need to go through this painful experience
again in the future.

Let's move on; I still need to work on the bidirectional display of
overlay strings and display properties, a job that was already delayed
by several precious days due to these disputes and the gratuitous work
on the code that should have been left alone in the first place.

^ permalink raw reply	[flat|nested] 56+ messages in thread

[parent not found: <4DB9146D.2040702@cs.ucla.edu>]

[parent not found: <E1QFQVO-0004Dq-6o@fencepost.gnu.org>]

[parent not found: <4DB9E5FF.9020506@cs.ucla.edu>]

* bug#8545: issues with recent doprnt-related changes
       [not found]               ` <4DB9E5FF.9020506@cs.ucla.edu>
@ 2011-04-29 11:16                 ` Eli Zaretskii
  2011-04-29 14:41                   ` Paul Eggert
  2011-05-04  7:28                   ` Paul Eggert
  0 siblings, 2 replies; 56+ messages in thread
From: Eli Zaretskii @ 2011-04-29 11:16 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 8545

> Date: Thu, 28 Apr 2011 15:11:11 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: 8545-done@debbugs.gnu.org
> 
> > If that's what you mean, it's easy to fix.  Done.
> 
> If I understand things correctly, that fix (in bzr 104036) handles the
> case where the format itself has a non-ASCII character that is
> truncated, but it doesn't handle the case where the format is
> something like "file name = %s", and %s expands to a long file name
> that is truncated.  If so, surely that case still needs to be fixed.

Ah, yes.  Missed one more place where this truncation could happen.
Should be fixed now (including an older bug with that code).

> Yes, portable code is supposed to use va_copy.  Code that traverses
> through an argument list N times can call va_start once, va_copy N - 1
> times, and va_end N times (once on the original, once on each copy).
> 
> va_copy is a C99-ism, but we can use it as-is in Emacs source code,
> and use the relevant gnulib module for obsolete platforms that lack it.
> Do the DOS and NT ports have va_copy?  If not, it should be simple
> to supply a substitute.

The MS-DOS and MinGW builds use GCC, so they have va_copy by
definition.  MSVC doesn't, but we can provide a trivial definition
which will work for x86.  If we still support MSVC by the time Emacs
can be built as a 64-bit executable on Windows, and if MSVC still
doesn't have va_copy by that time, we can handle this better at that
time.

> Another possibility is to remove the 'if' test entirely, making it the
> caller's responsibility to not specify outlandish widths in format
> strings.

I don't think this is a good idea.  verror is in many cases the last
line of defense, so it should IMO be rock-stable and try very hard to
emit something useful even in the most improbable situations.

For that reason, I also don't like the calls to `abort' you
introduced.  I understand the motivation (detection of invalid Emacs
code), but why not make it call `error' instead, like we do here:

	  switch (*fmt++)
	    {
	    default:
	      error ("Invalid format operation %%%s%c",
		     "ll" + 2 - long_flag, fmt[-1]);

After all, using %ll when the long long data type isn't supported is
not different from using %a or some other unsupported format letter,
right?

> OK, thanks.  I read the code, and if I understand it correctly, since
> 'point' is 1-origin, a buffer with MOST_POSITIVE_FIXNUM characters
> will have values of 'point' ranging from 1 through
> MOST_POSITIVE_FIXNUM + 1, but that "+ 1" would mean Fpoint wouldn't
> work: so we should limit buffers to contain at most
> MOST_POSITIVE_FIXNUM - 1 bytes.

I guess so, yes.  I would like to have other opinions, though, so I
will start a new thread on emacs-devel about that.

> Is it also the case that Emacs should limit strings to at most
> MOST_POSITIVE_FIXNUM - 1 bytes?

Only if we are thinking about copying a MOST_POSITIVE_FIXNUM-long
string into a buffer.  Otherwise, since string positions are
zero-based, I think strings are safe with MOST_POSITIVE_FIXNUM.

> Sorry, I couldn't tell this from the functions you mentioned;
> there's a lot of code there, and this stuff isn't immediately
> obvious.

Yeah, tell me about that.  I've been hacking that code extensively for
the last year and a half, and I still don't always know who's who ;-)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-29 11:16                 ` Eli Zaretskii
@ 2011-04-29 14:41                   ` Paul Eggert
  2011-04-29 19:35                     ` Eli Zaretskii
  2011-05-04  7:28                   ` Paul Eggert
  1 sibling, 1 reply; 56+ messages in thread
From: Paul Eggert @ 2011-04-29 14:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 8545

On 04/29/11 04:16, Eli Zaretskii wrote:
> I also don't like the calls to `abort' you
> introduced.  I understand the motivation (detection of invalid Emacs
> code), but why not make it call `error' instead

Yes, that would be fine.  I hadn't thought of that possibility.





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-29 14:41                   ` Paul Eggert
@ 2011-04-29 19:35                     ` Eli Zaretskii
  2011-04-29 20:32                       ` Paul Eggert
  0 siblings, 1 reply; 56+ messages in thread
From: Eli Zaretskii @ 2011-04-29 19:35 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 8545

> Date: Fri, 29 Apr 2011 07:41:40 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: 8545@debbugs.gnu.org
> 
> On 04/29/11 04:16, Eli Zaretskii wrote:
> > I also don't like the calls to `abort' you
> > introduced.  I understand the motivation (detection of invalid Emacs
> > code), but why not make it call `error' instead
> 
> Yes, that would be fine.

I installed such a change.

Btw, none of the platforms currently defines HAVE_LONG_LONG_INT or
HAVE_UNSIGNED_LONG_LONG_INT, AFAICS.





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-29 19:35                     ` Eli Zaretskii
@ 2011-04-29 20:32                       ` Paul Eggert
  2011-04-30  8:59                         ` Eli Zaretskii
  0 siblings, 1 reply; 56+ messages in thread
From: Paul Eggert @ 2011-04-29 20:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 8545

On 04/29/11 12:35, Eli Zaretskii wrote:
> none of the platforms currently defines HAVE_LONG_LONG_INT or
> HAVE_UNSIGNED_LONG_LONG_INT, AFAICS.

It's done automatically, by 'configure'.  HAVE_LONG_LONG_INT is
1 on all the platforms I regularly use, e.g., x86 GNU/Linux.





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-29 20:32                       ` Paul Eggert
@ 2011-04-30  8:59                         ` Eli Zaretskii
  0 siblings, 0 replies; 56+ messages in thread
From: Eli Zaretskii @ 2011-04-30  8:59 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 8545

> Date: Fri, 29 Apr 2011 13:32:29 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: 8545@debbugs.gnu.org
> 
> On 04/29/11 12:35, Eli Zaretskii wrote:
> > none of the platforms currently defines HAVE_LONG_LONG_INT or
> > HAVE_UNSIGNED_LONG_LONG_INT, AFAICS.
> 
> It's done automatically, by 'configure'.

You are right, I didn't look in some of the possible places.





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-04-29 11:16                 ` Eli Zaretskii
  2011-04-29 14:41                   ` Paul Eggert
@ 2011-05-04  7:28                   ` Paul Eggert
  2011-05-04  9:52                     ` Eli Zaretskii
  1 sibling, 1 reply; 56+ messages in thread
From: Paul Eggert @ 2011-05-04  7:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 8545

On 04/29/11 04:16, Eli Zaretskii wrote:
> I guess so, yes.  I would like to have other opinions, though, so I
> will start a new thread on emacs-devel about that.

It seems from that discussion that strings can contain MOST_POSITIVE_FIXNUM bytes.
Also, that va_arg bug really needs fixing.  So I plan to install the following
patch after some more testing.  This assumes va_copy exists, which may affect
the Windows port, but a one-line macro should suffice if it doesn't have
va_copy already.

=== modified file 'ChangeLog'
--- ChangeLog	2011-05-04 06:11:49 +0000
+++ ChangeLog	2011-05-04 07:19:21 +0000
@@ -1,5 +1,9 @@
 2011-05-04  Paul Eggert  <eggert@cs.ucla.edu>
 
+	Use C99's va_copy to avoid undefined behavior on x86-64 GNU/Linux.
+	* Makefile.in (GNULIB_MODULES): Add stdarg, for va_copy.
+	* lib/stdarg.in.h, m4/stdarg.m4: New files, from gnulib.
+
 	* Makefile.in (GNULIB_TOOL_FLAG): Add --conditional-dependencies.
 	This new gnulib-tool option saves 'configure' the trouble of
 	checking for strtoull when strtoumax exists.

=== modified file 'Makefile.in'
--- Makefile.in	2011-05-04 06:11:49 +0000
+++ Makefile.in	2011-05-04 07:19:21 +0000
@@ -333,7 +333,7 @@
 GNULIB_MODULES = \
   careadlinkat crypto/md5 dtoastr filemode getloadavg getopt-gnu \
   ignore-value intprops lstat mktime readlink \
-  socklen stdio strftime strtoumax symlink sys_stat
+  socklen stdarg stdio strftime strtoumax symlink sys_stat
 GNULIB_TOOL_FLAGS = \
  --conditional-dependencies --import --no-changelog --no-vc-files \
  --makefile-name=gnulib.mk

=== modified file 'src/ChangeLog'
--- src/ChangeLog	2011-05-04 06:13:23 +0000
+++ src/ChangeLog	2011-05-04 07:20:46 +0000
@@ -1,5 +1,16 @@
 2011-05-04  Paul Eggert  <eggert@cs.ucla.edu>
 
+	* term.c (vfatal): Remove stray call to va_end.
+	It's not needed and the C Standard doesn't allow it here anyway.
+
+	Use C99's va_copy to avoid undefined behavior on x86-64 GNU/Linux.
+	* eval.c (verror): doprnt a copy of ap, not the original.  (Bug#8545)
+
+	* eval.c (verror): OK to create a string of up to MOST_POSITIVE_FIXNUM
+	bytes.
+
+	* term.c: Don't include <stdarg.h>, as <lisp.h> does that.
+
 	Arithmetic overflows now return float rather than wrapping around.
 	(Bug#8611).
 	* data.c: Include <intprops.h>.

=== modified file 'src/eval.c'
--- src/eval.c	2011-04-30 19:00:39 +0000
+++ src/eval.c	2011-05-04 07:19:21 +0000
@@ -1994,7 +1994,7 @@
 {
   char buf[4000];
   size_t size = sizeof buf;
-  size_t size_max = min (MOST_POSITIVE_FIXNUM, SIZE_MAX);
+  size_t size_max = min (MOST_POSITIVE_FIXNUM + 1, SIZE_MAX);
   size_t mlen = strlen (m);
   char *buffer = buf;
   size_t used;
@@ -2002,7 +2002,10 @@
 
   while (1)
     {
-      used = doprnt (buffer, size, m, m + mlen, ap);
+      va_list ap_copy;
+      va_copy (ap_copy, ap);
+      used = doprnt (buffer, size, m, m + mlen, ap_copy);
+      va_end (ap_copy);
 
       /* Note: the -1 below is because `doprnt' returns the number of bytes
 	 excluding the terminating null byte, and it always terminates with a

=== modified file 'src/term.c'
--- src/term.c	2011-04-24 09:00:03 +0000
+++ src/term.c	2011-05-04 07:20:46 +0000
@@ -26,7 +26,6 @@
 #include <sys/file.h>
 #include <unistd.h>
 #include <signal.h>
-#include <stdarg.h>
 #include <setjmp.h>
 
 #include "lisp.h"
@@ -3619,7 +3618,6 @@
   vfprintf (stderr, str, ap);
   if (!(strlen (str) > 0 && str[strlen (str) - 1] == '\n'))
     fprintf (stderr, "\n");
-  va_end (ap);
   fflush (stderr);
   exit (1);
 }






^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-05-04  7:28                   ` Paul Eggert
@ 2011-05-04  9:52                     ` Eli Zaretskii
  2011-05-04 14:56                       ` Paul Eggert
       [not found]                       ` <4DC1692B.1090101@cs.ucla.edu>
  0 siblings, 2 replies; 56+ messages in thread
From: Eli Zaretskii @ 2011-05-04  9:52 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 8545

> Date: Wed, 04 May 2011 00:28:18 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: 8545@debbugs.gnu.org
> 
> On 04/29/11 04:16, Eli Zaretskii wrote:
> > I guess so, yes.  I would like to have other opinions, though, so I
> > will start a new thread on emacs-devel about that.
> 
> It seems from that discussion that strings can contain MOST_POSITIVE_FIXNUM bytes.

I think the conclusion was that it can contain MOST_POSITIVE_FIXNUM
_including_the_terminating_null_.

> Also, that va_arg bug really needs fixing.  So I plan to install the following
> patch after some more testing.  This assumes va_copy exists, which may affect
> the Windows port, but a one-line macro should suffice if it doesn't have
> va_copy already.

Right, I will take care of that when this is installed.

Thanks.





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
  2011-05-04  9:52                     ` Eli Zaretskii
@ 2011-05-04 14:56                       ` Paul Eggert
       [not found]                       ` <4DC1692B.1090101@cs.ucla.edu>
  1 sibling, 0 replies; 56+ messages in thread
From: Paul Eggert @ 2011-05-04 14:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 8545, Emacs Development

On 05/04/11 02:52, Eli Zaretskii wrote:
>> It seems from that discussion that strings can contain MOST_POSITIVE_FIXNUM bytes.
> I think the conclusion was that it can contain MOST_POSITIVE_FIXNUM
> _including_the_terminating_null_.

Hmm, that's not how I read
<http://lists.gnu.org/archive/html/emacs-devel/2011-04/msg00923.html>.

I understood the argument to be that a buffer must contain at most
MOST_POSITIVE_FIXNUM - 1 bytes due to other reasons, but it's OK
to have "a string whose length is MOST_POSITIVE_FIXNUM", i.e.,
(= (length STRING) most-positive-fixnum), because we already check
buffer sizes before inserting strings.  If you
count the trailing byte, the length of the underlying C character
array would be MOST_POSITIVE_FIXNUM + 1.

I'll CC: this to emacs-devel just in case I misinterpreted that.

^ permalink raw reply	[flat|nested] 56+ messages in thread

[parent not found: <4DC1692B.1090101@cs.ucla.edu>]

* bug#8545: issues with recent doprnt-related changes
       [not found]                       ` <4DC1692B.1090101@cs.ucla.edu>
@ 2011-05-05 20:36                         ` Eli Zaretskii
       [not found]                         ` <83ei4cnau6.fsf@gnu.org>
  1 sibling, 0 replies; 56+ messages in thread
From: Eli Zaretskii @ 2011-05-05 20:36 UTC (permalink / raw)
  To: Paul Eggert, Stefan Monnier; +Cc: 8545, emacs-devel

> Date: Wed, 04 May 2011 07:56:43 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: 8545@debbugs.gnu.org, Emacs Development <emacs-devel@gnu.org>
> 
> On 05/04/11 02:52, Eli Zaretskii wrote:
> >> It seems from that discussion that strings can contain MOST_POSITIVE_FIXNUM bytes.
> > I think the conclusion was that it can contain MOST_POSITIVE_FIXNUM
> > _including_the_terminating_null_.
> 
> Hmm, that's not how I read
> <http://lists.gnu.org/archive/html/emacs-devel/2011-04/msg00923.html>.
> 
> I understood the argument to be that a buffer must contain at most
> MOST_POSITIVE_FIXNUM - 1 bytes due to other reasons, but it's OK
> to have "a string whose length is MOST_POSITIVE_FIXNUM", i.e.,
> (= (length STRING) most-positive-fixnum), because we already check
> buffer sizes before inserting strings.  If you
> count the trailing byte, the length of the underlying C character
> array would be MOST_POSITIVE_FIXNUM + 1.

Stefan, could you please chime in?





^ permalink raw reply	[flat|nested] 56+ messages in thread

[parent not found: <83ei4cnau6.fsf@gnu.org>]

* bug#8545: issues with recent doprnt-related changes
       [not found]                         ` <83ei4cnau6.fsf@gnu.org>
@ 2011-05-06 13:33                           ` Stefan Monnier
       [not found]                           ` <jwvsjss2bz3.fsf-monnier+emacs@gnu.org>
  1 sibling, 0 replies; 56+ messages in thread
From: Stefan Monnier @ 2011-05-06 13:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 8545, Paul Eggert, emacs-devel

>> >> It seems from that discussion that strings can contain
>> >> MOST_POSITIVE_FIXNUM bytes.
>> > I think the conclusion was that it can contain MOST_POSITIVE_FIXNUM
>> > _including_the_terminating_null_.
>> 
>> Hmm, that's not how I read
>> <http://lists.gnu.org/archive/html/emacs-devel/2011-04/msg00923.html>.
>> 
>> I understood the argument to be that a buffer must contain at most
>> MOST_POSITIVE_FIXNUM - 1 bytes due to other reasons, but it's OK
>> to have "a string whose length is MOST_POSITIVE_FIXNUM", i.e.,
>> (= (length STRING) most-positive-fixnum), because we already check
>> buffer sizes before inserting strings.  If you
>> count the trailing byte, the length of the underlying C character
>> array would be MOST_POSITIVE_FIXNUM + 1.

> Stefan, could you please chime in?

I'm not sure I understand the details more than you do.  AFAIK the
MOST_POSITIVE_FIXNUM limit only comes into play when the number goes
through Lisp_Object somewhere.  And we never show "length, including
terminating nul-byte" to Elisp, so the nul-byte shouldn't matter: while
the C array will have size MOST_POSITIVE_FIXNUM + 1, this will only be
represented in EMACS_INT which can accommodate it just fine.


        Stefan





^ permalink raw reply	[flat|nested] 56+ messages in thread

[parent not found: <jwvsjss2bz3.fsf-monnier+emacs@gnu.org>]

* bug#8545: issues with recent doprnt-related changes
       [not found]                           ` <jwvsjss2bz3.fsf-monnier+emacs@gnu.org>
@ 2011-05-06 14:41                             ` Paul Eggert
  2011-05-06 15:03                             ` Eli Zaretskii
       [not found]                             ` <83vcxnlvl9.fsf@gnu.org>
  2 siblings, 0 replies; 56+ messages in thread
From: Paul Eggert @ 2011-05-06 14:41 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 8545, emacs-devel

On 05/06/11 06:33, Stefan Monnier wrote:
> I'm not sure I understand the details more than you do.  AFAIK the
> MOST_POSITIVE_FIXNUM limit only comes into play when the number goes
> through Lisp_Object somewhere.  And we never show "length, including
> terminating nul-byte" to Elisp, so the nul-byte shouldn't matter: while
> the C array will have size MOST_POSITIVE_FIXNUM + 1, this will only be
> represented in EMACS_INT which can accommodate it just fine.

First, you say you're not sure you understand the details;
then, you explain the situation perfectly!  :-)

Thanks.





^ permalink raw reply	[flat|nested] 56+ messages in thread

* bug#8545: issues with recent doprnt-related changes
       [not found]                           ` <jwvsjss2bz3.fsf-monnier+emacs@gnu.org>
  2011-05-06 14:41                             ` Paul Eggert
@ 2011-05-06 15:03                             ` Eli Zaretskii
       [not found]                             ` <83vcxnlvl9.fsf@gnu.org>
  2 siblings, 0 replies; 56+ messages in thread
From: Eli Zaretskii @ 2011-05-06 15:03 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 8545, eggert, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Paul Eggert <eggert@cs.ucla.edu>,  8545@debbugs.gnu.org,  emacs-devel@gnu.org
> Date: Fri, 06 May 2011 10:33:21 -0300
> 
> And we never show "length, including
> terminating nul-byte" to Elisp, so the nul-byte shouldn't matter

If you are sure of that, then I agree it's fine to use
MOST_POSITIVE_FIXNUM+1.  I wasn't sure.  I do see in C variables that
explicitly use the string position of the null byte; I was afraid that
they could somehow leak to Lisp.





^ permalink raw reply	[flat|nested] 56+ messages in thread

[parent not found: <83vcxnlvl9.fsf@gnu.org>]

* bug#8545: issues with recent doprnt-related changes
       [not found]                             ` <83vcxnlvl9.fsf@gnu.org>
@ 2011-05-06 17:13                               ` Stefan Monnier
       [not found]                               ` <jwv8vuj21q0.fsf-monnier+emacs@gnu.org>
  1 sibling, 0 replies; 56+ messages in thread
From: Stefan Monnier @ 2011-05-06 17:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 8545, eggert, emacs-devel

>> And we never show "length, including
>> terminating nul-byte" to Elisp, so the nul-byte shouldn't matter

> If you are sure of that, then I agree it's fine to use
> MOST_POSITIVE_FIXNUM+1.  I wasn't sure.  I do see in C variables that
> explicitly use the string position of the null byte; I was afraid that
> they could somehow leak to Lisp.

I don't know enough to guarantee you that they can never leak to Lisp,
but I can't think of any reason why they should.  And note that "the
position of the nul byte" is the same as "the length of the list", so
it's still <= MOST_POSITIVE_FIXNUM.  It's only the position after the
nul byte that would overflow.


        Stefan





^ permalink raw reply	[flat|nested] 56+ messages in thread

[parent not found: <jwv8vuj21q0.fsf-monnier+emacs@gnu.org>]

* bug#8545: issues with recent doprnt-related changes
       [not found]                               ` <jwv8vuj21q0.fsf-monnier+emacs@gnu.org>
@ 2011-05-06 19:57                                 ` Eli Zaretskii
       [not found]                                 ` <83k4e3lhzp.fsf@gnu.org>
  1 sibling, 0 replies; 56+ messages in thread
From: Eli Zaretskii @ 2011-05-06 19:57 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 8545, eggert, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: eggert@cs.ucla.edu,  8545@debbugs.gnu.org,  emacs-devel@gnu.org
> Date: Fri, 06 May 2011 14:13:06 -0300
> 
> note that "the position of the nul byte" is the same as "the length
> of the list", so it's still <= MOST_POSITIVE_FIXNUM.  It's only the
> position after the nul byte that would overflow.

But what about this code and its commentary (from
next_element_from_c_string):

  /* IT's position can be greater IT->string_nchars in case a field
     width or precision has been specified when the iterator was
     initialized.  */
  if (IT_CHARPOS (*it) >= it->end_charpos)
    {
      /* End of the game.  */
      ...

This happens when the iterator is initialized by reseat_to_string.
Admittedly, it's not very practical to have such huge strings that are
padded to an even larger width...





^ permalink raw reply	[flat|nested] 56+ messages in thread

[parent not found: <83k4e3lhzp.fsf@gnu.org>]

* bug#8545: issues with recent doprnt-related changes
       [not found]                                 ` <83k4e3lhzp.fsf@gnu.org>
@ 2011-05-07  3:18                                   ` Stefan Monnier
       [not found]                                   ` <jwvr58byz9s.fsf-monnier+emacs@gnu.org>
  1 sibling, 0 replies; 56+ messages in thread
From: Stefan Monnier @ 2011-05-07  3:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 8545, eggert, emacs-devel

>> note that "the position of the nul byte" is the same as "the length
>> of the list", so it's still <= MOST_POSITIVE_FIXNUM.  It's only the
>> position after the nul byte that would overflow.

> But what about this code and its commentary (from
> next_element_from_c_string):

>   /* IT's position can be greater IT->string_nchars in case a field
>      width or precision has been specified when the iterator was
>      initialized.  */
>   if (IT_CHARPOS (*it) >= it->end_charpos)
>     {
>       /* End of the game.  */
>       ...

Do these ever make it into a Lisp_Object?


        Stefan





^ permalink raw reply	[flat|nested] 56+ messages in thread

[parent not found: <jwvr58byz9s.fsf-monnier+emacs@gnu.org>]

* bug#8545: issues with recent doprnt-related changes
       [not found]                                   ` <jwvr58byz9s.fsf-monnier+emacs@gnu.org>
@ 2011-05-07  7:55                                     ` Eli Zaretskii
  0 siblings, 0 replies; 56+ messages in thread
From: Eli Zaretskii @ 2011-05-07  7:55 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 8545, eggert, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: eggert@cs.ucla.edu,  8545@debbugs.gnu.org,  emacs-devel@gnu.org
> Date: Sat, 07 May 2011 00:18:47 -0300
> 
> >> note that "the position of the nul byte" is the same as "the length
> >> of the list", so it's still <= MOST_POSITIVE_FIXNUM.  It's only the
> >> position after the nul byte that would overflow.
> 
> > But what about this code and its commentary (from
> > next_element_from_c_string):
> 
> >   /* IT's position can be greater IT->string_nchars in case a field
> >      width or precision has been specified when the iterator was
> >      initialized.  */
> >   if (IT_CHARPOS (*it) >= it->end_charpos)
> >     {
> >       /* End of the game.  */
> >       ...
> 
> Do these ever make it into a Lisp_Object?

Well, the resulting string can be returned by format-mode-line, for
example.





^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2020-09-16  2:01 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-05-01 18:19 bug#8601: * 2 -> * 4 typo fix in detect_coding_charset Paul Eggert
2011-05-01 19:06 ` Andreas Schwab
2011-05-01 19:25   ` Paul Eggert
2011-05-06  7:29 ` bug#8601: Merged fixes for 8600, 8601, 8602, and (partially) for 8545 Paul Eggert
2020-09-14 12:37   ` bug#8545: " Lars Ingebrigtsen
2020-09-14 18:41     ` Eli Zaretskii
2020-09-16  2:01       ` Paul Eggert
  -- strict thread matches above, loose matches on Subject: below --
2011-04-25  5:46 bug#8545: issues with recent doprnt-related changes Paul Eggert
2011-04-25  9:00 ` Eli Zaretskii
2011-04-25 13:37   ` Stefan Monnier
2011-04-26 20:25     ` Paul Eggert
2011-04-27  1:14       ` Stefan Monnier
2011-04-26  6:02   ` Paul Eggert
2011-04-27 19:34     ` Eli Zaretskii
2011-04-27 23:51       ` Paul Eggert
2011-04-28  1:32         ` Juanma Barranquero
2011-04-28  3:11           ` Paul Eggert
2011-04-28  3:42             ` Juanma Barranquero
2011-04-28  5:06               ` Paul Eggert
2011-04-28  5:15             ` Eli Zaretskii
2011-04-28  5:29               ` Paul Eggert
2011-04-28  6:10                 ` Eli Zaretskii
2011-04-28  6:42                   ` Paul Eggert
2011-04-28  7:26                     ` Eli Zaretskii
2011-04-28  7:54                       ` Paul Eggert
2011-04-28 11:14                         ` Eli Zaretskii
2011-04-29 12:28             ` Richard Stallman
2011-04-29 19:56               ` Eli Zaretskii
2011-04-29 23:49               ` Paul Eggert
2011-04-30 21:03                 ` Richard Stallman
2011-05-01  5:41                   ` Paul Eggert
2011-05-01 23:59                     ` Richard Stallman
2011-05-02  0:23                       ` Paul Eggert
     [not found]                         ` <E1QH37h-0001yM-HR@fencepost.gnu.org>
2011-05-03 20:24                           ` Paul Eggert
2011-05-01  4:25                 ` Jason Rumney
2011-05-01  5:56                   ` Paul Eggert
2011-05-01  8:12                     ` Jason Rumney
2011-05-01 11:02                       ` Andreas Schwab
2011-04-28  5:02           ` Eli Zaretskii
2011-04-28  5:50         ` Eli Zaretskii
     [not found]           ` <4DB9146D.2040702@cs.ucla.edu>
     [not found]             ` <E1QFQVO-0004Dq-6o@fencepost.gnu.org>
     [not found]               ` <4DB9E5FF.9020506@cs.ucla.edu>
2011-04-29 11:16                 ` Eli Zaretskii
2011-04-29 14:41                   ` Paul Eggert
2011-04-29 19:35                     ` Eli Zaretskii
2011-04-29 20:32                       ` Paul Eggert
2011-04-30  8:59                         ` Eli Zaretskii
2011-05-04  7:28                   ` Paul Eggert
2011-05-04  9:52                     ` Eli Zaretskii
2011-05-04 14:56                       ` Paul Eggert
     [not found]                       ` <4DC1692B.1090101@cs.ucla.edu>
2011-05-05 20:36                         ` Eli Zaretskii
     [not found]                         ` <83ei4cnau6.fsf@gnu.org>
2011-05-06 13:33                           ` Stefan Monnier
     [not found]                           ` <jwvsjss2bz3.fsf-monnier+emacs@gnu.org>
2011-05-06 14:41                             ` Paul Eggert
2011-05-06 15:03                             ` Eli Zaretskii
     [not found]                             ` <83vcxnlvl9.fsf@gnu.org>
2011-05-06 17:13                               ` Stefan Monnier
     [not found]                               ` <jwv8vuj21q0.fsf-monnier+emacs@gnu.org>
2011-05-06 19:57                                 ` Eli Zaretskii
     [not found]                                 ` <83k4e3lhzp.fsf@gnu.org>
2011-05-07  3:18                                   ` Stefan Monnier
     [not found]                                   ` <jwvr58byz9s.fsf-monnier+emacs@gnu.org>
2011-05-07  7:55                                     ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).