bug#870: Repeatable instance of bug#870

all messages for Emacs-related lists mirrored at yhetil.org

* bug#870: Repeatable instance of bug#870
@ 2009-01-05  5:03 Juanma Barranquero
  0 siblings, 0 replies; 25+ messages in thread
From: Juanma Barranquero @ 2009-01-05  5:03 UTC (permalink / raw
  To: Emacs Devel; +Cc: 870

[-- Attachment #1: Type: text/plain, Size: 1312 bytes --]

Today I was finally able to create a repeatable test case for
bug#870, "Missing ^J in ChangeLog".

The bug manifests itself as one or more ^J chars missing when reading
a text file. AFAIK, it has only happened with ChangeLogs, and only to
a few Windows users (not unexpectedly, as we typically handle many
more CRLF files than people on other systems).

On my setup, the bug can be repeated at will by doing:

   emacs -Q --eval "(desktop-save-mode 1)" ChangeLog.870
   C-x C-c
   y <RET>    ; to save the desktop when asked
   emacs -Q --eval "(desktop-read)"
   C-s C-q C-m

After that, the cursor will be over a ^M char, the remnant of a CRLF
pair whose ^J has disappeared.

If before restarting Emacs you edit .emacs.desktop and remove
"(buffer-file-coding-system . utf-8-dos)" from the ChangeLog.870
entry, the bug does not happen.

The missing ^J is exactly at position #x8000 of the ChangeLog.870
file. If you remove a character from the file and repeat the test,
the problem does not happen at position #x8000, but another instance
of the same bug does happen at position #x38007. That seems to
indicate some kind of trouble with a 32 KiB buffer.
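
To look for other candidate positions in a file, a small script can list every CRLF pair whose CR is the last byte of one block and whose LF is the first byte of the next. This is only a sketch: it runs outside Emacs, the function name is made up for illustration, and the default block size of #x8000 is an assumption taken from the position observed above, not from the Emacs sources.

```python
def crlf_pairs_on_boundary(data: bytes, block_size: int = 0x8000) -> list[int]:
    """Return offsets of LF bytes that start a block and directly follow a CR."""
    hits = []
    # Visit every offset that is a positive multiple of block_size.
    for offset in range(block_size, len(data), block_size):
        if data[offset] == 0x0A and data[offset - 1] == 0x0D:
            hits.append(offset)
    return hits


# Synthetic example: pad so that the CR is the last byte of the first block.
sample = b"x" * (0x8000 - 1) + b"\r\n" + b"tail\r\n"
print([hex(o) for o in crlf_pairs_on_boundary(sample)])
```

On a real file, read it in binary mode (`open(path, "rb").read()`) and try several block sizes. A hit at an offset where Emacs shows a lone ^M would support the buffer-boundary theory.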

I'm attaching a bzipped copy of ChangeLog.870.

Any help in debugging this bug (or even a patch fixing it ;-) will be
much appreciated.

    Juanma

[-- Attachment #2: ChangeLog.870.bz2 --]
[-- Type: application/x-bzip2, Size: 123313 bytes --]


* bug#870: Repeatable instance of bug#870
  2009-01-05  5:03 Juanma Barranquero
  2009-01-05 10:59 ` bug#870: " Jason Rumney
@ 2009-01-05 10:59 ` Jason Rumney
  1 sibling, 0 replies; 25+ messages in thread
From: Jason Rumney @ 2009-01-05 10:59 UTC (permalink / raw
  To: Juanma Barranquero, 870; +Cc: Emacs Devel

Juanma Barranquero wrote:
>    emacs -Q --eval "(desktop-save-mode 1)" ChangeLog.870
>   

I can also reproduce the bug with C-x RET r utf-8-dos after visiting the 
file normally.

It appears that there is a bug in all the decode_coding_* functions when 
a CR lies on a CHARBUF_SIZE (0x4000) boundary with a matching LF on the 
other side of the boundary.

They all do something like:

      if (eol_crlf && c1 == '\r')
        ONE_MORE_BYTE (byte_after_cr);

but ONE_MORE_BYTE will abort the decode if it reaches the end of the 
buffer, leaving the CR in limbo between having been read and being added 
to the buffer. Then on decoding the subsequent block, the initial LF 
does not trip the normal CRLF decoding, so it is put into the buffer.
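
For illustration only, here is a minimal Python model of one way a block-wise CRLF decoder can cope with a CR that is the last byte of a block: hold it back until the next block shows whether an LF follows. This is not the Emacs C code, and it makes no claim about where the actual bug is. The names `CrlfDecoder` and `decode_in_chunks` are made up for this sketch.

```python
class CrlfDecoder:
    """Convert CRLF to LF across arbitrary chunk boundaries (illustrative only)."""

    def __init__(self) -> None:
        self.pending_cr = False  # True when the previous chunk ended in CR

    def feed(self, chunk: bytes) -> bytes:
        if self.pending_cr:
            # Re-attach the held-back CR so the pair is seen as a whole.
            chunk = b"\r" + chunk
            self.pending_cr = False
        if chunk.endswith(b"\r"):
            # Cannot decide yet: the matching LF may open the next chunk.
            chunk = chunk[:-1]
            self.pending_cr = True
        return chunk.replace(b"\r\n", b"\n")

    def finish(self) -> bytes:
        # A CR at the very end of the input has no LF partner; emit it literally.
        tail = b"\r" if self.pending_cr else b""
        self.pending_cr = False
        return tail


def decode_in_chunks(data: bytes, size: int) -> bytes:
    """Decode data by feeding it to a CrlfDecoder in pieces of `size` bytes."""
    dec = CrlfDecoder()
    out = b"".join(dec.feed(data[i:i + size]) for i in range(0, len(data), size))
    return out + dec.finish()
```

The useful property to check is that the result does not depend on the chunk size: `decode_in_chunks(data, n)` should equal `data.replace(b"\r\n", b"\n")` for every `n`. The symptom in this bug report is a case where that property fails.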

* bug#870: Repeatable instance of bug#870
  2009-01-05 10:59 ` bug#870: " Jason Rumney
  2009-01-05 11:12   ` Juanma Barranquero
@ 2009-01-05 11:12   ` Juanma Barranquero
  2009-01-07  1:07   ` Kenichi Handa
  2009-01-07  1:07   ` Kenichi Handa
  3 siblings, 0 replies; 25+ messages in thread
From: Juanma Barranquero @ 2009-01-05 11:12 UTC (permalink / raw
  To: Jason Rumney; +Cc: 870, Emacs Devel

On Mon, Jan 5, 2009 at 11:59, Jason Rumney <jasonr@gnu.org> wrote:

> It appears that there is a bug in all the decode_coding_* functions when a
> CR lies on a CHARBUF_SIZE (0x4000) boundary with a matching LF on the other
> side of the boundary.
>
> They all do something like:
>
>     if (eol_crlf && c1 == '\r')
>       ONE_MORE_BYTE (byte_after_cr);
>
> but ONE_MORE_BYTE will abort the decode if it reaches the end of the buffer,
> leaving the CR in limbo between having been read and being added to the
> buffer. Then on decoding the subsequent block, the initial LF does not trip
> the normal CRLF decoding, so it is put into the buffer.

Wouldn't that mean that, on writing the buffer, the file would end
with extra CRs, instead of missing LFs?

    Juanma






* bug#870: Repeatable instance of bug#870
  2009-01-05 11:12   ` Juanma Barranquero
@ 2009-01-05 11:22     ` Jason Rumney
  2009-01-05 11:22     ` Jason Rumney
  1 sibling, 0 replies; 25+ messages in thread
From: Jason Rumney @ 2009-01-05 11:22 UTC (permalink / raw
  To: Juanma Barranquero; +Cc: 870, Emacs Devel

Juanma Barranquero wrote:
> On Mon, Jan 5, 2009 at 11:59, Jason Rumney <jasonr@gnu.org> wrote:
>
>   
>> It appears that there is a bug in all the decode_coding_* functions when a
>> CR lies on a CHARBUF_SIZE (0x4000) boundary with a matching LF on the other
>> side of the boundary.
>>
>> They all do something like:
>>
>>     if (eol_crlf && c1 == '\r')
>>       ONE_MORE_BYTE (byte_after_cr);
>>
>> but ONE_MORE_BYTE will abort the decode if it reaches the end of the buffer,
>> leaving the CR in limbo between having been read and being added to the
>> buffer. Then on decoding the subsequent block, the initial LF does not trip
>> the normal CRLF decoding, so it is put into the buffer.
>>     
>
> Wouldn't that mean that, on writing the buffer, the file would end
> with extra CRs, instead of missing LFs?
>   
The CRs are effectively stripped on reading, since they end up in limbo 
between being read and being added to the decoding buffer. I haven't 
tried writing the file, but I think (from memory and from the way the 
code looks to me) the problem is a missing CR, not a missing LF.







* bug#870: Repeatable instance of bug#870
  2009-01-05 11:22     ` Jason Rumney
@ 2009-01-05 11:31       ` Juanma Barranquero
  2009-01-05 11:31       ` Juanma Barranquero
  1 sibling, 0 replies; 25+ messages in thread
From: Juanma Barranquero @ 2009-01-05 11:31 UTC (permalink / raw
  To: Jason Rumney; +Cc: 870, Emacs Devel

On Mon, Jan 5, 2009 at 12:22, Jason Rumney <jasonr@gnu.org> wrote:

> The CRs are effectively stripped on reading, since they end up in limbo
> between being read and being added to the decoding buffer. I haven't tried
> writing the file, but I think (from memory and from the way the code looks
> to me) the problem is a missing CR, not a missing LF.

That's not what I see.

ChangeLog.870 initially contains:

0000 7ff0 20 74 69 6d 65 2d 73 74  61 6d 70 2e 65 6c 3a 0d   time-stamp.el:.
0000 8000 0a 09 2a 20 74 69 6d 65  2e 65 6c 3a 0d 0a 09 2a  ..* time.el:...*

After rereading the file, in Emacs it shows as:

	* time-stamp.el:^M	* time.el:

which I interpret to mean that, while reading, the ^M was read without
its ^J and so taken literally, while the ^J went missing.

Then, if I write it back, the file on disk contains

0000 7ff0 20 74 69 6d 65 2d 73 74  61 6d 70 2e 65 6c 3a 0d   time-stamp.el:.
0000 8000 09 2a 20 74 69 6d 65 2e  65 6c 3a 0d 0a 09 2a 20  .* time.el:...*

so a LF has gone missing.
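
The same check can be done without eyeballing a hex dump. The sketch below (plain Python, outside Emacs; the function name is made up) lists every CR that is not immediately followed by an LF, and runs it on the two rows of the written-back file shown above.

```python
def lone_cr_offsets(data: bytes) -> list[int]:
    """Return offsets of CR bytes that are not immediately followed by LF."""
    return [
        i for i, byte in enumerate(data)
        if byte == 0x0D and data[i + 1:i + 2] != b"\n"
    ]


# The two rows written back to disk in the dump above (file offsets 0x7ff0-0x800f).
written_back = bytes.fromhex(
    "2074696d652d7374616d702e656c3a0d"
    "092a2074696d652e656c3a0d0a092a20"
)
base = 0x7FF0
print([hex(base + i) for i in lone_cr_offsets(written_back)])  # prints ['0x7fff']
```

Run on the corresponding two rows of the original file, the function returns an empty list, since both CRs there are followed by an LF.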

    Juanma






* bug#870: Repeatable instance of bug#870
  2009-01-05 11:31       ` Juanma Barranquero
  2009-01-05 13:50         ` Jason Rumney
@ 2009-01-05 13:50         ` Jason Rumney
  1 sibling, 0 replies; 25+ messages in thread
From: Jason Rumney @ 2009-01-05 13:50 UTC (permalink / raw
  To: Juanma Barranquero; +Cc: 870, Emacs Devel

Juanma Barranquero wrote:
> After rereading the file, in Emacs it shows as:
>
> 	* time-stamp.el:^M	* time.el:
>
> which I interpret to mean that, while reading, the ^M was read without
> its ^J and so taken literally, while the ^J went missing.
>
> Then, if I write it back, the file on disk contains
>
> 0000 7ff0 20 74 69 6d 65 2d 73 74  61 6d 70 2e 65 6c 3a 0d   time-stamp.el:.
> 0000 8000 09 2a 20 74 69 6d 65 2e  65 6c 3a 0d 0a 09 2a 20  .* time.el:...*
>
> so a LF has gone missing.
>   

Yes, you're right, it is an LF (^J) that has gone missing; I was 
confused. So maybe I am wrong about exactly what happens in that part of 
the decode functions: maybe the CR does get written to the buffer, but 
the following LF is somehow swallowed.







* bug#870: Repeatable instance of bug#870
  2009-01-05 13:50         ` Jason Rumney
@ 2009-01-05 14:28           ` Juanma Barranquero
  2009-01-05 14:28           ` Juanma Barranquero
  1 sibling, 0 replies; 25+ messages in thread
From: Juanma Barranquero @ 2009-01-05 14:28 UTC (permalink / raw
  To: Jason Rumney; +Cc: 870, Emacs Devel

On Mon, Jan 5, 2009 at 14:50, Jason Rumney <jasonr@gnu.org> wrote:

> So
> maybe I am wrong about exactly what happens in that part of the decode
> functions - maybe the CR does get written to the buffer, but the following
> LF is somehow swallowed.

The bug does not happen on encoding (for writing), because it is
already visible after re-decoding (I mean, after desktop.el applies
buffer-file-coding-system, or after the
revert-buffer-with-coding-system call in your example). Once the
buffer has the lone ^M, it's no wonder it ends up in the file after
writing.

I think you're right that the problem is related to decoding a CRLF
when the pair crosses a buffer boundary.

    Juanma

* bug#870: Repeatable instance of bug#870
  2009-01-05 10:59 ` bug#870: " Jason Rumney
                     ` (2 preceding siblings ...)
  2009-01-07  1:07   ` Kenichi Handa
@ 2009-01-07  1:07   ` Kenichi Handa
  3 siblings, 0 replies; 25+ messages in thread
From: Kenichi Handa @ 2009-01-07  1:07 UTC (permalink / raw
  To: Jason Rumney; +Cc: lekktu, 870, emacs-devel

In article <4961E7F7.2000509@gnu.org>, Jason Rumney <jasonr@gnu.org> writes:

> Juanma Barranquero wrote:
> >    emacs -Q --eval "(desktop-save-mode 1)" ChangeLog.870
> >   

> I can also reproduce the bug with C-x RET r utf-8-dos after visiting the 
> file normally.

I can reproduce it by that recipe.

> It appears that there is a bug in all the decode_coding_* functions when 
> a CR lies on a CHARBUF_SIZE (0x4000) boundary with a matching LF on the 
> other side of the boundary.

> They all do something like:

>       if (eol_crlf && c1 == '\r')
>         ONE_MORE_BYTE (byte_after_cr);

> but ONE_MORE_BYTE will abort the decode if it reaches the end of the 
> buffer, leaving the CR in limbo between having been read and being added 
> to the buffer. Then on decoding the subsequent block, the initial LF 
> does not trip the normal CRLF decoding, so it is put into the buffer.

??? decode_coding_* gets bytes from coding->source and
produces characters in CHARBUF.  So, I think the above
analysis is not correct.

As normal visiting of ChangeLog.870 doesn't have the problem
but revisiting it causes the problem, I think the bug is in
Finsert_file_contents; perhaps in the handling of REPLACE.
I'll have a look at it.

---
Kenichi Handa
handa@m17n.org






* bug#870: Repeatable instance of bug#870
  2009-01-07  1:07   ` Kenichi Handa
  2009-01-07  6:53     ` Kenichi Handa
@ 2009-01-07  6:53     ` Kenichi Handa
  1 sibling, 0 replies; 25+ messages in thread
From: Kenichi Handa @ 2009-01-07  6:53 UTC (permalink / raw
  To: Kenichi Handa; +Cc: lekktu, emacs-devel, 870

In article <E1LKMsw-0005wG-G6@etlken.m17n.org>, Kenichi Handa <handa@m17n.org> writes:

> > It appears that there is a bug in all the decode_coding_* functions when 
> > a CR lies on a CHARBUF_SIZE (0x4000) boundary with a matching LF on the 
> > other side of the boundary.

> > They all do something like:

> >       if (eol_crlf && c1 == '\r')
> >         ONE_MORE_BYTE (byte_after_cr);

> > but ONE_MORE_BYTE will abort the decode if it reaches the end of the 
> > buffer, leaving the CR in limbo between having been read and being added 
> > to the buffer. Then on decoding the subsequent block, the initial LF 
> > does not trip the normal CRLF decoding, so it is put into the buffer.

> ??? decode_coding_* gets bytes from coding->source and
> produces characters in CHARBUF.  So, I think the above
> analysis is not correct.

> As normal visiting of ChangeLog.870 doesn't have the problem
> but revisiting it causes the problem, I think the bug is in
> Finsert_file_contents; perhaps in the handling of REPLACE.
> I'll have a look at it.

I fixed the bug.  Actually, what was wrong was decode_coding_*,
but in a different place from the one described above.

---
Kenichi Handa
handa@m17n.org






* bug#870: Repeatable instance of bug#870
  2009-01-07  6:53     ` Kenichi Handa
@ 2009-01-07  8:19       ` martin rudalics
  2009-01-07 12:29         ` Kenichi Handa
  2009-01-07  9:43       ` Juanma Barranquero
  2009-01-07  9:43       ` Juanma Barranquero
  2 siblings, 1 reply; 25+ messages in thread
From: martin rudalics @ 2009-01-07  8:19 UTC (permalink / raw
  To: Kenichi Handa; +Cc: 870

 >> As normal visiting of ChangeLog.870 doesn't have the problem
 >> but revisiting it causes the problem, I think the bug is in
 >> Finsert_file_contents; perhaps in the handling of REPLACE.
 >> I'll have a look at it.
 >
 > I fixed the bug.  Actually what wrong was decode_coding_*
 > but in the different place as above.

Handa-san, while you're there could you please also have a look at
bug#1039?  Maybe it's related to the present issue.

Thank you, martin.







* bug#870: Repeatable instance of bug#870
  2009-01-07  6:53     ` Kenichi Handa
  2009-01-07  8:19       ` martin rudalics
@ 2009-01-07  9:43       ` Juanma Barranquero
  2009-01-07  9:43       ` Juanma Barranquero
  2 siblings, 0 replies; 25+ messages in thread
From: Juanma Barranquero @ 2009-01-07  9:43 UTC (permalink / raw
  To: Kenichi Handa; +Cc: emacs-devel, 870

On Wed, Jan 7, 2009 at 07:53, Kenichi Handa <handa@m17n.org> wrote:

> I fixed the bug.

Thanks! (I've been suffering from this #$@!&* for the past eight months or so.)

I've added the "(Bug#870)" ref to your ChangeLog entry.

    Juanma






* bug#870: Repeatable instance of bug#870
       [not found]           ` <f7ccd24b0901070301t221f906atf75f8632dcf1c41@mail.gmail.com>
@ 2009-01-07 11:10             ` Jason Rumney
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Rumney @ 2009-01-07 11:10 UTC (permalink / raw
  To: Juanma Barranquero; +Cc: 870, Kenichi Handa

Juanma Barranquero wrote:
> However, for the past few days (since 2009/01/04 or so)
> NNN-done@emacsbugs messages seem to be ignored, though messages to
> control@emacsbugs do work.
>   

I haven't noticed that - it seems to have worked in this case. Maybe 
something has been changed to automatically reopen reports when a 
subsequent mail is received on the original report address. If so, I 
think it is a degradation, as such messages are often background chit 
chat about side issues. If someone really reports that the fix 
does not work, the bug can be reopened through the control address.


* bug#870: Repeatable instance of bug#870
  2009-01-07  8:19       ` martin rudalics
@ 2009-01-07 12:29         ` Kenichi Handa
  2009-01-07 15:33           ` martin rudalics
  0 siblings, 1 reply; 25+ messages in thread
From: Kenichi Handa @ 2009-01-07 12:29 UTC (permalink / raw
  To: martin rudalics; +Cc: 870

In article <4964657F.5010205@gmx.at>, martin rudalics <rudalics@gmx.at> writes:

>>> As normal visiting of ChangeLog.870 doesn't have the problem
>>> but revisiting it causes the problem, I think the bug is in
>>> Finsert_file_contents; perhaps in the handling of REPLACE.
>>> I'll have a look at it.
> 
> I fixed the bug.  Actually what wrong was decode_coding_*
> but in the different place as above.

> Handa-san, while you're there could you please also have a look at
> bug#1039?  Maybe it's related to the present issue.

I installed a fix.  It was a different issue.

2009-01-07  Kenichi Handa  <handa@m17n.org>

	* fileio.c (Finsert_file_contents): In the case of replace,
	remember the coding system used for decoding in
	coding_system (Bug#1039).

---
Kenichi Handa
handa@m17n.org









* bug#870: Repeatable instance of bug#870
  2009-01-07 12:29         ` Kenichi Handa
@ 2009-01-07 15:33           ` martin rudalics
  2009-01-13  2:30             ` Kenichi Handa
  0 siblings, 1 reply; 25+ messages in thread
From: martin rudalics @ 2009-01-07 15:33 UTC (permalink / raw
  To: Kenichi Handa; +Cc: 870

 > I installed a fix.  It was a different issue.
 >
 > 2009-01-07  Kenichi Handa  <handa@m17n.org>
 >
 > 	* fileio.c (Finsert_file_contents): In the case of replace,
 > 	remember the coding system used for decoding in
 > 	coding_system (Bug#1039).

Thanks for taking care of this.  Your fix solves the problem for me
though I'm not sure whether it fixes the issue raised by Peter:

 > That patch fixes the bug I reported, but it creates a new one: if you
 > change the EOL convention outside of emacs, revert-buffer no longer
 > detects this. To reproduce:
 > printf "hello\r\nworld\r\n" > hello
 > emacs -Q hello &
 > printf "hello\rworld\r" > hello
 > M-x revert-buffer
 > # emacs still sees DOS newlines

In particular, when I visit a file, (1) save it with a different line
ending, (2) change the line ending outside this instance of Emacs, and
(3) revert the buffer, its line ending is the one saved in (1) and not
the one from (2).  But IIUC Emacs 22 didn't handle this either.
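
When trying the recipe quoted above, it helps to confirm what is actually on disk before and after the second printf, independently of what Emacs reports. A minimal sketch in Python follows. It is a plain byte counter with a made-up function name, not a model of how Emacs detects the EOL convention.

```python
def count_eol_styles(data: bytes) -> dict[str, int]:
    """Count CRLF, lone-CR and lone-LF line endings in a byte string."""
    crlf = data.count(b"\r\n")
    return {
        "dos": crlf,                       # CR LF pairs
        "mac": data.count(b"\r") - crlf,   # CR not followed by LF
        "unix": data.count(b"\n") - crlf,  # LF not preceded by CR
    }


before = b"hello\r\nworld\r\n"   # what the first printf writes
after = b"hello\rworld\r"        # what the second printf writes
print(count_eol_styles(before))
print(count_eol_styles(after))
```

The first file counts as two DOS line endings and the second as two CR-only (old Mac style) endings, so a correct revert should no longer show DOS newlines.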

martin







* bug#870: Repeatable instance of bug#870
  2009-01-07 15:33           ` martin rudalics
@ 2009-01-13  2:30             ` Kenichi Handa
  2009-01-13  4:06               ` Eli Zaretskii
  0 siblings, 1 reply; 25+ messages in thread
From: Kenichi Handa @ 2009-01-13  2:30 UTC (permalink / raw
  To: martin rudalics; +Cc: 870

In article <4964CB64.2090506@gmx.at>, martin rudalics <rudalics@gmx.at> writes:

> Thanks for taking care of this.  Your fix solves the problem for me
> though I'm not sure whether it fixes the issue raised by Peter:

> That patch fixes the bug I reported, but it creates a new one: if you
> change the EOL convention outside of emacs, revert-buffer no longer
> detects this. To reproduce:
> printf "hello\r\nworld\r\n" > hello
> emacs -Q hello &
> printf "hello\rworld\r" > hello
> M-x revert-buffer
> # emacs still sees DOS newlines

As I can't reproduce the above problem, I think the bug is fixed.

> In particular, when I visit a file, (1) save it with a different line
> ending, (2) change the line ending outside this instance of Emacs, and
> (3) revert the buffer, its line ending is the one saved in (1) and not
> the one from (2).  But IIUC Emacs 22 didn't handle this either.

By (1), the variable buffer-file-coding-system-explicit is
set to XXX, and, in such a case, revert-buffer binds
coding-system-for-read to XXX to respect your decision made
by (1).

I'm not sure this behavior is a bug.

---
Kenichi Handa
handa@m17n.org







* bug#870: Repeatable instance of bug#870
  2009-01-13  2:30             ` Kenichi Handa
@ 2009-01-13  4:06               ` Eli Zaretskii
  0 siblings, 0 replies; 25+ messages in thread
From: Eli Zaretskii @ 2009-01-13  4:06 UTC (permalink / raw
  To: Kenichi Handa, 870; +Cc: bug-gnu-emacs

> From: Kenichi Handa <handa@m17n.org>
> Date: Tue, 13 Jan 2009 11:30:16 +0900
> Cc: 870@emacsbugs.donarmstrong.com
> 
> > In particular, when I visit a file, (1) save it with a different line
> > ending, (2) change the line ending outside this instance of Emacs, and
> > (3) revert the buffer, its line ending is the one saved in (1) and not
> > the one from (2).  But IIUC Emacs 22 didn't handle this either.
> 
> By (1), the variable buffer-file-coding-system-explicit is
> set to XXX, and, in such a case, revert-buffer binds
> coding-system-for-read to XXX to respect your decision made
> by (1).
> 
> I'm not sure this behavior is a bug.

It isn't.





