* bug#23701: Decoding broken by sequence ESC comma
@ 2016-06-05 19:13 Taylan Ulrich Bayırlı/Kammer
  2016-06-05 19:59 ` Andreas Schwab
  0 siblings, 1 reply; 7+ messages in thread
From: Taylan Ulrich Bayırlı/Kammer @ 2016-06-05 19:13 UTC (permalink / raw)
  To: 23701

This bug can be reproduced at least on 24.5 and the current 25 pretest
at the time of this writing.

The occurrence of the sequence of the bytes 1B 2C (ASCII ESC and comma)
messes up Emacs's decoding of an ASCII file from that point on.

This doesn't happen in any other text-displaying application I tested,
including a terminal emulator (given it's an escape sequence and all).

Concrete steps to reproduce:

  $ printf 'foo\033,bar' > tmp.txt
  $ emacs -q tmp.txt

Expected result:

  A buffer displaying "foo^[,bar".

Actual result:

  A buffer displaying "fooáò".
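
  To confirm that the test file really contains the bytes 1B 2C, the
  output of the printf command above can be checked with a hex dump.
  This is only a sketch: the variable name `hex` is illustrative, and
  `od` and `tr` are assumed to behave as POSIX describes.

```shell
# Emit the same bytes as the reproduction and dump them as hex.
# `hex` is an illustrative variable name, not part of the original report.
hex=$(printf 'foo\033,bar' | od -An -tx1 | tr -d ' \n')
echo "$hex"
```

  The pair 1b 2c should appear directly after 66 6f 6f ("foo").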

Taylan






* bug#23701: Decoding broken by sequence ESC comma
  2016-06-05 19:13 bug#23701: Decoding broken by sequence ESC comma Taylan Ulrich Bayırlı/Kammer
@ 2016-06-05 19:59 ` Andreas Schwab
  2016-06-05 22:35   ` Taylan Ulrich Bayırlı/Kammer
  0 siblings, 1 reply; 7+ messages in thread
From: Andreas Schwab @ 2016-06-05 19:59 UTC (permalink / raw)
  To: Taylan Ulrich "Bayırlı/Kammer"; +Cc: 23701

taylanbayirli@gmail.com (Taylan Ulrich "Bayırlı/Kammer") writes:

> The occurrence of the sequence of the bytes 1B 2C (ASCII ESC and comma)
> messes up Emacs's decoding of an ASCII file from that point on.

This is one of the ISO 2022 escape sequences.

> This doesn't happen in any other text-displaying application I tested,
> including a terminal emulator (given it's an escape sequence and all).

None of them know about ISO 2022, apparently.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."






* bug#23701: Decoding broken by sequence ESC comma
  2016-06-05 19:59 ` Andreas Schwab
@ 2016-06-05 22:35   ` Taylan Ulrich Bayırlı/Kammer
  2016-06-06  2:33     ` Eli Zaretskii
  2016-06-06  7:27     ` Andreas Schwab
  0 siblings, 2 replies; 7+ messages in thread
From: Taylan Ulrich Bayırlı/Kammer @ 2016-06-05 22:35 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: 23701

Andreas Schwab <schwab@linux-m68k.org> writes:

> taylanbayirli@gmail.com (Taylan Ulrich "Bayırlı/Kammer") writes:
>
>> The occurrence of the sequence of the bytes 1B 2C (ASCII ESC and comma)
>> messes up Emacs's decoding of an ASCII file from that point on.
>
> This is one of the ISO 2022 escape sequences.
>
>> This doesn't happen in any other text-displaying application I tested,
>> including a terminal emulator (given it's an escape sequence and all).
>
> None of them know about ISO 2022, apparently.
>
> Andreas.

Hmm, OK.  I figure it's an obscure use-case, but perhaps so is its
accidental(?) occurrence in a text file.

In the meantime I found out that C-x RET r us-ascii RET fixes my issue.

The file in which I encountered this (mailing list archives of R6RS)
actually contains the sequence escape, comma, capital-A, in places where
it seems intentionally positioned, such as between sentences.  I wonder
what this is about.  Whatever it means, if this is more common than
genuine uses of that ISO 2022 sequence, that would be a problem, I
suppose.  Here's the relevant snippet from the file, with literal ESC
characters changed to ^[:

>  | On Fri, Sep 11, 2009 at 10:46 PM, Aubrey Jaffer<agj at alum.mit.edu> wrote:
>  | > ^[,A | Date: Wed, 9 Sep 2009 00:30:18 -0400
>  | > ^[,A | From: Lynn Winebarger <owinebar at gmail.com>
>  | > ^[,A |
>  | > ^[,A | ...
>  | > ^[,A | The advent of hygeinic macros marked the end of the era in which
>  | > ^[,A | symbols could be equated with identifiers. ^[,A Identifiers have a lot
>  | > ^[,A | more information in them.
>  | >
>  | > The SLIB implementations of syntactic-closures, syntax-case,

I just grepped all the files and the archives seem to contain a few more
files in which the ESC , sequence appears, such as:

    G^[,Avdel vs Godel vs Goedel

    ^[,Hylem vs ^[,Hylen vs the same with proper vowel symbols

    ... I know that there is a single bit sequence that specifies
    strings, and it's not ^[,A+;^[(Bs; I know that there's another
    single sequence that specifies ellipsis, and it's not ^[$,1s&^[(B
    ...

These aren't ISO-8859-1 either.  I don't know what encoding they're
supposed to be in.  Could also be a mail server breaking things.
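
A search like the one described above can be sketched as follows.  The
helper name `find_esc_comma` and the directory argument are hypothetical,
and `grep -r` is assumed to be available (it is common but not required
by POSIX).

```shell
# List files under a directory that contain the byte pair ESC , (1B 2C).
# `find_esc_comma` is a hypothetical helper name used for illustration.
find_esc_comma() {
  esc_comma=$(printf '\033,')
  # LC_ALL=C: match bytes, not characters in the current locale.
  # -r: recurse, -l: print only file names, -F: fixed string, not a regex.
  LC_ALL=C grep -rlF -- "$esc_comma" "$1"
}
```

For example, `find_esc_comma ./archives` would print the name of each
file that contains the sequence, where ./archives stands in for wherever
the files are kept.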

All in all, I'm just throwing this out there; I have no idea how
commonly used ISO 2022 is, but handling it by default certainly breaks
some files that contain ESC , either by accident or with some other
purpose.  Maybe it should not be handled by default.

Taylan






* bug#23701: Decoding broken by sequence ESC comma
  2016-06-05 22:35   ` Taylan Ulrich Bayırlı/Kammer
@ 2016-06-06  2:33     ` Eli Zaretskii
  2016-06-06 13:17       ` Taylan Ulrich Bayırlı/Kammer
  2016-06-06  7:27     ` Andreas Schwab
  1 sibling, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2016-06-06  2:33 UTC (permalink / raw)
  To: Taylan Ulrich Bayırlı/Kammer; +Cc: schwab, 23701

> From: taylanbayirli@gmail.com (Taylan Ulrich
> 	Bayırlı/Kammer)
> Date: Mon, 06 Jun 2016 01:35:26 +0300
> Cc: 23701@debbugs.gnu.org
> 
> All in all, I'm just throwing this out there; I have no idea how
> commonly used ISO 2022 is, but handling it by default certainly breaks
> some files that contain ESC , either by accident or with some other
> purpose.  Maybe it should not be handled by default.

ISO 2022 and its derivatives are widely used in Far Eastern regions,
so I don't see how we can stop supporting it.






* bug#23701: Decoding broken by sequence ESC comma
  2016-06-05 22:35   ` Taylan Ulrich Bayırlı/Kammer
  2016-06-06  2:33     ` Eli Zaretskii
@ 2016-06-06  7:27     ` Andreas Schwab
  1 sibling, 0 replies; 7+ messages in thread
From: Andreas Schwab @ 2016-06-06  7:27 UTC (permalink / raw)
  To: Taylan Ulrich "Bayırlı/Kammer"; +Cc: 23701

taylanbayirli@gmail.com (Taylan Ulrich "Bayırlı/Kammer") writes:

> I just grepped all the files and the archives seem to contain a few more
> files in which the ESC , sequence appears, such as:
>
>     G^[,Avdel vs Godel vs Goedel
>
>     ^[,Hylem vs ^[,Hylen vs the same with proper vowel symbols
>
>     ... I know that there is a single bit sequence that specifies
>     strings, and it's not ^[,A+;^[(Bs; I know that there's another
>     single sequence that specifies ellipsis, and it's not ^[$,1s&^[(B
>     ...
>
> These aren't ISO-8859-1 either.  I don't know what encoding they're
> supposed to be in.  Could also be a mail server breaking things.

That looks like iso-2022-7bit, except for missing shift-out sequences.
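
One reading that fits the examples in this thread, offered as an
assumption rather than as confirmed decoder behavior: after ESC , A each
following 7-bit byte is taken as the Latin-1 character with the high bit
set, until ESC ( B switches back to ASCII.  Under that assumption 'a'
(0x61) and 'r' (0x72) become 0xE1 and 0xF2, which matches the "áò" in
the original report (the 'b' after ESC , is presumably consumed as the
final byte of the sequence).  The sketch below only performs that
arithmetic with printf and iconv; the variable names are illustrative.

```shell
# 0x61|0x80 = 0xE1 and 0x72|0x80 = 0xF2, written as octal escapes.
shifted=$(printf '\341\362' | iconv -f ISO-8859-1 -t UTF-8)
# Same arithmetic for the 'v' in "G^[,Avdel": 0x76|0x80 = 0xF6.
# Only the 'v' is shifted here, on the guess that the intended text
# switched back to ASCII right after it.
godel=$(printf 'G\366del' | iconv -f ISO-8859-1 -t UTF-8)
echo "$shifted $godel"
```

The same arithmetic maps a space (0x20) to 0xA0, the Latin-1 no-break
space, which may be why "^[,A " shows up between sentences in the quoted
snippet.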

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."






* bug#23701: Decoding broken by sequence ESC comma
  2016-06-06  2:33     ` Eli Zaretskii
@ 2016-06-06 13:17       ` Taylan Ulrich Bayırlı/Kammer
  2016-06-06 15:07         ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Taylan Ulrich Bayırlı/Kammer @ 2016-06-06 13:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, 23701

Eli Zaretskii <eliz@gnu.org> writes:

>> From: taylanbayirli@gmail.com (Taylan Ulrich
>> 	Bayırlı/Kammer)
>> Date: Mon, 06 Jun 2016 01:35:26 +0300
>> Cc: 23701@debbugs.gnu.org
>> 
>> All in all, I'm just throwing this out there; I have no idea how
>> commonly used ISO 2022 is, but handling it by default certainly breaks
>> some files that contain ESC , either by accident or with some other
>> purpose.  Maybe it should not be handled by default.
>
> ISO 2022 and its derivatives are widely used in Far Eastern regions,
> so I don't see how we can stop supporting it.

Fair enough.  I would think this bug report can be closed then.

Taylan






* bug#23701: Decoding broken by sequence ESC comma
  2016-06-06 13:17       ` Taylan Ulrich Bayırlı/Kammer
@ 2016-06-06 15:07         ` Eli Zaretskii
  0 siblings, 0 replies; 7+ messages in thread
From: Eli Zaretskii @ 2016-06-06 15:07 UTC (permalink / raw)
  To: Taylan Ulrich Bayırlı/Kammer; +Cc: schwab, 23701-done

> From: taylanbayirli@gmail.com (Taylan Ulrich Bayırlı/Kammer)
> Cc: schwab@linux-m68k.org,  23701@debbugs.gnu.org
> Date: Mon, 06 Jun 2016 16:17:42 +0300
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > ISO 2022 and its derivatives are widely used in Far Eastern regions,
> > so I don't see how we can stop supporting it.
> 
> Fair enough.  I would think this bug report can be closed then.

Thanks, done.

For the record, the way to prevent unwanted decoding in these cases is
one of:

  . Add a 'coding' cookie to the file
  . Precede "C-x C-f" with "C-x RET c us-ascii RET"
  . "C-x RET r us-ascii RET", like you did.
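
As a sketch of the first option, a coding cookie can be placed on the
first line of a copy of the file.  The helper name `add_cookie` and both
file arguments are hypothetical; note that this changes the file's
contents by adding a line.

```shell
# Write a copy of a file with an Emacs coding cookie as its first line.
# `add_cookie` is a hypothetical helper: $1 is the input, $2 the output.
add_cookie() {
  { printf '%s\n' '-*- coding: us-ascii -*-'; cat -- "$1"; } > "$2"
}
```

For example, `add_cookie tmp.txt tmp-cookie.txt` leaves tmp.txt alone
and writes the annotated copy to tmp-cookie.txt.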







