From: Eli Zaretskii <eliz@gnu.org>
To: handa@gnu.org (K. Handa)
Cc: dmantipov@yandex.ru, maden.ldm@gmail.com, 18610@debbugs.gnu.org
Subject: bug#18610: 24.4.50; Specific file causing emacs to segfault upon opening
Date: Sun, 05 Oct 2014 19:09:26 +0300
Message-ID: <83zjdamrbd.fsf@gnu.org>
In-Reply-To: <871tqmevsu.fsf@gnu.org>

> From: handa@gnu.org (K. Handa)
> Date: Sun, 05 Oct 2014 17:59:45 +0900
> Cc: dmantipov@yandex.ru, maden.ldm@gmail.com, 18610@debbugs.gnu.org
> 
> > However, detect_coding_iso_2022 returns with the 'found' member of its
> > second argument having zero value, which I interpret as meaning that
> > it didn't really find any ISO-2022 sequences.  So the simple patch
> > below fixes this for me.  Kenichi, is this patch OK?
> 
> No.  Even if there's no special ISO-2022 escape sequence, we
> should not reject iso-2022 as a detected coding system.

Can you explain why?  AFAICT, all the other detectors are required to
set some flag in the 'found' field, so why is ISO-2022 special in this
regard?

> And, even if that detection was incorrect, the decoder
> should not produce an invalid byte sequence in a
> buffer/string, which leads to an Emacs crash.

No argument here.

> The bug is in detect_coding_iso_2022 which doesn't set
> CATEGORY_MASK_ISO_7_ELSE in coding->rejected in this case.

Btw, it would be nice if these masks could be documented so that their
meaning was clear.  I considered the possibility that the flags are
not set correctly, but couldn't test that hypothesis given my
insufficient knowledge of ISO-2022 details and variants.
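
To make the 'found' / 'rejected' contract under discussion concrete,
here is a minimal sketch of how a detector might report through two
bitmasks.  Everything in it is an assumption made for illustration:
the CAT_* bits, struct detection_info, and both functions are
hypothetical names, and the rule "any 8-bit byte rules out the 7-bit
categories" is a simplification, not a description of what
detect_coding_iso_2022 actually does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical category bits; the real CATEGORY_MASK_* values in
   Emacs are different and more numerous.  */
enum
{
  CAT_ISO_7      = 1 << 0,
  CAT_ISO_7_ELSE = 1 << 1,
  CAT_UTF_8      = 1 << 2
};

/* Simplified stand-in for the detector's second argument.  */
struct detection_info
{
  int found;     /* categories the detector positively identified */
  int rejected;  /* categories the detector ruled out */
};

/* Toy detector (assumed rule, for illustration only): an 8-bit byte
   rules out the 7-bit categories; otherwise an ESC byte counts as
   evidence for CAT_ISO_7.  */
void
detect_iso_2022_toy (const unsigned char *src, size_t len,
                     struct detection_info *info)
{
  bool saw_escape = false;
  bool saw_8bit = false;

  for (size_t i = 0; i < len; i++)
    {
      if (src[i] == 0x1B)
        saw_escape = true;
      else if (src[i] >= 0x80)
        saw_8bit = true;
    }

  if (saw_8bit)
    info->rejected |= CAT_ISO_7 | CAT_ISO_7_ELSE;
  else if (saw_escape)
    info->found |= CAT_ISO_7;
}

/* A category stays a candidate only while it has not been rejected.  */
bool
still_candidate (const struct detection_info *info, int category)
{
  return (info->rejected & category) == 0;
}
```

In this model, a detector that returns with 'found' equal to zero has
not necessarily excluded anything; only a bit in 'rejected' removes a
category from consideration, which is the distinction the quoted
diagnosis turns on.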

> I've just installed a fix to trunk.  Could you please try
> the latest version?

It fixes the crash, but I'm not sure the results are what we want.
Emacs 24.3, which also did not crash, would set the
buffer-file-coding-system of the buffer visiting the file to
'undecided', and regarded the \226 characters as 8-bit raw bytes:

   character: \226 (displayed as \226) (codepoint 4194198, #o17777626, #x3fff96)
   ...
   general-category: Cn (Other, Not Assigned)

By contrast, the current trunk sets buffer-file-coding-system to
'latin-1' and thinks this character is a Latin-1 character:

   character: \226 (displayed as \226) (codepoint 150, #o226, #x96)
   preferred charset: iso-8859-1 (Latin-1 (ISO/IEC 8859-1))
   ...
   old-name: START OF GUARDED AREA
   general-category: Cc (Other, Control)

That doesn't sound right to me.
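
For reference, the two codepoints shown above differ by a fixed
offset: 4194198 is #x3fff96, and #x3fff96 - #x96 = #x3fff00.  The
sketch below spells out that arithmetic.  The function names are made
up for illustration and are not the ones used in the Emacs sources;
the raw-byte range #x3fff80..#x3fffff is inferred by applying the
same offset to bytes #x80..#xff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset between a raw 8-bit byte and the codepoint that represents
   it, as implied by the values quoted above (0x3FFF96 - 0x96).  */
enum { RAW_BYTE_OFFSET = 0x3FFF00 };

/* Codepoint standing for the raw byte BYTE (meant for 0x80..0xFF).  */
uint32_t
raw_byte_to_char (uint8_t byte)
{
  return (uint32_t) byte + RAW_BYTE_OFFSET;
}

/* True if C lies in the raw-byte range 0x3FFF80..0x3FFFFF.  */
bool
char_is_raw_byte (uint32_t c)
{
  return c >= 0x3FFF80 && c <= 0x3FFFFF;
}

/* Recover the original byte from a raw-byte codepoint.  */
uint8_t
char_to_raw_byte (uint32_t c)
{
  return (uint8_t) (c - RAW_BYTE_OFFSET);
}
```

With these helpers, byte \226 (#x96) maps to 4194198 when treated as
a raw byte, whereas decoding it as Latin-1 leaves it at codepoint 150,
which is the difference between the two displays above.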

If I force some specific coding system, e.g.

   C-x RET c utf-8 RET C-x C-f FILE RET

then the \226 characters are correctly recognized as 8-bit bytes by
the current trunk (as was the case before your changes).

Could it be that the current trunk fails to recognize the 8-bit bytes
in the file?





