From: handa@gnu.org (K. Handa)
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: 16286-done@debbugs.gnu.org
Subject: bug#16286: 24.3.50; insert-file-contents may bring invisible garbage
Date: Tue, 28 Jan 2014 00:01:00 +0900
Message-ID: <87ppndfp03.fsf@gnu.org>
In-Reply-To: <52E4588D.70004@cs.ucla.edu> (message from Paul Eggert on Sat, 25 Jan 2014 16:36:29 -0800)
In article <52E4588D.70004@cs.ucla.edu>, Paul Eggert <eggert@cs.ucla.edu> writes:
> I installed a patch as trunk bzr 116158, which (at least for me) fixes
> the reported bug, and am taking the liberty of marking this as done.
> There may well be a better fix, but at least Emacs shouldn't crash or
> report nonsense now.
Thank you for working on this bug, which I introduced when I
optimized decode_coding_gap for files that are pure ASCII or
UTF-8.
Your change sets CODING_MODE_LAST_BLOCK in coding->mode
before calling decode_coding_gap, so that detect_coding does
not detect a file as UTF-8 if it has an incomplete UTF-8
sequence at the tail (as in the reported test case).
However, I think it is better for detect_coding to detect
such a file as UTF-8 and treat the trailing garbage as raw
bytes. Emacs 24.3 does this, which is why decode_coding_gap
sets CODING_MODE_LAST_BLOCK after calling detect_coding.
So I suggest the attached fix instead of yours. What do you
think?
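A simplified sketch of how the two approaches differ, assuming a hypothetical stand-alone detector `detect_utf8` that only loosely models detect_coding_utf_8 (it is not Emacs code and skips overlong and surrogate checks). Its `last_block` argument stands in for CODING_MODE_LAST_BLOCK: when the flag is set before detection, an incomplete sequence at the tail makes the whole buffer fail detection; when it is clear, the buffer is accepted as UTF-8 and only the length of the valid prefix is reported, which is the quantity the patch records in detected_utf8_bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for detect_coding_utf_8 (not Emacs
   code; overlong forms and surrogates are not rejected).  Returns true
   if BUF[0..LEN) looks like UTF-8.  On success, *VALID_BYTES is the
   length of the prefix made of complete sequences and *NCHARS is the
   number of characters in that prefix.  A malformed byte always
   rejects; an incomplete sequence cut off by the end of the buffer
   rejects only when LAST_BLOCK is set.  */
static bool
detect_utf8 (const unsigned char *buf, size_t len, bool last_block,
             size_t *valid_bytes, size_t *nchars)
{
  size_t i = 0, n = 0;

  assert (valid_bytes != NULL && nchars != NULL);
  while (i < len)
    {
      unsigned char c = buf[i];
      size_t need, j;		/* Continuation bytes required.  */

      if (c < 0x80)
        need = 0;
      else if ((c & 0xE0) == 0xC0)
        need = 1;
      else if ((c & 0xF0) == 0xE0)
        need = 2;
      else if ((c & 0xF8) == 0xF0)
        need = 3;
      else
        return false;		/* Not a valid lead byte.  */

      for (j = 1; j <= need; j++)
        {
          if (i + j >= len)
            goto incomplete;	/* Input ends mid-sequence.  */
          if ((buf[i + j] & 0xC0) != 0x80)
            return false;	/* Bad continuation byte.  */
        }
      i += need + 1;
      n++;
    }
  /* Every byte belongs to a complete sequence.  */
  *valid_bytes = i;
  *nchars = n;
  return true;

 incomplete:
  /* Trailing partial sequence: a final block rejects it; otherwise
     accept the buffer and report only the valid prefix.  */
  if (last_block)
    return false;
  *valid_bytes = i;
  *nchars = n;
  return true;
}
```

For the bytes 61 C3 A9 E2, calling with `last_block` true returns false. Calling with it false returns true with `valid_bytes` 3 and `nchars` 2; since 3 is less than the buffer length of 4, a caller comparing the two knows not to take the UTF-8 shortcut.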
---
Kenichi Handa
handa@gnu.org
=== modified file 'src/ChangeLog'
--- src/ChangeLog 2014-01-26 12:17:55 +0000
+++ src/ChangeLog 2014-01-27 14:53:58 +0000
@@ -1,3 +1,16 @@
+2014-01-27 K. Handa <handa@gnu.org>
+
+ These changes fix bug#16286 in a different way from
+ revno:116158.
+
+ * coding.h (struct coding_system): New member detected_utf8_bytes.
+
+ * coding.c (detect_coding_utf_8): Set coding->detected_utf8_bytes.
+ (decode_coding_gap): Use the shortcut for reading UTF-8 files only
+ when coding->detected_utf8_bytes equals coding->src_bytes.
+
+ * fileio.c (Finsert_file_contents): Cancel the previous change.
+
2014-01-26 Jan Djärv <jan.h.d@swipnet.se>
* xterm.c (x_focus_changed): Check for non-X terminal-frame (Bug#16540)
=== modified file 'src/coding.c'
--- src/coding.c 2014-01-26 01:20:24 +0000
+++ src/coding.c 2014-01-27 14:47:43 +0000
@@ -1300,6 +1300,7 @@
means that we found a valid non-ASCII characters. */
detect_info->found |= CATEGORY_MASK_UTF_8_AUTO | CATEGORY_MASK_UTF_8_NOSIG;
}
+ coding->detected_utf8_bytes = src_base - coding->source;
coding->detected_utf8_chars = nchars;
return 1;
}
@@ -7890,7 +7891,7 @@
coding->dst_multibyte = ! NILP (BVAR (current_buffer, enable_multibyte_characters));
coding->head_ascii = -1;
- coding->detected_utf8_chars = -1;
+ coding->detected_utf8_bytes = coding->detected_utf8_chars = -1;
coding->eol_seen = EOL_SEEN_NONE;
if (CODING_REQUIRE_DETECTION (coding))
detect_coding (coding);
@@ -7907,7 +7908,8 @@
if (chars != bytes)
{
/* There exists a non-ASCII byte. */
- if (EQ (CODING_ATTR_TYPE (attrs), Qutf_8))
+ if (EQ (CODING_ATTR_TYPE (attrs), Qutf_8)
+ && coding->detected_utf8_bytes == coding->src_bytes)
{
if (coding->detected_utf8_chars >= 0)
chars = coding->detected_utf8_chars;
=== modified file 'src/coding.h'
--- src/coding.h 2014-01-26 01:20:24 +0000
+++ src/coding.h 2014-01-27 14:47:43 +0000
@@ -468,7 +468,9 @@
the eol format. */
ptrdiff_t head_ascii;
- ptrdiff_t detected_utf8_chars;
+ /* Number of bytes/chars at the head of the source detected as a
+ valid UTF-8 sequence. Set by detect_coding_utf_8. */
+ ptrdiff_t detected_utf8_bytes, detected_utf8_chars;
/* Used internally in coding.c. See the comment of detect_ascii. */
int eol_seen;
=== modified file 'src/fileio.c'
--- src/fileio.c 2014-01-26 00:32:30 +0000
+++ src/fileio.c 2014-01-27 14:47:59 +0000
@@ -4298,7 +4298,6 @@
Z_BYTE -= inserted;
ZV -= inserted;
Z -= inserted;
- coding.mode |= CODING_MODE_LAST_BLOCK;
decode_coding_gap (&coding, inserted, inserted);
inserted = coding.produced_char;
coding_system = CODING_ID_NAME (coding.id);
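The decision made by the patched decode_coding_gap can be illustrated with a small model. This is a hypothetical sketch, not Emacs code: `gap_char_count`, `decode_one`, `utf8_seq_len`, and `RAW_BYTE_BASE` are invented names, and the only behavior taken from the message is that the shortcut is used when every byte was detected as UTF-8, while otherwise trailing garbage is decoded as raw bytes. Raw bytes are mapped into 0x3FFF80..0x3FFFFF, the range Emacs uses for eight-bit characters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Offset that maps a byte 0x80..0xFF to 0x3FFF80..0x3FFFFF, the range
   Emacs reserves for raw-byte (eight-bit) characters.  */
#define RAW_BYTE_BASE 0x3FFF00

/* Length of the complete UTF-8 sequence at BUF[I], or 0 if none
   starts there.  Simplified: overlong forms are not rejected.  */
static size_t
utf8_seq_len (const unsigned char *buf, size_t len, size_t i)
{
  unsigned char c = buf[i];
  size_t n, j;

  if (c < 0x80)
    return 1;
  else if ((c & 0xE0) == 0xC0)
    n = 2;
  else if ((c & 0xF0) == 0xE0)
    n = 3;
  else if ((c & 0xF8) == 0xF0)
    n = 4;
  else
    return 0;
  if (n > len - i)
    return 0;			/* Truncated by the end of the buffer.  */
  for (j = 1; j < n; j++)
    if ((buf[i + j] & 0xC0) != 0x80)
      return 0;
  return n;
}

/* Decode one character at BUF[*I] and advance *I past it.  A byte that
   does not start a complete sequence becomes a raw-byte character.  */
static int
decode_one (const unsigned char *buf, size_t len, size_t *i)
{
  size_t n = utf8_seq_len (buf, len, *i), j;
  unsigned char c = buf[*i];
  int ch;

  if (n == 0)
    {
      (*i)++;
      return c + RAW_BYTE_BASE;	/* C is >= 0x80 here.  */
    }
  ch = n == 1 ? c : c & (0x7F >> n);	/* Payload bits of the lead byte.  */
  for (j = 1; j < n; j++)
    ch = (ch << 6) | (buf[*i + j] & 0x3F);
  *i += n;
  return ch;
}

/* Number of characters the inserted bytes decode to.  Takes the
   shortcut (trusting DETECTED_CHARS) only when detection validated all
   LEN bytes; otherwise decodes everything, so trailing garbage becomes
   raw-byte characters.  */
static size_t
gap_char_count (const unsigned char *buf, size_t len,
                size_t detected_bytes, size_t detected_chars,
                bool *shortcut)
{
  size_t i = 0, nchars = 0;

  assert (detected_bytes <= len);
  *shortcut = detected_bytes == len;
  if (*shortcut)
    return detected_chars;
  while (i < len)
    {
      decode_one (buf, len, &i);
      nchars++;
    }
  return nchars;
}
```

For the bytes 61 C3 A9 E2 with a detection result of 3 valid bytes and 2 characters, the byte count differs from the buffer length, so the slow path runs and returns 3: `a`, U+00E9, and the raw byte 0xE2 as 0x3FFFE2.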