From: handa@gnu.org (K. Handa)
Newsgroups: gmane.emacs.bugs
Subject: bug#16286: 24.3.50; insert-file-contents may bring invisible garbage
Date: Tue, 28 Jan 2014 00:01:00 +0900
Message-ID: <87ppndfp03.fsf@gnu.org>
References: <87sitb4usd.fsf@gmail.com>
In-Reply-To: <52E4588D.70004@cs.ucla.edu> (message from Paul Eggert on Sat, 25 Jan 2014 16:36:29 -0800)
To: Paul Eggert
Cc: 16286-done@debbugs.gnu.org

In article <52E4588D.70004@cs.ucla.edu>, Paul Eggert writes:

> I installed a patch as trunk bzr 116158, which (at least for me) fixes
> the reported bug, and am taking the liberty of marking this as done.
> There may well be a better fix, but at least Emacs shouldn't crash or
> report nonsense now.

Thank you for working on this bug, which I introduced when I optimized
decode_coding_gap for ASCII-only and UTF-8-only files.

Your change sets CODING_MODE_LAST_BLOCK in coding->mode before calling
decode_coding_gap, so that detect_coding doesn't detect a file as UTF-8
if it has an incomplete UTF-8 sequence at the tail (as in the reported
testcase).  But I think it is better for detect_coding to detect such a
file as UTF-8 and treat the trailing garbage as raw bytes.  24.3 does
that, and that is why decode_coding_gap sets CODING_MODE_LAST_BLOCK
after calling detect_coding.

So I suggest the attached fix instead of yours.  What do you think?

---
Kenichi Handa
handa@gnu.org

=== modified file 'src/ChangeLog'
--- src/ChangeLog	2014-01-26 12:17:55 +0000
+++ src/ChangeLog	2014-01-27 14:53:58 +0000
@@ -1,3 +1,16 @@
+2014-01-27  K. Handa
+
+	These changes fix bug#16286 in a different way from the one
+	taken by revno:116158.
+
+	* coding.h (struct coding_system): New member detected_utf8_bytes.
+
+	* coding.c (detect_coding_utf_8): Set coding->detected_utf8_bytes.
+	(decode_coding_gap): Use the short cut for UTF-8 file reading only
+	when coding->detected_utf8_bytes equals coding->src_bytes.
+
+	* fileio.c (Finsert_file_contents): Cancel the previous change.
+
 2014-01-26  Jan Djärv
 
 	* xterm.c (x_focus_changed): Check for non-X terminal-frame (Bug#16540)

=== modified file 'src/coding.c'
--- src/coding.c	2014-01-26 01:20:24 +0000
+++ src/coding.c	2014-01-27 14:47:43 +0000
@@ -1300,6 +1300,7 @@
 	 means that we found a valid non-ASCII characters.  */
       detect_info->found |= CATEGORY_MASK_UTF_8_AUTO | CATEGORY_MASK_UTF_8_NOSIG;
     }
+  coding->detected_utf8_bytes = src_base - coding->source;
   coding->detected_utf8_chars = nchars;
   return 1;
 }
@@ -7890,7 +7891,7 @@
   coding->dst_multibyte = ! NILP (BVAR (current_buffer, enable_multibyte_characters));
 
   coding->head_ascii = -1;
-  coding->detected_utf8_chars = -1;
+  coding->detected_utf8_bytes = coding->detected_utf8_chars = -1;
   coding->eol_seen = EOL_SEEN_NONE;
   if (CODING_REQUIRE_DETECTION (coding))
     detect_coding (coding);
@@ -7907,7 +7908,8 @@
       if (chars != bytes)
 	{
 	  /* There exists a non-ASCII byte.  */
-	  if (EQ (CODING_ATTR_TYPE (attrs), Qutf_8))
+	  if (EQ (CODING_ATTR_TYPE (attrs), Qutf_8)
+	      && coding->detected_utf8_bytes == coding->src_bytes)
 	    {
 	      if (coding->detected_utf8_chars >= 0)
 		chars = coding->detected_utf8_chars;

=== modified file 'src/coding.h'
--- src/coding.h	2014-01-26 01:20:24 +0000
+++ src/coding.h	2014-01-27 14:47:43 +0000
@@ -468,7 +468,9 @@
      the eol format.  */
   ptrdiff_t head_ascii;
 
-  ptrdiff_t detected_utf8_chars;
+  /* How many bytes/chars at the source are detected as valid utf-8
+     sequence.  Set by detect_coding_utf_8.  */
+  ptrdiff_t detected_utf8_bytes, detected_utf8_chars;
 
   /* Used internally in coding.c.  See the comment of detect_ascii.  */
   int eol_seen;

=== modified file 'src/fileio.c'
--- src/fileio.c	2014-01-26 00:32:30 +0000
+++ src/fileio.c	2014-01-27 14:47:59 +0000
@@ -4298,7 +4298,6 @@
 	  Z_BYTE -= inserted;
 	  ZV -= inserted;
 	  Z -= inserted;
-	  coding.mode |= CODING_MODE_LAST_BLOCK;
 	  decode_coding_gap (&coding, inserted, inserted);
 	  inserted = coding.produced_char;
 	  coding_system = CODING_ID_NAME (coding.id);
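A minimal standalone sketch of the idea behind the new check, for illustration only: count how many leading bytes of the source form well-formed UTF-8, and take the UTF-8 short cut only when that count covers the whole source (the role of detected_utf8_bytes == src_bytes in the patch). The names struct utf8_scan, utf8_seq_len, scan_utf8 and can_use_utf8_shortcut are hypothetical and do not exist in Emacs, and the validator is deliberately simplified (it does not reject overlong forms or surrogates), so treat it as a model under those assumptions rather than the actual detect_coding_utf_8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of a UTF-8 detection pass.  The two fields play
   the roles of detected_utf8_bytes and detected_utf8_chars in the
   patch; this is not the Emacs struct coding_system.  */
struct utf8_scan
{
  ptrdiff_t valid_bytes;  /* Length of the longest valid UTF-8 prefix.  */
  ptrdiff_t valid_chars;  /* Number of characters in that prefix.  */
};

/* Return the length of the UTF-8 sequence starting at P, given that
   N (>= 1) bytes remain, or 0 if it is malformed or truncated.
   Simplified: overlong forms and surrogates are not rejected.  */
static int
utf8_seq_len (const unsigned char *p, ptrdiff_t n)
{
  int len;

  if (p[0] < 0x80)
    return 1;
  else if ((p[0] & 0xE0) == 0xC0)
    len = 2;
  else if ((p[0] & 0xF0) == 0xE0)
    len = 3;
  else if ((p[0] & 0xF8) == 0xF0)
    len = 4;
  else
    return 0;                   /* Stray continuation or invalid lead byte.  */

  if (len > n)
    return 0;                   /* Incomplete sequence at the end.  */
  for (int i = 1; i < len; i++)
    if ((p[i] & 0xC0) != 0x80)
      return 0;
  return len;
}

/* Scan SRC[0..NBYTES) and stop at the first byte that does not start
   a complete, well-formed sequence.  */
static struct utf8_scan
scan_utf8 (const unsigned char *src, ptrdiff_t nbytes)
{
  struct utf8_scan s = { 0, 0 };

  assert (nbytes >= 0);
  while (s.valid_bytes < nbytes)
    {
      int len = utf8_seq_len (src + s.valid_bytes, nbytes - s.valid_bytes);
      if (len == 0)
        break;
      s.valid_bytes += len;
      s.valid_chars++;
    }
  return s;
}

/* The fast path (reusing the character count from the scan, as the
   patch does with detected_utf8_chars) is allowed only when the whole
   source is valid UTF-8.  Anything else goes to the general decoder.  */
static int
can_use_utf8_shortcut (const unsigned char *src, ptrdiff_t nbytes)
{
  return scan_utf8 (src, nbytes).valid_bytes == nbytes;
}
```

For a buffer holding 'h', 0xC3 0xA9 ("é"), "llo" and then a lone 0xC3 (a truncated two-byte sequence, like the tail in the reported testcase), scan_utf8 reports 6 valid bytes and 5 characters out of 7, so can_use_utf8_shortcut returns 0 and the file would take the general decoding path; without the trailing 0xC3 it returns 1.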