From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
Date: Sat, 08 Aug 2009 15:20:10 +0300
Message-ID: <837hxemr9h.fsf@gnu.org>
References: <20090807085054.036E61BF28D@ws1-10.us4.outblaze.com>
In-reply-to: <20090807085054.036E61BF28D@ws1-10.us4.outblaze.com>
Reply-To: Eli Zaretskii , 4047@emacsbugs.donarmstrong.com
To: Pierre Bogossian , Kenichi Handa
Cc: 4047@emacsbugs.donarmstrong.com
X-Emacs-PR-Message: followup 4047
X-Emacs-PR-Package: emacs
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:29987

> From: "Pierre Bogossian"
> Date: Fri, 7 Aug 2009 09:50:54 +0100
>
> >[...] does it help to say
> >"C-x RET f utf-8-with-signature RET" before entering hexl-mode?
>
> No, but forcing the coding system of any buffer to utf_8-with-signature
> using this command and then entering hexl-mode is enough to trigger
> the error. I can even reproduce it with a blank scratch buffer.
>
> >> Unfortunately I can't test a unix version at the moment.
> >
> >Which means your OS is what?
>
> Windows XP SP3.

The problem happens on GNU/Linux as well.

I think I've identified why the problem happens, but I need help in
finding the right solution.  Handa-san, can you please comment on
what's below?  Of course, others are welcome to comment as well.

The cause of the problem is this: hexlify-buffer must bind
coding-system-for-write to the buffer's encoding, to force
call-process-region to use the buffer's encoding when writing the text
to the temporary file.  OTOH, it needs to avoid encoding the arguments
passed to the `hexl' program with the buffer's encoding, because that
could be inappropriate for encoding command lines on the underlying
system.  However, call-process-region normally uses
coding-system-for-write, if it is non-nil, to encode the arguments as
well.  To resolve this contradiction, hexlify-buffer encodes the
arguments manually (with locale-coding-system), assuming that, being
unibyte strings after that encoding, they will not be encoded again by
call-process-region.

But call-process (called by call-process-region) does this:

    /* If arguments are supplied, we may have to encode them.  */
    if (nargs >= 5)
      {
        int must_encode = 0;
        Lisp_Object coding_attrs;

        for (i = 4; i < nargs; i++)
          CHECK_STRING (args[i]);

        for (i = 4; i < nargs; i++)
          if (STRING_MULTIBYTE (args[i]))
            must_encode = 1;

        if (!NILP (Vcoding_system_for_write))
          val = Vcoding_system_for_write;
        else if (! must_encode)
          val = Qnil;
        else
          {
            args2 = (Lisp_Object *) alloca ((nargs + 1) * sizeof *args2);
            args2[0] = Qcall_process;
            for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
            coding_systems = Ffind_operation_coding_system (nargs + 1, args2);

First, if coding-system-for-write is non-nil, it is used, even if none
of the argument strings is a multibyte string.  (This particular bug
can easily be solved by making the test for must_encode before we test
whether coding-system-for-write is non-nil, but I'm not sure this is
the right solution, because other arguments could be multibyte
strings, which would still cause us to use coding-system-for-write for
_all_ arguments.)

And second, this fragment, which actually encodes the arguments,
further down in call-process:

    if (nargs > 4)
      {
        register int i;
        struct gcpro gcpro1, gcpro2, gcpro3, gcpro4, gcpro5;

        GCPRO5 (infile, buffer, current_dir, path, error_file);
        argument_coding.dst_multibyte = 0;
        for (i = 4; i < nargs; i++)
          {
            argument_coding.src_multibyte = STRING_MULTIBYTE (args[i]);
            if (CODING_REQUIRE_ENCODING (&argument_coding))
              /* We must encode this argument.  */
              args[i] = encode_coding_string (&argument_coding, args[i], 1);
          }

encodes the argument even though argument_coding.src_multibyte is zero
for a unibyte argument.  Is encode_coding_string supposed to encode
unibyte strings?
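
To make the first point concrete, here is a toy model of just the
order of the tests, written as stand-alone C rather than Emacs code.
Everything in it is hypothetical (struct toy_arg, enum toy_coding, and
the function names are invented for illustration); it only mirrors the
three-way choice in the first fragment and says nothing about what
encode_coding_string does afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Lisp string argument: only the
   multibyte flag matters for this illustration.  */
struct toy_arg
{
  const char *text;
  bool multibyte;
};

/* Which coding system ends up being used for the arguments.  */
enum toy_coding
{
  CODING_NONE,          /* val = Qnil: arguments are left alone */
  CODING_FOR_WRITE,     /* val = coding-system-for-write */
  CODING_FROM_LOOKUP    /* result of find-operation-coding-system */
};

/* The must_encode test: is any argument a multibyte string?  */
static bool
any_multibyte (const struct toy_arg *args, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (args[i].multibyte)
      return true;
  return false;
}

/* Order of tests as in the quoted fragment: a non-nil
   coding-system-for-write wins before must_encode is consulted.  */
enum toy_coding
choose_current_order (bool cs_for_write_set,
                      const struct toy_arg *args, size_t n)
{
  if (cs_for_write_set)
    return CODING_FOR_WRITE;
  if (!any_multibyte (args, n))
    return CODING_NONE;
  return CODING_FROM_LOOKUP;
}

/* Swapped order: test must_encode first.  */
enum toy_coding
choose_swapped_order (bool cs_for_write_set,
                      const struct toy_arg *args, size_t n)
{
  if (!any_multibyte (args, n))
    return CODING_NONE;
  if (cs_for_write_set)
    return CODING_FOR_WRITE;
  return CODING_FROM_LOOKUP;
}

/* Usage example covering the two cases discussed above.  */
void
toy_demo (void)
{
  struct toy_arg unibyte_only[] = { { "arg1", false }, { "arg2", false } };
  struct toy_arg mixed[] = { { "arg1", false }, { "arg2", true } };

  /* hexlify-buffer's case: pre-encoded unibyte arguments, with
     coding-system-for-write bound.  */
  assert (choose_current_order (true, unibyte_only, 2) == CODING_FOR_WRITE);
  assert (choose_swapped_order (true, unibyte_only, 2) == CODING_NONE);

  /* One multibyte argument: both orders pick coding-system-for-write
     for all of the arguments.  */
  assert (choose_current_order (true, mixed, 2) == CODING_FOR_WRITE);
  assert (choose_swapped_order (true, mixed, 2) == CODING_FOR_WRITE);
}
```

Calling toy_demo () from a main function runs the checks.  The first
pair shows why swapping the tests would cure the hexl case; the second
pair shows why it may not be a complete fix, since a single multibyte
argument still selects coding-system-for-write for every argument.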