all messages for Emacs-related lists mirrored at yhetil.org
From: Eli Zaretskii <eliz@gnu.org>
To: Pierre Bogossian <bogossian@mail.com>, Kenichi Handa <handa@m17n.org>
Cc: 4047@emacsbugs.donarmstrong.com
Subject: bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
Date: Sat, 08 Aug 2009 15:20:10 +0300
Message-ID: <837hxemr9h.fsf@gnu.org>
In-Reply-To: <20090807085054.036E61BF28D@ws1-10.us4.outblaze.com>

> From: "Pierre Bogossian" <bogossian@mail.com>
> Date: Fri, 7 Aug 2009 09:50:54 +0100
> 
> >[...] does it help to say
> >"C-x RET f utf-8-with-signature RET" before entering hexl-mode?
> 
> No, but forcing the coding system of any buffer to utf-8-with-signature
> using this command and then entering hexl-mode is enough to trigger
> the error. I can even reproduce it with a blank scratch buffer.
> 
> >> Unfortunately I can't test a unix version at the moment.
> >
> >Which means your OS is what?
> 
> Windows XP SP3.

The problem happens on GNU/Linux as well.

I think I've identified why the problem happens, but I need help in
finding the right solution.  Handa-san, can you please comment on
what's below?  Of course, others are welcome to comment as well.

The cause of the problem is this: hexlify-buffer must bind
coding-system-for-write to the buffer's encoding, to force
call-process-region to use that encoding when writing the text to
the temporary file.  OTOH, it must avoid encoding the arguments
passed to the `hexl' program with the buffer's encoding, because that
could be inappropriate for encoding command lines on the underlying
system.  However, call-process-region normally uses
coding-system-for-write, if it is non-nil, to encode the arguments as
well.  To resolve this contradiction, hexlify-buffer encodes the
arguments manually (using locale-coding-system), assuming that, being
unibyte strings after that encoding, they will not be encoded again by
call-process-region.

But call-process (called by call-process-region) does this:

    /* If arguments are supplied, we may have to encode them.  */
    if (nargs >= 5)
      {
	int must_encode = 0;
	Lisp_Object coding_attrs;

	for (i = 4; i < nargs; i++)
	  CHECK_STRING (args[i]);

	for (i = 4; i < nargs; i++)
	  if (STRING_MULTIBYTE (args[i]))
	    must_encode = 1;

	if (!NILP (Vcoding_system_for_write))
	  val = Vcoding_system_for_write;
	else if (! must_encode)
	  val = Qnil;
	else
	  {
	    args2 = (Lisp_Object *) alloca ((nargs + 1) * sizeof *args2);
	    args2[0] = Qcall_process;
	    for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
	    coding_systems = Ffind_operation_coding_system (nargs + 1, args2);

First, if coding-system-for-write is non-nil, it is used, even if none
of the argument strings is a multibyte string.  (This particular bug
can easily be solved by testing must_encode before we test whether
coding-system-for-write is non-nil, but I'm not sure this is the
right solution, because other arguments could be multibyte strings,
which would still cause us to use coding-system-for-write for _all_
arguments.)
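
To make the ordering issue concrete, here is a minimal self-contained
C model of the selection logic only.  It is a sketch, not Emacs code:
struct toy_arg, toy_operation_default, choose_as_quoted and
choose_reordered are hypothetical names invented for illustration, and
a NULL return stands in for Qnil ("do not encode").

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Lisp string argument: only the
   multibyte flag matters for this model.  */
struct toy_arg
{
  const char *text;
  bool multibyte;
};

/* Placeholder for whatever Ffind_operation_coding_system would
   return; this is not a real coding system name.  */
static const char toy_operation_default[] = "operation-default";

/* Mirrors the loop that sets must_encode.  */
static bool
any_multibyte (const struct toy_arg *args, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (args[i].multibyte)
      return true;
  return false;
}

/* Order of tests as in the quoted fragment: a non-NULL
   coding_system_for_write wins even when every argument is unibyte.  */
const char *
choose_as_quoted (const char *coding_system_for_write,
                  const struct toy_arg *args, size_t n)
{
  if (coding_system_for_write != NULL)
    return coding_system_for_write;
  if (!any_multibyte (args, n))
    return NULL;
  return toy_operation_default;
}

/* Reordered variant: test must_encode first.  A single multibyte
   argument still selects coding_system_for_write for all arguments,
   which is the reservation expressed above.  */
const char *
choose_reordered (const char *coding_system_for_write,
                  const struct toy_arg *args, size_t n)
{
  if (!any_multibyte (args, n))
    return NULL;
  if (coding_system_for_write != NULL)
    return coding_system_for_write;
  return toy_operation_default;
}
```

With only unibyte arguments and coding-system-for-write bound,
choose_as_quoted returns the bound coding system while
choose_reordered returns NULL; with one multibyte argument in the
list, both return the bound coding system.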

And second, this fragment, which actually encodes the arguments,
further down in call-process:

  if (nargs > 4)
    {
      register int i;
      struct gcpro gcpro1, gcpro2, gcpro3, gcpro4, gcpro5;

      GCPRO5 (infile, buffer, current_dir, path, error_file);
      argument_coding.dst_multibyte = 0;
      for (i = 4; i < nargs; i++)
	{
	  argument_coding.src_multibyte = STRING_MULTIBYTE (args[i]);
	  if (CODING_REQUIRE_ENCODING (&argument_coding))
	    /* We must encode this argument.  */
	    args[i] = encode_coding_string (&argument_coding, args[i], 1);
	}

encodes the argument even though argument_coding.src_multibyte is set
to 0, i.e. the argument is a unibyte string.  Is encode_coding_string
supposed to encode unibyte strings?
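
For illustration only, one way to express "leave unibyte arguments
alone" is a per-argument guard on the multibyte flag.  The C sketch
below models that idea with hypothetical names (struct toy_string,
toy_encode, encode_multibyte_args); toy_encode merely records that an
argument was encoded and is not a model of what encode_coding_string
really does.  Whether such a guard is the right fix is exactly the
open question.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Lisp string argument.  */
struct toy_string
{
  const char *text;
  bool multibyte;   /* plays the role of STRING_MULTIBYTE */
  bool encoded;     /* set once toy_encode has touched the string */
};

/* Toy replacement for encode_coding_string: it only records that
   encoding happened and that the result is unibyte.  */
static void
toy_encode (struct toy_string *s)
{
  s->encoded = true;
  s->multibyte = false;
}

/* Encode only the multibyte arguments and return how many were
   encoded.  Unibyte arguments, such as ones the caller already
   encoded by hand, pass through untouched.  */
size_t
encode_multibyte_args (struct toy_string *args, size_t n)
{
  size_t count = 0;

  for (size_t i = 0; i < n; i++)
    if (args[i].multibyte)
      {
        toy_encode (&args[i]);
        count++;
      }
  return count;
}
```

Under this guard, arguments that hexlify-buffer has already turned
into unibyte strings would reach the subprocess unchanged, while any
multibyte argument would still be encoded.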




