From: Eli Zaretskii <eliz@gnu.org>
To: Pierre Bogossian <bogossian@mail.com>, Kenichi Handa <handa@m17n.org>
Cc: 4047@emacsbugs.donarmstrong.com
Subject: bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
Date: Sat, 08 Aug 2009 15:20:10 +0300
Message-ID: <837hxemr9h.fsf@gnu.org>
In-Reply-To: <20090807085054.036E61BF28D@ws1-10.us4.outblaze.com>
> From: "Pierre Bogossian" <bogossian@mail.com>
> Date: Fri, 7 Aug 2009 09:50:54 +0100
>
> >[...] does it help to say
> >"C-x RET f utf-8-with-signature RET" before entering hexl-mode?
>
> No, but forcing the coding system of any buffer to utf-8-with-signature
> using this command and then entering hexl-mode is enough to trigger
> the error. I can even reproduce it with a blank scratch buffer.
>
> >> Unfortunately I can't test a unix version at the moment.
> >
> >Which means your OS is what?
>
> Windows XP SP3.

The problem happens on GNU/Linux as well.

I think I've identified why the problem happens, but I need help in
finding the right solution.  Handa-san, can you please comment on
what's below?  Of course, others are welcome to comment as well.

The cause of the problem is this: hexlify-buffer must bind
coding-system-for-write to the buffer's encoding, to force
call-process-region to use that encoding when writing the text to
the temporary file.  OTOH, it needs to avoid encoding the arguments
passed to the `hexl' program with the buffer's encoding, because that
could be inappropriate for encoding command lines on the underlying
system.  However, call-process-region normally uses
coding-system-for-write, if it is non-nil, to encode the arguments as
well.  To resolve this contradiction, hexlify-buffer encodes the
arguments manually (with locale-coding-system), assuming that, being
unibyte strings after that encoding, they will not be encoded again by
call-process-region.

But call-process (called by call-process-region) does this:

  /* If arguments are supplied, we may have to encode them.  */
  if (nargs >= 5)
    {
      int must_encode = 0;
      Lisp_Object coding_attrs;

      for (i = 4; i < nargs; i++)
        CHECK_STRING (args[i]);

      for (i = 4; i < nargs; i++)
        if (STRING_MULTIBYTE (args[i]))
          must_encode = 1;

      if (!NILP (Vcoding_system_for_write))
        val = Vcoding_system_for_write;
      else if (! must_encode)
        val = Qnil;
      else
        {
          args2 = (Lisp_Object *) alloca ((nargs + 1) * sizeof *args2);
          args2[0] = Qcall_process;
          for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
          coding_systems = Ffind_operation_coding_system (nargs + 1, args2);

First, if coding-system-for-write is non-nil, it is used, even if none
of the argument strings is a multibyte string. (This particular bug
can easily be solved by making the test for must_encode before we test
that coding-system-for-write is non-nil, but I'm not sure this is the
right solution because other arguments could be multibyte strings,
which will still cause us to use coding-system-for-write for _all_
arguments.)
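
To make the ordering issue concrete, here is a toy model of just the
decision above.  It is only a sketch, not Emacs code: the enum and the
function names current_order and swapped_order are made up for
illustration, and the two int parameters stand in for
"coding-system-for-write is non-nil" and for the must_encode flag.

```c
/* Toy model of the decision in call-process; NOT Emacs source.
   All names here are hypothetical and exist only for illustration.  */
enum arg_coding
{
  ARGS_NOT_ENCODED,   /* val = Qnil: arguments are passed through */
  ARGS_USE_CFW,       /* val = coding-system-for-write */
  ARGS_USE_FOUND      /* val comes from find-operation-coding-system */
};

/* The order quoted above: coding-system-for-write wins whenever it
   is non-nil, even if no argument is a multibyte string.  */
static enum arg_coding
current_order (int cfw_non_nil, int must_encode)
{
  if (cfw_non_nil)
    return ARGS_USE_CFW;
  else if (!must_encode)
    return ARGS_NOT_ENCODED;
  else
    return ARGS_USE_FOUND;
}

/* The reordering considered above: test must_encode first.  */
static enum arg_coding
swapped_order (int cfw_non_nil, int must_encode)
{
  if (!must_encode)
    return ARGS_NOT_ENCODED;
  else if (cfw_non_nil)
    return ARGS_USE_CFW;
  else
    return ARGS_USE_FOUND;
}
```

With coding-system-for-write bound and only unibyte arguments,
current_order (1, 0) selects coding-system-for-write, while
swapped_order (1, 0) leaves the arguments alone.  But
swapped_order (1, 1) still selects coding-system-for-write, and since
must_encode is a single flag for the whole call, one multibyte argument
is enough to have every argument encoded that way.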
And second, this fragment, which actually encodes the arguments,
further down in call-process:

  if (nargs > 4)
    {
      register int i;
      struct gcpro gcpro1, gcpro2, gcpro3, gcpro4, gcpro5;

      GCPRO5 (infile, buffer, current_dir, path, error_file);
      argument_coding.dst_multibyte = 0;
      for (i = 4; i < nargs; i++)
        {
          argument_coding.src_multibyte = STRING_MULTIBYTE (args[i]);
          if (CODING_REQUIRE_ENCODING (&argument_coding))
            /* We must encode this argument.  */
            args[i] = encode_coding_string (&argument_coding, args[i], 1);
        }

encodes the argument even though argument_coding.src_multibyte is set
to zero, i.e. the argument is a unibyte string.  Is
encode_coding_string supposed to encode unibyte strings?
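
To illustrate why encoding an already-encoded argument matters with a
coding system like utf-8-with-signature, here is a small stand-alone
sketch.  It is a toy model under stated assumptions, not Emacs code:
toy_encode_with_signature and toy_encode_arg are hypothetical helpers,
the "encoding" is reduced to prepending the UTF-8 byte-order mark
(EF BB BF), and "-hex" is just an example option string.  Whether
skipping unibyte arguments is the right fix is exactly the open
question above.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for encoding with a signature-writing coding system:
   prepend the UTF-8 BOM and copy the bytes.  NOT Emacs code.
   Returns the number of bytes written (excluding the trailing NUL),
   or 0 if DST is too small.  */
static size_t
toy_encode_with_signature (const char *src, char *dst, size_t dst_size)
{
  static const char bom[] = "\xEF\xBB\xBF";
  size_t len = strlen (src);

  if (dst_size < len + 4)       /* 3 BOM bytes + text + NUL */
    return 0;
  memcpy (dst, bom, 3);
  memcpy (dst + 3, src, len + 1);
  return len + 3;
}

/* Hypothetical guard: a unibyte argument (assumed to be encoded
   already) is copied unchanged; only multibyte text is encoded.  */
static size_t
toy_encode_arg (const char *src, int src_multibyte,
                char *dst, size_t dst_size)
{
  size_t len = strlen (src);

  if (!src_multibyte)
    {
      if (dst_size < len + 1)
        return 0;
      memcpy (dst, src, len + 1);
      return len;
    }
  return toy_encode_with_signature (src, dst, dst_size);
}
```

In this model, encoding "-hex" unconditionally yields seven bytes
starting with EF BB BF, which the child program would presumably not
recognize as an option, whereas toy_encode_arg ("-hex", 0, ...) hands
over the original four bytes.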