bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark

unofficial mirror of bug-gnu-emacs@gnu.org 

* bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
@ 2009-08-07  8:50 Pierre Bogossian
  2009-08-08 12:20 ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: Pierre Bogossian @ 2009-08-07  8:50 UTC (permalink / raw)
  To: Eli Zaretskii, Pierre Bogossian, 4047

>What is the value of buffer-file-coding-system before you enter
>hexl-mode?

It can be utf-8-with-signature-dos or utf-8-with-signature-unix
depending on the type of "end-of-line" used by the file.

>[...] does it help to say
>"C-x RET f utf-8-with-signature RET" before entering hexl-mode?

No, but forcing the coding system of any buffer to utf-8-with-signature
using this command and then entering hexl-mode is enough to trigger
the error. I can even reproduce it with a blank scratch buffer.

>> Unfortunately I can't test a unix version at the moment.
>
>Which means your OS is what?

Windows XP SP3.


* bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
  2009-08-07  8:50 bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark Pierre Bogossian
@ 2009-08-08 12:20 ` Eli Zaretskii
  2009-08-08 13:22   ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2009-08-08 12:20 UTC (permalink / raw)
  To: Pierre Bogossian, Kenichi Handa; +Cc: 4047

> From: "Pierre Bogossian" <bogossian@mail.com>
> Date: Fri, 7 Aug 2009 09:50:54 +0100
> 
> >[...] does it help to say
> >"C-x RET f utf-8-with-signature RET" before entering hexl-mode?
> 
> No, but forcing the coding system of any buffer to utf-8-with-signature
> using this command and then entering hexl-mode is enough to trigger
> the error. I can even reproduce it with a blank scratch buffer.
> 
> >> Unfortunately I can't test a unix version at the moment.
> >
> >Which means your OS is what?
> 
> Windows XP SP3.

The problem happens on GNU/Linux as well.

I think I've identified why the problem happens, but I need help in
finding the right solution.  Handa-san, can you please comment on
what's below?  Of course, others are welcome to comment as well.

The cause of the problem is this: hexlify-buffer must bind
coding-system-for-write to the buffer's encoding, to force
call-process-region to use the buffer's encoding when writing the text
to the temporary file.  OTOH, it needs to avoid encoding the arguments
passed to the `hexl' program by the buffer's encoding, because that
could be inappropriate for encoding command lines on the underlying
system.  However, call-process-region normally uses
coding-system-for-write, if it is non-nil, to encode the arguments as
well.  To resolve this contradiction, hexlify-buffer encodes the
arguments manually (by locale-coding-system), assuming that, being
unibyte strings after that encoding, they will not be encoded by
call-process-region.

But call-process (called by call-process-region) does this:

    /* If arguments are supplied, we may have to encode them.  */
    if (nargs >= 5)
      {
	int must_encode = 0;
	Lisp_Object coding_attrs;

	for (i = 4; i < nargs; i++)
	  CHECK_STRING (args[i]);

	for (i = 4; i < nargs; i++)
	  if (STRING_MULTIBYTE (args[i]))
	    must_encode = 1;

	if (!NILP (Vcoding_system_for_write))
	  val = Vcoding_system_for_write;
	else if (! must_encode)
	  val = Qnil;
	else
	  {
	    args2 = (Lisp_Object *) alloca ((nargs + 1) * sizeof *args2);
	    args2[0] = Qcall_process;
	    for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
	    coding_systems = Ffind_operation_coding_system (nargs + 1, args2);

First, if coding-system-for-write is non-nil, it is used, even if none
of the argument strings is a multibyte string.  (This particular bug
can easily be solved by making the test for must_encode before we test
that coding-system-for-write is non-nil, but I'm not sure this is the
right solution because other arguments could be multibyte strings,
which will still cause us to use coding-system-for-write for _all_
arguments.)
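
To make the ordering issue concrete, here is a minimal standalone sketch of
the decision only.  It is not Emacs code: the enum, the function names and
the boolean flags are hypothetical stand-ins for Vcoding_system_for_write
and STRING_MULTIBYTE, and the "reordered" variant is just the alternative
mentioned in the parenthesis above, not an installed fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of the "how do we encode the arguments" choice.  */
enum arg_coding
{
  ARGS_RAW,       /* no encoding at all (val = Qnil in the fragment) */
  ARGS_OVERRIDE,  /* use coding-system-for-write */
  ARGS_LOOKUP     /* ask find-operation-coding-system */
};

/* True if at least one of the NARGS arguments is a multibyte string.  */
static bool
any_multibyte (const bool *multibyte, size_t nargs)
{
  assert (multibyte != NULL || nargs == 0);
  for (size_t i = 0; i < nargs; i++)
    if (multibyte[i])
      return true;
  return false;
}

/* Order of tests as in the quoted fragment: the override wins even
   when every argument is already unibyte.  */
static enum arg_coding
choose_current (bool have_override, const bool *multibyte, size_t nargs)
{
  bool must_encode = any_multibyte (multibyte, nargs);
  if (have_override)
    return ARGS_OVERRIDE;
  if (!must_encode)
    return ARGS_RAW;
  return ARGS_LOOKUP;
}

/* Reordered variant: test must_encode first.  A single multibyte
   argument still sends all arguments through the override.  */
static enum arg_coding
choose_reordered (bool have_override, const bool *multibyte, size_t nargs)
{
  bool must_encode = any_multibyte (multibyte, nargs);
  if (!must_encode)
    return ARGS_RAW;
  if (have_override)
    return ARGS_OVERRIDE;
  return ARGS_LOOKUP;
}
```

With two unibyte arguments and an override in effect, choose_current picks
ARGS_OVERRIDE while choose_reordered picks ARGS_RAW; as soon as one argument
is multibyte, both pick ARGS_OVERRIDE, which is the remaining concern.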

And second, this fragment, which actually encodes the arguments,
further down in call-process:

  if (nargs > 4)
    {
      register int i;
      struct gcpro gcpro1, gcpro2, gcpro3, gcpro4, gcpro5;

      GCPRO5 (infile, buffer, current_dir, path, error_file);
      argument_coding.dst_multibyte = 0;
      for (i = 4; i < nargs; i++)
	{
	  argument_coding.src_multibyte = STRING_MULTIBYTE (args[i]);
	  if (CODING_REQUIRE_ENCODING (&argument_coding))
	    /* We must encode this argument.  */
	    args[i] = encode_coding_string (&argument_coding, args[i], 1);
	}

encodes the argument even though argument_coding.src_multibyte is set
to 0.  Is encode_coding_string supposed to encode unibyte strings?


* bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
  2009-08-08 12:20 ` Eli Zaretskii
@ 2009-08-08 13:22   ` Eli Zaretskii
  2009-08-08 14:29     ` Andreas Schwab
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2009-08-08 13:22 UTC (permalink / raw)
  To: 4047; +Cc: bogossian

> Date: Sat, 08 Aug 2009 15:20:10 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 4047@emacsbugs.donarmstrong.com
> 
> The cause of the problem is this: [...]

I probably should have said explicitly that the end result of all I
described is that the "-hex" command-line argument to `hexl' is
encoded by utf-8-with-signature, and becomes "\357\273\277-hex",
which, of course, utterly confuses `hexl'.
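
The byte sequence is easy to reproduce outside Emacs.  The following is a
small illustrative sketch, not the Emacs encoder: encode_with_signature and
show_bytes are hypothetical helpers that prepend the UTF-8 signature (bytes
EF BB BF) to a string and print non-ASCII bytes as octal escapes, the way
the error message displays them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The UTF-8 signature (BOM): EF BB BF, i.e. \357\273\277 in octal.  */
static const unsigned char utf8_bom[] = { 0xEF, 0xBB, 0xBF };

/* Hypothetical helper: prepend the BOM to ARG, as a "with-signature"
   encoder does at the start of each string it encodes.  Returns the
   encoded length, or -1 if DST is too small.  */
static int
encode_with_signature (const char *arg, char *dst, size_t dstsize)
{
  size_t len = strlen (arg);
  if (len + sizeof utf8_bom + 1 > dstsize)
    return -1;
  memcpy (dst, utf8_bom, sizeof utf8_bom);
  memcpy (dst + sizeof utf8_bom, arg, len + 1);  /* copies the NUL too */
  return (int) (len + sizeof utf8_bom);
}

/* Render S into OUT, showing bytes >= 0x80 as three-digit octal escapes.
   Stops early if OUT would overflow.  */
static void
show_bytes (const char *s, char *out, size_t outsize)
{
  size_t used = 0;
  assert (out != NULL && outsize > 0);
  out[0] = '\0';
  for (; *s && used + 5 <= outsize; s++)
    {
      unsigned char c = (unsigned char) *s;
      if (c < 0x80)
        used += (size_t) snprintf (out + used, outsize - used, "%c", c);
      else
        used += (size_t) snprintf (out + used, outsize - used, "\\%03o", c);
    }
}
```

Passing "-hex" through encode_with_signature and then show_bytes gives
\357\273\277-hex, so the option no longer starts with '-' and is presumably
taken for a file name, which fits the "No such file or directory" error in
the original report.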

Btw, I doubt that any encoding that uses BOM can ever be appropriate
for encoding command-line arguments.  Maybe we should treat them
specially in call-process and its ilk.


* bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
  2009-08-08 13:22   ` Eli Zaretskii
@ 2009-08-08 14:29     ` Andreas Schwab
  2009-08-08 15:29       ` Eli Zaretskii
  2009-08-10 19:45       ` Stefan Monnier
  0 siblings, 2 replies; 19+ messages in thread
From: Andreas Schwab @ 2009-08-08 14:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 4047, bogossian

Eli Zaretskii <eliz@gnu.org> writes:

> Btw, I doubt that any encoding that uses BOM can ever be appropriate
> for encoding command-line arguments.  Maybe we should treat them
> specially in call-process and its ilk.

The bug is that hexlify-buffer assumes that manually encoding the
command line stops call-process from encoding it again, which does not
work: coding-system-for-write takes absolute precedence.  IMHO
call-process should not use coding-system-for-write for encoding the
command line; if anything, there should be a separate override.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
  2009-08-08 14:29     ` Andreas Schwab
@ 2009-08-08 15:29       ` Eli Zaretskii
  2009-08-08 15:47         ` Andreas Schwab
  2009-08-08 15:56         ` Lennart Borgman
  2009-08-10 19:45       ` Stefan Monnier
  1 sibling, 2 replies; 19+ messages in thread
From: Eli Zaretskii @ 2009-08-08 15:29 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: 4047, bogossian

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: 4047@emacsbugs.donarmstrong.com,  bogossian@mail.com
> Date: Sat, 08 Aug 2009 16:29:31 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Btw, I doubt that any encoding that uses BOM can ever be appropriate
> > for encoding command-line arguments.  Maybe we should treat them
> > specially in call-process and its ilk.
> 
> The bug is that hexlify-buffer assumes that manually encoding the
> command line stops call-process from encoding it again, which does not
> work: coding-system-for-write takes absolute precedence.

If encode_coding_string would leave unibyte strings alone (as I think
it should, unless there's a good reason not to), the absolute
precedence you mention would not matter.  Or, if there _is_ a good
reason for encode_coding_string's current behavior, we could avoid
encoding unibyte strings in the command-line arguments (although
admittedly that would be a kludge).

> IMHO call-process should not use coding-system-for-write for
> encoding the command line

But if some of the command-line arguments are file names, say, we do
need to encode them, don't we?

> if at all there should be a separate override.

That'd be fine by me, if there's no better alternative.






* bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
  2009-08-08 15:29       ` Eli Zaretskii
@ 2009-08-08 15:47         ` Andreas Schwab
  2009-08-08 17:24           ` Eli Zaretskii
  2009-08-08 15:56         ` Lennart Borgman
  1 sibling, 1 reply; 19+ messages in thread
From: Andreas Schwab @ 2009-08-08 15:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 4047, bogossian

Eli Zaretskii <eliz@gnu.org> writes:

> If encode_coding_string would leave unibyte strings alone

It does if coding-system-for-write is nil.

>> IMHO call-process should not use coding-system-for-write for
>> encoding the command line
>
> But if some of the command-line arguments are file names, say, we do
> need to encode them, don't we?

coding-system-for-write is meant to override the coding system for write
operations, but IMHO the coding system for file names is in a different
category.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."






* bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
  2009-08-08 15:47         ` Andreas Schwab
@ 2009-08-08 17:24           ` Eli Zaretskii
  2009-08-08 17:57             ` Lennart Borgman
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2009-08-08 17:24 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: 4047, bogossian

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: 4047@emacsbugs.donarmstrong.com,  bogossian@mail.com
> Date: Sat, 08 Aug 2009 17:47:27 +0200
> 
> coding-system-for-write is meant to override the coding system for write
> operations, but IMHO the coding system for file names is in a different
> category.

What about strings passed to Grep?






* bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
  2009-08-08 17:24           ` Eli Zaretskii
@ 2009-08-08 17:57             ` Lennart Borgman
  0 siblings, 0 replies; 19+ messages in thread
From: Lennart Borgman @ 2009-08-08 17:57 UTC (permalink / raw)
  To: Eli Zaretskii, 4047; +Cc: bogossian, Andreas Schwab

On Sat, Aug 8, 2009 at 7:24 PM, Eli Zaretskii<eliz@gnu.org> wrote:
>> From: Andreas Schwab <schwab@linux-m68k.org>
>> Cc: 4047@emacsbugs.donarmstrong.com,  bogossian@mail.com
>> Date: Sat, 08 Aug 2009 17:47:27 +0200
>>
>> coding-system-for-write is meant to override the coding system for write
>> operations, but IMHO the coding system for file names is in a different
>> category.
>
> What about strings passed to Grep?

Or arg to any program? The required coding could be different than the
file name coding.






* bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
  2009-08-08 15:29       ` Eli Zaretskii
  2009-08-08 15:47         ` Andreas Schwab
@ 2009-08-08 15:56         ` Lennart Borgman
  2009-08-08 17:25           ` Eli Zaretskii
  1 sibling, 1 reply; 19+ messages in thread
From: Lennart Borgman @ 2009-08-08 15:56 UTC (permalink / raw)
  To: Eli Zaretskii, 4047; +Cc: bogossian, Andreas Schwab

On Sat, Aug 8, 2009 at 5:29 PM, Eli Zaretskii<eliz@gnu.org> wrote:

> But if some of the command-line arguments are file names, say, we do
> need to encode them, don't we?

Could not different programs (the program arg to call-process) have
different requirements? At least on w32 that seems to me to be the
case.






* bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
  2009-08-08 15:56         ` Lennart Borgman
@ 2009-08-08 17:25           ` Eli Zaretskii
  0 siblings, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2009-08-08 17:25 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: bogossian, 4047, schwab

> Date: Sat, 8 Aug 2009 17:56:21 +0200
> From: Lennart Borgman <lennart.borgman@gmail.com>
> Cc: Andreas Schwab <schwab@linux-m68k.org>, bogossian@mail.com
> 
> On Sat, Aug 8, 2009 at 5:29 PM, Eli Zaretskii<eliz@gnu.org> wrote:
> 
> > But if some of the command-line arguments are file names, say, we do
> > need to encode them, don't we?
> 
> Could not different programs (the program arg to call-process) have
> different requirements?

Of course, they do.  But the Lisp code that invokes them should know
what it is doing.






* bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
  2009-08-08 14:29     ` Andreas Schwab
  2009-08-08 15:29       ` Eli Zaretskii
@ 2009-08-10 19:45       ` Stefan Monnier
  2009-08-11  0:51         ` Kenichi Handa
  1 sibling, 1 reply; 19+ messages in thread
From: Stefan Monnier @ 2009-08-10 19:45 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: 4047, bogossian

>> Btw, I doubt that any encoding that uses BOM can ever be appropriate
>> for encoding command-line arguments.  Maybe we should treat them
>> specially in call-process and its ilk.
> The bug is that hexlify-buffer assumes that manually encoding the
> command line stops call-process from encoding it again, which does not
> work: coding-system-for-write takes absolute precedence.  IMHO
> call-process should not use coding-system-for-write for encoding the
> command line, if at all there should be a separate override.

I believe we've bumped into this problem already in the past.
To me, it's clear that call-process should be careful about coding
arguments, since the coding-system to use may depend on the argument
and/or the command, so in general the caller will want to specify
explicitly some coding system for the arguments, including a different
coding system for each argument.  An override var might be a good idea,
but it won't cater to the case where each arg requires a different
encoding, so the most important thing is to make sure that unibyte args
don't get re-encoded.

Unless Handa objects, I'd recommend we change encode_coding_string to be
a nop on unibyte strings (tho, we may want to let it obey EOL
conversions).  If there are good reasons not to do that, then
Fcall_process should be changed to not call encode_coding_string on
unibyte strings.
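
The second alternative (skip the encoder for unibyte arguments) can be
pictured with a toy model.  This is only a sketch under stated assumptions:
struct toy_arg, toy_encode and toy_encode_arg are hypothetical names, and
the "encoder" merely counts its calls instead of converting anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a Lisp string: raw bytes plus a multibyte flag.  */
struct toy_arg
{
  const char *bytes;
  bool multibyte;
};

/* Counts how often the (pretend) encoder actually ran.  */
static int toy_encoder_calls;

/* Hypothetical encoder: a real one would convert characters to bytes
   in some coding system; this one only records that it was called.  */
static const char *
toy_encode (const char *bytes)
{
  toy_encoder_calls++;
  return bytes;
}

/* Encode ARG only if it is multibyte; unibyte arguments are assumed
   to be encoded already and are returned untouched.  */
static const char *
toy_encode_arg (const struct toy_arg *arg)
{
  assert (arg != NULL);
  return arg->multibyte ? toy_encode (arg->bytes) : arg->bytes;
}
```

A caller that has pre-encoded "-hex" by hand would pass it with multibyte
set to false and get the same bytes back, while a multibyte file name would
still go through the encoder.  EOL conversion is deliberately left out of
this model.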

        Stefan


* bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
  2009-08-10 19:45       ` Stefan Monnier
@ 2009-08-11  0:51         ` Kenichi Handa
  2009-08-14  9:02           ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: Kenichi Handa @ 2009-08-11  0:51 UTC (permalink / raw)
  To: Stefan Monnier, 4047; +Cc: bogossian, 4047, schwab

In article <jwvljlrlapn.fsf-monnier+emacsbugreports@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
> Unless Handa objects, I'd recommend we change encode_coding_string to be
> a nop on unibyte strings (tho, we may want to let it obey EOL
> conversions).

I don't object to that change.

---
Kenichi Handa
handa@m17n.org






* bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
  2009-08-11  0:51         ` Kenichi Handa
@ 2009-08-14  9:02           ` Eli Zaretskii
  2009-08-21  9:33             ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2009-08-14  9:02 UTC (permalink / raw)
  To: Kenichi Handa, 4047; +Cc: bogossian, schwab

> From: Kenichi Handa <handa@m17n.org>
> Date: Tue, 11 Aug 2009 09:51:49 +0900
> Cc: bogossian@mail.com, 4047@emacsbugs.donarmstrong.com, schwab@linux-m68k.org
> 
> In article <jwvljlrlapn.fsf-monnier+emacsbugreports@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
> > Unless Handa objects, I'd recommend we change encode_coding_string to be
> > a nop on unibyte strings (tho, we may want to let it obey EOL
> > conversions).
> 
> I don't object to that change.

For strings only (i.e. in coding.h:encode_coding_string) or on the
more basic level, in coding.c:encode_coding_object?






* bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
  2009-08-14  9:02           ` Eli Zaretskii
@ 2009-08-21  9:33             ` Eli Zaretskii
  2009-08-21 12:18               ` Kenichi Handa
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2009-08-21  9:33 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: schwab, 4047, bogossian

> Date: Fri, 14 Aug 2009 12:02:37 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> CC: monnier@iro.umontreal.ca, bogossian@mail.com, schwab@linux-m68k.org
> 
> > From: Kenichi Handa <handa@m17n.org>
> > Date: Tue, 11 Aug 2009 09:51:49 +0900
> > Cc: bogossian@mail.com, 4047@emacsbugs.donarmstrong.com, schwab@linux-m68k.org
> > 
> > In article <jwvljlrlapn.fsf-monnier+emacsbugreports@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
> > > Unless Handa objects, I'd recommend we change encode_coding_string to be
> > > a nop on unibyte strings (tho, we may want to let it obey EOL
> > > conversions).
> > 
> > I don't object to that change.
> 
> For strings only (i.e. in coding.h:encode_coding_string) or on the
> more basic level, in coding.c:encode_coding_object?

Ping!






* bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
  2009-08-21  9:33             ` Eli Zaretskii
@ 2009-08-21 12:18               ` Kenichi Handa
       [not found]                 ` <83praof8mu.fsf@gnu.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Kenichi Handa @ 2009-08-21 12:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, 4047, bogossian

In article <83ljldh5pm.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > > > Unless Handa objects, I'd recommend we change encode_coding_string to be
> > > > a nop on unibyte strings (tho, we may want to let it obey EOL
> > > > conversions).
> > > 
> > > I don't object to that change.
> > 
> > For strings only (i.e. in coding.h:encode_coding_string) or on the
> > more basic level, in coding.c:encode_coding_object?

> Ping!

At the moment, all I can say is that changing
coding.h:encode_coding_string is quite safe.  But,
encode_coding_object is used by Lisp functions
encode-coding-region and encode-coding-string, and thus the
change will break some packages that use them on unibyte
string/buffer.

---
Kenichi Handa
handa@m17n.org






[parent not found: <83praof8mu.fsf@gnu.org>]

* bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
@ 2009-08-05 14:01                   ` Pierre Bogossian
  2009-08-06 17:49                     ` Eli Zaretskii
  2009-08-22 10:30                     ` bug#4047: marked as done (23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark) Emacs bug Tracking System
  0 siblings, 2 replies; 19+ messages in thread
From: Pierre Bogossian @ 2009-08-05 14:01 UTC (permalink / raw)
  To: bug-gnu-emacs

Hi,

I'm testing the Windows version of the new Emacs 23.1.1.

Here's what I noticed:

If I open a UTF8 file with a byte-order mark, and if I
try to enter hexl-mode, I get this error: "\357\273\277-hex: No such file or directory".

The presence of the BOM is important, I can enter hexl-mode
with no problem if I remove the BOM from the file.

I did the same test with emacs 22.3.1 and it worked fine, so
this looks like a regression.

Unfortunately I can't test a unix version at the moment.

Regards,

Pierre


* bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
  2009-08-05 14:01                   ` Pierre Bogossian
@ 2009-08-06 17:49                     ` Eli Zaretskii
  2009-08-22 10:30                     ` bug#4047: marked as done (23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark) Emacs bug Tracking System
  1 sibling, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2009-08-06 17:49 UTC (permalink / raw)
  To: Pierre Bogossian, 4047

> From: "Pierre Bogossian" <bogossian@mail.com>
> Date: Wed, 5 Aug 2009 15:01:31 +0100
> Cc: 
> 
> If I open a UTF8 file with a byte-order mark, and if I
> try to enter hexl-mode, I get this error: "\357\273\277-hex: No such file or directory".

What is the value of buffer-file-coding-system before you enter
hexl-mode?  If it is anything but utf-8-with-signature, does it help
to say "C-x RET f utf-8-with-signature RET" before entering hexl-mode?

> Unfortunately I can't test a unix version at the moment.

Which means your OS is what?






* bug#4047: marked as done (23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark)
  2009-08-05 14:01                   ` Pierre Bogossian
  2009-08-06 17:49                     ` Eli Zaretskii
@ 2009-08-22 10:30                     ` Emacs bug Tracking System
  1 sibling, 0 replies; 19+ messages in thread
From: Emacs bug Tracking System @ 2009-08-22 10:30 UTC (permalink / raw)
  To: Eli Zaretskii

[-- Attachment #1: Type: text/plain, Size: 918 bytes --]

Your message dated Sat, 22 Aug 2009 13:25:13 +0300
with message-id <83praof8mu.fsf@gnu.org>
and subject line Re: bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
has caused the Emacs bug report #4047,
regarding 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@emacsbugs.donarmstrong.com
immediately.)

-- 
4047: http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=4047
Emacs Bug Tracking System
Contact owner@emacsbugs.donarmstrong.com with problems

[-- Attachment #2: Type: message/rfc822, Size: 3239 bytes --]

From: "Pierre Bogossian" <bogossian@mail.com>
To: bug-gnu-emacs@gnu.org
Subject: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
Date: Wed, 5 Aug 2009 15:01:31 +0100
Message-ID: <20090805140131.19FBE606865@ws1-4.us4.outblaze.com>

Hi,

I'm testing the Windows version of the new Emacs 23.1.1.

Here's what I noticed:

If I open a UTF8 file with a byte-order mark, and if I
try to enter hexl-mode, I get this error: "\357\273\277-hex: No such file or directory".

The presence of the BOM is important, I can enter hexl-mode
with no problem if I remove the BOM from the file.

I did the same test with emacs 22.3.1 and it worked fine, so
this looks like a regression.

Unfortunately I can't test a unix version at the moment.

Regards,

Pierre


[-- Attachment #3: Type: message/rfc822, Size: 2821 bytes --]

From: Eli Zaretskii <eliz@gnu.org>
To: Kenichi Handa <handa@m17n.org>
Cc: 4047-done@emacsbugs.donarmstrong.com, monnier@iro.umontreal.ca, bogossian@mail.com, schwab@linux-m68k.org
Subject: Re: bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
Date: Sat, 22 Aug 2009 13:25:13 +0300
Message-ID: <83praof8mu.fsf@gnu.org>

> From: Kenichi Handa <handa@m17n.org>
> CC: 4047@emacsbugs.donarmstrong.com, monnier@iro.umontreal.ca,
>         bogossian@mail.com, schwab@linux-m68k.org
> Date: Fri, 21 Aug 2009 21:18:53 +0900
> 
> In article <83ljldh5pm.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > > > > Unless Handa objects, I'd recommend we change encode_coding_string to be
> > > > > a nop on unibyte strings (tho, we may want to let it obey EOL
> > > > > conversions).
> > > > 
> > > > I don't object to that change.
> > > 
> > > For strings only (i.e. in coding.h:encode_coding_string) or on the
> > > more basic level, in coding.c:encode_coding_object?
> 
> > Ping!
> 
> At the moment, all I can say is that changing
> coding.h:encode_coding_string is quite safe.  But,
> encode_coding_object is used by Lisp functions
> encode-coding-region and encode-coding-string, and thus the
> change will break some packages that use them on unibyte
> string/buffer.

I fixed this in encode_coding_string.

Thanks.


* bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
       [not found]                 ` <83praof8mu.fsf@gnu.org>
  2009-08-05 14:01                   ` Pierre Bogossian
@ 2009-08-27 11:15                   ` Kenichi Handa
  1 sibling, 0 replies; 19+ messages in thread
From: Kenichi Handa @ 2009-08-27 11:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, 4047, bogossian

In article <83praof8mu.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > At the moment, all I can say is that changing
> > coding.h:encode_coding_string is quite safe.  But,
> > encode_coding_object is used by Lisp functions
> > encode-coding-region and encode-coding-string, and thus the
> > change will break some packages that use them on unibyte
> > string/buffer.

> I fixed this in encode_coding_string.

I have overlooked this part:

Stefan wrote:
> I'd recommend we change encode_coding_string to be
> a nop on unibyte strings (tho, we may want to let it obey EOL
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> conversions).
  ^^^^^^^^^^^

We surely need eol conversion in sending a unibyte string to
a process.  So, I've just installed this change.

2009-08-27  Kenichi Handa  <handa@m17n.org>

	* process.c (send_process): Use encode_coding_object instead of
	encode_coding_string to perform eol-conversion even if the string
	is unibyte.

Index: process.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/process.c,v
retrieving revision 1.593
retrieving revision 1.594
diff -u -r1.593 -r1.594
--- process.c	17 Aug 2009 21:04:07 -0000	1.593
+++ process.c	27 Aug 2009 11:12:54 -0000	1.594
@@ -5721,7 +5721,8 @@
 	}
       else if (STRINGP (object))
 	{
-	  encode_coding_string (coding, object, 1);
+	  encode_coding_object (coding, object, 0, 0, SCHARS (object),
+				SBYTES (object), Qt);
 	}
       else
 	{

---
Kenichi Handa
handa@m17n.org







Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
