unofficial mirror of emacs-devel@gnu.org 
* Fcall_process: wrong conversion
@ 2006-05-15  6:09 Herbert Euler
  2006-05-15 14:25 ` Stefan Monnier
  0 siblings, 1 reply; 18+ messages in thread
From: Herbert Euler @ 2006-05-15  6:09 UTC (permalink / raw)
  Cc: herberteuler

Hello,

Fcall_process in callproc.c, which corresponds to `call-process',
cannot handle UTF-16 (either LE or BE) correctly.  Take a look at
lines 417 to 424 of callproc.c:

      for (i = 4; i < nargs; i++)
	{
	  argument_coding.src_multibyte = STRING_MULTIBYTE (args[i]);
	  if (CODING_REQUIRE_ENCODING (&argument_coding))
	    /* We must encode this argument.  */
	    args[i] = encode_coding_string (&argument_coding, args[i], 1);
	  new_argv[i - 3] = SDATA (args[i]);
	}

If the encoding is UTF-16, encode_coding_string converts every ASCII
character in an argument to a two-byte code unit and prepends a byte
order mark to that argument.  For example, if args[4] is "-hex", it
may be converted to "\376\377\0-\0h\0e\0x", which most programs do not
accept as an argument, so they complain about it.  Even arguments made
of wide characters come out wrong, because "\376\377" or "\377\376" is
added in front.
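To make the byte layout concrete, here is a minimal C sketch (not
taken from Emacs) of what encoding an ASCII argument as UTF-16BE with
a byte order mark produces.  The function name
ascii_to_utf16be_with_bom is made up for illustration, and it handles
ASCII input only.

```c
#include <stddef.h>

/* Hypothetical helper, not Emacs code: encode the ASCII string ASCII
   as UTF-16BE with a leading byte order mark.  OUT must have room for
   2 * strlen (ASCII) + 2 bytes.  Returns the number of bytes written.  */
size_t
ascii_to_utf16be_with_bom (const char *ascii, unsigned char *out)
{
  size_t n = 0;

  out[n++] = 0xFE;		/* BOM, "\376\377" in octal notation */
  out[n++] = 0xFF;
  for (; *ascii; ascii++)
    {
      out[n++] = 0x00;		/* high byte of an ASCII code unit */
      out[n++] = (unsigned char) *ascii;
    }
  return n;
}
```

For "-hex" this writes the ten bytes "\376\377\0-\0h\0e\0x"; the third
byte is already a zero byte.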

I found this problem when applying `hexl-mode' to UTF-16 texts.  Could
somebody help solve it?

I also don't know whether similar problems exist elsewhere.

Thanks very much.

Regards,
Guanpeng Xu


* Re: Fcall_process: wrong conversion
  2006-05-15  6:09 Fcall_process: wrong conversion Herbert Euler
@ 2006-05-15 14:25 ` Stefan Monnier
  2006-05-15 15:17   ` Herbert Euler
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Monnier @ 2006-05-15 14:25 UTC (permalink / raw)
  Cc: emacs-devel

> Fcall_process in callproc.c, which corresponds to `call-process',
> cannot handle UTF-16 (either LE or BE) correctly.  Take a look at

Actually, it handles it just fine.  The problem is that call-process and
start-process both use the same coding system to encode arguments and to
encode the data sent via stdin to the process, whereas you want them to
be distinct.
If you want them to be distinct, then you need to manually encode your
arguments before passing them to call-process.

I.e. the bug with hexl-mode is in hexl.el.  Please report it separately
indicating how to reproduce the problem (I don't know how to "applying
`hexl-mode' to UTF-16 texts").


        Stefan


* Re: Fcall_process: wrong conversion
  2006-05-15 14:25 ` Stefan Monnier
@ 2006-05-15 15:17   ` Herbert Euler
  2006-05-15 16:06     ` Stefan Monnier
  0 siblings, 1 reply; 18+ messages in thread
From: Herbert Euler @ 2006-05-15 15:17 UTC (permalink / raw)
  Cc: emacs-devel

I followed these steps:

    - Create a file containing UTF-16 text; either UTF-16BE or
      UTF-16LE is OK.  For example, create a file whose content is
      "a" in UTF-16LE and name it "1".

    - Visit file "1" with C-x C-f.

In fact, a UTF-16 file can be interpreted either as UTF-16 text or as
raw bytes (ASCII text mixed with non-ASCII bytes).  Interpreted as
UTF-16LE, the content of file "1" is "a"; interpreted as raw bytes, it
is "\377\376a^@", where "\377\376" marks the text as UTF-16LE and "a"
is represented as "a^@" (^@ is \0 here).  If for some reason Emacs
doesn't visit the file with the correct encoding, one can type
C-x RET r followed by the correct encoding and RET to fix it.

    - In case the buffer is encoded with raw-text-unix, the content is
      displayed as "\377\376a^@".  Type M-x hexl-mode RET and the
      correct result is displayed (not shown here, since it's easy
      to reproduce).

    - In case the buffer is encoded with utf-16-le, the content is
      displayed as "a".  Type M-x hexl-mode RET, the result is

          \377?: Invalid argument

      displayed in the buffer.

This is because hexl-mode does its job as follows:

    1. Store the buffer content in a temporary file.

    2. Invoke "hexl" with argument "-hex" and stdin set to the
       temporary file, and put its output into the same buffer.  This
       is done by calling `call-process-region' (and so
       `call-process').

    3. Manipulate the output to generate correct result.

When the buffer is encoded with raw-text-unix, the code of
`Fcall_process' in callproc.c shown in my last mail does not convert
the argument "-hex", so the command actually invoked is "hexl -hex".
But if the buffer is encoded with utf-16-le, "-hex" is converted to
"\377\376-^@h^@e^@x^@", so the command invoked is
"hexl \377\376-^@h^@e^@x^@".  Since "^@" is actually '\0', "hexl" sees
"\377\376-" as its first argument.  That's why the second case
displays an error message.  As a result, the code in hexl-mode that
runs afterwards cannot process the (wrong) output correctly.
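The truncation can be reproduced outside Emacs.  The following C
sketch is only an illustration under simple assumptions: both function
names are hypothetical, the encoder handles ASCII input only, and
strlen stands in for the way a program reads each argv entry as a
NUL-terminated string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Emacs code: encode the ASCII string ASCII
   as UTF-16LE with a leading byte order mark, followed by a two-byte
   terminator.  OUT must have room for 2 * strlen (ASCII) + 4 bytes.
   Returns the number of bytes written, not counting the terminator.  */
size_t
ascii_to_utf16le_with_bom (const char *ascii, unsigned char *out)
{
  size_t n = 0;

  out[n++] = 0xFF;		/* BOM, "\377\376" in octal notation */
  out[n++] = 0xFE;
  for (; *ascii; ascii++)
    {
      out[n++] = (unsigned char) *ascii;
      out[n++] = 0x00;		/* high byte of an ASCII code unit */
    }
  out[n] = 0x00;		/* terminator, so strlen cannot overrun */
  out[n + 1] = 0x00;
  return n;
}

/* Copy into SEEN the part of ENCODED that a program reading it as a
   NUL-terminated string would see.  Returns the number of bytes
   copied, not counting the terminating NUL.  */
size_t
argument_seen_by_child (const unsigned char *encoded,
			char *seen, size_t seen_size)
{
  size_t len;

  if (seen_size == 0)
    return 0;
  len = strlen ((const char *) encoded);
  if (len >= seen_size)
    len = seen_size - 1;
  memcpy (seen, encoded, len);
  seen[len] = '\0';
  return len;
}
```

Encoding "-hex" yields ten bytes, but argument_seen_by_child reports
only the first three, "\377\376-", because the fourth byte is zero.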

I hope I've described this clearly.

Regards,
Guanpeng Xu




* Re: Fcall_process: wrong conversion
  2006-05-15 15:17   ` Herbert Euler
@ 2006-05-15 16:06     ` Stefan Monnier
  2006-05-16  2:59       ` Herbert Euler
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Monnier @ 2006-05-15 16:06 UTC (permalink / raw)
  Cc: emacs-devel

>    - Create a file containing UTF-16 text; either UTF-16BE or
>      UTF-16LE is OK.  For example, create a file whose content is
>      "a" in UTF-16LE and name it "1".
[...]
>    - In case the buffer is encoded with utf-16-le, the content is
>      displayed as "a".  Type M-x hexl-mode RET, the result is

>          \377?: Invalid argument

>      displayed in the buffer.

Thanks.  I've installed the patch below which should fix the problem.
Please confirm,


        Stefan


--- hexl.el	11 avr 2006 12:45:49 -0400	1.103
+++ hexl.el	15 mai 2006 12:02:32 -0400	
@@ -704,7 +704,12 @@
 	(buffer-undo-list t))
     (apply 'call-process-region (point-min) (point-max)
 	   (expand-file-name hexl-program exec-directory)
-	   t t nil (split-string hexl-options))
+	   t t nil
+           ;; Manually encode the args, otherwise they're encoded using
+           ;; coding-system-for-write (i.e. buffer-file-coding-system) which
+           ;; may not be what we want (e.g. utf-16 on a non-utf-16 system).
+           (mapcar (lambda (s) (encode-coding-string s locale-coding-system))
+                   (split-string hexl-options)))
     (if (> (point) (hexl-address-to-marker hexl-max-address))
 	(hexl-goto-address hexl-max-address))))


* Re: Fcall_process: wrong conversion
  2006-05-15 16:06     ` Stefan Monnier
@ 2006-05-16  2:59       ` Herbert Euler
  2006-05-16  4:10         ` Kenichi Handa
  0 siblings, 1 reply; 18+ messages in thread
From: Herbert Euler @ 2006-05-16  2:59 UTC (permalink / raw)


This doesn't work.  I've traced through the code, and the reason
seems to be as follows.

You changed the code in hexl.el to:

  (let ((coding-system-for-read 'raw-text)
        (coding-system-for-write buffer-file-coding-system)
        (buffer-undo-list t))
    (apply 'call-process-region (point-min) (point-max)
           (expand-file-name hexl-program exec-directory)
           t t nil
           ;; Manually encode the args, otherwise they're encoded using
           ;; coding-system-for-write (i.e. buffer-file-coding-system) which
           ;; may not be what we want (e.g. utf-16 on a non-utf-16 system).
           (mapcar (lambda (s) (encode-coding-string s locale-coding-system))
                   (split-string hexl-options)))

So when call-process is invoked, the value of `coding-system-for-write'
is not nil.  In my test, it is `utf-16le-with-signature'.  The part of
callproc.c that decides the coding system is lines 269 to 300:

    if (nargs >= 5)
      {
        int must_encode = 0;

        for (i = 4; i < nargs; i++)
          CHECK_STRING (args[i]);

        for (i = 4; i < nargs; i++)
          if (STRING_MULTIBYTE (args[i]))
            must_encode = 1;

        if (!NILP (Vcoding_system_for_write))
          val = Vcoding_system_for_write;
        else if (! must_encode)
          val = Qnil;
        else
          {
            args2 = (Lisp_Object *) alloca ((nargs + 1) * sizeof *args2);
            args2[0] = Qcall_process;
            for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
            coding_systems = Ffind_operation_coding_system (nargs + 1, args2);
            if (CONSP (coding_systems))
              val = XCDR (coding_systems);
            else if (CONSP (Vdefault_process_coding_system))
              val = XCDR (Vdefault_process_coding_system);
            else
              val = Qnil;
          }
        val = coding_inherit_eol_type (val, Qnil);
        setup_coding_system (Fcheck_coding_system (val), &argument_coding);
      }
  }
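The order of precedence in that excerpt can be reduced to a few lines.
This is a simplified C sketch, not the Emacs code: coding systems are
modelled as plain strings, NULL stands for nil, and the function and
parameter names are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical reduction of the selection logic: return the coding
   system used for process arguments, or NULL for "no conversion".
   CODING_SYSTEM_FOR_WRITE wins whenever it is non-NULL, even if every
   argument is unibyte.  */
const char *
choose_argument_coding (const char *coding_system_for_write,
			int any_argument_multibyte,
			const char *operation_coding,
			const char *default_process_coding)
{
  if (coding_system_for_write != NULL)
    return coding_system_for_write;
  if (!any_argument_multibyte)
    return NULL;
  if (operation_coding != NULL)
    return operation_coding;
  return default_process_coding;	/* may itself be NULL */
}
```

With coding_system_for_write bound to a UTF-16 coding system, the
first branch is taken no matter how the arguments were prepared.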

If `Vcoding_system_for_write' is not nil, `val' is set to that
value.  So at the last statement of the callproc.c excerpt above (the
call to setup_coding_system), the `detector', `decoder', and
`encoder' fields of `argument_coding' are set to the UTF-16 ones, and
the CODING_REQUIRE_ENCODING_MASK flag is turned on in `common_flags'
of `argument_coding'.  This happens in coding.c, lines 5042 to 5059:

  else if (EQ (coding_type, Qutf_16))
    {
      val = AREF (attrs, coding_attr_utf_16_bom);
      CODING_UTF_16_BOM (coding) = (CONSP (val) ? utf_16_detect_bom
                                    : EQ (val, Qt) ? utf_16_with_bom
                                    : utf_16_without_bom);
      val = AREF (attrs, coding_attr_utf_16_endian);
      CODING_UTF_16_ENDIAN (coding) = (EQ (val, Qbig) ? utf_16_big_endian
                                       : utf_16_little_endian);
      CODING_UTF_16_SURROGATE (coding) = 0;
      coding->detector = detect_coding_utf_16;
      coding->decoder = decode_coding_utf_16;
      coding->encoder = encode_coding_utf_16;
      coding->common_flags
        |= (CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK);
      if (CODING_UTF_16_BOM (coding) == utf_16_detect_bom)
        coding->common_flags |= CODING_REQUIRE_DETECTION_MASK;
    }

Now go back to lines 410 to 427 of callproc.c:

  if (nargs > 4)
    {
      register int i;
      struct gcpro gcpro1, gcpro2, gcpro3;

      GCPRO3 (infile, buffer, current_dir);
      argument_coding.dst_multibyte = 0;
      for (i = 4; i < nargs; i++)
        {
          argument_coding.src_multibyte = STRING_MULTIBYTE (args[i]);
          if (CODING_REQUIRE_ENCODING (&argument_coding))
            /* We must encode this argument.  */
            args[i] = encode_coding_string (&argument_coding, args[i], 1);
          new_argv[i - 3] = SDATA (args[i]);
        }
      UNGCPRO;
      new_argv[nargs - 3] = 0;
    }

`CODING_REQUIRE_ENCODING' tests the following (lines 491 to 496,
coding.h):

/* Return 1 if the coding context CODING requires code conversion on
   encoding.  */
#define CODING_REQUIRE_ENCODING(coding)				\
  ((coding)->src_multibyte					\
   || (coding)->common_flags & CODING_REQUIRE_ENCODING_MASK	\
   || (coding)->mode & CODING_MODE_SELECTIVE_DISPLAY)

Although `argument_coding.src_multibyte' may be 0,
`argument_coding.common_flags & CODING_REQUIRE_ENCODING_MASK' must be
non-zero in this case.  So `CODING_REQUIRE_ENCODING
(&argument_coding)' will return true.
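The effect of the macro can be checked in isolation.  The C sketch
below mirrors its three-way test on a cut-down struct; the struct
name, the function name, and the two flag values are assumptions for
illustration and do not match the real definitions in coding.h.

```c
/* Illustrative flag values; the real masks in coding.h differ.  */
#define SKETCH_REQUIRE_ENCODING_MASK 0x01
#define SKETCH_MODE_SELECTIVE_DISPLAY 0x02

/* Cut-down stand-in for struct coding_system.  */
struct coding_sketch
{
  int src_multibyte;
  unsigned common_flags;
  unsigned mode;
};

/* Same shape as CODING_REQUIRE_ENCODING: nonzero if any of the three
   conditions holds.  */
int
require_encoding (const struct coding_sketch *coding)
{
  return (coding->src_multibyte
	  || (coding->common_flags & SKETCH_REQUIRE_ENCODING_MASK)
	  || (coding->mode & SKETCH_MODE_SELECTIVE_DISPLAY));
}
```

A coding context with src_multibyte equal to 0 but the mask set, which
is the UTF-16 case with an already encoded argument, still yields 1.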

As a result, encoding the arguments with `encode-coding-string', as
in your change, does not prevent the conversion done by
`call-process'.  Perhaps we should not bind `coding-system-for-write'
in the `let' form in such cases.

And there is another problem: if `locale-coding-system' is UTF-16, is
it correct to add prefix "\377\376" or "\376\377" to every command
argument?  If not, the current code of `call-process' is wrong, since
it will always add the prefix.

Regards,
Guanpeng Xu



_________________________________________________________________
Don't just search. Find. Check out the new MSN Search! 
http://search.msn.com/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Fcall_process: wrong conversion
  2006-05-16  2:59       ` Herbert Euler
@ 2006-05-16  4:10         ` Kenichi Handa
  2006-05-16  4:34           ` Herbert Euler
  2006-05-18 17:35           ` Stefan Monnier
  0 siblings, 2 replies; 18+ messages in thread
From: Kenichi Handa @ 2006-05-16  4:10 UTC (permalink / raw)
  Cc: emacs-devel

In article <BAY112-F7409CF46063B56C37BE1ADAA00@phx.gbl>, "Herbert Euler" <herberteuler@hotmail.com> writes:

> `CODING_REQUIRE_ENCODING' tests the following (lines 491 to 496,
> coding.h):

> /* Return 1 if the coding context CODING requires code conversion on
>    encoding.  */
> #define CODING_REQUIRE_ENCODING(coding)				\
>   ((coding)->src_multibyte					\
>    || (coding)->common_flags & CODING_REQUIRE_ENCODING_MASK	\
>    || (coding)->mode & CODING_MODE_SELECTIVE_DISPLAY)

That is to make it possible to encode a unibyte
string/buffer generated by string-as-unibyte or
(set-buffer-multibyte nil) from a multibyte string/buffer.
Perhaps we should not allow such an operation, but as this
feature has been there for a long time, it seems dangerous
to change it now.

How about disabling encoding only for process arguments if
they are already unibyte?  I think such a change is very
safe.
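The difference between the current behavior and this proposal can be
written as two small predicates.  This is a hedged C sketch, not the
change that was installed: both function names are hypothetical, and
the selective-display condition of CODING_REQUIRE_ENCODING is left out
for brevity.

```c
/* Current behavior, simplified: an argument is encoded if it is
   multibyte or if the coding system sets the require-encoding mask.  */
int
encode_argument_current (int argument_is_multibyte, int coding_mask_is_set)
{
  return argument_is_multibyte || coding_mask_is_set;
}

/* Proposed behavior, simplified: an argument that is already unibyte
   is passed through untouched, whatever the coding system says.  */
int
encode_argument_proposed (int argument_is_multibyte, int coding_mask_is_set)
{
  (void) coding_mask_is_set;	/* no longer decides anything */
  return argument_is_multibyte;
}
```

For a unibyte argument under a UTF-16 coding system, the first
predicate returns 1 and the second returns 0, so an argument that was
encoded by hand would reach the program unchanged.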

> And there is another problem: if `locale-coding-system' is UTF-16, is
> it correct to add prefix "\377\376" or "\376\377" to every command
> argument?  If not, the current code of `call-process' is wrong, since
> it will always add the prefix.

I think there's no locale that uses utf-16, and it's
impossible to support such a locale because most basic
libc functions that accept a filename require it to be
terminated by a null byte.

---
Kenichi Handa
handa@m17n.org


* Re: Fcall_process: wrong conversion
  2006-05-16  4:10         ` Kenichi Handa
@ 2006-05-16  4:34           ` Herbert Euler
  2006-05-16  4:39             ` Kenichi Handa
  2006-05-18 17:35           ` Stefan Monnier
  1 sibling, 1 reply; 18+ messages in thread
From: Herbert Euler @ 2006-05-16  4:34 UTC (permalink / raw)
  Cc: emacs-devel

>From: Kenichi Handa <handa@m17n.org>
>To: "Herbert Euler" <herberteuler@hotmail.com>
>CC: emacs-devel@gnu.org
>Subject: Re: Fcall_process: wrong conversion
>Date: Tue, 16 May 2006 13:10:30 +0900
>
>I think there's no locale that uses utf-16, and it's
>impossible to support such a locale because most basic
>libc functions that accept a filename require it to be
>terminated by a null byte.

Oh, I see my mistake.  I also see that whether a string is unibyte
is tested with STRING_MULTIBYTE (lines 674 to 676, lisp.h):

    /* Nonzero if STR is a multibyte string.  */
    #define STRING_MULTIBYTE(STR)  \
      (XSTRING (STR)->size_byte >= 0)

I don't know how `size_byte' is set.  Is it done by scanning the
string and checking the range of each byte (or some bytes) in it?  If
so, and if we assume that no command argument will be encoded in
UTF-16, disabling argument encoding for unibyte strings seems the best
solution.

Regards,
Guanpeng Xu


* Re: Fcall_process: wrong conversion
  2006-05-16  4:34           ` Herbert Euler
@ 2006-05-16  4:39             ` Kenichi Handa
  2006-05-16  5:40               ` Herbert Euler
  0 siblings, 1 reply; 18+ messages in thread
From: Kenichi Handa @ 2006-05-16  4:39 UTC (permalink / raw)
  Cc: emacs-devel

In article <BAY112-F28627FA81042D1139276A3DAA00@phx.gbl>, "Herbert Euler" <herberteuler@hotmail.com> writes:

> Oh, I see my mistake.  I also see that whether a string is unibyte
> is tested with STRING_MULTIBYTE (lines 674 to 676, lisp.h):

>     /* Nonzero if STR is a multibyte string.  */
>     #define STRING_MULTIBYTE(STR)  \
>       (XSTRING (STR)->size_byte >= 0)

> I don't know how `size_byte' is set.  Is it done by scanning the
> string and checking the range of each byte (or some bytes) in it?

No.  XSTRING (STR)->size_byte is set when a string is
created depending on how it is created (by
make_unibyte_string or make_multibyte_string or ...).
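In other words, the flag is a property recorded by the constructor,
not something derived from the bytes.  A minimal C sketch of that
idea follows; the struct and function names are hypothetical, and the
value -1 for unibyte strings is an assumption chosen to agree with the
`size_byte >= 0' test in STRING_MULTIBYTE.

```c
#include <stddef.h>

/* Cut-down stand-in for a Lisp string object.  */
struct string_sketch
{
  ptrdiff_t size;		/* number of characters */
  ptrdiff_t size_byte;		/* number of bytes, or negative if unibyte */
  const char *data;
};

/* A unibyte string records a negative size_byte at creation time.  */
struct string_sketch
make_unibyte_sketch (const char *data, ptrdiff_t nbytes)
{
  struct string_sketch s = { nbytes, -1, data };
  return s;
}

/* A multibyte string records its real byte count at creation time.  */
struct string_sketch
make_multibyte_sketch (const char *data, ptrdiff_t nchars, ptrdiff_t nbytes)
{
  struct string_sketch s = { nchars, nbytes, data };
  return s;
}

/* Same test as STRING_MULTIBYTE: no byte of DATA is inspected.  */
int
string_multibyte_p (const struct string_sketch *s)
{
  return s->size_byte >= 0;
}
```

The same three bytes "abc" give a unibyte string through one
constructor and a multibyte string through the other.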

---
Kenichi Handa
handa@m17n.org


* Re: Fcall_process: wrong conversion
  2006-05-16  4:39             ` Kenichi Handa
@ 2006-05-16  5:40               ` Herbert Euler
  2006-05-18  2:24                 ` Kenichi Handa
  0 siblings, 1 reply; 18+ messages in thread
From: Herbert Euler @ 2006-05-16  5:40 UTC (permalink / raw)
  Cc: emacs-devel

>From: Kenichi Handa <handa@m17n.org>
>To: "Herbert Euler" <herberteuler@hotmail.com>
>CC: emacs-devel@gnu.org
>Subject: Re: Fcall_process: wrong conversion
>Date: Tue, 16 May 2006 13:39:54 +0900
>
>In article <BAY112-F28627FA81042D1139276A3DAA00@phx.gbl>, "Herbert Euler" <herberteuler@hotmail.com> writes:
>
> > Oh, I see my mistake.  I also see that whether a string is unibyte
> > is tested with STRING_MULTIBYTE (lines 674 to 676, lisp.h):
>
> >     /* Nonzero if STR is a multibyte string.  */
> >     #define STRING_MULTIBYTE(STR)  \
> >       (XSTRING (STR)->size_byte >= 0)
>
> > I don't know how `size_byte' is set.  Is it done by scanning the
> > string and checking the range of each byte (or some bytes) in it?
>
>No.  XSTRING (STR)->size_byte is set when a string is
>created depending on how it is created (by
>make_unibyte_string or make_multibyte_string or ...).

What is encoding arguments for?  For unifying character encodings?
I.e. if the file is in japanese-shift-jis, but command argument is in
chinese-gbk, encoding arguments will make sure all characters are in
japanese-shift-jis, won't it?

As you stated, no locale uses utf-16, so if utf-16 characters
appear as command arguments, we can't expect most programs to behave
correctly, or even to behave the same as they do without utf-16
characters in their arguments, even if the commands are invoked
from, for instance, shell scripts.

So perhaps we should only prevent encoding arguments for utf-16?

Regards,
Guanpeng Xu


* Re: Fcall_process: wrong conversion
  2006-05-16  5:40               ` Herbert Euler
@ 2006-05-18  2:24                 ` Kenichi Handa
  2006-05-18  6:07                   ` Herbert Euler
  2006-05-19  3:01                   ` Herbert Euler
  0 siblings, 2 replies; 18+ messages in thread
From: Kenichi Handa @ 2006-05-18  2:24 UTC (permalink / raw)
  Cc: emacs-devel

In article <BAY112-F3293F5E8A45FB2A7EA1BC3DAA00@phx.gbl>, "Herbert Euler" <herberteuler@hotmail.com> writes:

> What is encoding arguments for?

To give them to a program/process in an encoding the program
requests.

> For unifying character encodings?

I don't understand the meaning of "unifying character
encodings".

> I.e. if the file is in japanese-shift-jis, but command argument is in
> chinese-gbk, encoding arguments will make sure all characters are in
> japanese-shift-jis, won't it?

I don't understand what "if ..." part actually means.  Who
makes command argument in chinese-gbk?

> As you stated, no locale uses utf-16, so if utf-16 characters
> appear as command arguments, we can't expect most programs to behave
> correctly, or even to behave the same as they do without utf-16
> characters in their arguments, even if the commands are invoked
> from, for instance, shell scripts.

> So perhaps we should only prevent encoding arguments for utf-16?

It seems to be a good workaround for the hexl-mode because
it won't break anything.  So, I installed a proper change
for that.

Though, it doesn't solve the generic problem of "how to
handle the case that the program requests different encoding
for arguments and file (or stdin)".  I think we should solve
it after the release.

---
Kenichi Handa
handa@m17n.org


* Re: Fcall_process: wrong conversion
  2006-05-18  2:24                 ` Kenichi Handa
@ 2006-05-18  6:07                   ` Herbert Euler
  2006-05-18  6:14                     ` Herbert Euler
  2006-05-18  6:26                     ` Kenichi Handa
  2006-05-19  3:01                   ` Herbert Euler
  1 sibling, 2 replies; 18+ messages in thread
From: Herbert Euler @ 2006-05-18  6:07 UTC (permalink / raw)
  Cc: emacs-devel

>From: Kenichi Handa <handa@m17n.org>
>To: "Herbert Euler" <herberteuler@hotmail.com>
>CC: emacs-devel@gnu.org
>Subject: Re: Fcall_process: wrong conversion
>Date: Thu, 18 May 2006 11:24:55 +0900
>
> > For unifying character encodings?
>
>I don't understand the meaning of "unifying character
>encodings".

I meant making the encoding of the arguments and the file the same.

> > I.e. if the file is in japanese-shift-jis, but command argument is in
> > chinese-gbk, encoding arguments will make sure all characters are in
> > japanese-shift-jis, won't it?
>
>I don't understand what "if ..." part actually means.  Who
>makes command argument in chinese-gbk?

For example, I wrote a lisp command which uses `call-process' and
contains characters in chinese-gbk as arguments.  I meant, when
I apply this command to a japanese-shift-jis file, `call-process' will
encode the chinese-gbk characters to japanese-shift-jis in background,
won't it?

> > As you stated, no locale uses utf-16, so if utf-16 characters
> > appear as command arguments, we can't expect most programs to behave
> > correctly, or even to behave the same as they do without utf-16
> > characters in their arguments, even if the commands are invoked
> > from, for instance, shell scripts.
>
> > So perhaps we should only prevent encoding arguments for utf-16?
>
>It seems to be a good workaround for the hexl-mode because
>it won't break anything.  So, I installed a proper change
>for that.
>
>Though, it doesn't solve the generic problem of "how to
>handle the case that the program requests different encoding
>for arguments and file (or stdin)".  I think we should solve
>it after the release.

In my opinion, most programs do not seem to require different
encodings for arguments and file.  Think about a program that
requires a Japanese encoding for the file and a Chinese
encoding for the arguments.  If I provide simplified Chinese
characters that are not in that Japanese encoding as command
arguments, the program can hardly behave acceptably, even if
I run it by typing the command in a shell.

That is, cross-encoding makes sense only if all the
encodings involved contain all the characters involved and
represent them the same way within one execution.  Otherwise,
users can't expect an acceptable result.  This one
condition is likely what the current code already handles,
except for conversion to utf-16.

Regards,
Guanpeng Xu


* Re: Fcall_process: wrong conversion
  2006-05-18  6:07                   ` Herbert Euler
@ 2006-05-18  6:14                     ` Herbert Euler
  2006-05-18  6:26                     ` Kenichi Handa
  1 sibling, 0 replies; 18+ messages in thread
From: Herbert Euler @ 2006-05-18  6:14 UTC (permalink / raw)
  Cc: emacs-devel

>From: "Herbert Euler" <herberteuler@hotmail.com>
>To: handa@m17n.org
>CC: emacs-devel@gnu.org
>Subject: Re: Fcall_process: wrong conversion
>Date: Thu, 18 May 2006 14:07:04 +0800
>
>In my opinion, most programs do not seem to require different
>encodings for arguments and file.  Think about a program that
>requires a Japanese encoding for the file and a Chinese
>encoding for the arguments.  If I provide simplified Chinese
>characters that are not in that Japanese encoding as command
>arguments, the program can hardly behave acceptably, even if
>I run it by typing the command in a shell.
>
>That is, cross-encoding makes sense only if all the
>encodings involved contain all the characters involved and
>represent them the same way within one execution.  Otherwise,
>users can't expect an acceptable result.  This one
>condition is likely what the current code already handles,
>except for conversion to utf-16.

There are exceptions, such as programs that convert their
arguments internally.  But even these programs are
not likely to accept more than two encodings for their
arguments.  To me, these programs do not seem to be
in general use, so when `call-process' is applied to
them it's the caller's responsibility to adjust the
encoding.

Regards,
Guanpeng Xu

_________________________________________________________________
FREE pop-up blocking with the new MSN Toolbar - get it now! 
http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Fcall_process: wrong conversion
  2006-05-18  6:07                   ` Herbert Euler
  2006-05-18  6:14                     ` Herbert Euler
@ 2006-05-18  6:26                     ` Kenichi Handa
  2006-05-18  6:40                       ` Herbert Euler
  1 sibling, 1 reply; 18+ messages in thread
From: Kenichi Handa @ 2006-05-18  6:26 UTC (permalink / raw)
  Cc: emacs-devel

In article <BAY112-F33EB4F35D6ECFFD897275DDAA60@phx.gbl>, "Herbert Euler" <herberteuler@hotmail.com> writes:

>> > For unifying character encodings?
>> 
>> I don't understand the meaning of "unifying character
>> encodings".

> I meant making the encoding of the arguments and the file the same.

I see.

>> > I.e. if the file is in japanese-shift-jis, but command argument is in
>> > chinese-gbk, encoding arguments will make sure all characters are in
>> > japanese-shift-jis, won't it?
>> 
>> I don't understand what "if ..." part actually means.  Who
>> makes command argument in chinese-gbk?

> For example, I wrote a lisp command which uses `call-process' and
> contains characters in chinese-gbk as arguments.  I meant, when
> I apply this command to a japanese-shift-jis file, `call-process' will
> encode the chinese-gbk characters to japanese-shift-jis in background,
> won't it?

It's hard to understand what you mean.  What do you mean by
"apply this command to ... file"?  Does it mean that you
give the file name to call-process as INFILE argument?  But,
how does it result in "encode the chinese-gbk characters to
japanese-shift-jis"?  Emacs doesn't detect the encoding of
INFILE.  So how does Emacs know about `japanese-shift-jis'
first of all?

And first of all, CVS Emacs doesn't have chinese-gbk coding
system.  Are you talking about the behavior of
emacs-unicode-2?

---
Kenichi Handa
handa@m17n.org


* Re: Fcall_process: wrong conversion
  2006-05-18  6:26                     ` Kenichi Handa
@ 2006-05-18  6:40                       ` Herbert Euler
  0 siblings, 0 replies; 18+ messages in thread
From: Herbert Euler @ 2006-05-18  6:40 UTC (permalink / raw)
  Cc: emacs-devel

>From: Kenichi Handa <handa@m17n.org>
>To: "Herbert Euler" <herberteuler@hotmail.com>
>CC: emacs-devel@gnu.org
>Subject: Re: Fcall_process: wrong conversion
>Date: Thu, 18 May 2006 15:26:54 +0900
>
> >> > I.e. if the file is in japanese-shift-jis, but command argument is in
> >> > chinese-gbk, encoding arguments will make sure all characters are in
> >> > japanese-shift-jis, won't it?
> >>
> >> I don't understand what "if ..." part actually means.  Who
> >> makes command argument in chinese-gbk?
>
> > For example, I wrote a lisp command which uses `call-process' and
> > contains characters in chinese-gbk as arguments.  I meant, when
> > I apply this command to a japanese-shift-jis file, `call-process' will
> > encode the chinese-gbk characters to japanese-shift-jis in background,
> > won't it?
>
>It's hard to understand what you mean.  What do you mean by
>"apply this command to ... file"?  Does it mean that you
>give the file name to call-process as INFILE argument?  But,
>how does it result in "encode the chinese-gbk characters to
>japanese-shift-jis"?  Emacs doesn't detect the encoding of
>INFILE.  So how does Emacs know about `japanese-shift-jis'
>first of all?
>
>And first of all, CVS Emacs doesn't have chinese-gbk coding
>system.  Are you talking about the behavior of
>emacs-unicode-2?

The encodings here are just examples; I should have called them
encoding A and B.  And my assumption was wrong: I don't know
the full behavior of `call-process', and I thought arguments
would be encoded in the file's encoding.

Regards,
Guanpeng Xu

* Re: Fcall_process: wrong conversion
  2006-05-16  4:10         ` Kenichi Handa
  2006-05-16  4:34           ` Herbert Euler
@ 2006-05-18 17:35           ` Stefan Monnier
  2006-05-19  2:49             ` Herbert Euler
  1 sibling, 1 reply; 18+ messages in thread
From: Stefan Monnier @ 2006-05-18 17:35 UTC (permalink / raw)
  Cc: Herbert Euler, emacs-devel

>> `CODING_REQUIRE_ENCODING' tests the following things (lines 491 to 496,
>> coding.h):

>> /* Return 1 if the coding context CODING requires code conversion on
>> encoding.  */
>> #define CODING_REQUIRE_ENCODING(coding)				\
>> ((coding)->src_multibyte					\
>> || (coding)->common_flags & CODING_REQUIRE_ENCODING_MASK	\
>> || (coding)->mode & CODING_MODE_SELECTIVE_DISPLAY)

> That is to make it possible to do encoding of unibyte string/buffer
> generated by string-as-unibyte or (set-buffer-multibyte nil) from
> multibyte string/buffer.  Perhaps we should not allow such an operation,
> but as this feature has been there for a long time, it seems dangerous
> to change it now.

The problem is that if you allow encoding to be applied to unibyte strings,
then there is no reliable way to represent bytes (as opposed to 8bit chars):
there's always the risk that they'll be encoded.
I'd rather make it clear that a unibyte string contains bytes and not chars.

> How about disabling encoding only for process arguments if
> they are already unibyte?  I think such a change is very
> safe.

Yes, that sounds right.


        Stefan
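
The rule agreed on above (encode a process argument only when it is a
multibyte string, and pass unibyte arguments through as raw bytes) can
be sketched as follows.  This is only a toy model, not the actual
change to callproc.c: `struct toy_arg', `should_encode_arg' and
`count_encoded_args' are hypothetical names invented for illustration,
and the boolean parameter stands in for whatever the coding system
itself reports about needing conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a process argument; not an Emacs type.  */
struct toy_arg
{
  const char *bytes;   /* argument contents */
  bool multibyte;      /* true if the string holds characters, not raw bytes */
};

/* Decide whether ARG should go through the encoder.  Under the proposed
   rule a unibyte argument is never encoded, whatever the coding system
   asks for; a multibyte one is encoded only if the coding system
   requires conversion.  */
static bool
should_encode_arg (const struct toy_arg *arg, bool coding_requires_encoding)
{
  assert (arg != NULL);
  return arg->multibyte && coding_requires_encoding;
}

/* Count how many of the N arguments in ARGS would be encoded.  */
static size_t
count_encoded_args (const struct toy_arg *args, size_t n,
                    bool coding_requires_encoding)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (should_encode_arg (&args[i], coding_requires_encoding))
      count++;
  return count;
}
```

With this rule a unibyte "-hex" argument is left alone even when the
coding system is UTF-16, so it reaches the program byte for byte.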

* Re: Fcall_process: wrong conversion
  2006-05-18 17:35           ` Stefan Monnier
@ 2006-05-19  2:49             ` Herbert Euler
  2006-05-19 10:41               ` Eli Zaretskii
  0 siblings, 1 reply; 18+ messages in thread
From: Herbert Euler @ 2006-05-19  2:49 UTC (permalink / raw)
  Cc: emacs-devel

>From: Stefan Monnier <monnier@iro.umontreal.ca>
>To: Kenichi Handa <handa@m17n.org>
>CC: "Herbert Euler" <herberteuler@hotmail.com>,  emacs-devel@gnu.org
>Subject: Re: Fcall_process: wrong conversion
>Date: Thu, 18 May 2006 13:35:23 -0400
>
> > How about disabling encoding only for process arguments if
> > they are already unibyte?  I think such a change is very
> > safe.
>
>Yes, that sounds right.

I'm sorry but how does Emacs decide whether a string
is unibyte?

Regards,
Guanpeng Xu

* Re: Fcall_process: wrong conversion
  2006-05-18  2:24                 ` Kenichi Handa
  2006-05-18  6:07                   ` Herbert Euler
@ 2006-05-19  3:01                   ` Herbert Euler
  1 sibling, 0 replies; 18+ messages in thread
From: Herbert Euler @ 2006-05-19  3:01 UTC (permalink / raw)
  Cc: emacs-devel

>From: Kenichi Handa <handa@m17n.org>
>To: "Herbert Euler" <herberteuler@hotmail.com>
>CC: emacs-devel@gnu.org
>Subject: Re: Fcall_process: wrong conversion
>Date: Thu, 18 May 2006 11:24:55 +0900
>
>Though, it doesn't solve the generic problem of "how to
>handle the case that the program requests different encoding
>for arguments and file (or stdin)".  I think we should solve
>it after the release.

What kind of programs is `call-process' designed to call?
Perhaps a separate function that can set different encodings
for the file and the arguments would be a better interface than
`call-process' for programs that require different encodings
for the two.

Regards,
Guanpeng Xu

* Re: Fcall_process: wrong conversion
  2006-05-19  2:49             ` Herbert Euler
@ 2006-05-19 10:41               ` Eli Zaretskii
  0 siblings, 0 replies; 18+ messages in thread
From: Eli Zaretskii @ 2006-05-19 10:41 UTC (permalink / raw)
  Cc: emacs-devel

> From: "Herbert Euler" <herberteuler@hotmail.com>
> Date: Fri, 19 May 2006 10:49:33 +0800
> Cc: emacs-devel@gnu.org
> 
> I'm sorry but how does Emacs decide whether a string
> is unibyte?

See multibyte-string-p.  It boils down to this macro from lisp.h:

    /* Nonzero if STR is a multibyte string.  */
    #define STRING_MULTIBYTE(STR)  \
      (XSTRING (STR)->size_byte >= 0)

In other words, this information is recorded in the string object
itself.
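
To make the convention concrete, here is a small self-contained
sketch of the same test on a toy structure.  It is an illustration
under stated assumptions, not the real Lisp string representation:
`struct toy_string', `toy_string_multibyte' and `toy_string_bytes' are
hypothetical names, and the only thing taken from the macro above is
that a negative size_byte marks a unibyte string.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a string object; NOT the real Emacs struct.  */
struct toy_string
{
  ptrdiff_t size;              /* number of characters */
  ptrdiff_t size_byte;         /* number of bytes, or negative if unibyte */
  const unsigned char *data;   /* string contents */
};

/* Same test as STRING_MULTIBYTE, applied to the toy struct.  */
static int
toy_string_multibyte (const struct toy_string *s)
{
  assert (s != NULL);
  return s->size_byte >= 0;
}

/* Byte length of S.  In a unibyte string every character is one byte,
   so the character count doubles as the byte count.  */
static ptrdiff_t
toy_string_bytes (const struct toy_string *s)
{
  assert (s != NULL);
  return s->size_byte < 0 ? s->size : s->size_byte;
}
```

A string built as { 4, -1, "-hex" } is unibyte under this test, while
{ 1, 2, "\xc3\xa9" } (one character stored in two bytes) is multibyte.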

end of thread, other threads:[~2006-05-19 10:41 UTC | newest]

Thread overview: 18+ messages
2006-05-15  6:09 Fcall_process: wrong conversion Herbert Euler
2006-05-15 14:25 ` Stefan Monnier
2006-05-15 15:17   ` Herbert Euler
2006-05-15 16:06     ` Stefan Monnier
2006-05-16  2:59       ` Herbert Euler
2006-05-16  4:10         ` Kenichi Handa
2006-05-16  4:34           ` Herbert Euler
2006-05-16  4:39             ` Kenichi Handa
2006-05-16  5:40               ` Herbert Euler
2006-05-18  2:24                 ` Kenichi Handa
2006-05-18  6:07                   ` Herbert Euler
2006-05-18  6:14                     ` Herbert Euler
2006-05-18  6:26                     ` Kenichi Handa
2006-05-18  6:40                       ` Herbert Euler
2006-05-19  3:01                   ` Herbert Euler
2006-05-18 17:35           ` Stefan Monnier
2006-05-19  2:49             ` Herbert Euler
2006-05-19 10:41               ` Eli Zaretskii
