From: "Herbert Euler" <herberteuler@hotmail.com>
Subject: Re: Fcall_process: wrong conversion
Date: Tue, 16 May 2006 10:59:10 +0800
Message-ID: <BAY112-F7409CF46063B56C37BE1ADAA00@phx.gbl>
In-Reply-To: <k6z8xp3z31n.fsf-monnier+emacs@gnu.org>

This doesn't work.  I've traced through the code, and the reason seems
to be as follows.

You changed the code in hexl.el to:

  (let ((coding-system-for-read 'raw-text)
        (coding-system-for-write buffer-file-coding-system)
        (buffer-undo-list t))
    (apply 'call-process-region (point-min) (point-max)
           (expand-file-name hexl-program exec-directory)
           t t nil
           ;; Manually encode the args, otherwise they're encoded using
           ;; coding-system-for-write (i.e. buffer-file-coding-system) which
           ;; may not be what we want (e.g. utf-16 on a non-utf-16 system).
           (mapcar (lambda (s) (encode-coding-string s locale-coding-system))
                   (split-string hexl-options)))

So when call-process is invoked, the value of `coding-system-for-write'
is not nil.  In my test, it is `utf-16le-with-signature'.  The part of
callproc.c that decides the coding system is lines 269 to 300:

    if (nargs >= 5)
      {
        int must_encode = 0;

        for (i = 4; i < nargs; i++)
          CHECK_STRING (args[i]);

        for (i = 4; i < nargs; i++)
          if (STRING_MULTIBYTE (args[i]))
            must_encode = 1;

        if (!NILP (Vcoding_system_for_write))
          val = Vcoding_system_for_write;
        else if (! must_encode)
          val = Qnil;
        else
          {
            args2 = (Lisp_Object *) alloca ((nargs + 1) * sizeof *args2);
            args2[0] = Qcall_process;
            for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
            coding_systems = Ffind_operation_coding_system (nargs + 1, args2);
            if (CONSP (coding_systems))
              val = XCDR (coding_systems);
            else if (CONSP (Vdefault_process_coding_system))
              val = XCDR (Vdefault_process_coding_system);
            else
              val = Qnil;
          }
        val = coding_inherit_eol_type (val, Qnil);
        setup_coding_system (Fcheck_coding_system (val), &argument_coding);
      }
  }

If `Vcoding_system_for_write' is not nil, `val' is set to that value.
So at the last line of this code, the `detector', `decoder', and
`encoder' fields of `argument_coding' are set to the UTF-16 ones, and
the CODING_REQUIRE_ENCODING_MASK flag is turned on in the
`common_flags' of `argument_coding'.  This happens in coding.c, lines
5042 to 5059:

  else if (EQ (coding_type, Qutf_16))
    {
      val = AREF (attrs, coding_attr_utf_16_bom);
      CODING_UTF_16_BOM (coding) = (CONSP (val) ? utf_16_detect_bom
                                    : EQ (val, Qt) ? utf_16_with_bom
                                    : utf_16_without_bom);
      val = AREF (attrs, coding_attr_utf_16_endian);
      CODING_UTF_16_ENDIAN (coding) = (EQ (val, Qbig) ? utf_16_big_endian
                                       : utf_16_little_endian);
      CODING_UTF_16_SURROGATE (coding) = 0;
      coding->detector = detect_coding_utf_16;
      coding->decoder = decode_coding_utf_16;
      coding->encoder = encode_coding_utf_16;
      coding->common_flags
        |= (CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK);
      if (CODING_UTF_16_BOM (coding) == utf_16_detect_bom)
        coding->common_flags |= CODING_REQUIRE_DETECTION_MASK;
    }

Now go back to callproc.c, lines 410 to 427:

  if (nargs > 4)
    {
      register int i;
      struct gcpro gcpro1, gcpro2, gcpro3;

      GCPRO3 (infile, buffer, current_dir);
      argument_coding.dst_multibyte = 0;
      for (i = 4; i < nargs; i++)
        {
          argument_coding.src_multibyte = STRING_MULTIBYTE (args[i]);
          if (CODING_REQUIRE_ENCODING (&argument_coding))
            /* We must encode this argument.  */
            args[i] = encode_coding_string (&argument_coding, args[i], 1);
          new_argv[i - 3] = SDATA (args[i]);
        }
      UNGCPRO;
      new_argv[nargs - 3] = 0;
    }

`CODING_REQUIRE_ENCODING' tests the following conditions (coding.h,
lines 491 to 496):

/* Return 1 if the coding context CODING requires code conversion on
   encoding.  */
#define CODING_REQUIRE_ENCODING(coding)				\
  ((coding)->src_multibyte					\
   || (coding)->common_flags & CODING_REQUIRE_ENCODING_MASK	\
   || (coding)->mode & CODING_MODE_SELECTIVE_DISPLAY)

Although `argument_coding.src_multibyte' may be 0,
`argument_coding.common_flags & CODING_REQUIRE_ENCODING_MASK' is
always non-zero in this case, so `CODING_REQUIRE_ENCODING
(&argument_coding)' is true.
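
The effect of the macro can be checked in isolation.  The following is
only a minimal sketch, not the Emacs source: `struct coding_model', the
two MODEL_* mask values, and both function names are hypothetical
stand-ins that keep just the three fields the macro reads.

```c
/* Hypothetical bit values; the real masks are defined in coding.h.  */
#define MODEL_REQUIRE_ENCODING_MASK   0x01
#define MODEL_MODE_SELECTIVE_DISPLAY  0x02

/* Stand-in for the coding context, reduced to the fields the macro reads.  */
struct coding_model
{
  int src_multibyte;
  int common_flags;
  int mode;
};

/* Same three-way test as CODING_REQUIRE_ENCODING, normalized to 0 or 1.  */
int
needs_encoding (const struct coding_model *coding)
{
  return (coding->src_multibyte
          || (coding->common_flags & MODEL_REQUIRE_ENCODING_MASK)
          || (coding->mode & MODEL_MODE_SELECTIVE_DISPLAY)) ? 1 : 0;
}

/* Models an argument that is already unibyte (src_multibyte is 0) and
   asks whether it would still be encoded under the given flags.  */
int
unibyte_arg_needs_encoding (int common_flags)
{
  struct coding_model coding = { 0, common_flags, 0 };
  return needs_encoding (&coding);
}
```

With the flag set, as it is for a UTF-16 coding system,
`unibyte_arg_needs_encoding (MODEL_REQUIRE_ENCODING_MASK)' is 1; with
no flags it is 0.  So making the arguments unibyte in advance does not
by itself skip the conversion.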

As a result, encoding the arguments beforehand with
`encode-coding-string', as in your change, does not prevent
`call-process' from converting them again with
`coding-system-for-write'.  Perhaps we should not bind
`coding-system-for-write' in the `let' form in such cases.
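
The precedence that leads to this can be condensed as below.  This is a
hedged sketch of the branch quoted from callproc.c, not the real code:
`choose_argument_coding' and its parameters are hypothetical, coding
systems are modelled as plain strings with NULL standing for nil, and
the later `coding_inherit_eol_type' and `setup_coding_system' steps are
left out.

```c
#include <stddef.h>

/* Hypothetical condensed model of how the coding system for command
   arguments is picked.  NULL plays the role of nil.  */
const char *
choose_argument_coding (const char *coding_system_for_write,
                        int must_encode,
                        const char *operation_coding,
                        const char *default_process_coding)
{
  if (coding_system_for_write != NULL)
    return coding_system_for_write;  /* wins, even if all args are unibyte */
  if (!must_encode)
    return NULL;                     /* all args unibyte: no conversion */
  if (operation_coding != NULL)
    return operation_coding;         /* from find-operation-coding-system */
  return default_process_coding;     /* may itself be NULL */
}
```

In this model the first test never looks at `must_encode', which is why
a non-nil `coding-system-for-write' is applied to arguments that were
already encoded by hand.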

There is another problem: if `locale-coding-system' is UTF-16, is
it correct to add the prefix "\377\376" or "\376\377" to every command
argument?  If not, the current code of `call-process' is wrong, since
it always adds the prefix.
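
To make the concern concrete, here is a small sketch of what an ASCII
argument looks like after UTF-16LE encoding with a byte order mark.  It
is an illustration under assumptions, not Emacs code: both function
names are hypothetical, and the encoder only handles 7-bit ASCII input.

```c
#include <stddef.h>
#include <string.h>

/* Encode the ASCII string SRC as UTF-16LE with a leading byte order
   mark.  Writes into DST (capacity CAP bytes) and returns the number of
   bytes written, or 0 if DST is too small.  */
size_t
ascii_to_utf16le_with_bom (const char *src, unsigned char *dst, size_t cap)
{
  size_t len = strlen (src);
  size_t needed = 2 + 2 * len;
  size_t i;

  if (cap < needed)
    return 0;

  dst[0] = 0xFF;  /* "\377" */
  dst[1] = 0xFE;  /* "\376" */
  for (i = 0; i < len; i++)
    {
      dst[2 + 2 * i] = (unsigned char) src[i];  /* low byte first */
      dst[3 + 2 * i] = 0;                       /* high byte is 0 for ASCII */
    }
  return needed;
}

/* Number of bytes a program sees when it reads the first N bytes of BUF
   as a NUL-terminated C string, which is how argv entries are read.  */
size_t
visible_length_as_c_string (const unsigned char *buf, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (buf[i] == 0)
      break;
  return i;
}
```

For a sample argument such as "-hex", the encoded form is 10 bytes long
and starts with "\377\376", but a program reading it as a C string sees
only the first 3 bytes, because the high byte of the first character is
a NUL.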

Regards,
Guanpeng Xu


>From: Stefan Monnier <monnier@iro.umontreal.ca>
>To: "Herbert Euler" <herberteuler@hotmail.com>
>CC: emacs-devel@gnu.org
>Subject: Re: Fcall_process: wrong conversion
>Date: Mon, 15 May 2006 12:06:48 -0400
>
> >    - Create a file contains UTF-16 text, either UTF-16BE or UTF-16LE
> >      is OK.  For example, create a file contains "a" in UTF-16LE as
> >      its content and name this file with "1".
>[...]
> >    - In case the buffer is encoded with utf-16-le, the content is
> >      displayed as "a".  Type M-x hexl-mode RET, the result is
>
> >          \377?: Invalid argument
>
> >      displayed in the buffer.
>
>Thanks.  I've installed the patch below which should fix the problem.
>Please confirm,
>
>
>         Stefan
>
>
>--- hexl.el	11 avr 2006 12:45:49 -0400	1.103
>+++ hexl.el	15 mai 2006 12:02:32 -0400
>@@ -704,7 +704,12 @@
>  	(buffer-undo-list t))
>      (apply 'call-process-region (point-min) (point-max)
>  	   (expand-file-name hexl-program exec-directory)
>-	   t t nil (split-string hexl-options))
>+	   t t nil
>+           ;; Manually encode the args, otherwise they're encoded using
>+           ;; coding-system-for-write (i.e. buffer-file-coding-system) which
>+           ;; may not be what we want (e.g. utf-16 on a non-utf-16 system).
>+           (mapcar (lambda (s) (encode-coding-string s locale-coding-system))
>+                   (split-string hexl-options)))
>      (if (> (point) (hexl-address-to-marker hexl-max-address))
>  	(hexl-goto-address hexl-max-address))))
>
>
>
