* Fcall_process: wrong conversion
From: Herbert Euler @ 2006-05-15 6:09 UTC
Cc: herberteuler

Hello,

Fcall_process in callproc.c, which corresponds to `call-process',
cannot handle UTF-16 (either LE or BE) correctly.  Take a look at
lines 417 to 424 of callproc.c:

    for (i = 4; i < nargs; i++)
      {
        argument_coding.src_multibyte = STRING_MULTIBYTE (args[i]);
        if (CODING_REQUIRE_ENCODING (&argument_coding))
          /* We must encode this argument.  */
          args[i] = encode_coding_string (&argument_coding, args[i], 1);
        new_argv[i - 3] = SDATA (args[i]);
      }

If the encoding is UTF-16, encode_coding_string converts every ASCII
character in an argument to a wide one and adds a prefix (the
byte-order mark) to that argument.  For example, if args[4] is "-hex",
it may be converted to "\376\377\0-\0h\0e\0x", which is not a valid
argument for most programs, so they complain about it.  Even wide
characters are turned into wrong arguments by the added "\376\377" or
"\377\376".

I found this problem when applying `hexl-mode' to UTF-16 texts.  Could
somebody help solve it?  I also don't know whether similar problems
exist elsewhere.  Thanks very much.

Regards,
Guanpeng Xu
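The conversion described in the message above can be reproduced
outside Emacs.  The following is only a minimal sketch in C, assuming
a hypothetical helper `ascii_to_utf16be_with_bom' that is not part of
Emacs; it shows how an ASCII argument picks up a two-byte prefix and
a zero byte before every character.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only, not Emacs code:
   encode an ASCII string as UTF-16BE with a leading byte-order
   mark (0xFE 0xFF, i.e. "\376\377" in octal).
   OUT must have room for 2 + 2 * strlen (ascii) bytes.
   Returns the number of bytes written.  */
size_t
ascii_to_utf16be_with_bom (const char *ascii, unsigned char *out)
{
  size_t n = 0;

  out[n++] = 0xFE;              /* first byte of the BOM */
  out[n++] = 0xFF;              /* second byte of the BOM */
  for (; *ascii != '\0'; ascii++)
    {
      out[n++] = 0x00;          /* high byte is zero for ASCII */
      out[n++] = (unsigned char) *ascii;
    }
  return n;
}
```

Called on "-hex", this writes ten bytes, FE FF 00 2D 00 68 00 65 00
78, which is the string "\376\377\0-\0h\0e\0x" from the report.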
* Re: Fcall_process: wrong conversion
From: Stefan Monnier @ 2006-05-15 14:25 UTC
Cc: emacs-devel

> Fcall_process in callproc.c, which corresponds to `call-process',
> cannot handle UTF-16 (either LE or BE) correctly.  Take a look at line

Actually, it handles it just fine.  The problem is that call-process
and start-process both use the same coding system to encode arguments
and to encode the data sent via stdin to the process, whereas you want
them to be distinct.  If you want them to be distinct, then you need
to manually encode your arguments before passing them to call-process.

I.e. the bug with hexl-mode is in hexl.el.  Please report it
separately, indicating how to reproduce the problem (I don't know how
to "apply `hexl-mode' to UTF-16 texts").

        Stefan
* Re: Fcall_process: wrong conversion
From: Herbert Euler @ 2006-05-15 15:17 UTC
Cc: emacs-devel

I followed these steps:

- Create a file containing UTF-16 text; either UTF-16BE or UTF-16LE
  is OK.  For example, create a file named "1" whose content is "a"
  in UTF-16LE.

- Visit file "1" with C-x C-f.  A UTF-16 file can be interpreted
  either as UTF-16 text or as ASCII text with non-ASCII characters.
  The UTF-16LE representation of the content of file "1" is "a", and
  the ASCII representation is "\377\376a^@", where "\377\376" marks
  the text as UTF-16LE and "a" is represented as "a^@" (^@ is \0
  here).  If for some reason Emacs doesn't visit the file with the
  correct encoding, type C-x RET r followed by the correct encoding
  and RET to fix it.

- If the buffer is encoded with raw-text-unix, the content is
  displayed as "\377\376a^@".  Type M-x hexl-mode RET and the correct
  result is displayed (not reproduced here, since it's easy to get).

- If the buffer is encoded with utf-16-le, the content is displayed
  as "a".  Type M-x hexl-mode RET and the result is

      \377?: Invalid argument

  displayed in the buffer.

This is because hexl-mode does its job as follows:

1. Store the buffer content in a temporary file.

2. Invoke "hexl" with the argument "-hex" and stdin set to the
   temporary file, and put its output into the same buffer.  This is
   done by calling `call-process-region' (and so `call-process').

3. Manipulate the output to generate the correct result.

When the buffer is encoded with raw-text-unix, the code of
`Fcall_process' in callproc.c shown in my last mail does not convert
the argument "-hex", so the command actually invoked is "hexl -hex".
But if the buffer is encoded with utf-16-le, "-hex" is converted to
"\377\376-^@h^@e^@x^@", so the command invoked is
"hexl \377\376-^@h^@e^@x^@".  Since "^@" is actually '\0', "hexl" sees
"\377\376-" as its first argument.  That's why the content displayed
in the second case is an error message.  As a result, the subsequent
code of hexl-mode can't process the (wrong) output correctly.

I hope I've described this clearly.

Regards,
Guanpeng Xu
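The truncation described in the message above follows from how C
programs read their arguments: each argv entry ends at the first zero
byte.  As a small sketch (the array and the function name
`visible_arg_length' are made up for this illustration), the ten
bytes reported for the encoded "-hex" shrink to three as seen by the
called program:

```c
#include <string.h>

/* The bytes reported for "-hex" encoded as UTF-16LE with a
   byte-order mark: "\377\376-^@h^@e^@x^@" (ten bytes in total).  */
static const char utf16le_hex_arg[] =
  { '\377', '\376', '-', '\0', 'h', '\0', 'e', '\0', 'x', '\0' };

/* Hypothetical name: length of an argument as a C program receiving
   it in argv would measure it, i.e. up to the first NUL byte.  */
size_t
visible_arg_length (const char *arg)
{
  return strlen (arg);
}
```

Here `visible_arg_length (utf16le_hex_arg)' stops at the zero byte
that follows '-', so only the prefix and the dash survive, which is
consistent with the "\377?: Invalid argument" message.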
* Re: Fcall_process: wrong conversion
From: Stefan Monnier @ 2006-05-15 16:06 UTC
Cc: emacs-devel

> - Create a file contains UTF-16 text, either UTF-16BE or UTF-16LE
>   is OK.  For example, create a file contains "a" in UTF-16LE as
>   its content and name this file with "1".
[...]
> - In case the buffer is encoded with utf-16-le, the content is
>   displayed as "a".  Type M-x hexl-mode RET, the result is

>       \377?: Invalid argument

>   displayed in the buffer.

Thanks.  I've installed the patch below which should fix the problem.
Please confirm,

        Stefan

--- hexl.el     11 avr 2006 12:45:49 -0400      1.103
+++ hexl.el     15 mai 2006 12:02:32 -0400
@@ -704,7 +704,12 @@
             (buffer-undo-list t))
         (apply 'call-process-region (point-min) (point-max)
                (expand-file-name hexl-program exec-directory)
-               t t nil (split-string hexl-options))
+               t t nil
+               ;; Manually encode the args, otherwise they're encoded using
+               ;; coding-system-for-write (i.e. buffer-file-coding-system) which
+               ;; may not be what we want (e.g. utf-16 on a non-utf-16 system).
+               (mapcar (lambda (s) (encode-coding-string s locale-coding-system))
+                       (split-string hexl-options)))
         (if (> (point) (hexl-address-to-marker hexl-max-address))
             (hexl-goto-address hexl-max-address))))
* Re: Fcall_process: wrong conversion
From: Herbert Euler @ 2006-05-16 2:59 UTC

This doesn't work.  I've traced the code, and the reason seems to be
as follows.  You changed the code in hexl.el to:

    (let ((coding-system-for-read 'raw-text)
          (coding-system-for-write buffer-file-coding-system)
          (buffer-undo-list t))
      (apply 'call-process-region (point-min) (point-max)
             (expand-file-name hexl-program exec-directory)
             t t nil
             ;; Manually encode the args, otherwise they're encoded using
             ;; coding-system-for-write (i.e. buffer-file-coding-system) which
             ;; may not be what we want (e.g. utf-16 on a non-utf-16 system).
             (mapcar (lambda (s) (encode-coding-string s locale-coding-system))
                     (split-string hexl-options)))

So when call-process is invoked, the value of
`coding-system-for-write' is not nil.  In my test it is
`utf-16le-with-signature'.  The part of callproc.c that decides the
coding system is lines 269 to 300:

    if (nargs >= 5)
      {
        int must_encode = 0;

        for (i = 4; i < nargs; i++)
          CHECK_STRING (args[i]);

        for (i = 4; i < nargs; i++)
          if (STRING_MULTIBYTE (args[i]))
            must_encode = 1;

        if (!NILP (Vcoding_system_for_write))
          val = Vcoding_system_for_write;
        else if (! must_encode)
          val = Qnil;
        else
          {
            args2 = (Lisp_Object *) alloca ((nargs + 1) * sizeof *args2);
            args2[0] = Qcall_process;
            for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
            coding_systems = Ffind_operation_coding_system (nargs + 1, args2);
            if (CONSP (coding_systems))
              val = XCDR (coding_systems);
            else if (CONSP (Vdefault_process_coding_system))
              val = XCDR (Vdefault_process_coding_system);
            else
              val = Qnil;
          }
        val = coding_inherit_eol_type (val, Qnil);
        setup_coding_system (Fcheck_coding_system (val), &argument_coding);
      }
    }

If `Vcoding_system_for_write' is not nil, `val' is set to that value.
So at the last line of this code, the `detector', `decoder', and
`encoder' fields of `argument_coding' are set to the UTF-16 ones, and
the CODING_REQUIRE_ENCODING_MASK flag is turned on in `common_flags'
of `argument_coding'.  This happens in coding.c, lines 5042 to 5059:

    else if (EQ (coding_type, Qutf_16))
      {
        val = AREF (attrs, coding_attr_utf_16_bom);
        CODING_UTF_16_BOM (coding) = (CONSP (val) ? utf_16_detect_bom
                                      : EQ (val, Qt) ? utf_16_with_bom
                                      : utf_16_without_bom);
        val = AREF (attrs, coding_attr_utf_16_endian);
        CODING_UTF_16_ENDIAN (coding) = (EQ (val, Qbig) ? utf_16_big_endian
                                         : utf_16_little_endian);
        CODING_UTF_16_SURROGATE (coding) = 0;
        coding->detector = detect_coding_utf_16;
        coding->decoder = decode_coding_utf_16;
        coding->encoder = encode_coding_utf_16;
        coding->common_flags
          |= (CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK);
        if (CODING_UTF_16_BOM (coding) == utf_16_detect_bom)
          coding->common_flags |= CODING_REQUIRE_DETECTION_MASK;
      }

Now go back to lines 410 to 427 of callproc.c:

    if (nargs > 4)
      {
        register int i;
        struct gcpro gcpro1, gcpro2, gcpro3;

        GCPRO3 (infile, buffer, current_dir);
        argument_coding.dst_multibyte = 0;
        for (i = 4; i < nargs; i++)
          {
            argument_coding.src_multibyte = STRING_MULTIBYTE (args[i]);
            if (CODING_REQUIRE_ENCODING (&argument_coding))
              /* We must encode this argument.  */
              args[i] = encode_coding_string (&argument_coding, args[i], 1);
            new_argv[i - 3] = SDATA (args[i]);
          }
        UNGCPRO;
        new_argv[nargs - 3] = 0;
      }

`CODING_REQUIRE_ENCODING' tests the following (lines 491 to 496 of
coding.h):

    /* Return 1 if the coding context CODING requires code conversion on
       encoding.  */
    #define CODING_REQUIRE_ENCODING(coding)                         \
      ((coding)->src_multibyte                                      \
       || (coding)->common_flags & CODING_REQUIRE_ENCODING_MASK     \
       || (coding)->mode & CODING_MODE_SELECTIVE_DISPLAY)

Although `argument_coding.src_multibyte' may be 0,
`argument_coding.common_flags & CODING_REQUIRE_ENCODING_MASK' must be
non-zero in this case.  So `CODING_REQUIRE_ENCODING (&argument_coding)'
returns true.
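The three-way test quoted above can be modelled in isolation.  This
is only a sketch: `struct sketch_coding', the `SKETCH_' flag values
and `sketch_require_encoding' are stand-ins invented here, not the
definitions from coding.h.  It shows that once the encoding flag is
set in `common_flags', the result is true even for a unibyte source.

```c
/* Stand-in flag values for this sketch; the real masks in coding.h
   may differ.  */
#define SKETCH_REQUIRE_ENCODING_MASK  0x01
#define SKETCH_MODE_SELECTIVE_DISPLAY 0x01

/* Only the three fields that the macro reads.  */
struct sketch_coding
{
  int src_multibyte;
  unsigned common_flags;
  unsigned mode;
};

/* Same shape as CODING_REQUIRE_ENCODING: any one of the three
   conditions is enough to request encoding.  */
int
sketch_require_encoding (const struct sketch_coding *coding)
{
  return (coding->src_multibyte
          || (coding->common_flags & SKETCH_REQUIRE_ENCODING_MASK)
          || (coding->mode & SKETCH_MODE_SELECTIVE_DISPLAY));
}
```

With `src_multibyte' set to 0 and only the encoding flag set, as for
a UTF-16 coding system and an already encoded argument, the function
still returns 1, so such an argument would be encoded a second time.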
As a result, whether or not the arguments are encoded with
`encode-coding-string', as in your change, does not affect the
conversion done by `call-process'.  Perhaps we should not bind
`coding-system-for-write' in the `let' form in such cases.

And there is another problem: if `locale-coding-system' is UTF-16, is
it correct to add the prefix "\377\376" or "\376\377" to every command
argument?  If not, the current code of `call-process' is wrong, since
it always adds the prefix.

Regards,
Guanpeng Xu
* Re: Fcall_process: wrong conversion
From: Kenichi Handa @ 2006-05-16 4:10 UTC
Cc: emacs-devel

In article <BAY112-F7409CF46063B56C37BE1ADAA00@phx.gbl>, "Herbert Euler" <herberteuler@hotmail.com> writes:

> `CODING_REQUIRE_ENCODING' tests the following (lines 491 to 496 of
> coding.h):

> /* Return 1 if the coding context CODING requires code conversion on
>    encoding.  */
> #define CODING_REQUIRE_ENCODING(coding)                         \
>   ((coding)->src_multibyte                                      \
>    || (coding)->common_flags & CODING_REQUIRE_ENCODING_MASK     \
>    || (coding)->mode & CODING_MODE_SELECTIVE_DISPLAY)

That is to make it possible to encode a unibyte string/buffer
generated by string-as-unibyte or (set-buffer-multibyte nil) from a
multibyte string/buffer.  Perhaps we should not allow such an
operation, but as this feature has been there for a long time, it
seems dangerous to change it now.

How about disabling encoding only for process arguments if they are
already unibyte?  I think such a change is very safe.

> And there is another problem: if `locale-coding-system' is UTF-16, is
> it correct to add the prefix "\377\376" or "\376\377" to every command
> argument?  If not, the current code of `call-process' is wrong, since
> it always adds the prefix.

I think there's no locale that uses utf-16, and it's impossible to
support such a locale because most of the basic libc functions that
accept a filename require it to be terminated by a NUL byte.

---
Kenichi Handa
handa@m17n.org
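One possible reading of the proposal above, written as a standalone
decision function.  This is a guess at its shape for illustration
only; `sketch_should_encode_arg' is a made-up name and this is not
the change that was later installed in callproc.c.

```c
/* Hypothetical decision rule: an argument that is already unibyte is
   treated as raw bytes and passed through untouched; a multibyte
   argument is encoded whenever the coding system asks for it.  */
int
sketch_should_encode_arg (int arg_is_multibyte,
                          int coding_requires_encoding)
{
  if (!arg_is_multibyte)
    return 0;                   /* already bytes: never re-encode */
  return coding_requires_encoding != 0;
}
```

Under this rule an argument that the caller has already encoded by
hand, and which is therefore unibyte, would reach the program
unchanged even when `coding-system-for-write' is a UTF-16 coding
system.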
* Re: Fcall_process: wrong conversion
From: Herbert Euler @ 2006-05-16 4:34 UTC
Cc: emacs-devel

>From: Kenichi Handa <handa@m17n.org>
>To: "Herbert Euler" <herberteuler@hotmail.com>
>CC: emacs-devel@gnu.org
>Subject: Re: Fcall_process: wrong conversion
>Date: Tue, 16 May 2006 13:10:30 +0900
>
>I think there's no locale that uses utf-16, and it's
>impossible to support such a locale because most of basic
>libc functions that accept a filename require that it is
>terminated by NULL.

Oh, I see my mistake.  Separately, I see that whether a string is
unibyte is tested with STRING_MULTIBYTE (lines 674 to 676 of lisp.h):

    /* Nonzero if STR is a multibyte string.  */
    #define STRING_MULTIBYTE(STR)  \
      (XSTRING (STR)->size_byte >= 0)

I don't know how `size_byte' is set.  Is it done by scanning the
string and checking the range of each byte (or of some bytes)?  If so,
and if we assume that no command argument will be encoded in UTF-16,
disabling argument encoding for unibyte strings seems the best
solution.

Regards,
Guanpeng Xu
* Re: Fcall_process: wrong conversion
From: Kenichi Handa @ 2006-05-16 4:39 UTC
Cc: emacs-devel

In article <BAY112-F28627FA81042D1139276A3DAA00@phx.gbl>, "Herbert Euler" <herberteuler@hotmail.com> writes:

> Oh, I see my fault.  At the same time, I see whether a string is
> unibyte-string is tested with STRING_MULTIBYTE (line 674 to 676,
> lisp.h):

> /* Nonzero if STR is a multibyte string.  */
> #define STRING_MULTIBYTE(STR)  \
>   (XSTRING (STR)->size_byte >= 0)

> I don't know how `size_byte' is set.  Is it done by scanning a string
> and watching the range of each byte (or some bytes) of the
> string?

No.  XSTRING (STR)->size_byte is set when a string is created,
depending on how it is created (by make_unibyte_string or
make_multibyte_string or ...).

---
Kenichi Handa
handa@m17n.org
* Re: Fcall_process: wrong conversion
From: Herbert Euler @ 2006-05-16 5:40 UTC
Cc: emacs-devel

>From: Kenichi Handa <handa@m17n.org>
>To: "Herbert Euler" <herberteuler@hotmail.com>
>CC: emacs-devel@gnu.org
>Subject: Re: Fcall_process: wrong conversion
>Date: Tue, 16 May 2006 13:39:54 +0900
>
>No.  XSTRING (STR)->size_byte is set when a string is
>created depending on how it is created (by
>make_unibyte_string or make_multibyte_string or ...).

What is encoding arguments for?  For unifying character encodings?
I.e. if the file is in japanese-shift-jis, but a command argument is
in chinese-gbk, encoding the arguments will make sure all characters
are in japanese-shift-jis, won't it?

As you stated, no locale uses utf-16, so if utf-16 characters appear
as command arguments, we can't expect most programs to behave
correctly, or at least to behave the same as when no utf-16
characters appear in the arguments, even if the commands are invoked
from, for instance, shell scripts.

So perhaps we should only prevent encoding arguments for utf-16?

Regards,
Guanpeng Xu
* Re: Fcall_process: wrong conversion
From: Kenichi Handa @ 2006-05-18 2:24 UTC
Cc: emacs-devel

In article <BAY112-F3293F5E8A45FB2A7EA1BC3DAA00@phx.gbl>, "Herbert Euler" <herberteuler@hotmail.com> writes:

> What is encoding arguments for?

To give them to a program/process in the encoding the program
requests.

> For unifying character encodings?

I don't understand the meaning of "unifying character encodings".

> I.e. if the file is in japanese-shift-jis, but command argument is in
> chinese-gbk, encoding arguments will make sure all characters are in
> japanese-shift-jis, won't it?

I don't understand what the "if ..." part actually means.  Who makes
a command argument in chinese-gbk?

> As you stated, there is no locale uses utf-16, so if utf-16 characters
> appear as command arguments, we can't expect most programs will have
> correct behaviors or at least the same behaviors as no utf-16
> characters appear as command arguments, even if the commands are
> invoked within, for instance, shell scripts.

> So perhaps we should only prevent encoding arguments for utf-16?

It seems to be a good workaround for hexl-mode because it won't break
anything.  So I installed a proper change for that.

It doesn't, though, solve the generic problem of "how to handle the
case where the program requests different encodings for arguments and
file (or stdin)".  I think we should solve it after the release.

---
Kenichi Handa
handa@m17n.org
* Re: Fcall_process: wrong conversion
From: Herbert Euler @ 2006-05-18 6:07 UTC
Cc: emacs-devel

>From: Kenichi Handa <handa@m17n.org>
>To: "Herbert Euler" <herberteuler@hotmail.com>
>CC: emacs-devel@gnu.org
>Subject: Re: Fcall_process: wrong conversion
>Date: Thu, 18 May 2006 11:24:55 +0900
>
> > For unifying character encodings?
>
>I don't understand the meaning of "unifying character
>encodings".

I meant making the encoding of the arguments and of the file the same.

> > I.e. if the file is in japanese-shift-jis, but command argument is in
> > chinese-gbk, encoding arguments will make sure all characters are in
> > japanese-shift-jis, won't it?
>
>I don't understand what "if ..." part actually means.  Who
>makes command argument in chinese-gbk?

For example, suppose I wrote a Lisp command that uses `call-process'
and passes characters in chinese-gbk as arguments.  I meant: when I
apply this command to a japanese-shift-jis file, `call-process' will
encode the chinese-gbk characters to japanese-shift-jis in the
background, won't it?

> > As you stated, there is no locale uses utf-16, so if utf-16 characters
> > appear as command arguments, we can't expect most programs will have
> > correct behaviors or at least the same behaviors as no utf-16
> > characters appear as command arguments, even if the commands are
> > invoked within, for instance, shell scripts.
>
> > So perhaps we should only prevent encoding arguments for utf-16?
>
>It seems to be a good workaround for the hexl-mode because
>it won't break anything.  So, I installed a proper change
>for that.
>
>Though, it doesn't solve the generic problem of "how to
>handle the case that the program requests different encoding
>for arguments and file (or stdin)".  I think we should solve
>it after the release.

In my opinion, most programs do not seem to require different
encodings for arguments and file.  Consider a program that requires a
Japanese encoding for the file and a Chinese encoding for its
arguments.  If I pass simplified Chinese characters that are not in
that Japanese encoding as command arguments, the program can hardly
behave acceptably, even if I run it by typing the command in a shell.

That is, cross-encoding makes sense only if all the encodings involved
contain all the characters involved and represent them the same way in
one execution.  Otherwise users can't expect an acceptable result.
That one condition is likely what the current code already handles,
except for conversion to utf-16.

Regards,
Guanpeng Xu
* Re: Fcall_process: wrong conversion
From: Herbert Euler @ 2006-05-18 6:14 UTC
Cc: emacs-devel

>From: "Herbert Euler" <herberteuler@hotmail.com>
>To: handa@m17n.org
>CC: emacs-devel@gnu.org
>Subject: Re: Fcall_process: wrong conversion
>Date: Thu, 18 May 2006 14:07:04 +0800
>
>In my opinion, most programs seem not to require different
>encoding for arguments and file.  Think about a program
>requires Japanese relative encoding as file encoding and
>Chinese relative encoding as argument encoding.  If I provide
>simplified Chinese characters, which are not in the specific
>Japanese encoding, in command arguments, this program
>seems hardly taking a acceptable behavior, even if I execute
>the program by typing it in Shell.
>
>Namely, cross-encoding would make sense only if all the
>different encodings contain all characters involved and
>represent them the same way in an execution.  In other
>conditions, users can't expect acceptable result.  This
>unique condition is likely what already exists in the current
>code, except converting to utf-16.

There are exceptions, such as programs that convert their arguments
internally.  But even these programs are unlikely to use more than two
encodings for their arguments.  To me, such programs do not seem to be
in general use, so when `call-process' is applied to them it is the
caller's responsibility to adjust the encoding.

Regards,
Guanpeng Xu
* Re: Fcall_process: wrong conversion
From: Kenichi Handa @ 2006-05-18 6:26 UTC
Cc: emacs-devel

In article <BAY112-F33EB4F35D6ECFFD897275DDAA60@phx.gbl>, "Herbert Euler" <herberteuler@hotmail.com> writes:

>> > For unifying character encodings?
>>
>> I don't understand the meaning of "unifying character
>> encodings".

> I meant to make encoding for arguments and file the same.

I see.

>> > I.e. if the file is in japanese-shift-jis, but command argument is in
>> > chinese-gbk, encoding arguments will make sure all characters are in
>> > japanese-shift-jis, won't it?
>>
>> I don't understand what "if ..." part actually means.  Who
>> makes command argument in chinese-gbk?

> For example, I wrote a lisp command which uses `call-process' and
> contains characters in chinese-gbk as arguments.  I meant, when
> I apply this command to a japanese-shift-jis file, `call-process' will
> encode the chinese-gbk characters to japanese-shift-jis in background,
> won't it?

It's hard to understand what you mean.  What do you mean by "apply
this command to ... file"?  Does it mean that you give the file name
to call-process as the INFILE argument?  But how does that result in
"encode the chinese-gbk characters to japanese-shift-jis"?  Emacs
doesn't detect the encoding of INFILE.  So how does Emacs know about
`japanese-shift-jis' in the first place?

Also, CVS Emacs doesn't have a chinese-gbk coding system.  Are you
talking about the behavior of emacs-unicode-2?

---
Kenichi Handa
handa@m17n.org
* Re: Fcall_process: wrong conversion
From: Herbert Euler @ 2006-05-18 6:40 UTC
Cc: emacs-devel

>From: Kenichi Handa <handa@m17n.org>
>To: "Herbert Euler" <herberteuler@hotmail.com>
>CC: emacs-devel@gnu.org
>Subject: Re: Fcall_process: wrong conversion
>Date: Thu, 18 May 2006 15:26:54 +0900
>
> >> > I.e. if the file is in japanese-shift-jis, but command argument is in
> >> > chinese-gbk, encoding arguments will make sure all characters are in
> >> > japanese-shift-jis, won't it?
> >>
> >> I don't understand what "if ..." part actually means.  Who
> >> makes command argument in chinese-gbk?
>
> > For example, I wrote a lisp command which uses `call-process' and
> > contains characters in chinese-gbk as arguments.  I meant, when
> > I apply this command to a japanese-shift-jis file, `call-process' will
> > encode the chinese-gbk characters to japanese-shift-jis in background,
> > won't it?
>
>It's hard to understand what you mean.  What do you mean by
>"apply this command to ... file"?  Does it mean that you
>give the file name to call-process as INFILE argument?  But,
>how does it result in "encode the chinese-gbk characters to
>japanese-shift-jis"?  Emacs doesn't detect the encoding of
>INFILE.  So how does Emacs know about `japanese-shift-jis'
>first of all?
>
>And first of all, CVS Emacs doesn't have chinese-gbk coding
>system.  Are you talking about the behavior of
>emacs-unicode-2?

The encodings here are just examples; I should have called them
encoding A and encoding B.  And my assumption was wrong: I don't know
the full behavior of `call-process', and I thought the arguments would
be encoded to the file's encoding.

Regards,
Guanpeng Xu
* Re: Fcall_process: wrong conversion
From: Herbert Euler @ 2006-05-19 3:01 UTC
Cc: emacs-devel

>From: Kenichi Handa <handa@m17n.org>
>To: "Herbert Euler" <herberteuler@hotmail.com>
>CC: emacs-devel@gnu.org
>Subject: Re: Fcall_process: wrong conversion
>Date: Thu, 18 May 2006 11:24:55 +0900
>
>Though, it doesn't solve the generic problem of "how to
>handle the case that the program requests different encoding
>for arguments and file (or stdin)".  I think we should solve
>it after the release.

What kind of programs is `call-process' designed to call?  Perhaps
another function that can set different encodings for file and
arguments would be a better interface than `call-process' for programs
that request different encodings for the two.

Regards,
Guanpeng Xu
* Re: Fcall_process: wrong conversion
From: Stefan Monnier @ 2006-05-18 17:35 UTC
Cc: Herbert Euler, emacs-devel

>> `CODING_REQUIRE_ENCODING' test the following things (line 491 to 496,
>> coding.h):

>> /* Return 1 if the coding context CODING requires code conversion on
>>    encoding.  */
>> #define CODING_REQUIRE_ENCODING(coding)                         \
>>   ((coding)->src_multibyte                                      \
>>    || (coding)->common_flags & CODING_REQUIRE_ENCODING_MASK     \
>>    || (coding)->mode & CODING_MODE_SELECTIVE_DISPLAY)

> That is to make it possible to do encoding of unibyte string/buffer
> generated by string-as-unibyte or (set-buffer-multibyte nil) from
> multibyte string/buffer.  Perhaps we should not allow such an operation,
> but as this feature is there for long, it seems dangerous to change
> it now.

The problem is that if you allow encoding to be applied to unibyte
strings, then there is no reliable way to represent bytes (as opposed
to 8bit chars): there's always the risk that they'll be encoded.
I'd rather make it clear that a unibyte string contains bytes and not
chars.

> How about disabling encoding only for process arguments if
> they are already unibyte?  I think such a change is very
> safe.

Yes, that sounds right.

        Stefan
* Re: Fcall_process: wrong conversion
From: Herbert Euler @ 2006-05-19 2:49 UTC
Cc: emacs-devel

>From: Stefan Monnier <monnier@iro.umontreal.ca>
>To: Kenichi Handa <handa@m17n.org>
>CC: "Herbert Euler" <herberteuler@hotmail.com>, emacs-devel@gnu.org
>Subject: Re: Fcall_process: wrong conversion
>Date: Thu, 18 May 2006 13:35:23 -0400
>
> > How about disabling encoding only for process arguments if
> > they are already unibyte?  I think such a change is very
> > safe.
>
>Yes, that sounds right.

I'm sorry, but how does Emacs decide whether a string is unibyte?

Regards,
Guanpeng Xu
* Re: Fcall_process: wrong conversion
From: Eli Zaretskii @ 2006-05-19 10:41 UTC
Cc: emacs-devel

> From: "Herbert Euler" <herberteuler@hotmail.com>
> Date: Fri, 19 May 2006 10:49:33 +0800
> Cc: emacs-devel@gnu.org
>
> I'm sorry but how does Emacs decide whether a string
> is unibyte?

See multibyte-string-p.  It boils down to this macro from lisp.h:

    /* Nonzero if STR is a multibyte string.  */
    #define STRING_MULTIBYTE(STR)  \
      (XSTRING (STR)->size_byte >= 0)

In other words, this information is recorded in the string object
itself.
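To make "recorded in the string object itself" concrete, here is a
small model in C.  It is a sketch under assumptions: `struct
sketch_string' and the two `sketch_make_' constructors are invented
for this illustration, and the use of -1 to mark a unibyte string is
inferred from the `size_byte >= 0' test rather than taken from the
thread.

```c
#include <stddef.h>

/* Invented model of a string object: SIZE counts characters and
   SIZE_BYTE counts bytes, or is negative for a unibyte string
   (an assumption consistent with the macro quoted above).  */
struct sketch_string
{
  ptrdiff_t size;
  ptrdiff_t size_byte;
  const char *data;
};

/* The constructor, not the content, decides the representation.  */
struct sketch_string
sketch_make_unibyte (const char *bytes, ptrdiff_t nbytes)
{
  struct sketch_string s = { nbytes, -1, bytes };
  return s;
}

struct sketch_string
sketch_make_multibyte (const char *bytes, ptrdiff_t nchars,
                       ptrdiff_t nbytes)
{
  struct sketch_string s = { nchars, nbytes, bytes };
  return s;
}

/* Same test as STRING_MULTIBYTE, applied to the model struct.  */
#define SKETCH_STRING_MULTIBYTE(s) ((s).size_byte >= 0)
```

The same three ASCII bytes "abc" give a unibyte string through
`sketch_make_unibyte' and a multibyte one through
`sketch_make_multibyte'; no byte of the content is inspected, which
matches the earlier answer that `size_byte' is fixed when the string
is created.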