unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
@ 2010-07-22 12:31 Laimonas Vėbra
  2010-07-22 14:33 ` Jason Rumney
  2010-07-22 19:50 ` Eli Zaretskii
  0 siblings, 2 replies; 23+ messages in thread
From: Laimonas Vėbra @ 2010-07-22 12:31 UTC (permalink / raw)
  To: 6705

Below is a comment that I wrote for myself in cmdproxy.c (it 
explains the problem). I have a half-working (and, I suppose, portable 
enough) fix for it that uses the MultiByteToWideChar API function, but 
I won't send a partly working patch unless a developer who agrees 
with the problem and intends to fix it asks for it.

Besides, the patch itself is larger than 10 diff lines and it uses 
(duplicates by copying) some helper functions/declarations 
(open_input_file(), close_file_data(), rva_to_section(), 
w32_executable_type, RVA_TO_PTR) from unexw32.c, so it may need some 
code refactoring.

This problem certainly needs some discussion (how best to solve it) 
because it touches on unicode communication issues. For those who won't 
bother reading the whole description, here is a simple question -- 
how can (or should) one cleanly pass utf-8 arguments to an external 
(cygwin) app on windows? I suppose it's not possible at present.

Thank you for your attention.


> /* When calling a cygwin executable we need to explicitly convert utf-8
>    arguments (it's the encoding that Emacs uses internally and in which it
>    passes args to external commands when coding-system-for-write is nil)
>    to utf-16 and call the unicode (wide) API function CreateProcess(W).
>    That needs to be done because of this transcoding chain, which
>    might (and definitely WILL if the args contain unicode, i.e. non
>    ascii/locale_charset, characters) result in corrupted args:
>
>    WINAPI/OS layer:
>    multibyte string args (utf-8) -> CreateProcessA():
>    locale_codepage -> unicode (utf-16)
>
>    ->
>
>    CYGWIN layer:
>    unicode (utf-16) <-> utf-8 ->
>    cygwin locale env (LC_XXX, LANG; default: C.UTF-8)
>
>
>    Example #1:
>    utf-8 string 'žą'; 'ž'(0xC5, 0xBE) 'ą'(0xC4, 0x85) transcoding
>    (to cygwin locale env charset) chain:
>
>    converting #1:
>    locale_codepage (lt, LCID: 1063, ansi/oem cp: cp1257/cp775) -> utf-16;
>
>    utf-8 string 'žą' in locale codepage (cp1257) representation: 'Å¾Ä…'
>    'Å'(0xC5), '¾'(0xBE), 'Ä'(0xC4), '…'(0x85).
>
>    string converted to utf-16: 'Å¾Ä…'
>    U+00C5(Å), U+00BE(¾), U+00C4(Ä), U+2026(…).
>
>    utf-16: 'Å¾Ä…': 'Å'(U+00C5), '¾'(U+00BE), 'Ä'(U+00C4), '…'(U+2026).
>    <->
>    utf-8 : 'Å¾Ä…': 'Å'(0xC385), '¾'(0xC2BE), 'Ä'(0xC384), '…'(0xE280A6).
>
>    converting #2:
>    utf-16/utf-8 -> cygwin locale env (LANG = lt_LT.cp1257);
>
>    utf-8 string 'Å¾Ä…' (0xC3, 0x85, 0xC2, 0xBE, 0xC3, 0x84, 0xE2, 0x80, 0xA6)
>    converted to cp1257: 'Å¾Ä…' (0xC5, 0xBE, 0xC4, 0x85)
>
>    cp1257 string 'Å¾Ä…' in utf-8 representation: 'žą'; 'ž'(0xC5BE), 'ą'(0xC485)
>
>    Although the string was (should be) converted to cp1257 (according to
>    the cygwin locale env variables), what actually arrives is the original
>    value ('žą') still in utf-8 encoding: read as cp1257 (as it should be)
>    the arg is corrupted, i.e. the passed args were preserved as utf-8.
>    It's important to note that such "original value preservation" happens
>    only by lucky circumstance: when we are converting to the windows
>    locale codepage/charset, the arg string (utf-8) in its windows locale
>    representation doesn't contain any unconvertible character/combination
>    (e.g. undefined characters), and so it's possible to convert back (from
>    utf-16/utf-8 to the locale charset). Corruption _always_ occurs if we are
>    converting to a codepage/charset other than the current windows locale
>    codepage.
>
>    Consider unsuccessful/erroneous conversion example:
>    utf-8 string/character 'ĥ' (U+0125) passed to cygwin (utf-8):
>
>    utf-8 string 'ĥ'(0xC4A5) in locale codepage (cp1257) representation: 'Ä'
>    (0xA5('') is undefined in cp1257 and it doesn't map to unicode)
>
>    converting #1:
>    locale_codepage (lt, LCID: 1063, ansi/oem cp: cp1257/cp775) -> utf-16;
>
>    utf-8 string 'ĥ' in cp1257 representation: 'Ä'
>
>    string converted to utf-16: 'Ä' (0x00C4, 0xF8FD)
>    (http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1257.txt)
>    0xA5 (cp1257) is mapped to 0xF8FD in Unicode (Private Use Area Range: E000–F8FF)
>
>    utf-16: 'Ä': 'Ä'(U+00C4), ''(U+F8FD)
>    <->
>    utf-8 : 'Ä': 'Ä'(0xC384), ''(0xEFA3BD)
>
>    converting #2:
>    utf-16/utf-8 -> cygwin locale env (LANG = C.UTF-8);
>
>
>    utf-16 string 'Ä': 'Ä'(U+00C4), ''(U+F8FD)
>    converted to utf-8: 'Ä': 'Ä'(0xC384), ''(0xEFA3BD)
>
>    So, original string value 'ĥ' is transcoded to an invalid 'Ä' although that
>    shouldn't happen (as no conversion is supposed; neither implicitly, nor
>    explicitly)
>
>
>    In conclusion: erroneous conversion _always_ occurs when we are converting
>    to a codepage/charset other than the current windows locale codepage, and
>    corruption might occur even if we are not supposed to convert at all
>    (just pass utf-8 encoded arguments).
>
>
> */

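The transcoding chain from the comment above can be reproduced outside Windows with a small sketch. This is not the cmdproxy.c patch: the helper names (cp1257_to_unicode, utf8_encode, misread_utf8_as_cp1257) are made up for illustration, and the lookup table covers only the cp1257 bytes that appear in this thread's examples. Each byte of a utf-8 argument is treated as a cp1257 character (roughly what CreateProcessA() does under a cp1257 locale) and the result is re-encoded as utf-8 (what a cygwin program with a utf-8 locale would then see in argv).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Partial cp1257 -> Unicode map (illustrative): only the bytes used in
   this thread's examples.  A real table covers all of 0x80..0xFF.  */
static uint32_t cp1257_to_unicode(unsigned char b)
{
    if (b < 0x80)
        return b;                  /* ASCII maps to itself */
    switch (b) {
    case 0x85: return 0x2026;      /* HORIZONTAL ELLIPSIS */
    case 0x94: return 0x201D;      /* RIGHT DOUBLE QUOTATION MARK */
    case 0xB0: return 0x00B0;      /* DEGREE SIGN */
    case 0xBE: return 0x00BE;      /* VULGAR FRACTION THREE QUARTERS */
    case 0xBF: return 0x00E6;      /* LATIN SMALL LETTER AE */
    case 0xC4: return 0x00C4;      /* A WITH DIAERESIS */
    case 0xC5: return 0x00C5;      /* A WITH RING ABOVE */
    default:   return 0xFFFD;      /* not covered by this partial table */
    }
}

/* Encode one BMP code point as UTF-8; returns the number of bytes
   written (at most 3).  */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(cp <= 0xFFFF);
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char) (0xE0 | (cp >> 12));
    out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char) (0x80 | (cp & 0x3F));
    return 3;
}

/* Treat every input byte as a cp1257 character and re-encode the result
   as UTF-8.  Returns the output length; OUT needs room for 3 * N bytes.  */
static size_t misread_utf8_as_cp1257(const unsigned char *in, size_t n,
                                     unsigned char *out, size_t cap)
{
    size_t len = 0;

    assert(cap >= 3 * n);
    for (size_t i = 0; i < n; i++)
        len += utf8_encode(cp1257_to_unicode(in[i]), out + len);
    return len;
}
```

For the argument 'žą' (bytes C5 BE C4 85) this yields C3 85 C2 BE C3 84 E2 80 A6, i.e. the four characters Å, ¾, Ä, … of Example #1, each encoded as utf-8.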





* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-22 12:31 bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion) Laimonas Vėbra
@ 2010-07-22 14:33 ` Jason Rumney
  2010-07-22 18:14   ` bug#6546: " Laimonas Vėbra
  2010-07-22 19:50 ` Eli Zaretskii
  1 sibling, 1 reply; 23+ messages in thread
From: Jason Rumney @ 2010-07-22 14:33 UTC (permalink / raw)
  To: Laimonas Vėbra; +Cc: 6705

Laimonas Vėbra <laimonas.vebra@gmail.com> writes:

> This problem certainly needs some discussion (how best to solve it)
> because it addresses unicode communication aspects/issues. If some won't
> bother reading all the description, then here is a simple question --
> how do one can/should (clearly) pass utf-8 arguments to an external
> (cygwin) app on windows? I suppose, now it's not possible.

Don't use cmdproxy with Cygwin programs. If you need a shell in between,
use Cygwin bash.  cmdproxy is a wrapper to get around some problems with
various versions of the Windows native cmd.exe and command.com shell
programs.  Mixing Cygwin and native Windows is not advised.






* bug#6546: bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-22 14:33 ` Jason Rumney
@ 2010-07-22 18:14   ` Laimonas Vėbra
  0 siblings, 0 replies; 23+ messages in thread
From: Laimonas Vėbra @ 2010-07-22 18:14 UTC (permalink / raw)
  Cc: 6546

Jason Rumney wrote:

> Don't use cmdproxy with Cygwin programs. If you need a shell in
> between, use Cygwin bash.  cmdproxy is a wrapper to get around some
> problems with various versions of the Windows native cmd.exe and
> command.com shell programs.  Mixing Cygwin and native Windows is not
> advised.

That doesn't solve the problem (try passing a utf-8 string from Emacs to 
cygwin/bin/(ba)sh.exe or any other cygwin app), nor does cmdproxy 
complicate the matter in any way (it just passes the command line to 
CreateProcess(); the same happens in w32proc.c when calling /bin/sh 
instead of cmdproxy.exe). The problem is not cmdproxy itself, but the 
winapi/cygwin layer and the way the args are passed/transcoded along 
CreateProcess(A) -> cygwin layer.






* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-22 12:31 bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion) Laimonas Vėbra
  2010-07-22 14:33 ` Jason Rumney
@ 2010-07-22 19:50 ` Eli Zaretskii
  2010-07-22 20:59   ` Laimonas Vėbra
  2010-07-22 22:56   ` Juanma Barranquero
  1 sibling, 2 replies; 23+ messages in thread
From: Eli Zaretskii @ 2010-07-22 19:50 UTC (permalink / raw)
  To: Laimonas Vėbra; +Cc: 6705

> Date: Thu, 22 Jul 2010 15:31:44 +0300
> From: Laimonas Vėbra <laimonas.vebra@gmail.com>
> Cc: 
> 
> Below is a comment that i wrote for myself in the cmdproxy.c (it 
> explains the problem). I have a half (i suppose -- portable enough) of 
> working solution/fix for it using MultiByteToWideChar API function, but 
> i won't send a (partly working) patch, unless someone from the 
> developers who agree with the problem and intend to fix it will ask for it.

Sorry, I cannot understand your comments.  You talk about corrupted
conversion, but never add any detailed explanations, just examples.
Could you please elaborate?

The only thing I understand is that conversion of Unicode characters
to a Windows codepage that doesn't support those characters will be
lossy.  That part is clear, but:

  . did you try setting up your Windows to use the UTF-8 codepage
    65001?

  . since Cygwin 1.7 switched to using UTF-8, it parted itself even
    further from native Windows applications, so you now have one more
    reason to use the Cygwin build of Emacs instead of the native one

> Besides, the patch itself is larger than 10 diff lines and it uses 
> (duplicates by copying) some helper functions/declarations 
> (open_input_file(), close_file_data(), rva_to_section(), 
> w32_executable_type, RVA_TO_PTR) from unexw32.c, so it may need some 
> code refactoring.

This means we will be unable to accept the patches, even if we agree
to them, without you signing legal papers assigning copyright to the
FSF.

> how do one can/should (clearly) pass utf-8 arguments to an external 
> (cygwin) app on windows? I suppose, now it's not possible.

The Windows build of Emacs doesn't yet use the UTF-16 APIs.  Doing
that is a large job; volunteers are welcome, of course.  However,
passing UTF-8 arguments to subprocesses is hardly the first or the
most important thing that should be done in that regard, IMO.  File
access is much more important, this being an editor.







* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-22 19:50 ` Eli Zaretskii
@ 2010-07-22 20:59   ` Laimonas Vėbra
  2010-07-23 10:21     ` Eli Zaretskii
  2010-07-22 22:56   ` Juanma Barranquero
  1 sibling, 1 reply; 23+ messages in thread
From: Laimonas Vėbra @ 2010-07-22 20:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 6705

Eli Zaretskii wrote:

> Sorry, I cannot understand your comments.  You talk about corrupted
> conversion, but never add any detailed explanations, just examples.
> Could you please elaborate?

That was supposed to be a detailed explanation through detailed 
examples. It is the way it happens. I did check/investigate: the args that 
come from Emacs (w32proc.c) #1, which are passed to cmdproxy #2, and 
finally what the subprocess/app receives #3. Have you read that 
explanation? What part of the explanation of the corrupted conversion is 
unclear (I'll try to explain; sorry, my English is not so fluent)?

>
>    . did you try setting up your Windows to use the UTF-8 codepage
>      65001?

Well, I could switch to Linux instead...
(That is definitely not the solution, or at least it is the same kind of 
"solution" as not using unicode at all; the windows locale setting is 
whatever it is set to (be it Lithuanian, Cyrillic, Italian, whatever) and 
is not going to be changed, nor does it need to be for correct behavior.)

>
>    . since Cygwin 1.7 switched to using UTF-8, it parted itself even
>      further from native Windows applications, so you now have one more
>      reason to use the Cygwin build of Emacs instead of the native one

Well ok, but it (cygwin) works pretty well under/with the utf-16 API layer...
(IMHO it's not the problem)

> The Windows build of Emacs doesn't yet use the UTF-16 APIs.  Doing
> that is a large job; volunteers are welcome, of course.  However,

In the context of external communication with cygwin -- it doesn't need 
to use them everywhere, but it needs to convert its output to utf-16 
explicitly and call CreateProcessW().

mingw and (possibly) other applications receive args (main(): **argv, 
GetCommandLineA()) unchanged (the same as they were passed from Emacs).
So passing arguments in whatever encoding (except utf-16 and 
others that contain NUL bytes) just works without any changes.

cygwin applications, on the other hand, receive unchanged arguments 
only through WINAPI GetCommandLineA (which is almost never used...); 
**argv args are transcoded as i explained.

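The explicit conversion step proposed here can be sketched in portable C. On Windows the natural route would be MultiByteToWideChar with CP_UTF8, followed by CreateProcessW(); the hand-written decoder below is only a stand-in so the idea can be tried anywhere, and the name utf8_to_utf16 is an assumption, not code from Emacs or from the unpublished patch. It handles 1- to 3-byte sequences (the BMP), reports 4-byte sequences and truncated input as errors, and does not check for overlong forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode UTF-8 (1..3 byte sequences, i.e. the BMP) into UTF-16 code
   units.  Returns the number of units written, or (size_t) -1 on
   malformed/unsupported input or insufficient space in OUT.  */
static size_t utf8_to_utf16(const unsigned char *in, size_t n,
                            uint16_t *out, size_t cap)
{
    size_t i = 0, len = 0;

    assert(in != NULL && out != NULL);
    while (i < n) {
        uint32_t cp;
        size_t extra;                    /* continuation bytes expected */

        if (in[i] < 0x80) {
            cp = in[i];
            extra = 0;
        } else if ((in[i] & 0xE0) == 0xC0) {
            cp = in[i] & 0x1F;
            extra = 1;
        } else if ((in[i] & 0xF0) == 0xE0) {
            cp = in[i] & 0x0F;
            extra = 2;
        } else {
            return (size_t) -1;          /* stray continuation or 4-byte lead */
        }
        if (extra > n - i - 1 || len == cap)
            return (size_t) -1;          /* truncated input or OUT is full */
        i++;
        for (; extra > 0; extra--, i++) {
            if ((in[i] & 0xC0) != 0x80)
                return (size_t) -1;      /* not a continuation byte */
            cp = (cp << 6) | (in[i] & 0x3F);
        }
        out[len++] = (uint16_t) cp;
    }
    return len;
}
```

For 'žą' (bytes C5 BE C4 85) this gives the two code units U+017E and U+0105, so a wide-character command line built this way never passes through the locale codepage.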





* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-22 19:50 ` Eli Zaretskii
  2010-07-22 20:59   ` Laimonas Vėbra
@ 2010-07-22 22:56   ` Juanma Barranquero
  2010-07-23 10:26     ` Eli Zaretskii
  1 sibling, 1 reply; 23+ messages in thread
From: Juanma Barranquero @ 2010-07-22 22:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 6705

On Thu, Jul 22, 2010 at 21:50, Eli Zaretskii <eliz@gnu.org> wrote:

> The Windows build of Emacs doesn't yet use the UTF-16 APIs.  Doing
> that is a large job; volunteers are welcome, of course.

What do you mean, using the W versions of API functions? That would be
incompatible with 16-bit Windows (or, at least, as compatible as the
Microsoft Layer for Unicode allows, and I wouldn't like to bet on it).
Or are you talking about abandoning 16-bit Windows compatibility?

    Juanma






* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-22 20:59   ` Laimonas Vėbra
@ 2010-07-23 10:21     ` Eli Zaretskii
  2010-07-23 12:57       ` Laimonas Vėbra
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2010-07-23 10:21 UTC (permalink / raw)
  To: Laimonas Vėbra; +Cc: 6705

> Date: Thu, 22 Jul 2010 23:59:09 +0300
> From: Laimonas Vėbra <laimonas.vebra@gmail.com>
> CC: 6705@debbugs.gnu.org
> 
> Eli Zaretskii wrote:
> 
> > Sorry, I cannot understand your comments.  You talk about corrupted
> > conversion, but never add any detailed explanations, just examples.
> > Could you please elaborate?
> 
> That was supposed to be detailed explanations through the detailed 
> examples. It is the way it happens. I did check/investigate;

I don't doubt that you checked, I just don't understand the
description of the problem.

Once again, if all you want to say is that you want to invoke external
programs with command-line arguments encoded in anything other than
the current locale's encoding, then this will not currently work in
the native Windows build.  But if you are trying to say anything else,
please elaborate.

> args that 
> comes from Emacs (w32proc.c) #1, which are passed to cmdproxy #2 and -- 
> after all -- what subprocess/app receives #3. Have you read that 
> explanation?

Yes.

> What part of the explanation of the corrupted conversion is unclear

None of it.  Perhaps instead of going by example, just describe what
encoding you used, in what Emacs command, and what was corrupted as
result.

> >    . since Cygwin 1.7 switched to using UTF-8, it parted itself even
> >      further from native Windows applications, so you now have one more
> >      reason to use the Cygwin build of Emacs instead of the native one
> 
> Well ok, but it (cygwin) work pretty well under/with utf-16 API layer...

I didn't try to imply that Cygwin was the problem.  I was suggesting
to use the Cygwin build of Emacs.  Why do you insist on using the
native w32 build, when it is obvious that the compatibility between
what it does and what Cygwin expects is marginal at best?

> In the context of external communication with cygwin -- it doesn't need 
> to use (everywhere), but it needs to convert its output to utf-16 
> explicitly and call CreateProcessW().

Yes.  But it doesn't make sense to do this kind of surgery in Emacs
without benefiting from the *W APIs all over, does it?







* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-22 22:56   ` Juanma Barranquero
@ 2010-07-23 10:26     ` Eli Zaretskii
  2010-07-23 10:45       ` Juanma Barranquero
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2010-07-23 10:26 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: 6705

> From: Juanma Barranquero <lekktu@gmail.com>
> Date: Fri, 23 Jul 2010 00:56:33 +0200
> Cc: Laimonas Vėbra <laimonas.vebra@gmail.com>, 6705@debbugs.gnu.org
> 
> On Thu, Jul 22, 2010 at 21:50, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > The Windows build of Emacs doesn't yet use the UTF-16 APIs.  Doing
> > that is a large job; volunteers are welcome, of course.
> 
> What do you mean, using the W versions of API functions? That would be
> incompatible with 16-bit Windows (or, at least, as compatible as the
> Microsoft Layer for Unicode allows, and I wouldn't like to bet on it).
> Or are you talking about abandoning 16-bit Windows compatibility?

Neither.  I'm talking about using the W APIs on versions of Windows
that support them, and A APIs elsewhere, including on Windows 9X.

That's why I said this is a non-trivial job.







* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-23 10:26     ` Eli Zaretskii
@ 2010-07-23 10:45       ` Juanma Barranquero
  2010-07-23 14:16         ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Juanma Barranquero @ 2010-07-23 10:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 6705

On Fri, Jul 23, 2010 at 12:26, Eli Zaretskii <eliz@gnu.org> wrote:

> Neither.  I'm talking about using the W APIs on versions of Windows
> that support them, and A APIs elsewhere, including on Windows 9X.

OK, I see. Lots and lots of wrappers and checks.

> That's why I said this is a non-trivial job.

Yes. But it can be done incrementally, I think.

    Juanma






* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-23 10:21     ` Eli Zaretskii
@ 2010-07-23 12:57       ` Laimonas Vėbra
  2010-07-23 14:35         ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Laimonas Vėbra @ 2010-07-23 12:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 6705

Eli Zaretskii wrote:
>> Date: Thu, 22 Jul 2010 23:59:09 +0300
>> From: Laimonas Vėbra<laimonas.vebra@gmail.com>
>> CC: 6705@debbugs.gnu.org
>>
>> Eli Zaretskii wrote:
>>
>>> Sorry, I cannot understand your comments.  You talk about corrupted
>>> conversion, but never add any detailed explanations, just examples.
>>> Could you please elaborate?
>>
>> That was supposed to be detailed explanations through the detailed
>> examples. It is the way it happens. I did check/investigate;
>
> I don't doubt that you checked, I just don't understand the
> description of the problem.
>
> Once again, if all you want to say is that you want to invoke external
> programs with command-line arguments encoded in anything other than
> the current locale's encoding, then this will not currently work in
> the native Windows build.  But if you are trying to say anything else,
> please elaborate.

Well, it will work. Passing utf-8 arguments to native (mingw) apps 
is not a problem. It won't work with cygwin, and even that „won't work“ 
is not certain (it can work under some circumstances, with what I'd call 
an improper setup). So I think I should elaborate.


> None of it.  Perhaps instead of going by example, just describe what
> encoding you used, in what Emacs command, and what was corrupted as
> result.

OK, from the beginning.
I'd like to grep for some utf-8 encoded string. Choose whatever (non 
ascii) value you like, let's say 'ĔĿİ' (hex: 0x[C494, C4BF, C4B0]).

echo -e "-ĔĿİ-\n_ĔĿİ_\nELI\nĔĿİ" > file.txt

grep --version
GNU grep 2.6.3 (cygwin)

wscript.echo (GetLocale())
1063
http://www.cryer.co.uk/brian/windows/info_windows_locale_table.htm

LANG="" (that means not set, cygwin default value "C.UTF-8")

M-x grep
grep -nH -e "ĔĿİ" file.txt

Grep finished with no matches found at Fri Jul 23 13:56:22

Why?

Because:
grep.c gets args "Ä”ÄæÄ°" (utf-8 string, hex: 0x[C384, E2809D, C384, 
C3A6, C384, C2B0]).

Why?
Because the original string value 0x[C494, C4BF, C4B0] is taken to be 
in the current locale codepage (cp1257) encoding/charset:
http://msdn.microsoft.com/en-us/goglobal/cc305150.aspx

and is therefore interpreted (by the cygwin/os api) as six characters, 
0x (C4, 94, C4, BF, C4, B0), i.e. 'Ä”ÄæÄ°', which are converted to 
utf-16 and then to utf-8.



> I didn't try to imply that Cygwin was the problem.  I was suggesting
> to use the Cygwin build of Emacs.  Why do you insist on using the
> native w32 build, when it is obvious that the compatibility between
> what it does and what Cygwin expects is marginal at best?

I tried to imply that the cygwin tools are mature/consistent enough for 
the w32 build to work with. And from that point of view there is no 
advantage in using the cygwin Emacs build instead of the native one (the 
cygwin build is slower and potentially more buggy).

> Yes.  But it doesn't make sense to do this kind of surgery in Emacs
> without benefiting from the *W APIs all over, does it?

Why? We benefit at least in the sense that both kinds of app (native and 
cygwin) will work (correctly) on w32, as correctly as is possible 
with the current code.
In other words -- why do you think it's not worth tuning Emacs to work 
with the cygwin system (plenty of useful tools, especially if we think 
about working efficiently and in the same way with emacs on different 
systems: *nix, w32)?






* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-23 10:45       ` Juanma Barranquero
@ 2010-07-23 14:16         ` Eli Zaretskii
  2010-07-25 23:38           ` Stefan Monnier
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2010-07-23 14:16 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: 6705

> From: Juanma Barranquero <lekktu@gmail.com>
> Date: Fri, 23 Jul 2010 12:45:08 +0200
> Cc: laimonas.vebra@gmail.com, 6705@debbugs.gnu.org
> 
> On Fri, Jul 23, 2010 at 12:26, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > Neither.  I'm talking about using the W APIs on versions of Windows
> > that support them, and A APIs elsewhere, including on Windows 9X.
> 
> OK, I see. Lots and lots of wrappers and checks.
> 
> > That's why I said this is a non-trivial job.
> 
> Yes. But it can be done incrementally, I think.

I'm not sure, some of them need to be done as a group, or else
features will become broken.  If someone can suggest a plan for
incrementally adding this stuff, we could discuss it.







* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-23 12:57       ` Laimonas Vėbra
@ 2010-07-23 14:35         ` Eli Zaretskii
  2010-07-23 15:35           ` Laimonas Vėbra
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2010-07-23 14:35 UTC (permalink / raw)
  To: Laimonas Vėbra; +Cc: 6705

> Date: Fri, 23 Jul 2010 15:57:46 +0300
> From: Laimonas Vėbra <laimonas.vebra@gmail.com>
> CC: 6705@debbugs.gnu.org
> 
> It's not the problem to pass utf-8 arguments to native (mingw)
> apps.

If these MinGW applications use Unicode (UTF-16) APIs, that's true.
But if they use the ANSI APIs (and most of them do), then you simply
cannot pass to them command-line arguments encoded in any encoding
other than the current codepage.

> I'd like to grep for some utf-8 encoded string.

Stop.  That's your problem, right there: you can't have this, not
unless your current codepage is 65001.

> > I didn't try to imply that Cygwin was the problem.  I was suggesting
> > to use the Cygwin build of Emacs.  Why do you insist on using the
> > native w32 build, when it is obvious that the compatibility between
> > what it does and what Cygwin expects is marginal at best?
> 
> I tried to imply, that cygwin tools is mature/consistent enough for the 
> w32 to work with.

Mature, but incompatible with the w32 build of Emacs.  And Cygwin 1.7
made them even more incompatible.

> In other words -- (why) do you think it's not worth to tune Emacs with 
> cygwin system (plenty of useful tools; especially if we think about 
> working (efficiently, the same) with emacs on different systems: *nix, w32)?

In my view, users of the w32 build of Emacs who use Cygwin tools
outside Emacs are a minority.  There are native w32 ports of most of
the tools you have in Cygwin, and there is the Cygwin build of Emacs.
I don't see why the handful of Emacs developers who contribute to the
w32 port should invest a significant part of their scarce resources on
fixing incompatibilities between the w32 Emacs and Cygwin, when a
Cygwin build of Emacs is available and works pretty well, judging by
the few of its users who are active on the emacs-devel list.  I don't
know why you say it's "potentially" more buggy -- it uses mostly the
same code that runs on GNU/Linux, so actually it should be _less_
buggy than the native w32 build, because it is used by a larger number
of users.  Did you even try to switch to the Cygwin build?  If not,
perhaps you should.







* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-23 14:35         ` Eli Zaretskii
@ 2010-07-23 15:35           ` Laimonas Vėbra
  2010-07-23 18:06             ` Eli Zaretskii
  2010-07-23 19:07             ` Juanma Barranquero
  0 siblings, 2 replies; 23+ messages in thread
From: Laimonas Vėbra @ 2010-07-23 15:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 6705

Eli Zaretskii wrote:
>> Date: Fri, 23 Jul 2010 15:57:46 +0300
>> From: Laimonas Vėbra<laimonas.vebra@gmail.com>
>> CC: 6705@debbugs.gnu.org
>>
>> It's not the problem to pass utf-8 arguments to native (mingw)
>> apps.
>
> If these MinGW applications use Unicode (UTF-16) APIs, that's true.
> But if they use the ANSI APIs (and most of them do), then you simply
> cannot pass to them command-line arguments encoded in any encoding
> other than the current codepage.

That's not true when we launch a subprocess using CreateProcessA()
and pass args to it (i.e. like Emacs does). Try:

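#include <stdio.h>

/* Print the first command-line argument exactly as the C runtime
   delivered it in argv.  */
int main (int argc, char ** argv) {
	if (argc < 2)
		return 1;	/* no argument given */
	printf("argv[1]: %s\n", argv[1]);
	return 0;
}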

gcc.EXE (GCC) 3.4.5 (mingw-vista special r3)

gcc -o test test.c

M-x grep
test.exe "ĔĿİ" > out.txt

$ cat out.txt
argv[1]: ĔĿİ

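Because the printed result depends on how the terminal (or cat) decodes the bytes, a hex dump of the argument makes such a comparison less ambiguous. A minimal sketch; hex_dump is a hypothetical helper, not part of test.c above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write the bytes of S into OUT as space-separated uppercase hex pairs,
   e.g. "C4 94".  OUT must hold at least 3 * strlen (S) + 1 bytes.
   Returns OUT.  */
static char *hex_dump(const char *s, char *out, size_t cap)
{
    size_t n = strlen(s);
    char *p = out;

    assert(cap >= 3 * n + 1);
    *p = '\0';                           /* empty input gives "" */
    for (size_t i = 0; i < n; i++)
        p += sprintf(p, "%s%02X", i ? " " : "",
                     (unsigned) (unsigned char) s[i]);
    return out;
}
```

In a test program like the one above, printing hex_dump(argv[1], buf, sizeof buf) instead of argv[1] itself would show directly whether the six bytes C4 94 C4 BF C4 B0 of 'ĔĿİ' arrived unchanged.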

> In my view, users of the w32 build of Emacs who use Cygwin tools
> outside Emacs are a minority.  There are native w32 ports of most of
> the tools you have in Cygwin, and there is the Cygwin build of Emacs.
> I don't see why the handful of Emacs developers who contribute to the
> w32 port should invest a significant part of their scarce resources on

bzr log says that much of the active development of the w32proc.c and 
others actually ended somewhere in the 2001-2003... ;-)
On the other hand -- why then do you think w32 developers should invest 
their time developing w32 stuff at all (if we have a cygwin build which 
works „pretty well“)...?

> fixing incompatibilities between the w32 Emacs and Cygwin, when a
> Cygwin build of Emacs is available and works pretty well, judging by
> the few of its users who are active on the emacs-devel list.  I don't

Are they using it in unicode aspect/context? It's the most important 
question, because many people don't get any problems if they are not 
dealing with unicode (or, at the least, with non-English 
ansi/multilingual aspects).


> know why you say it's "potentially" more buggy -- it uses mostly the
> same code that runs on GNU/Linux, so actually it should be _less_
> buggy than the native w32 build, because it is used by a larger number
> of users.  Did you even try to switch to the Cygwin build?  If not,
> perhaps you should.

Same question -- why then bother with w32 development at all?







* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-23 15:35           ` Laimonas Vėbra
@ 2010-07-23 18:06             ` Eli Zaretskii
  2010-07-23 18:53               ` Laimonas Vėbra
  2010-07-23 19:07             ` Juanma Barranquero
  1 sibling, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2010-07-23 18:06 UTC (permalink / raw)
  To: Laimonas Vėbra; +Cc: 6705

> Date: Fri, 23 Jul 2010 18:35:40 +0300
> From: Laimonas Vėbra <laimonas.vebra@gmail.com>
> CC: 6705@debbugs.gnu.org
> 
> M-x grep
> test.exe "ĔĿİ" > out.txt
> 
> $ cat out.txt
> argv[1]: ĔĿİ

And what does that prove, exactly?  That MinGW programs can support
non-ASCII characters?  I never said they didn't.

> bzr log says that much of the active development of the w32proc.c and 
> others actually ended somewhere in the 2001-2003... ;-)

So what?

> On the other hand -- why when you think w32 developers should invest 
> their time developing w32 stuff at all (if we have cygwin build which 
> works „pretty well“)...?

For the users who use Emacs in conjunction with native w32 programs.
Not every Windows program is necessarily available in Cygwin, you
know.  If you need to use Emacs in a non-Cygwin environment, the
native build will fit much better.

> > fixing incompatibilities between the w32 Emacs and Cygwin, when a
> > Cygwin build of Emacs is available and works pretty well, judging by
> > the few of its users who are active on the emacs-devel list.  I don't
> 
> Are they using it in unicode aspect/context?

I don't know; why won't you ask on emacs-devel?

> > know why you say it's "potentially" more buggy -- it uses mostly the
> > same code that runs on GNU/Linux, so actually it should be _less_
> > buggy than the native w32 build, because it is used by a larger number
> > of users.  Did you even try to switch to the Cygwin build?  If not,
> > perhaps you should.
> 
> Same question -- why when bother with w32 development at all?

See above: same answer.







* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-23 18:06             ` Eli Zaretskii
@ 2010-07-23 18:53               ` Laimonas Vėbra
  2010-07-23 21:25                 ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Laimonas Vėbra @ 2010-07-23 18:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 6705

Eli Zaretskii wrote:
>> Date: Fri, 23 Jul 2010 18:35:40 +0300
>> From: Laimonas Vėbra<laimonas.vebra@gmail.com>
>> CC: 6705@debbugs.gnu.org
>>
>> M-x grep
>> test.exe "ĔĿİ">  out.txt
>>
>> $ cat out.txt
>> argv[1]: ĔĿİ
>
> And what does that prove, exactly?  That MinGW programs can support
> non-ASCII characters?  I never said they didn't.

Exactly? It means that mingw (native) programs can currently receive args 
from Emacs in whatever encoding (except utf-16|32 and maybe a few others) 
without corruption. That's what I also expect (want) for cygwin.
	






* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-23 15:35           ` Laimonas Vėbra
  2010-07-23 18:06             ` Eli Zaretskii
@ 2010-07-23 19:07             ` Juanma Barranquero
  2010-07-23 19:51               ` Laimonas Vėbra
  1 sibling, 1 reply; 23+ messages in thread
From: Juanma Barranquero @ 2010-07-23 19:07 UTC (permalink / raw)
  To: Laimonas Vėbra; +Cc: 6705

On Fri, Jul 23, 2010 at 17:35, Laimonas Vėbra <laimonas.vebra@gmail.com> wrote:

> bzr log says that much of the active development of the w32proc.c and others
> actually ended somewhere in the 2001-2003... ;-)

Are you volunteering? By all means, welcome aboard!

> On the other hand -- why then do you think w32 developers should invest their
> time developing w32 stuff at all (if we have a Cygwin build which works
> „pretty well“)...?

Because the Cygwin build works pretty well when your environment is
Cygwin. And some of us, to put it mildly, wouldn't touch that crap
again if it were the last working environment on the surface of the
Earth. I certainly don't plan to spend a second making Emacs work
better with Cygwin (though of course I have nothing against other
people doing it; I'm not an anti-Cygwin zealot).

> Same question -- why then bother with w32 development at all?

Seems like there's a sizable user base of non-Cygwin users of Emacs on
Windows. And some of us like to work on Emacs development.

    Juanma






* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-23 19:07             ` Juanma Barranquero
@ 2010-07-23 19:51               ` Laimonas Vėbra
  2010-07-23 19:59                 ` Juanma Barranquero
  0 siblings, 1 reply; 23+ messages in thread
From: Laimonas Vėbra @ 2010-07-23 19:51 UTC (permalink / raw)
  To: Juanma Barranquero, 6705

Juanma Barranquero wrote:
> On Fri, Jul 23, 2010 at 17:35, Laimonas Vėbra<laimonas.vebra@gmail.com>  wrote:
>
>> bzr log says that much of the active development of the w32proc.c and others
>> actually ended somewhere in the 2001-2003... ;-)
>
> Are you volunteering? By all means, welcome aboard!

I'd like to, and if you haven't noticed -- I am trying... ;-)

> Because the Cygwin build works pretty well when your environment is
> Cygwin. And some of us, to put it mildly, wouldn't touch that crap
> again if it were the last working environment on the surface of the
> Earth. I certainly don't plan to spend a second making Emacs work
> better with Cygwin (though of course I have nothing against other
> people doing it; I'm not an anti-Cygwin zealot).

And that is why I also prefer the native build of Emacs (and think that it
should be less buggy).
I suppose that making it work better with Cygwin (like other external
apps) would just make it better.
The sources already contain much (well, not so little) Cygwin-related
stuff, so if it ain't going to be dropped, then why
couldn't/shouldn't it be improved?

>
>> Same question -- why then bother with w32 development at all?
>
> Seems like there's a sizable user base of non-Cygwin users of Emacs on
> Windows. And some of us like to work on Emacs development.

That question implied that w32 is worth improving.






* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-23 19:51               ` Laimonas Vėbra
@ 2010-07-23 19:59                 ` Juanma Barranquero
  0 siblings, 0 replies; 23+ messages in thread
From: Juanma Barranquero @ 2010-07-23 19:59 UTC (permalink / raw)
  To: Laimonas Vėbra; +Cc: 6705

On Fri, Jul 23, 2010 at 21:51, Laimonas Vėbra <laimonas.vebra@gmail.com> wrote:

> I'd like to, and if you haven't noticed -- I am trying... ;-)

Good luck. (I'm being sincere, not facetious.)

> And that is why I also prefer the native build of Emacs (and think that it
> should be less buggy).

I've never felt the native build was abnormally buggy (compared to the
POSIX builds).

> I suppose that making it work better with Cygwin (like other external apps)
> would just make it better.

Sure, assuming that there's a non-hackish way to do it. It's just that
there are lots of things that seem to me more useful, given our
limited resources. But YMMV.

> The sources already contain much (well, not so little) Cygwin-related
> stuff, so if it ain't going to be dropped, then why
> couldn't/shouldn't it be improved?

Again: if there's a way to improve it that does not make already
existing code worse, why not?

    Juanma






* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-23 18:53               ` Laimonas Vėbra
@ 2010-07-23 21:25                 ` Eli Zaretskii
  2010-07-23 21:53                   ` Laimonas Vėbra
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2010-07-23 21:25 UTC (permalink / raw)
  To: Laimonas Vėbra; +Cc: 6705

> Date: Fri, 23 Jul 2010 21:53:00 +0300
> From: Laimonas Vėbra <laimonas.vebra@gmail.com>
> CC: 6705@debbugs.gnu.org
> 
> Eli Zaretskii wrote:
> >> Date: Fri, 23 Jul 2010 18:35:40 +0300
> >> From: Laimonas Vėbra<laimonas.vebra@gmail.com>
> >> CC: 6705@debbugs.gnu.org
> >>
> >> M-x grep
> >> test.exe "ĔĿİ" > out.txt
> >>
> >> $ cat out.txt
> >> argv[1]: ĔĿİ
> >
> > And what does that prove, exactly?  That MinGW programs can support
> > non-ASCII characters?  I never said they didn't.
> 
> Exactly? It means that MinGW (native) programs can now receive args
> from Emacs in whatever encoding (except UTF-16/32 and maybe a few others)
> without corruption.

Only if that encoding matches the current user's codepage.

Now come on, this discussion has nothing more to contribute to the
subject.  Time to stop it.







* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-23 21:25                 ` Eli Zaretskii
@ 2010-07-23 21:53                   ` Laimonas Vėbra
  2010-07-24 20:33                     ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Laimonas Vėbra @ 2010-07-23 21:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 6705

[-- Attachment #1: Type: text/plain, Size: 1241 bytes --]

Eli Zaretskii wrote:

> Only if that encoding matches the current user's codepage.

No. NO. *NO*. Try it. I already showed how, and I can repeat it AGAIN:

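#include <stdio.h>

/* Print the first command-line argument exactly as the C runtime delivers it. */
int main (int argc, char ** argv) {
     if (argc < 2)
          return 1;   /* no argument given */
     printf("argv[1]: %s\n", argv[1]);
     return 0;
}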

gcc.EXE (GCC) 3.4.5 (mingw-vista special r3)

gcc -o test test.c

emacs -Q

M-x eval-expression
(setq coding-system-for-write 'utf-8)
M-x grep
test.exe "Šešios žąsys su šešiais žąsyčiais" >> out.txt

M-x eval-expression
(setq coding-system-for-write 'cp1257)
M-x grep
test.exe "Šešios žąsys su šešiais žąsyčiais" >> out.txt


M-x eval-expression
(setq coding-system-for-write 'utf-8)
M-x grep
test.exe "С новым годом Ели Заретски!" >> out.txt


M-x eval-expression
(setq coding-system-for-write 'cp1251)
M-x grep
test.exe "С новым годом Ели Заретски!" >> out.txt


M-x eval-expression
(setq coding-system-for-write 'utf-8)
M-x grep
test.exe "Χρήση εθνικών και ειδικών χαρακτήρων" >> out.txt

M-x eval-expression
(setq coding-system-for-write 'iso-8859-7)
M-x grep
test.exe "Χρήση εθνικών και ειδικών χαρακτήρων" >> out.txt


File 'out.txt' attached.


[-- Attachment #2: out.txt --]
[-- Type: text/plain, Size: 321 bytes --]

argv[1]: Šešios žąsys su šešiais žąsyčiais
argv[1]: Ðeðios þàsys su ðeðiais þàsyèiais
argv[1]: С новым годом Ели Заретски!
argv[1]: Ñ íîâûì ãîäîì Åëè Çàðåòñêè!
argv[1]: Χρήση εθνικών και ειδικών χαρακτήρων
argv[1]: ×ñÞóç åèíéêþí êáé åéäéêþí ÷áñáêôÞñùí
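
The second, fourth and sixth lines above look garbled only because out.txt mixes encodings: each run wrote the bytes it received, unchanged. One way to make that visible is to print the bytes of argv[1] in hex instead of as text. What follows is a minimal sketch, not code from this thread; `hex_dump` is a hypothetical helper name introduced for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the bytes of S into OUT as space-separated
   two-digit hex (for example "c5 a0").  Returns 0 on success, or -1 if
   OUT, which holds OUTSZ bytes, is too small.  */
int hex_dump(const char *s, char *out, size_t outsz)
{
    size_t len = strlen(s);
    /* Each byte needs "xx " (3 chars); the last space becomes the NUL. */
    size_t need = len ? len * 3 : 1;
    size_t i;

    if (outsz < need)
        return -1;
    out[0] = '\0';
    for (i = 0; i < len; i++)
        sprintf(out + i * 3, i + 1 < len ? "%02x " : "%02x",
                (unsigned char) s[i]);
    return 0;
}
```

Called as hex_dump(argv[1], buf, sizeof buf) from the test program, a UTF-8 "Š" would show up as c5 a0 and a cp1257 "Š" as d0, which is the byte a Latin-1 viewer renders as "Ð".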


* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-23 21:53                   ` Laimonas Vėbra
@ 2010-07-24 20:33                     ` Eli Zaretskii
  2010-07-25 10:10                       ` Laimonas Vėbra
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2010-07-24 20:33 UTC (permalink / raw)
  To: Laimonas Vėbra; +Cc: 6705

> Date: Sat, 24 Jul 2010 00:53:41 +0300
> From: Laimonas Vėbra <laimonas.vebra@gmail.com>
> CC: 6705@debbugs.gnu.org
> 
> M-x eval-expression
> (setq coding-system-for-write 'utf-8)
> M-x grep
> test.exe "Šešios žąsys su šešiais žąsyčiais" >> out.txt
> 
> M-x eval-expression
> (setq coding-system-for-write 'cp1257)
> M-x grep
> test.exe "Šešios žąsys su šešiais žąsyčiais" >> out.txt
> 
> 
> M-x eval-expression
> (setq coding-system-for-write 'utf-8)
> M-x grep
> test.exe "С новым годом Ели Заретски!" >> out.txt
> 
> 
> M-x eval-expression
> (setq coding-system-for-write 'cp1251)
> M-x grep
> test.exe "С новым годом Ели Заретски!" >> out.txt
> 
> 
> M-x eval-expression
> (setq coding-system-for-write 'utf-8)
> M-x grep
> test.exe "Χρήση εθνικών και ειδικών χαρακτήρων" >> out.txt
> 
> M-x eval-expression
> (setq coding-system-for-write 'iso-8859-7)
> M-x grep
> test.exe "Χρήση εθνικών και ειδικών χαρακτήρων" >> out.txt

If all you want is to pass raw bytes to programs that will be unable
to interpret them _except_ as raw bytes, then where's the problem?
The above already works in Emacs, doesn't it?

> File 'out.txt' attached.

Try visiting it, and you will see what I mean.







* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-24 20:33                     ` Eli Zaretskii
@ 2010-07-25 10:10                       ` Laimonas Vėbra
  0 siblings, 0 replies; 23+ messages in thread
From: Laimonas Vėbra @ 2010-07-25 10:10 UTC (permalink / raw)
  To: Eli Zaretskii, 6705

Eli Zaretskii wrote:

> If all you want is to pass raw bytes to programs that will be unable
> to interpret them _except_ as raw bytes, then where's the problem?
> The above already works in Emacs, doesn't it?

All I want to say is that it doesn't work with Cygwin: Emacs can't pass
raw data to it.
(It's not a problem for me, or anyone, to write or use a transcoding proxy
util; that is possible with raw data, although it works perfectly
(grepping, for example) without one. But for that I need _correct_ _raw_ _data_.)
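
Whether the bytes that arrive are still well-formed can be checked mechanically when the sender used UTF-8. The sketch below is only an illustration under that assumption; it is not part of cmdproxy.c or of any patch in this thread, and `is_valid_utf8` is a hypothetical helper name.

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if the N bytes at S are well-formed
   UTF-8, 0 otherwise.  Rejects truncated sequences, overlong forms,
   surrogates and code points above U+10FFFF.  */
int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t extra, k;

        if (c < 0x80) {                        /* ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {   /* 2-byte sequence */
            extra = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {   /* 3-byte sequence */
            extra = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {   /* 4-byte sequence */
            extra = 3; cp = c & 0x07;
        } else {
            return 0;       /* stray continuation or invalid lead byte */
        }

        if (n - i <= extra)
            return 0;       /* sequence is cut off */
        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000))
            return 0;       /* overlong encoding */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;       /* surrogate or out of range */
        i += extra + 1;
    }
    return 1;
}
```

A cp1257 string such as the one on the second line of out.txt fails this check. A pass does not prove the bytes are the ones Emacs sent: an argument whose characters were all replaced by '?' is plain ASCII and passes.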






* bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
  2010-07-23 14:16         ` Eli Zaretskii
@ 2010-07-25 23:38           ` Stefan Monnier
  0 siblings, 0 replies; 23+ messages in thread
From: Stefan Monnier @ 2010-07-25 23:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Juanma Barranquero, 6705

> I'm not sure, some of them need to be done as a group, or else
> features will become broken.  If someone can suggest a plan for
> incrementally adding this stuff, we could discuss it.

Actually, if it can be done incrementally, then there's no need for
a plan: the patches can be sent and evaluated one-by-one.


        Stefan







