From: Alexis <flexibeast@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: emacs-devel <emacs-devel@gnu.org>
Subject: Re: "args-out-of-range" error when using data from external process on Windows
Date: Thu, 18 Apr 2024 21:20:55 +1000
Message-ID: <87mspqzzl4.fsf@gmail.com>
In-Reply-To: <86il0ff4qe.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 18 Apr 2024 11:35:21 +0300")

Thanks again for your assistance!
As some additional context: i haven't actively used a Windows
system in more than a decade - it was Windows 7 - and even then, i
was running it in a VM in order to run some other software. i've
also never used Windows outside of an "Australian English"
context, and have never done any dev work on the Windows
platform. So i've got only a minimal idea of how Windows does
various things nowadays, and have never needed to become familiar
with sysadmin-/dev-level Windows documentation. Until now. :-)
Specific responses inline below.
> I don't think I understand the part about setting LC_ALL. First,
> AFAIK Windows programs generally ignore LC_* environment
> variables. If you read the Microsoft documentation of
> 'setlocale', here:
>
> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170
>
> you will not see any reference to environment variables there.
Thanks for this link; it gives me a good starting point to explore
the Win docs on this issue.
> The Windows 'setlocale' supports only LC_* _categories_ in
> direct calls to the function, and doesn't consider the
> corresponding environment variables. The Emacs source code
> doesn't reference LC_* environment variables on MS-Windows,
> either. So how did the user set LC_ALL, and why did it have any
> effect whatsoever on the issue?
They didn't say; all they wrote
(https://github.com/flexibeast/ebuku/issues/31#issuecomment-2058171986)
was:
> I ... changed my LC_ALL to zh_CN.UTF-8. Ebuku can find the db
> now.
i'll ask them.
> Second, the user sets a UTF-8 locale, which as I wrote up-thread
> is not a good idea on MS-Windows. It could well cause failures
> in invoking external programs from Emacs, if the arguments to
> those programs include non-ASCII characters. In general, on
> MS-Windows Emacs can only safely invoke programs with non-ASCII
> characters in the command-line arguments if those characters can
> be encoded by the system codepage, in this case codepage 936
> AFAIU.
Thanks, i'll add that to the information i pass back to the user
on that GitHub issue.
> Regarding the "invalid string for collation: Invalid argument"
> error: how does ebuku determine the LOCALE argument with which
> it calls string-collate-lessp? It is important to understand
> which locale w32_compare_strings was called with in that
> case.
The single use of `string-collate-lessp` doesn't pass any LOCALE
argument, as i just wanted it to use the user's current locale for
sorting a given bookmark's tags into the appropriate
lexicographical order.
> Finally, the issues with Windows-style file names with drive
> letters and with file names that begin with "~" lead me to
> believe that perhaps the underlying program 'buku' is not a
> native Windows program, but a Cygwin or MSYS program, in which
> case there could be incompatibilities both regarding file names
> and regarding handling of non-ASCII characters (Cygwin and MSYS
> use UTF-8 by default, whereas the native Windows build of Emacs
> does not).
Sorry; i mentioned in my first email, but didn't reiterate in my
second, that `buku` is Python-based.
> You need to take a good look at whether non-ASCII characters are
> passed to 'buku' in this case, and how the output from 'buku' is
> decoded.
👍
> Also, ebuku-buku-path and ebuku-database-path should both be
> quoted with shell-quote-argument (but I don't think this is a
> problem in this case). Can ARGS include whitespace or characters
> special for the Windows shell? If so, each argument should be
> quoted with shell-quote-argument as well.
Thanks, noted.
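A rough illustration of why argument quoting matters, written in Python (the language buku is implemented in) rather than Emacs Lisp: the standard library's `subprocess.list2cmdline` joins an argument list into a single Windows command line using the Microsoft C runtime quoting rules. It is only an analogue of `shell-quote-argument`, not an equivalent: it wraps arguments containing spaces or tabs in double quotes and backslash-escapes embedded double quotes, but it does not escape cmd.exe metacharacters such as `&` or `|`. The path and the `--search` flag below are hypothetical, not ebuku's actual invocation.

```python
import subprocess

# Hypothetical arguments (not ebuku's real invocation), chosen to include
# a path with a space and an argument with embedded double quotes.
args = [
    r"C:\Program Files\buku\buku.exe",
    "--search",
    'tag with "quotes"',
]

# list2cmdline applies the MS C runtime rules: arguments containing
# spaces/tabs are wrapped in double quotes, and embedded double quotes
# are escaped with a backslash.
cmdline = subprocess.list2cmdline(args)
print(cmdline)
```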
> How output is decoded when it is put into the temporary buffer
> is also of interest -- what is the value of
> buffer-file-coding-system in the temporary buffer after reading
> output, in the OP's case?
*nod*
> Emacs on MS-Windows
> cannot use UTF-8 when encoding command-line arguments for
> sub-programs, it can only use the system codepage. Using
> set-language-environment as above will force Emacs to encode
> command-line arguments in UTF-8, which could very well be the
> reason for some of these problems.
Ah okay.
> No.
>
> The issue is complicated by several factors and will take a long
> post to explain. The upshot is that for passing non-ASCII
> characters safely to subprograms on their command lines, Emacs
> should use the system codepage, not UTF-8 or anything else (and
> definitely not UTF-16). This might require some tricky juggling
> with coding-system related settings when you call call-process,
> because coding-system-for-write is used for both encoding of the
> command-line arguments and of the stuff we send to the
> sub-program, so if they both can include non-ASCII characters,
> some care is in order. (By contrast, coding-system-for-read can
> be always bound to UTF-8 to decode the output correctly --
> assuming 'buku' outputs UTF-8 encoded text on MS-Windows.)
That's very helpful, thank you.
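A minimal sketch of the split Eli describes, again in Python and under stated assumptions: command-line arguments must be encodable in the system codepage (assumed here to be codepage 936, which Python's `cp936` codec maps to GBK), while the sub-program's output is decoded separately, assumed here to be UTF-8. The function names are hypothetical; in Emacs the corresponding settings are the ones Eli names, `coding-system-for-write` (which also covers text sent to the sub-program) and `coding-system-for-read`.

```python
# Assumptions: the system codepage is 936 (as for the OP), and the
# sub-program writes UTF-8 to stdout. Python's "cp936" codec is GBK.
SYSTEM_CODEPAGE = "cp936"
OUTPUT_ENCODING = "utf-8"


def encode_args(args, codepage=SYSTEM_CODEPAGE):
    """Encode each command-line argument in the system codepage.

    Raises ValueError if an argument contains a character the codepage
    cannot represent, since such an argument cannot be passed safely.
    """
    encoded = []
    for arg in args:
        try:
            encoded.append(arg.encode(codepage))
        except UnicodeEncodeError as err:
            raise ValueError(
                f"argument {arg!r} cannot be encoded in {codepage}"
            ) from err
    return encoded


def decode_output(raw, encoding=OUTPUT_ENCODING):
    """Decode the sub-program's output, independently of the arguments."""
    return raw.decode(encoding)


# Chinese tag names are representable in codepage 936,
print(encode_args(["标签", "emacs"]))
# but the CRAB emoji (U+1F980) is not.
try:
    encode_args(["\U0001F980"])
except ValueError as e:
    print(e)
# Output containing the emoji is still fine if it arrives as UTF-8.
print(decode_output("\U0001F980 rust\n".encode("utf-8")))
```

Failing early in `encode_args` gives a clear error instead of handing mis-encoded bytes to the sub-program.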
> The more important question is: can the CRAB emoji be safely encoded
> by codepage 936, the system codepage of the OP? If not, and if
> that emoji can appear in the command-line arguments of a 'buku'
> invocation (as opposed to in the text we write to or read from
> 'buku'), then this character cannot be used at all with this
> package on MS-Windows.
>
> (And please note that Emacs now has native SQLite support,
> which should make many of these complications simply disappear.)
It would certainly make many things easier to just interact with
the db directly. That said, doing so would involve a substantial
rewrite, and i've got many things on my plate nowadays, including
supporting disabled loved ones while having chronic health issues
myself. But maybe i can open an issue requesting help to start and
develop a branch doing such a rewrite.
> As for why the problems disappear when the CRAB emoji is
> removed: as I wrote elsewhere, that's probably because all the
> other characters are plain ASCII, so all the encoding-related
> issues don't matter.
*nod*
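The reason removing the emoji makes the symptoms vanish can be checked directly with a small Python helper (a hypothetical name, not part of ebuku or buku): plain ASCII text encodes to identical bytes in UTF-8 and in codepage 936, so a mismatch between the two goes unnoticed. Chinese text encodes differently in the two, and the CRAB emoji cannot be encoded in codepage 936 at all.

```python
def encodings_agree(text, a="utf-8", b="cp936"):
    """Return True if TEXT encodes to the same bytes under both encodings.

    When this holds, using the "wrong" one of the two is harmless, which
    is the situation once only ASCII characters remain.
    """
    try:
        return text.encode(a) == text.encode(b)
    except UnicodeEncodeError:
        # At least one of the encodings cannot represent TEXT at all.
        return False


for sample in ["rust programming", "标签", "\U0001F980 rust"]:
    print(repr(sample), encodings_agree(sample))
```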
> They don't have any effect on Emacs on MS-Windows, that's for
> sure. Whether they have effect on 'buku' depends on whether
> it's a native MS-Windows program or Cygwin/MSYS program, and
> also on its code (a program could potentially augment the MS
> 'setlocale' function with its own code which looks at the LC_*
> environment variables, and does TRT in the application code).
*nod*
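As a sketch of the kind of application-level code Eli mentions (a program consulting the LC_* variables itself, since the Windows `setlocale` does not), here is a hypothetical Python helper that picks the codeset from LC_ALL, LC_CTYPE and LANG in the usual POSIX precedence order, skipping unset or empty variables. Nothing here implies that buku contains such code; it only shows how an `LC_ALL=zh_CN.UTF-8` setting could matter to a program even on Windows.

```python
import os


def codeset_from_env(env=None):
    """Return the codeset named by the first non-empty of LC_ALL,
    LC_CTYPE and LANG, or None if that variable names no codeset.

    Hypothetical helper for illustration only.
    """
    if env is None:
        env = os.environ
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            # Locale names look like language_TERRITORY.codeset@modifier.
            name = value.split("@", 1)[0]
            if "." in name:
                return name.split(".", 1)[1]
            return None
    return None


print(codeset_from_env({"LC_ALL": "zh_CN.UTF-8", "LANG": "en_AU.ISO-8859-1"}))
```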
>> But what should i do to handle the more general case of an
>> arbitrary encoding? Do i need to have a defcustom, with
>> 'reasonable defaults', that the user can set if necessary,
>> which i use as the value to pass to coding-system-for-read?
>
> That depends on what encoding 'buku' expects on input and
> what encoding it uses on output. If it always uses UTF-8,
> you just need to make sure Emacs uses UTF-8 when encoding and
> decoding text passed to and from 'buku' (but note the caveat
> about encoding the command-line arguments -- these _must_ be
> encoded in the system codepage). If, OTOH, the encoding used by
> 'buku' can be changed dynamically, and Emacs cannot know what it
> is (for example, if it is determined by the encoding of the text
> put in the SQL database by the user), then a user option is in
> order.
Great, thank you.
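For the case where the encoding cannot be known in advance, a user option might look roughly like this Python analogue of a defcustom: a setting that defaults to UTF-8 (a reasonable default only if buku emits UTF-8) and is validated before use. The option and function names are hypothetical.

```python
import codecs

# Hypothetical user option, playing the role of an Emacs defcustom.
# None means "use the default", assumed here to be UTF-8.
OUTPUT_ENCODING_OPTION = None
DEFAULT_OUTPUT_ENCODING = "utf-8"


def decode_with_option(raw, option=OUTPUT_ENCODING_OPTION):
    """Decode RAW bytes with the user-chosen encoding, or the default."""
    encoding = option or DEFAULT_OUTPUT_ENCODING
    # codecs.lookup raises LookupError for unknown encoding names, so a
    # misconfigured option fails with a clear message.
    codecs.lookup(encoding)
    return raw.decode(encoding)


print(decode_with_option(b"emacs"))
print(decode_with_option("标签".encode("cp936"), option="cp936"))
```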
>> As i interpret their comments in the above discussions so far,
>> yes, they had themselves set LANG to "zh_CN.UTF-8" (and yes, as
>> described above, had definitely set the language environment to
>> "UTF-8" via `set-language-environment`).
>
> NOT RECOMMENDED!
*chuckle* i'll be sure to pass this on. :-)
Thanks again!
Alexis.