From: Alexis <flexibeast@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: emacs-devel@gnu.org
Subject: Re: "args-out-of-range" error when using data from external process on Windows
Date: Thu, 18 Apr 2024 17:07:25 +1000
Message-ID: <87y19bywr6.fsf@gmail.com>
In-Reply-To: <86msprfbul.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 18 Apr 2024 09:01:38 +0300")
Eli Zaretskii <eliz@gnu.org> writes:
> Crystal ball says the package assumes UTF-8 encoding of the text
> from the sub-process, which is generally not what happens on
> Windows. Or maybe the package assumes that UTF-8 text from a
> sub-process will necessarily be decoded as UTF-8, which again
> can fail if the default coding-systems are not UTF-8 (which
> happens on Windows). The upshot is that the Lisp code expects
> some number of characters, but gets a different number of
> characters instead.
>
> But this is all basically stabbing in the dark, since I have no
> idea what that package does and what the program whose output it
> reads does.
Hi Eli,
Thanks for your prompt reply. Sorry for my email not being more
descriptive and self-contained. i linked to the GitHub issue:
https://github.com/flexibeast/ebuku/issues/32
as there is already an extended discussion there about this issue,
which itself links to a previous issue and discussion:
https://github.com/flexibeast/ebuku/issues/31
in which the user first reported an "Invalid string for collation"
issue. That issue was addressed, after some discussion, by setting
LC_ALL to the same value to which the user had set LANG,
i.e. "zh_CN.UTF-8". That left us with issue 32, which is the one
i'm asking about here.
Some more background about the software involved:
`buku` provides a command-line interface to an SQLite-based
database of Web bookmarks, allowing one to save, delete and search
for bookmarks, with each bookmark able to have a comment and tags
associated with it.
`Ebuku` is a package that provides an Emacs-based UI for buku. It
allows the user to add bookmarks, edit them, remove them, search
them etc. without actually leaving Emacs. It does so by running
`call-process` to call `buku` with the appropriate options,
receiving the resulting output in a buffer, then processing the
data in that buffer in order to present the user with the relevant
results.
ebuku.el has a function:
(defun ebuku--call-buku (args)
  "Internal function for calling `buku' with list ARGS."
  (unless ebuku-buku-path
    (error "Couldn't find buku: check 'ebuku-buku-path'"))
  (apply #'call-process
         `(,ebuku-buku-path nil t nil
                            "--np" "--nc" "--db"
                            ,ebuku-database-path ,@args)))
which gets called in several places - e.g.
https://github.com/flexibeast/ebuku/blob/c854d128cba8576fe9693c19109b5deafb573e99/ebuku.el#L534
- to put the output from `buku` inside a temp buffer, which is then
'parsed' for the information to be presented to the user.
In a comment from a couple of days ago, and after having noted in
a comment on issue 31:
https://github.com/flexibeast/ebuku/issues/31#issuecomment-2053557703
that they'd set LANG on their system to "zh_CN.UTF-8", the user
wrote
(https://github.com/flexibeast/ebuku/issues/32#issuecomment-2058289816):
> I set the value with (set-language-environment "UTF-8"). I
> remember I set up this value bacause I don't want my files
> containing Chinese to be encoded by GBK encoding.
Then, in
https://github.com/flexibeast/ebuku/issues/32#issuecomment-2058498373,
i wrote:
> if i remember correctly, the default encoding used by Windows is
> UTF-16, not UTF-8. So i'm wondering if that's somehow being used
> to transfer data from the buku process to the Emacs process,
> regardless of the value of LANG and LC_ALL, and regardless of
> the encoding of the buku database itself?
to which the user responded:
> I think the Powershell will use UTF-16 to encode instead of
> UTF-8.
Is that correct? Is that the case despite the user having
specified "zh_CN.UTF-8"? But if that's the case, why does removing
the CRAB emoji from text being operated on by string-match /
match-string make the issue disappear? Is it perhaps something to
do with the code point for the CRAB emoji being outside the BMP?
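As a rough illustration of the sizes involved (a sketch only, not taken from ebuku or the linked issues): the function names below are hypothetical, and Latin-1 merely stands in for "some coding system other than UTF-8", since the user's actual default (perhaps GBK) is not confirmed. It shows that CRAB (U+1F980) is one character in Emacs but two 16-bit units in UTF-16, and that decoding its UTF-8 bytes with the wrong coding system yields a different number of characters than the Lisp code would expect.

```elisp
;; Hypothetical helpers, for illustration only.
(defun example-encoded-sizes (char)
  "Return a list (CHARS UTF8-BYTES UTF16-UNITS) for character CHAR."
  (let ((s (string char)))
    (list
     ;; Emacs counts characters, not bytes or UTF-16 code units.
     (length s)
     (length (encode-coding-string s 'utf-8))
     ;; `utf-16le' writes no byte-order mark; each unit is 2 bytes.
     (/ (length (encode-coding-string s 'utf-16le)) 2))))

(defun example-misdecoded-length (char)
  "Return the length of CHAR's UTF-8 bytes when decoded as Latin-1."
  (length (decode-coding-string
           (encode-coding-string (string char) 'utf-8)
           'latin-1)))

(example-encoded-sizes #x1F980)     ; CRAB, outside the BMP → (1 4 2)
(example-encoded-sizes #x4E2D)      ; U+4E2D, inside the BMP → (1 3 1)
(example-misdecoded-length #x1F980) ; → 4, not the expected 1
```

If something like the last case is happening, match positions computed against one view of the text would not line up with the other, which would be consistent with an "args-out-of-range" error; that remains a guess until the actual output has been examined.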
> Suggest that you ask the user who reported that to show the
> actual output of the sub-process (e.g., by running the same
> command outside of Emacs and redirecting output to a file), and
> if the output looks correct, examine the Lisp code which
> processes that output, with an eye on how the text is decoded.
> For example, if the text from the sub-process is supposed to be
> UTF-8 encoded, your Lisp code should bind coding-system-for-read
> to 'utf-8', to make sure it is decoded correctly.
Thanks, i can certainly do that, modulo the issue of whether the
LANG and LC_ALL variables have any effect on data transferred
between the `buku` sub-process and Emacs. But what should i do to
handle the more general case of an arbitrary encoding? Do i need
to have a defcustom, with 'reasonable defaults', that the user can
set if necessary, and whose value i bind coding-system-for-read
to?
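A minimal sketch of what such a defcustom might look like, assuming UTF-8 is a reasonable default for buku's output. All names here (`example-buku`, `example-buku-coding-system`, `example-buku-call`) are hypothetical and do not exist in ebuku.el; the function takes the program path as an argument only to keep the example self-contained.

```elisp
;; Hypothetical names throughout; none of these exist in ebuku.el.
(defgroup example-buku nil
  "Illustrative settings for calling buku."
  :group 'external)

(defcustom example-buku-coding-system 'utf-8
  "Coding system used to decode output received from buku.
Assumed default: UTF-8.  Users whose buku emits another encoding
can change this."
  :type 'coding-system
  :group 'example-buku)

(defun example-buku-call (program args)
  "Call PROGRAM with list ARGS, inserting its output at point.
The output is decoded using `example-buku-coding-system'."
  ;; Binding `coding-system-for-read' overrides the default decoding
  ;; of process output for the duration of this call only.
  (let ((coding-system-for-read example-buku-coding-system))
    (apply #'call-process program nil t nil args)))
```

Binding the variable around the single `call-process` call, rather than setting it globally, keeps the override local to this one operation and leaves the user's other coding-system defaults untouched.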
> Btw: using UTF-8 by default on MS-Windows is not a very good
> idea, even with Windows 11 where one can enable UTF-8 support
> (did they do it, btw?). Windows still doesn't support UTF-8
> well, even after the improvements in Windows 11, so the above
> settings might very well cause trouble. Suggest to ask the user
> to try the same recipe in "emacs -Q", and if the zh_CN.UTF-8
> stuff is set up outside Emacs, to try without it.
As i interpret their comments in the above discussions so far,
yes, they had themselves set LANG to "zh_CN.UTF-8" (and yes, as
described above, had definitely called `set-language-environment`
with "UTF-8").
i'll certainly take your suggestions back to the user.
Thanks again,
Alexis.