all messages for Emacs-related lists mirrored at yhetil.org
From: Alexis <flexibeast@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: emacs-devel@gnu.org
Subject: Re: "args-out-of-range" error when using data from external process on Windows
Date: Thu, 18 Apr 2024 17:07:25 +1000
Message-ID: <87y19bywr6.fsf@gmail.com>
In-Reply-To: <86msprfbul.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 18 Apr 2024 09:01:38 +0300")


Eli Zaretskii <eliz@gnu.org> writes:

> Crystal ball says the package assumes UTF-8 encoding of the text 
> from the sub-process, which is generally not what happens on 
> Windows.  Or maybe the package assumes that UTF-8 text from a 
> sub-process will necessarily be decoded as UTF-8, which again 
> can fail if the default coding-systems are not UTF-8 (which 
> happens on Windows).  The upshot is that the Lisp code expects 
> some number of characters, but gets a different number of 
> characters instead.
>
> But this is all basically stabbing in the dark, since I have no 
> idea what that package does and what the program whose output it 
> reads does. 

Hi Eli,

Thanks for your prompt reply, and sorry that my email wasn't more
descriptive and self-contained. I linked to the GitHub issue:

  https://github.com/flexibeast/ebuku/issues/32

as there is already an extended discussion there about this issue, 
which itself links to a previous issue and discussion:

  https://github.com/flexibeast/ebuku/issues/31

in which the user first reported an "Invalid string for collation"
issue. After some discussion, that issue was addressed by setting
LC_ALL to the same value to which the user had set LANG,
i.e. "zh_CN.UTF-8". That left us with issue 32, which is the one
I'm asking about here.

Some more background on the software involved:

`buku` provides a command-line interface to an SQLite-based 
database of Web bookmarks, allowing one to save, delete and search 
for bookmarks, with each bookmark able to have a comment and tags 
associated with it.

`Ebuku` is a package that provides an Emacs-based UI for buku. It 
allows the user to add, edit, remove and search bookmarks without
leaving Emacs. It does so by running
`call-process` to call `buku` with the appropriate options, 
receiving the resulting output in a buffer, then processing the 
data in that buffer in order to present the user with the relevant 
results.

ebuku.el has a function:

(defun ebuku--call-buku (args)
  "Internal function for calling `buku' with list ARGS."
  (unless ebuku-buku-path
    (error "Couldn't find buku: check 'ebuku-buku-path'"))
  ;; Run buku synchronously: stdin from the null device (INFILE nil),
  ;; output inserted into the current buffer (DESTINATION t).
  (apply #'call-process
         `(,ebuku-buku-path nil t nil
                            "--np" "--nc" "--db"
                            ,ebuku-database-path ,@args)))

which is called in several places (e.g.
https://github.com/flexibeast/ebuku/blob/c854d128cba8576fe9693c19109b5deafb573e99/ebuku.el#L534)
to insert buku's output into a temporary buffer, which is then
'parsed' for the information to be presented to the user.

The user had noted in a comment on issue 31:

  https://github.com/flexibeast/ebuku/issues/31#issuecomment-2053557703

that they'd set LANG on their system to "zh_CN.UTF-8". Then, in a
comment a couple of days ago
(https://github.com/flexibeast/ebuku/issues/32#issuecomment-2058289816),
they wrote:

> I set the value with (set-language-environment "UTF-8").  I 
> remember I set up this value because I don't want my files
> containing Chinese to be encoded by GBK encoding.

Then, in 
https://github.com/flexibeast/ebuku/issues/32#issuecomment-2058498373, 
I wrote:

> if i remember correctly, the default encoding used by Windows is 
> UTF-16, not UTF-8. So i'm wondering if that's somehow being used 
> to transfer data from the buku process to the Emacs process, 
> regardless of the value of LANG and LC_ALL, and regardless of 
> the encoding of the buku database itself?

to which the user responded:

> I think the Powershell will use UTF-16 to encode instead of 
> UTF-8.

Is that correct? Is that the case despite the user having
specified "zh_CN.UTF-8"? But if that's the case, why does removing
the CRAB emoji from the text being operated on by `string-match` /
`match-string` make the issue disappear? Could it have something
to do with the CRAB emoji's code point (U+1F980) lying outside the
Basic Multilingual Plane (BMP)?
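
To illustrate the BMP question, here is a minimal sketch (the
function name `my-crab-encoding-lengths` is hypothetical, not part
of ebuku) showing how the same CRAB emoji is counted in different
representations. Emacs strings count characters, so correctly
decoded text contains one character for it, but UTF-8 bytes decoded
with the wrong coding system can become several characters,
shifting every position after the emoji:

```elisp
(defun my-crab-encoding-lengths ()
  "Return an alist of (REPRESENTATION . LENGTH) for the CRAB emoji."
  (let ((crab (string #x1F980)))        ; U+1F980 CRAB, outside the BMP
    (list
     ;; One character once correctly decoded into an Emacs string.
     (cons 'chars (length crab))
     ;; Four bytes when encoded as UTF-8.
     (cons 'utf-8-bytes (length (encode-coding-string crab 'utf-8)))
     ;; Four bytes (one surrogate pair) when encoded as UTF-16LE.
     (cons 'utf-16le-bytes
           (length (encode-coding-string crab 'utf-16le)))
     ;; Four characters if the UTF-8 bytes are wrongly decoded as
     ;; Latin-1, which maps each byte to one character.
     (cons 'utf-8-decoded-as-latin-1
           (length (decode-coding-string
                    (encode-coding-string crab 'utf-8)
                    'iso-latin-1))))))
```

Evaluating `(my-crab-encoding-lengths)` should return
`((chars . 1) (utf-8-bytes . 4) (utf-16le-bytes . 4)
(utf-8-decoded-as-latin-1 . 4))`. This only shows how a character
count can drift; it doesn't establish which decoding path buku's
output actually takes on the user's system.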

> Suggest that you ask the user who reported that to show the 
> actual output of the sub-process (e.g., by running the same 
> command outside of Emacs and redirecting output to a file), and 
> if the output looks correct, examine the Lisp code which 
> processes that output, with an eye on how the text is decoded. 
> For example, if the text from the sub-process is supposed to be 
> UTF-8 encoded, your Lisp code should bind coding-system-for-read 
> to 'utf-8', to make sure it is decoded correctly.
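
A minimal sketch of that suggestion, assuming a `printf` program
that understands octal escapes (as GNU coreutils' does) for testing;
the helper name `my-call-process-decoded` is hypothetical.
`call-process` decodes the output it inserts using
`coding-system-for-read` when that variable is non-nil, so binding
it around the call overrides the defaults for that call only:

```elisp
(defun my-call-process-decoded (coding program &rest args)
  "Run PROGRAM with ARGS and return its output decoded with CODING."
  (with-temp-buffer
    ;; A non-nil `coding-system-for-read' takes precedence over
    ;; `process-coding-system-alist' and `default-process-coding-system'
    ;; when `call-process' decodes the subprocess output.
    (let ((coding-system-for-read coding))
      ;; INFILE nil: stdin from the null device; DESTINATION t:
      ;; insert output into the current (temporary) buffer.
      (apply #'call-process program nil t nil args))
    (buffer-string)))

;; printf emits the four UTF-8 bytes of U+1F980 (CRAB); decoded as
;; UTF-8 they form a single character.
(length (my-call-process-decoded 'utf-8 "printf" "\\360\\237\\246\\200"))
;; Decoding the same bytes as Latin-1 yields four characters instead.
(length (my-call-process-decoded 'iso-latin-1 "printf" "\\360\\237\\246\\200"))
```

In ebuku, the equivalent change would be wrapping the
`(apply #'call-process ...)` form in `ebuku--call-buku` in the same
`let`, provided buku's output really is UTF-8.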

Thanks, I can certainly do that, modulo the question of whether the
LANG and LC_ALL variables have any effect on data transferred
between the `buku` sub-process and Emacs. But what should I do to
handle the more general case of an arbitrary encoding? Do I need a
defcustom, with 'reasonable defaults', that the user can set if
necessary, and whose value I bind to `coding-system-for-read`?
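
One possible shape for that, as a sketch only: a hypothetical user
option (`my-ebuku-process-coding-system`, not an existing ebuku
variable; the customization group name `ebuku` is also an
assumption) whose value is bound to `coding-system-for-read` around
the subprocess call. Defaulting it to nil keeps Emacs's normal
decoding unless the user opts in:

```elisp
(defcustom my-ebuku-process-coding-system nil
  "Coding system for decoding output from the `buku' process.
If nil, Emacs's usual defaults for subprocess output apply."
  :type '(choice (const :tag "Use Emacs defaults" nil)
                 coding-system)
  :group 'ebuku)                        ; assumed group name

(defun my-ebuku--call-process (program &rest args)
  "Run PROGRAM with ARGS, inserting its output into the current buffer.
Output is decoded according to `my-ebuku-process-coding-system'."
  ;; Binding `coding-system-for-read' to nil is the same as not
  ;; binding it, so a nil option leaves the defaults in effect.
  (let ((coding-system-for-read my-ebuku-process-coding-system))
    (apply #'call-process program nil t nil args)))
```

A user whose buku emits UTF-8 could then set the option to `utf-8`
via `M-x customize-variable`. Which default is reasonable depends on
what buku actually writes on each platform, which is what the test
suggested above would reveal.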

> Btw: using UTF-8 by default on MS-Windows is not a very good 
> idea, even with Windows 11 where one can enable UTF-8 support 
> (did they do it, btw?).  Windows still doesn't support UTF-8 
> well, even after the improvements in Windows 11, so the above 
> settings might very well cause trouble.  Suggest to ask the user 
> to try the same recipe in "emacs -Q", and if the zh_CN.UTF-8 
> stuff is set up outside Emacs, to try without it.
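
To make the comparison between `emacs -Q` and the user's normal
setup concrete, a small sketch (the function name is hypothetical)
that collects the settings most relevant to decoding subprocess
output; evaluating it with `M-:` in both sessions and comparing the
results might show what the UTF-8 configuration actually changes:

```elisp
(defun my-report-coding-setup ()
  "Return an alist of settings that influence subprocess decoding."
  `((language-environment . ,current-language-environment)
    (locale-coding-system . ,locale-coding-system)
    ;; (DECODING . ENCODING) used for subprocesses when nothing more
    ;; specific applies.
    (default-process-coding-system . ,default-process-coding-system)
    ;; Environment variables inherited by subprocesses such as buku.
    (LANG . ,(getenv "LANG"))
    (LC_ALL . ,(getenv "LC_ALL"))))
```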

As I interpret their comments in the discussions above, yes, they
had themselves set LANG to "zh_CN.UTF-8" (and, as described above,
had definitely called `set-language-environment` with "UTF-8").

I'll certainly take your suggestions back to the user.

Thanks again,


Alexis.




Thread overview: 11+ messages
2024-04-18  5:39 "args-out-of-range" error when using data from external process on Windows Alexis
2024-04-18  6:01 ` Eli Zaretskii
2024-04-18  7:07   ` Alexis [this message]
2024-04-18  8:35     ` Eli Zaretskii
2024-04-18 11:20       ` Alexis
2024-04-19  3:16       ` Alexis
2024-04-19  7:29         ` Eli Zaretskii
2024-04-21  0:57           ` Alexis
2024-04-18  6:05 ` Eli Zaretskii
  -- strict thread matches above, loose matches on Subject: below --
2024-04-18  6:11 Alexis
2024-04-18  7:08 ` Eli Zaretskii
