From: Juri Linkov <juri@linkov.net>
To: Dmitry Gutov <dgutov@yandex.ru>
Cc: Eli Zaretskii <eliz@gnu.org>,
stephen.berman@gmx.net, emacs-devel@gnu.org
Subject: Re: dired-do-find-regexp failure with latin-1 encoding
Date: Sun, 29 Nov 2020 21:37:23 +0200
Message-ID: <87tut8zfmk.fsf@mail.linkov.net>
In-Reply-To: <247a8edb-7b70-ad32-1ba1-43b5458a82b0@yandex.ru> (Dmitry Gutov's message of "Sat, 28 Nov 2020 23:04:10 +0200")
>>>> Adding -a probably cannot do any harm, but its support should be
>>>> detected, since I don't think it's portable enough (it isn't in the
>>>> latest Posix spec, at least).
>>>
>>> Are you sure about that? Are we sure it won't make searching binary
>>> files slower, for example?
>> It will be slower, but more useful: by default Grep just says "Binary
>> file foo matches".
>
> Do we want to search the "binary" files at all? Right now we simply filter
> such matches out (see the definition of xref-matches-in-files), and I have
> seen no complaints.
There are two cases: a truly binary file, and a legitimate ASCII file
with an occasional ^@ (NUL) character. Grep can't distinguish one from
the other. There is an option --binary-files=binary, but unfortunately
it doesn't help: it still outputs "Binary file matches".

So the xref parser needs to be smart enough to detect whether a matched line
contains binary garbage when '-a' is used, or whether it's purely ASCII.
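
To make the per-line check concrete, here is a minimal sketch in Python (used only for illustration; the real parser would be Emacs Lisp). The names has_binary_garbage and is_pure_ascii are hypothetical and not part of xref.el, and treating bytes 0..31 other than TAB as "garbage" is an assumption, not a settled rule.

```python
# Hypothetical helpers, not part of xref.el or grep.el.
# Assumption: "garbage" means control bytes 0..31, with TAB (0x09) allowed.
CONTROL_BYTES = frozenset(range(0, 32)) - {0x09}


def has_binary_garbage(line: bytes) -> bool:
    """Return True if a matched line (newline already stripped)
    contains any control byte other than TAB."""
    return any(b in CONTROL_BYTES for b in line)


def is_pure_ascii(line: bytes) -> bool:
    """Return True if the line has no control bytes (except TAB)
    and no bytes above 127."""
    return not has_binary_garbage(line) and all(b < 128 for b in line)
```

With such a check, a match in an ASCII file that merely has a stray ^@ somewhere else would still be reported normally, because only the matched line itself is inspected.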
Moreover, I think we should apply the same heuristics to the grep output
in grep.el and add '-a' to the grep command by default. Then grep.el
should prettify lines containing real binary garbage, e.g. by hiding groups of
bytes between 0 and 32, or by adding a 'display' property with an ellipsis.
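
As a rough illustration of the prettifying idea, the following Python sketch collapses each run of control bytes into an ellipsis. It is only a sketch: in Emacs a 'display' property would leave the buffer text untouched, whereas this hypothetical prettify function rewrites the bytes. Keeping TAB and reading "between 0 and 32" as 0..31 are assumptions.

```python
import re

# Runs of bytes 0..31 except TAB (0x09); the exact byte set is an assumption.
_CONTROL_RUN = re.compile(rb"[\x00-\x08\x0a-\x1f]+")


def prettify(line: bytes, ellipsis: bytes = b"...") -> bytes:
    """Replace each run of control bytes in LINE with a single ellipsis."""
    # A function replacement avoids backslash processing of `ellipsis`.
    return _CONTROL_RUN.sub(lambda _match: ellipsis, line)
```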
>>> Also, the manual has this warning:
>>>
>>> Warning: The -a option might output binary garbage, which can have
>>> nasty side effects if the output is a terminal and if the terminal
>>> driver interprets some of it as commands.
>>>
>>> ...which might conceivably mess up our parsing of Grep output sometimes?
>> This is not relevant, since we read that output, there's no terminal
>> device driver to interpret it and get messed up.
>
> Our interpreter is our regexp with which we parse. But I suppose as long as
> Grep doesn't insert unexpected newlines, the parser will be fine.
For grep output, a bigger problem is that grep on binary data
might output overly long lines before the terminating newline.
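
One possible mitigation, sketched in Python under stated assumptions: cap each output line at a fixed length before parsing or displaying it. The function name truncate_line, the default limit of 200 bytes, and the "..." marker are all hypothetical choices, not anything grep.el currently does.

```python
def truncate_line(line: bytes, max_len: int = 200,
                  marker: bytes = b"...") -> bytes:
    """Return LINE unchanged if it fits in MAX_LEN bytes; otherwise
    return its first MAX_LEN bytes followed by MARKER."""
    if len(line) <= max_len:
        return line
    return line[:max_len] + marker
```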
>> I actually don't think I understand why we need -a in this case, since
>> Grep looks for null bytes to decide this is a binary file, and encoded
>> non-ASCII characters don't have null bytes (except if they are in
>> UTF-16).
>
> Good question.
The grep manual says that binary data are either output bytes that
are improperly encoded for the current locale, or null input bytes.
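
This would explain the latin-1 case: a byte sequence that is valid Latin-1 can be invalid in a UTF-8 locale even though it has no null bytes. The Python sketch below approximates the manual's two criteria; looks_binary is a hypothetical name, the encoding argument stands in for the current locale, and the check runs on a whole byte string, which simplifies what grep actually does.

```python
def looks_binary(data: bytes, encoding: str = "utf-8") -> bool:
    """Approximate the two criteria from the grep manual:
    a null byte, or bytes that do not decode in the given encoding."""
    if b"\x00" in data:
        return True
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return True
    return False
```

For example, b"caf\xe9" (Latin-1 "café") counts as binary when the encoding is UTF-8, but not when it is Latin-1. UTF-16 text is caught by the null-byte test.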