* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
@ 2025-01-05 10:35 Matthias Meulien
2025-01-05 18:03 ` Dmitry Gutov
0 siblings, 1 reply; 29+ messages in thread
From: Matthias Meulien @ 2025-01-05 10:35 UTC (permalink / raw)
To: 75379
1. Make sure you have a Git repository with binary files containing,
say, the word "copyright"; one can clone
https://github.com/orontee/lesmotsdugene/ for example.
2. Start Emacs using a locale other than "C" or another English-based
locale, for example "fr_FR.UTF8":
LANG=fr_FR.UTF8 emacs -Q
3. Then call `project-find-regexp' in the Git repository identified
in step 1, and search for the word "copyright". There are no results,
only the following error message:
xref-matches-in-files: Search failed with status 0: grep:
content/images/planche_1.png : fichiers binaires correspondent
If Emacs is started with "C" locale, then there are results!
The problem comes from `xref-matches-in-files', precisely this block,
where the expected `grep' output is hardcoded even though it depends on
the locale:
(when (and (/= (point-min) (point-max))
           (not (looking-at grep-re))
           ;; TODO: Show these matches as well somehow?
           ;; Matching both Grep's and Ripgrep 13's messages.
           (not (looking-at ".*[bB]inary file.* matches")))
  (user-error "Search failed with status %d: %s" status
              (buffer-substring (point-min) (line-end-position))))
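The mismatch can be reproduced outside Emacs. The following is a minimal shell sketch, not the code Emacs runs: it checks three sample messages against the pattern from the block above using `grep -E`. The two English strings are the wordings used by older and newer GNU Grep releases; the French one is copied from the error in step 3. The helper name `matches_known_message` is hypothetical.

```shell
# Pattern from the `looking-at' call; the leading ".*" is dropped because
# grep -E already searches anywhere in the line.
pattern='[bB]inary file.* matches'

# Hypothetical helper: succeeds when the message matches the pattern.
matches_known_message() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq -e "$pattern"
}

english_old='Binary file content/images/planche_1.png matches'
english_new='grep: content/images/planche_1.png: binary file matches'
french='grep: content/images/planche_1.png : fichiers binaires correspondent'

for message in "$english_old" "$english_new" "$french"; do
  if matches_known_message "$message"; then
    echo "recognized:   $message"
  else
    echo "unrecognized: $message"
  fi
done
```

Only the French line is reported as unrecognized, which is the case that ends in `user-error'.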
As a quick fix one can use:
(map-do (lambda (key val)
          (map-put xref-search-program-alist
                   key (concat "LANG=C " val)))
        xref-search-program-alist)
In GNU Emacs 30.0.93 (build 1, x86_64-pc-linux-gnu, GTK+ Version
3.24.38, cairo version 1.16.0) of 2025-01-01 built on peitho
Repository revision: 7acfea19358da3a02e5884f5e7d56c87d7b16616
Repository branch: emacs-30
System Description: Debian GNU/Linux 12 (bookworm)
Configured using:
'configure --with-pgtk CFLAGS=-O3'
Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 MODULES NATIVE_COMP NOTIFY
INOTIFY PDUMPER PGTK PNG RSVG SECCOMP SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP XIM GTK3 ZLIB
Important settings:
value of $LANG: fr_FR.UTF-8
value of $XMODIFIERS: @im=ibus
locale-coding-system: utf-8-unix
Major mode: Markdown
Minor modes in effect:
highlight-changes-visible-mode: t
goto-address-mode: t
pulsar-global-mode: t
pulsar-mode: t
breadcrumb-mode: t
breadcrumb-local-mode: t
outline-minor-mode: t
guess-language-mode: t
flyspell-mode: t
desktop-save-mode: t
spacious-padding-mode: t
savehist-mode: t
server-mode: t
pixel-scroll-precision-mode: t
save-place-mode: t
electric-pair-mode: t
global-corfu-mode: t
corfu-mode: t
marginalia-mode: t
vertico-mode: t
global-display-fill-column-indicator-mode: t
global-so-long-mode: t
global-auto-revert-mode: t
auto-insert-mode: t
remember-notes-mode: t
which-key-mode: t
tooltip-mode: t
global-eldoc-mode: t
eldoc-mode: t
show-paren-mode: t
electric-layout-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
tab-bar-mode: t
file-name-shadow-mode: t
context-menu-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
window-divider-mode: t
minibuffer-regexp-mode: t
line-number-mode: t
indent-tabs-mode: t
transient-mark-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
Load-path shadows:
/home/matthias/.config/emacs/elpa/which-key-20240620.2145/which-key hides
/usr/local/share/emacs/30.0.93/lisp/which-key
/home/matthias/.config/emacs/elpa/modus-themes-20241228.1050/theme-loaddefs
hides /usr/local/share/emacs/30.0.93/lisp/theme-loaddefs
Features:
(shadow sort mail-extr emacsbug mm-archive network-stream url-cache
debbugs-gnu add-log debbugs soap-client url-http url-auth url-gw nsm
debbugs-compat tabify cus-start pcmpl-unix pcmpl-gnu edebug etags
fileloop bug-reference quail misearch multi-isearch cl-print tramp-cache
time-stamp shortdoc rng-xsd xsd-regexp rng-cmpct rng-nxml rng-valid
rng-loc rng-uri rng-parse nxml-parse rng-match rng-dt rng-util rng-pttrn
nxml-ns nxml-mode nxml-outln nxml-rap nxml-util nxml-enc xmltok generic
dired-aux mhtml-mode css-mode js sgml-mode facemenu find-dired ffap
help-fns radix-tree vc-hg vc-bzr vc-src vc-sccs vc-svn vc-cvs vc-rcs
log-view pcvs-util mule-util vc-dir vc hl-line display-line-numbers
hilit-chg goto-addr oc-basic org-element org-persist org-id org-refile
org-element-ast inline avl-tree generator ol-eww eww url-queue mm-url
ol-rmail ol-mhe ol-irc ol-info ol-gnus nnselect gnus-art mm-uu mml2015
mm-view mml-smime smime gnutls dig gnus-sum shr pixel-fill kinsoku
url-file svg dom gnus-group gnus-undo gnus-start gnus-dbus dbus xml
gnus-cloud nnimap nnmail mail-source utf7 nnoo gnus-spec gnus-int
gnus-range message sendmail yank-media puny rfc822 mml mml-sec epa epg
rfc6068 epg-config mm-decode mm-bodies mm-encode mail-parse rfc2231
rfc2047 rfc2045 ietf-drums mailabbrev gmm-utils mailheader gnus-win gnus
nnheader gnus-util mail-utils range mm-util mail-prsvr ol-docview
doc-view jka-compr image-mode exif ol-bibtex bibtex ol-bbdb ol-w3m
ol-doi org-link-doi org ob ob-tangle ob-ref ob-lob ob-table ob-exp
org-macro org-src sh-script smie executable ob-comint org-pcomplete
org-list org-footnote org-faces org-entities ob-emacs-lisp ob-core
ob-eval org-cycle org-table ol org-fold org-fold-core org-keys oc
org-loaddefs cal-menu calendar cal-loaddefs org-version org-compat
org-macs pulsar eglot external-completion jsonrpc xref flymake ert ewoc
debug backtrace breadcrumb pulse imenu peitho-custom cus-edit cus-load
wid-edit dired-x dired dired-loaddefs grep reftex reftex-loaddefs
reftex-vars tex-mode compile markdown-mode edit-indirect color sql view
thingatpt scheme info-look python project pcase c++-ts-mode c-ts-mode
c-ts-common treesit skeleton find-file gdb-mi bindat gud noutline
outline ediff ediff-merg ediff-mult ediff-wind ediff-diff ediff-help
ediff-init ediff-util smerge-mode diff vc-git diff-mode track-changes
vc-dispatcher glasses whitespace guess-language flyspell find-func
ispell comp comp-cstr cl-extra warnings comp-run comp-common desktop
frameset spacious-padding modus-vivendi-tritanopia-theme
modus-vivendi-deuteranopia-theme modus-vivendi-tinted-theme
modus-vivendi-theme modus-operandi-tritanopia-theme
modus-operandi-deuteranopia-theme modus-operandi-tinted-theme
modus-operandi-theme modus-themes savehist server bookmark
text-property-search pp pixel-scroll cua-base time tar-mode arc-mode
archive-mode saveplace tramp-sh tramp trampver tramp-integration files-x
tramp-message help-mode tramp-compat xdg shell pcomplete comint ansi-osc
ring parse-time iso8601 time-date format-spec ansi-color tramp-loaddefs
elec-pair corfu marginalia vertico compat easy-mmode
display-fill-column-indicator so-long autorevert filenotify autoinsert
cc-mode cc-fonts cc-guess cc-menus cc-cmds cc-styles cc-align cc-engine
cc-vars cc-defs generic-x derived remember diminish which-key face-remap
CMake-doc-autoloads Python-doc-autoloads breadcrumb-autoloads
cmake-mode-autoloads consult-autoloads corfu-autoloads debbugs-autoloads
devhelp-autoloads diminish-autoloads edit-indirect-autoloads
git-link-autoloads gnu-elpa-keyring-update-autoloads
guess-language-autoloads marginalia-autoloads markdown-mode-autoloads
meson-mode-autoloads modus-themes-autoloads nginx-mode-autoloads
powershell-autoloads pulsar-autoloads restclient-autoloads
rfc-mode-autoloads info spacious-padding-autoloads speechd-el-autoloads
systemd-autoloads rx tldr-autoloads vertico-autoloads
which-key-autoloads package browse-url url url-proxy url-privacy
url-expand url-methods url-history url-cookie generate-lisp-file
url-domsuf url-util mailcap url-handlers url-parse auth-source cl-seq
eieio eieio-core cl-macs icons password-cache json subr-x map byte-opt
gv bytecomp byte-compile url-vars cl-loaddefs cl-lib rmc iso-transl
tooltip cconv eldoc paren electric uniquify ediff-hook vc-hooks
lisp-float-type elisp-mode mwheel term/pgtk-win pgtk-win term/common-win
touch-screen pgtk-dnd tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode lisp-mode prog-mode register
page tab-bar menu-bar rfn-eshadow isearch easymenu timer select
scroll-bar mouse jit-lock font-lock syntax font-core term/tty-colors
frame minibuffer nadvice seq simple cl-generic indonesian philippine
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite emoji-zwj charscript
charprop case-table epa-hook jka-cmpr-hook help abbrev obarray oclosure
cl-preloaded button loaddefs theme-loaddefs faces cus-face macroexp
files window text-properties overlay sha1 md5 base64 format env
code-pages mule custom widget keymap hashtable-print-readable backquote
threads dbusbind inotify dynamic-setting system-font-setting
font-render-setting cairo gtk pgtk lcms2 multi-tty move-toolbar
make-network-process native-compile emacs)
Memory information:
((conses 16 1437487 391817) (symbols 48 45565 30)
(strings 32 242916 64728) (string-bytes 1 8371886) (vectors 16 86880)
(vector-slots 8 1873292 92379) (floats 8 5893 1984)
(intervals 56 42369 288) (buffers 992 95))
--
Matthias
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-05 10:35 bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale Matthias Meulien
@ 2025-01-05 18:03 ` Dmitry Gutov
2025-01-05 18:46 ` Eli Zaretskii
2025-01-05 21:10 ` Matthias Meulien
0 siblings, 2 replies; 29+ messages in thread
From: Dmitry Gutov @ 2025-01-05 18:03 UTC (permalink / raw)
To: Matthias Meulien, 75379
Hi!
On 05/01/2025 12:35, Matthias Meulien wrote:
> 1. Make sure you have a Git repository with binary files containing say
> the "copyright" word; One can clone
> https://github.com/orontee/lesmotsdugene/ for example.
>
> 2. Start Emacs using a locale different from "C" or other English based
> locales, for example "fr_FR.UTF8":
>
> LANG=fr_FR.UTF8 emacs -Q
>
> 3. Then call `project-find-regexp' in the Git repository identified
> in step 1, and search for the word "copyright". There are no results,
> only the following error message:
>
> xref-matches-in-files: Search failed with status 0: grep:
> content/images/planche_1.png : fichiers binaires correspondent
>
> If Emacs is started with "C" locale, then there are results!
Thanks for the detailed report.
> The problem comes from `xref-matches-in-files', precisely this block
> where `grep' output has been hardcoded even if depending on the locale:
>
> (when (and (/= (point-min) (point-max))
> (not (looking-at grep-re))
> ;; TODO: Show these matches as well somehow?
> ;; Matching both Grep's and Ripgrep 13's messages.
> (not (looking-at ".*[bB]inary file.* matches")))
> (user-error "Search failed with status %d: %s" status
> (buffer-substring (point-min) (line-end-position))))
>
> As a quick fix one can use:
>
> (map-do (lambda (key val)
> (map-put xref-search-program-alist
> key (concat "LANG=C " val)))
> xref-search-program-alist)
Overriding the language seems indeed the way to go here.
About using LANG specifically, any chance that it might interfere with
the system's configured encoding, e.g. UTF-8 vs other? In your example,
does searching for accented characters work as well?
IIUC we can try LC_MESSAGES as the more specialized var. Does
LC_MESSAGES=en work as well?
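
For reference when comparing LANG and LC_MESSAGES: POSIX resolves each locale category from LC_ALL first, then from the category's own variable, then from LANG. The shell sketch below only illustrates that lookup order for the messages category. The function name `resolve_messages_locale` is hypothetical, falling back to "C" when nothing is set is an assumption about the implementation default, and GNU gettext's additional LANGUAGE variable is deliberately left out.

```shell
# Hypothetical helper: print the locale name that the LC_MESSAGES category
# would take, given the values of LC_ALL, LC_MESSAGES and LANG (in that order).
resolve_messages_locale() {
  lc_all=$1 lc_messages=$2 lang=$3
  if [ -n "$lc_all" ]; then
    printf '%s\n' "$lc_all"
  elif [ -n "$lc_messages" ]; then
    printf '%s\n' "$lc_messages"
  elif [ -n "$lang" ]; then
    printf '%s\n' "$lang"
  else
    printf 'C\n'   # assumption: the default locale is "C"
  fi
}

# What the current environment resolves to:
resolve_messages_locale "${LC_ALL:-}" "${LC_MESSAGES:-}" "${LANG:-}"

# LC_MESSAGES wins over LANG, but a set LC_ALL masks both:
resolve_messages_locale "" "C" "fr_FR.UTF-8"
resolve_messages_locale "fr_FR.UTF-8" "C" "fr_FR.UTF-8"
```

So a prefix such as `LC_MESSAGES=C` can change the message language only when LC_ALL is unset or empty in the environment handed to the search program.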
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-05 18:03 ` Dmitry Gutov
@ 2025-01-05 18:46 ` Eli Zaretskii
2025-01-05 19:35 ` Dmitry Gutov
2025-01-05 21:22 ` Matthias Meulien
2025-01-05 21:10 ` Matthias Meulien
1 sibling, 2 replies; 29+ messages in thread
From: Eli Zaretskii @ 2025-01-05 18:46 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: 75379, orontee
> Date: Sun, 5 Jan 2025 20:03:34 +0200
> From: Dmitry Gutov <dmitry@gutov.dev>
>
> Overriding the language seems indeed the way to go here.
>
> About using LANG specifically, any chance that it might interfere with
> the system's configured encoding, e.g. UTF-8 vs other? In your example,
> does searching for accented characters work as well?
>
> IIUC we can try LC_MESSAGES as the more specialized var. Does
> LC_MESSAGES=en work as well?
Please note that this doesn't work on Windows.
First, the Windows locale-dependent routines don't heed environment
variables, so setting LANG etc. in the environment will only do what
you expect if the program in question was either explicitly programmed
to pay attention to those variables or was linked with Gnulib
replacements for locale functions.
And second LC_MESSAGES is not supported by Windows locales at all.
Can't we instead have a database of these messages, like we do with
the "password" prompts?
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-05 18:46 ` Eli Zaretskii
@ 2025-01-05 19:35 ` Dmitry Gutov
2025-01-05 20:16 ` Eli Zaretskii
2025-01-05 21:22 ` Matthias Meulien
1 sibling, 1 reply; 29+ messages in thread
From: Dmitry Gutov @ 2025-01-05 19:35 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 75379, orontee
On 05/01/2025 20:46, Eli Zaretskii wrote:
>> Date: Sun, 5 Jan 2025 20:03:34 +0200
>> From: Dmitry Gutov <dmitry@gutov.dev>
>>
>> Overriding the language seems indeed the way to go here.
>>
>> About using LANG specifically, any chance that it might interfere with
>> the system's configured encoding, e.g. UTF-8 vs other? In your example,
>> does searching for accented characters work as well?
>>
>> IIUC we can try LC_MESSAGES as the more specialized var. Does
>> LC_MESSAGES=en work as well?
>
> Please note that this doesn't work on Windows.
>
> First, the Windows locale-dependent routines don't heed environment
> variables, so setting LANG etc. in the environment will only do what
> you expect if the program in question was either explicitly programmed
> to pay attention to those variables or was linked with Gnulib
> replacements for locale functions.
>
> And second LC_MESSAGES is not supported by Windows locales at all.
Okay, but first of all, do Grep or Ripgrep use different localizations
on Windows, not just English?
If yes, is there a way to force locale at least for these ports?
> Can't we instead have a database of these messages, like we do with
> the "password" prompts?
Like the one in 'password-word-equivalents'? It seems like the approach
of last resort. But if nothing else will work...
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-05 19:35 ` Dmitry Gutov
@ 2025-01-05 20:16 ` Eli Zaretskii
2025-01-07 14:17 ` Dmitry Gutov
0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2025-01-05 20:16 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: 75379, orontee
> Date: Sun, 5 Jan 2025 21:35:56 +0200
> Cc: orontee@gmail.com, 75379@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
>
> On 05/01/2025 20:46, Eli Zaretskii wrote:
> >> Date: Sun, 5 Jan 2025 20:03:34 +0200
> >> From: Dmitry Gutov <dmitry@gutov.dev>
> >>
> >> Overriding the language seems indeed the way to go here.
> >>
> >> About using LANG specifically, any chance that it might interfere with
> >> the system's configured encoding, e.g. UTF-8 vs other? In your example,
> >> does searching for accented characters work as well?
> >>
> >> IIUC we can try LC_MESSAGES as the more specialized var. Does
> >> LC_MESSAGES=en work as well?
> >
> > Please note that this doesn't work on Windows.
> >
> > First, the Windows locale-dependent routines don't heed environment
> > variables, so setting LANG etc. in the environment will only do what
> > you expect if the program in question was either explicitly programmed
> > to pay attention to those variables or was linked with Gnulib
> > replacements for locale functions.
> >
> > And second LC_MESSAGES is not supported by Windows locales at all.
>
> Okay, but first of all, do Grep or Ripgrep use different localizations
> on Windows, not just English?
For Grep, it depends on how it was configured when building. The
default configuration uses gettext to translate messages, and this
message is marked as translated.
For Ripgrep, I don't know.
> If yes, is there a way to force locale at least for these ports?
I'm not sure, and I don't have a port here that supports translations
which I could test. The only hope is if recent versions of Grep are
built in a way that does honor the environment variables, because the
Unix trick of saying "locale=FOO grep ..." doesn't work on Windows:
the locale is a global user-level setting.
Does someone who uses Windows have Grep built with gettext, and could
try setting the various locale-related environment variables?
> > Can't we instead have a database of these messages, like we do with
> > the "password" prompts?
>
> Like the one in 'password-word-equivalents'? It seems like the approach
> of last resort. But if nothing else will work...
Agree.
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-05 18:03 ` Dmitry Gutov
2025-01-05 18:46 ` Eli Zaretskii
@ 2025-01-05 21:10 ` Matthias Meulien
2025-01-06 1:32 ` Dmitry Gutov
1 sibling, 1 reply; 29+ messages in thread
From: Matthias Meulien @ 2025-01-05 21:10 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: 75379
> Thanks for the detailed report.
>
You're welcome.
I feel a bit guilty, since I lived with that bug for such a long time and
only started debugging today... Shame on me.
> (...) About using LANG specifically, any chance that it might interfere
> with the system's configured encoding, e.g. UTF-8 vs other? In your
> example, does searching for accented characters work as well?
>
Yes. I added LANG=C, then verified that the search succeeds with the
French guillemet «, the accented letters é and ç, and the non-breaking space.
> IIUC we can try LC_MESSAGES as the more specialized var. Does
> LC_MESSAGES=en work as well?
>
No. Matches in binary files make the search fail in that case.
--
Matthias
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-05 18:46 ` Eli Zaretskii
2025-01-05 19:35 ` Dmitry Gutov
@ 2025-01-05 21:22 ` Matthias Meulien
2025-01-05 21:29 ` Matthias Meulien
` (2 more replies)
1 sibling, 3 replies; 29+ messages in thread
From: Matthias Meulien @ 2025-01-05 21:22 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Dmitry Gutov, 75379
>
> Can't we instead have a database of these messages, like we do with
> the "password" prompts?
>
I am not familiar with all grep options but I saw that:
‘-I’
Process a binary file as if it did not contain matching data; this
is equivalent to the ‘--binary-files=without-match’ option.
Just tested
(setq xref-search-program-alist
      '((grep . "xargs -0 grep <C> --null -snHE -I -e <R>")))
and it works fine on my side.
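
The effect of -I can also be checked from a terminal. This is a small sketch under assumptions: a grep that detects binary files and supports -I (GNU Grep does), and throwaway file names invented for the example. It searches a text file and a file containing a NUL byte, first without and then with -I.

```shell
workdir=$(mktemp -d)
printf 'copyright notice\n' > "$workdir/notes.txt"
# A NUL byte is enough for grep to classify the file as binary.
printf 'copyright\000\001\002\n' > "$workdir/image.bin"

# Default behavior: the binary file yields a "binary file matches" message
# (2>&1 because the stream it is written to differs between grep versions).
default_output=$(LC_ALL=C grep -nHE -e copyright \
  "$workdir/notes.txt" "$workdir/image.bin" 2>&1)

# With -I the binary file is treated as if it contained no match.
skip_binary_output=$(LC_ALL=C grep -nHE -I -e copyright \
  "$workdir/notes.txt" "$workdir/image.bin" 2>&1)

printf 'default:\n%s\n\nwith -I:\n%s\n' "$default_output" "$skip_binary_output"
rm -rf "$workdir"
```

With -I only the line from the text file remains, so there is no localized message left for the output parser to stumble over.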
--
Matthias
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-05 21:22 ` Matthias Meulien
@ 2025-01-05 21:29 ` Matthias Meulien
2025-01-06 13:03 ` Eli Zaretskii
2025-01-06 1:55 ` Dmitry Gutov
2025-01-06 13:02 ` Eli Zaretskii
2 siblings, 1 reply; 29+ messages in thread
From: Matthias Meulien @ 2025-01-05 21:29 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Dmitry Gutov, 75379
And the rg (ripgrep) manual page says:
By default, ripgrep attempts to automatically skip binary files in order
to improve the relevance of results and make the search faster.
And ugrep supports -I with the same meaning as grep.
Thus I propose to add -I to both the grep and ugrep commands, remove the
processing of search output for binary-file messages, and extend the doc
string so that users are aware that the behavior has (slightly) changed.
I can provide a patch along those lines if needed.
On Sun, Jan 5, 2025 at 22:22, Matthias Meulien <orontee@gmail.com> wrote:
>> Can't we instead have a database of these messages, like we do with
>> the "password" prompts?
>>
>
> I am not familiar with all grep options but I saw that:
>
> ‘-I’
> Process a binary file as if it did not contain matching data; this
> is equivalent to the ‘--binary-files=without-match’ option.
>
> Just tested
>
> (setq xref-search-program-alist
>       '((grep . "xargs -0 grep <C> --null -snHE -I -e <R>")))
>
> and it works fine on my side.
> --
> Matthias
>
--
Matthias
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-05 21:10 ` Matthias Meulien
@ 2025-01-06 1:32 ` Dmitry Gutov
0 siblings, 0 replies; 29+ messages in thread
From: Dmitry Gutov @ 2025-01-06 1:32 UTC (permalink / raw)
To: Matthias Meulien; +Cc: 75379
On 05/01/2025 23:10, Matthias Meulien wrote:
> Thanks for the detailed report.
>
>
> You're welcome.
>
> I feel a bit culprit since I lived with that bug for such a long time
> and only started debugging today... Shame on me.
Late is certainly better than never.
> (...) About using LANG specifically, any chance that it might
> interfere with
> the system's configured encoding, e.g. UTF-8 vs other? In your example,
> does searching for accented characters work as well?
>
>
> Yes. I added LANG=C then checked succesfully that search succeed with
> french guillemet «, accented letters é, ç, and non-breaking space.
Thanks, that's a good sign. Perhaps someone else with experience in
process output encoding could confirm that this is generally a sane
approach, one that shouldn't lead to fewer matches caused by encoding
mismatch.
> IIUC we can try LC_MESSAGES as the more specialized var. Does
> LC_MESSAGES=en work as well?
>
>
> No. Matches in binary files make the search fail in that case.
Hmm, what about LC_MESSAGES=C?
If neither works, could you try that approach in the terminal? Does it
result in French text anyway, meaning this variable doesn't affect the
language in Grep?
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-05 21:22 ` Matthias Meulien
2025-01-05 21:29 ` Matthias Meulien
@ 2025-01-06 1:55 ` Dmitry Gutov
2025-01-06 12:36 ` Matthias Meulien
2025-01-06 17:36 ` Juri Linkov
2025-01-06 13:02 ` Eli Zaretskii
2 siblings, 2 replies; 29+ messages in thread
From: Dmitry Gutov @ 2025-01-06 1:55 UTC (permalink / raw)
To: Matthias Meulien, Eli Zaretskii; +Cc: 75379
On 05/01/2025 23:22, Matthias Meulien wrote:
>
> I am not familiar with all grep options but I saw that:
>
> ‘-I’
> Process a binary file as if it did not contain matching data; this
> is equivalent to the ‘--binary-files=without-match’ option.
>
> Just tested
>
> (setq xref-search-program-alist
>       '((grep . "xargs -0 grep <C> --null -snHE -I -e <R>")))
>
> and it works fine on my side.
Thanks, this is a solid proposal, but as per comment:
;; TODO: Show these matches as well somehow?
we would probably want to print these weird matches as well, in the
future. As you mention, search programs have a flag which avoids
printing these matches, but in certain rare cases it might happen that a
mostly text file is detected as binary - and then it seems preferable to
print all of such matches in the buffer rather than ignore them. (Unless
people disagree?)
And yeah, it's an old comment, so this improvement is not high on the
list, but whenever we (I/you/anybody else) get around to implementing
it, we'd have to change the default entries in xref-search-program-alist
again - and these get customized by users over the years, which means
not everybody would get the fix together with the package's or Emacs's
update. So a fix using other means would be better if feasible.
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-06 1:55 ` Dmitry Gutov
@ 2025-01-06 12:36 ` Matthias Meulien
2025-01-06 12:42 ` Matthias Meulien
2025-01-06 14:11 ` Dmitry Gutov
2025-01-06 17:36 ` Juri Linkov
1 sibling, 2 replies; 29+ messages in thread
From: Matthias Meulien @ 2025-01-06 12:36 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: Eli Zaretskii, 75379
>
> Thanks, this is a solid proposal, but as per comment:
>
> ;; TODO: Show these matches as well somehow?
>
> we would probably want to print these weird matches as well, in the
> future. As you mention, search programs have a flag which avoids
> printing these matches, but in certain rare cases it might happen that a
> mostly text file is detected as binary - and then it seems preferable to
> print all of such matches in the buffer rather than ignore them. (Unless
> people disagree?)
>
> And yeah, it's an old comment, so this improvement is not high on the
> list, but whenever we (I/you/anybody else) get around to implementing
> it,
What would be the "right thing to do"? Should we call grep and ugrep with
"--binary-files=text" (ripgrep has the equivalent "-a") and then have
Emacs guess whether each match is compatible with the process coding
system and, based on that, decide whether to display the match or print a
warning like "match found among unprintable binary data" next to the file
name?
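
What "--binary-files=text" would hand to Emacs can be previewed in a terminal. A minimal sketch, assuming GNU Grep and a made-up sample file; the NUL byte sits on a different line from the match so that the printed line stays plain text.

```shell
workdir=$(mktemp -d)
# Line 1 contains a NUL byte (makes the file "binary"); line 2 holds the match.
printf 'junk\000junk\ncopyright 2025 Example\n' > "$workdir/blob.bin"

# Default: grep only reports that the binary file matches.
as_binary=$(LC_ALL=C grep -nH -e copyright "$workdir/blob.bin" 2>&1)

# --binary-files=text: the matching line is printed like any other match.
as_text=$(LC_ALL=C grep -nH --binary-files=text -e copyright "$workdir/blob.bin" 2>&1)

printf 'default: %s\nas text: %s\n' "$as_binary" "$as_text"
rm -rf "$workdir"
```

When the match shares a line with raw binary bytes, the printed line carries those bytes along, which is the case the question above is about.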
--
Matthias
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-06 12:36 ` Matthias Meulien
@ 2025-01-06 12:42 ` Matthias Meulien
2025-01-06 14:13 ` Dmitry Gutov
2025-01-06 14:11 ` Dmitry Gutov
1 sibling, 1 reply; 29+ messages in thread
From: Matthias Meulien @ 2025-01-06 12:42 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: Eli Zaretskii, 75379
Or do we want to reproduce rgrep behavior with xref? When I search with
`rgrep` I get the following:
[image: image.png]
On Mon, Jan 6, 2025 at 13:36, Matthias Meulien <orontee@gmail.com> wrote:
> Thanks, this is a solid proposal, but as per comment:
>>
>> ;; TODO: Show these matches as well somehow?
>>
>> we would probably want to print these weird matches as well, in the
>> future. As you mention, search programs have a flag which avoids
>> printing these matches, but in certain rare cases it might happen that a
>> mostly text file is detected as binary - and then it seems preferable to
>> print all of such matches in the buffer rather than ignore them. (Unless
>> people disagree?)
>>
>> And yeah, it's an old comment, so this improvement is not high on the
>> list, but whenever we (I/you/anybody else) get around to implementing
>> it,
>
>
> What would be the "right thing to do"? Should we call grep and ugrep with
> "--binary-files=text" (and ripgrep has the equivalent "-a") and then ask
> Emacs to guess whether each match is "compatible" with the process encoding
> system and based on that decide whether to display the match or print a
> warning like "match found among unprintable binary data" nearby the file
> name?
> --
> Matthias
>
--
Matthias
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-05 21:22 ` Matthias Meulien
2025-01-05 21:29 ` Matthias Meulien
2025-01-06 1:55 ` Dmitry Gutov
@ 2025-01-06 13:02 ` Eli Zaretskii
2025-01-06 14:13 ` Dmitry Gutov
2 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2025-01-06 13:02 UTC (permalink / raw)
To: Matthias Meulien; +Cc: dmitry, 75379
> From: Matthias Meulien <orontee@gmail.com>
> Date: Sun, 5 Jan 2025 22:22:33 +0100
> Cc: Dmitry Gutov <dmitry@gutov.dev>, 75379@debbugs.gnu.org
>
> Can't we instead have a database of these messages, like we do with
> the "password" prompts?
>
> I am not familiar with all grep options but I saw that:
>
> ‘-I’
> Process a binary file as if it did not contain matching data; this
> is equivalent to the ‘--binary-files=without-match’ option.
Isn't that GNU Grep specific?
In any case, we will have to document somewhere that removing this
option from the command is not recommended.
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-05 21:29 ` Matthias Meulien
@ 2025-01-06 13:03 ` Eli Zaretskii
0 siblings, 0 replies; 29+ messages in thread
From: Eli Zaretskii @ 2025-01-06 13:03 UTC (permalink / raw)
To: Matthias Meulien; +Cc: dmitry, 75379
> From: Matthias Meulien <orontee@gmail.com>
> Date: Sun, 5 Jan 2025 22:29:54 +0100
> Cc: Dmitry Gutov <dmitry@gutov.dev>, 75379@debbugs.gnu.org
>
> And the rg (ripgrep) manual page says:
>
> By default, ripgrep attempts to automatically skip binary files in order to improve the relevance of results
> and make the search faster.
>
> And ugrep support -I in with the same meaning as grep.
>
> Thus I propose to add both -I to grep and ugrep, remove the processing of search output for binary file
> messages, and extend doc string for users to be aware that the behavior has (slightly) changed.
Likewise here: we will need to warn people against removing these options
or adding something that would countermand them.
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-06 12:36 ` Matthias Meulien
2025-01-06 12:42 ` Matthias Meulien
@ 2025-01-06 14:11 ` Dmitry Gutov
2025-01-07 5:42 ` Matthias Meulien
1 sibling, 1 reply; 29+ messages in thread
From: Dmitry Gutov @ 2025-01-06 14:11 UTC (permalink / raw)
To: Matthias Meulien; +Cc: Eli Zaretskii, 75379
On 06/01/2025 14:36, Matthias Meulien wrote:
> Thanks, this is a solid proposal, but as per comment:
>
> ;; TODO: Show these matches as well somehow?
>
> we would probably want to print these weird matches as well, in the
> future. As you mention, search programs have a flag which avoids
> printing these matches, but in certain rare cases it might happen
> that a mostly text file is detected as binary - and then it seems
> preferable to print all of such matches in the buffer rather than
> ignore them. (Unless people disagree?)
>
> And yeah, it's an old comment, so this improvement is not high on the
> list, but whenever we (I/you/anybody else) get around to implementing
> it,
>
>
> What would be the "right thing to do"?
To try to fix the current behavior on FR locales, we would tell grep to
output its messages in English. That would make xref-matches-in-files
behave the same across languages.
Step 2 would be to render the "binary file matches" elements in the UI.
> Should we call grep and ugrep
> with "--binary-files=text" (and ripgrep has the equivalent "-a") and
> then ask Emacs to guess whether each match is "compatible" with the
> process encoding system and based on that decide whether to display the
> match or print a warning like "match found among unprintable binary
> data" nearby the file name?
That's a more advanced solution - not sure if we can handle edge cases
uniformly, such as actual binary files without newlines.
Apparently newer Grep versions (2.11+ or something like that) will also
break lines on null bytes, but that still creates higher odds of having
very long strings in the output sometimes.
Could be worth an experiment, though. A possible upside is being able to
display these matches just like the others.
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-06 12:42 ` Matthias Meulien
@ 2025-01-06 14:13 ` Dmitry Gutov
0 siblings, 0 replies; 29+ messages in thread
From: Dmitry Gutov @ 2025-01-06 14:13 UTC (permalink / raw)
To: Matthias Meulien; +Cc: Eli Zaretskii, 75379
On 06/01/2025 14:42, Matthias Meulien wrote:
> Or do we want to reproduce rgrep behavior with xref? When I search with
> `rgrep` I get the following:
>
> image.png
That's what I was thinking of, yes. Adding similar entries to the Xref
output buffer.
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-06 13:02 ` Eli Zaretskii
@ 2025-01-06 14:13 ` Dmitry Gutov
0 siblings, 0 replies; 29+ messages in thread
From: Dmitry Gutov @ 2025-01-06 14:13 UTC (permalink / raw)
To: Eli Zaretskii, Matthias Meulien; +Cc: 75379
On 06/01/2025 15:02, Eli Zaretskii wrote:
>> ‘-I’
>> Process a binary file as if it did not contain matching data; this
>> is equivalent to the ‘--binary-files=without-match’ option.
> Isn't that GNU Grep specific?
It's documented in https://man.freebsd.org/cgi/man.cgi?grep(1) too, at
the very least.
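A small sketch of what `-I` does in practice, assuming a grep that supports it (GNU and FreeBSD grep both document it). The two files are hypothetical; one contains a NUL byte, so grep treats it as binary:

```shell
# Sketch only: count matching files with and without -I.
tmp=$(mktemp -d)
printf 'copyright notice\n' > "$tmp/notes.txt"
printf 'copyright\000\n'    > "$tmp/blob.bin"

# -l lists the names of files that contain a match.
all=$(grep -l copyright "$tmp"/* | wc -l)
# -I makes grep treat the binary file as if it contained no match.
text_only=$(grep -lI copyright "$tmp"/* | wc -l)

echo "without -I: $all file(s), with -I: $text_only file(s)"
rm -rf "$tmp"
```

The cost is the one discussed earlier in the thread: a mostly-text file that happens to contain a NUL byte is skipped silently.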
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-06 1:55 ` Dmitry Gutov
2025-01-06 12:36 ` Matthias Meulien
@ 2025-01-06 17:36 ` Juri Linkov
2025-01-06 20:33 ` Dmitry Gutov
1 sibling, 1 reply; 29+ messages in thread
From: Juri Linkov @ 2025-01-06 17:36 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: 75379, Eli Zaretskii, Matthias Meulien
> ;; TODO: Show these matches as well somehow?
>
> we would probably want to print these weird matches as well, in the
> future. As you mention, search programs have a flag which avoids printing
> these matches, but in certain rare cases it might happen that a mostly text
> file is detected as binary - and then it seems preferable to print all of
> such matches in the buffer rather than ignore them. (Unless people
> disagree?)
Indeed, "Binary file matches" is a very important message that
helps not to miss any matches in a text file that happens
to accidentally contain a NUL byte. This saved me many times
while using rgrep. 'project-find-regexp' could do the same,
and show the same messages in the *xref* output buffer.
So to not mess with translations, a simpler solution would be
just to copy all unhandled messages from grep/ripgrep output
to the xref buffer as is.
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-06 17:36 ` Juri Linkov
@ 2025-01-06 20:33 ` Dmitry Gutov
2025-01-07 17:39 ` Juri Linkov
0 siblings, 1 reply; 29+ messages in thread
From: Dmitry Gutov @ 2025-01-06 20:33 UTC (permalink / raw)
To: Juri Linkov; +Cc: 75379, Eli Zaretskii, Matthias Meulien
On 06/01/2025 19:36, Juri Linkov wrote:
> Indeed, "Binary file matches" is a very important message that
> helps not to miss any matches in a text file that happens
> to accidentally contain a NUL byte. This saved me many times
> while using rgrep. 'project-find-regexp' could do the same,
> and show the same messages in the *xref* output buffer.
>
> So to not mess with translations, a simpler solution would be
> just to copy all unhandled messages from grep/ripgrep output
> to the xref buffer as is.
Good point, maybe we could show different messages this way.
But I think what I was trying to do there is distinguish between Grep
succeeding and ending up with an error (which we should report with
user-error), and the process exit status wasn't enough for that.
Indeed, here's a command to try:
git ls-files -z | xargs -0 grep gtags
In the Emacs repository (among others) it exits with the status 123,
apparently one or more of the Grep sub-invocations ended up with
non-zero status (likely 1, indicating "no matches"). Even though the
combined search finds a bunch of results, that doesn't change xargs's
exit status. And we can't special-case the status 123 because "if any
invocation of the command exited with status 1-125" covers both Grep
calls that found nothing and Grep calls which were done with
unrecognized flags (Grep exit status 2, IIUC).
Also, when we know the format of some messages we can parse the file
name out of them and create a button in the output buffer. Simply
copying any unhandled messages removes that possibility.
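The exit-status ambiguity can be reproduced outside Emacs with a sketch like the following. The temporary files are hypothetical, and `-n 1` is used only to force one grep invocation per file so that one of them is guaranteed to find nothing:

```shell
# Sketch only: xargs reports the same status for "some grep found nothing"
# and "grep was called with a bad flag".
tmp=$(mktemp -d)
printf 'gtags here\n' > "$tmp/a.txt"
printf 'nothing\n'    > "$tmp/b.txt"

# One invocation matches (status 0), the other does not (status 1).
printf '%s\000' "$tmp/a.txt" "$tmp/b.txt" | xargs -0 -n 1 grep -H gtags
status_mixed=$?

# Unrecognized option: grep itself fails (status 2 with GNU grep).
printf '%s\000' "$tmp/a.txt" | xargs -0 grep --no-such-flag gtags 2>/dev/null
status_error=$?

echo "mixed=$status_mixed error=$status_error"
rm -rf "$tmp"
```

With GNU xargs both values are 123, so the status alone cannot tell a partially empty search from a broken command line.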
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-06 14:11 ` Dmitry Gutov
@ 2025-01-07 5:42 ` Matthias Meulien
2025-01-07 12:45 ` Eli Zaretskii
2025-01-07 14:24 ` Dmitry Gutov
0 siblings, 2 replies; 29+ messages in thread
From: Matthias Meulien @ 2025-01-07 5:42 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: Eli Zaretskii, 75379
> To try to fix the current behavior on FR locales, we would tell grep to
> output its messages in English. That would make xref-matches-in-files
> behave the same across languages.
>
> Step 2 would be to render the "binary file matches" elements in the UI.
>
Why not keep the user's locale setting and the current grep arguments,
but run the search program on a generated file containing NUL characters
to capture its output message, and build the check that raises the error
dynamically from that message?
It would cover both cases at once. And work on Windows, right?
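A minimal sketch of such a probe, written as a shell script rather than as the Lisp that xref would actually need. The file name `probe.bin`, the word `needle` and the `FILE` placeholder are all made up for illustration:

```shell
# Sketch only: ask grep, in the user's own locale, how it words the
# "binary file matches" notice, and turn that into a template.
tmp=$(mktemp -d)
printf 'needle\000\n' > "$tmp/probe.bin"

# Run from inside the directory so the file name in the message is known.
# 2>&1 because newer GNU grep prints the notice on stderr.
msg=$(cd "$tmp" && grep needle probe.bin 2>&1)

# Replace the known file name with a placeholder to get the message shape.
template=$(printf '%s\n' "$msg" | sed 's|probe\.bin|FILE|')

printf 'template: %s\n' "$template"
rm -rf "$tmp"
```

The template would then have to be converted into a regexp, once per search program, which is where the extra complexity comes from.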
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-07 5:42 ` Matthias Meulien
@ 2025-01-07 12:45 ` Eli Zaretskii
2025-01-07 14:24 ` Dmitry Gutov
1 sibling, 0 replies; 29+ messages in thread
From: Eli Zaretskii @ 2025-01-07 12:45 UTC (permalink / raw)
To: Matthias Meulien; +Cc: dmitry, 75379
> From: Matthias Meulien <orontee@gmail.com>
> Date: Tue, 7 Jan 2025 06:42:43 +0100
> Cc: Eli Zaretskii <eliz@gnu.org>, 75379@debbugs.gnu.org
>
> Why not keep the user's locale setting and the current grep arguments, but run the search program on a generated
> file containing NUL characters to capture its output message, and build the check that raises the error dynamically from that message?
>
> It would cover both cases at once. And work on Windows, right?
It should, yes.
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-05 20:16 ` Eli Zaretskii
@ 2025-01-07 14:17 ` Dmitry Gutov
2025-01-07 14:23 ` Eli Zaretskii
0 siblings, 1 reply; 29+ messages in thread
From: Dmitry Gutov @ 2025-01-07 14:17 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 75379, orontee
On 05/01/2025 22:16, Eli Zaretskii wrote:
>>> First, the Windows locale-dependent routines don't heed environment
>>> variables, so setting LANG etc. in the environment will only do what
>>> you expect if the program in question was either explicitly programmed
>>> to pay attention to those variables or was linked with Gnulib
>>> replacements for locale functions.
>>>
>>> And second LC_MESSAGES is not supported by Windows locales at all.
>>
>> Okay, but first of all, do Grep or Ripgrep use different localizations
>> on Windows, not just English?
>
> For Grep, it depends on how it was configured when building. The
> default configuration uses gettext to translate messages, and this
> message is marked as translated.
Okay, but if it's not configured to use gettext, would it just use
English, or are there some other mechanisms?
Looking at
https://stackoverflow.com/questions/9268379/non-localized-version-of-mingw-msys2,
the recommendations are along the standard lines of using either LANG
or LC_ALL.
> For Ripgrep, I don't know.
It seems to me Ripgrep is simply not translated, which is just fine for us.
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-07 14:17 ` Dmitry Gutov
@ 2025-01-07 14:23 ` Eli Zaretskii
2025-01-07 14:26 ` Dmitry Gutov
0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2025-01-07 14:23 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: 75379, orontee
> Date: Tue, 7 Jan 2025 16:17:24 +0200
> Cc: orontee@gmail.com, 75379@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
>
> On 05/01/2025 22:16, Eli Zaretskii wrote:
>
> >>> First, the Windows locale-dependent routines don't heed environment
> >>> variables, so setting LANG etc. in the environment will only do what
> >>> you expect if the program in question was either explicitly programmed
> >>> to pay attention to those variables or was linked with Gnulib
> >>> replacements for locale functions.
> >>>
> >>> And second LC_MESSAGES is not supported by Windows locales at all.
> >>
> >> Okay, but first of all, do Grep or Ripgrep use different localizations
> >> on Windows, not just English?
> >
> > For Grep, it depends on how it was configured when building. The
> > default configuration uses gettext to translate messages, and this
> > message is marked as translated.
>
> Okay, but if it's not configured to use gettext, would it just use
> English, or are there some other mechanisms?
It will output the original English messages unchanged.
> Looking at
> https://stackoverflow.com/questions/9268379/non-localized-version-of-mingw-msys2,
> the recommendations are along the standard lines of using either LANG
> or LC_ALL.
If that's the environment variables, they don't work reliably on
Windows, as I explained.
> > For Ripgrep, I don't know.
>
> It seems to me Ripgrep is simply not translated, which is just fine for us.
Yes.
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-07 5:42 ` Matthias Meulien
2025-01-07 12:45 ` Eli Zaretskii
@ 2025-01-07 14:24 ` Dmitry Gutov
1 sibling, 0 replies; 29+ messages in thread
From: Dmitry Gutov @ 2025-01-07 14:24 UTC (permalink / raw)
To: Matthias Meulien; +Cc: Eli Zaretskii, 75379
On 07/01/2025 07:42, Matthias Meulien wrote:
> Why not keep the user's locale setting and the current grep arguments,
> but run the search program on a generated file containing NUL characters
> to capture its output message, and build the check that raises the error
> dynamically from that message?
More complicated. We're not sure which programs will end in the
customization of xref-search-program-alist, and whether each of them
will detect binary files the same way.
Also Ripgrep outputs somewhat varying text like:
test/manual/etags/f-src/entry.strange.gz: binary file matches (found
"\0" byte around offset 23)
test/manual/etags/cp-src/clheir.cpp.gz: binary file matches (found "\0"
byte around offset 20)
Specifying the language from an environment var seems like the most
straightforward approach still.
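As a quick check of that approach from a shell: `LC_ALL` overrides both `LANG` and `LC_MESSAGES`, so forcing it to `C` for the grep call should give a notice that the existing regexp in xref-matches-in-files recognizes. The sample file is hypothetical, and the regexp is copied from the code quoted in the original report:

```shell
# Sketch only: force English messages for one grep call and test the
# notice against xref's current pattern.
tmp=$(mktemp -d)
printf 'copyright\000\n' > "$tmp/blob.bin"

# 2>&1 because newer GNU grep prints the notice on stderr.
msg=$(LC_ALL=C grep -H copyright "$tmp/blob.bin" 2>&1)

if printf '%s\n' "$msg" | grep -Eq '[bB]inary file.* matches'; then
  verdict=recognized
else
  verdict=unrecognized
fi
echo "$verdict: $msg"
rm -rf "$tmp"
```

Whether the variable is honored on Windows depends on how the search program was built, as discussed elsewhere in this thread.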
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-07 14:23 ` Eli Zaretskii
@ 2025-01-07 14:26 ` Dmitry Gutov
2025-01-07 14:50 ` Eli Zaretskii
0 siblings, 1 reply; 29+ messages in thread
From: Dmitry Gutov @ 2025-01-07 14:26 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 75379, orontee
On 07/01/2025 16:23, Eli Zaretskii wrote:
>> Looking at
>> https://stackoverflow.com/questions/9268379/non-localized-version-of-mingw-msys2,
>> the recommendations are along the standard lines of using either LANG
>> or LC_ALL.
> If that's the environment variables, they don't work reliably on
> Windows, as I explained.
Please correct me if I'm wrong, but it sounds like either gettext is
supported, and these vars can be used to set the locale to English, or
gettext is unsupported, and the output is in English anyway. Both
scenarios are what we want to have in the end.
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-07 14:26 ` Dmitry Gutov
@ 2025-01-07 14:50 ` Eli Zaretskii
0 siblings, 0 replies; 29+ messages in thread
From: Eli Zaretskii @ 2025-01-07 14:50 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: 75379, orontee
> Date: Tue, 7 Jan 2025 16:26:46 +0200
> Cc: orontee@gmail.com, 75379@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
>
> On 07/01/2025 16:23, Eli Zaretskii wrote:
> >> Looking at
> >> https://stackoverflow.com/questions/9268379/non-localized-version-of-mingw-msys2,
> >> the recommendations are along the standard lines of using either LANG
> >> or LC_ALL.
> > If that's the environment variables, they don't work reliably on
> > Windows, as I explained.
>
> Please correct me if I'm wrong, but it sounds like either gettext is
> supported, and these vars can be used to set the locale to English, or
> gettext is unsupported, and the output is in English anyway. Both
> scenarios are what we want to have in the end.
For programs compiled with gettext, I think you are right.
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-06 20:33 ` Dmitry Gutov
@ 2025-01-07 17:39 ` Juri Linkov
2025-01-07 19:38 ` Dmitry Gutov
0 siblings, 1 reply; 29+ messages in thread
From: Juri Linkov @ 2025-01-07 17:39 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: 75379, Eli Zaretskii, Matthias Meulien
>> Indeed, "Binary file matches" is a very important message that
>> helps not to miss any matches in a text file that happens
>> to accidentally contain a NUL byte. This saved me many times
>> while using rgrep. 'project-find-regexp' could do the same,
>> and show the same messages in the *xref* output buffer.
>> So to not mess with translations, a simpler solution would be
>> just to copy all unhandled messages from grep/ripgrep output
>> to the xref buffer as is.
>
> Good point, maybe we could show different messages this way.
It would be nice to keep all unprocessed lines.
> But I think what I was trying to do there is distinguish between Grep
> succeeding and ending up with an error (which we should report with
> user-error), and the process exit status wasn't enough for that.
>
> Indeed, here's a command to try:
>
> git ls-files -z | xargs -0 grep gtags
>
> In the Emacs repository (among others) it exits with the status 123,
> apparently one or more of the Grep sub-invocations ended up with non-zero
> status (likely 1, indicating "no matches"). Even though the combined search
> finds a bunch of results, that doesn't change xargs's exit status. And we
> can't special-case the status 123 because "if any invocation of the command
> exited with status 1-125" covers both Grep calls that found nothing and
> Grep calls which were done with unrecognized flags (Grep exit status 2,
> IIUC).
This is a known problem. Since the exit status is unreliable,
this is why 'grep-exit-message' has to use such a trick that
no output (i.e. '(not (buffer-modified-p))') indicates no matches:
  (if (eq status 'exit)
      ;; This relies on the fact that `compilation-start'
      ;; sets buffer-modified to nil before running the command,
      ;; so the buffer is still unmodified if there is no output.
      (cond ((and (zerop code) (buffer-modified-p))
             (if (> grep-num-matches-found 0)
                 (cons (format (ngettext "finished with %d match found\n"
                                         "finished with %d matches found\n"
                                         grep-num-matches-found)
                               grep-num-matches-found)
                       "matched")
               '("finished with matches found\n" . "matched")))
            ((not (buffer-modified-p))
             '("finished with no matches found\n" . "no match"))
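The same idea can be sketched at the shell level: ignore the exit status and decide from whether anything was printed. The helper name `classify` and the sample files are hypothetical:

```shell
# Sketch only: "no output means no matches", independent of exit status.
tmp=$(mktemp -d)
printf 'gtags here\n' > "$tmp/a.txt"
printf 'nothing\n'    > "$tmp/b.txt"

classify() {
  # $1 is the pattern.  stderr is not captured, so only real matches count.
  out=$(printf '%s\000' "$tmp/a.txt" "$tmp/b.txt" |
          xargs -0 -n 1 grep -H -- "$1")
  if [ -n "$out" ]; then echo "matched"; else echo "no match"; fi
}

hit=$(classify gtags)
miss=$(classify no-such-word)
echo "gtags: $hit; no-such-word: $miss"
rm -rf "$tmp"
```

This says nothing about genuine failures such as a missing program or an unsupported flag, which produce no stdout either.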
> Also, when we know the format of some messages we can parse the file name
> out of them and create a button in the output buffer. Simply copying any
> unhandled messages removes that possibility.
Can we detect a file name in any message, e.g. by matching a path separator?
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-07 17:39 ` Juri Linkov
@ 2025-01-07 19:38 ` Dmitry Gutov
2025-01-08 7:48 ` Juri Linkov
0 siblings, 1 reply; 29+ messages in thread
From: Dmitry Gutov @ 2025-01-07 19:38 UTC (permalink / raw)
To: Juri Linkov; +Cc: 75379, Eli Zaretskii, Matthias Meulien
On 07/01/2025 19:39, Juri Linkov wrote:
> This is a known problem. Since the exit status is unreliable,
> this is why 'grep-exit-message' has to use such a trick that
> no output (i.e. '(not (buffer-modified-p))') indicates no matches:
What about errors, though? Missing programs, unsupported flags, etc.
Maybe Grep gets by without that due to the explicit probing step in
grep-compute-defaults, but I'm not sure it's worth building up its
counterpart in xref.el.
> (if (eq status 'exit)
> ;; This relies on the fact that `compilation-start'
> ;; sets buffer-modified to nil before running the command,
> ;; so the buffer is still unmodified if there is no output.
> (cond ((and (zerop code) (buffer-modified-p))
> (if (> grep-num-matches-found 0)
> (cons (format (ngettext "finished with %d match found\n"
> "finished with %d matches found\n"
> grep-num-matches-found)
> grep-num-matches-found)
> "matched")
> '("finished with matches found\n" . "matched")))
> ((not (buffer-modified-p))
> '("finished with no matches found\n" . "no match"))
>
>> Also, when we know the format of some messages we can parse the file name
>> out of them and create a button in the output buffer. Simply copying any
>> unhandled messages removes that possibility.
> Can we detect a file name in any message, e.g. by matching a path separator?
We use 'grep --null', so the file name separator is a zero byte.
We could scan the buffer to see whether there are any zero bytes (and if
none - that would mean no matches), but the "binary file matches"
message doesn't use that separator ¯\_(ツ)_/¯
Nor does it start with a file name, so we have to have a separate
understanding about that message's structure anyway:
grep: test/lisp/gnus/mml-sec-resources/pubring.kbx: binary file matches
grep: test/lisp/gnus/mml-sec-resources/secring.gpg: binary file matches
grep: test/lisp/gnus/mml-sec-resources/trustdb.gpg: binary file matches
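To illustrate what that separate understanding amounts to, here is a sketch that pulls the file name out of the two English message shapes quoted in this thread (GNU grep's and ripgrep's). The function name `binary_match_file` is hypothetical, and the patterns cover only these two shapes:

```shell
# Sketch only: extract the file name from a "binary file matches" notice.
binary_match_file() {
  printf '%s\n' "$1" | sed -n \
    -e 's/^grep: \(.*\): binary file matches$/\1/p' \
    -e 's/^\(.*\): binary file matches (found .*$/\1/p'
}

from_grep=$(binary_match_file \
  'grep: test/lisp/gnus/mml-sec-resources/pubring.kbx: binary file matches')
from_rg=$(binary_match_file \
  'test/manual/etags/f-src/entry.strange.gz: binary file matches (found "\0" byte around offset 23)')

echo "$from_grep"
echo "$from_rg"
```

A translated message matches neither pattern and yields an empty result, which is why the message language matters here.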
* bug#75379: 30.0.93; project-find-regexp expects "C" or "en" locale
2025-01-07 19:38 ` Dmitry Gutov
@ 2025-01-08 7:48 ` Juri Linkov
0 siblings, 0 replies; 29+ messages in thread
From: Juri Linkov @ 2025-01-08 7:48 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: 75379, Eli Zaretskii, Matthias Meulien
>> This is a known problem. Since the exit status is unreliable,
>> this is why 'grep-exit-message' has to use such a trick that
>> no output (i.e. '(not (buffer-modified-p))') indicates no matches:
>
> What about errors, though? Missing programs, unsupported flags, etc.
>
> Maybe Grep gets by without that due to the explicit probing step in
> grep-compute-defaults, but I'm not sure it's worth building up its
> counterpart in xref.el.
I see no problem with displaying all error messages as is in the xref buffer.
>> (if (eq status 'exit)
>> ;; This relies on the fact that `compilation-start'
>> ;; sets buffer-modified to nil before running the command,
>> ;; so the buffer is still unmodified if there is no output.
>> (cond ((and (zerop code) (buffer-modified-p))
>> (if (> grep-num-matches-found 0)
>> (cons (format (ngettext "finished with %d match found\n"
>> "finished with %d matches found\n"
>> grep-num-matches-found)
>> grep-num-matches-found)
>> "matched")
>> '("finished with matches found\n" . "matched")))
>> ((not (buffer-modified-p))
>> '("finished with no matches found\n" . "no match"))
>>
>>> Also, when we know the format of some messages we can parse the file name
>>> out of them and create a button in the output buffer. Simply copying any
>>> unhandled messages removes that possibility.
>> Can we detect a file name in any message, e.g. by matching a path separator?
>
> We use 'grep --null', so the file name separator is a zero byte.
>
> We could scan the buffer to see whether there are any zero bytes (and if
> none - that would mean no matches), but the "binary file matches" message
> doesn't use that separator ¯\_(ツ)_/¯
>
> Nor does it start with a file name, so we have to have a separate
> understanding about that message's structure anyway:
>
> grep: test/lisp/gnus/mml-sec-resources/pubring.kbx: binary file matches
> grep: test/lisp/gnus/mml-sec-resources/secring.gpg: binary file matches
> grep: test/lisp/gnus/mml-sec-resources/trustdb.gpg: binary file matches
In the worst case there will be no button, and I see no problem with this too.