unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* dired-do-find-regexp failure with latin-1 encoding
@ 2020-11-28 18:03 Stephen Berman
  2020-11-28 18:11 ` Eli Zaretskii
  0 siblings, 1 reply; 35+ messages in thread
From: Stephen Berman @ 2020-11-28 18:03 UTC (permalink / raw)
  To: emacs-devel

My system's language encoding is en_US.UTF-8 but I have many files
encoded as iso-8859-1 (latin-1) and containing a mix of ASCII and
non-ASCII characters.  When I use dired-do-find-regexp on such files,
there are no matches in the *xref* buffer for lines containing both the
search string and a non-ASCII character.  If the file is encoded as
utf-8, then dired-do-find-regexp does find such lines.  Here's a minimal
reproducer:

0. echo aä > /tmp/test
1. emacs -Q /tmp/test ; the file encoding is utf-8
2. Type `C-x d RET', mark the file 'test', type `A a RET'
=> *xref* displays the line 'aä'
3. In buffer 'test' type `C-x RET f iso-8859-1 RET' and then `C-x C-s'
4. Repeat step 2
=> user-error: No matches for: a

dired-do-find-regexp calls xref-matches-in-files and that calls grep,
and that's where the failure happens, so strictly speaking this isn't an
Emacs bug, but it is a problem for users of dired-do-find-regexp
(dired-do-search and occur, for example, don't have this problem).  One
workaround is to add the -a option to the grep invocation in
xref-matches-in-files; then the search succeeds and the *xref* buffer
displays 'a\344'.  But this doesn't work if 'ä' is the search term.  For
the latter, I can get the correct output from grep by piping the output
of 'iconv -f ISO-8859-1 -t UTF-8' through to it, and indeed, prepending
'iconv -f ISO-8859-1 -t UTF-8 | ' to the grep invocation in
xref-matches-in-files does give the correct output in both cases.  But
this won't work if the file has a different non-utf-8 encoding, assuming
the issue isn't specific to latin-1.  Is there another alternative
(aside from "Someone™ can implement it in Emacs Lisp")?

Steve Berman



^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2020-12-01  0:34 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-28 18:03 dired-do-find-regexp failure with latin-1 encoding Stephen Berman
2020-11-28 18:11 ` Eli Zaretskii
2020-11-28 18:46   ` Stephen Berman
2020-11-28 19:13     ` Eli Zaretskii
2020-11-28 19:44       ` Stephen Berman
2020-11-28 19:49         ` Eli Zaretskii
2020-11-28 20:16       ` Dmitry Gutov
2020-11-28 20:29         ` Eli Zaretskii
2020-11-28 21:04           ` Dmitry Gutov
2020-11-29  0:49             ` Dmitry Gutov
2020-11-29 15:19               ` Eli Zaretskii
2020-11-29 16:27                 ` Dmitry Gutov
2020-11-29 17:18                   ` Eli Zaretskii
2020-11-29 17:32                     ` Dmitry Gutov
2020-11-29 18:42                       ` Eli Zaretskii
2020-11-29 19:48                         ` Dmitry Gutov
2020-11-29 15:06             ` Eli Zaretskii
2020-11-29 15:14               ` Yuri Khan
2020-11-29 15:36                 ` Stephen Berman
2020-11-29 15:50                 ` Eli Zaretskii
2020-11-29 16:07               ` Dmitry Gutov
2020-11-29 17:12                 ` Eli Zaretskii
2020-11-29 17:19                   ` Dmitry Gutov
2020-11-29 17:25                     ` Eli Zaretskii
2020-11-29 17:44                       ` Dmitry Gutov
2020-11-29 18:51                         ` Eli Zaretskii
2020-11-29 19:07                           ` Dmitry Gutov
2020-11-29 19:32                             ` Eli Zaretskii
2020-11-29 19:34                               ` Eli Zaretskii
2020-11-29 19:49                             ` Stephen Berman
2020-11-29 19:49                           ` Gregory Heytings via Emacs development discussions.
2020-11-29 19:37             ` Juri Linkov
2020-11-30  1:08               ` Dmitry Gutov
2020-11-30 20:54                 ` Juri Linkov
2020-12-01  0:34                   ` Dmitry Gutov

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).