unofficial mirror of emacs-devel@gnu.org 
From: Stephen Berman <stephen.berman@gmx.net>
To: emacs-devel@gnu.org
Subject: dired-do-find-regexp failure with latin-1 encoding
Date: Sat, 28 Nov 2020 19:03:17 +0100
Message-ID: <87blfhjr4q.fsf@gmx.net>

My system's locale is en_US.UTF-8 but I have many files
encoded as iso-8859-1 (latin-1) and containing a mix of ASCII and
non-ASCII characters.  When I use dired-do-find-regexp on such files,
there are no matches in the *xref* buffer for lines containing both the
search string and a non-ASCII character.  If the file is encoded as
utf-8, then dired-do-find-regexp does find such lines.  Here's a minimal
reproducer:

0. echo aä > /tmp/test
1. emacs -Q /tmp/test ; the file encoding is utf-8
2. Type `C-x d RET', mark the file 'test', type `A a RET'
=> *xref* displays the line 'aä'
3. In buffer 'test' type `C-x RET f iso-8859-1 RET' and then `C-x C-s'
4. Repeat step 2
=> user-error: No matches for: a

dired-do-find-regexp calls xref-matches-in-files and that calls grep,
and that's where the failure happens, so strictly speaking this isn't an
Emacs bug, but it is a problem for users of dired-do-find-regexp
(dired-do-search and occur, for example, don't have this problem).  One
workaround is to add the -a option to the grep invocation in
xref-matches-in-files; then the search succeeds and the *xref* buffer
displays 'a\344'.  But this doesn't work if 'ä' is the search term.  For
the latter, I can get the correct output from grep by piping the output
of 'iconv -f ISO-8859-1 -t UTF-8' into it, and indeed, prepending
'iconv -f ISO-8859-1 -t UTF-8 | ' to the grep invocation in
xref-matches-in-files does give the correct output in both cases.  But
this won't work if the file has a different non-utf-8 encoding, assuming
the issue isn't specific to latin-1.  Is there another alternative
(aside from "Someone™ can implement it in Emacs Lisp")?

Steve Berman
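
The grep behaviour described in the message can be checked outside Emacs
with a small shell sketch. It is only an illustration under stated
assumptions: the scratch file from mktemp stands in for /tmp/test, the
variable names are made up for this example, and it counts matching lines
with 'grep -c' instead of printing them, so the result should not depend
on whether grep decides to treat the latin-1 file as binary. It is not the
command line that xref-matches-in-files actually builds.

```shell
#!/bin/sh
# Hypothetical scratch file standing in for /tmp/test from the reproducer.
tmp=$(mktemp)

# "aä" encoded as ISO-8859-1: 'a', then byte 0xE4 (octal 344), then newline.
printf 'a\344\n' > "$tmp"

# "ä" encoded as UTF-8 (bytes 0xC3 0xA4), i.e. what a UTF-8 session would
# pass to grep as the search term.
needle=$(printf '\303\244')

# Searching for plain ASCII 'a': the line is counted as a match, even in
# cases where grep would refuse to print it as text.
ascii_count=$(grep -c a "$tmp" || true)

# Searching the latin-1 bytes for the UTF-8 needle: no line can match.
raw_count=$(grep -c "$needle" "$tmp" || true)

# Converting to UTF-8 first, as in the iconv workaround: the line matches.
conv_count=$(iconv -f ISO-8859-1 -t UTF-8 "$tmp" | grep -c "$needle" || true)

echo "ASCII 'a' in latin-1 file:       $ascii_count"
echo "UTF-8 'ä' in latin-1 file:       $raw_count"
echo "UTF-8 'ä' after iconv to UTF-8:  $conv_count"

rm -f "$tmp"
```

The '|| true' is there because 'grep -c' exits with status 1 when the
count is 0. The sketch hard-codes ISO-8859-1 as the source encoding, so it
has the same limitation as the iconv workaround in the message: a file in
some other non-utf-8 encoding would need a different '-f' argument.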


