* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Markus Triska @ 2009-04-09 16:28 UTC
To: emacs-pretest-bug
With ~/töst.txt existing, when I do:
$ emacs -Q ~/
and press:
C-\ german-postfix RET C-s oe RET
to search for "ö" in the dired buffer (the input method correctly
converts the entered "oe" to "ö" in the minibuffer), I get:
Failing wrapped I-search [DE<]: ö
C-u C-x = on the "ö" in the dired buffer yields:
character: o (111, #o157, #x6f)
preferred charset: ascii (ASCII (ISO646 IRV))
code point: 0x6F
syntax: w which means: word
category: .:Base, a:ASCII, l:Latin, r:Roman
buffer code: #x6F
file code: #x6F (encoded by coding system utf-8-unix)
display: composed to form "ö" (see below)
Composed with the following character(s) "̈" using this font:
xft:-unknown-Cochin-normal-normal-normal-*-20-*-*-*-*-0-iso10646-1
by these glyphs:
[0 1 111 82 11 1 10 8 0 nil]
[0 1 776 235 6 0 6 12 -9 [-9 -1 0]]
Character code properties: customize what to show
name: LATIN SMALL LETTER O
general-category: Ll (Letter, Lowercase)
There are text properties here:
dired-filename t
fontified t
help-echo "mouse-2: visit this file in other window"
mouse-face highlight
C-u C-x = on the first "t" in "töst.txt" yields:
character: t (116, #o164, #x74)
preferred charset: ascii (ASCII (ISO646 IRV))
code point: 0x74
syntax: w which means: word
category: .:Base, a:ASCII, l:Latin, r:Roman
buffer code: #x74
file code: #x74 (encoded by coding system utf-8-unix)
display: by this font (glyph code)
xft:-bitstream-Bitstream Vera Sans Mono-normal-normal-normal-*-20-*-*-*-m-0-iso10646-1 (#x57)
Character code properties: customize what to show
name: LATIN SMALL LETTER T
general-category: Ll (Letter, Lowercase)
There are text properties here:
dired-filename t
fontified t
help-echo "mouse-2: visit this file in other window"
mouse-face highlight
In GNU Emacs 23.0.92.3 (i386-apple-darwin9.6.1, GTK+ Version 2.14.7)
of 2009-04-09 on mt-imac.local
Windowing system distributor `The X.Org Foundation', version 11.0.10402000
configured using `configure '--with-tiff=no''
Important settings:
value of $LC_ALL: nil
value of $LC_COLLATE: nil
value of $LC_CTYPE: nil
value of $LC_MESSAGES: nil
value of $LC_MONETARY: nil
value of $LC_NUMERIC: nil
value of $LC_TIME: nil
value of $LANG: en.UTF-8
value of $XMODIFIERS: nil
locale-coding-system: utf-8-unix
default-enable-multibyte-characters: t
* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Eli Zaretskii @ 2009-04-09 17:22 UTC
To: Markus Triska, 2940
> From: Markus Triska <markus.triska@gmx.at>
> Date: Thu, 09 Apr 2009 18:28:48 +0200
> Cc:
>
>
> With ~/töst.txt existing, when I do:
>
> $ emacs -Q ~/
>
> and press:
>
> C-\ german-postfix RET C-s oe RET
>
> to search for "ö" in the dired buffer (the input method correctly
> converts the entered "oe" to "ö" in the minibuffer), I get:
>
> Failing wrapped I-search [DE<]: ö
What's your value of file-name-coding-system? Does it help to say
C-x RET c utf-8 RET C-x d
instead of just "C-x d"?
* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Markus Triska @ 2009-04-09 17:33 UTC
To: Eli Zaretskii; +Cc: 2940
Eli Zaretskii <eliz@gnu.org> writes:
> What's your value of file-name-coding-system?
It is nil, and default-file-name-coding-system is 'utf-8.
> Does it help to say
>
> C-x RET c utf-8 RET C-x d
>
> instead of just "C-x d"?
No, unfortunately not. Also for C-s it does not seem to make a
difference. When I enter an "ö" in *scratch*, C-u C-x = on it says:
character: ö (246, #o366, #xf6)
preferred charset: unicode (Unicode (ISO10646))
code point: 0xF6
syntax: w which means: word
category: .:Base, j:Japanese, l:Latin
to input: type "oe" with german-postfix
buffer code: #xC3 #xB6
file code: #xC3 #xB6 (encoded by coding system utf-8-unix)
display: by this font (glyph code)
xft:-bitstream-Bitstream Vera Sans Mono-normal-normal-normal-*-20-*-*-*-m-0-iso10646-1 (#x7C)
Character code properties: customize what to show
name: LATIN SMALL LETTER O WITH DIAERESIS
old-name: LATIN SMALL LETTER O DIAERESIS
general-category: Ll (Letter, Lowercase)
decomposition: (111 776) ('o' '̈')
There are text properties here:
fontified t
This "ö" is thus also rendered with the expected font, in contrast to
the one in dired.
* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Alp Aker @ 2011-07-11 22:02 UTC
To: Eli Zaretskii; +Cc: 2940, markus.triska
Eli Zaretskii wrote:
>>> (require 'ucs-normalize)
>>> (setq file-name-coding-system 'utf-8-hfs)
>
> It could be that Emacs should do this on that platform automatically,
> yes. But some Darwin expert should look into this and provide feedback,
> before we decide.
I'm no expert, but it doesn't look as if this is necessary.
/lisp/term/ns-win.el already defines a coding system utf-8-nfd that
performs normalization and it sets that as the value of
file-name-coding-system. This takes care of the fact that the HFS+
filesystem uses decomposed file names, and indeed I can't reproduce (in
either 24.0.50 or 23.3) the behavior described in the original bug report.
OTOH, the code in question has been present in ns-win.el since the NS code
was first merged into the main branch (rev 89434), so I'm not sure how the
OP's problem arose in the first place.
* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Glenn Morris @ 2011-07-15 20:38 UTC
To: Alp Aker; +Cc: 2940, markus.triska
Alp Aker wrote:
> OTOH, the code in question has been present in ns-win.el since the NS
> code was first merged into the main branch (rev 89434), so I'm not
> sure how the OP's problem arose in the first place.
IIUC, he's not using a --with-ns build. It's a "normal", gtk build that
happens to be running on a Mac. So ns-win.el isn't in use.
* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Alp Aker @ 2011-07-16 17:38 UTC
To: Glenn Morris; +Cc: 2940, markus.triska
Glenn Morris wrote:
> IIUC, he's not using a --with-ns build. It's a "normal", gtk build that
> happens to be running on a Mac. So ns-win.el isn't in use.
My mistake; since it was running on Darwin I just assumed an NS build, and
didn't look at the build info in the original bug report.
Making this the default behavior for non-NS builds running on a Mac is
probably TRT. It was once possible to use Darwin with UFS, but that
hasn't been true for the last three major versions, so going forward it
will be a vanishingly rare case where (eq system-type 'darwin) doesn't
imply that the file system is a variant of HFS+. And it's reasonable for
users to expect that Emacs will, out of the box, properly handle file
names on the system it was built on.
OTOH, just adding something like:
  (when (eq system-type 'darwin)
    (require 'ucs-normalize)
    (setq file-name-coding-system 'utf-8-hfs))
to x-win.el might not be the best solution. The utf-8-hfs coding system
does both post-read conversion (normalizing to precomposed utf-8) and
pre-write conversion (normalizing to Apple's variant of decomposed utf-8).
The latter is unnecessary: the OS itself will do normalization on any
filename handed to it. (Observe that the coding system defined in
ns-win.el only does post-read conversion.)
For local operations, the redundant pre-write conversion is harmless.
But using decomposed utf-8 might cause trouble when dealing with remote
files. So it's probably more robust to follow ns-win.el's lead and define
a coding system that only does post-read conversion. Thus:
  (when (eq system-type 'darwin)
    (require 'ucs-normalize)
    (define-coding-system 'utf-8-hfs-for-read
      "UTF-8 based coding system for HFS+ file names."
      :coding-type 'utf-8
      :mnemonic ?U
      :charset-list '(unicode)
      :post-read-conversion 'ucs-normalize-hfs-nfd-post-read-conversion)
    (setq file-name-coding-system 'utf-8-hfs-for-read))
would be the addition to make to x-win.el.
* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Stefan Kangas @ 2019-11-02 6:12 UTC
To: Markus Triska; +Cc: 2940
Markus Triska <markus.triska@gmx.at> writes:
> With ~/töst.txt existing, when I do:
>
> $ emacs -Q ~/
>
> and press:
>
> C-\ german-postfix RET C-s oe RET
>
> to search for "ö" in the dired buffer (the input method correctly
> converts the entered "oe" to "ö" in the minibuffer), I get:
>
> Failing wrapped I-search [DE<]: ö
I can't reproduce this on current master. Are you still seeing this
on a modern version of Emacs?
If I don't hear back from you within a couple of weeks, I'll just
close this bug as unreproducible.
Best regards,
Stefan Kangas
* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Miles Bader @ 2009-04-10 2:08 UTC
To: Markus Triska; +Cc: emacs-pretest-bug, 2940
It looks to me like the problem is that you're on a mac, and [some?] mac
filesystems silently convert accented characters in filenames to
"decomposed form", which is different than the pre-composed characters
people tend to use.
Perhaps the new "ucs-normalize" code (which should be added soon I
think) would help:
> The attached is an Unicode normalization tool contributed by
> Kawabata-san. It performs all the Unicode normalization
> NFC/NFD/NFKD/NFKC, and provides a coding system utf-8-hfs
> that is suitable to be used for Mac OS 8.1's file names.
[Search for "normalize.el" on recent emacs-devel messages]
-Miles
--
A zen-buddhist walked into a pizza shop and
said, "Make me one with everything."
* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Stefan Monnier @ 2009-04-10 5:53 UTC
To: Miles Bader; +Cc: emacs-pretest-bug, Markus Triska, 2940
> It looks to me like the problem is that you're on a mac, and [some?]
> mac filesystems silently convert accented characters in filenames to
> "decomposed form", which is different than the pre-composed characters
> people tend to use.
Indeed, that looks like the culprit (IIUC it's not done by the
filesystem, but by the OS itself before it passes the file names to the
filesystem, so it applies to all filesystems).
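The mismatch described above can be checked directly in Lisp. The
following is only a minimal sketch, assuming an Emacs in which the
ucs-normalize library is already available (it was not yet bundled when
this message was written); the variable names are illustrative.

```elisp
(require 'ucs-normalize)

(let ((decomposed  (string ?o #x308)) ; "o" + U+0308 COMBINING DIAERESIS, as seen in dired
      (precomposed (string #xF6)))    ; U+00F6, as produced by german-postfix
  (list
   ;; The raw strings differ, so a search for one cannot match the other.
   (string= decomposed precomposed)
   ;; After NFC normalization the two compare equal.
   (string= (ucs-normalize-NFC-string decomposed) precomposed)))
;; evaluates to (nil t)
```

This only illustrates the two representations; it does not by itself
change how dired reads file names.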
> Perhaps the new "ucs-normalize" code (which should be added soon I
> think) would help:
Rather than "perhaps", it should say "supposedly". Please try out this
new ucs-normalize package and tell us if it solves your problem and/or
suffers from other problems. It likely won't make it for Emacs-23.1 but
should be included in Emacs-23.2.
Stefan
* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Markus Triska @ 2009-04-10 10:48 UTC
To: gnu-emacs-bug
Stefan Monnier <monnier@iro.umontreal.ca> writes:
> Please try out this new ucs-normalize package and tell us if it solves
> your problem and/or suffers from other problems.
M-x eval-buffer on the (ucs-)normalize.el file posted at:
http://lists.gnu.org/archive/html/emacs-devel/2009-04/msg00185.html
yields:
End of file during parsing
After I insert an additional closing parenthesis on line 128 (after the
(defconst ucs-normalize-composition-exclusions ...), it yields:
Symbol's value as variable is void: in
* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Kenichi Handa @ 2009-04-10 11:34 UTC
To: Markus Triska, 2940; +Cc: gnu-emacs-bug
In article <m27i1syebo.fsf@gmx.at>, Markus Triska <markus.triska@gmx.at> writes:
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
> > Please try out this new ucs-normalize package and tell us if it solves
> > your problem and/or suffers from other problems.
> M-x eval-buffer on the (ucs-)normalize.el file posted at:
> http://lists.gnu.org/archive/html/emacs-devel/2009-04/msg00185.html
> yields:
> End of file during parsing
The above page inserts an extra ";" at line 127. Why does that happen?
Anyway, it seems that the posted ucs-normalize.el (and the
original normalize.el) has a bug. I'm now asking
Kawabata-san to fix it.
---
Kenichi Handa
handa@m17n.org
* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Kenichi Handa @ 2009-08-13 12:25 UTC
To: Markus Triska, 2940; +Cc: gnu-emacs-bug
In article <m27i1syebo.fsf@gmx.at>, Markus Triska <markus.triska@gmx.at> writes:
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
> > Please try out this new ucs-normalize package and tell us if it solves
> > your problem and/or suffers from other problems.
> M-x eval-buffer on the (ucs-)normalize.el file posted at:
> http://lists.gnu.org/archive/html/emacs-devel/2009-04/msg00185.html
> yields:
> End of file during parsing
I've just committed a new version of ucs-normalize.el.
Could you please try it?
By the way, it contains several autoload cookies. Should I
re-generate loaddefs.el, copy it to ldefs-boot.el, and
commit it?
---
Kenichi Handa
handa@m17n.org
* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Glenn Morris @ 2011-07-10 18:26 UTC
To: Markus Triska; +Cc: 2940
It was suggested that the ucs-normalize library, which has been part of
Emacs for some time, should fix this. Does it? Please reply and let us
know if you still see this problem in 23.3.
* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Markus Triska @ 2011-07-10 21:11 UTC
To: Glenn Morris; +Cc: 2940
Glenn Morris <rgm@gnu.org> writes:
> It was suggested that the ucs-normalize library, which has been part of
> Emacs for some time, should fix this. Does it?
Thank you, it does if I add this to ~/.emacs:
(require 'ucs-normalize)
(setq file-name-coding-system 'utf-8-hfs)
* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Glenn Morris @ 2011-07-11 2:01 UTC
To: Markus Triska; +Cc: 2940
Markus Triska wrote:
> Thank you, it does if I add this to ~/.emacs:
>
> (require 'ucs-normalize)
> (setq file-name-coding-system 'utf-8-hfs)
I know nothing about this area: Is this an acceptable solution (in which
case I will close this report); or should it work out-of-the-box with no
configuration?
* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Eli Zaretskii @ 2011-07-11 2:56 UTC
To: Glenn Morris; +Cc: 2940, markus.triska
> From: Glenn Morris <rgm@gnu.org>
> Date: Sun, 10 Jul 2011 22:01:44 -0400
> Cc: 2940@debbugs.gnu.org
>
> Markus Triska wrote:
>
> > Thank you, it does if I add this to ~/.emacs:
> >
> > (require 'ucs-normalize)
> > (setq file-name-coding-system 'utf-8-hfs)
>
> I know nothing about this area: Is this an acceptable solution (in which
> case I will close this report); or should it work out-of-the-box with no
> configuration?
It could be that Emacs should do this on that platform automatically,
yes. But some Darwin expert should look into this and provide
feedback, before we decide.
* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Markus Triska @ 2011-07-11 16:21 UTC
To: Glenn Morris; +Cc: 2940
Glenn Morris <rgm@gnu.org> writes:
>> (require 'ucs-normalize)
>> (setq file-name-coding-system 'utf-8-hfs)
>
> I know nothing about this area: Is this an acceptable solution (in
> which case I will close this report); or should it work out-of-the-box
> with no configuration?
In my personal use, I expected it to work out of the box. It was also
initially unclear to me that you need to explicitly require
ucs-normalize in order to use utf-8-hfs. When you do, in "emacs -Q":
(setq file-name-coding-system 'utf-8-hfs)
you cannot do much anymore with that Emacs instance, since you get
"Invalid coding system: utf-8-hfs" on almost all key presses. This is a
general issue though that also happens when you mistype a coding system.
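One way to avoid ending up with an invalid coding system is to set the
variable only after the library has loaded and the coding system is
known to exist. A minimal sketch for ~/.emacs, assuming utf-8-hfs is
defined by ucs-normalize as described in this thread:

```elisp
;; Set the file-name coding system only if ucs-normalize loads
;; and utf-8-hfs is actually a known coding system afterwards.
(when (and (eq system-type 'darwin)
           (require 'ucs-normalize nil t)   ; NOERROR: returns nil if the library is missing
           (coding-system-p 'utf-8-hfs))
  (setq file-name-coding-system 'utf-8-hfs))
```

If any of the three conditions fails, file-name-coding-system is simply
left at its previous value.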