unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Markus Triska @ 2009-04-09 16:28 UTC
  To: emacs-pretest-bug


With ~/töst.txt existing, when I do:

   $ emacs -Q ~/

and press:

   C-\ german-postfix RET C-s oe RET

to search for "ö" in the dired buffer (the input method correctly
converts the entered "oe" to "ö" in the minibuffer), I get:

   Failing wrapped I-search [DE<]: ö

C-u C-x = on the "ö" in the dired buffer yields:

             character: o (111, #o157, #x6f)
     preferred charset: ascii (ASCII (ISO646 IRV))
            code point: 0x6F
                syntax: w 	which means: word
              category: .:Base, a:ASCII, l:Latin, r:Roman
           buffer code: #x6F
             file code: #x6F (encoded by coding system utf-8-unix)
               display: composed to form "ö" (see below)

     Composed with the following character(s) "̈" using this font:
       xft:-unknown-Cochin-normal-normal-normal-*-20-*-*-*-*-0-iso10646-1
     by these glyphs:
       [0 1 111 82 11 1 10 8 0 nil]
       [0 1 776 235 6 0 6 12 -9 [-9 -1 0]]

     Character code properties: customize what to show
       name: LATIN SMALL LETTER O
       general-category: Ll (Letter, Lowercase)

     There are text properties here:
       dired-filename       t
       fontified            t
       help-echo            "mouse-2: visit this file in other window"
       mouse-face           highlight

C-u C-x = on the first "t" in "töst.txt" yields:

             character: t (116, #o164, #x74)
     preferred charset: ascii (ASCII (ISO646 IRV))
            code point: 0x74
                syntax: w 	which means: word
              category: .:Base, a:ASCII, l:Latin, r:Roman
           buffer code: #x74
             file code: #x74 (encoded by coding system utf-8-unix)
               display: by this font (glyph code)
         xft:-bitstream-Bitstream Vera Sans Mono-normal-normal-normal-*-20-*-*-*-m-0-iso10646-1 (#x57)

     Character code properties: customize what to show
       name: LATIN SMALL LETTER T
       general-category: Ll (Letter, Lowercase)

     There are text properties here:
       dired-filename       t
       fontified            t
       help-echo            "mouse-2: visit this file in other window"
       mouse-face           highlight


In GNU Emacs 23.0.92.3 (i386-apple-darwin9.6.1, GTK+ Version 2.14.7)
 of 2009-04-09 on mt-imac.local
Windowing system distributor `The X.Org Foundation', version 11.0.10402000
configured using `configure  '--with-tiff=no''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Eli Zaretskii @ 2009-04-09 17:22 UTC
  To: Markus Triska, 2940

> From: Markus Triska <markus.triska@gmx.at>
> Date: Thu, 09 Apr 2009 18:28:48 +0200
> Cc: 
> 
> 
> With ~/töst.txt existing, when I do:
> 
>    $ emacs -Q ~/
> 
> and press:
> 
>    C-\ german-postfix RET C-s oe RET
> 
> to search for "ö" in the dired buffer (the input method correctly
> converts the entered "oe" to "ö" in the minibuffer), I get:
> 
>    Failing wrapped I-search [DE<]: ö

What's your value of file-name-coding-system?  Does it help to say

  C-x RET c utf-8 RET C-x d

instead of just "C-x d"?

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Markus Triska @ 2009-04-09 17:33 UTC
  To: Eli Zaretskii; +Cc: 2940

Eli Zaretskii <eliz@gnu.org> writes:

> What's your value of file-name-coding-system?

It is nil, and default-file-name-coding-system is 'utf-8.

> Does it help to say
>
>   C-x RET c utf-8 RET C-x d
>
> instead of just "C-x d"?

No, unfortunately not. Also for C-s it does not seem to make a
difference. When I enter an "ö" in *scratch*, C-u C-x = on it says:

              character: ö (246, #o366, #xf6)
      preferred charset: unicode (Unicode (ISO10646))
             code point: 0xF6
                 syntax: w 	which means: word
               category: .:Base, j:Japanese, l:Latin
               to input: type "oe" with german-postfix
            buffer code: #xC3 #xB6
              file code: #xC3 #xB6 (encoded by coding system utf-8-unix)
                display: by this font (glyph code)
          xft:-bitstream-Bitstream Vera Sans Mono-normal-normal-normal-*-20-*-*-*-m-0-iso10646-1 (#x7C)

      Character code properties: customize what to show
        name: LATIN SMALL LETTER O WITH DIAERESIS
        old-name: LATIN SMALL LETTER O DIAERESIS
        general-category: Ll (Letter, Lowercase)
        decomposition: (111 776) ('o' '̈')

      There are text properties here:
        fontified            t

This "ö" is thus also rendered with the expected font, in contrast to
the one in dired.
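The two "C-u C-x =" reports above pin down the mismatch: the dired name holds "o" (111) followed by U+0308 COMBINING DIAERESIS (776), while german-postfix inserts the single character U+00F6 (246). A minimal Emacs Lisp sketch of why a literal search cannot match; the my- names are hypothetical, chosen only for illustration:

```elisp
;; "töst.txt" as dired gets it from the file system: "o" followed by
;; U+0308 COMBINING DIAERESIS (decomposed form).
(defconst my-dired-name (concat "t" (string ?o #x308) "st.txt"))

;; What german-postfix inserts for "oe": the single character U+00F6.
(defconst my-search-string (string #xf6))

;; The code point sequences differ, so a literal match fails, as C-s did.
(string-match-p (regexp-quote my-search-string) my-dired-name) ; => nil

;; The decomposed name is one character longer than "töst.txt" typed directly.
(length my-dired-name)                                         ; => 9
```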

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Miles Bader @ 2009-04-10  2:08 UTC
  To: Markus Triska; +Cc: emacs-pretest-bug, 2940

It looks to me like the problem is that you're on a mac, and [some?] mac
filesystems silently convert accented characters in filenames to
"decomposed form", which is different from the precomposed characters
people tend to use.
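Assuming the ucs-normalize library is available (it was later bundled with Emacs), normalizing to NFC maps the decomposed name back onto the precomposed character the user typed. A minimal sketch; the my- names are hypothetical:

```elisp
(require 'ucs-normalize)

(defconst my-precomposed (string #xf6))     ; U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
(defconst my-decomposed (string ?o #x308))  ; "o" + U+0308 COMBINING DIAERESIS

;; Canonically equivalent, but different code point sequences:
(string= my-precomposed my-decomposed)                            ; => nil
;; NFC composes the pair into U+00F6; NFD splits U+00F6 into the pair.
(string= my-precomposed (ucs-normalize-NFC-string my-decomposed)) ; => t
(string= (ucs-normalize-NFD-string my-precomposed) my-decomposed) ; => t
```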

Perhaps the new "ucs-normalize" code (which should be added soon I
think) would help:

> The attached is a Unicode normalization tool contributed by
> Kawabata-san.  It performs all the Unicode normalization
> NFC/NFD/NFKD/NFKC, and provides a coding system utf-8-hfs
> that is suitable to be used for Mac OS 8.1's file names.

[Search for "normalize.el" on recent emacs-devel messages]

-Miles

-- 
A zen-buddhist walked into a pizza shop and
said, "Make me one with everything."

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Stefan Monnier @ 2009-04-10  5:53 UTC
  To: Miles Bader; +Cc: emacs-pretest-bug, Markus Triska, 2940

> It looks to me like the problem is that you're on a mac, and [some?]
> mac filesystems silently convert accented characters in filenames to
> "decomposed form", which is different from the precomposed characters
> people tend to use.

Indeed, that looks like the culprit (IIUC it's not done by the
filesystem, but by the OS itself before it passes the file names to the
filesystem, so it applies to all filesystems).

> Perhaps the new "ucs-normalize" code (which should be added soon I
> think) would help:

Rather than "perhaps", it should say "supposedly".   Please try out this
new ucs-normalize package and tell us if it solves your problem and/or
suffers from other problems.  It likely won't make it for Emacs-23.1 but
should be included in Emacs-23.2.


        Stefan

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Markus Triska @ 2009-04-10 10:48 UTC
  To: gnu-emacs-bug

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> Please try out this new ucs-normalize package and tell us if it solves
> your problem and/or suffers from other problems.

M-x eval-buffer on the (ucs-)normalize.el file posted at:

   http://lists.gnu.org/archive/html/emacs-devel/2009-04/msg00185.html

yields:

   End of file during parsing

After I insert an additional closing parenthesis on line 128 (after the
(defconst ucs-normalize-composition-exclusions ...) form), it yields:

   Symbol's value as variable is void: in

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Kenichi Handa @ 2009-04-10 11:34 UTC
  To: Markus Triska, 2940; +Cc: gnu-emacs-bug

In article <m27i1syebo.fsf@gmx.at>, Markus Triska <markus.triska@gmx.at> writes:

> Stefan Monnier <monnier@iro.umontreal.ca> writes:
> > Please try out this new ucs-normalize package and tell us if it solves
> > your problem and/or suffers from other problems.

> M-x eval-buffer on the (ucs-)normalize.el file posted at:

>    http://lists.gnu.org/archive/html/emacs-devel/2009-04/msg00185.html

> yields:

>    End of file during parsing

The above page puts an extra ";" at line 127.  Why does that happen?

Anyway, it seems that the posted ucs-normalize.el (and the
original normalize.el) has a bug.  I'm now asking
Kawabata-san to fix it.

---
Kenichi Handa
handa@m17n.org

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Kenichi Handa @ 2009-08-13 12:25 UTC
  To: Markus Triska, 2940; +Cc: gnu-emacs-bug

In article <m27i1syebo.fsf@gmx.at>, Markus Triska <markus.triska@gmx.at> writes:

> Stefan Monnier <monnier@iro.umontreal.ca> writes:
> > Please try out this new ucs-normalize package and tell us if it solves
> > your problem and/or suffers from other problems.

> M-x eval-buffer on the (ucs-)normalize.el file posted at:

>    http://lists.gnu.org/archive/html/emacs-devel/2009-04/msg00185.html

> yields:

>    End of file during parsing

I've just committed a new version of ucs-normalize.el.
Could you please try it?

By the way, it contains several autoload cookies.  Should I
re-generate loaddefs.el, copy it to ldefs-boot.el, and
commit it?

---
Kenichi Handa
handa@m17n.org

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Glenn Morris @ 2011-07-10 18:26 UTC
  To: Markus Triska; +Cc: 2940


It was suggested that the ucs-normalize library, which has been part of
Emacs for some time, should fix this. Does it? Please reply and let us
know if you still see this problem in 23.3.

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Markus Triska @ 2011-07-10 21:11 UTC
  To: Glenn Morris; +Cc: 2940

Glenn Morris <rgm@gnu.org> writes:

> It was suggested that the ucs-normalize library, which has been part of
> Emacs for some time, should fix this. Does it?

Thank you, it does if I add this to ~/.emacs:

   (require 'ucs-normalize)
   (setq file-name-coding-system 'utf-8-hfs)

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Glenn Morris @ 2011-07-11  2:01 UTC
  To: Markus Triska; +Cc: 2940

Markus Triska wrote:

> Thank you, it does if I add this to ~/.emacs:
>
>    (require 'ucs-normalize)
>    (setq file-name-coding-system 'utf-8-hfs)

I know nothing about this area: Is this an acceptable solution (in which
case I will close this report); or should it work out-of-the-box with no
configuration?

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Eli Zaretskii @ 2011-07-11  2:56 UTC
  To: Glenn Morris; +Cc: 2940, markus.triska

> From: Glenn Morris <rgm@gnu.org>
> Date: Sun, 10 Jul 2011 22:01:44 -0400
> Cc: 2940@debbugs.gnu.org
> 
> Markus Triska wrote:
> 
> > Thank you, it does if I add this to ~/.emacs:
> >
> >    (require 'ucs-normalize)
> >    (setq file-name-coding-system 'utf-8-hfs)
> 
> I know nothing about this area: Is this an acceptable solution (in which
> case I will close this report); or should it work out-of-the-box with no
> configuration?

It could be that Emacs should do this on that platform automatically,
yes.  But some Darwin expert should look into this and provide
feedback, before we decide.

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Markus Triska @ 2011-07-11 16:21 UTC
  To: Glenn Morris; +Cc: 2940

Glenn Morris <rgm@gnu.org> writes:

>>    (require 'ucs-normalize)
>>    (setq file-name-coding-system 'utf-8-hfs)
>
> I know nothing about this area: Is this an acceptable solution (in
> which case I will close this report); or should it work out-of-the-box
> with no configuration?

In my personal use, I expected it to work out of the box. It was also
initially unclear to me that you need to explicitly require
ucs-normalize in order to use utf-8-hfs. If you evaluate this in
"emacs -Q" without loading ucs-normalize first:

   (setq file-name-coding-system 'utf-8-hfs)

you cannot do much anymore with that Emacs instance, since you get
"Invalid coding system: utf-8-hfs" on almost all key presses. This is a
general issue though that also happens when you mistype a coding system.
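That lockup can be avoided in an init file by switching file-name-coding-system only once the coding system is known to exist. A defensive sketch (an illustration, not a fix proposed in this thread):

```elisp
;; `require' with a non-nil NOERROR argument returns nil instead of
;; signaling an error when the library cannot be found.
(when (and (require 'ucs-normalize nil t)
           ;; Switch only if utf-8-hfs is actually defined, so an unknown
           ;; or misspelled coding system never lands in
           ;; file-name-coding-system.
           (coding-system-p 'utf-8-hfs))
  (setq file-name-coding-system 'utf-8-hfs))
```

The same coding-system-p guard returns nil for a misspelled name, so the setq is simply skipped instead of breaking every later file operation.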

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Alp Aker @ 2011-07-11 22:02 UTC
  To: Eli Zaretskii; +Cc: 2940, markus.triska

Eli Zaretskii wrote:

>>>    (require 'ucs-normalize)
>>>    (setq file-name-coding-system 'utf-8-hfs)
>
> It could be that Emacs should do this on that platform automatically, 
> yes.  But some Darwin expert should look into this and provide feedback, 
> before we decide.

I'm no expert, but it doesn't look as if this is necessary. 
/lisp/term/ns-win.el already defines a coding system utf-8-nfd that 
performs normalization and it sets that as the value of 
file-name-coding-system.  This takes care of the fact that the HFS+ 
filesystem uses decomposed file names, and indeed I can't reproduce (in 
either 24.0.50 or 23.3) the behavior described in the original bug report.

OTOH, the code in question has been present in ns-win.el since the NS code 
was first merged into the main branch (rev 89434), so I'm not sure how the 
OP's problem arose in the first place.

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Glenn Morris @ 2011-07-15 20:38 UTC
  To: Alp Aker; +Cc: 2940, markus.triska

Alp Aker wrote:

> OTOH, the code in question has been present in ns-win.el since the NS
> code was first merged into the main branch (rev 89434), so I'm not
> sure how the OP's problem arose in the first place.

IIUC, he's not using a --with-ns build. It's a "normal", gtk build that
happens to be running on a Mac. So ns-win.el isn't in use.
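The distinction drawn here can be written as a predicate. A small sketch, assuming that NS (Cocoa) builds provide the feature ns; the function name is hypothetical:

```elisp
(defun my-darwin-without-ns-p ()
  "Return non-nil on Darwin when this Emacs is not an NS (Cocoa) build.
In that case ns-win.el, which sets file-name-coding-system to
utf-8-nfd, is not loaded."
  (and (eq system-type 'darwin)
       (not (featurep 'ns))))
```

A guard like this would skip NS builds, where ns-win.el already handles decomposed file names, and apply only to X/GTK builds such as the one in the original report.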

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Alp Aker @ 2011-07-16 17:38 UTC
  To: Glenn Morris; +Cc: 2940, markus.triska

Glenn Morris wrote:

> IIUC, he's not using a --with-ns build. It's a "normal", gtk build that
> happens to be running on a Mac. So ns-win.el isn't in use.

My mistake; since it was running on Darwin I just assumed an NS build, and 
didn't look at the build info in the original bug report.

Making this the default behavior for non-NS builds running on a Mac is 
probably TRT.  It was once possible to use Darwin with UFS, but that 
hasn't been true for the last three major versions, so going forward it 
will be a vanishingly rare case where (eq system-type 'darwin) doesn't 
imply that the file system is a variant of HFS+.  And it's reasonable for 
users to expect that Emacs will, out of the box, properly handle file 
names on the system it was built on.

OTOH, just adding something like:

  (when (eq system-type 'darwin)
    (require 'ucs-normalize)
    (setq file-name-coding-system 'utf-8-hfs))

to x-win.el might not be the best solution.  The utf-8-hfs coding system 
does both post-read conversion (normalizing to precomposed utf-8) and 
pre-write conversion (normalizing to Apple's variant of decomposed utf-8). 
The latter is unnecessary:  the OS itself will do normalization on any 
filename handed to it.  (Observe that the coding system defined in 
ns-win.el only does post-read conversion.)

For local operations, the redundant pre-write conversion is harmless. 
But using decomposed utf-8 might cause trouble when dealing with remote 
files.  So it's probably more robust to follow ns-win.el's lead and define 
a coding system that only does post-read conversion.  Thus:

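   (when (eq system-type 'darwin)
     (require 'ucs-normalize)
     ;; Like utf-8-hfs, but with no pre-write conversion: file names read
     ;; from the OS are normalized to precomposed form, while names passed
     ;; to the OS are sent as-is and left for the OS to normalize.
     (define-coding-system 'utf-8-hfs-for-read
       "UTF-8 based coding system for HFS+ file names."
       :coding-type 'utf-8
       :mnemonic ?U
       :charset-list '(unicode)
       :post-read-conversion 'ucs-normalize-hfs-nfd-post-read-conversion)
     (setq file-name-coding-system 'utf-8-hfs-for-read))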

would be the addition to make to x-win.el.
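The difference between the two conversions can be checked directly. A minimal sketch, assuming a copy of the definition above registered under the hypothetical name my-utf-8-hfs-for-read and without the darwin guard, so it can be tried on any platform:

```elisp
(require 'ucs-normalize)

;; Same properties as utf-8-hfs-for-read above, minus the darwin check.
(define-coding-system 'my-utf-8-hfs-for-read
  "UTF-8 with HFS+ normalization applied only when decoding."
  :coding-type 'utf-8
  :mnemonic ?U
  :charset-list '(unicode)
  :post-read-conversion 'ucs-normalize-hfs-nfd-post-read-conversion)

(defconst my-precomposed (string #xf6))     ; U+00F6
(defconst my-decomposed (string ?o #x308))  ; "o" + U+0308

;; utf-8-hfs also decomposes outgoing names (pre-write conversion) ...
(string-to-list (encode-coding-string my-precomposed 'utf-8-hfs))
;; => (111 204 136), the UTF-8 bytes of "o" + U+0308

;; ... while the read-only variant leaves them precomposed,
(string-to-list (encode-coding-string my-precomposed 'my-utf-8-hfs-for-read))
;; => (195 182), the UTF-8 bytes of U+00F6

;; and both compose decomposed names read back from the file system.
(string= (decode-coding-string (encode-coding-string my-decomposed 'utf-8)
                               'my-utf-8-hfs-for-read)
         my-precomposed)
;; => t
```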

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Stefan Kangas @ 2019-11-02  6:12 UTC
  To: Markus Triska; +Cc: 2940

Markus Triska <markus.triska@gmx.at> writes:

> With ~/töst.txt existing, when I do:
>
>    $ emacs -Q ~/
>
> and press:
>
>    C-\ german-postfix RET C-s oe RET
>
> to search for "ö" in the dired buffer (the input method correctly
> converts the entered "oe" to "ö" in the minibuffer), I get:
>
>    Failing wrapped I-search [DE<]: ö

I can't reproduce this on current master.  Are you still seeing this
on a modern version of Emacs?

If I don't hear back from you within a couple of weeks, I'll just
close this bug as unreproducible.

Best regards,
Stefan Kangas


Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
