unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Markus Triska @ 2009-04-09 16:28 UTC (permalink / raw)
  To: emacs-pretest-bug


With ~/töst.txt existing, when I do:

   $ emacs -Q ~/

and press:

   C-\ german-postfix RET C-s oe RET

to search for "ö" in the dired buffer (the input method correctly
converts the entered "oe" to "ö" in the minibuffer), I get:

   Failing wrapped I-search [DE<]: ö

C-u C-x = on the "ö" in the dired buffer yields:

             character: o (111, #o157, #x6f)
     preferred charset: ascii (ASCII (ISO646 IRV))
            code point: 0x6F
                syntax: w 	which means: word
              category: .:Base, a:ASCII, l:Latin, r:Roman
           buffer code: #x6F
             file code: #x6F (encoded by coding system utf-8-unix)
               display: composed to form "ö" (see below)

     Composed with the following character(s) "̈" using this font:
       xft:-unknown-Cochin-normal-normal-normal-*-20-*-*-*-*-0-iso10646-1
     by these glyphs:
       [0 1 111 82 11 1 10 8 0 nil]
       [0 1 776 235 6 0 6 12 -9 [-9 -1 0]]

     Character code properties: customize what to show
       name: LATIN SMALL LETTER O
       general-category: Ll (Letter, Lowercase)

     There are text properties here:
       dired-filename       t
       fontified            t
       help-echo            "mouse-2: visit this file in other window"
       mouse-face           highlight

C-u C-x = on the first "t" in "töst.txt" yields:

             character: t (116, #o164, #x74)
     preferred charset: ascii (ASCII (ISO646 IRV))
            code point: 0x74
                syntax: w 	which means: word
              category: .:Base, a:ASCII, l:Latin, r:Roman
           buffer code: #x74
             file code: #x74 (encoded by coding system utf-8-unix)
               display: by this font (glyph code)
         xft:-bitstream-Bitstream Vera Sans Mono-normal-normal-normal-*-20-*-*-*-m-0-iso10646-1 (#x57)

     Character code properties: customize what to show
       name: LATIN SMALL LETTER T
       general-category: Ll (Letter, Lowercase)

     There are text properties here:
       dired-filename       t
       fontified            t
       help-echo            "mouse-2: visit this file in other window"
       mouse-face           highlight


In GNU Emacs 23.0.92.3 (i386-apple-darwin9.6.1, GTK+ Version 2.14.7)
 of 2009-04-09 on mt-imac.local
Windowing system distributor `The X.Org Foundation', version 11.0.10402000
configured using `configure  '--with-tiff=no''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Eli Zaretskii @ 2009-04-09 17:22 UTC (permalink / raw)
  To: Markus Triska, 2940

> From: Markus Triska <markus.triska@gmx.at>
> Date: Thu, 09 Apr 2009 18:28:48 +0200
> Cc: 
> 
> 
> With ~/töst.txt existing, when I do:
> 
>    $ emacs -Q ~/
> 
> and press:
> 
>    C-\ german-postfix RET C-s oe RET
> 
> to search for "ö" in the dired buffer (the input method correctly
> converts the entered "oe" to "ö" in the minibuffer), I get:
> 
>    Failing wrapped I-search [DE<]: ö

What's your value of file-name-coding-system?  Does it help to say

  C-x RET c utf-8 RET C-x d

instead of just "C-x d"?

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Markus Triska @ 2009-04-09 17:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 2940

Eli Zaretskii <eliz@gnu.org> writes:

> What's your value of file-name-coding-system?

It is nil, and default-file-name-coding-system is 'utf-8.

> Does it help to say
>
>   C-x RET c utf-8 RET C-x d
>
> instead of just "C-x d"?

No, unfortunately not; it does not seem to make a difference for C-s
either. When I enter an "ö" in *scratch*, C-u C-x = on it says:

              character: ö (246, #o366, #xf6)
      preferred charset: unicode (Unicode (ISO10646))
             code point: 0xF6
                 syntax: w 	which means: word
               category: .:Base, j:Japanese, l:Latin
               to input: type "oe" with german-postfix
            buffer code: #xC3 #xB6
              file code: #xC3 #xB6 (encoded by coding system utf-8-unix)
                display: by this font (glyph code)
          xft:-bitstream-Bitstream Vera Sans Mono-normal-normal-normal-*-20-*-*-*-m-0-iso10646-1 (#x7C)

      Character code properties: customize what to show
        name: LATIN SMALL LETTER O WITH DIAERESIS
        old-name: LATIN SMALL LETTER O DIAERESIS
        general-category: Ll (Letter, Lowercase)
        decomposition: (111 776) ('o' '̈')

      There are text properties here:
        fontified            t

This "ö" is thus also rendered with the expected font, in contrast to
the one in dired.

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Alp Aker @ 2011-07-11 22:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 2940, markus.triska

Eli Zaretskii wrote:

>>>    (require 'ucs-normalize)
>>>    (setq file-name-coding-system 'utf-8-hfs)
>
> It could be that Emacs should do this on that platform automatically, 
> yes.  But some Darwin expert should look into this and provide feedback, 
> before we decide.

I'm no expert, but it doesn't look as if this is necessary. 
/lisp/term/ns-win.el already defines a coding system utf-8-nfd that 
performs normalization and it sets that as the value of 
file-name-coding-system.  This takes care of the fact that the HFS+ 
filesystem uses decomposed file names, and indeed I can't reproduce (in 
either 24.0.50 or 23.3) the behavior described in the original bug report.

OTOH, the code in question has been present in ns-win.el since the NS code 
was first merged into the main branch (rev 89434), so I'm not sure how the 
OP's problem arose in the first place.
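
The mismatch between the typed "ö" and the decomposed HFS+ file name can
be made concrete with a minimal Emacs Lisp sketch (not from the thread).
It assumes the ucs-normalize library mentioned above and that the name
reaches the dired buffer in decomposed form; the names my-nfc-equal-p,
my-typed and my-on-disk are hypothetical, for illustration only.

```elisp
(require 'ucs-normalize)

(defun my-nfc-equal-p (a b)
  "Return non-nil if strings A and B are equal after NFC normalization.
Hypothetical helper, for illustration only."
  (string= (ucs-normalize-NFC-string a)
           (ucs-normalize-NFC-string b)))

;; "ö" as produced by german-postfix: one precomposed character, U+00F6.
(defconst my-typed "t\u00f6st.txt")
;; The same name as HFS+ stores it: "o" followed by COMBINING DIAERESIS, U+0308.
(defconst my-on-disk "to\u0308st.txt")

;; A literal character-by-character comparison fails ...
(string= my-typed my-on-disk)          ; => nil
;; ... but the two agree once both are normalized to NFC.
(my-nfc-equal-p my-typed my-on-disk)   ; => t
```

This matches the C-u C-x = output in the original report, where the
dired "ö" is an ASCII "o" composed with a following U+0308.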

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Glenn Morris @ 2011-07-15 20:38 UTC (permalink / raw)
  To: Alp Aker; +Cc: 2940, markus.triska

Alp Aker wrote:

> OTOH, the code in question has been present in ns-win.el since the NS
> code was first merged into the main branch (rev 89434), so I'm not
> sure how the OP's problem arose in the first place.

IIUC, he's not using a --with-ns build. It's a "normal", gtk build that
happens to be running on a Mac. So ns-win.el isn't in use.

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Alp Aker @ 2011-07-16 17:38 UTC (permalink / raw)
  To: Glenn Morris; +Cc: 2940, markus.triska

Glenn Morris wrote:

> IIUC, he's not using a --with-ns build. It's a "normal", gtk build that
> happens to be running on a Mac. So ns-win.el isn't in use.

My mistake; since it was running on Darwin I just assumed an NS build, and 
didn't look at the build info in the original bug report.

Making this the default behavior for non-NS builds running on a Mac is 
probably the right thing (TRT).  It was once possible to use Darwin with UFS, but that 
hasn't been true for the last three major versions, so going forward it 
will be a vanishingly rare case where (eq system-type 'darwin) doesn't 
imply that the file system is a variant of HFS+.  And it's reasonable for 
users to expect that Emacs will, out of the box, properly handle file 
names on the system it was built on.

OTOH, just adding something like:

   (when (eq system-type 'darwin)
     (require 'ucs-normalize)
     (setq file-name-coding-system 'utf-8-hfs))

to x-win.el might not be the best solution.  The utf-8-hfs coding system 
does both post-read conversion (normalizing to precomposed utf-8) and 
pre-write conversion (normalizing to Apple's variant of decomposed utf-8). 
The latter is unnecessary:  the OS itself will do normalization on any 
filename handed to it.  (Observe that the coding system defined in 
ns-win.el only does post-read conversion.)
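
The two directions of utf-8-hfs can be observed directly with
decode-coding-string and encode-coding-string.  This is a minimal sketch,
assuming the utf-8-hfs coding system that ucs-normalize defines; the
constant name my-hfs-bytes is hypothetical.

```elisp
(require 'ucs-normalize)   ; defines the utf-8-hfs coding system

;; Raw bytes of "ö" as HFS+ stores it: "o" (#x6F) + U+0308 (#xCC #x88).
(defconst my-hfs-bytes (encode-coding-string "o\u0308" 'utf-8))

;; Post-read conversion: decoding composes the pair into a single U+00F6.
(decode-coding-string my-hfs-bytes 'utf-8-hfs)   ; => "ö" (1 character)
;; Plain utf-8 leaves the name decomposed (2 characters).
(decode-coding-string my-hfs-bytes 'utf-8)

;; Pre-write conversion: encoding a precomposed "ö" decomposes it again.
(string-to-list (encode-coding-string "\u00f6" 'utf-8-hfs))  ; => (111 204 136)
(string-to-list (encode-coding-string "\u00f6" 'utf-8))      ; => (195 182)
```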

For local operations, the redundant pre-write conversion is harmless. 
But using decomposed utf-8 might cause trouble when dealing with remote 
files.  So it's probably more robust to follow ns-win.el's lead and define 
a coding system that only does post-read conversion.  Thus:

   (when (eq system-type 'darwin)
     (require 'ucs-normalize)
     (define-coding-system 'utf-8-hfs-for-read
       "UTF-8 based coding system for HFS+ file names."
       :coding-type 'utf-8
       :mnemonic ?U
       :charset-list '(unicode)
       :post-read-conversion 'ucs-normalize-hfs-nfd-post-read-conversion)
     (setq file-name-coding-system 'utf-8-hfs-for-read))

would be the addition to make to x-win.el.
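
The proposed utf-8-hfs-for-read can be checked the same way.  The sketch
below repeats the definition from the message without the system-type
guard, so it can be tried on any platform, and shows that it composes
names on read but leaves precomposed names untouched on write.  It is an
illustration under those assumptions, not a tested patch.

```elisp
(require 'ucs-normalize)

;; Definition as proposed above, minus the (eq system-type 'darwin) guard.
(define-coding-system 'utf-8-hfs-for-read
  "UTF-8 based coding system for HFS+ file names."
  :coding-type 'utf-8
  :mnemonic ?U
  :charset-list '(unicode)
  :post-read-conversion 'ucs-normalize-hfs-nfd-post-read-conversion)

;; Reading: the decomposed on-disk name comes back precomposed.
(decode-coding-string (encode-coding-string "to\u0308st.txt" 'utf-8)
                      'utf-8-hfs-for-read)          ; => "töst.txt"

;; Writing: no pre-write conversion, so the bytes stay plain UTF-8
;; (#xC3 #xB6) and the OS is left to normalize the name itself.
(string-to-list (encode-coding-string "\u00f6" 'utf-8-hfs-for-read))  ; => (195 182)
```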

* bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
From: Stefan Kangas @ 2019-11-02  6:12 UTC (permalink / raw)
  To: Markus Triska; +Cc: 2940

Markus Triska <markus.triska@gmx.at> writes:

> With ~/töst.txt existing, when I do:
>
>    $ emacs -Q ~/
>
> and press:
>
>    C-\ german-postfix RET C-s oe RET
>
> to search for "ö" in the dired buffer (the input method correctly
> converts the entered "oe" to "ö" in the minibuffer), I get:
>
>    Failing wrapped I-search [DE<]: ö

I can't reproduce this on current master.  Are you still seeing this
on a modern version of Emacs?

If I don't hear back from you within a couple of weeks, I'll just
close this bug as unreproducible.

Best regards,
Stefan Kangas

* bug#2940: Aw: Re: 23.0.92; C-s in dired fails to find files with umlauts
From: Markus Triska @ 2019-11-02  9:17 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: 2940

> I can't reproduce this on current master. Are you still seeing this
> on a modern version of Emacs?

Yes, I can reproduce this exact same issue with Emacs 26.1 on OSX,
and also with a recent version of Debian.

All the best,
Markus

Thread overview: 8+ messages
2009-04-09 16:28 bug#2940: 23.0.92; C-s in dired fails to find files with umlauts Markus Triska
2009-04-09 17:22 ` Eli Zaretskii
2009-04-09 17:33   ` Markus Triska
2011-07-11 22:02 ` Alp Aker
2011-07-15 20:38   ` Glenn Morris
2011-07-16 17:38     ` Alp Aker
2019-11-02  6:12 ` Stefan Kangas
2019-11-02  9:17   ` bug#2940: Aw: " Markus Triska

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git