* bug#4157: 23.1.50; faulty character characterisation for ä
@ 2009-08-16  2:19 Peter Dyballa
  2009-08-18  1:09 ` Kenichi Handa
                   ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Peter Dyballa @ 2009-08-16  2:19 UTC
  To: emacs-pretest-bug


Hello!

When I launch GNU Emacs in an ISO Latin environment (env
LC_CTYPE=de_DE.ISO8859-15 LANG=de_DE.ISO8859-15
/usr/local/bin/emacs-23.1.50 -Q &) and display in dired a directory with
entries from some month of March, the "Mär" abbreviation for the German
month name "März" is displayed as M\344r. C-u C-x = on this \344 reveals:

	        character: \344 (4194276, #o17777744, #x3fffe4)
	preferred charset: eight-bit (Raw bytes 128-255)
	       code point: 0xE4
	           syntax: w 	which means: word
	      buffer code: #xE4
	        file code: not encodable by coding system iso-latin-9-unix
	          display: no font available
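
The three numbers on the "character:" line above are consistent with how Emacs 23 represents an undecoded byte internally: raw bytes 128-255 are mapped to the top of the character code space by adding 0x3FFF00. A minimal sketch of that arithmetic follows; the function name `raw_byte_char` is made up for illustration and is not an Emacs API.

```python
# Offset Emacs adds to a raw byte (0x80..0xFF) to form an "eight-bit" character.
RAW_BYTE_OFFSET = 0x3FFF00


def raw_byte_char(byte: int) -> int:
    """Return the Emacs-internal character code for an undecoded byte.

    Hypothetical helper for illustration only.
    """
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("only bytes 128-255 are represented as raw bytes")
    return RAW_BYTE_OFFSET + byte


code = raw_byte_char(0xE4)
# Decimal, octal and hex forms, matching the describe-char output above.
print(code, oct(code), hex(code))  # 4194276 0o17777744 0x3fffe4
```

So the \344 in the dired buffer is the byte 0xE4 that was never decoded, not the character U+00E4.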

The dired buffer has a 0 as encoding indicator. In the ISO Latin 1 and 15
encodings, LATIN SMALL LETTER A WITH DIAERESIS is \344 = 228 = 0xE4 =
U+00E4, a valid character and not some raw "eight-bit" entity. This could
be what prevents proper display:


[-- Attachment #2: pastedGraphic.tiff --]
[-- Type: image/tiff, Size: 9998 bytes --]




In *shell* buffer both Apple's ls and GNU's gls display:


[-- Attachment #4: pastedGraphic.tiff --]
[-- Type: image/tiff, Size: 9152 bytes --]




Here the ä is described as:

	        character: ä (228, #o344, #xe4)
	preferred charset: iso-8859-15 (ISO/IEC 8859/15)
	       code point: 0xE4
	           syntax: w 	which means: word
	         category: .:Base, j:Japanese, l:Latin
	      buffer code: #xC3 #xA4
	        file code: #xE4 (encoded by coding system iso-latin-9-unix)
	          display: by this font (glyph code)
	    x:-b&h-lucidatypewriter-medium-r-normal-sans-10-100-75-75-m-60-iso10646-1 (#xE4)

The buffer's encoding is "0" as well, i.e., ISO Latin 1 or 15.
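
The "buffer code" and "file code" lines above can be reproduced outside Emacs. The sketch below decodes the bytes of "Mär" as ISO 8859-15 and prints, for each character, its code point and its UTF-8 bytes (the multibyte buffer representation reported above as #xC3 #xA4). The helper name `describe` is an assumption for illustration.

```python
def describe(raw: bytes, encoding: str = "iso8859-15"):
    """Decode raw bytes and list (char, code point, UTF-8 bytes) per character.

    Hypothetical helper for illustration only.
    """
    text = raw.decode(encoding)
    rows = []
    for ch in text:
        utf8 = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
        rows.append((ch, f"U+{ord(ch):04X}", utf8))
    return rows


# "Mär" as it arrives from a Latin-9 locale: the middle byte is 0xE4.
for row in describe(b"M\xe4r"):
    print(row)
# ('M', 'U+004D', '4D')
# ('ä', 'U+00E4', 'C3 A4')
# ('r', 'U+0072', '72')
```

When the decoding step happens, 0xE4 becomes U+00E4; when it is skipped, the byte stays a raw eight-bit character, as in the dired buffer.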

BTW, this works correctly in a UTF-8 environment.


In GNU Emacs 23.1.50.1 (powerpc-apple-darwin8.11.0, X toolkit, Xaw3d scroll bars)
  of 2009-07-30 on Latsche.local
Windowing system distributor `The XFree86 Project, Inc', version 11.0.40400000
configured using `configure  '--without-sound' '--without-pop'
'--with-dbus' '--with-libotf' '--with-x-toolkit=athena'
'--x-includes=/usr/X11R6/include' '--x-libraries=/usr/X11R6/lib'
'--enable-locallisppath=/Library/Application Support/Emacs/calendar23:/Library/Application Support/Emacs'
'CPPFLAGS=-no-cpp-precomp -I/sw/include -I/sw/lib/pango-ft219/include/pango-1.0 -idirafter /usr/X11R6/include'
'CFLAGS=-ggdb3 -gfull -mtraceback=full -Wno-pointer-sign -H -pipe -fPIC -mcpu=7450 -mtune=7450 -fast -mpim-altivec -ftree-vectorize -foptimize-register-move -freorder-blocks -fthread-jumps -fpeephole -fno-crossjumping'
'LDFLAGS=-dead_strip -multiply_defined suppress -L/sw/lib''

Important settings:
   value of $LC_ALL: nil
   value of $LC_COLLATE: nil
   value of $LC_CTYPE: de_DE.ISO8859-15
   value of $LC_MESSAGES: nil
   value of $LC_MONETARY: nil
   value of $LC_NUMERIC: nil
   value of $LC_TIME: nil
   value of $LANG: de_DE.ISO8859-15
   value of $XMODIFIERS: nil
   locale-coding-system: iso-latin-9-unix
   default-enable-multibyte-characters: t

Major mode: Dired by name

Minor modes in effect:
   shell-dirtrack-mode: t
   show-paren-mode: t
   display-time-mode: t
   tooltip-mode: t
   tool-bar-mode: t
   mouse-wheel-mode: t
   file-name-shadow-mode: t
   global-font-lock-mode: t
   font-lock-mode: t
   blink-cursor-mode: t
   global-auto-composition-mode: t
   auto-composition-mode: t
   auto-encryption-mode: t
   auto-compression-mode: t
   column-number-mode: t
   line-number-mode: t
   transient-mark-mode: t

--
Greetings

   Pete

If you're not confused, you're not paying attention.




* bug#4157: 23.1.50; faulty character characterisation for ä
@ 2009-09-04  5:51 川幡太一
  0 siblings, 0 replies; 47+ messages in thread
From: 川幡太一 @ 2009-09-04  5:51 UTC
  To: Kenichi Handa; +Cc: 4157, Peter_Dyballa

Hi,

I'm in favor of keeping the coding-system name 'utf-8-hfs',
because this coding system is defined by the HFS+ specification
(http://developer.apple.com/technotes/tn/tn1150.html) rather than by
MacOS itself.   This implies that if another OS mounts HFS, it should
still apply "modified NFD" to the file names.

Besides, the other components of MacOS handle UTF-8 as NFC, as seen
in Spotlight, etc.

It is very unfortunate (and possibly a flaw) that the Carbon API does not
take into account the file system being accessed.  One must take care of
this oneself when
copying files among different file systems.  (For example, when I back up
files among file systems with "rsync", I usually put some options such
as "--iconv=UTF8-MAC,UTF-8")... sigh....
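
To make the NFC/NFD difference concrete, here is a minimal sketch using Python's standard `unicodedata` module. Note the assumption: Python implements standard NFD, whereas HFS+ uses a modified variant that leaves some character ranges undecomposed, so this only approximates what the file system does. The helper names are made up for illustration.

```python
import unicodedata


def to_nfd(name: str) -> str:
    """Decompose a name (standard NFD, an approximation of HFS+ behaviour)."""
    return unicodedata.normalize("NFD", name)


def to_nfc(name: str) -> str:
    """Recompose a name into NFC, the form most other components expect."""
    return unicodedata.normalize("NFC", name)


composed = "M\u00e4rz"           # "März" with a precomposed U+00E4
decomposed = to_nfd(composed)    # "a" followed by U+0308 COMBINING DIAERESIS

print(composed.encode("utf-8"))    # b'M\xc3\xa4rz'
print(decomposed.encode("utf-8"))  # b'Ma\xcc\x88rz'
print(to_nfc(decomposed) == composed)  # True
```

The two byte strings name the same file to a human but differ byte for byte, which is why a conversion step is needed when copying between file systems.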

Cheers,

2009/9/4 Kenichi Handa <handa@m17n.org>:
> In article <0B33C588-C7AD-41D9-8CAC-51AEBD40B264@Freenet.DE>, Peter Dyballa <Peter_Dyballa@Freenet.DE> writes:
>
>> My test files were originally on an HFS+ and on an UFS (UNIX File
>> System) volume (partition, slice, ...). This evening I copied them to
>> an MS-DOS FAT16 file system. When I invoke GNU Emacs with -Q I see in
>> all three file systems the decomposed characters in the file names.
>> With ucs-normalize loaded and file-name-coding-system set to
>> utf-8-hfs they look OK in all three file systems. This makes the chosen
>> name utf-8-hfs not the best. Maybe utf-8-osx is more appropriate.
>
> In article <jwv8wgz4jkj.fsf-monnier+emacsbugreports@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
>> Good point.  Or maybe utf-8-darwin.
>
> Kawabata-san, what do you think?
>
> ---
> Kenichi Handa
> handa@m17n.org
>



-- 
---------------------------------------------------------------------
 川幡 太一 (KAWABATA, Taichi)   E-mail: kawabata@clock.ocn.ne.jp
                  kawabata.taichi@gmail.com






end of thread, other threads:[~2019-11-17 20:58 UTC | newest]

Thread overview: 47+ messages
2009-08-16  2:19 bug#4157: 23.1.50; faulty character characterisation for ä Peter Dyballa
2009-08-18  1:09 ` Kenichi Handa
2009-08-18 13:40   ` Peter Dyballa
2009-08-19  0:23     ` bug#4157: " Kenichi Handa
2009-08-19 22:47       ` Peter Dyballa
2009-08-24 11:30       ` Peter Dyballa
2009-08-24 12:22         ` bug#4157: " Kenichi Handa
2009-08-24 15:21           ` Peter Dyballa
2009-08-25  0:46             ` bug#4157: " Kenichi Handa
2009-08-25  7:51               ` Peter Dyballa
2009-08-25 22:19           ` Peter Dyballa
2009-08-27  6:52             ` bug#4157: " Kenichi Handa
2009-08-27  8:50               ` Peter Dyballa
2009-08-27 11:33                 ` bug#4157: " Kenichi Handa
2009-08-27 12:38                   ` Peter Dyballa
2009-08-28 19:27           ` Peter Dyballa
2009-08-31 21:11           ` Peter Dyballa
2009-09-01  0:04             ` Stefan Monnier
2009-09-04  0:58               ` Kenichi Handa
2009-08-22  4:09 ` Stefan Monnier
2009-08-22  8:50   ` Peter Dyballa
2009-08-23  1:49     ` Stefan Monnier
2009-08-23  9:57       ` Peter Dyballa
2019-10-09 14:29 ` Stefan Kangas
2019-10-09 18:48   ` Eli Zaretskii
2019-10-09 19:47   ` Stefan Monnier
2019-10-09 22:42     ` Peter Dyballa
2019-11-11  1:49       ` Stefan Kangas
2019-11-11 16:36         ` Peter Dyballa
2019-10-10  0:10     ` Stefan Kangas
2019-10-10  7:20       ` Eli Zaretskii
2019-10-10 10:36         ` Stefan Kangas
2019-10-10 11:20           ` Eli Zaretskii
2019-10-10 11:52             ` Stefan Kangas
2019-10-10 12:39               ` Stefan Kangas
2019-10-10 12:41                 ` Stefan Kangas
2019-10-10 18:33             ` Peter Dyballa
2019-10-10 18:57               ` Eli Zaretskii
2019-10-10 21:07                 ` Stefan Monnier
2019-10-11 13:33                   ` Stefan Kangas
2019-10-11  7:10               ` Andreas Schwab
2019-10-11  7:23                 ` Peter Dyballa
2019-10-10  8:15       ` Andreas Schwab
2019-10-10 12:54       ` Stefan Monnier
2019-10-10 13:12         ` Stefan Kangas
2019-11-17 20:58           ` Stefan Kangas
  -- strict thread matches above, loose matches on Subject: below --
2009-09-04  5:51 川幡太一
