unofficial mirror of emacs-devel@gnu.org 
* cvs-quickdir and UTF-8 encoded file names
From: Karl Eichwalder @ 2003-08-16 16:33 UTC


cvs-quickdir does not work properly for UTF-8 encoded file names
containing umlauts like ä, ö, ü, etc.  The names are displayed like this

    Albrecht D\303\274rer

instead of

    Albrecht Dürer

And they are marked as "missing".
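The escaped form above is the same name with its two UTF-8 bytes left undecoded: `\303\274` is octal for 0xC3 0xBC, the UTF-8 encoding of ü. A minimal Python sketch (Python rather than Emacs Lisp, purely to illustrate the bytes involved) makes the correspondence explicit:

```python
# The name as displayed in the *cvs* buffer: each \ooo is one raw byte in octal.
raw = b"Albrecht D\303\274rer"

# \303 \274 are 0xC3 0xBC, the two-byte UTF-8 encoding of U+00FC (ü).
assert raw[10:12] == bytes([0xC3, 0xBC])

# Decoding those bytes as UTF-8 yields the expected name.
decoded = raw.decode("utf-8")
print(decoded)  # Albrecht Dürer
```

So the bytes themselves are intact; what is missing is the decoding step.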

Platform: SuSE Linux 8.2 (x86)
locale  :
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE=C
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=

-- 
                                                         |      ,__o
http://www.gnu.franken.de/ke/                            |    _-\_<,
ke@suse.de (work) / keichwa@gmx.net (home)               |   (*)/'(*)

* Re: cvs-quickdir and UTF-8 encoded file names
From: Kenichi Handa @ 2003-08-21  1:36 UTC
  Cc: emacs-devel

In article <sh8ypt7ebc.fsf@tux.gnu.franken.de>, Karl Eichwalder <keichwa@gmx.net> writes:
> cvs-quickdir does not work properly for UTF-8 encoded file names
> containing umlauts like ä, ö, ü, etc.  The names are displayed like this

>     Albrecht D\303\274rer

> instead of

>     Albrecht Dürer

> And they are marked as "missing".

> Platform: SuSE Linux 8.2 (x86)
> locale  :
> LANG=de_DE.UTF-8
> LC_CTYPE="de_DE.UTF-8"

Please show me the result of C-h C RET and the values of
these variables:
    default-enable-multibyte-characters
    enable-multibyte-characters
    default-file-name-coding-system
    file-name-coding-system

And when you read CVS/Entries directly, how are those file
names decoded?

---
Ken'ichi HANDA
handa@m17n.org

* Re: cvs-quickdir and UTF-8 encoded file names
From: Karl Eichwalder @ 2003-08-21  4:47 UTC
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> Please show me the result of C-h C RET and the values of
> these variables:
>     default-enable-multibyte-characters
>     enable-multibyte-characters
>     default-file-name-coding-system
>     file-name-coding-system

Thanks for asking:

Coding system for saving this buffer:
  Not set locally, use the default.
Default coding system (for new files):
  u -- mule-utf-8 (alias: utf-8)

Coding system for keyboard input:
  nil
Coding system for terminal output:
  u -- mule-utf-8 (alias: utf-8)

Defaults for subprocess I/O:
  decoding: u -- mule-utf-8 (alias: utf-8)

  encoding: u -- mule-utf-8 (alias: utf-8)


Priority order for recognizing coding systems when reading files:
  1. mule-utf-8 (alias: utf-8)
  2. iso-latin-1 (alias: iso-8859-1 latin-1)
  3. mule-utf-16be-with-signature (alias: utf-16be-with-signature mule-utf-16-be utf-16-be)
  4. mule-utf-16le-with-signature (alias: utf-16le-with-signature mule-utf-16-le utf-16-le)
  5. iso-2022-jp (alias: junet)
  6. iso-2022-7bit 
  7. iso-2022-7bit-lock (alias: iso-2022-int-1)
  8. iso-2022-8bit-ss2 
  9. emacs-mule 
  10. raw-text 
  11. japanese-shift-jis (alias: shift_jis sjis)
  12. chinese-big5 (alias: big5 cn-big5)
  13. no-conversion 

  Other coding systems cannot be distinguished automatically
  from these, and therefore cannot be recognized automatically
  with the present coding system priorities.

  The following are decoded correctly but recognized as iso-2022-7bit-lock:
    iso-2022-7bit-ss2 iso-2022-7bit-lock-ss2 iso-2022-cn iso-2022-cn-ext
    iso-2022-jp-2 iso-2022-kr

Particular coding systems specified for certain file names:

  OPERATION	TARGET PATTERN		CODING SYSTEM(s)
  ---------	--------------		----------------
  File I/O	"ChangeLog"		(utf-8 . utf-8)
		"\\.g?z\\(~\\|\\.~[0-9]+~\\)?\\'"
					(no-conversion . no-conversion)
		"\\.tgz\\'"		(no-conversion . no-conversion)
		"\\.bz2\\'"		(no-conversion . no-conversion)
		"\\.Z\\(~\\|\\.~[0-9]+~\\)?\\'"
					(no-conversion . no-conversion)
		"\\.elc\\'"		(emacs-mule . emacs-mule)
		"\\.utf\\(-8\\)?\\'"	utf-8
		"\\(\\`\\|/\\)loaddefs.el\\'"
					(raw-text . raw-text-unix)
		"\\.tar\\'"		(no-conversion . no-conversion)
		"\\.po[tx]?\\'\\|\\.po\\."
					po-find-file-coding-system
		""			(undecided)
  Process I/O	nothing specified
  Network I/O	nothing specified

default-enable-multibyte-characters's value is t

enable-multibyte-characters's value is t
Local in buffer *cvs*; global value is t

default-file-name-coding-system's value is mule-utf-8
file-name-coding-system's value is nil

> And when you read CVS/Entries directly, how are those file
> names decoded?

Is this the value you want to know?

Coding system for saving this buffer:
  t -- raw-text-unix

To see this value I did:

C-x C-f CVS/Entries RET
M-x describe-coding-system RET

Thanks for your help.

-- 
                                                         |      ,__o
http://www.gnu.franken.de/ke/                            |    _-\_<,
ke@suse.de (work) / keichwa@gmx.net (home)               |   (*)/'(*)

* Re: cvs-quickdir and UTF-8 encoded file names
From: Kenichi Handa @ 2003-08-21  6:26 UTC
  Cc: emacs-devel

In article <shu18bty54.fsf@tux.gnu.franken.de>, Karl Eichwalder <keichwa@gmx.net> writes:
>>  And when you read CVS/Entries directly, how are those file
>>  names decoded?

> Is this the value you want to know?

> Coding system for saving this buffer:
>   t -- raw-text-unix

> To see this value I did:

> C-x C-f CVS/Entries RET
> M-x describe-coding-system RET

Thank you for the info.  Somehow, Emacs fails to detect the
encoding of this file.  Please send me that file in some
8-bit-transparent way (e.g. uuencode or base64 encoding).

---
Ken'ichi HANDA
handa@m17n.org

* Re: cvs-quickdir and UTF-8 encoded file names
From: Kenichi Handa @ 2003-08-25  1:14 UTC
  Cc: emacs-devel

In article <shada37yfp.fsf@tux.gnu.franken.de>, Karl Eichwalder <keichwa@gmx.net> writes:
>>  Thank you for the info.  Somehow, Emacs fails to detect the
>>  encoding of this file.  Please send me that file in some
>>  8-bit-transparent way (e.g. uuencode or base64 encoding).

> Here it comes:

I found an invalid UTF-8 sequence on the 279th line.  It seems
that the file name on this line is in ISO-8859-1, not UTF-8.
Thus, Emacs failed to detect the file as utf-8 and decoded it
as raw-text.
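A single ISO-8859-1 name is enough to make the whole file fail as UTF-8. The Python sketch below shows one way to locate such lines. The two sample entries are made up, and their `/name/revision/timestamp/options/tagdate` layout is an assumption about the usual CVS/Entries format, not data from the file discussed here; `find_non_utf8_lines` is a hypothetical helper.

```python
def find_non_utf8_lines(data: bytes) -> list[tuple[int, bytes]]:
    """Return (1-based line number, raw line) for every line that is not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


# Made-up sample: line 1 stores "Dürer" as UTF-8, line 2 stores it as ISO-8859-1.
sample = (
    b"/Albrecht D\xc3\xbcrer/1.1/Sat Aug 16 16:33:00 2003//\n"
    b"/Albrecht D\xfcrer/1.1/Sat Aug 16 16:33:00 2003//\n"
)

for number, line in find_non_utf8_lines(sample):
    print(number, line)
```

The lone byte 0xFC is ü in ISO-8859-1 but can never occur in well-formed UTF-8, so only the second line is reported.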

Perhaps the file Entries should be read by:

(let ((coding-system-for-read (or default-file-name-coding-system
				  file-name-coding-system)))
   ...)
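The idea behind the binding above is to decode the file with the encoding already configured for file names, rather than letting detection pick one for the whole file. A rough Python analogue follows. It is only an illustration of that idea, not Emacs code; the helper name and the made-up sample are assumptions, and `surrogateescape` stands in for the way undecodable bytes are kept as raw bytes.

```python
def decode_entries(data: bytes, name_encoding: str = "utf-8") -> str:
    """Decode with a fixed encoding instead of guessing one for the whole file.

    errors="surrogateescape" maps each undecodable byte to a lone surrogate,
    so valid names decode normally and stray bytes remain recoverable.
    """
    return data.decode(name_encoding, errors="surrogateescape")


# Made-up two-line sample: UTF-8 name first, ISO-8859-1 name second.
sample = (
    b"/Albrecht D\xc3\xbcrer/1.1/Sat Aug 16 16:33:00 2003//\n"
    b"/Albrecht D\xfcrer/1.1/Sat Aug 16 16:33:00 2003//\n"
)

text = decode_entries(sample)
first, second = text.splitlines()
print(first)          # the UTF-8 name is readable
print(ascii(second))  # the stray byte 0xFC appears as the escape \udcfc

# The original bytes can be reconstructed exactly.
assert text.encode("utf-8", errors="surrogateescape") == sample
```

With a fixed encoding, one bad entry no longer degrades the well-formed ones.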

But that file also contains a "date" string.  Does Emacs use
that part of the information too?  If so, how is it encoded?
Does it contain only ASCII characters?  Or is it encoded in
the user's locale?  Is there any possibility that the encoding
of the file name differs from the encoding of the date string
in a normal situation?

As my knowledge of CVS (and of the CVS handling code in Emacs) is
limited, I'd like to ask someone else to fix this
problem.  Of course, I'll answer any Mule-related questions.

---
Ken'ichi HANDA
handa@m17n.org

* Re: cvs-quickdir and UTF-8 encoded file names
From: Karl Eichwalder @ 2003-08-25  4:16 UTC
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> I found an invalid UTF-8 sequence on the 279th line.  It seems
> that the file name on this line is in ISO-8859-1, not UTF-8.

Thanks for tracking down my error; the file name on my disk is also
ISO-8859-1 encoded.  It might have slipped in when I accidentally
started Emacs within an ISO-8859-1 environment.

> Thus, Emacs failed to detect the file as utf-8 and decoded it
> as raw-text.
>
> Perhaps the file Entries should be read by:
>
> (let ((coding-system-for-read (or default-file-name-coding-system
> 				  file-name-coding-system)))
>    ...)

If it is possible to detect such an encoding mismatch, Emacs should
raise an exception telling the user how to solve the problem:

    Convert file name

    Use raw text

    Use UTF-8
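One possible shape for such a check, sketched in Python under stated assumptions: the exception class and both helpers are hypothetical names invented for illustration, nothing like them is claimed to exist in Emacs, and only the "Convert file name" remedy is actually implemented here.

```python
class FileNameEncodingMismatch(Exception):
    """Hypothetical error type; nothing with this name exists in Emacs."""


# The three remedies proposed in the message above.
REMEDIES = ("Convert file name", "Use raw text", "Use UTF-8")


def check_file_name(name: bytes, expected: str = "utf-8") -> str:
    """Decode a file name, or report the mismatch together with the remedies."""
    try:
        return name.decode(expected)
    except UnicodeDecodeError as err:
        raise FileNameEncodingMismatch(
            f"{name!r} is not valid {expected}; options: " + ", ".join(REMEDIES)
        ) from err


def convert_file_name(name: bytes, actual: str = "iso-8859-1") -> bytes:
    """'Convert file name': re-encode the bytes from their actual encoding to UTF-8."""
    return name.decode(actual).encode("utf-8")


try:
    check_file_name(b"Albrecht D\xfcrer")
except FileNameEncodingMismatch as err:
    print(err)

assert convert_file_name(b"Albrecht D\xfcrer") == b"Albrecht D\xc3\xbcrer"
```

Detection is the easy half. Knowing the actual encoding of a mismatched name (assumed to be ISO-8859-1 here) generally requires asking the user, which is why offering a choice seems reasonable.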

Sorry, I cannot answer the other questions.  Thanks again for your
help.

-- 
                                                         |      ,__o
http://www.gnu.franken.de/ke/                            |    _-\_<,
ke@suse.de (work) / keichwa@gmx.net (home)               |   (*)/'(*)
