unofficial mirror of emacs-devel@gnu.org 
* ISO-8859-1  encoded file names and UTF-8
@ 2003-03-08  6:15 Karl Eichwalder
  2003-03-08  9:16 ` Eli Zaretskii
  2003-03-19 13:33 ` Kenichi Handa
  0 siblings, 2 replies; 23+ messages in thread
From: Karl Eichwalder @ 2003-03-08  6:15 UTC (permalink / raw)


UTF-8 Emacs sometimes fails to manage ISO-8859-1 file names.  I do not
know how you can rename or convert these file names from ISO-8859-1 to
UTF-8 encoded names using Emacs.

I created some ISO-8859-1 encoded file names (on top of the ext2 file
system).  Start a UTF-8 Emacs:

    LANG=de_DE.UTF-8 emacs

Call dired on the directory containing these files; you will see
something like this (note: Emacs/Gnus will normalize the names when I
send the mail!):

  /home/ke/Texte/wikipedia:
  insgesamt 100
  -rw-r--r--    1 ke       users        1291 2003-03-06 08:35 Brücke (Bauwerk)
  drwxr-xr-x    2 ke       users        4096 2003-03-07 22:29 CVS
  -rw-r--r--    1 ke       users        5836 2003-03-07 07:38 Deutschland
  -rw-r--r--    1 ke       users        1796 2003-03-07 20:44 Fürth (Bayern)
  -rw-r--r--    1 ke       users         450 2003-03-07 22:28 Ludwigs-Kanal
  -rw-r--r--    1 ke       users         259 2003-03-07 05:41 Malaiische Sprache
  -rw-r--r--    1 ke       users        1584 2003-03-07 22:33 Malaysia
  -rw-r--r--    1 ke       users        1266 2003-03-06 08:47 Malta
  -rw-r--r--    1 ke       users         865 2003-03-06 21:24 Maltesische Sprache
  -rw-r--r--    1 ke       users         520 2003-03-05 23:22 Nördliches Sotho
  -rw-r--r--    1 ke       users        1269 2003-03-07 20:04 Pegnitz (Fluss)
  -rw-r--r--    1 ke       users         682 2003-03-07 20:58 Regnitz
  -rw-r--r--    1 ke       users         668 2003-03-07 07:19 Rosmarin
  -rw-r--r--    1 ke       users         593 2003-03-05 23:23 Sesotho

Note that the buffer is marked "-u" in the mode line.

Now go to a file name with umlauts and press 'f' to visit the file:

Debugger entered--Lisp error: (error "File no longer exists; type `g' to update Dired buffer")
  signal(error ("File no longer exists; type `g' to update Dired buffer"))
  error("File no longer exists; type `g' to update Dired buffer")
  dired-get-file-for-visit()
  dired-find-file()
  call-interactively(dired-find-file)

If you need more info, please ask.  I can also send a tar archive
containing those stupid file names.
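The "File no longer exists" error can be illustrated with a small Python analogue (not Emacs code). It assumes that a name shown in the listing must be turned back into the exact bytes stored on disk before the file can be visited. The byte 0xFC ("ü" in ISO-8859-1) is not valid UTF-8, so a lossy decode cannot be reversed, while a lossless scheme can. Python's `surrogateescape` error handler is used here as a stand-in for Emacs's raw eight-bit characters; `lossy_display`, `lossless_display` and `name_on_disk` are hypothetical helper names.

```python
def lossy_display(raw: bytes) -> str:
    # Hypothetical helper: undecodable bytes become U+FFFD, the byte value is lost.
    return raw.decode("utf-8", errors="replace")


def lossless_display(raw: bytes) -> str:
    # Hypothetical helper: undecodable bytes become U+DC80..U+DCFF,
    # which remember the original byte value.
    return raw.decode("utf-8", errors="surrogateescape")


def name_on_disk(shown: str) -> bytes:
    # Re-encode a displayed name to the bytes used to open the file.
    return shown.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    raw = "N\u00fcrnberg".encode("latin-1")  # b'N\xfcrnberg', made in a Latin-1 locale
    print(name_on_disk(lossy_display(raw)) == raw)     # False: file cannot be found
    print(name_on_disk(lossless_display(raw)) == raw)  # True: file can be visited
```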

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.gnu.franken.de/ke/                            |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)

* Re: ISO-8859-1  encoded file names and UTF-8
  2003-03-08  6:15 ISO-8859-1 encoded file names and UTF-8 Karl Eichwalder
@ 2003-03-08  9:16 ` Eli Zaretskii
  2003-03-08 10:05   ` Karl Eichwalder
  2003-03-19 13:33 ` Kenichi Handa
  1 sibling, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2003-03-08  9:16 UTC (permalink / raw)
  Cc: emacs-devel

> From: Karl Eichwalder <keichwa@gmx.net>
> Date: Sat, 08 Mar 2003 07:15:56 +0100
> 
> UTF-8 Emacs sometimes fails to manage ISO-8859-1 file names.

What is "UTF-8 Emacs"?

Also, what is your value of file-name-coding-system?

* Re: ISO-8859-1  encoded file names and UTF-8
  2003-03-08  9:16 ` Eli Zaretskii
@ 2003-03-08 10:05   ` Karl Eichwalder
  2003-03-08 17:06     ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Karl Eichwalder @ 2003-03-08 10:05 UTC (permalink / raw)
  Cc: emacs-devel

"Eli Zaretskii" <eliz@elta.co.il> writes:

>> UTF-8 Emacs sometimes fails to manage ISO-8859-1 file names.
>
> What is "UTF-8 Emacs"?

An Emacs (21.3.50) started like this:

    LANG=de_DE.UTF-8 emacs

Sorry for my private terminology.

> Also, what is your value of file-name-coding-system?

file-name-coding-system's value is nil

*Coding system for encoding file names.
If it is nil, `default-file-name-coding-system' (which see) is used.

I did not set it.

-=-=-=-=-=-=-=-=-=-=-=-=-=- cut here -=-=-=-=-=-=-=-=-=-=-=-=-=-
Additional remark (I'll file a more detailed report later, if the
problem persists):

A few days ago I observed that Emacs "auto-corrects" broken .po
files; the broken files are declared as UTF-8 and containing those
codes and additionally some iso-8859-1 got mixed in by accident.  Emacs
displays those wrong characters "correctly" -- this is somehow
"user-friendly" but nevertheless highly confusing.  At least please
add a special background to those auto-corrected characters.

Am I the only one who has observed such behavior?  If this is the desired
behavior, please point me to the documentation of this feature.

I think both problems are related.

* Re: ISO-8859-1  encoded file names and UTF-8
  2003-03-08 10:05   ` Karl Eichwalder
@ 2003-03-08 17:06     ` Eli Zaretskii
  2003-03-08 18:25       ` Karl Eichwalder
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2003-03-08 17:06 UTC (permalink / raw)
  Cc: emacs-devel

> From: Karl Eichwalder <keichwa@gmx.net>
> Date: Sat, 08 Mar 2003 11:05:41 +0100
> 
> file-name-coding-system's value is nil
> 
> *Coding system for encoding file names.
> If it is nil, `default-file-name-coding-system' (which see) is used.
> 
> I did not set it.

And what is the value of `default-file-name-coding-system'?  If it's
anything but `utf-8', please try setting `file-name-coding-system' to
`utf-8' and see if that helps.

> A few days ago I observed that Emacs "auto-corrects" broken .po
> files; the broken files are declared as UTF-8 and containing those
> codes and additionally some iso-8859-1 got mixed in by accident.

Sorry, I'm not sure I understand the last part of this sentence
correctly; if I didn't, what's below might not make any sense.

IIUC, Emacs sometimes decides that *.po files which contain characters
from different encodings are encoded in UTF-8.  If that's so, I think
it's because you made utf-8 your preferred encoding (IIRC, that's what
Emacs does when it sees that your locale uses UTF-8).

> Emacs
> displays those wrong characters "correctly" -- this is somehow
> "user-friendly" but nevertheless highly confusing.

What does Emacs say if you go to one of those ``wrong'' characters
and type "C-u C-x ="?  Are they treated as eight-bit-* characters?
If so, Emacs displays them with the proper glyphs because your fonts
are set in a way that fits Latin-1.

> At least please add a special background to those auto-corrected
> characters.

This would contradict the whole purpose of a multilingual Emacs: it is
meant to seamlessly display characters from different character sets
without any special effects.  How can Emacs know that in this
particular case, you want it to display different character sets
differently?

I believe that if such a feature is added, it must be driven by
user-level settings.  For example, users could define a list of
character sets or codepoints which they don't expect to see in their
buffers, and Emacs will then flag characters from those sets with some
visual cue.  It's even possible that you can do that yourself right
now by using hi-lock.el or something similar, since IIRC regular
expressions can be used to express character categories.
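A rough sketch of the user-driven approach described above, in Python rather than hi-lock.el: decode the file losslessly, then search for a user-supplied set of characters that should never appear. The `UNEXPECTED` pattern and `flag_unexpected` function are hypothetical; the pattern matches the placeholders Python's `surrogateescape` handler uses for bytes that are not valid UTF-8, whereas an Emacs feature would match Emacs's own eight-bit characters or chosen charsets.

```python
import re
from typing import List, Tuple

# Hypothetical user setting: characters never expected in this kind of file.
# U+DC80..U+DCFF are the placeholders surrogateescape uses for stray bytes.
UNEXPECTED = re.compile("[\udc80-\udcff]")


def flag_unexpected(data: bytes, pattern=UNEXPECTED) -> List[Tuple[int, int, int]]:
    """Return (line, column, byte) for every unexpected character.

    Lines are 1-based; columns are 0-based character offsets.
    """
    text = data.decode("utf-8", errors="surrogateescape")
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            # Map the placeholder back to the raw byte value it stands for.
            hits.append((line_no, match.start(), ord(match.group()) - 0xDC00))
    return hits


if __name__ == "__main__":
    po = 'msgstr "Gr\u00fc\u00dfe"\n'.encode("utf-8") + b'msgstr "M\xfcller"\n'
    for line, col, byte in flag_unexpected(po):
        # With Latin-1 fonts, byte 0xFC would be drawn as u-umlaut, hiding the error.
        print(f"line {line}, column {col}: raw byte {byte:#04x}")
```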

* Re: ISO-8859-1  encoded file names and UTF-8
  2003-03-08 17:06     ` Eli Zaretskii
@ 2003-03-08 18:25       ` Karl Eichwalder
  2003-03-08 22:35         ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Karl Eichwalder @ 2003-03-08 18:25 UTC (permalink / raw)
  Cc: emacs-devel

"Eli Zaretskii" <eliz@elta.co.il> writes:

> And what is the value of `default-file-name-coding-system'?  If it's
> anything but `utf-8', please try setting `file-name-coding-system' to
> `utf-8' and see if that helps.

I did ask for help; I reported a problem worth fixing ;)  It is a
rather serious problem, IMO.

> This would contradict the whole purpose of a multilingual Emacs: it is
> meant to seamlessly display characters from different character sets
> without any special effects.

This might be appropriate when you use Emacs for reading mail and
news.  But it's the wrong behavior when it comes to "source code".

> How can Emacs know that in this particular case, you want it to
> display different character sets differently?

When an XML file or a PO file (that's a message file used by gettext) is
declared as UTF-8 encoded and Emacs detects an error within such a
file, it's surely worth notifying the user.
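A minimal Python sketch of the check proposed here: given the bytes of a file whose header declares UTF-8, report the byte offset of every sequence that is not valid UTF-8, roughly the kind of error `msgfmt` would reject. How the declared charset is found (for example, the `charset=` field of a PO header) is left out, and `invalid_utf8_offsets` is a hypothetical name.

```python
from typing import List


def invalid_utf8_offsets(data: bytes) -> List[int]:
    """Return the byte offsets at which DATA stops being valid UTF-8."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the file decodes cleanly
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice starting at pos.
            offsets.append(pos + err.start)
            pos += err.end  # skip the bad bytes and keep scanning
    return offsets


if __name__ == "__main__":
    sample = "Gr\u00fc\u00dfe\n".encode("utf-8") + b"M\xfcller\n"
    print(invalid_utf8_offsets(sample))  # [9]
```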

> I believe that if such a feature is added, it must be driven by
> user-level settings.  For example, users could define a list of
> character sets or codepoints which they don't expect to see in their
> buffers, and Emacs will then flag characters from those sets with some
> visual cue.

Yes, approximately.  But instead of a list of character sets it must
depend on a list of file types (.xml, .po, .java, etc.).

> It's even possible that you can do that yourself right now by using
> hi-lock.el or something similar, since IIRC regular expressions can be
> used to express character categories.

Thanks for the advice :) I know how to work around the problem; for checking
.po files one can use 'msgfmt' etc.  I'd rather vote to change Emacs to
make users happy -- that's probably not that urgent, but it should
happen for the next major release coming from CVS HEAD.

* Re: ISO-8859-1  encoded file names and UTF-8
  2003-03-08 18:25       ` Karl Eichwalder
@ 2003-03-08 22:35         ` Eli Zaretskii
  2003-03-09  4:38           ` Karl Eichwalder
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2003-03-08 22:35 UTC (permalink / raw)
  Cc: emacs-devel

> From: Karl Eichwalder <keichwa@gmx.net>
> Date: Sat, 08 Mar 2003 19:25:25 +0100
> 
> > And what is the value of `default-file-name-coding-system'?  If it's
> > anything but `utf-8', please try setting `file-name-coding-system' to
> > `utf-8' and see if that helps.
> 
> I did ask for help; I reported a problem worth fixing ;)  It is a
> rather serious problem, IMO.

This must be some kind of misunderstanding: I was trying to help you,
not mock your report in any way.

So please do tell what is the value of `default-file-name-coding-system'
and please do try setting `file-name-coding-system' to `utf-8'.  It
might get your problem solved.

> > It's even possible that you can do that yourself right now by using
> > hi-lock.el or something similar, since IIRC regular expressions can be
> > used to express character categories.
> 
> Thanks for the advice :) I know how to work around the problem; for checking
> .po files one can use 'msgfmt' etc.  I'd rather vote to change Emacs to
> make users happy -- that's probably not that urgent, but it should
> happen for the next major release coming from CVS HEAD.

Again a misunderstanding, I hope.  I was trying to point to a possible
way of designing such a feature, not to say that I think the issue
should be dismissed.

If my response doesn't help you, simply forget I ever spoke.

* Re: ISO-8859-1  encoded file names and UTF-8
  2003-03-08 22:35         ` Eli Zaretskii
@ 2003-03-09  4:38           ` Karl Eichwalder
  0 siblings, 0 replies; 23+ messages in thread
From: Karl Eichwalder @ 2003-03-09  4:38 UTC (permalink / raw)
  Cc: emacs-devel

"Eli Zaretskii" <eliz@elta.co.il> writes:

>> From: Karl Eichwalder <keichwa@gmx.net>
>> Date: Sat, 08 Mar 2003 19:25:25 +0100
>> 
>> > And what is the value of `default-file-name-coding-system'?  If it's
>> > anything but `utf-8', please try setting `file-name-coding-system' to
>> > `utf-8' and see if that helps.
>> 
>> I did ask for help; I reported a problem worth fixing ;)  It is a
>> rather serious problem, IMO.
>
> This must be some kind of misunderstanding: I was trying to help you,
> not mock your report in any way.

Sorry.

> So please do tell what is the value of `default-file-name-coding-system'
> and please do try setting `file-name-coding-system' to `utf-8'.  It
> might get your problem solved.

It's set to mule-utf-8.  In the meantime I converted all file names to
UTF-8; now, Emacs works as expected (names are properly decoded and dired
is able to work on them).  Then I added another ISO-8859-1
encoded file name, and now all the UTF-8 encoded names are displayed wrongly:

  /home/ke/Texte/wikipedia:
  insgesamt 128
[...]
  -rw-r--r--    1 ke       users        2769 2003-03-08 21:46 Fürth (Bayern)
  -rw-r--r--    1 ke       users         903 2003-03-08 10:44 Internationalisierung
  -rw-r--r--    1 ke       users        1327 2003-03-08 17:42 Konrad Duden
  -rw-r--r--    1 ke       users        2117 2003-03-08 17:46 Mittelfranken
  -rw-r--r--    1 ke       users        1150 2003-03-08 19:08 Nachschlagewerk
  -rw-r--r--    1 ke       users         520 2003-03-05 23:24 Nördliches Sotho
  -rw-r--r--    1 ke       users           0 2003-03-09 05:30 Nürnberg
  -rw-r--r--    1 ke       users        1269 2003-03-08 10:44 Pegnitz (Fluss)
[...]

"Nürnberg" is ISO-8859-1 encoded.
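One way a single ISO-8859-1 name could spoil every UTF-8 name: if the directory listing is decoded as one block, and the decoder falls back to another coding system when the block is not valid UTF-8, the fallback applies to all names at once. This mechanism is an assumption, not something confirmed in this thread; the Python sketch below only contrasts that whole-block strategy with decoding each name separately. Both function names are hypothetical.

```python
from typing import List


def decode_whole_listing(names: List[bytes]) -> List[str]:
    # Assumed strategy: one decision for the whole block,
    # strict UTF-8 if possible, otherwise Latin-1 for everything.
    block = b"\n".join(names)
    try:
        text = block.decode("utf-8")
    except UnicodeDecodeError:
        text = block.decode("latin-1")
    return text.split("\n")


def decode_each_name(names: List[bytes]) -> List[str]:
    # Per-name decision: valid UTF-8 stays intact, stray bytes are kept raw.
    return [name.decode("utf-8", errors="surrogateescape") for name in names]


if __name__ == "__main__":
    listing = ["F\u00fcrth (Bayern)".encode("utf-8"),  # converted to UTF-8
               "N\u00fcrnberg".encode("latin-1")]     # newly added, ISO-8859-1
    print(ascii(decode_whole_listing(listing)))  # UTF-8 name becomes mojibake
    print(ascii(decode_each_name(listing)))      # UTF-8 name stays intact
```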

> Again a misunderstanding, I hope.  I was trying to point to a possible
> way of designing such a feature, not to say that I think the issue
> should be dismissed.

Point taken.  Thanks for helping to track down the issue.  I hope my
reports are helpful.

* Re: ISO-8859-1  encoded file names and UTF-8
  2003-03-08  6:15 ISO-8859-1 encoded file names and UTF-8 Karl Eichwalder
  2003-03-08  9:16 ` Eli Zaretskii
@ 2003-03-19 13:33 ` Kenichi Handa
  2003-03-19 16:15   ` Karl Eichwalder
                     ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Kenichi Handa @ 2003-03-19 13:33 UTC (permalink / raw)
  Cc: emacs-devel

I'm sorry for the late response.  Before providing a solution, I had
to fix some fundamental problems in file name handling.

In article <shfzpy1izn.fsf@tux.gnu.franken.de>, Karl Eichwalder <keichwa@gmx.net> writes:

> UTF-8 Emacs sometimes fails to manage ISO-8859-1 file names.  I do not
> know how you can rename or convert these file names from ISO-8859-1 to
> UTF-8 encoded names using Emacs.

Hmmm, I think it's a completely overlooked but important feature.

> I created some ISO-8859-1 encoded file names (on top of the ext2 file
> system).  Start a UTF-8 Emacs:

>     LANG=de_DE.UTF-8 emacs

> Call dired on the directory containing these files; you will see
> something like this (note: Emacs/Gnus will normalize the names when I
> send the mail!):
[...]
> Note that the buffer is marked "-u" in the mode line.

> Now go to a file name with umlauts and press 'f' to visit the file:

> Debugger entered--Lisp error: (error "File no longer exists; type `g' to update Dired buffer")

This problem should be fixed now in HEAD.  Please try again after
updating from CVS.

Anyway, what do you think of the attached function for changing the
encoding of a file name?  I have not installed it yet because I have
not yet found an answer to the following question.

Should recoding a file name be regarded as a kind of file name
change?  If so, perhaps we should make the function rename-file
handle recoding as well.  In that case, how should we tell rename-file
to actually recode the file name's encoding?

---
Ken'ichi HANDA
handa@m17n.org

(defun recode-file (file coding new-coding &optional ok-if-already-exists)
  (interactive
   (let* ((default
	    (or file-name-coding-system default-file-name-coding-system))
	  (filename
	   (read-file-name "Recode file: " nil nil t))
	  (from-coding
	   (if (and default
		    ;; We provide the default coding only when it
		    ;; seems that the filename is correctly decoded by
		    ;; the default coding.
		    (let ((charsets (find-charset-string filename)))
		      (and (not (memq 'eight-bit-control charsets))
			   (not (memq 'eight-bit-graphic charsets)))))
	       (read-coding-system
		(format "Recode file %s from coding (default %s): "
			filename default)
		default)
	     (read-coding-system
	      (format "Recode file %s from coding: " filename))))
	  (to-coding
	   ;; We provide the default coding only when a user is going
	   ;; to change the encoding not from the default coding.
	   (if (eq from-coding default)
	       (read-coding-system
		(format "Recode file %s from coding %s to coding: "
			filename from-coding))
	     (read-coding-system
	      (format "Recode file %s from coding %s to coding (default %s): "
		      filename from-coding default)
	      default))))
     (list filename from-coding to-coding)))
  (let* ((default
	   (or file-name-coding-system default-file-name-coding-system))
	 (encoded (encode-coding-string file default))
	 (new-encoded (encode-coding-string 
		       (decode-coding-string encoded coding) new-coding))
	 (file-name-coding-system nil)
	 (default-file-name-coding-system nil)
	 (locale-coding-system nil))
    (rename-file encoded new-encoded ok-if-already-exists)))
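The core of recode-file is: take the bytes of the name, decode them with the coding system they were really written in, encode the result with the new one, and rename at the byte level with file-name encoding switched off. Below is a Python sketch of the same steps, assuming a POSIX file system where names are plain byte strings. `recode_name`, `suggest_from_coding` and `recode_file_name` are hypothetical names, and the default-suggestion rule only mirrors the idea behind the `eight-bit-*` check in the interactive spec.

```python
import os
from typing import Optional


def recode_name(raw: bytes, coding: str, new_coding: str) -> bytes:
    # Decode with the coding the name was really written in, then re-encode.
    # Raises UnicodeError instead of producing a corrupted name.
    return raw.decode(coding).encode(new_coding)


def suggest_from_coding(raw: bytes, default: str) -> Optional[str]:
    # Offer DEFAULT only if the name decodes cleanly with it, similar in
    # spirit to rejecting names that contain eight-bit-* characters.
    try:
        raw.decode(default)
    except UnicodeDecodeError:
        return None
    return default


def recode_file_name(directory: bytes, raw: bytes, coding: str,
                     new_coding: str, ok_if_already_exists: bool = False) -> bytes:
    new_raw = recode_name(raw, coding, new_coding)
    if new_raw == raw:
        return raw  # encoding change leaves the bytes unchanged
    new_path = os.path.join(directory, new_raw)
    if not ok_if_already_exists and os.path.lexists(new_path):
        raise FileExistsError(os.fsdecode(new_path))
    # Byte paths bypass Python's own file-name decoding, much like binding
    # file-name-coding-system to nil around rename-file.
    os.rename(os.path.join(directory, raw), new_path)
    return new_raw
```

For instance, `recode_name(b"N\xfcrnberg", "latin-1", "utf-8")` returns `b"N\xc3\xbcrnberg"`, while `suggest_from_coding(b"N\xfcrnberg", "utf-8")` returns `None`, so no UTF-8 default would be offered for that name.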

* Re: ISO-8859-1  encoded file names and UTF-8
  2003-03-19 13:33 ` Kenichi Handa
@ 2003-03-19 16:15   ` Karl Eichwalder
  2003-03-19 23:52     ` Kenichi Handa
  2003-03-20  8:46   ` Richard Stallman
  2003-04-01 21:17   ` etags and UTF-8 encoded file names (Re: ISO-8859-1 encoded file names and UTF-8) Karl Eichwalder
  2 siblings, 1 reply; 23+ messages in thread
From: Karl Eichwalder @ 2003-03-19 16:15 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> I'm sorry for the late response.  Before providing a solution, I had
> to fix some fundamental problems in file name handling.

Yes, I already saw you working on stuff like that over the last few days;
thanks a lot!

>> Now go to a file name with umlauts and press 'f' to visit the file:
>
>> Debugger entered--Lisp error: (error "File no longer exists; type `g' to update Dired buffer")
>
> This problem should be fixed now in HEAD.  Please try again after
> updating from CVS.

I think there is still a subtle bug left; in an ISO-8859-1 locale do:

touch "Maler Müller"

Then call emacs:

LANG=de_DE.UTF-8 emacs -q --no-site --no-splash .

In dired you can see:

  -rw-r--r--  1 ke  users   0 2003-03-19 16:10 Maler M\374ller\374rle
                                     good part ^^^^^^^^^^^^^^^|||||||
                              trailing garbage ------------>>>^^^^^^^

There is trailing garbage (otherwise the escape sequence is okay for
me!).

> Anyway, what do you think of the attached function for changing the
> encoding of a file name?  I have not installed it yet because I have
> not yet found an answer to the following question.

The function works for me.

> Should recoding a file name be regarded as a kind of file name
> change?  If so, perhaps we should make the function rename-file
> handle recoding as well.  In that case, how should we tell rename-file
> to actually recode the file name's encoding?

If the user calls rename-file it should be up to him to specify a proper
file name.  In other words I vote to provide a separate function like
convert-file-name to do the right thing; by default convert-file-name
should try to convert the file name to the user's locale.

Hope this helps.

* Re: ISO-8859-1  encoded file names and UTF-8
  2003-03-19 16:15   ` Karl Eichwalder
@ 2003-03-19 23:52     ` Kenichi Handa
  2003-03-20 17:32       ` Karl Eichwalder
  2003-03-21 19:06       ` Richard Stallman
  0 siblings, 2 replies; 23+ messages in thread
From: Kenichi Handa @ 2003-03-19 23:52 UTC (permalink / raw)
  Cc: emacs-devel

In article <shn0jr2uz5.fsf@tux.gnu.franken.de>, Karl Eichwalder <keichwa@gmx.net> writes:
> I think there is still a subtle bug left; in an ISO-8859-1 locale do:

> touch "Maler Müller"

> Then call emacs:

> LANG=de_DE.UTF-8 emacs -q --no-site --no-splash .

> In dired you can see:

>   -rw-r--r--  1 ke  users   0 2003-03-19 16:10 Maler M\374ller\374rle
>                                      good part ^^^^^^^^^^^^^^^|||||||
>                               trailing garbage ------------>>>^^^^^^^

Ah!  That's a bug in the utf-8 decoder.  I've just installed the
attached fix.
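The trailing garbage after "M\374ller" involves 0xFC, which under the original RFC 2279 definition of UTF-8 is the lead byte of a 6-byte sequence. The Python sketch below is not the CCL code patched here; it only shows the invariant a lenient decoder should keep: when a lead byte is not followed by enough continuation bytes, only that lead byte is emitted as a raw byte and decoding resumes at the very next byte, so every input byte appears in the output exactly once. Representing raw bytes as U+DC00 + byte (Python's surrogateescape convention) is an assumption of this sketch.

```python
# Lead byte ranges -> (sequence length, mask for the lead byte's payload bits).
# The 5- and 6-byte forms follow the original RFC 2279 definition of UTF-8.
LEAD_BYTES = [
    (0xC0, 0xE0, 2, 0x1F),
    (0xE0, 0xF0, 3, 0x0F),
    (0xF0, 0xF8, 4, 0x07),
    (0xF8, 0xFC, 5, 0x03),
    (0xFC, 0xFE, 6, 0x01),
]


def raw_byte(b: int) -> str:
    # Represent an undecodable byte as U+DC00 + byte (surrogateescape style).
    return chr(0xDC00 + b)


def decode_utf8_lenient(data: bytes) -> str:
    # Sketch only: overlong forms are not rejected, and code points above
    # U+10FFFF are treated as invalid because Python str cannot hold them.
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:  # ASCII
            out.append(chr(b))
            i += 1
            continue
        for low, high, length, mask in LEAD_BYTES:
            if low <= b < high:
                break
        else:
            # Stray continuation byte (0x80-0xBF) or 0xFE/0xFF.
            out.append(raw_byte(b))
            i += 1
            continue
        tail = data[i + 1:i + length]
        if len(tail) == length - 1 and all(0x80 <= t < 0xC0 for t in tail):
            cp = b & mask
            for t in tail:
                cp = (cp << 6) | (t & 0x3F)
            if cp <= 0x10FFFF:
                out.append(chr(cp))
                i += length
                continue
        # Invalid or incomplete sequence: emit only the lead byte and resume
        # at the next byte, so no byte is consumed twice or dropped.
        out.append(raw_byte(b))
        i += 1
    return "".join(out)
```

For `b"Maler M\xfcller"` the sketch returns `"Maler M\udcfcller"`: the 0xFC byte appears once, and nothing follows "ller".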

>>  Should recoding a file name be regarded as a kind of file name
>>  change?  If so, perhaps we should make the function rename-file
>>  handle recoding as well.  In that case, how should we tell rename-file
>>  to actually recode the file name's encoding?

> If the user calls rename-file it should be up to him to specify a proper
> file name.  In other words I vote to provide a separate function like
> convert-file-name to do the right thing; by default convert-file-name
> should try to convert the file name to the user's locale.

As we already have the function convert-standard-filename, I
think the name convert-file-name is confusing.  So, I prefer
the name recode-file-name if we'll have a separate function.

*** utf-8.el.~1.26.~	Tue Mar 18 09:09:15 2003
--- utf-8.el	Thu Mar 20 08:22:42 2003
***************
*** 479,497 ****
  			 (write-multibyte-character r5 r3))
  		     (write-multibyte-character r6 r3))
  		   (if (r0 >= #xf8)	; 5- or 6-byte encoding
! 		       ((read r1)
! 			(if (r1 < #xa0)
! 			    (if (r1 < #x80) ; invalid byte
! 				(write r1)
! 			      (write-multibyte-character r5 r1))
! 			  (write-multibyte-character r6 r1))
  			(if (r0 >= #xfc) ; 6-byte
! 			    ((read r1)
! 			     (if (r1 < #xa0)
! 				 (if (r1 < #x80) ; invalid byte
! 				     (write r1)
! 				   (write-multibyte-character r5 r1))
! 			       (write-multibyte-character r6 r1)))))))
  		;; else invalid byte >= #xfe
  		(write-multibyte-character r6 r0))))))
        (repeat)))
--- 479,499 ----
  			 (write-multibyte-character r5 r3))
  		     (write-multibyte-character r6 r3))
  		   (if (r0 >= #xf8)	; 5- or 6-byte encoding
! 		       ((r0 = -1)
! 			(read r0)
! 			(if (r0 < #xa0)
! 			    (if (r0 < #x80) ; invalid byte
! 				(write r0)
! 			      (write-multibyte-character r5 r0))
! 			  (write-multibyte-character r6 r0))
  			(if (r0 >= #xfc) ; 6-byte
! 			    ((r0 = -1)
! 			     (read r0)
! 			     (if (r0 < #xa0)
! 				 (if (r0 < #x80) ; invalid byte
! 				     (write r0)
! 				   (write-multibyte-character r5 r0))
! 			       (write-multibyte-character r6 r0)))))))
  		;; else invalid byte >= #xfe
  		(write-multibyte-character r6 r0))))))
        (repeat)))

* Re: ISO-8859-1  encoded file names and UTF-8
  2003-03-19 13:33 ` Kenichi Handa
  2003-03-19 16:15   ` Karl Eichwalder
@ 2003-03-20  8:46   ` Richard Stallman
  2003-03-20  9:11     ` Kenichi Handa
  2003-04-01 21:17   ` etags and UTF-8 encoded file names (Re: ISO-8859-1 encoded file names and UTF-8) Karl Eichwalder
  2 siblings, 1 reply; 23+ messages in thread
From: Richard Stallman @ 2003-03-20  8:46 UTC (permalink / raw)
  Cc: emacs-devel

    Should recoding a file name be regarded as a kind of file name
    change?

I don't understand the question.  What would it mean to say this is
true?

      If so, perhaps we should make the function rename-file
    handle recoding as well.

I don't think so.

* Re: ISO-8859-1  encoded file names and UTF-8
  2003-03-20  8:46   ` Richard Stallman
@ 2003-03-20  9:11     ` Kenichi Handa
  2003-03-23  2:52       ` Richard Stallman
  0 siblings, 1 reply; 23+ messages in thread
From: Kenichi Handa @ 2003-03-20  9:11 UTC (permalink / raw)
  Cc: emacs-devel

In article <E18vvgp-0005Bq-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
>     Should recoding a file name be regarded as a kind of file name
>     change?

> I don't understand the question.  What would it mean to say this is
> true?

Changing the encoding of a file name means changing the
byte sequence of the name.  For the OS, that is just changing the
file name.  And my sample code recode-file actually calls
rename-file internally.

* Re: ISO-8859-1  encoded file names and UTF-8
  2003-03-19 23:52     ` Kenichi Handa
@ 2003-03-20 17:32       ` Karl Eichwalder
  2003-03-21  6:01         ` Kenichi Handa
  2003-03-21 19:06       ` Richard Stallman
  1 sibling, 1 reply; 23+ messages in thread
From: Karl Eichwalder @ 2003-03-20 17:32 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> Ah!  That's a bug in the utf-8 decoder.  I've just installed the
> attached fix.

The problem still occurs, sorry.

* Re: ISO-8859-1  encoded file names and UTF-8
  2003-03-20 17:32       ` Karl Eichwalder
@ 2003-03-21  6:01         ` Kenichi Handa
  2003-03-21 19:53           ` Karl Eichwalder
  0 siblings, 1 reply; 23+ messages in thread
From: Kenichi Handa @ 2003-03-21  6:01 UTC (permalink / raw)
  Cc: emacs-devel

In article <shwuiu9c5r.fsf@tux.gnu.franken.de>, Karl Eichwalder <keichwa@gmx.net> writes:
> Kenichi Handa <handa@m17n.org> writes:
>>  Ah!  That's a bug in the utf-8 decoder.  I've just installed the
>>  attached fix.

> The problem still occurs, sorry.

??? Have you re-built emacs?  Just byte-compiling utf-8.el
is not enough because it is preloaded (dumped into the emacs executable).

* Re: ISO-8859-1  encoded file names and UTF-8
  2003-03-19 23:52     ` Kenichi Handa
  2003-03-20 17:32       ` Karl Eichwalder
@ 2003-03-21 19:06       ` Richard Stallman
  1 sibling, 0 replies; 23+ messages in thread
From: Richard Stallman @ 2003-03-21 19:06 UTC (permalink / raw)
  Cc: emacs-devel

    As we already have the function convert-standard-filename, I
    think the name convert-file-name is confusing.  So, I prefer
    the name recode-file-name if we'll have a separate function.

"recode" seems like a good term for "decode and encode".

* Re: ISO-8859-1  encoded file names and UTF-8
  2003-03-21  6:01         ` Kenichi Handa
@ 2003-03-21 19:53           ` Karl Eichwalder
  0 siblings, 0 replies; 23+ messages in thread
From: Karl Eichwalder @ 2003-03-21 19:53 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> ??? Have you re-built emacs?

Sorry, something went wrong on my side.  Yes, your patch fixed the
problem; thanks a lot!

* Re: ISO-8859-1  encoded file names and UTF-8
  2003-03-20  9:11     ` Kenichi Handa
@ 2003-03-23  2:52       ` Richard Stallman
  2003-03-24  0:28         ` Kenichi Handa
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Stallman @ 2003-03-23  2:52 UTC (permalink / raw)
  Cc: emacs-devel

    >     Should recoding a file name be regarded as a kind of file name
    >     change?

    > I don't understand the question.  What would it mean to say this is
    > true?

    Changing the encoding of a file name means changing the
    byte sequence of the name.  For the OS, that is just changing the
    file name.  And my sample code recode-file actually calls
    rename-file internally.

Ok, I guess this is a special case of changing the file name.

Why do you ask?

* Re: ISO-8859-1  encoded file names and UTF-8
  2003-03-23  2:52       ` Richard Stallman
@ 2003-03-24  0:28         ` Kenichi Handa
  2003-03-24 19:27           ` Richard Stallman
  0 siblings, 1 reply; 23+ messages in thread
From: Kenichi Handa @ 2003-03-24  0:28 UTC (permalink / raw)
  Cc: emacs-devel

In article <E18wvbC-0000a0-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
>>  I don't understand the question.  What would it mean to say this is
>>  true?

>     Changing the encoding of a file name means changing the
>     byte sequence of the name.  For the OS, that is just changing the
>     file name.  And my sample code recode-file actually calls
>     rename-file internally.

> Ok, I guess this is a special case of changing the file name.

> Why do you ask?

The above is the OS's point of view.  Users may not consider
changing a file name's encoding to be a special case of
rename-file.  Even for you, the explanation was necessary.

So, I wanted to hear other people's opinions about which is
better; enhancing the existing rename-file directly (perhaps
by making use of the prefix argument), or making a new
command recode-file-name.

* Re: ISO-8859-1  encoded file names and UTF-8
  2003-03-24  0:28         ` Kenichi Handa
@ 2003-03-24 19:27           ` Richard Stallman
  2003-03-26  4:47             ` Kenichi Handa
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Stallman @ 2003-03-24 19:27 UTC (permalink / raw)
  Cc: emacs-devel

    So, I wanted to hear other people's opinions about which is
    better; enhancing the existing rename-file directly (perhaps
    by making use of the prefix argument), or making a new
    command recode-file-name.

It should be a new command.

* Re: ISO-8859-1  encoded file names and UTF-8
  2003-03-24 19:27           ` Richard Stallman
@ 2003-03-26  4:47             ` Kenichi Handa
  0 siblings, 0 replies; 23+ messages in thread
From: Kenichi Handa @ 2003-03-26  4:47 UTC (permalink / raw)
  Cc: emacs-devel

In article <E18xXbY-0005rn-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
>     So, I wanted to hear other people's opinions about which is
>     better; enhancing the existing rename-file directly (perhaps
>     by making use of the prefix argument), or making a new
>     command recode-file-name.

> It should be a new command.

I've just installed the attached change.

2003-03-26  Kenichi Handa  <handa@etlken2>

	* files.el (recode-file-name): New function.

Index: files.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/files.el,v
retrieving revision 1.643
retrieving revision 1.644
diff -u -c -r1.643 -r1.644
*** files.el	14 Mar 2003 22:36:57 -0000	1.643
--- files.el	26 Mar 2003 04:42:29 -0000	1.644
***************
*** 766,771 ****
--- 766,823 ----
  	(setq newname (expand-file-name tem (file-name-directory newname)))
  	(setq count (1- count))))
      newname))
+ 
+ (defun recode-file-name (file coding new-coding &optional ok-if-already-exists)
+   "Change the encoding of FILE's name from CODING to NEW-CODING.
+ The value is a new name of FILE.
+ Signals a `file-already-exists' error if a file of the new name
+ already exists unless optional third argument OK-IF-ALREADY-EXISTS
+ is non-nil.  A number as third arg means request confirmation if
+ the new name already exists.  This is what happens in interactive
+ use with M-x."
+   (interactive
+    (let ((default-coding (or file-name-coding-system
+ 			     default-file-name-coding-system))
+ 	 (filename (read-file-name "Recode filename: " nil nil t))
+ 	 from-coding to-coding)
+      (if (and default-coding
+ 	      ;; We provide the default coding only when it seems that
+ 	      ;; the filename is correctly decoded by the default
+ 	      ;; coding.
+ 	      (let ((charsets (find-charset-string filename)))
+ 		(and (not (memq 'eight-bit-control charsets))
+ 		     (not (memq 'eight-bit-graphic charsets)))))
+ 	 (setq from-coding (read-coding-system
+ 			    (format "Recode filename %s from (default %s): "
+ 				    filename default-coding)
+ 			    default-coding))
+        (setq from-coding (read-coding-system
+ 			  (format "Recode filename %s from: " filename))))
+      
+      ;; We provide the default coding only when a user is going to
+      ;; change the encoding not from the default coding.
+      (if (eq from-coding default-coding)
+ 	 (setq to-coding (read-coding-system
+ 			  (format "Recode filename %s from %s to: "
+ 				  filename from-coding)))
+        (setq to-coding (read-coding-system
+ 			(format "Recode filename %s from %s to (default %s): "
+ 				filename from-coding default-coding)
+ 			default-coding)))
+      (list filename from-coding to-coding)))
+ 
+   (let* ((default-coding (or file-name-coding-system
+ 			     default-file-name-coding-system))
+ 	 ;; FILE should have been decoded by DEFAULT-CODING.
+ 	 (encoded (encode-coding-string file default-coding))
+ 	 (newname (decode-coding-string encoded coding))
+ 	 (new-encoded (encode-coding-string newname new-coding))
+ 	 ;; Suppress further encoding.
+ 	 (file-name-coding-system nil)
+ 	 (default-file-name-coding-system nil)
+ 	 (locale-coding-system nil))
+     (rename-file encoded new-encoded ok-if-already-exists)
+     newname))
  \f
  (defun switch-to-buffer-other-window (buffer &optional norecord)
    "Select buffer BUFFER in another window.
Index: NEWS
===================================================================
RCS file: /cvsroot/emacs/emacs/etc/NEWS,v
retrieving revision 1.799
retrieving revision 1.800
diff -u -c -r1.799 -r1.800
*** NEWS	8 Mar 2003 02:13:59 -0000	1.799
--- NEWS	26 Mar 2003 04:45:00 -0000	1.800
***************
*** 138,143 ****
--- 138,146 ----
  ** The new command `revert-buffer-with-coding-system' (C-x RET r)
  revisits the current file using a coding system that you specify.
  
+ ** The new command `recode-file-name' changes the encoding of the name
+ of a file.
+ 
  ---
  ** `ps-print' can now print characters from the mule-unicode charsets.


* etags and UTF-8 encoded file names (Re: ISO-8859-1  encoded file names and UTF-8)
  2003-03-19 13:33 ` Kenichi Handa
  2003-03-19 16:15   ` Karl Eichwalder
  2003-03-20  8:46   ` Richard Stallman
@ 2003-04-01 21:17   ` Karl Eichwalder
  2003-04-02  1:34     ` Kenichi Handa
  2 siblings, 1 reply; 23+ messages in thread
From: Karl Eichwalder @ 2003-04-01 21:17 UTC
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

>> UTF-8 Emacs sometimes fails to manage ISO-8859-1 file names.  I do not
>> know how you can rename or convert these file name from ISO-8859-1 to
>> UTF-8 encoded names using Emacs.
>
> Hmmm, I think it's a completely overlooked but important feature.

Now the next one: `tags-query-replace' does not work properly when file
names are UTF-8 encoded.  First run `etags *' on the files and then
call `tags-query-replace'.

If required I can send a more detailed report.

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.gnu.franken.de/ke/                            |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)


* Re: etags and UTF-8 encoded file names (Re: ISO-8859-1  encoded file names and UTF-8)
  2003-04-01 21:17   ` etags and UTF-8 encoded file names (Re: ISO-8859-1 encoded file names and UTF-8) Karl Eichwalder
@ 2003-04-02  1:34     ` Kenichi Handa
  2003-04-02 19:26       ` Richard Stallman
  0 siblings, 1 reply; 23+ messages in thread
From: Kenichi Handa @ 2003-04-02  1:34 UTC
  Cc: emacs-devel

In article <shpto57wa4.fsf_-_@tux.gnu.franken.de>, Karl Eichwalder <keichwa@gmx.net> writes:
> Now the next one: `tags-query-replace' does not work properly when file
> names are UTF-8 encoded.  First run `etags *' on the files and then
> call `tags-query-replace'.

This is the same type of bug as the one I posted to
emacs-devel under the subject "bad interaction with
C-x RET c and vc-cvs-registered", but harder to fix.

A tag file contains file names plus parts of source code.
The former must be decoded by file-name-coding-system, but
the latter must be decoded by the coding system of each
file.  It's very hard to decide a coding system for the
latter without actually reading the file.

Perhaps, a tag file must be read as raw-text (thus in a
unibyte buffer), and if one gives a non-ASCII TAGNAME to
`find-tag', it must be encoded by the
buffer-file-coding-system of the current buffer.
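The proposal can be illustrated outside Emacs. What follows is a hypothetical Python sketch, not a patch to etags.el. It assumes the usual etags layout: each file section starts with a form-feed line and a `name,size' header, and each tag line puts the source text before a DEL character. It also assumes the file names in the tag file are UTF-8. Only the file names are decoded. The tag lines stay raw bytes, and the search string is encoded with the caller's buffer coding before matching.

```python
def parse_tags(data, file_name_coding="utf-8"):
    """Split raw TAGS bytes into {decoded file name: [raw tag lines]}.

    Only the file-name headers are decoded; the source-text part of each
    tag line stays as bytes because each file may use its own coding.
    """
    sections = {}
    # Each section begins with b"\x0c\n"; anything before the first is ignored.
    for chunk in data.split(b"\x0c\n")[1:]:
        header, *lines = chunk.split(b"\n")
        raw_name, _, _size = header.rpartition(b",")
        sections[raw_name.decode(file_name_coding)] = [ln for ln in lines if ln]
    return sections


def find_tag(sections, tagname, buffer_coding):
    """Return (file name, raw tag line) pairs whose source text contains
    TAGNAME encoded with BUFFER_CODING (the current buffer's coding)."""
    needle = tagname.encode(buffer_coding)
    hits = []
    for name, lines in sections.items():
        for line in lines:
            source_text = line.split(b"\x7f", 1)[0]
            if needle in source_text:
                hits.append((name, line))
    return hits
```

No single coding system is applied to the whole tag file. A Latin-1 source file listed under a UTF-8 file name still matches, provided the tag name is encoded in Latin-1, that is, in the coding of the buffer the user is searching from.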

---
Ken'ichi HANDA
handa@m17n.org


* Re: etags and UTF-8 encoded file names (Re: ISO-8859-1  encoded file names and UTF-8)
  2003-04-02  1:34     ` Kenichi Handa
@ 2003-04-02 19:26       ` Richard Stallman
  0 siblings, 0 replies; 23+ messages in thread
From: Richard Stallman @ 2003-04-02 19:26 UTC
  Cc: emacs-devel

    Perhaps, a tag file must be read as raw-text (thus in a
    unibyte buffer), and if one gives a non-ASCII TAGNAME to
    `find-tag', it must be encoded by the
    buffer-file-coding-system of the current buffer.

That seems like a good approach.  Would someone like to implement it?



Thread overview: 23+ messages
2003-03-08  6:15 ISO-8859-1 encoded file names and UTF-8 Karl Eichwalder
2003-03-08  9:16 ` Eli Zaretskii
2003-03-08 10:05   ` Karl Eichwalder
2003-03-08 17:06     ` Eli Zaretskii
2003-03-08 18:25       ` Karl Eichwalder
2003-03-08 22:35         ` Eli Zaretskii
2003-03-09  4:38           ` Karl Eichwalder
2003-03-19 13:33 ` Kenichi Handa
2003-03-19 16:15   ` Karl Eichwalder
2003-03-19 23:52     ` Kenichi Handa
2003-03-20 17:32       ` Karl Eichwalder
2003-03-21  6:01         ` Kenichi Handa
2003-03-21 19:53           ` Karl Eichwalder
2003-03-21 19:06       ` Richard Stallman
2003-03-20  8:46   ` Richard Stallman
2003-03-20  9:11     ` Kenichi Handa
2003-03-23  2:52       ` Richard Stallman
2003-03-24  0:28         ` Kenichi Handa
2003-03-24 19:27           ` Richard Stallman
2003-03-26  4:47             ` Kenichi Handa
2003-04-01 21:17   ` etags and UTF-8 encoded file names (Re: ISO-8859-1 encoded file names and UTF-8) Karl Eichwalder
2003-04-02  1:34     ` Kenichi Handa
2003-04-02 19:26       ` Richard Stallman
