From: "Eli Zaretskii"
Newsgroups: gmane.emacs.devel
Subject: Re: ISO-8859-1 encoded file names and UTF-8
Date: Sat, 08 Mar 2003 19:06:54 +0200
Message-ID: <8582-Sat08Mar2003190654+0200-eliz@elta.co.il>
References: <7704-Sat08Mar2003111630+0200-eliz@elta.co.il>
Original-To: keichwa@gmx.net
Cc: emacs-devel@gnu.org
In-reply-to: (message from Karl Eichwalder on Sat, 08 Mar 2003 11:05:41 +0100)

> From: Karl Eichwalder
> Date: Sat, 08 Mar 2003 11:05:41 +0100
>
> file-name-coding-system's value is nil
>
> *Coding system for encoding file names.
> If it is nil, `default-file-name-coding-system' (which see) is used.
>
> I did not set it.

And what is the value of `default-file-name-coding-system'?  If it's
anything but `utf-8', please try setting `file-name-coding-system' to
`utf-8' and see if that helps.

> Some days before I observed that Emacs "auto-corrects" broken .po
> files; the broken files are declared as UTF-8 and containing those
> codes and additionally some iso-8859-1 got mixed in by accident.

Sorry, I'm not sure I understand the last part of this sentence
correctly; if I didn't, what's below might not make any sense.

IIUC, Emacs sometimes decides that *.po files which contain characters
from different encodings are encoded in UTF-8.  If that's so, I think
it's because you made utf-8 your preferred encoding (IIRC, that's what
Emacs does when it sees that your locale uses UTF-8).

> Emacs
> displays those wrong characters "correctly" -- this is somehow
> "user-friendly" but nevertheless highly confusing.

What does Emacs say if you go to one of those ``wrong'' characters and
type "C-u C-x ="?  Are they treated as eight-bit-* characters?  If so,
Emacs displays them with the proper glyphs because your fonts are set
in a way that fits Latin-1.

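For reference, the checks suggested so far can be run from Lisp as well as from the keyboard.  This is only a sketch, assuming the forms are evaluated one at a time with "M-:" or in *scratch*, with point on one of the suspect characters for the last form; the `setq' affects only the current session:

```elisp
;; Show both variables that control file-name encoding.
(list file-name-coding-system default-file-name-coding-system)

;; Try UTF-8 for file names in this session only.
(setq file-name-coding-system 'utf-8)

;; Same command that "C-u C-x =" runs: describe the character at
;; point in detail, including its character set.
(what-cursor-position t)
```
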
> At least please add a special background to those auto-corrected
> characters.

This would contradict the whole purpose of a multilingual Emacs: it is
meant to seamlessly display characters from different character sets
without any special effects.  How can Emacs know that in this
particular case, you want it to display different character sets
differently?

I believe that if such a feature is added, it must be driven by
user-level settings.  For example, users could define a list of
character sets or codepoints which they don't expect to see in their
buffers, and Emacs will then flag characters from those sets with some
visual cue.  It's even possible that you can do that yourself right
now by using hi-lock.el or something similar, since IIRC regular
expressions can be used to express character categories.
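
As a rough illustration of the hi-lock idea: the sketch below assumes that hi-lock.el is available and that treating every non-ASCII character as "unexpected" is a good enough approximation.  Both are assumptions, and the regexp would need narrowing for files that legitimately contain non-ASCII text:

```elisp
;; Sketch: give every non-ASCII character in the current buffer a
;; highlighted background.  `highlight-regexp' and the `hi-yellow'
;; face are provided by hi-lock.el.
(require 'hi-lock)
(highlight-regexp "[^[:ascii:]]" 'hi-yellow)

;; Remove that highlighting again.
(unhighlight-regexp "[^[:ascii:]]")
```

This only changes how the buffer is displayed; it does not modify the file or the way Emacs decodes it.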