From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Dyballa
Newsgroups: gmane.emacs.help
Subject: Re: UTF-8 in path / filename
Date: Sun, 27 Aug 2006 15:12:15 +0200
Message-ID: <25A143BA-4E99-4FF9-B6C0-A8F42146D0C9@Web.DE>
References: <7D07BEAB-2279-48C5-BB9A-3FF3A15D0FED@Web.DE> <20060826000627.b8b44e95.gregory.schmitt@free.fr> <87odu8ct0a.fsf@catnip.gol.com> <0C15C504-B711-403E-B8D1-F03234C453E3@Web.DE>
To: James Cloos
Cc: help-gnu-emacs@gnu.org, Miles Bader
Content-Type: text/plain; charset=UTF-8
Xref: news.gmane.org gmane.emacs.help:36964

On 27.08.2006 at 00:13, James Cloos wrote:

> Peter> Files with UTF-8 characters in them are shown in dired (has -u: in
> Peter> mode-line, i.e. uses UTF-8) à la . Some UTF-8
> Peter> characters like ß or Û show up as themselves.
>
> Doesn't Apple by default use NFD (Normalization Form Decomposed) for
> filenames? That would explain the sequences.

Yes, that's the correct term for the way file names are recorded in
HFS+. The font file, LucidaTypewriterRegular.ttf, has no combining
diacritical marks defined (only some modifiers), so empty boxes are
displayed in their place.

> Can you get at the actual octet-sequence of the filenames?

Do you know a tool that can do that? I can only think of a C
programme that reads the inode and then outputs the octets.
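
For what it's worth, a small script can stand in for the C programme mentioned above. The following is only a sketch, assuming a Python 3 interpreter is available; the function name filename_octets is made up for illustration. It relies on os.listdir returning undecoded byte strings when it is given a bytes path, so the octets are printed as the file system reports them.

```python
import os


def filename_octets(directory="."):
    """Return (raw_name, hex_octets) pairs for every entry in `directory`.

    Passing a bytes path to os.listdir makes it return bytes names,
    so no decoding or Unicode normalization is applied by Python.
    """
    raw_dir = os.fsencode(directory)
    result = []
    for raw_name in sorted(os.listdir(raw_dir)):
        # Render each octet as two hex digits, e.g. "61 cc 88".
        hex_octets = " ".join(f"{b:02x}" for b in raw_name)
        result.append((raw_name, hex_octets))
    return result


if __name__ == "__main__":
    for raw_name, hex_octets in filename_octets("."):
        # errors="replace" keeps the listing readable even for invalid UTF-8.
        print(raw_name.decode("utf-8", errors="replace"), "->", hex_octets)
```

A decomposed "a" with diaeresis would show up here as 61 cc 88, a precomposed one as c3 a4.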
Doing the same as Harald did, I get different output in Terminal
(because UTF-8 characters are substituted with question marks), for
example:

    pete 140 /\ l -1 | grep .txt | grep ' ' | grep -v Mac
    RGB äöüæÆÜÖÄ.txt
    pete 141 /\ l -1 | grep .txt | grep ' ' | grep -v Mac | od -t a
    R G B sp a ? 88 o ? 88 u ? 88 ? ? ? 86 U ? 88 O ? 88 A ? 88 . t x t nl

In Emacsen's shells I get:

    R G B sp a \314 88 o \314 88 u \314 88 \303 \246 \303 86 U \314 88 O \314 88 A \314 88 . t x t nl

The file name áÛïǓà.txt is interpreted as:

    a \314 81 U \314 82 i \314 88 U \314 8c a \314 80 . t x t nl

--
Greetings

  Pete

"Isn't vi that text editor with two modes... one that beeps and one
that corrupts your file?"
    -- Dan Jacobson, on comp.os.linux.advocacy
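
To read the last dump above as Unicode, the tokens can be turned back into octets and decoded. This is a sketch only: the token rules (a backslash introduces octal, two bare characters are hex, "sp" and "nl" are names) are an assumption inferred from the output shown in this message, not a documented od format, and dump_to_bytes is a hypothetical helper.

```python
import unicodedata

# Only the two named characters that occur in the dumps above are handled.
NAMED = {"sp": 0x20, "nl": 0x0A}


def dump_to_bytes(dump):
    """Convert a whitespace-separated dump (assumed token rules) to bytes."""
    out = bytearray()
    for tok in dump.split():
        if tok in NAMED:
            out.append(NAMED[tok])
        elif tok.startswith("\\"):
            out.append(int(tok[1:], 8))   # e.g. \314 is octal for 0xCC
        elif len(tok) == 1:
            out.append(ord(tok))          # plain ASCII character
        else:
            out.append(int(tok, 16))      # e.g. 88 or 8c, taken as hex
    return bytes(out)


dump = r"a \314 81 U \314 82 i \314 88 U \314 8c a \314 80 . t x t nl"
raw = dump_to_bytes(dump)
name = raw.decode("utf-8").rstrip("\n")

# Each base letter is followed by a separate combining mark (NFD).
for ch in name:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Composing the name (NFC) gives the precomposed spelling of the same text.
nfc = unicodedata.normalize("NFC", name)
print(len(name), "code points decomposed,", len(nfc), "composed")
```

Under these assumptions every \314 octet pair decodes to a combining mark in the U+0300 block, which is consistent with HFS+ storing the names decomposed.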