From: Mathias Dahl
Newsgroups: gmane.emacs.help
Subject: Re: opening files with unicode characters in the file name on windows
Date: 04 Aug 2004 16:27:05 +0200
References: <410E6877.7010001@yahoo.com>
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50

"Eli Zaretskii" writes:

> Your original message said
> ``file names with Unicode characters''.
> Can you tell what characters are those, and why do you think they
> are encoded in some Unicode-related encoding, like UTF-16?  Can you
> look at the file's name as recorded in the directory with some
> low-level tool that actually shows the byte values that encode the
> file's name?

I have done some investigation and I am pretty sure UTF-16 is the
encoding used.

The following VBScript program (sorry for pasting non-Emacs-related
stuff here) loops through all files in a folder. If a file name
contains a character whose value is above 255, it displays a list of
the characters in that name with their Unicode code point values:

' -- TestUnicoceFileNames.vbs ---

Option Explicit

' --------- Main program starts

Dim sFileName
Dim oFSO
Dim oFile

Set oFSO = CreateObject("Scripting.FileSystemObject")

' Check the name of every file in the folder
For Each oFile In oFSO.GetFolder("c:\document\my docs").Files
    checkUnicodeFileName(oFile.Name)
Next

Set oFSO = Nothing

' --------- Main program ends

' Show a message box for a file name that contains at least one
' character with a code point above 255
Private Sub checkUnicodeFileName(fileName)

    Dim i
    Dim c
    Dim n

    For i = 1 to Len(fileName)

        c = Mid(fileName, i, 1)
        n = AscW(c)

        If n > 255 Then

            MsgBox "File name contains unicode characters: " & _
                Chr(10) & Chr(10) & _
                "File name: " & fileName & _
                Chr(10) & Chr(10) & _
                "Characters and their unicode code points:" & _
                Chr(10) & Chr(10) & _
                getStringInfo(fileName)

            ' One message box per file name is enough
            Exit Sub

        End If

    Next

End Sub

' Return one line per character of s: the character, a tab and its
' code point as four hex digits
Private Function getStringInfo(s)

    Dim i
    Dim n
    Dim c
    Dim h
    Dim result

    result = "Char" & Chr(9) & "U+NNNN" & Chr(10) & Chr(10)

    For i = 1 to Len(s)

        c = Mid(s, i, 1)
        n = AscW(c)
        h = Hex(n)

        result = result & c & Chr(9) & Right("0000" & h, 4) & Chr(10)

    Next

    getStringInfo = result

End Function

' -- TestUnicoceFileNames.vbs end here---

The output looks like this (the actual characters show up only when a
"Unicode font" is used for message boxes, which is what I do):

File name contains unicode characters:

File name: pravda_правда.txt

Characters and their unicode code points:

Char    U+NNNN

p       0070
r       0072
a       0061
v       0076
d       0064
a       0061
_       005F
п       043F
р       0440
а       0430
в       0432
д       0434
а       0430
.       002E
t       0074
x       0078
t       0074

/Mathias