From mboxrd@z Thu Jan 1 00:00:00 1970
From: Paul Eggert
Newsgroups: gmane.emacs.devel
Subject: Re: Multibyte and unibyte file names
Date: Wed, 23 Jan 2013 10:08:25 -0800
Message-ID: <51002719.3080805@cs.ucla.edu>
References: <83ehhbn680.fsf@gnu.org>
In-Reply-To: <83ehhbn680.fsf@gnu.org>
To: Eli Zaretskii
Cc: Kazuhiro Ito, Michael Albinus, emacs-devel@gnu.org

On 01/23/13 09:45, Eli Zaretskii wrote:

>   if (srclen > 1
>       && IS_DIRECTORY_SEP (dst[srclen - 1]))
>     {
>       dst[srclen - 1] = 0;
>       srclen--;
>     }
>
> If dst[] is an encoded string that uses a multibyte encoding, it is
> wrong to look at just the last byte of the string, because it could be
> a trailing byte of some multibyte sequence, right?

If memory serves, the answer to that question is different for
GNU / POSIX / etc (GNUish) systems than for MS-Windows systems.

On GNUish systems, the kernel doesn't know about encodings,
so the above code is correct for the file system even if it produces
a byte string that is not properly encoded for the file name
coding system. On MS-Windows systems, as I understand it,
the operating system is cognizant of which file name encoding
you're using, so the above is indeed an error.

In practice nobody in the GNUish world uses encodings that
are unsafe for '/', so to some extent this is just a theoretical
issue in the GNUish world -- it just doesn't come up.

Unfortunately I don't understand the ins and outs of the MSish side, or of the Tramp side, so I can't speak to how that should work.
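
As a rough illustration of the last-byte problem, here is a sketch under
stated assumptions; it is not code from Emacs. In UTF-8 every trailing
byte lies in the range 0x80..0xBF, so it can never equal '/' (0x2F). In
Shift_JIS, by contrast, the second byte of a two-byte character can be
0x5C, the same value as '\', so a test of only the last byte can misfire.
The helper names below (sjis_lead_byte_p, ends_with_separator,
strip_trailing_separator) are hypothetical, and the lead-byte ranges are
the commonly documented ones for Shift_JIS; real code would have to
consult whatever the file name coding system actually is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if BYTE starts a two-byte Shift_JIS
   character (assumed lead-byte ranges 0x81..0x9F and 0xE0..0xFC).  */
static bool
sjis_lead_byte_p (unsigned char byte)
{
  return (0x81 <= byte && byte <= 0x9F) || (0xE0 <= byte && byte <= 0xFC);
}

/* True if the last *character* of the Shift_JIS string S of length LEN
   is the single-byte separator SEP.  The string is scanned from the
   start so that a trailing byte equal to SEP is not mistaken for a
   separator.  */
static bool
ends_with_separator (const char *s, size_t len, char sep)
{
  bool last_is_sep = false;
  size_t i = 0;

  while (i < len)
    {
      if (sjis_lead_byte_p ((unsigned char) s[i]) && i + 1 < len)
        {
          /* Two-byte character: skip its trailing byte unexamined.  */
          last_is_sep = false;
          i += 2;
        }
      else
        {
          last_is_sep = s[i] == sep;
          i++;
        }
    }
  return last_is_sep;
}

/* Same job as the quoted snippet, but character-aware: remove one
   trailing separator from DST and return the new length.  */
static size_t
strip_trailing_separator (char *dst, size_t srclen, char sep)
{
  assert (dst != NULL);
  if (srclen > 1 && ends_with_separator (dst, srclen, sep))
    dst[--srclen] = 0;
  return srclen;
}
```

With these definitions, the string "dir\" loses its final byte, whereas
the two-byte sequence 0x83 0x5C (a single Shift_JIS character) is left
alone even though its last byte is 0x5C. The price is a scan from the
start of the string rather than a constant-time peek at the last byte.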