From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Why does dired go through extra efforts to avoid unibyte names
Date: Wed, 03 Jan 2018 17:10:35 +0200
Message-ID: <83tvw3asgk.fsf@gnu.org>
References: <83lghlfinq.fsf@gnu.org>
Reply-To: Eli Zaretskii
To: Stefan Monnier
Cc: emacs-devel@gnu.org
In-reply-to: (message from Stefan Monnier on Tue, 02 Jan 2018 23:14:20 -0500)
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:221549

> From: Stefan Monnier
> Date: Tue, 02 Jan 2018 23:14:20 -0500
>
> >> I bumped into the following code in dired-get-filename:
> >>
> >>     ;; The above `read' will return a unibyte string if FILE
> >>     ;; contains eight-bit-control/graphic characters.
> >>     (if (and enable-multibyte-characters
> >>              (not (multibyte-string-p file)))
> >>         (setq file (string-to-multibyte file)))
> >>
> >> and I'm wondering why we don't want a unibyte string here.
> >> `vc-region-history` told me this comes from the commit appended below,
> >> which seems to indicate that we're worried about a subsequent encoding,
> >> but AFAIK unibyte file names are not (re)encoded, and passing them
> >> through string-to-multibyte would actually make things worse in this
> >> respect (since it might cause the kind of (re)encoding this is
> >> supposedly trying to avoid).
> >>
> >> What am I missing?
> >
> > Why does it matter whether eight-bit-* characters are encoded one more
> > or one less time?
>
> That's part of the question, indeed.

The question was meant to be rhetorical ;-)  Eight-bit-* characters are
not in general modified by encoding them, so you could encode them any
number of times and still get the same bytes as result.

> > As for the reason for using string-to-multibyte: maybe it's because we
> > use concat further down in the function, which will determine whether
> > the result will be unibyte or multibyte according to its own ideas of
> > what's TRT?
>
> But `concat` will do a string-to-multibyte for us, if needed

Not if the other concatenated parts are ASCII (which tend to be
unibyte strings).

> This said, when that code was written, maybe `concat` used
> string-make-multibyte internally instead, so this call to
> string-to-multibyte might have been added to avoid using
> string-make-multibyte inside `concat`?

Could be.  I tried to look for relevant discussions around the time of
the commit, but couldn't find anything that would explain the reason.

> It would be good to have a concrete case that needed the above code, to
> see if the problem still exists.

Yep.
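
For illustration, here is a minimal sketch of the `concat` behavior discussed above. It is not taken from dired itself: the `my/` names and the "/tmp/" prefix are hypothetical, chosen only to show that `concat` yields a unibyte result when every argument is unibyte (ASCII-only literals are read as unibyte), and a multibyte result once one argument has gone through `string-to-multibyte`.

```elisp
;; Hypothetical names throughout; this is a sketch, not dired's code.

;; A unibyte file-name fragment containing the raw byte 0xE4.
(defconst my/raw-name (unibyte-string ?f ?o ?o #xe4))

;; An ASCII-only literal is read as a unibyte string, so concatenating
;; it with another unibyte string gives a unibyte result.
(defconst my/joined-unibyte (concat "/tmp/" my/raw-name))
(multibyte-string-p my/joined-unibyte)    ; nil

;; Converting first makes one argument multibyte, so the result is
;; multibyte; the raw byte is kept as an eight-bit character.
(defconst my/joined-multibyte
  (concat "/tmp/" (string-to-multibyte my/raw-name)))
(multibyte-string-p my/joined-multibyte)  ; t

;; The point about encoding can be checked the same way: evaluate this
;; to compare the encoded multibyte form with the original bytes.
(defconst my/encoded
  (encode-coding-string (string-to-multibyte my/raw-name) 'utf-8))
(equal my/encoded my/raw-name)
```

Whether dired still needs the explicit conversion depends on what the later `concat` arguments are in practice, which this sketch does not attempt to reproduce.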