From mboxrd@z Thu Jan  1 00:00:00 1970
From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Why does dired go through extra efforts to avoid unibyte names
Date: Fri, 29 Dec 2017 09:34:53 -0500
To: Kenichi Handa
Cc: emacs-devel@gnu.org

I bumped into the following code in dired-get-filename:

          ;; The above `read' will return a unibyte string if FILE
          ;; contains eight-bit-control/graphic characters.
          (if (and enable-multibyte-characters
                   (not (multibyte-string-p file)))
              (setq file (string-to-multibyte file)))

and I'm wondering why we don't want a unibyte string here.

`vc-region-history` told me this comes from the commit appended below,
which seems to indicate that we're worried about a subsequent encoding,
but AFAIK unibyte file names are not (re)encoded, and passing them
through string-to-multibyte would actually make things worse in this
respect (since it might cause the kind of (re)encoding this is
supposedly trying to avoid).

What am I missing?


        Stefan


commit 038b550196d92b9844a4efecf1c2ded0f920e957
Author: Kenichi Handa
Date:   Wed Mar 19 11:58:25 2003 +0000

    * dired.el (dired-get-filename): Pay attention to the case that
    `read' returns a unibyte string.  Don't encode the file name by
    buffer-file-coding-system.

diff --git a/lisp/dired.el b/lisp/dired.el
--- a/lisp/dired.el
+++ b/lisp/dired.el
@@ -1455,11 +1455,16 @@
           ;; Using read to unquote is much faster than substituting
           ;; \007 (4 chars) -> ^G (1 char) etc. in a lisp loop.
           (setq file
                 (read
                  (concat "\""
                          ;; Some ls -b don't escape quotes, argh!
                          ;; This is not needed for GNU ls, though.
                          (or (dired-string-replace-match
                               "\\([^\\]\\|\\`\\)\"" file "\\1\\\\\"" nil t)
                              file)
-                         "\"")))))
+                         "\"")))
+          ;; The above `read' will return a unibyte string if FILE
+          ;; contains eight-bit-control/graphic characters.
+          (if (and enable-multibyte-characters
+                   (not (multibyte-string-p file)))
+              (setq file (string-to-multibyte file)))))
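
For illustration, here is a minimal Emacs Lisp sketch of the behavior the
message asks about: `read' applied to an octal-escaped name yields a
unibyte string, and `string-to-multibyte' turns it into a multibyte one.
The helper name `demo-unquote-ls-name' and the sample file name are
hypothetical; the sketch mirrors only the `read' step and leaves out
dired's quote handling.

```elisp
;; Hypothetical helper: decode an `ls -b' style name with `read',
;; as the dired code does (the quote-escaping step is omitted here).
(defun demo-unquote-ls-name (escaped)
  "Return ESCAPED with its backslash escapes decoded by `read'."
  (read (concat "\"" escaped "\"")))

;; Assumed sample: "caf" followed by the byte 0xE9, written as octal \351.
(let* ((file (demo-unquote-ls-name "caf\\351"))
       (multi (string-to-multibyte file)))
  (list (multibyte-string-p file)    ; nil: the raw byte makes it unibyte
        (multibyte-string-p multi)   ; t: same byte, kept as a raw-byte char
        (length file)                ; 4
        (length multi)))             ; 4
```

The sketch only shows the change of representation. Whether a later
encoding step treats the two forms differently is the open question in
the message, and this example does not settle it.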