From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#56469: 29.0.50; Unibyte dir in directory_files_internal
Date: Sun, 10 Jul 2022 17:32:17 +0300
Message-ID: <83bktx11ji.fsf@gnu.org>
References: <83y1x2177x.fsf@gnu.org>
To: Stefan Monnier
Cc: 56469@debbugs.gnu.org

> From: Stefan Monnier
> Cc: 56469@debbugs.gnu.org
> Date: Sun, 10 Jul 2022 10:23:28 -0400
>
> W.r.t. the comment, it's indeed unrelated to the patch (other than
> the fact that it touches the same code).  The question is when we do:
>
>     finalname = (nchars == nbytes)
>                 ? make_uninit_string (nbytes)
>                 : make_uninit_multibyte_string (nchars, nbytes);
>
> the actual bytes are "decoded" (i.e. in our internal UTF-8 encoding), so
> (nchars == nbytes) checks whether it's "pure ASCII" or not, and if it's
> pure ASCII we return a unibyte string.

I don't think this is true, because early during startup we don't yet
have the coding systems set up, so the file names are unibyte and
undecoded.  So that place in dired.c does not handle only ASCII when it
sees that nchars == nbytes.

> So in the above code snippet, when the string is all-ASCII, we actually
> have a choice, and both a unibyte string and a multibyte string should
> work.  Currently in that case we return a unibyte string, but I think in
> such cases we're better off returning a multibyte string because the
> subsequent "all-ASCII" test (that DE/ENCODE_FILE will perform when we
> pass that filename to some further operation) will be more efficient
> (it's a constant-time (nchars == nbytes) test whereas when the string is
> unibyte it requires looking at each and every byte).
>
> IOW, while it makes sense to return a "decoded unibyte" string from
> DECODE_FILE in order to avoid an allocation, I don't think it makes
> sense to return such a "decoded unibyte" string when we have to allocate
> a new string anyway.

I'm not necessarily opposed to deciding that ASCII strings should be
multibyte, but doing so for file names will require careful auditing
of the sources with the startup process in mind.
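To make the trade-off above concrete, here is a minimal C sketch. It uses a hypothetical, simplified string model: struct model_string, model_unibyte, model_multibyte and model_all_ascii are illustrative names only, not Emacs's struct Lisp_String or its real macros. It assumes, as in the internal multibyte encoding, that every non-ASCII character occupies more than one byte, so in a multibyte string nchars == nbytes proves pure ASCII in constant time. A unibyte string always has nchars == nbytes, so the only way to know is a byte scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a Lisp string (NOT Emacs's
   struct Lisp_String): just the data and the two length counts.  */
struct model_string
{
  const unsigned char *data;
  ptrdiff_t nchars;   /* number of characters */
  ptrdiff_t nbytes;   /* number of bytes */
  bool multibyte;     /* true if DATA is in the internal multibyte encoding */
};

/* A unibyte string has one character per byte, so nchars == nbytes
   holds whatever the bytes are: ASCII or raw, undecoded bytes.  */
static struct model_string
model_unibyte (const unsigned char *data)
{
  ptrdiff_t n = (ptrdiff_t) strlen ((const char *) data);
  struct model_string s = { data, n, n, false };
  return s;
}

/* A multibyte string whose NCHARS and NBYTES come from decoding.  */
static struct model_string
model_multibyte (const unsigned char *data, ptrdiff_t nbytes,
                 ptrdiff_t nchars)
{
  struct model_string s = { data, nchars, nbytes, true };
  return s;
}

/* Return true if S contains only ASCII characters.  */
static bool
model_all_ascii (const struct model_string *s)
{
  if (s->multibyte)
    {
      /* Assumption: each non-ASCII character takes 2 or more bytes in
         the multibyte encoding, so equal counts mean pure ASCII.
         Constant time.  */
      assert (s->nchars <= s->nbytes);
      return s->nchars == s->nbytes;
    }
  /* Unibyte: the counts are always equal, so every byte must be
     inspected.  Linear time.  */
  for (ptrdiff_t i = 0; i < s->nbytes; i++)
    if (s->data[i] >= 0x80)
      return false;
  return true;
}
```

For example, an undecoded unibyte name such as "caf\xc3\xa9" has nchars == nbytes == 5 yet is not ASCII, which is why the equal-counts test in dired.c cannot distinguish ASCII from raw bytes early in startup.  The same bytes as a decoded multibyte string have 4 characters in 5 bytes, and there the constant-time test is reliable.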