From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#65305: 29.1; archive-mode can not handle subfile names encoded with utf-8
Date: Tue, 15 Aug 2023 17:50:03 +0300
Message-ID: <83bkf89x7o.fsf@gnu.org>
References: <83sf8ka6b2.fsf@gnu.org>
Cc: 65305@debbugs.gnu.org
To: awrhygty@outlook.com

> From: awrhygty@outlook.com
> Cc: 65305@debbugs.gnu.org
> Date: Tue, 15 Aug 2023 22:53:01 +0900
>
> Eli Zaretskii writes:
>
> > Is there any way of distinguishing these Python-created ZIP archives
> > from ZIP archives created by other Windows programs?
> >
> > Emacs by default assumes that file names in a ZIP archive created by a
> > Windows program are encoded in the console codepage, and it enforces
> > using that encoding for file names when the "creator" of the ZIP
> > archive indicates the archive was created by Windows programs such as
> > InfoZip's zip.exe and the File Explorer.  In my testing, zip archives
> > created by Python as above record the "creator" as number 0 (zero),
> > which is identical to what InfoZip does.  So, unless someone explains
> > how to distinguish these zip archives from those created by InfoZip, I
> > don't see how can Emacs know whether to use the InfoZip heuristics or
> > the Python heuristics.  Without the InfoZip/File Explorer heuristics
> > we have in arc-mode.el today, Emacs on Windows would be completely
> > unable to support non-ASCII file names in ZIP archives.
>
> There is a bit flag indicating that the subfile name is encoded with
> utf-8. Bytes 6-7 in local file header or bytes 8-9 in central directory
> header are general purpose bit flag. And bit 11 of the flag represents
> file encoding flag(1 for utf-8 encoding).

Thanks, please try the patch below.  If it gives good results, I will
install it.

> I guess unzip.exe does not support utf-8 encoded subfile name.
> Writing batch file with utf-8 encoding:
>   c:\Emacs\emacs-29.1\bin\unzip.exe test.zip 一.txt
> and run with chcp 932, 荳\200.txt is extracted.
> With chcp 65001, extraction failed.
>
> Writing batch file with cp932 encoding:(same as above)
>   c:\Emacs\emacs-29.1\bin\unzip.exe test.zip 一.txt
> and run with chcp 65001, 荳\200.txt is extracted.
> With chcp 932, extraction failed.
> This is not an ideal behavior, but extraction to STDOUT may work.
>
> To the contrary, 7z.exe extracts 一.txt correctly.
> If batch file is encoded with utf-8, it works with chcp 65001.
> If batch file is encoded with cp932, it works with chcp 932.

Like I said: support for UTF-8 encoded file names on Windows is
sporadic and incomplete.  It will remain so until Windows file-related
APIs support UTF-8 encoded file names.

diff --git a/lisp/arc-mode.el b/lisp/arc-mode.el
index 5e696c0..05a71fb 100644
--- a/lisp/arc-mode.el
+++ b/lisp/arc-mode.el
@@ -1990,6 +1990,7 @@ archive-zip-summarize
         (setq p (+ p (point-min)))
         (while (string= "PK\001\002" (buffer-substring p (+ p 4)))
           (let* ((creator (get-byte (+ p 5)))
+                 (gpflags (archive-l-e (+ p 8) 2))
                  ;; (method  (archive-l-e (+ p 10) 2))
                  (modtime (archive-l-e (+ p 12) 2))
                  (moddate (archive-l-e (+ p 14) 2))
@@ -2001,7 +2002,12 @@ archive-zip-summarize
                  (efnname (let ((str (buffer-substring (+ p 46) (+ p 46 fnlen))))
                             (decode-coding-string
                              str
-                             (or (if (and w32-fname-encoding
+                             ;; Bit 11 of general purpose bit flags (bytes
+                             ;; 8-9) of Central Directory: 1 means UTF-8
+                             ;; encoded file names.
+                             (or (if (/= 0 (logand gpflags #x0800))
+                                     'utf-8-unix)
+                                 (if (and w32-fname-encoding
                                           (memq creator
                                                 ;; This should be just 10 and
                                                 ;; 14, but InfoZip uses 0 and
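
To check the flag outside Emacs, the same test can be sketched in
Python.  This is only an illustration of the bit-11 check described
above, not part of the patch: the function names
central_directory_names and make_demo_archive are made up for the
example, the cp437 fallback is an assumption standing in for whatever
legacy codepage applies, and ZIP64 archives are not handled.

```python
import io
import struct
import zipfile

CENTRAL_SIG = b"PK\x01\x02"   # central directory file header signature
EOCD_SIG = b"PK\x05\x06"      # end-of-central-directory signature
UTF8_FLAG = 0x0800            # bit 11 of the general purpose bit flag


def central_directory_names(data, fallback="cp437"):
    """Return a list of (name, is_utf8) for each central directory entry.

    The fallback encoding is an assumption for this sketch; a real
    reader would pick the legacy codepage by some other heuristic.
    """
    eocd = data.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("no end-of-central-directory record found")
    # Total entry count is at offset 10, central directory offset at 16.
    (total,) = struct.unpack_from("<H", data, eocd + 10)
    (pos,) = struct.unpack_from("<I", data, eocd + 16)

    entries = []
    for _ in range(total):
        if data[pos:pos + 4] != CENTRAL_SIG:
            raise ValueError("bad central directory header at %d" % pos)
        # Bytes 8-9 of the central directory header hold the flags.
        (flags,) = struct.unpack_from("<H", data, pos + 8)
        fnlen, extralen, commentlen = struct.unpack_from("<HHH", data, pos + 28)
        raw = data[pos + 46:pos + 46 + fnlen]
        is_utf8 = bool(flags & UTF8_FLAG)
        entries.append((raw.decode("utf-8" if is_utf8 else fallback), is_utf8))
        # Skip the fixed 46-byte header plus the variable-length fields.
        pos += 46 + fnlen + extralen + commentlen
    return entries


def make_demo_archive():
    """Build a small in-memory archive with one non-ASCII member name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("一.txt", b"demo")
        zf.writestr("ascii.txt", b"demo")
    return buf.getvalue()


if __name__ == "__main__":
    for name, is_utf8 in central_directory_names(make_demo_archive()):
        print(name, "UTF-8 flag set" if is_utf8 else "UTF-8 flag clear")
```

Python's zipfile module sets the flag only for member names that
cannot be encoded as ASCII, so the second entry in the demo archive is
expected to have the bit clear.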