From mboxrd@z Thu Jan 1 00:00:00 1970
From: awrhygty@outlook.com
Newsgroups: gmane.emacs.bugs
Subject: bug#65305: 29.1; archive-mode can not handle subfile names encoded with utf-8
Date: Wed, 16 Aug 2023 12:47:14 +0900
References: <83sf8ka6b2.fsf@gnu.org> <83bkf89x7o.fsf@gnu.org>
In-Reply-To: <83bkf89x7o.fsf@gnu.org> (Eli Zaretskii's message of "Tue, 15 Aug 2023 10:50:01 -0400")
To: Eli Zaretskii
Cc: 65305@debbugs.gnu.org

Eli Zaretskii writes:

>> From: awrhygty@outlook.com
>> Cc: 65305@debbugs.gnu.org
>> Date: Tue, 15 Aug 2023 22:53:01 +0900
>>
>> Eli Zaretskii writes:
>>
>> > Is there any way of distinguishing these Python-created ZIP archives
>> > from ZIP archives created by other Windows programs?
>> >
>> > Emacs by default assumes that file names in a ZIP archive created by a
>> > Windows program are encoded in the console codepage, and it enforces
>> > using that encoding for file names when the "creator" of the ZIP
>> > archive indicates the archive was created by Windows programs such as
>> > InfoZip's zip.exe and the File Explorer. In my testing, zip archives
>> > created by Python as above record the "creator" as number 0 (zero),
>> > which is identical to what InfoZip does. So, unless someone explains
>> > how to distinguish these zip archives from those created by InfoZip, I
>> > don't see how can Emacs know whether to use the InfoZip heuristics or
>> > the Python heuristics. Without the InfoZip/File Explorer heuristics
>> > we have in arc-mode.el today, Emacs on Windows would be completely
>> > unable to support non-ASCII file names in ZIP archives.
>>
>> There is a bit flag indicating that the subfile name is encoded with
>> utf-8. Bytes 6-7 in local file header or bytes 8-9 in central directory
>> header are general purpose bit flag. And bit 11 of the flag represents
>> file encoding flag(1 for utf-8 encoding).
>
> Thanks, please try the patch below. If it gives good results, I will
> install it.
>
>> I guess unzip.exe does not support utf-8 encoded subfile name.
>> Writing batch file with utf-8 encoding:
>> c:\Emacs\emacs-29.1\bin\unzip.exe test.zip 一.txt
>> and run with chcp 932, 荳\200.txt is extracted.
>> With chcp 65001, extraction failed.
>>
>> Writing batch file with cp932 encoding:(same as above)
>> c:\Emacs\emacs-29.1\bin\unzip.exe test.zip 一.txt
>> and run with chcp 65001, 荳\200.txt is extracted.
>> With chcp 932, extraction failed.
>> This is not an ideal behavior, but extraction to STDOUT may work.
>>
>> To the contrary, 7z.exe extracts 一.txt correctly.
>> If batch file is encoded with utf-8, it works with chcp 65001.
>> If batch file is encoded with cp932, it works with chcp 932.
>
> Like I said: support for UTF-8 encoded file names on Windows is
> sporadic and incomplete. It will remain so until Windows file-related
> APIs support UTF-8 encoded file names.
>
> diff --git a/lisp/arc-mode.el b/lisp/arc-mode.el
> index 5e696c0..05a71fb 100644
> --- a/lisp/arc-mode.el
> +++ b/lisp/arc-mode.el
> @@ -1990,6 +1990,7 @@ archive-zip-summarize
>        (setq p (+ p (point-min)))
>        (while (string= "PK\001\002" (buffer-substring p (+ p 4)))
>          (let* ((creator (get-byte (+ p 5)))
> +               (gpflags (archive-l-e (+ p 8) 2))
>                 ;; (method (archive-l-e (+ p 10) 2))
>                 (modtime (archive-l-e (+ p 12) 2))
>                 (moddate (archive-l-e (+ p 14) 2))
> @@ -2001,7 +2002,12 @@ archive-zip-summarize
>                 (efnname (let ((str (buffer-substring (+ p 46) (+ p 46 fnlen))))
>                            (decode-coding-string
>                             str
> -                           (or (if (and w32-fname-encoding
> +                           ;; Bit 11 of general purpose bit flags (bytes
> +                           ;; 8-9) of Central Directory: 1 means UTF-8
> +                           ;; encoded file names.
> +                           (or (if (/= 0 (logand gpflags #x0800))
> +                                   'utf-8-unix)
> +                               (if (and w32-fname-encoding
>                                          (memq creator
>                                                ;; This should be just 10 and
>                                                ;; 14, but InfoZip uses 0 and

The patch works for listing entries, and the contents can be extracted
with 7z.exe. unzip.exe does not work well.

I tried the settings below, but rewriting entries does not work.
(These archive-zip-* values are the defaults when archive-7z-program is
set and zip.exe/unzip.exe do not exist.)

  (setq archive-7z-program "c:/Program Files/7-Zip/7z.exe"
        archive-zip-extract '("c:/Program Files/7-Zip/7z.exe" "x" "-so")
        archive-zip-expunge '("c:/Program Files/7-Zip/7z.exe" "d")
        archive-zip-update '("c:/Program Files/7-Zip/7z.exe" "u")
        archive-zip-update-case archive-zip-update)

This is because the update command needs the "-si" option followed by
the entry name. They should be passed as one argument, like
(format "-si%s" name).
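
For anyone who wants to check the flag outside Emacs, here is a rough Python sketch of the same test the patch performs: it walks the central directory and looks at bit 11 of the general purpose bit flag (bytes 8-9 of each entry). The helper names central_directory_entries and make_archive are made up for this illustration, and the cp437 fallback for unflagged names is only a placeholder assumption, not what arc-mode.el does.

```python
import io
import struct
import zipfile

UTF8_FLAG = 0x0800  # bit 11 of the general purpose bit flag


def central_directory_entries(data: bytes):
    """Yield (name, is_utf8) for each central directory entry in data."""
    eocd = data.rfind(b"PK\x05\x06")  # End of Central Directory record
    if eocd < 0:
        raise ValueError("not a ZIP archive")
    # Offset of the first central directory header (bytes 16-19 of the EOCD).
    (p,) = struct.unpack_from("<I", data, eocd + 16)
    while data[p:p + 4] == b"PK\x01\x02":
        (gpflags,) = struct.unpack_from("<H", data, p + 8)
        fnlen, extralen, commentlen = struct.unpack_from("<HHH", data, p + 28)
        raw = data[p + 46:p + 46 + fnlen]
        is_utf8 = bool(gpflags & UTF8_FLAG)
        # Assumption: unflagged names are decoded as cp437 for this demo only.
        yield raw.decode("utf-8" if is_utf8 else "cp437"), is_utf8
        p += 46 + fnlen + extralen + commentlen


def make_archive(names):
    """Build an in-memory ZIP with Python's zipfile (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, "x")
    return buf.getvalue()


if __name__ == "__main__":
    for name, is_utf8 in central_directory_entries(
            make_archive(["一.txt", "plain.txt"])):
        print(name, "utf-8 flag set" if is_utf8 else "utf-8 flag clear")
```

With Python's zipfile, the flag is set only for names that cannot be encoded as ASCII, so the two entries in the demo archive should differ in that bit.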