unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#67926: 29.1; fail to extract ZIP subfile named with [...]
@ 2023-12-20 11:23 awrhygty
  2023-12-23 10:16 ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: awrhygty @ 2023-12-20 11:23 UTC (permalink / raw)
  To: 67926


If a ZIP archive has a subfile named 'file[abc].txt',
RET (archive-extract) in an archive-mode buffer fails to extract it,
with the message:
  caution: filename not matched:  file[abc].txt

Worse, if the archive also contains filea.txt, fileb.txt and filec.txt,
extraction reports no error, and the buffer for 'file[abc].txt'
contains the contents of filea.txt, fileb.txt and filec.txt, but
not the contents of 'file[abc].txt' itself.

This is because 'unzip.exe' treats subfilename arguments containing
'[...]' as subfilename patterns. This does not occur with '7z.exe'.
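
For illustration only, here is a minimal sketch of the same pattern
rule in Emacs Lisp.  It uses Emacs's own `wildcard-to-regexp' as a
stand-in for unzip's matcher; that is an assumption, since unzip has
its own implementation and only the bracket-expression rule is
expected to agree.  The helper name `my-glob-match-p' is hypothetical.

```elisp
;; Hypothetical helper, for illustration only: match NAME against a
;; shell-style PATTERN using Emacs's `wildcard-to-regexp'.  This is a
;; stand-in for unzip's own matcher, not a copy of it.
(defun my-glob-match-p (pattern name)
  "Return t if NAME matches the shell wildcard PATTERN, else nil."
  (and (string-match-p (wildcard-to-regexp pattern) name) t))

;; "[abc]" is a bracket expression: it matches one of a, b or c ...
(my-glob-match-p "file[abc].txt" "filea.txt")
;; ... so the pattern does not match the literal name it was built from.
(my-glob-match-p "file[abc].txt" "file[abc].txt")
```

The first call succeeds and the second fails, which mirrors the two
symptoms reported above.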


In GNU Emacs 29.1 (build 2, x86_64-w64-mingw32) of 2023-08-02 built on
 AVALON
Windowing system distributor 'Microsoft Corp.', version 10.0.19045
System Description: Microsoft Windows 10 Pro (v10.0.2009.19045.3803)

Configured using:
 'configure --with-modules --without-dbus --with-native-compilation=aot
 --without-compress-install --with-tree-sitter CFLAGS=-O2'

Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG JSON LCMS2 LIBXML2 MODULES NATIVE_COMP
NOTIFY W32NOTIFY PDUMPER PNG RSVG SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP XPM ZLIB

(NATIVE_COMP present but libgccjit not available)

Important settings:
  value of $LANG: JPN
  locale-coding-system: cp932

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message mailcap yank-media puny dired
dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg rfc6068
epg-config gnus-util text-property-search time-date subr-x mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
cl-loaddefs cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util
mail-prsvr mail-utils term/bobcat japan-util rmc iso-transl tooltip
cconv eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type
elisp-mode mwheel dos-w32 ls-lisp disp-table term/w32-win w32-win
w32-vars term/common-win tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode lisp-mode prog-mode register
page tab-bar menu-bar rfn-eshadow isearch easymenu timer select
scroll-bar mouse jit-lock font-lock syntax font-core term/tty-colors
frame minibuffer nadvice seq simple cl-generic indonesian philippine
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite emoji-zwj charscript
charprop case-table epa-hook jka-cmpr-hook help abbrev obarray oclosure
cl-preloaded button loaddefs theme-loaddefs faces cus-face macroexp
files window text-properties overlay sha1 md5 base64 format env
code-pages mule custom widget keymap hashtable-print-readable backquote
threads w32notify w32 lcms2 multi-tty make-network-process
native-compile emacs)

Memory information:
((conses 16 51611 10241)
 (symbols 48 5198 0)
 (strings 32 15199 1603)
 (string-bytes 1 409290)
 (vectors 16 10773)
 (vector-slots 8 335141 17930)
 (floats 8 35 38)
 (intervals 56 228 9)
 (buffers 984 10))






* bug#67926: 29.1; fail to extract ZIP subfile named with [...]
  2023-12-20 11:23 bug#67926: 29.1; fail to extract ZIP subfile named with [...] awrhygty
@ 2023-12-23 10:16 ` Eli Zaretskii
  2023-12-23 11:47   ` Andreas Schwab
  2023-12-26 14:51   ` awrhygty
  0 siblings, 2 replies; 17+ messages in thread
From: Eli Zaretskii @ 2023-12-23 10:16 UTC (permalink / raw)
  To: awrhygty; +Cc: 67926

> From: awrhygty@outlook.com
> Date: Wed, 20 Dec 2023 20:23:38 +0900
> 
> 
> If a ZIP archive has a subfile named 'file[abc].txt',
> RET(archive-extract) in an archive-mode buffer fails to extract with
> message: 
>   caution: filename not matched:  file[abc].txt
> 
> Unfortunately if there are filea.txt, fileb.txt and filec.txt,
> extraction does not report errors and the buffer of 'file[abc].txt'
> contains all contents of filea.txt, fileb.txt and filec.txt, but does
> not contains the contents of 'file[abc].txt'. 
> 
> This is because 'unzip.exe' treats subfilename arguments containing
> '[...]' as subfilename patterns. This does not occur with '7z.exe'.

Is there any way of making 'unzip' extract file[abc].txt by name,  by
some kind of escaping or protecting the [...] wildcard from expansion?
If there is such a way, we could try using it (maybe); if there's no
such way, I will tag this bug "wontfix", since it isn't a problem with
Emacs, but with the Windows build of 'unzip'.

Thanks.






* bug#67926: 29.1; fail to extract ZIP subfile named with [...]
  2023-12-23 10:16 ` Eli Zaretskii
@ 2023-12-23 11:47   ` Andreas Schwab
  2023-12-23 11:58     ` Eli Zaretskii
  2023-12-26 14:51   ` awrhygty
  1 sibling, 1 reply; 17+ messages in thread
From: Andreas Schwab @ 2023-12-23 11:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 67926, awrhygty

On Dec 23 2023, Eli Zaretskii wrote:

> Is there any way of making 'unzip' extract file[abc].txt by name,  by
> some kind of escaping or protecting the [...] wildcard from expansion?

$ unzip foo a\[bc\].txt
Archive:  foo.zip
caution: filename not matched:  a[bc].txt
$ unzip foo 'a\[bc\].txt'
Archive:  foo.zip
 extracting: a[bc].txt               
$ unzip foo 'a\*.txt'
Archive:  foo.zip
caution: filename not matched:  a\*.txt

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."






* bug#67926: 29.1; fail to extract ZIP subfile named with [...]
  2023-12-23 11:47   ` Andreas Schwab
@ 2023-12-23 11:58     ` Eli Zaretskii
  0 siblings, 0 replies; 17+ messages in thread
From: Eli Zaretskii @ 2023-12-23 11:58 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: 67926, awrhygty

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: awrhygty@outlook.com,  67926@debbugs.gnu.org
> Date: Sat, 23 Dec 2023 12:47:31 +0100
> 
> On Dec 23 2023, Eli Zaretskii wrote:
> 
> > Is there any way of making 'unzip' extract file[abc].txt by name,  by
> > some kind of escaping or protecting the [...] wildcard from expansion?
> 
> $ unzip foo a\[bc\].txt
> Archive:  foo.zip
> caution: filename not matched:  a[bc].txt
> $ unzip foo 'a\[bc\].txt'
> Archive:  foo.zip
>  extracting: a[bc].txt               

Thanks, but this doesn't seem to work on Windows, likely because unzip
converts backslashes into forward slashes (or something), and because
quoting 'like this' is not supported on Windows:

  D:\usr\eli>unzip wild.zip 'file\[abc\].txt'
  Archive:  wild.zip
  caution: filename not matched:  'file/[abc/].txt'

  D:\usr\eli>unzip wild.zip "file\[abc\].txt"
  Archive:  wild.zip
  caution: filename not matched:  file/[abc/].txt

  D:\usr\eli>unzip wild.zip "file\\[abc\\].txt"
  Archive:  wild.zip
  caution: filename not matched:  file//[abc//].txt

The OP's report was specifically about Windows.






* bug#67926: 29.1; fail to extract ZIP subfile named with [...]
  2023-12-23 10:16 ` Eli Zaretskii
  2023-12-23 11:47   ` Andreas Schwab
@ 2023-12-26 14:51   ` awrhygty
  2023-12-26 17:25     ` Eli Zaretskii
  1 sibling, 1 reply; 17+ messages in thread
From: awrhygty @ 2023-12-26 14:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 67926

Eli Zaretskii <eliz@gnu.org> writes:
>> This is because 'unzip.exe' treats subfilename arguments containing
>> '[...]' as subfilename patterns. This does not occur with '7z.exe'.
>
> Is there any way of making 'unzip' extract file[abc].txt by name,  by
> some kind of escaping or protecting the [...] wildcard from expansion?
> If there is such a way, we could try using it (maybe); if there's no
> such way, I will tag this bug "wontfix", since it isn't a problem with
> Emacs, but with the Windows build of 'unzip'.

There is a trick: specify the name as "file[[]abc].txt", where the
one-character bracket expression "[[]" matches a literal '['.

I think that avoiding unzip.exe/zip.exe altogether would solve the
problems with directory names, archive names and subfile names.
If #'archive-zip-extract is replaced with the definition below,
ZIP subfiles can be extracted without unzip.exe.

(defun archive-zip-extract (archive name)
  (let* ((desc archive-subfile-mode)
         (buf (current-buffer))
         (bufname (buffer-file-name)))
    (set-buffer archive-superior-buffer)
    (save-restriction
      (widen)
      (let* ((file-beg archive-proper-file-start)
             (p0 (+ file-beg (archive--file-desc-pos desc)))
             (p  (+ file-beg (archive-l-e (+ p0 42) 4)))
             (bitflags (archive-l-e (+ p  6) 2))
             (method   (archive-l-e (+ p  8) 2))
             (compsize (archive-l-e (+ p0 20) 4))
             (fn-len   (archive-l-e (+ p 26) 2))
             (ex-len   (archive-l-e (+ p 28) 2))
             (data-beg (+ p 30 fn-len ex-len))
             (data-end (+ data-beg compsize))
             (coding-system-for-read  'no-conversion)
             (coding-system-for-write 'no-conversion)
             (default-directory temporary-file-directory))
        (cond ((/= 0 (logand bitflags 1))
               (message "Subfile is encrypted"))
              ((= method 0)
               (with-current-buffer buf
                 (insert-buffer-substring archive-superior-buffer
                                          data-beg data-end)))
              ((eq method 8)
               (let ((crc-32    (buffer-substring (+ p0 16) (+ p0 20)))
                     (orig-size (buffer-substring (+ p0 24) (+ p0 28)))
                     (proc (start-process "gzip" buf "gzip" "-cd"))
                     (header "\x1f\x8b\x08\0\0\0\0\0\0\0"))
                 (set-process-sentinel proc #'ignore)
                 (process-send-string proc header)
                 (process-send-region proc data-beg data-end)
                 (process-send-string proc crc-32)
                 (process-send-string proc orig-size)
                 (process-send-eof proc)
                 (accept-process-output proc nil nil t)
                 (delete-process proc)))
              ((eq method 12)
               (call-process-region data-beg data-end
                                    "bzip2" nil buf nil "-cd"))
              (t (message "Unknown compression method")))))
    (set-buffer buf)))






* bug#67926: 29.1; fail to extract ZIP subfile named with [...]
  2023-12-26 14:51   ` awrhygty
@ 2023-12-26 17:25     ` Eli Zaretskii
  2023-12-27 14:36       ` awrhygty
  2023-12-28 14:56       ` Eli Zaretskii
  0 siblings, 2 replies; 17+ messages in thread
From: Eli Zaretskii @ 2023-12-26 17:25 UTC (permalink / raw)
  To: awrhygty; +Cc: 67926

> From: awrhygty@outlook.com
> Cc: 67926@debbugs.gnu.org
> Date: Tue, 26 Dec 2023 23:51:01 +0900
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> >> This is because 'unzip.exe' treats subfilename arguments containing
> >> '[...]' as subfilename patterns. This does not occur with '7z.exe'.
> >
> > Is there any way of making 'unzip' extract file[abc].txt by name,  by
> > some kind of escaping or protecting the [...] wildcard from expansion?
> > If there is such a way, we could try using it (maybe); if there's no
> > such way, I will tag this bug "wontfix", since it isn't a problem with
> > Emacs, but with the Windows build of 'unzip'.
> 
> There is a tricky way to specify "file[[]abc].txt".

That could be a good solution if it works reliably.

> I think that avoiding the use of unzip.exe/zip.exe solves problems about
> directory names, archive names, subfile names.
> Replacing #'archive-zip-extract with the form below,
> ZIP subfiles can be extracted without unzip.exe.
> 
> (defun archive-zip-extract (archive name)
>   (let* ((desc archive-subfile-mode)
>          (buf (current-buffer))
>          (bufname (buffer-file-name)))
>     (set-buffer archive-superior-buffer)
>     (save-restriction
>       (widen)
>       (let* ((file-beg archive-proper-file-start)
>              (p0 (+ file-beg (archive--file-desc-pos desc)))
>              (p  (+ file-beg (archive-l-e (+ p0 42) 4)))
>              (bitflags (archive-l-e (+ p  6) 2))
>              (method   (archive-l-e (+ p  8) 2))
>              (compsize (archive-l-e (+ p0 20) 4))
>              (fn-len   (archive-l-e (+ p 26) 2))
>              (ex-len   (archive-l-e (+ p 28) 2))
>              (data-beg (+ p 30 fn-len ex-len))
>              (data-end (+ data-beg compsize))
>              (coding-system-for-read  'no-conversion)
>              (coding-system-for-write 'no-conversion)
>              (default-directory temporary-file-directory))
>         (cond ((/= 0 (logand bitflags 1))
>                (message "Subfile is encrypted"))
>               ((= method 0)
>                (with-current-buffer buf
>                  (insert-buffer-substring archive-superior-buffer
>                                           data-beg data-end)))
>               ((eq method 8)
>                (let ((crc-32    (buffer-substring (+ p0 16) (+ p0 20)))
>                      (orig-size (buffer-substring (+ p0 24) (+ p0 28)))
>                      (proc (start-process "gzip" buf "gzip" "-cd"))
>                      (header "\x1f\x8b\x08\0\0\0\0\0\0\0"))
>                  (set-process-sentinel proc #'ignore)
>                  (process-send-string proc header)
>                  (process-send-region proc data-beg data-end)
>                  (process-send-string proc crc-32)
>                  (process-send-string proc orig-size)
>                  (process-send-eof proc)
>                  (accept-process-output proc nil nil t)
>                  (delete-process proc)))
>               ((eq method 12)
>                (call-process-region data-beg data-end
>                                     "bzip2" nil buf nil "-cd"))
>               (t (message "Unknown compression method")))))
>     (set-buffer buf)))

Thanks, but I don't think it's a good idea.  There are more
compression methods than just those 3, and some of them aren't
documented.  unzip.exe itself supports 17 methods.  So I'd rather stay
with unzip.exe than invent our own wheel.






* bug#67926: 29.1; fail to extract ZIP subfile named with [...]
  2023-12-26 17:25     ` Eli Zaretskii
@ 2023-12-27 14:36       ` awrhygty
  2023-12-27 16:48         ` Eli Zaretskii
  2023-12-28 14:56       ` Eli Zaretskii
  1 sibling, 1 reply; 17+ messages in thread
From: awrhygty @ 2023-12-27 14:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 67926

Eli Zaretskii <eliz@gnu.org> writes:

>> I think that avoiding the use of unzip.exe/zip.exe solves problems about
>> directory names, archive names, subfile names.
>> Replacing #'archive-zip-extract with the form below,
>> ZIP subfiles can be extracted without unzip.exe.
>> 
>> (defun archive-zip-extract (archive name)
>>   (let* ((desc archive-subfile-mode)
>>          (buf (current-buffer))
>>          (bufname (buffer-file-name)))
>>     (set-buffer archive-superior-buffer)
>>     (save-restriction
>>       (widen)
>>       (let* ((file-beg archive-proper-file-start)
>>              (p0 (+ file-beg (archive--file-desc-pos desc)))
>>              (p  (+ file-beg (archive-l-e (+ p0 42) 4)))
>>              (bitflags (archive-l-e (+ p  6) 2))
>>              (method   (archive-l-e (+ p  8) 2))
>>              (compsize (archive-l-e (+ p0 20) 4))
>>              (fn-len   (archive-l-e (+ p 26) 2))
>>              (ex-len   (archive-l-e (+ p 28) 2))
>>              (data-beg (+ p 30 fn-len ex-len))
>>              (data-end (+ data-beg compsize))
>>              (coding-system-for-read  'no-conversion)
>>              (coding-system-for-write 'no-conversion)
>>              (default-directory temporary-file-directory))
>>         (cond ((/= 0 (logand bitflags 1))
>>                (message "Subfile is encrypted"))
>>               ((= method 0)
>>                (with-current-buffer buf
>>                  (insert-buffer-substring archive-superior-buffer
>>                                           data-beg data-end)))
>>               ((eq method 8)
>>                (let ((crc-32    (buffer-substring (+ p0 16) (+ p0 20)))
>>                      (orig-size (buffer-substring (+ p0 24) (+ p0 28)))
>>                      (proc (start-process "gzip" buf "gzip" "-cd"))
>>                      (header "\x1f\x8b\x08\0\0\0\0\0\0\0"))
>>                  (set-process-sentinel proc #'ignore)
>>                  (process-send-string proc header)
>>                  (process-send-region proc data-beg data-end)
>>                  (process-send-string proc crc-32)
>>                  (process-send-string proc orig-size)
>>                  (process-send-eof proc)
>>                  (accept-process-output proc nil nil t)
>>                  (delete-process proc)))
>>               ((eq method 12)
>>                (call-process-region data-beg data-end
>>                                     "bzip2" nil buf nil "-cd"))
>>               (t (message "Unknown compression method")))))
>>     (set-buffer buf)))
>
> Thanks, but I don't think it's a good idea.  There are more
> compression methods than just those 3, and some of them aren't
> documented.  unzip.exe itself supports 17 methods.  So I'd rather stay
> with unzip.exe than invent our own wheel.

If unzip.exe (or an alternative external program) is necessary anyway,
I would like Emacs not to load the contents of archive files into the
archive-mode buffer.  It is a waste of time and memory.
I have never opened ZIP archives of gigabyte size, but I would be
glad to be able to open such files in archive-mode quickly and with
little memory.






* bug#67926: 29.1; fail to extract ZIP subfile named with [...]
  2023-12-27 14:36       ` awrhygty
@ 2023-12-27 16:48         ` Eli Zaretskii
  2023-12-28  0:38           ` awrhygty
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2023-12-27 16:48 UTC (permalink / raw)
  To: awrhygty; +Cc: 67926

> From: awrhygty@outlook.com
> Cc: 67926@debbugs.gnu.org
> Date: Wed, 27 Dec 2023 23:36:32 +0900
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Thanks, but I don't think it's a good idea.  There are more
> > compression methods than just those 3, and some of them aren't
> > documented.  unzip.exe itself supports 17 methods.  So I'd rather stay
> > with unzip.exe than invent our own wheel.
> 
> If unzip.exe(or an alternative external program) is necessary,
> I want emacs not to load contents of archive files into archive-mode
> buffer. It is waste of time and memory.

unzip is necessary to extract files, but not to display the archive's
contents.






* bug#67926: 29.1; fail to extract ZIP subfile named with [...]
  2023-12-27 16:48         ` Eli Zaretskii
@ 2023-12-28  0:38           ` awrhygty
  2023-12-28  6:31             ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: awrhygty @ 2023-12-28  0:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 67926

Eli Zaretskii <eliz@gnu.org> writes:

>> If unzip.exe(or an alternative external program) is necessary,
>> I want emacs not to load contents of archive files into archive-mode
>> buffer. It is waste of time and memory.
>
> unzip is necessary to extract files, but not to display the archive's
> contents.

If users are expected to have unzip.exe, Emacs can list subfiles with
it, without examining the archive contents as a binary file.
Users who have unzip.exe don't care whether subfiles are listed with
unzip.exe or not.

If users are not expected to have unzip.exe, it would be convenient for
them if subfiles could be extracted without unzip.exe.
In that case, it would be better if the value of the variable
archive-zip-extract could be a Lisp function, to be called by the
archive-zip-extract function.






* bug#67926: 29.1; fail to extract ZIP subfile named with [...]
  2023-12-28  0:38           ` awrhygty
@ 2023-12-28  6:31             ` Eli Zaretskii
  2023-12-28 13:09               ` awrhygty
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2023-12-28  6:31 UTC (permalink / raw)
  To: awrhygty; +Cc: 67926

> From: awrhygty@outlook.com
> Cc: 67926@debbugs.gnu.org
> Date: Thu, 28 Dec 2023 09:38:57 +0900
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> If unzip.exe(or an alternative external program) is necessary,
> >> I want emacs not to load contents of archive files into archive-mode
> >> buffer. It is waste of time and memory.
> >
> > unzip is necessary to extract files, but not to display the archive's
> > contents.
> 
> If users are expected to have unzip.exe, emacs can list subfiles without
> examining archive contents as a binary file.
> Users with unzip.exe don't care about whether subfiles are listed with
> unzip.exe or not.

I see your point.  However, those decisions were made many years ago,
and have withstood the test of time since then.  So I see no reason to
make drastic changes in how we support zip archives, just because we
can, or just because other arrangements are possible.

> If users are not expected to have unzip.exe, they feel convenient if
> subfiles are extracted without unzip.exe.
> In this case, it is better archive-zip-extract's value as variable can
> be a lisp function to be called in the archive-zip-extract function.

We could consider extracting using our own code if someone writes the
code to support all the 17 methods that unzip.exe supports.
Otherwise, we would introduce a regression, and someone somewhere will
rightfully complain.

Btw, your suggested changes required gzip and bunzip2 as external
programs to support the 2 most popular compression methods.  Why
should we assume these are available more widely than unzip,
especially on Windows?






* bug#67926: 29.1; fail to extract ZIP subfile named with [...]
  2023-12-28  6:31             ` Eli Zaretskii
@ 2023-12-28 13:09               ` awrhygty
  2023-12-28 14:06                 ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: awrhygty @ 2023-12-28 13:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 67926

Eli Zaretskii <eliz@gnu.org> writes:

>> If users are not expected to have unzip.exe, they feel convenient if
>> subfiles are extracted without unzip.exe.
>> In this case, it is better archive-zip-extract's value as variable can
>> be a lisp function to be called in the archive-zip-extract function.
>
> We could consider extracting using our own code if someone writes the
> code to support all the 17 methods that unzip.exe supports.
> Otherwise, we would introduce a regression, and someone somewhere will
> rightfully complain.
>
> Btw, your suggested changes required gzip and bunzip2 as external
> programs to support the 2 most popular compression methods.  Why
> should we assume these are available more widely than unzip,
> especially on Windows?

When I installed UnxUtils years ago, it had bzip2 and gzip, but neither
unzip nor zip.  Now that I have downloaded it again, it has unzip and zip.

My interest is in how to avoid naming problems.
There are more difficulties with Japanese.
Japanese characters in file names are normally encoded in cp932.
Encoded characters may have '[', '\' or ']' as their second byte:
  (encode-coding-string "ゼソゾ" 'cp932)
  => "\203[\203\\\203]"
Subfiles with such names cannot be extracted normally.
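
A minimal sketch of how one might check a name for such characters
before passing it to an external program.  The helper name
`my-cp932-risky-chars' is hypothetical, and the byte set it tests is
an assumption limited to the three bytes named above.

```elisp
(require 'seq)

;; Hypothetical helper for illustration; the byte set is an assumption
;; taken from the three bytes named above.
(defun my-cp932-risky-chars (name)
  "Return the characters of NAME whose cp932 bytes include [, \\ or ]."
  (seq-filter
   (lambda (ch)
     ;; Encode one character at a time and look at its raw bytes.
     (seq-some (lambda (byte) (memq byte '(?\[ ?\\ ?\])))
               (encode-coding-string (string ch) 'cp932)))
   name))

;; Returns a list of the offending characters, or nil if there are none.
(my-cp932-risky-chars "ゼソゾ.txt")
```

Encoding character by character keeps the check independent of how
the whole name would be quoted later.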






* bug#67926: 29.1; fail to extract ZIP subfile named with [...]
  2023-12-28 13:09               ` awrhygty
@ 2023-12-28 14:06                 ` Eli Zaretskii
  2024-01-03 19:53                   ` awrhygty
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2023-12-28 14:06 UTC (permalink / raw)
  To: awrhygty; +Cc: 67926

> From: awrhygty@outlook.com
> Cc: 67926@debbugs.gnu.org
> Date: Thu, 28 Dec 2023 22:09:23 +0900
> 
> > Btw, your suggested changes required gzip and bunzip2 as external
> > programs to support the 2 most popular compression methods.  Why
> > should we assume these are available more widely than unzip,
> > especially on Windows?
> 
> When I installed UnxUtils years ago, it had bzip2 and gzip, but not
> unzip nor zip. Now I download it again, it has unzip and zip.

Windows systems don't come with UnxUtils installed anyway.

> My interest is how to avoid naming problems.
> There are more difficulties in Japanese.
> Japanese characters in file names are normally encoded in cp932.
> Encoded characters may have '[', '\' or ']' as a second byte.
>   (encode-coding-string "ゼソゾ" 'cp932)
>   => "\203[\203\\\203]"
> Subfiles of such names can not be extracted normally.

I don't think we can solve this in Emacs: non-ASCII file names in zip
archives are a mess, even before you consider the fact that zip
archives are frequently moved between systems.  For starters, how can
one know in advance what is the encoding of file names in an arbitrary
zip archive?  This will bite you even if we do everything in Emacs,
and even if someone does submit patches to implement all the
compression methods.






* bug#67926: 29.1; fail to extract ZIP subfile named with [...]
  2023-12-26 17:25     ` Eli Zaretskii
  2023-12-27 14:36       ` awrhygty
@ 2023-12-28 14:56       ` Eli Zaretskii
  2024-01-04 10:46         ` Eli Zaretskii
  1 sibling, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2023-12-28 14:56 UTC (permalink / raw)
  To: awrhygty; +Cc: 67926

> Cc: 67926@debbugs.gnu.org
> Date: Tue, 26 Dec 2023 19:25:39 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> 
> > From: awrhygty@outlook.com
> > Cc: 67926@debbugs.gnu.org
> > Date: Tue, 26 Dec 2023 23:51:01 +0900
> > 
> > Eli Zaretskii <eliz@gnu.org> writes:
> > >> This is because 'unzip.exe' treats subfilename arguments containing
> > >> '[...]' as subfilename patterns. This does not occur with '7z.exe'.
> > >
> > > Is there any way of making 'unzip' extract file[abc].txt by name,  by
> > > some kind of escaping or protecting the [...] wildcard from expansion?
> > > If there is such a way, we could try using it (maybe); if there's no
> > > such way, I will tag this bug "wontfix", since it isn't a problem with
> > > Emacs, but with the Windows build of 'unzip'.
> > 
> > There is a tricky way to specify "file[[]abc].txt".
> 
> That could be a good solution if it works reliably.

I've now verified that it works reliably, and replaced
shell-quote-argument with this special quoting in archive-zip-extract
(but only when the program used to extract files is "unzip").

So the original problem of this bug report is now fixed, and I think
we can close this bug.
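
The quoting described above can be sketched as follows.  This is an
illustration, not the committed change: the function name
`my-unzip-quote-name' is hypothetical, and the set of characters
treated as wildcards ('[', '*' and '?') is an assumption about what
unzip expands.

```elisp
;; Hypothetical helper, not the code committed to arc-mode.el.
;; Assumption: unzip treats `[', `*' and `?' as wildcard characters.
(defun my-unzip-quote-name (name)
  "Return NAME with each wildcard character C rewritten as \"[C]\".
A one-character bracket expression matches only that character, so
the result selects the literal NAME."
  (replace-regexp-in-string "[[*?]" "[\\&]" name t))

(my-unzip-quote-name "file[abc].txt")   ; => "file[[]abc].txt"
```

Unlike shell quoting, this protects the name from expansion inside
unzip itself, which is where the matching happens.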






* bug#67926: 29.1; fail to extract ZIP subfile named with [...]
  2023-12-28 14:06                 ` Eli Zaretskii
@ 2024-01-03 19:53                   ` awrhygty
  2024-01-03 20:00                     ` Eli Zaretskii
  2024-01-03 20:02                     ` Eli Zaretskii
  0 siblings, 2 replies; 17+ messages in thread
From: awrhygty @ 2024-01-03 19:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 67926

Eli Zaretskii <eliz@gnu.org> writes:

>> My interest is how to avoid naming problems.
>> There are more difficulties in Japanese.
>> Japanese characters in file names are normally encoded in cp932.
>> Encoded characters may have '[', '\' or ']' as a second byte.
>>   (encode-coding-string "ゼソゾ" 'cp932)
>>   => "\203[\203\\\203]"
>> Subfiles of such names can not be extracted normally.
>
> I don't think we can solve this in Emacs: non-ASCII file names in zip
> archives are a mess, even before you consider the fact that zip
> archives are frequently moved between systems.  For starters, how can
> one know in advance what is the encoding of file names in an arbitrary
> zip archive?  This will bite you even if we do everything in Emacs,
> and even if someone does submit patches to implement all the
> compression methods.

So I need an extractor that does not depend on subfile names.
It is more useful to extract contents under broken names than to be
unable to extract them at all.

I also found that my unzip.exe cannot extract BZIP2- or LZMA-compressed
subfiles created by Python's zipfile module.  I suspect that unzip.exe
does not work for all compression methods.

By the way, I didn't know about the zlib-decompress-region function.
With it, subfiles compressed with the deflate method can be extracted
using only Elisp code.

;; Override the stock extractor, which runs an external program,
;; with the Lisp implementation below.
(advice-add #'archive-zip-extract :override
            #'archive-zip-decompress-content)

(defun archive-zip-decompress-content (archive name)
  "Insert the contents of the current ZIP subfile into the current buffer.
ARCHIVE and NAME are ignored: the subfile is located through its
descriptor, not by name."
  (let* ((desc archive-subfile-mode)
         (buf (current-buffer))
         (bufname (buffer-file-name)))
    (set-buffer archive-superior-buffer)
    (save-restriction
      (widen)
      (let* ((file-beg archive-proper-file-start)
             ;; p0: central directory header; p: local file header.
             (p0 (+ file-beg (archive--file-desc-pos desc)))
             (p  (+ file-beg (archive-l-e (+ p0 42) 4)))
             (bitflags (archive-l-e (+ p  6) 2))
             (method   (archive-l-e (+ p  8) 2))
             (compsize (archive-l-e (+ p0 20) 4))
             (fn-len   (archive-l-e (+ p 26) 2))
             (ex-len   (archive-l-e (+ p 28) 2))
             ;; Data follows the 30-byte local header, name and extra field.
             (data-beg (+ p 30 fn-len ex-len))
             (data-end (+ data-beg compsize))
             (coding-system-for-read  'no-conversion)
             (coding-system-for-write 'no-conversion)
             (default-directory temporary-file-directory))
        (cond ((/= 0 (logand bitflags 1))   ; bit 0: encrypted
               (message "Subfile is encrypted"))
              ((= method 0)                 ; stored
               (with-current-buffer buf
                 (insert-buffer-substring archive-superior-buffer
                                          data-beg data-end)))
              ((eq method 8)                ; deflate
               ;; Wrap the raw deflate data in a minimal gzip header and
               ;; trailer (CRC-32, original size) for zlib-decompress-region.
               (let ((crc-32    (buffer-substring (+ p0 16) (+ p0 20)))
                     (orig-size (buffer-substring (+ p0 24) (+ p0 28)))
                     (header "\x1f\x8b\x08\0\0\0\0\0\0\0"))
                 (with-current-buffer buf
                   (set-buffer-multibyte nil)
                   (insert header)
                   (insert-buffer-substring archive-superior-buffer
                                            data-beg data-end)
                   (insert crc-32 orig-size)
                   (zlib-decompress-region (point-min) (point-max))
                   (set-buffer-multibyte 'to))))
              ((eq method 12)               ; bzip2
               (call-process-region data-beg data-end
                                    "bzip2" nil buf nil "-cd"))
              (t (message "Unknown compression method")))))
    (set-buffer buf)))






* bug#67926: 29.1; fail to extract ZIP subfile named with [...]
  2024-01-03 19:53                   ` awrhygty
@ 2024-01-03 20:00                     ` Eli Zaretskii
  2024-01-03 20:02                     ` Eli Zaretskii
  1 sibling, 0 replies; 17+ messages in thread
From: Eli Zaretskii @ 2024-01-03 20:00 UTC (permalink / raw)
  To: awrhygty; +Cc: 67926

> From: awrhygty@outlook.com
> Cc: 67926@debbugs.gnu.org
> Date: Thu, 04 Jan 2024 04:53:26 +0900
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > I don't think we can solve this in Emacs: non-ASCII file names in zip
> > archives are a mess, even before you consider the fact that zip
> > archives are frequently moved between systems.  For starters, how can
> > one know in advance what is the encoding of file names in an arbitrary
> > zip archive?  This will bite you even if we do everything in Emacs,
> > and even if someone does submit patches to implement all the
> > compression methods.
> 
> So I need a extractor without subfile names.
> It is more usefull to extract contents with broken names than unable to
> extract contents at all.

Feel free to do it, for you personally.  But most people have other
needs: they need to extract files from zip archives like the unzip
program does, and that's what Emacs gives them.  Your personal needs can be
solved with Lisp programs you write for your own use.  Here we are
talking about what arc-mode.el should do for everyone, not just for
you.  And your special needs don't necessarily mean others have the
same needs.






* bug#67926: 29.1; fail to extract ZIP subfile named with [...]
  2024-01-03 19:53                   ` awrhygty
  2024-01-03 20:00                     ` Eli Zaretskii
@ 2024-01-03 20:02                     ` Eli Zaretskii
  1 sibling, 0 replies; 17+ messages in thread
From: Eli Zaretskii @ 2024-01-03 20:02 UTC (permalink / raw)
  To: awrhygty; +Cc: 67926

> From: awrhygty@outlook.com
> Cc: 67926@debbugs.gnu.org
> Date: Thu, 04 Jan 2024 04:53:26 +0900
> 
> By the way, I didn't know zlib-decompress-region function.
> Now subfiles compressed with deflate method can be extracted
> only with elisp program.

zlib-decompress-region is only available if Emacs was built with the
zlib library, which is an optional dependency.  We prefer not to rely
on optional libraries for features that could be useful even when the
optional dependency is not available.
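
A minimal sketch of how code could test for this optional dependency
before relying on it.  The helper name `my-zlib-usable-p' is
hypothetical; `zlib-available-p' is the existing predicate, which is
itself only defined in builds with zlib support.

```elisp
;; Hypothetical helper for illustration.  `zlib-available-p' is only
;; defined when Emacs was compiled with zlib support, hence the
;; `fboundp' guard.
(defun my-zlib-usable-p ()
  "Return t if `zlib-decompress-region' can be used in this Emacs."
  (and (fboundp 'zlib-available-p)
       (zlib-available-p)
       t))

(if (my-zlib-usable-p)
    (message "deflate can be decoded in Lisp")
  (message "fall back to an external program such as unzip"))
```

Any Lisp-only extractor would still need the fallback branch, which
is the point made above.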






* bug#67926: 29.1; fail to extract ZIP subfile named with [...]
  2023-12-28 14:56       ` Eli Zaretskii
@ 2024-01-04 10:46         ` Eli Zaretskii
  0 siblings, 0 replies; 17+ messages in thread
From: Eli Zaretskii @ 2024-01-04 10:46 UTC (permalink / raw)
  To: awrhygty; +Cc: 67926-done

> Cc: 67926@debbugs.gnu.org
> Date: Thu, 28 Dec 2023 16:56:51 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> 
> So the original problem of this bug report is now fixed, and I think
> we can close this bug.

Now done.







Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
