* bug#33133: 26.1.50; zlib-decompress-region too rigid @ 2018-10-23 23:07 Katsumi Yamaoka 2018-10-24 0:26 ` Noam Postavsky 0 siblings, 1 reply; 8+ messages in thread From: Katsumi Yamaoka @ 2018-10-23 23:07 UTC (permalink / raw) To: 33133; +Cc: Kevin Ryde [-- Attachment #1: Type: text/plain, Size: 1663 bytes --] Hi, Whereas `gzip -d' does, zlib-decompress-region doesn't decompress corrupted data of a certain kind. For instance, visiting https://www.gutenberg.org/no-such-page-exists using eww shows raw gzipped data. The data extracted is attached. As for `gzip -d', it says "unexpected end of file" in stderr. Here is a recipe to reproduce zlib-decompress-region not working: (let ((buffer (get-buffer-create "*testing*")) (coding-system-for-read 'binary) (cw (selected-window)) jka-compr-compression-info-list format-alist) (switch-to-buffer-other-window buffer) (erase-buffer) (set-buffer-multibyte nil) (insert-file-contents "/TEMP/corrupted-data.gz") (sit-for 1) (prog1 (zlib-decompress-region (point-min) (point-max)) (select-window cw))) If there is no prospect to improve zlib-decompress-region, how about this workaround? --- url-http.el~ 2018-09-12 07:48:16.110765500 +0000 +++ url-http.el 2018-10-23 23:04:48.060829900 +0000 @@ -951,7 +951,12 @@ (widen) (goto-char (point-min)) (when (search-forward "\n\n") - (zlib-decompress-region (point) (point-max))))))) + (or (zlib-decompress-region (point) (point-max)) + (let ((coding-system-for-write 'binary) + (coding-system-for-read 'binary) + (default-process-coding-system (cons 'binary 'binary))) + (zerop (call-process-region (point) (point-max) "gzip" + t '(t nil) nil "-d"))))))))) ;; Miscellaneous (defun url-http-activate-callback () In GNU Emacs 26.1.50 (build 1, x86_64-unknown-cygwin, GTK+ Version 3.22.28) of 2018-10-22 built on localhost Windowing system distributor 'The Cygwin/X Project', version 11.0.12001000 [-- Attachment #2: corrupted-data.gz --] [-- Type: application/octet-stream, Size: 1667 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* bug#33133: 26.1.50; zlib-decompress-region too rigid 2018-10-23 23:07 bug#33133: 26.1.50; zlib-decompress-region too rigid Katsumi Yamaoka @ 2018-10-24 0:26 ` Noam Postavsky 2018-10-24 1:16 ` Katsumi Yamaoka 0 siblings, 1 reply; 8+ messages in thread From: Noam Postavsky @ 2018-10-24 0:26 UTC (permalink / raw) To: Katsumi Yamaoka; +Cc: Kevin Ryde, 33133 Katsumi Yamaoka <yamaoka@jpl.org> writes: > Whereas `gzip -d' does, zlib-decompress-region doesn't decompress > corrupted data of a certain kind. For instance, visiting > > https://www.gutenberg.org/no-such-page-exists > > using eww shows raw gzipped data. The data extracted is attached. > As for `gzip -d', it says "unexpected end of file" in stderr. > If there is no prospect to improve zlib-decompress-region, how > about this workaround? We could have zlib-decompress-region ignore unexpected eof as well, e.g., the below (though it should obviously be enhanced to depend on a new parameter): --- i/src/decompress.c +++ w/src/decompress.c @@ -206,7 +206,7 @@ DEFUN ("zlib-decompress-region", Fzlib_decompress_region, } while (inflate_status == Z_OK); - if (inflate_status != Z_STREAM_END) + if (inflate_status != Z_STREAM_END && inflate_status != Z_BUF_ERROR) return unbind_to (count, Qnil); unwind_data.start = 0; ^ permalink raw reply [flat|nested] 8+ messages in thread
* bug#33133: 26.1.50; zlib-decompress-region too rigid 2018-10-24 0:26 ` Noam Postavsky @ 2018-10-24 1:16 ` Katsumi Yamaoka 2018-10-27 21:48 ` Noam Postavsky 0 siblings, 1 reply; 8+ messages in thread From: Katsumi Yamaoka @ 2018-10-24 1:16 UTC (permalink / raw) To: Noam Postavsky; +Cc: Kevin Ryde, 33133 On Tue, 23 Oct 2018 20:26:59 -0400, Noam Postavsky wrote: > We could have zlib-decompress-region ignore unexpected eof as well, > e.g., the below (though it should obviously be enhanced to depend on a > new parameter): > --- i/src/decompress.c > +++ w/src/decompress.c > @@ -206,7 +206,7 @@ DEFUN ("zlib-decompress-region", Fzlib_decompress_region, > while (inflate_status == Z_OK); > - if (inflate_status != Z_STREAM_END) > + if (inflate_status != Z_STREAM_END && inflate_status != Z_BUF_ERROR) > return unbind_to (count, Qnil); > unwind_data.start = 0; I confirmed that it makes it work for the corrupted web site in question. Thank you! ^ permalink raw reply [flat|nested] 8+ messages in thread
* bug#33133: 26.1.50; zlib-decompress-region too rigid 2018-10-24 1:16 ` Katsumi Yamaoka @ 2018-10-27 21:48 ` Noam Postavsky 2018-10-28 15:41 ` Eli Zaretskii 0 siblings, 1 reply; 8+ messages in thread From: Noam Postavsky @ 2018-10-27 21:48 UTC (permalink / raw) To: Katsumi Yamaoka; +Cc: Kevin Ryde, 33133 [-- Attachment #1: Type: text/plain, Size: 608 bytes --] tags 33133 + patch quit Katsumi Yamaoka <yamaoka@jpl.org> writes: > On Tue, 23 Oct 2018 20:26:59 -0400, Noam Postavsky wrote: >> --- i/src/decompress.c >> +++ w/src/decompress.c >> @@ -206,7 +206,7 @@ DEFUN ("zlib-decompress-region", Fzlib_decompress_region, > >> while (inflate_status == Z_OK); > >> - if (inflate_status != Z_STREAM_END) >> + if (inflate_status != Z_STREAM_END && inflate_status != Z_BUF_ERROR) >> return unbind_to (count, Qnil); > >> unwind_data.start = 0; > > I confirmed that it makes it work for the corrupted web site in > question. Thank you! Here's a proper patch. [-- Attachment #2: patch --] [-- Type: text/plain, Size: 4856 bytes --] From 430ebd936b0bc41bd3e33e171938161846597196 Mon Sep 17 00:00:00 2001 From: Noam Postavsky <npostavs@gmail.com> Date: Sat, 27 Oct 2018 17:45:52 -0400 Subject: [PATCH v1] Allow partial decompression (Bug#33133) * src/decompress.c (Fzlib_decompress_region): Add optional ALLOW-PARTIAL parameter. * lisp/url/url-http.el (url-handle-content-transfer-encoding): Use it. * doc/lispref/text.texi (Decompression): Document it. * etc/NEWS: Announce it. --- doc/lispref/text.texi | 10 ++++++---- etc/NEWS | 6 ++++++ lisp/url/url-http.el | 5 +++-- src/decompress.c | 22 +++++++++++++++++----- 4 files changed, 32 insertions(+), 11 deletions(-) diff --git a/doc/lispref/text.texi b/doc/lispref/text.texi index 6c38d8eed0..e39ba6a192 100644 --- a/doc/lispref/text.texi +++ b/doc/lispref/text.texi @@ -4462,14 +4462,16 @@ Decompression available. @end defun -@defun zlib-decompress-region start end +@defun zlib-decompress-region start end &optional allow-partial This function decompresses the region between @var{start} and @var{end}, using built-in zlib decompression. The region should contain data that were compressed with gzip or zlib. On success, the function replaces the contents of the region with the decompressed -data. On failure, the function leaves the region unchanged and -returns @code{nil}. This function can be called only in unibyte -buffers. +data. If @var{allow-partial} is @code{nil}, on failure, the function +leaves the region unchanged and returns @code{nil}. Otherwise, it +returns the number of bytes that were not decompressed and replaces +the region text by whatever data was successfully decompressed. This +function can be called only in unibyte buffers. @end defun diff --git a/etc/NEWS b/etc/NEWS index 3f86195695..395169253d 100644 --- a/etc/NEWS +++ b/etc/NEWS @@ -1159,6 +1159,12 @@ to mean that it is not known whether DST is in effect. 'json-insert', 'json-parse-string', and 'json-parse-buffer'. These are implemented in C using the Jansson library. ++++ +** 'zlib-decompress-region' can partially decompress corrupted data. +If the new optional ALLOW-PARTIAL argument is passed, then the data +that was decompressed successfully before failing will be inserted +into the buffer. + ** Mailcap --- diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el index 6b5749e1bc..94ac660fcf 100644 --- a/lisp/url/url-http.el +++ b/lisp/url/url-http.el @@ -939,7 +939,8 @@ url-http-parse-headers (goto-char (point-min)) success)) -(declare-function zlib-decompress-region "decompress.c" (start end)) +(declare-function zlib-decompress-region "decompress.c" + (start end &optional allow-partial)) (defun url-handle-content-transfer-encoding () (let ((encoding (mail-fetch-field "content-encoding"))) @@ -951,7 +952,7 @@ url-handle-content-transfer-encoding (widen) (goto-char (point-min)) (when (search-forward "\n\n") - (zlib-decompress-region (point) (point-max))))))) + (zlib-decompress-region (point) (point-max) t)))))) ;; Miscellaneous (defun url-http-activate-callback () diff --git a/src/decompress.c b/src/decompress.c index 2836338216..3872014739 100644 --- a/src/decompress.c +++ b/src/decompress.c @@ -120,12 +120,18 @@ DEFUN ("zlib-available-p", Fzlib_available_p, Szlib_available_p, 0, 0, 0, DEFUN ("zlib-decompress-region", Fzlib_decompress_region, Szlib_decompress_region, - 2, 2, 0, + 2, 3, 0, doc: /* Decompress a gzip- or zlib-compressed region. Replace the text in the region by the decompressed data. -On failure, return nil and leave the data in place. + +If optional parameter ALLOW-PARTIAL is nil or omitted, on failure, +return nil and leave the data in place. Otherwise, return the number +of bytes that were not decompressed and replace the region text by +whatever data was successfully decompressed. If decompression is +completely successful return t. + This function can be called only in unibyte buffers. */) - (Lisp_Object start, Lisp_Object end) + (Lisp_Object start, Lisp_Object end, Lisp_Object allow_partial) { ptrdiff_t istart, iend, pos_byte; z_stream stream; @@ -206,8 +212,14 @@ DEFUN ("zlib-decompress-region", Fzlib_decompress_region, } while (inflate_status == Z_OK); + Lisp_Object ret = Qt; if (inflate_status != Z_STREAM_END) - return unbind_to (count, Qnil); + { + if (!NILP (allow_partial)) + ret = make_int (iend - pos_byte); + else + return unbind_to (count, Qnil); + } unwind_data.start = 0; @@ -218,7 +230,7 @@ DEFUN ("zlib-decompress-region", Fzlib_decompress_region, signal_after_change (istart, iend - istart, unwind_data.nbytes); update_compositions (istart, istart, CHECK_HEAD); - return unbind_to (count, Qt); + return unbind_to (count, ret); } \f -- 2.11.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* bug#33133: 26.1.50; zlib-decompress-region too rigid 2018-10-27 21:48 ` Noam Postavsky @ 2018-10-28 15:41 ` Eli Zaretskii 2018-10-31 0:25 ` Noam Postavsky 0 siblings, 1 reply; 8+ messages in thread From: Eli Zaretskii @ 2018-10-28 15:41 UTC (permalink / raw) To: Noam Postavsky; +Cc: user42_kevin, yamaoka, 33133 > From: Noam Postavsky <npostavs@gmail.com> > Date: Sat, 27 Oct 2018 17:48:26 -0400 > Cc: Kevin Ryde <user42_kevin@yahoo.com.au>, 33133@debbugs.gnu.org > > Here's a proper patch. Thanks. I have a few comments: > +data. If @var{allow-partial} is @code{nil}, on failure, the function We usually say "nil or omitted" for optional arguments. Also, I'd say "then on failure, ...", otherwise this could be misinterpreted as if "on failure" qualifies the "is nil" part. Same comment regarding the doc string of the function. > +leaves the region unchanged and returns @code{nil}. Otherwise, it > +returns the number of bytes that were not decompressed and replaces > +the region text by whatever data was successfully decompressed. This > +function can be called only in unibyte buffers. Maybe it would make sense here to say that this emulates what 'gzip' does? > + Lisp_Object ret = Qt; > if (inflate_status != Z_STREAM_END) > - return unbind_to (count, Qnil); > + { > + if (!NILP (allow_partial)) > + ret = make_int (iend - pos_byte); > + else > + return unbind_to (count, Qnil); > + } Hmm... should we display a warning message, like gzip does? ^ permalink raw reply [flat|nested] 8+ messages in thread
* bug#33133: 26.1.50; zlib-decompress-region too rigid 2018-10-28 15:41 ` Eli Zaretskii @ 2018-10-31 0:25 ` Noam Postavsky 2018-10-31 16:07 ` Eli Zaretskii 0 siblings, 1 reply; 8+ messages in thread From: Noam Postavsky @ 2018-10-31 0:25 UTC (permalink / raw) To: Eli Zaretskii; +Cc: user42_kevin, yamaoka, 33133 [-- Attachment #1: Type: text/plain, Size: 1656 bytes --] Eli Zaretskii <eliz@gnu.org> writes: >> +data. If @var{allow-partial} is @code{nil}, on failure, the function > > We usually say "nil or omitted" for optional arguments. Also, I'd say > "then on failure, ...", otherwise this could be misinterpreted as if > "on failure" qualifies the "is nil" part. > > Same comment regarding the doc string of the function. Makes sense, done. >> +leaves the region unchanged and returns @code{nil}. Otherwise, it >> +returns the number of bytes that were not decompressed and replaces >> +the region text by whatever data was successfully decompressed. This >> +function can be called only in unibyte buffers. > > Maybe it would make sense here to say that this emulates what 'gzip' > does? Hmm, maybe. I've added a mention of this, not sure if it actually helps. >> + Lisp_Object ret = Qt; >> if (inflate_status != Z_STREAM_END) >> + { >> + if (!NILP (allow_partial)) >> + ret = make_int (iend - pos_byte); > Hmm... should we display a warning message, like gzip does? Not unconditionally, I'd say. In the example which prompted this bug thread, a warning would just be a nuisance. We could just leave it up to the caller to print a warning message if they want, e.g.: (unless (eq t (zlib-decompress-region START END t)) (message "Incomplete decompression")) Or perhaps instead of returning the byte count, return an error indicator which the caller could use to contruct a warning message (this could allow for a slightly more specific message)? Or maybe it's easier if the ALLOW-PARTIAL parameter could have another possible value to control display of the warning message? [-- Attachment #2: patch --] [-- Type: text/plain, Size: 4896 bytes --] From 43a912181a4a30d826b2c016ca05b5c9d8daf3f4 Mon Sep 17 00:00:00 2001 From: Noam Postavsky <npostavs@gmail.com> Date: Sat, 27 Oct 2018 17:45:52 -0400 Subject: [PATCH v2] Allow partial decompression (Bug#33133) * src/decompress.c (Fzlib_decompress_region): Add optional ALLOW-PARTIAL parameter. * lisp/url/url-http.el (url-handle-content-transfer-encoding): Use it. * doc/lispref/text.texi (Decompression): Document it. * etc/NEWS: Announce it. --- doc/lispref/text.texi | 11 +++++++---- etc/NEWS | 6 ++++++ lisp/url/url-http.el | 5 +++-- src/decompress.c | 22 +++++++++++++++++----- 4 files changed, 33 insertions(+), 11 deletions(-) diff --git a/doc/lispref/text.texi b/doc/lispref/text.texi index 6c38d8eed0..9f241a6c3f 100644 --- a/doc/lispref/text.texi +++ b/doc/lispref/text.texi @@ -4462,14 +4462,17 @@ Decompression available. @end defun -@defun zlib-decompress-region start end +@defun zlib-decompress-region start end &optional allow-partial This function decompresses the region between @var{start} and @var{end}, using built-in zlib decompression. The region should contain data that were compressed with gzip or zlib. On success, the function replaces the contents of the region with the decompressed -data. On failure, the function leaves the region unchanged and -returns @code{nil}. This function can be called only in unibyte -buffers. +data. If @var{allow-partial} is @code{nil} or omitted, then on +failure, the function leaves the region unchanged and returns +@code{nil}. Otherwise, it returns the number of bytes that were not +decompressed and replaces the region text by whatever data was +successfully decompressed. This function can be called only in +unibyte buffers. @end defun diff --git a/etc/NEWS b/etc/NEWS index 3f86195695..395169253d 100644 --- a/etc/NEWS +++ b/etc/NEWS @@ -1159,6 +1159,12 @@ to mean that it is not known whether DST is in effect. 'json-insert', 'json-parse-string', and 'json-parse-buffer'. These are implemented in C using the Jansson library. ++++ +** 'zlib-decompress-region' can partially decompress corrupted data. +If the new optional ALLOW-PARTIAL argument is passed, then the data +that was decompressed successfully before failing will be inserted +into the buffer. + ** Mailcap --- diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el index 6b5749e1bc..94ac660fcf 100644 --- a/lisp/url/url-http.el +++ b/lisp/url/url-http.el @@ -939,7 +939,8 @@ url-http-parse-headers (goto-char (point-min)) success)) -(declare-function zlib-decompress-region "decompress.c" (start end)) +(declare-function zlib-decompress-region "decompress.c" + (start end &optional allow-partial)) (defun url-handle-content-transfer-encoding () (let ((encoding (mail-fetch-field "content-encoding"))) @@ -951,7 +952,7 @@ url-handle-content-transfer-encoding (widen) (goto-char (point-min)) (when (search-forward "\n\n") - (zlib-decompress-region (point) (point-max))))))) + (zlib-decompress-region (point) (point-max) t)))))) ;; Miscellaneous (defun url-http-activate-callback () diff --git a/src/decompress.c b/src/decompress.c index 2836338216..dd55fd68e8 100644 --- a/src/decompress.c +++ b/src/decompress.c @@ -120,12 +120,18 @@ DEFUN ("zlib-available-p", Fzlib_available_p, Szlib_available_p, 0, 0, 0, DEFUN ("zlib-decompress-region", Fzlib_decompress_region, Szlib_decompress_region, - 2, 2, 0, + 2, 3, 0, doc: /* Decompress a gzip- or zlib-compressed region. Replace the text in the region by the decompressed data. -On failure, return nil and leave the data in place. + +If optional parameter ALLOW-PARTIAL is nil or omitted, then on +failure, return nil and leave the data in place. Otherwise, return +the number of bytes that were not decompressed and replace the region +text by whatever data was successfully decompressed (similar to gzip). +If decompression is completely successful return t. + This function can be called only in unibyte buffers. */) - (Lisp_Object start, Lisp_Object end) + (Lisp_Object start, Lisp_Object end, Lisp_Object allow_partial) { ptrdiff_t istart, iend, pos_byte; z_stream stream; @@ -206,8 +212,14 @@ DEFUN ("zlib-decompress-region", Fzlib_decompress_region, } while (inflate_status == Z_OK); + Lisp_Object ret = Qt; if (inflate_status != Z_STREAM_END) - return unbind_to (count, Qnil); + { + if (!NILP (allow_partial)) + ret = make_int (iend - pos_byte); + else + return unbind_to (count, Qnil); + } unwind_data.start = 0; @@ -218,7 +230,7 @@ DEFUN ("zlib-decompress-region", Fzlib_decompress_region, signal_after_change (istart, iend - istart, unwind_data.nbytes); update_compositions (istart, istart, CHECK_HEAD); - return unbind_to (count, Qt); + return unbind_to (count, ret); } \f -- 2.11.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* bug#33133: 26.1.50; zlib-decompress-region too rigid 2018-10-31 0:25 ` Noam Postavsky @ 2018-10-31 16:07 ` Eli Zaretskii 2019-04-03 2:09 ` Noam Postavsky 0 siblings, 1 reply; 8+ messages in thread From: Eli Zaretskii @ 2018-10-31 16:07 UTC (permalink / raw) To: Noam Postavsky; +Cc: user42_kevin, yamaoka, 33133 > From: Noam Postavsky <npostavs@gmail.com> > Cc: user42_kevin@yahoo.com.au, yamaoka@jpl.org, 33133@debbugs.gnu.org > Date: Tue, 30 Oct 2018 20:25:10 -0400 > > >From 43a912181a4a30d826b2c016ca05b5c9d8daf3f4 Mon Sep 17 00:00:00 2001 > From: Noam Postavsky <npostavs@gmail.com> > Date: Sat, 27 Oct 2018 17:45:52 -0400 > Subject: [PATCH v2] Allow partial decompression (Bug#33133) > > * src/decompress.c (Fzlib_decompress_region): Add optional > ALLOW-PARTIAL parameter. > * lisp/url/url-http.el (url-handle-content-transfer-encoding): Use it. > * doc/lispref/text.texi (Decompression): Document it. > * etc/NEWS: Announce it. Thanks, this LGTM. ^ permalink raw reply [flat|nested] 8+ messages in thread
* bug#33133: 26.1.50; zlib-decompress-region too rigid 2018-10-31 16:07 ` Eli Zaretskii @ 2019-04-03 2:09 ` Noam Postavsky 0 siblings, 0 replies; 8+ messages in thread From: Noam Postavsky @ 2019-04-03 2:09 UTC (permalink / raw) To: Eli Zaretskii; +Cc: user42_kevin, yamaoka, 33133 tags 33133 fixed close 33133 27.1 quit Eli Zaretskii <eliz@gnu.org> writes: >> Subject: [PATCH v2] Allow partial decompression (Bug#33133) >> >> * src/decompress.c (Fzlib_decompress_region): Add optional >> ALLOW-PARTIAL parameter. >> * lisp/url/url-http.el (url-handle-content-transfer-encoding): Use it. >> * doc/lispref/text.texi (Decompression): Document it. >> * etc/NEWS: Announce it. > > Thanks, this LGTM. Pushed. [1: b36913d803]: 2019-04-02 22:02:32 -0400 Allow partial decompression (Bug#33133) https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=b36913d803ee22a314f2e0a27523fbadeb60dd2c ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2019-04-03 2:09 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-10-23 23:07 bug#33133: 26.1.50; zlib-decompress-region too rigid Katsumi Yamaoka 2018-10-24 0:26 ` Noam Postavsky 2018-10-24 1:16 ` Katsumi Yamaoka 2018-10-27 21:48 ` Noam Postavsky 2018-10-28 15:41 ` Eli Zaretskii 2018-10-31 0:25 ` Noam Postavsky 2018-10-31 16:07 ` Eli Zaretskii 2019-04-03 2:09 ` Noam Postavsky
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).