unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#54154: 29.0.50; [PATCH] `sieve-manage-getscript' fails if script contains multibyte characters
@ 2022-02-25  9:04 Kai Tetzlaff
  2022-02-25 12:19 ` Lars Ingebrigtsen
       [not found] ` <87bkmwi0ut.fsf@tetzco.de>
  0 siblings, 2 replies; 38+ messages in thread
From: Kai Tetzlaff @ 2022-02-25  9:04 UTC (permalink / raw)
  To: 54154

[-- Attachment #1: Type: text/plain, Size: 568 bytes --]

The sieve-manage package uses the managesieve protocol (s.
https://datatracker.ietf.org/doc/html/draft-ietf-sieve-managesieve) to
communicate with a sieve server process.

When the sieve-manage client retrieves a script from the server it uses
the `sieve-manage-getscript' function to send command `GETSCRIPT
"<scriptname>"<CRLF>` to the server and tries to parse the response.

If the downloaded sieve script contains multibyte characters the attempt
to parse the response results in an infloop (in
`sieve-manage-parse-okno').

To reproduce, save the following code


[-- Attachment #2: minimal example --]
[-- Type: application/emacs-lisp, Size: 1077 bytes --]

(require 'sieve-manage)
(require 'cl) ; for flet below
(let* ((script-name "test.sieve")
       ;; variables `sieve-manage-server' and `sieve-manage-port' are
       ;; used in `sieve-manage-make-process-buffer'
       (sieve-manage-server)
       (sieve-manage-port "sieve")
       (sieve-buffer (sieve-manage-make-process-buffer))
       (output-buffer (generate-new-buffer script-name)))
  (with-current-buffer sieve-buffer
    (goto-char (point-min))
    ;; simulate managesieve response-getscript with a single multibyte
    ;; character: `ä`
    (insert "{32}\r\nif body :matches \"ä\" { stop; }\n\r\nOK \"Getscript completed.\"\r\n"))
  ;; use flet to mock some functions in call chain of sieve-manage-getscript
  (flet ((sieve-manage-send (_) nil)
         (accept-process-output (&optional _ _ _ _) nil)
         (get-buffer-process (_) nil))
    ;; watch `sieve-manage-getscript' infloop
    (sieve-manage-getscript script-name output-buffer sieve-buffer)
    (kill-buffer sieve-buffer)))

;; Local Variables:
;; coding: utf-8-unix
;; End:

[-- Attachment #3: Type: text/plain, Size: 1875 bytes --]

to a file and run: emacs -Q -l <file>


* Detailed analysis:

The example code sets up a response buffer for a successful managesieve
`response-getscript` defined as:

    response-getscript = (sieve-script CRLF response-ok)

Here's the buffer content:
```
1: {32}<CRLF>
2: if body :matches "ä" { stop; }<LF>
3: <CRLF>
4: OK "Getscript completed."<CRLF>
```

It comprises:

1. lines 1-2 (`sieve-script`): encoded as a managesieve `literal-s2c`
   string which:
   a. starts with a length in the form '{<OCTETS>}<CRLF>' (i.e. 32)
   b. followed by the string data (i.e. the actual script: 'if
      body :matches "ä" { stop;}<LF>') using UTF-8 encoding

2. line 3 (`CRLF`)

3. line 4 (`response-ok`): 'OK' SP "Getscript completed." (the latter is
   an optional `quoted` string which can be shown to the user)

The sieve-manage code is parsing the length into an integer and uses it
to skip over `sieve-script` to get to the start of line 3 (<CRLF> -
empty) which is then also skipped to get to line 4 in order to parse the
result ('OK').


Now the problem:

Since sieve-manage explicitly enables multibyte support in the response
buffer (by calling '(mm-enable-multibyte)' in
`sieve-manage-make-process-buffer`) and uses `goto-char' for the purpose
of skipping/jumping over `sieve-script`, each multibyte character in
`sieve-script` causes the jump to go 1 (2, 3) character(s) too far. In
the example above there's only a single 2 byte character (`ä`), so
instead of skipping to the beginning of line 3, we land in the middle of
<CRLF>: <CR(point)LF>.

This causes the following attempt to parse the result code (i.e. the 'OK
"Getscript completed."<CRLF>' line) to infloop in
`sieve-manage-parse-okno'.


* An attempt of a fix:

As far as I can tell, the attached patch fixes the issue for the
GETSCRIPT command.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: Fix for multibyte issue in `sieve-manage-getscript' --]
[-- Type: text/x-diff, Size: 1131 bytes --]

diff --git a/lisp/net/sieve-manage.el b/lisp/net/sieve-manage.el
index 50342b9105..8020e6fdca 100644
--- a/lisp/net/sieve-manage.el
+++ b/lisp/net/sieve-manage.el
@@ -449,10 +449,19 @@ sieve-manage-deletescript
 (defun sieve-manage-getscript (name output-buffer &optional buffer)
   (with-current-buffer (or buffer (current-buffer))
     (sieve-manage-send (format "GETSCRIPT \"%s\"" name))
+    (set-buffer-multibyte nil)
     (let ((script (sieve-manage-parse-string)))
+      (set-buffer-multibyte t)
       (sieve-manage-parse-crlf)
       (with-current-buffer output-buffer
-	(insert script))
+        (insert (decode-coding-string
+                 script
+                 ;; not sure if using `buffer-file-coding-system' is
+                 ;; the right approach, it might be better to hardcode
+                 ;; it to utf-8-* (managesieve requires UTF-8
+                 ;; encoding) but in that case, which variant of
+                 ;; utf-8-unix/dos/...  is to be used?
+                 buffer-file-coding-system t)))
       (sieve-manage-parse-okno))))
 
 (defun sieve-manage-setactive (name &optional buffer)

[-- Attachment #5: Type: text/plain, Size: 1904 bytes --]



* Additional remarks:

There might be more problems. E.g. `sieve-manage-putscript' contains the
following comment:

    ;; Here we assume that the coding-system will
    ;; replace each char with a single byte.
    ;; This is always the case if `content' is
    ;; a unibyte string.

which seems to indicate that it might also have an issue with multibyte
content (even though I have not experienced any uploading issues). I
will try do some more testing to check that.


In general, it is also not clear to me why the response (or process)
buffer needs to be multibyte enabled at all as it should only be used
for the line/byte oriented protocol data. But the commit message of
8e16fb987df9b which introduced the multibyte handling states:

    commit 8e16fb987df9b80b8328e9dbf80351a5f9d85bbb
    Author: Albert Krewinkel <krewinkel@moltkeplatz.de>
    Date:   2013-06-11 07:32:25 +0000
    ...
        * Enable Multibyte for SieveManage buffers: The parser won't properly
          handle umlauts and line endings unless multibyte is turned on in the
          process buffer.
    ...

so this was obviously done on purpose. I contacted Albert about this but
he couldn't remember the details (it's been nearly 10 years).



In GNU Emacs 29.0.50 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.31, cairo version 1.16.0)
 of 2022-02-18 built on moka
Repository revision: 51e51ce2df46fc0c6e17a97e74b00366bb9c09d8
Repository branch: master
System Description: Debian GNU/Linux bookworm/sid

Configured using:
 'configure --with-pgtk --with-native-compilation'

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBXML2 MODULES NATIVE_COMP NOTIFY INOTIFY
PDUMPER PGTK PNG RSVG SECCOMP SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS XIM GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

^ permalink raw reply related	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2023-01-23 17:53 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-25  9:04 bug#54154: 29.0.50; [PATCH] `sieve-manage-getscript' fails if script contains multibyte characters Kai Tetzlaff
2022-02-25 12:19 ` Lars Ingebrigtsen
2022-02-25 13:10   ` Lars Ingebrigtsen
2022-02-25 16:00     ` Kai Tetzlaff
2022-02-26 15:07       ` Lars Ingebrigtsen
2022-02-28 12:27         ` Kai Tetzlaff
2022-09-06 11:34           ` Lars Ingebrigtsen
2022-02-28 12:35         ` Kai Tetzlaff
2022-02-28 13:06           ` Lars Ingebrigtsen
2022-02-28 13:08           ` Lars Ingebrigtsen
2022-02-28 13:03         ` Kai Tetzlaff
     [not found] ` <87bkmwi0ut.fsf@tetzco.de>
     [not found]   ` <87fsc8i2c5.fsf@tetzco.de>
     [not found]     ` <87bkmw8b02.fsf@tetzco.de>
2023-01-18 18:28       ` bug#54154: Emacs commit ae963e80a79f5a9184daabfc8197f211a39b136d (sieve-manage) Herbert J. Skuhra
2023-01-18 19:17         ` Eli Zaretskii
2023-01-18 23:22           ` Herbert J. Skuhra
2023-01-19  4:06             ` Kai Tetzlaff
2023-01-19  7:45               ` Eli Zaretskii
2023-01-19 12:38                 ` Kai Tetzlaff
2023-01-19 14:08                   ` Eli Zaretskii
2023-01-19 15:59                     ` Kai Tetzlaff
2023-01-19 17:41                       ` Eli Zaretskii
2023-01-19 21:33                         ` Kai Tetzlaff
2023-01-20  6:54                         ` Kai Tetzlaff
2023-01-22  2:12                         ` Kai Tetzlaff
2023-01-23  0:59                         ` Kai Tetzlaff
2023-01-23 12:47                           ` Herbert J. Skuhra
2023-01-23 13:01                             ` Kai Tetzlaff
2023-01-23 13:36                               ` Herbert J. Skuhra
2023-01-23 13:57                                 ` Kai Tetzlaff
2023-01-23 14:27                                   ` Andreas Schwab
2023-01-23 17:07                                     ` Kai Tetzlaff
2023-01-23 17:53                                       ` Andreas Schwab
2023-01-23 13:40                               ` Eli Zaretskii
2023-01-23 16:22                                 ` Kai Tetzlaff
2023-01-23 16:49                                   ` Eli Zaretskii
2023-01-23 17:12                                     ` Kai Tetzlaff
2023-01-19 13:22                 ` Kai Tetzlaff
2023-01-19 14:16                 ` Kai Tetzlaff
2023-01-19  4:50             ` bug#54154: [update] " Kai Tetzlaff

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).