unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#42904: [PATCH] Non-Unicode frame title crashes Emacs on macOS
@ 2020-08-17 14:11 Mattias Engdegård
  2020-08-17 14:54 ` Andrii Kolomoiets
  2020-08-17 15:55 ` Eli Zaretskii
From: Mattias Engdegård @ 2020-08-17 14:11 UTC
  To: 42904; +Cc: Alan Third

[-- Attachment #1: Type: text/plain, Size: 1073 bytes --]

Setting a frame title that contains non-Unicode characters causes a crash in the NS backend. (Other platforms may or may not deal with it appropriately -- if you have the opportunity to test, please report.)

Since the title is typically derived from the buffer name, this is easily reproduced by

 (rename-buffer "n\351")

The crash occurs in ns_set_name_internal:

  encoded_name = ENCODE_UTF_8 (name);

Here encoded_name is still "n\351" (a 2-byte unibyte string), because the \351 couldn't be encoded.

  str = [NSString stringWithUTF8String: SSDATA (encoded_name)];

Now str is nil since "n\351" isn't valid UTF-8.

    [[view window] setTitle: str];

Here we get an NS crash because nil isn't a valid setTitle: argument.

Proposed patch attached. I didn't find any obvious way to encode an Emacs string into valid UTF-8 (with bad parts replaced), so I wrote a new function. The corresponding Lisp function is marked internal because it exists only for test purposes, but it could of course be promoted to non-internal if someone wants it.


[-- Attachment #2: 0001-Fix-NS-crash-on-invalid-frame-title-string.patch --]
[-- Type: application/octet-stream, Size: 5805 bytes --]

From 13b43b826a7f7f539484babc275cd9a19a64da9e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Mon, 17 Aug 2020 15:37:33 +0200
Subject: [PATCH] Fix NS crash on invalid frame title string

Instead of blindly assuming that Emacs strings are valid UTF-8, which
they are not, convert them in a more careful way using U+FFFD for
replacing invalid values.

* src/coding.c (string_to_valid_utf_8)
(Finternal_string_to_valid_utf_8): New functions.
* src/coding.h: Prototype.
* src/nsfns.m (ns_set_name_internal): Use string_to_valid_utf_8.
* test/src/coding-tests.el (coding-string-to-valid-utf-8): New test.
---
 src/coding.c             | 56 ++++++++++++++++++++++++++++++++++++++++
 src/coding.h             |  2 ++
 src/nsfns.m              |  6 ++---
 test/src/coding-tests.el | 14 ++++++++++
 4 files changed, 74 insertions(+), 4 deletions(-)

diff --git a/src/coding.c b/src/coding.c
index 51bd441de9..65493b07ac 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -9564,6 +9564,61 @@ code_convert_string_norecord (Lisp_Object string, Lisp_Object coding_system,
 }
 
 
+/* Convert STRING to a pure Unicode string.
+   Non-Unicode values are substituted with U+FFFD REPLACEMENT CHARACTER.
+   Return a unibyte or multibyte string, possibly STRING itself,
+   whose SDATA is guaranteed to be UTF-8.  */
+Lisp_Object
+string_to_valid_utf_8 (Lisp_Object string)
+{
+  if (string_ascii_p (string))
+    return string;
+  if (!STRING_MULTIBYTE (string))
+    string = string_to_multibyte (string);
+
+  /* Now STRING is multibyte.  */
+  unsigned char *buf = NULL;
+  unsigned char *d = NULL;
+  unsigned char *s = SDATA (string);
+  unsigned char *end = s + SBYTES (string);
+  while (s < end)
+    {
+      int len;
+      int c = string_char_and_length (s, &len);
+      if (c > 0x10ffff || char_surrogate_p (c))
+        {
+          /* Not valid for UTF-8.  */
+          if (!d)
+            {
+              buf = xmalloc (4 * SCHARS (string));
+              ptrdiff_t n = s - SDATA (string);
+              memcpy (buf, SDATA (string), n);
+              d = buf + n;
+            }
+          *d++ = 0357;          /* Use U+FFFD.  */
+          *d++ = 0277;
+          *d++ = 0275;
+          s += len;
+        }
+      else if (d)
+        do *d++ = *s++; while (--len);
+      else
+        s += len;
+    }
+  Lisp_Object ret = buf ? make_multibyte_string (buf, SCHARS (string), d - buf)
+                        : string;
+  xfree (buf);
+  return ret;
+}
+
+DEFUN ("internal-string-to-valid-utf-8", Finternal_string_to_valid_utf_8,
+       Sinternal_string_to_valid_utf_8, 1, 1, 0,
+       doc:  /* Internal use only.  */)
+     (Lisp_Object string)
+{
+  return string_to_valid_utf_8 (string);
+}
+
 /* Return the gap address of BUFFER.  If the gap size is less than
    NBYTES, enlarge the gap in advance.  */
 
@@ -11811,6 +11866,7 @@ syms_of_coding (void)
   defsubr (&Scoding_system_aliases);
   defsubr (&Scoding_system_eol_type);
   defsubr (&Scoding_system_priority_list);
+  defsubr (&Sinternal_string_to_valid_utf_8);
 
   DEFVAR_LISP ("coding-system-list", Vcoding_system_list,
 	       doc: /* List of coding systems.
diff --git a/src/coding.h b/src/coding.h
index c2a7b2a00f..98f00a1731 100644
--- a/src/coding.h
+++ b/src/coding.h
@@ -709,6 +709,8 @@ #define UTF_16_LOW_SURROGATE_P(val) \
 extern void encode_coding_object (struct coding_system *,
                                   Lisp_Object, ptrdiff_t, ptrdiff_t,
                                   ptrdiff_t, ptrdiff_t, Lisp_Object);
+extern Lisp_Object string_to_valid_utf_8 (Lisp_Object);
+
 /* Defined in this file.  */
 INLINE int surrogates_to_codepoint (int, int);
 
diff --git a/src/nsfns.m b/src/nsfns.m
index 628233ea0d..3e84568991 100644
--- a/src/nsfns.m
+++ b/src/nsfns.m
@@ -405,9 +405,7 @@ Turn the input menu (an NSMenu) into a lisp list for tracking on lisp side.
   NSString *str;
   NSView *view = FRAME_NS_VIEW (f);
 
-
-  encoded_name = ENCODE_UTF_8 (name);
-
+  encoded_name = string_to_valid_utf_8 (name);
   str = [NSString stringWithUTF8String: SSDATA (encoded_name)];
 
 
@@ -418,7 +416,7 @@ Turn the input menu (an NSMenu) into a lisp list for tracking on lisp side.
   if (!STRINGP (f->icon_name))
     encoded_icon_name = encoded_name;
   else
-    encoded_icon_name = ENCODE_UTF_8 (f->icon_name);
+    encoded_icon_name = string_to_valid_utf_8 (f->icon_name);
 
   str = [NSString stringWithUTF8String: SSDATA (encoded_icon_name)];
 
diff --git a/test/src/coding-tests.el b/test/src/coding-tests.el
index c438ae22ce..f53f63eb48 100644
--- a/test/src/coding-tests.el
+++ b/test/src/coding-tests.el
@@ -429,6 +429,20 @@ coding-check-coding-systems-region
                  '((iso-latin-1 3) (us-ascii 1 3))))
   (should-error (check-coding-systems-region "å" nil '(bad-coding-system))))
 
+(ert-deftest coding-string-to-valid-utf-8 ()
+  (let ((empty "")
+        (valid-uni "Alpha")
+        (valid-multi "m\001ü∫𝔻"))
+    (should (eq (internal-string-to-valid-utf-8 empty) empty))
+    (should (eq (internal-string-to-valid-utf-8 valid-uni) valid-uni))
+    (should (eq (internal-string-to-valid-utf-8 valid-multi) valid-multi)))
+  (should (equal (internal-string-to-valid-utf-8 "unpaired\ud9a3surrogate")
+                 "unpaired\ufffdsurrogate"))
+  (should (equal (internal-string-to-valid-utf-8 "raw\200\377bytes")
+                 "raw\ufffd\ufffdbytes"))
+  (should (equal (internal-string-to-valid-utf-8 "all§\300at\udffeonce")
+                 "all§\ufffdat\ufffdonce")))
+
 ;; Local Variables:
 ;; byte-compile-warnings: (not obsolete)
 ;; End:
-- 
2.21.1 (Apple Git-122.3)




Thread overview: 29+ messages
2020-08-17 14:11 bug#42904: [PATCH] Non-Unicode frame title crashes Emacs on macOS Mattias Engdegård
2020-08-17 14:54 ` Andrii Kolomoiets
2020-08-17 15:55   ` Mattias Engdegård
2020-08-17 15:55 ` Eli Zaretskii
2020-08-17 16:11   ` Mattias Engdegård
2020-08-17 17:05     ` Eli Zaretskii
2020-08-17 18:48       ` Mattias Engdegård
2020-08-17 19:56         ` Alan Third
2020-08-18  8:07           ` Mattias Engdegård
2020-08-18  8:43             ` Alan Third
2020-08-18 11:48               ` Mattias Engdegård
2020-08-18 12:22                 ` Eli Zaretskii
2020-08-18 17:28                 ` Alan Third
2020-08-20  9:27                   ` Mattias Engdegård
2020-08-20 13:24                     ` Eli Zaretskii
2020-08-20 18:46                       ` Mattias Engdegård
2020-08-20 19:13                         ` Eli Zaretskii
2020-08-21  9:39                           ` Mattias Engdegård
2020-08-21 13:26                             ` Eli Zaretskii
2020-08-21 14:53                               ` Mattias Engdegård
2020-08-21 15:27                                 ` Eli Zaretskii
2020-08-21 15:50                                   ` Mattias Engdegård
2020-08-23 17:23                                     ` Mattias Engdegård
2020-08-20 13:24                     ` Alan Third
2020-08-20 17:44                       ` Mattias Engdegård
2020-08-18 12:24               ` Eli Zaretskii
2020-08-18 14:11                 ` Mattias Engdegård
2020-08-18 14:40                   ` Eli Zaretskii
2020-08-18 15:21                     ` Mattias Engdegård

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
