all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: "Mattias Engdegård" <mattiase@acm.org>
To: Alan Third <alan@idiocy.org>
Cc: 42904@debbugs.gnu.org
Subject: bug#42904: [PATCH] Non-Unicode frame title crashes Emacs on macOS
Date: Tue, 18 Aug 2020 13:48:10 +0200	[thread overview]
Message-ID: <243A5DA8-2865-485D-A8A2-1F543B046BAA@acm.org> (raw)
In-Reply-To: <20200818084306.GA89999@breton.holly.idiocy.org>

[-- Attachment #1: Type: text/plain, Size: 1291 bytes --]

18 aug. 2020 kl. 10.43 skrev Alan Third <alan@idiocy.org>:

>> Presumably this should be utf_16be on big-endian platforms. We still support PowerPC macOS, don't we?
> 
> No, however I imagine we support GNUstep on big endian systems.

Well then, it's easy to deal with.

>>> +  str = [NSString stringWithCharacters: (const unichar *) SDATA (encoded_name)
>> 
>> Is SDATA guaranteed to be 16-bit aligned? Doesn't matter on x86 or
>> PowerPC, but strictly speaking...
> 
> I've no idea, I adapted the code from make_multibyte_string in
> alloc.c, and one of it's callers (although I can't remember which
> right now). I'm expecting Eli to appear and tell me this is the
> entirely wrong way of doing this. ;)

Data alignment trapping is optional on 64-bit ARM but I'd be surprised if macOS enabled it. It might be hazardous for all the GNUStep-on-MIPS workstations.

> Anyway, as I understand it the internal representation of NS strings
> are UTF-16, so the conversion through UTF-8 seems a bit of a waste if
> we can go direct.

Maybe, but the conversion to UTF-16 then has to be done on the Emacs side instead, probably less efficiently than in the NS libs. It's probably a wash.

Anyway, here is an alternative patch using your method. Tell us what you think.


[-- Attachment #2: 0001-UTF-16-Fix-NS-crash-on-invalid-frame-title-string-bug-42904.patch --]
[-- Type: application/octet-stream, Size: 5209 bytes --]

From 9858f0409cc83eefcc153109e180c230868d20a5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Tue, 18 Aug 2020 12:58:12 +0200
Subject: [PATCH] Fix NS crash on invalid frame title string (bug#42904)

Instead of blindly assuming that all Emacs strings are valid UTF-8,
which they are not, use UTF-16 because this conversion seems to be
more robust (after making sure that they use the correct replacement
character).  Unpaired surrogates will still go through to the NSString
objects, but the NS libs handle them gracefully.

* src/nsfns.m (string_to_nsstring): New function.
(ns_set_name_internal): Use string_to_nsstring.
* lisp/international/mule-conf.el (utf-16le, utf-16be)
(utf-16le-with-signature, utf-16be-with-signature, utf-16):
Use U+FFFD as :default-char instead of ASCII space.
* test/src/coding-tests.el (coding-utf-16-replacement-char): New test.
---
 lisp/international/mule-conf.el |  5 +++++
 src/nsfns.m                     | 34 +++++++++++++++++++--------------
 test/src/coding-tests.el        | 12 ++++++++++++
 3 files changed, 37 insertions(+), 14 deletions(-)

diff --git a/lisp/international/mule-conf.el b/lisp/international/mule-conf.el
index edda79ba4e..b9acafc158 100644
--- a/lisp/international/mule-conf.el
+++ b/lisp/international/mule-conf.el
@@ -1336,6 +1336,7 @@ 'utf-16le
   :mnemonic ?U
   :charset-list '(unicode)
   :endian 'little
+  :default-char #xfffd
   :mime-text-unsuitable t
   :mime-charset 'utf-16le)
 
@@ -1345,6 +1346,7 @@ 'utf-16be
   :mnemonic ?U
   :charset-list '(unicode)
   :endian 'big
+  :default-char #xfffd
   :mime-text-unsuitable t
   :mime-charset 'utf-16be)
 
@@ -1355,6 +1357,7 @@ 'utf-16le-with-signature
   :charset-list '(unicode)
   :bom t
   :endian 'little
+  :default-char #xfffd
   :mime-text-unsuitable t
   :mime-charset 'utf-16)
 
@@ -1365,6 +1368,7 @@ 'utf-16be-with-signature
   :charset-list '(unicode)
   :bom t
   :endian 'big
+  :default-char #xfffd
   :mime-text-unsuitable t
   :mime-charset 'utf-16)
 
@@ -1375,6 +1379,7 @@ 'utf-16
   :charset-list '(unicode)
   :bom '(utf-16le-with-signature . utf-16be-with-signature)
   :endian 'big
+  :default-char #xfffd
   :mime-text-unsuitable t
   :mime-charset 'utf-16)
 
diff --git a/src/nsfns.m b/src/nsfns.m
index 628233ea0d..e514585dc1 100644
--- a/src/nsfns.m
+++ b/src/nsfns.m
@@ -398,29 +398,35 @@ Turn the input menu (an NSMenu) into a lisp list for tracking on lisp side.
         [NSString stringWithUTF8String: SSDATA (arg)]];
 }
 
+static NSString *
+string_to_nsstring (Lisp_Object string)
+{
+  Lisp_Object coding =
+#ifdef WORDS_BIGENDIAN
+    Qutf_16be
+#else
+    Qutf_16le
+#endif
+    ;
+  Lisp_Object utf16 = code_convert_string_norecord (string, coding, true);
+  /* Here we somewhat precariously assume that SDATA is 16-bit aligned,
+     or that unaligned access is OK.  */
+  return [NSString stringWithCharacters: (const unichar *) SDATA (utf16)
+                                 length: SBYTES (utf16) / sizeof (unichar)];
+}
+
 static void
 ns_set_name_internal (struct frame *f, Lisp_Object name)
 {
-  Lisp_Object encoded_name, encoded_icon_name;
-  NSString *str;
   NSView *view = FRAME_NS_VIEW (f);
-
-
-  encoded_name = ENCODE_UTF_8 (name);
-
-  str = [NSString stringWithUTF8String: SSDATA (encoded_name)];
-
+  NSString *str = string_to_nsstring (name);
 
   /* Don't change the name if it's already NAME.  */
   if (! [[[view window] title] isEqualToString: str])
     [[view window] setTitle: str];
 
-  if (!STRINGP (f->icon_name))
-    encoded_icon_name = encoded_name;
-  else
-    encoded_icon_name = ENCODE_UTF_8 (f->icon_name);
-
-  str = [NSString stringWithUTF8String: SSDATA (encoded_icon_name)];
+  if (STRINGP (f->icon_name))
+    str = string_to_nsstring (f->icon_name);
 
   if ([[view window] miniwindowTitle]
       && ! [[[view window] miniwindowTitle] isEqualToString: str])
diff --git a/test/src/coding-tests.el b/test/src/coding-tests.el
index c438ae22ce..8b0adf0ad8 100644
--- a/test/src/coding-tests.el
+++ b/test/src/coding-tests.el
@@ -429,6 +429,18 @@ coding-check-coding-systems-region
                  '((iso-latin-1 3) (us-ascii 1 3))))
   (should-error (check-coding-systems-region "å" nil '(bad-coding-system))))
 
+(ert-deftest coding-utf-16-replacement-char ()
+  (should (equal (encode-coding-string "A\351B" 'utf-16be)
+                 (unibyte-string 0 ?A #xff #xfd 0 ?B)))
+  (should (equal (encode-coding-string "A\351B" 'utf-16le)
+                 (unibyte-string ?A 0 #xfd #xff ?B 0)))
+  (should (equal (encode-coding-string "A\ud8b6BΣ\227D𝄞" 'utf-16be)
+                 (unibyte-string 0 ?A #xd8 #xb6 0 ?B #x03 #xa3 #xff #xfd 0 ?D
+                                 #xd8 #x34 #xdd #x1e)))
+  (should (equal (encode-coding-string "A\ud8b6BΣ\227D𝄞" 'utf-16le)
+                 (unibyte-string ?A 0 #xb6 #xd8 ?B 0 #xa3 #x03 #xfd #xff ?D 0
+                                 #x34 #xd8 #x1e #xdd))))
+
 ;; Local Variables:
 ;; byte-compile-warnings: (not obsolete)
 ;; End:
-- 
2.21.1 (Apple Git-122.3)


  reply	other threads:[~2020-08-18 11:48 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-17 14:11 bug#42904: [PATCH] Non-Unicode frame title crashes Emacs on macOS Mattias Engdegård
2020-08-17 14:54 ` Andrii Kolomoiets
2020-08-17 15:55   ` Mattias Engdegård
2020-08-17 15:55 ` Eli Zaretskii
2020-08-17 16:11   ` Mattias Engdegård
2020-08-17 17:05     ` Eli Zaretskii
2020-08-17 18:48       ` Mattias Engdegård
2020-08-17 19:56         ` Alan Third
2020-08-18  8:07           ` Mattias Engdegård
2020-08-18  8:43             ` Alan Third
2020-08-18 11:48               ` Mattias Engdegård [this message]
2020-08-18 12:22                 ` Eli Zaretskii
2020-08-18 17:28                 ` Alan Third
2020-08-20  9:27                   ` Mattias Engdegård
2020-08-20 13:24                     ` Eli Zaretskii
2020-08-20 18:46                       ` Mattias Engdegård
2020-08-20 19:13                         ` Eli Zaretskii
2020-08-21  9:39                           ` Mattias Engdegård
2020-08-21 13:26                             ` Eli Zaretskii
2020-08-21 14:53                               ` Mattias Engdegård
2020-08-21 15:27                                 ` Eli Zaretskii
2020-08-21 15:50                                   ` Mattias Engdegård
2020-08-23 17:23                                     ` Mattias Engdegård
2020-08-20 13:24                     ` Alan Third
2020-08-20 17:44                       ` Mattias Engdegård
2020-08-18 12:24               ` Eli Zaretskii
2020-08-18 14:11                 ` Mattias Engdegård
2020-08-18 14:40                   ` Eli Zaretskii
2020-08-18 15:21                     ` Mattias Engdegård

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=243A5DA8-2865-485D-A8A2-1F543B046BAA@acm.org \
    --to=mattiase@acm.org \
    --cc=42904@debbugs.gnu.org \
    --cc=alan@idiocy.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.