From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#42904: [PATCH] Non-Unicode frame title crashes Emacs on macOS
Date: Fri, 21 Aug 2020 16:26:11 +0300
Message-ID: <83ft8gb05o.fsf@gnu.org>
To: Mattias Engdegård
Cc: 42904@debbugs.gnu.org, alan@idiocy.org

> From: Mattias Engdegård
> Date: Fri, 21 Aug 2020 11:39:30 +0200
> Cc: alan@idiocy.org, 42904@debbugs.gnu.org
>
> Basically, it's about how the bytes end up in mode_line_noprop_buf in the first place, since currently the information of whether it should be interpreted as unibyte or multibyte gets lost as soon as data from the strings it is composed of (like the buffer name for %b, file name for %f etc) is added to it. Then make_string tries to restore that information by looking at the bytes, and it is not always accurate.

make_string was written to work on byte sequences that don't begin as the payload of a Lisp string. So it doesn't handle the information you say is being lost, because it doesn't expect such information to be available to begin with. Which is basically just another way of saying "you want something other than make_string" here.

> One way of doing this is to always make sure that the input strings (buffer name, file name, frame-title-format etc) are always in multibyte form.

That's what I thought I was suggesting.

> > Again, what would you like to have instead? Would calling
> > str_as_multibyte do what you want?
>
> No, I don't think so -- once the unibyte/multibyte bit is lost, it can only be restored imperfectly if all we have is the sequence of bytes.

That is true, but str_as_multibyte simply interprets any valid UTF-8 sequence as a character, and any invalid sequence as raw bytes. I thought this was precisely what you wanted for this use case, no?

> If we wrote Emacs from scratch we likely wouldn't have unibyte strings at all: they are only there for compatibility and various niche uses and performance hacks. I don't think it's unreasonable to start normalising strings to multibyte where it matters.

Emacs (like any other old editor) started with only unibyte strings, so that's history for you. Some modern text-handling environments solve this conundrum by not supporting raw bytes at all, but Emacs knows better.
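
For readers following the thread, here is a minimal C sketch of the rule discussed above: treat every well-formed UTF-8 sequence as one character and every other byte as a raw byte. It is not the Emacs implementation of str_as_multibyte or make_string; the names byte_scan, utf8_seq_len and scan_bytes are hypothetical and exist only for this illustration, and the sketch only counts characters and raw bytes rather than building a Lisp string.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch (not an Emacs type). */
struct byte_scan {
    size_t chars;      /* well-formed UTF-8 sequences seen */
    size_t raw_bytes;  /* bytes not part of any well-formed sequence */
};

/* Return the length (1..4) of the well-formed UTF-8 sequence that
   starts at p, or 0 if the bytes there are not well-formed UTF-8.
   avail is the number of bytes readable at p and must be >= 1. */
size_t utf8_seq_len(const unsigned char *p, size_t avail)
{
    unsigned char c = p[0];
    unsigned long cp;
    size_t len, i;

    if (c < 0x80)
        return 1;                                   /* ASCII */
    if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
    else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
    else
        return 0;               /* stray continuation or invalid lead byte */

    if (avail < len)
        return 0;               /* sequence truncated by end of buffer */

    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
        cp = (cp << 6) | (p[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates and values above U+10FFFF. */
    if (len == 3 && cp < 0x800)
        return 0;
    if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF))
        return 0;
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    return len;
}

/* Walk buf, counting characters and raw bytes under the rule above. */
struct byte_scan scan_bytes(const unsigned char *buf, size_t n)
{
    struct byte_scan r = { 0, 0 };
    size_t i = 0;

    while (i < n) {
        size_t len = utf8_seq_len(buf + i, n - i);
        if (len > 0) {
            r.chars++;
            i += len;
        } else {
            r.raw_bytes++;      /* keep the byte, but as a raw byte */
            i++;
        }
    }
    return r;
}
```

With this rule the two bytes 0xC3 0xA5 always count as one character, whether they came from a multibyte string holding "å" or from a unibyte string that happened to contain those two bytes. That is the imperfect-restoration case raised in the quoted text: the byte sequence alone does not say which of the two the original string was.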