From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Use the Unicode replacement character for replacing unencodable characters into UTF-16
Date: Tue, 18 Aug 2020 21:13:35 +0300
Message-ID: <83pn7nesa8.fsf@gnu.org>
In-Reply-To: <691F8CF5-5FEA-4C35-AF1C-786CDF8DF7B8@acm.org>
To: Mattias Engdegård
Cc: emacs-devel@gnu.org

> From: Mattias Engdegård
> Date: Tue, 18 Aug 2020 19:07:41 +0200
> Cc: emacs-devel@gnu.org
>
> On 18 Aug 2020, at 18:19, Eli Zaretskii wrote:
>
> > Can you describe under which circumstances this default-character will
> > be used?
>
> It's what encoding into UTF-16 uses for characters that don't have a
> Unicode equivalent, such as raw bytes.

My reading is that this happens only for codepoints beyond 0x10ffff.
Raw bytes end up there, though I'm not sure they always do.
Characters that aren't unified with Unicode also end up there.

> > The issue that bothers me is whether U+FFFD can appear in situations
> > where it cannot be displayed by Emacs, because then the result will be
> > more confusing than helping.
>
> Do you mean that on balance, all things considered, you prefer space as
> replacement character to U+FFFD?

I mean that if the situations that bother me do in fact exist (I'm not
sure they do), we should discuss them and see whether we care about them.
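
To make the case under discussion concrete, here is a minimal sketch in Python of substituting a default character for codepoints beyond 0x10ffff when encoding into UTF-16. It is not Emacs's actual coder: the function name encode_utf16le, its default parameter, and the constant RAW_BYTE_80 are hypothetical names chosen for illustration, and the sketch assumes characters are plain integers in the Emacs internal range, where raw bytes occupy 0x3FFF80..0x3FFFFF.

```python
import struct

MAX_UNICODE = 0x10FFFF   # largest codepoint UTF-16 can represent
REPLACEMENT = 0xFFFD     # U+FFFD REPLACEMENT CHARACTER
SPACE = 0x20             # the alternative default discussed in the thread

# Hypothetical constant: the internal codepoint of raw byte 0x80,
# assuming raw bytes are mapped to 0x3FFF80..0x3FFFFF.
RAW_BYTE_80 = 0x3FFF80


def encode_utf16le(codepoints, default=REPLACEMENT):
    """Encode integer codepoints as UTF-16LE (no BOM).

    Any codepoint above 0x10FFFF has no Unicode equivalent and is
    replaced by `default` before encoding.
    """
    out = bytearray()
    for cp in codepoints:
        if cp > MAX_UNICODE:
            cp = default
        if cp < 0x10000:
            # BMP character: a single 16-bit code unit.
            out += struct.pack("<H", cp)
        else:
            # Supplementary character: a surrogate pair.
            cp -= 0x10000
            out += struct.pack("<HH", 0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF))
    return bytes(out)


if __name__ == "__main__":
    # "A" followed by a raw byte, with each candidate default character.
    print(encode_utf16le([0x41, RAW_BYTE_80]).hex())
    print(encode_utf16le([0x41, RAW_BYTE_80], default=SPACE).hex())
```

With U+FFFD as the default, the substitution stays visible in the decoded text, provided the display can show that character; with space, it is indistinguishable from a real space.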