From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#12291: [rev 109796] wrong UTF-8 handling
Date: Thu, 27 Jan 2022 18:52:26 +0200
Message-ID: <83mtjhdstx.fsf@gnu.org>
References: <20120828.074720.480105751.wl@gnu.org> <87y231kukq.fsf@gnus.org>
In-Reply-To: <87y231kukq.fsf@gnus.org> (message from Lars Ingebrigtsen on Thu, 27 Jan 2022 17:32:53 +0100)
To: Lars Ingebrigtsen
Cc: 12291@debbugs.gnu.org, smithcu@gvsu.edu

> From: Lars Ingebrigtsen
> Cc: 12291@debbugs.gnu.org, Curtis Smith, Eli Zaretskii
> Date: Thu, 27 Jan 2022 17:32:53 +0100
>
> position: 1 of 2 (0%), column: 0
> character: (displayed as ) (codepoint 1266142, #o4650736, #x1351de)
> charset: emacs (Full Emacs charset (excluding eight bit chars))
> code point in charset: 0x1351DE
> syntax: w which means: word
> category: L:Strong L2R
> to input: type "C-x 8 RET 1351de"
>
> So Emacs now displays more accurate information about the utf-8
> sequence.
>
> It was pointed out that this sequence is outside the Unicode range,
> which only extends up to U+10FFFF, and that Emacs should perhaps display
> this as a number of raw bytes instead. Is that something we still want
> to pursue, or is Emacs behaving like we want to here? Eli?

This is the expected behavior. The raw bytes start at #x3FFF00, so
#x1351de is some character code reserved for characters not unified
with Unicode (some CJK encodings have them). Interpreting them as raw
bytes would be counter-productive.

I'm not sure what was Werner's problem with this, so maybe let him
chime in and explain more.
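
To illustrate the ranges discussed above, here is a minimal Python sketch that sorts a character code into one of three areas. It is not Emacs code: the function name `classify_codepoint` and the constants are hypothetical. The boundaries are assumptions taken from the Emacs Lisp manual's description of character codes (Unicode up to #x10FFFF, raw 8-bit bytes at #x3FFF80..#x3FFFFF, which is the #x3FFF00 offset mentioned above plus the byte values #x80..#xFF).

```python
# Assumed boundaries, per the Emacs Lisp manual's description of character
# codes; these constants are illustrative and not read from Emacs itself.
UNICODE_MAX = 0x10FFFF      # last Unicode code point
RAW_BYTE_MIN = 0x3FFF80     # first code used for a raw 8-bit byte (assumption)
EMACS_MAX_CHAR = 0x3FFFFF   # largest character code Emacs accepts (assumption)


def classify_codepoint(cp: int) -> str:
    """Return a rough label for the area a character code falls in."""
    if cp < 0 or cp > EMACS_MAX_CHAR:
        return "invalid"
    if cp <= UNICODE_MAX:
        return "unicode"
    if cp < RAW_BYTE_MIN:
        # Above Unicode but below the raw-byte area: characters that are
        # not unified with Unicode.
        return "extended"
    return "raw-byte"


if __name__ == "__main__":
    # The code from the report: decimal 1266142, octal 4650736, hex 1351de.
    print(hex(1266142), classify_codepoint(0x1351DE))
```

Under these assumptions, #x1351de lands in the area between Unicode and the raw bytes, which is consistent with the explanation above that it should not be shown as raw bytes.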