From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: 26.1.92, 26.1-mac-7.4; unrecognised escaped chars in *Help*
Date: Tue, 05 Mar 2019 18:07:09 +0200
Message-ID: <831s3le24y.fsf@gnu.org>
To: help-gnu-emacs@gnu.org
In-reply-to: (message from Van L on Mon, 04 Mar 2019 12:46:02 +1100)

> From: Van L
> Date: Mon, 04 Mar 2019 12:46:02 +1100
>
> From the *scratch* buffer, I lookup the keybinding possibilities by
>
>   C-h b
>
> Under the Global Bindings section, the two lines under SPC look to be
> encoded in Latin-1.  I guess Emacs assumes UTF-8.

No, this has nothing to do with encoding.  This text is produced by
Emacs itself (unlike the previous problem with EWW, where the text
came from an external source), so no decoding is needed: text
generated by Emacs itself and inserted into its buffers is always in
the correct "encoding" (we prefer to call that "representation", to
distinguish between the internal representation of characters in Emacs
buffers and strings, and encoded text outside Emacs).

> The problem is I see \200 \377 and a two row box having inside of it
> 3FF F7F as follows
>
> -- quote - unknown encoding characters replaced with lookalike sequence
> SPC .. ~        self-insert-command
> \200 .. 3FF_F7F self-insert-command
> \200 .. \377    self-insert-command

Yes.  This is admittedly confusing, although 100% correct.

To start digging into what happens here, go to each of the 2 \200's
and type "C-u C-x =".  You will see that these two look identical on
display, but are actually two very different beasts: the former is a
Unicode character whose codepoint happens to be 200 octal (0x80 in
hex), the latter is a raw byte of the same value.  Emacs distinguishes
between them.  The confusing bit here is that by default both are
displayed identically, for dull historical reasons (once upon a time,
Emacs didn't distinguish between them).  (Perhaps there's no longer a
reason to use this confusing display nowadays.)

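The difference between the two \200's can also be checked from Lisp,
e.g. with M-: or in *scratch*.  What follows is only a minimal sketch:
the function name my/compare-200 is made up for illustration, and the
result shown in the comment assumes the usual internal representation,
where raw bytes occupy the topmost character codes.

```elisp
;; Hypothetical helper (the name is an assumption, not part of Emacs):
;; compare the Unicode character U+0080 with the raw byte 0x80.
(defun my/compare-200 ()
  "Return a list describing character #x80 versus raw byte #x80."
  (let ((unicode-char #x80)                           ; character U+0080
        (raw-byte (unibyte-char-to-multibyte #x80)))  ; raw byte 0x80
    (list (= unicode-char raw-byte)              ; same character code?
          (format "#x%X" raw-byte)               ; internal code of the raw byte
          (multibyte-char-to-unibyte raw-byte)   ; back to the byte value
          (format "#x%X" (max-char)))))          ; largest character code

(my/compare-200)
;; => (nil "#x3FFF80" 128 "#x3FFFFF")
```

So the raw byte lives at the very end of the character code space,
well away from codepoint 0x80, even though both display as \200.
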
So the first of the above 2 lines stands for all the non-ASCII Unicode
characters, all of which are bound to self-insert-command by default.
The funny display of both ends of that character code range is because
none of the shown codes corresponds to a printable character.  In
particular, the \200 codepoint is a C1 control character, i.e. there's
no printable character whose Unicode codepoint is 0x80.

By contrast, the second row shows all the raw bytes, which are also
bound to self-insert-command by default.

IOW, unlike the case with EWW showing incorrectly decoded text, here
the issue is with how characters are _displayed_, not how they are
decoded.  To change how they look you need to fiddle with display
features, not with decoding features.

And now to your question:

> I know what to do for this kind of situation in EWW, type "E latin-1 RET".
>
> What goes here?

Type

  M-x customize-variable RET glyphless-char-display-control RET

In the buffer this displays, check the box to the left of the
"c1-control" group.  This enables the button to the right of the
checkbox; click on it and select the method you want, e.g. "Display
acronym" or "Display hex code in a box".  Then click "Apply".  This
will change how all the characters in the range [0x80..0x9f] are
displayed.
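
If you prefer to make this change from your init file instead of the
Customize buffer, something along the following lines should have the
same effect.  This is only a sketch: it assumes you want the hex-code
method (the "Display hex code in a box" choice) and that you want to
keep whatever other groups are already configured.  It uses
customize-set-variable rather than setq so that the variable's setter
function runs and the display is actually updated.

```elisp
;; Show C1 control characters (0x80..0x9f) as hex codes in a box.
;; Any existing c1-control entry is dropped first; other groups are kept.
(customize-set-variable
 'glyphless-char-display-control
 (cons '(c1-control . hex-code)
       (assq-delete-all 'c1-control
                        (copy-alist glyphless-char-display-control))))
```

Replace hex-code with acronym to get the "Display acronym" method.
Note that this affects only the characters in the c1-control group,
not the raw bytes.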