From: Van L
Newsgroups: gmane.emacs.help
To: help-gnu-emacs@gnu.org
Subject: Re: 26.1.92, 26.1-mac-7.4; unrecognised escaped chars in *Help*
Date: Wed, 06 Mar 2019 11:47:29 +1100
References: <831s3le24y.fsf@gnu.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (berkeley-unix)

Eli writes:

>> From the *scratch* buffer, I look up the keybinding possibilities by
>>
>>   C-h b
>>
>> Under the Global Bindings section, the two lines under SPC look to be
>> encoded in Latin-1.  I guess Emacs assumes UTF-8.
>
> No, this has nothing to do with encoding.  This text is produced by
> Emacs itself … the internal representation of characters in Emacs
> buffers and strings

>> \200 .. 3FF_F7F    self-insert-command
>> \200 .. \377       self-insert-command
>
> Yes.  This is admittedly confusing, although 100% correct.

But. But. But. Less than 100% beautiful. The out-of-ASCII-range rows,
terminated by unprintables, would look and feel nicer shown as visually
balanced hex values in a box.

> To start
> digging into what happens here, go to each of the 2 \200's and type
> "C-u C-x =".  You will see that these two look identically on display,
> but are actually two very different beasts: the former is a Unicode
> character whose codepoint happens to be 200 octal (0x80 in hex), the
> latter is a raw byte of the same value.

They are born digital homonyms.

> Emacs distinguishes between
> them.  The confusing bit here is that they are by default both
> displayed identically,

"C-u C-x =" or M-x describe-char RET puts them in

  category: l:Latin
  category: L:Left-to-right (strong)

> for dull historical reasons (once upon a time,
> Emacs didn't distinguish between them).
> (Perhaps there's no longer a
> reason to use this confusing display nowadays.)

Wouldn't it be funny to pull on that string? Tied all the way at the
bottom is a boat anchor in the shape of a first-of-its-kind 1950s
Chinese electric computer keyboard, invented and made in the U.S.A.,
which was being considered as a gift to China by the Ike Admin.

> So the first of the above 2 lines stands for all the non-ASCII Unicode
> characters, all of which are bound to self-insert-command by default.
> By contrast, the second row shows all the raw bytes, which are also
> bound to self-insert-command by default.

> IOW, unlike the case with EWW showing incorrectly decoded text, here
> the issue is with how characters are _displayed_,

> And now to your question:
>
>> I know what to do for this kind of situation in EWW, type "E latin-1 RET".
>>
>> What goes here?
>
> Type
>
>   M-x customize-variable RET glyphless-char-display-control RET
>

Thank you.

Should I file a bug report for a copy-and-paste inconsistency? It shows
up when trying to collect the `M-x describe-char' output for the above
two characters in one buffer. Highlighting the region and then typing
M-w C-y fails, whereas the middle-mouse-button paste works.

Having done that, attempting to save the buffer presents the following
about the problematic characters, which makes sense given the above
explanation.

-- quote
These default coding systems were tried to encode text
in the buffer ‘x’:
  (utf-8 (845 . 4194176) (861 . 4194176) (1376 . 4194176))
However, each of them encountered characters it couldn’t encode:
  utf-8 cannot encode these: \200 \200 \200

Click on a character (or switch to this window by ‘C-x o’
and select the characters by RET) to jump to the place it appears,
where ‘C-u C-x =’ will give information about it.

Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
   to remove or modify the problematic characters,
or specify any other coding system (and risk losing
   the problematic characters).

  raw-text no-conversion
-- quote ends

-- 
© 2019 Van L
gpg using EEF2 37E9 3840 0D5D 9183 251E 9830 384E 9683 B835
"What's so strange when you know that you're a Wizard at 3?"
  -Joni Mitchell
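
Footnote on the quoted save error: the pairs such as (845 . 4194176) look like (buffer position . character code), and 4194176 is 0x3FFF80. That fits the explanation in the thread that the offending \200 is a raw byte, not the Unicode character U+0080. Below is a minimal Python sketch of the arithmetic, assuming Emacs keeps raw bytes 0x80..0xFF at character codes 0x3FFF80..0x3FFFFF. The function names are made up for illustration and are not Emacs APIs.

```python
# Assumption (not stated in the thread): Emacs stores raw bytes
# 0x80..0xFF as character codes 0x3FFF80..0x3FFFFF, i.e. byte + 0x3FFF00,
# well above the Unicode range, which ends at U+10FFFF.
RAW_BYTE_OFFSET = 0x3FFF00
RAW_BYTE_FIRST = 0x3FFF80
RAW_BYTE_LAST = 0x3FFFFF
MAX_UNICODE = 0x10FFFF


def is_raw_byte_char(code: int) -> bool:
    """True if the character code falls in the assumed raw-byte range."""
    return RAW_BYTE_FIRST <= code <= RAW_BYTE_LAST


def raw_byte_of(code: int) -> int:
    """Recover the 8-bit byte value from a raw-byte character code."""
    if not is_raw_byte_char(code):
        raise ValueError("not a raw-byte character code: %d" % code)
    return code - RAW_BYTE_OFFSET


def describe(code: int) -> str:
    """Say which of the two look-alike kinds of character a code is."""
    if is_raw_byte_char(code):
        byte = raw_byte_of(code)
        return "raw byte 0x%02X (octal \\%o)" % (byte, byte)
    if 0 <= code <= MAX_UNICODE:
        return "Unicode character U+%04X" % code
    return "non-Unicode character code 0x%X" % code


if __name__ == "__main__":
    # The code from the error message, then the Unicode look-alike.
    print(describe(4194176))
    print(describe(0x80))
```

Both values are shown as \200 in the buffer, but only the second has a UTF-8 encoding, which would explain why utf-8 is rejected and raw-text or no-conversion is offered instead.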