From mboxrd@z Thu Jan 1 00:00:00 1970
From: Robert Pluim
Newsgroups: gmane.emacs.devel
Subject: Re: ucs-normalize and diacritics
Date: Thu, 26 Jul 2018 10:40:45 +0200
Message-ID: <877eli5r02.fsf@gmail.com>
References: <8736w88pnn.fsf@gmail.com> <83lga0v4ff.fsf@gnu.org>
 <83in54v3sp.fsf@gnu.org> <87y3e07425.fsf@gmail.com>
 <83h8koujs9.fsf@gnu.org> <87pnzb7ogd.fsf@gmail.com>
 <874lgn74sw.fsf@gmail.com> <83va93tlb4.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Cc: emacs-devel@gnu.org
To: Eli Zaretskii
Mail-Followup-To:
 emacs-devel@gnu.org
Mail-Copies-To: never
In-Reply-To: <83va93tlb4.fsf@gnu.org> (Eli Zaretskii's message of "Wed, 25 Jul 2018 17:57:35 +0300")
List-Id: "Emacs development discussions."

Eli Zaretskii writes:

>> From: Robert Pluim
>> Date: Wed, 25 Jul 2018 16:45:03 +0200
>>
>> As a special case, if the character lies in the range 128 (0200
>> octal) through 159 (0237 octal), it stands for a raw byte that does not
>> correspond to any specific displayable character. Such a character lies
>> within the eight-bit-control character set, and is displayed as an
>> escaped octal character code. In this case, C-x = shows part of
>> display ... instead of file.
>
> This text is obsolete and inaccurate, it should be replaced/rewritten.
>

How about something like:

   As a special case, if the character lies in the range #x3fff80
   through #x3fff9f (128 through 159 decimal, with prefix #x3fff), it
   stands for a raw byte that does not correspond to any specific
   displayable character. Such a character lies within the
   @code{eight-bit-control} character set, and is displayed as an
   escaped octal character code (0200 through 0237), or as an escaped
   hex character code (x80 through x9f) if
   @code{display-raw-bytes-as-hex} is non-@code{nil}.

Iʼm not sure the 'eight-bit-control' part is true, given the reference
to 'tis620-2533' in the what-cursor-position output.

>> emacs -Q
>> C-x C-f /tmp/bin.txt
>> C-x 8 RET 80
>> C-b
>> C-x =
>>
>> which gives
>>
>> Char: \200 (128, #o200, #x80, file ...) point=1 of 1 (0%) column=0
>
> Try
>
>   C-x 8 RET 3fff80 RET

Yes, that's better. So C-x 8 RET 80 results in Emacs writing 2 bytes to
disk, but 3fff80 results in only one. The joys of multibyte :-)

Robert
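
For readers following the arithmetic, here is a rough Python sketch of the mapping discussed above. It is not Emacs code: RAW_BYTE_OFFSET, raw_byte_char_code and escapes are hypothetical names made up for illustration, and the two-byte result assumes the file is saved as UTF-8. It only shows that raw bytes 0x80..0xFF sit at character codes 0x3FFF80..0x3FFFFF, how the octal and hex escapes for the 128..159 range come out, and why U+0080 takes two bytes on disk while the raw byte takes one.

```python
# Illustrative sketch only; these names are hypothetical, not Emacs APIs.
RAW_BYTE_OFFSET = 0x3FFF00  # raw byte 0x80..0xFF <-> char code 0x3FFF80..0x3FFFFF


def raw_byte_char_code(byte):
    """Return the Emacs character code that stands for raw byte `byte`."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("raw-byte characters cover bytes 0x80..0xFF only")
    return RAW_BYTE_OFFSET + byte


def escapes(byte):
    """Return the octal and hex escape spellings of a raw byte."""
    return "\\{:o}".format(byte), "\\x{:x}".format(byte)


# C-x 8 RET 80 inserts the character U+0080.
# Assumption: the buffer is saved as UTF-8, which needs two bytes for it.
unicode_on_disk = "\u0080".encode("utf-8")

# C-x 8 RET 3fff80 inserts the raw byte 0x80, saved as that single byte.
raw_on_disk = bytes([raw_byte_char_code(0x80) - RAW_BYTE_OFFSET])

print(hex(raw_byte_char_code(0x80)), hex(raw_byte_char_code(0x9F)))
print(escapes(0x80), escapes(0x9F))
print(unicode_on_disk, len(unicode_on_disk))
print(raw_on_disk, len(raw_on_disk))
```

The upper end of the 128..159 range is 0x9F (0237 octal), which is why the sketch uses 0x9F for the last code.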