From: Michael Welsh Duggan
Newsgroups: gmane.emacs.devel
Subject: Re: Encoding of etc/HELLO
Date: Sat, 21 Apr 2018 10:58:53 -0400
Message-ID: <87fu3owqqa.fsf@md5i.com>
In-Reply-To: <83k1t1xcjp.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 21 Apr 2018 10:07:38 +0300")
References: <83sh7qxb5j.fsf@gnu.org> <87po2t6gdm.fsf@gmx.de> <83muxxyijl.fsf@gnu.org> <83lgdhyeqv.fsf@gnu.org> <83k1t1xcjp.fsf@gnu.org>
To: emacs-devel@gnu.org

Eli Zaretskii writes:

>> From: Stefan Monnier
>> Cc: emacs-devel@gnu.org
>> Date: Fri, 20 Apr 2018 16:42:02 -0400
>>
>> > The whole point of ISO-2022 is that the same Unicode codepoints can
>> > come from different ISO-2022 charsets, and the ISO-2022 encoding keeps
>> > that information in the bytestream.
>>
>> My question was meant to see if there's a way to encode a similar kind
>> of charset info into the bytestream. From what you say above, there is
>> such a thing but its use is discouraged.
>
> If you mean a Unicode-compatible bytestream, then yes, that's the
> feature I know of.
> But if we want to use it in Emacs, we should
> modify the UTF-x decoders to put the charset properties on the decoded
> text, or invent a new property (since charset is currently 'unicode'),
> and then augment the font selection code to consider that new
> property.
>
>> Clearly this problem is not specific to Emacs, so what do people do?
>> Hold on to iso-2022 for as long as they can (like we do in Emacs)?
>> Give up on these "details" of rendering for files using a mix of C, J, and K?
>> Rely on higher-level info (XML tags and friends) to carry the charset info?
>
> I don't know. Several years ago, I think each vendor used a private
> extension of ISO-2022 to support the emoji, not sure if that is still
> the case, especially since the number of standardized emoji continues
> to grow all the time. We could perhaps follow one such extension in
> our support of ISO-2022. Or we could decide that the Han unification
> has conquered the world, and therefore the CJK charset distinction for
> font selection is no longer important enough for us, in which case we
> could recode HELLO in UTF-8.

I would suppose that the usual way to do this (encode glyph variants in
a Unicode-compatible bytestream) would be to use some form of document
markup. In Emacs's case, enriched-mode would seem an ideal candidate
for this. RFC-1896 specifically supports private extensions for
attributes using the "X-" syntax, and enriched.el is small and should be
simple to modify for this purpose.

-- 
Michael Welsh Duggan
(md5i@md5i.com)
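To make the markup idea above concrete, here is a minimal Python sketch (not enriched.el code) of how a reader could recover per-run charset information from a text/enriched body. It assumes a hypothetical private command `x-charset` whose `<param>` holds an Emacs charset name; neither that command name nor the sample charset values come from RFC 1896 or enriched.el. Only `<<` (the escape for a literal `<`), `<param>` and the hypothetical command are interpreted; other commands are skipped, and RFC 1896 newline folding is ignored.

```python
import re
from typing import List, Optional, Tuple

# "<<" is the text/enriched escape for a literal "<"; otherwise match a
# formatting command "<name>" or "</name>" (letters, digits and "-").
TOKEN = re.compile(r"<<|<(/?)([A-Za-z0-9-]+)>")

# Hypothetical private command carrying a charset name in its <param>.
CHARSET_CMD = "x-charset"

Run = Tuple[str, Optional[str]]


def charset_runs(doc: str) -> List[Run]:
    """Split a text/enriched body into (text, charset) runs.

    Only the hypothetical x-charset command is interpreted; other commands
    are dropped and their <param> contents discarded. Newline folding is
    not implemented in this sketch.
    """
    runs: List[Run] = []
    stack: List[Optional[str]] = [None]    # innermost charset on top
    param_buf: Optional[List[str]] = None  # not None while inside <param>
    awaiting_param = False                 # True right after <x-charset>

    def add_text(s: str) -> None:
        if not s:
            return
        if param_buf is not None:
            param_buf.append(s)            # parameter text is not displayed
        elif runs and runs[-1][1] == stack[-1]:
            runs[-1] = (runs[-1][0] + s, stack[-1])  # extend current run
        else:
            runs.append((s, stack[-1]))

    pos = 0
    for m in TOKEN.finditer(doc):
        add_text(doc[pos:m.start()])
        pos = m.end()
        if m.group(0) == "<<":
            add_text("<")
            continue
        closing, name = m.group(1) == "/", m.group(2).lower()
        if name == "param":
            if not closing:
                param_buf = []
            elif param_buf is not None:
                if awaiting_param:
                    # The parameter names the charset for this x-charset span.
                    stack[-1] = "".join(param_buf).strip()
                param_buf = None
                awaiting_param = False
        elif name == CHARSET_CMD:
            if closing:
                if len(stack) > 1:
                    stack.pop()
            else:
                # Inherit the enclosing charset until the <param> names one.
                stack.append(stack[-1])
                awaiting_param = True
        else:
            awaiting_param = False         # any other command is ignored
    add_text(doc[pos:])
    return runs
```

For example, `charset_runs("Hello <x-charset><param>japanese-jisx0208</param>直</x-charset>!")` returns `[("Hello ", None), ("直", "japanese-jisx0208"), ("!", None)]`. In Emacs the recovered name would presumably end up as a text property on the decoded text, along the lines Eli describes, for the font selection code to consult.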