From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andreas Schwab
Newsgroups: gmane.emacs.bugs
Subject: bug#66760: 29.1; [BUG] GB18030 Incorrect Encoding
Date: Thu, 26 Oct 2023 16:20:59 +0200
References: <1015f5fcf69b9c0656d42932da193bd4@sics.ac.cn>
In-Reply-To: <1015f5fcf69b9c0656d42932da193bd4@sics.ac.cn> (Ruijie Yu's message of "Thu, 26 Oct 2023 19:43:54 +0800")
To: "Ruijie Yu"
Cc: 66760@debbugs.gnu.org
Mime-Version: 1.0
Content-Type: text/plain

On Oct 26 2023, Ruijie Yu wrote:

> I have noticed that in GB18030 encoding, certain ranges of characters
> have incorrect encodings.
>
> One example is U+217A (SMALL ROMAN NUMERAL ELEVEN). The expected
> encoding is 81 36 C5 30 (as can be seen from the GB18030 standard [1]
> and verified from other programs such as iconv and MySQL), whereas the
> observed encoding within Emacs is 81 36 C4 39, with a 1-codepoint
> offset.

This is a bug in the generation of GB180304.map.
The gb180304.awk script assumes that the 4-byte encodings of GB18030
fill the holes in the sequence of characters that have a 2-byte
encoding, in Unicode codepoint order, but in some places codepoints from
the Private Use Area (PUA) are inserted into the sequence.

For example, U+1E3E maps to 81 35 F4 36. The next codepoint not mapped
to a 2-byte code is U+1E40, but that maps to 81 35 F4 38, whereas
81 35 F4 37 is the encoding of U+E7C7. So the output gets out of sync.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
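
To make the off-by-one concrete, here is a minimal Python sketch (not part of the Emacs sources) that converts a 4-byte GB18030 sequence to its linear position and back. The function names are hypothetical. The byte ranges are the standard ones for 4-byte GB18030 codes: 81..FE for bytes 1 and 3, 30..39 for bytes 2 and 4. The byte values are the ones quoted in this thread; the "naive" step only illustrates the assumption described above and is not a transcription of the awk script.

```python
def four_byte_index(seq: bytes) -> int:
    """Linear position of a 4-byte GB18030 sequence (0 for 81 30 81 30)."""
    if len(seq) != 4:
        raise ValueError("expected exactly 4 bytes")
    b1, b2, b3, b4 = seq
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a valid 4-byte GB18030 sequence")
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


def index_to_four_bytes(index: int) -> bytes:
    """Inverse of four_byte_index."""
    if not 0 <= index < 126 * 10 * 126 * 10:
        raise ValueError("index out of range")
    index, d4 = divmod(index, 10)
    index, d3 = divmod(index, 126)
    d1, d2 = divmod(index, 10)
    return bytes([0x81 + d1, 0x30 + d2, 0x81 + d3, 0x30 + d4])


# Encodings quoted in the thread.
ENC_1E3E = bytes.fromhex("8135F436")      # U+1E3E
ENC_E7C7 = bytes.fromhex("8135F437")      # U+E7C7 (PUA), sits in between
ENC_1E40 = bytes.fromhex("8135F438")      # U+1E40
EXPECTED_217A = bytes.fromhex("8136C530")  # per the standard
OBSERVED_217A = bytes.fromhex("8136C439")  # what Emacs produced

# Naive assumption: the next unmapped codepoint (U+1E40) takes the very
# next 4-byte slot after U+1E3E. That slot actually belongs to U+E7C7.
naive_1E40 = index_to_four_bytes(four_byte_index(ENC_1E3E) + 1)

print(naive_1E40.hex(" "), "vs", ENC_1E40.hex(" "))
print(four_byte_index(EXPECTED_217A) - four_byte_index(OBSERVED_217A))
```

The three encodings around U+1E3E occupy consecutive slots, so skipping the PUA entry shifts every later codepoint down by one position, which matches the one-slot difference between the expected and observed encodings of U+217A.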