From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#65996: 29.1; UCS normalization is wrong
Date: Sat, 16 Sep 2023 12:21:42 +0300
Message-ID: <83sf7eiic9.fsf@gnu.org>
To: awrhygty@outlook.com
Cc: 65996-done@debbugs.gnu.org

> From: awrhygty@outlook.com
> Date: Fri, 15 Sep 2023 21:49:38 +0900
>
>
> UCS normalization is wrong for some characters.
>
> (1) NFD/NFKD decomposition is not done
>   U+1112E 𑄮 CHAKMA VOWEL SIGN O
>   U+1112F 𑄯 CHAKMA VOWEL SIGN AU
>   U+1134B 𑍋 GRANTHA VOWEL SIGN OO
>   U+1134C 𑍌 GRANTHA VOWEL SIGN AU
>   U+114BB 𑒻 TIRHUTA VOWEL SIGN AI
>   U+114BC 𑒼 TIRHUTA VOWEL SIGN O
>   U+114BE 𑒾 TIRHUTA VOWEL SIGN AU
>   U+115BA 𑖺 SIDDHAM VOWEL SIGN O
>   U+115BB 𑖻 SIDDHAM VOWEL SIGN AU
>   U+11938 𑤸 DIVES AKURU VOWEL SIGN O
>
> (let ((s "\U0001112E\U0001112F\U0001134B\U0001134C\
> \U000114BB\U000114BC\U000114BE\U000115BA\U000115BB\U00011938"))
>   (require 'ucs-normalize)
>   (list (equal s (ucs-normalize-NFD-string s))
>         (equal s (ucs-normalize-NFKD-string s))))
> =>(t t)
>
> (2) NFKC/NFKD replacement is not done
>   U+1E030..U+1E06D Cyrillic MODIFIER LETTER or SUBSCRIPT
>   U+1EE00..U+1EEBB ARABIC MATHEMATICAL *
>   U+1FBF0..U+1FBF9 SEGMENTED DIGIT *
>
> (let* ((f (lambda (cell)
>             (apply #'string (number-sequence (car cell) (cdr cell)))))
>        (s (mapconcat f '((#x1E030 . #x1E06D)
>                          (#x1EE00 . #x1EEBB)
>                          (#x1FBF0 . #x1FBF9)))))
>   (require 'ucs-normalize)
>   (list (equal s (ucs-normalize-NFKC-string s))
>         (equal s (ucs-normalize-NFKD-string s))))
> =>(t t)

Thanks, fixed on the emacs-29 branch.

Once again, if (as I'm guessing) you found these problems by examining
the data in ucs-normalize.el, it would have greatly helped if you'd
pointed to the problematic data in your report.  Reverse-engineering
the sources of the problem from the behavior takes time, especially
when the relevant code is not trivial and was written by someone else.
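
For reports of this kind, a scan along the following lines could point
at the affected characters directly.  This is only a minimal sketch,
not the fix that went into emacs-29: it assumes that the
`decomposition' character property returned by
`get-char-code-property' is correct, and it flags characters that have
a decomposition yet come back unchanged from
`ucs-normalize-NFKD-string'.  The `my-ucs-' function names are
hypothetical and are not part of ucs-normalize.el.

```elisp
;; Sketch only: the `my-ucs-' helpers are hypothetical names, not part
;; of ucs-normalize.el.
(require 'ucs-normalize)

(defun my-ucs-decomposition (char)
  "Return (TAG . CHARS) for CHAR's Unicode decomposition, or nil.
TAG is nil for a canonical decomposition, or a symbol such as `font'
for a compatibility one.  Return nil if CHAR has no decomposition."
  (let* ((prop (get-char-code-property char 'decomposition))
         ;; A leading symbol is the compatibility formatting tag.
         (tag (and (symbolp (car prop)) (car prop)))
         (chars (if tag (cdr prop) prop)))
    ;; A character without a decomposition maps to itself.
    (and chars
         (not (equal chars (list char)))
         (cons tag chars))))

(defun my-ucs-unnormalized-p (char)
  "Return non-nil if CHAR has a decomposition that NFKD leaves untouched."
  (let ((s (string char)))
    (and (my-ucs-decomposition char)
         (equal s (ucs-normalize-NFKD-string s)))))

(defun my-ucs-scan (from to)
  "Return the characters in FROM..TO whose decomposition NFKD misses."
  (let (missed)
    (dotimes (i (1+ (- to from)))
      (let ((char (+ from i)))
        (when (my-ucs-unnormalized-p char)
          (push char missed))))
    (nreverse missed)))
```

Evaluating (my-ucs-scan #x11100 #x1FBFF) on an affected build is
expected to include the code points listed in the report; the exact
result depends on the Emacs version and the Unicode data it ships.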