* bug#65997: 29.1; ?\N{char_name} reference is wrong
From: awrhygty @ 2023-09-15 13:02 UTC
To: 65997
S-exps in the form of ?\N{char_name} return wrong values for some
characters.
The S-exp below inserts a whole list of such characters.
(dotimes (u (1+ (max-char 'ucs)))
  (let* ((name (get-char-code-property u 'name)))
    (when (and name (not (<= #xD800 u #xDFFF)))
      (let ((u2 (condition-case err
                    (read (format "?\\N{%s}" name))
                  (error 0))))
        (unless (eq u u2)
          (insert (format "%X\t%s\t%X\t%s\n" u name u2
                          (if (= 0 u2)
                              "error"
                            (get-char-code-property u2 'name)))))))))
output(TANGUT COMPONENTs are omitted):
21D LATIN SMALL LETTER YOGH 292 LATIN SMALL LETTER EZH
438 CYRILLIC SMALL LETTER I 456 CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
44D CYRILLIC SMALL LETTER E 454 CYRILLIC SMALL LETTER UKRAINIAN IE
3131 HANGUL LETTER KIYEOK 314B HANGUL LETTER KHIEUKH
3142 HANGUL LETTER PIEUP 314D HANGUL LETTER PHIEUPH
3148 HANGUL LETTER CIEUC 314A HANGUL LETTER CHIEUCH
3200 PARENTHESIZED HANGUL KIYEOK 320A PARENTHESIZED HANGUL KHIEUKH
3205 PARENTHESIZED HANGUL PIEUP 320C PARENTHESIZED HANGUL PHIEUPH
3208 PARENTHESIZED HANGUL CIEUC 3209 PARENTHESIZED HANGUL CHIEUCH
3260 CIRCLED HANGUL KIYEOK 326A CIRCLED HANGUL KHIEUKH
3265 CIRCLED HANGUL PIEUP 326C CIRCLED HANGUL PHIEUPH
3268 CIRCLED HANGUL CIEUC 3269 CIRCLED HANGUL CHIEUCH
FFA1 HALFWIDTH HANGUL LETTER KIYEOK FFBB HALFWIDTH HANGUL LETTER KHIEUKH
FFB2 HALFWIDTH HANGUL LETTER PIEUP FFBD HALFWIDTH HANGUL LETTER PHIEUPH
FFB8 HALFWIDTH HANGUL LETTER CIEUC FFBA HALFWIDTH HANGUL LETTER CHIEUCH
16FE4 KHITAN SMALL SCRIPT FILLER 0 error
16FF0 VIETNAMESE ALTERNATE READING MARK CA 0 error
16FF1 VIETNAMESE ALTERNATE READING MARK NHAY 0 error
18800 TANGUT COMPONENT-001 0 error
...
18AFF TANGUT COMPONENT-768 0 error
1B132 HIRAGANA LETTER SMALL KO 0 error
In GNU Emacs 29.1 (build 2, x86_64-w64-mingw32) of 2023-08-02 built on
AVALON
Windowing system distributor 'Microsoft Corp.', version 10.0.19045
System Description: Microsoft Windows 10 Pro (v10.0.2009.19045.3448)
Configured using:
'configure --with-modules --without-dbus --with-native-compilation=aot
--without-compress-install --with-tree-sitter CFLAGS=-O2'
Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG JSON LCMS2 LIBXML2 MODULES NATIVE_COMP
NOTIFY W32NOTIFY PDUMPER PNG RSVG SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP XPM ZLIB
(NATIVE_COMP present but libgccjit not available)
Important settings:
value of $LANG: JPN
locale-coding-system: cp932
Major mode: Lisp Interaction
Minor modes in effect:
highlight-changes-visible-mode: t
tooltip-mode: t
global-eldoc-mode: t
eldoc-mode: t
show-paren-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
line-number-mode: t
indent-tabs-mode: t
transient-mark-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
Load-path shadows:
None found.
Features:
(qp rect misearch multi-isearch comp comp-cstr warnings icons rx
emoji-labels emoji multisession sqlite transient format-spec edmacro
kmacro cl-extra gnutls network-stream nsm mailalias smtpmail textsec
uni-scripts url url-proxy url-privacy url-expand url-methods url-history
url-cookie generate-lisp-file url-domsuf url-util url-parse auth-source
cl-seq eieio eieio-core cl-macs json map url-vars idna-mapping
ucs-normalize uni-confusable textsec-check cl-print byte-opt gv bytecomp
byte-compile debug backtrace find-func hilit-chg wid-edit thingatpt
help-fns radix-tree help-mode pp shadow sort mail-extr emacsbug message
mailcap yank-media puny dired dired-loaddefs rfc822 mml mml-sec
password-cache epa derived epg rfc6068 epg-config gnus-util
text-property-search time-date subr-x mm-decode mm-bodies mm-encode
mail-parse rfc2231 mailabbrev gmm-utils mailheader cl-loaddefs cl-lib
sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
term/bobcat japan-util rmc iso-transl tooltip cconv eldoc paren electric
uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel dos-w32
ls-lisp disp-table term/w32-win w32-win w32-vars term/common-win
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock
font-lock syntax font-core term/tty-colors frame minibuffer nadvice seq
simple cl-generic indonesian philippine cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite emoji-zwj charscript charprop case-table
epa-hook jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button
loaddefs theme-loaddefs faces cus-face macroexp files window
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget keymap hashtable-print-readable backquote threads
w32notify w32 lcms2 multi-tty make-network-process native-compile emacs)
Memory information:
((conses 16 343615 63051)
(symbols 48 17876 4)
(strings 32 70082 16005)
(string-bytes 1 1428826)
(vectors 16 57019)
(vector-slots 8 1745292 147352)
(floats 8 69 384)
(intervals 56 10985 3149)
(buffers 984 19))
* bug#65997: 29.1; ?\N{char_name} reference is wrong
From: Robert Pluim @ 2023-09-15 15:33 UTC
To: awrhygty; +Cc: Stefan Monnier, 65997
>>>>> On Fri, 15 Sep 2023 22:02:37 +0900, awrhygty@outlook.com said:
awrhygty> S-exps in the form of ?\N{char_name} return wrong values for some
awrhygty> characters.
awrhygty> The S-exp below inserts a whole list of such characters.
awrhygty> (dotimes (u (1+ (max-char 'ucs)))
awrhygty> (let* ((name (get-char-code-property u 'name)))
awrhygty> (when (and name (not (<= #xD800 u #xDFFF)))
awrhygty> (let ((u2 (condition-case err
awrhygty> (read (format "?\\N{%s}" name))
awrhygty> (error 0))))
awrhygty> (unless (eq u u2)
awrhygty> (insert (format "%X\t%s\t%X\t%s\n" u name u2
awrhygty> (if (= 0 u2)
awrhygty> "error"
awrhygty> (get-char-code-property u2 'name)))))))))
For a minute there I thought our hash tables were broken :-). Stefan,
it only took 9 years, but this is no longer true:
lisp/international/mule-cmds.el:
;; In theory this code could end up pushing an "old-name" that
;; shadows a "new-name" but in practice every time an
;; `old-name' conflicts with a `new-name', the newer one has a
;; higher code, so it gets pushed later!
The patch below fixes that issue.
awrhygty> output(TANGUT COMPONENTs are omitted):
I donʼt know why the ranges in `ucs-names' donʼt cover these
code-points. Itʼs easy enough to change them, but theyʼre
explicitly commented out.
awrhygty> 16FE4 KHITAN SMALL SCRIPT FILLER 0 error
awrhygty> 16FF0 VIETNAMESE ALTERNATE READING MARK CA 0 error
awrhygty> 16FF1 VIETNAMESE ALTERNATE READING MARK NHAY 0 error
awrhygty> 1B132 HIRAGANA LETTER SMALL KO 0 error
And similarly for these 4.
Robert
--
diff --git a/lisp/international/mule-cmds.el b/lisp/international/mule-cmds.el
index c26898f7649..254ecae5bd5 100644
--- a/lisp/international/mule-cmds.el
+++ b/lisp/international/mule-cmds.el
@@ -3135,7 +3135,9 @@ ucs-names
                 ;; `old-name' conflicts with a `new-name', the newer one has a
                 ;; higher code, so it gets pushed later!
                 (if new-name (puthash new-name c names))
-                (if old-name (puthash old-name c names))
+                (when (and old-name
+                           (not (gethash old-name names)))
+                  (puthash old-name c names))
                 ;; Unicode uses the spelling "lamda" in character
                 ;; names, instead of "lambda", due to "preferences
                 ;; expressed by the Greek National Body" (Bug#30513).
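
The effect of the guard can be reproduced on a toy table. This is only a
minimal sketch, not the code in mule-cmds.el: `my-demo-chars' and
`my-demo-build' are hypothetical names, and the assumption that U+0292
carries "LATIN SMALL LETTER YOGH" as its old-name is inferred from the
first row of the reported output.

```elisp
;; Toy data: (CODE NEW-NAME OLD-NAME) in ascending code order.
;; Assumption: U+0292's old-name equals U+021D's current name,
;; as the first row of the reported output suggests.
(defvar my-demo-chars
  '((#x21D "LATIN SMALL LETTER YOGH" nil)
    (#x292 "LATIN SMALL LETTER EZH" "LATIN SMALL LETTER YOGH")))

(defun my-demo-build (guard)
  "Return a name-to-code hash table built from `my-demo-chars'.
If GUARD is non-nil, an old name never replaces an existing entry."
  (let ((names (make-hash-table :test #'equal)))
    (dolist (entry my-demo-chars)
      (let ((c (car entry))
            (new-name (nth 1 entry))
            (old-name (nth 2 entry)))
        (if new-name (puthash new-name c names))
        (when (and old-name
                   (or (not guard)
                       (not (gethash old-name names))))
          (puthash old-name c names))))
    names))

;; Without the guard, the old name of U+0292 shadows U+021D.
(format "%X" (gethash "LATIN SMALL LETTER YOGH" (my-demo-build nil)))
;; => "292"

;; With the guard, the current name keeps its own code point.
(format "%X" (gethash "LATIN SMALL LETTER YOGH" (my-demo-build t)))
;; => "21D"
```

The guard only helps when the character owning the current name is
visited before the one carrying the conflicting old name, which holds
for entries processed in ascending code order as in this sketch.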
* bug#65997: 29.1; ?\N{char_name} reference is wrong
From: Eli Zaretskii @ 2023-09-15 18:31 UTC
To: Robert Pluim; +Cc: 65997, monnier, awrhygty
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 65997@debbugs.gnu.org
> From: Robert Pluim <rpluim@gmail.com>
> Date: Fri, 15 Sep 2023 17:33:41 +0200
>
> For a minute there I thought our hash tables were broken :-). Stefan,
> it only took 9 years, but this is no longer true:
>
> lisp/international/mule-cmds.el:
>
> ;; In theory this code could end up pushing an "old-name" that
> ;; shadows a "new-name" but in practice every time an
> ;; `old-name' conflicts with a `new-name', the newer one has a
> ;; higher code, so it gets pushed later!
>
> The patch below fixes that issue.
Please install on the emacs-29 branch, and thanks.
> awrhygty> output(TANGUT COMPONENTs are omitted):
>
> I donʼt know why the ranges in `ucs-names' donʼt cover these
> code-points. Itʼs easy enough to change them, but theyʼre
> explicitly commented out.
They are omitted because their names make no sense, and would just
confuse users.
> awrhygty> 16FE4 KHITAN SMALL SCRIPT FILLER 0 error
> awrhygty> 16FF0 VIETNAMESE ALTERNATE READING MARK CA 0 error
> awrhygty> 16FF1 VIETNAMESE ALTERNATE READING MARK NHAY 0 error
> awrhygty> 1B132 HIRAGANA LETTER SMALL KO 0 error
>
> And similarly for these 4.
These 4 should probably be included. They were excluded because they
are in the ranges that were once unused.
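
Whether a given name is covered can be checked by looking it up in the
table directly. This is a rough sketch, assuming `ucs-names' returns a
hash table keyed by character name (as it does in recent Emacs
versions); `my-demo-lookup-names' is a hypothetical helper, and the
result depends on the Emacs version it is run in.

```elisp
(defun my-demo-lookup-names (char-names)
  "Return an alist of (NAME . CODE) for CHAR-NAMES, looked up via `ucs-names'.
CODE is nil when NAME is not in the table."
  (let ((table (ucs-names)))
    (mapcar (lambda (name) (cons name (gethash name table)))
            char-names)))

;; The four names from the report that signalled an error.
(my-demo-lookup-names
 '("KHITAN SMALL SCRIPT FILLER"
   "VIETNAMESE ALTERNATE READING MARK CA"
   "VIETNAMESE ALTERNATE READING MARK NHAY"
   "HIRAGANA LETTER SMALL KO"))
```

A nil in the cdr means the name falls outside the ranges scanned by
`ucs-names', which is the case the report shows as "error".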
* bug#65997: 29.1; ?\N{char_name} reference is wrong, Re: bug#65997: 29.1; ?\N{char_name} reference is wrong
From: Robert Pluim @ 2023-09-18 9:57 UTC
To: Eli Zaretskii; +Cc: 65997, Stefan Monnier, awrhygty
tags 65997 fixed
close 65997 29.2
quit
>>>>> On Fri, 15 Sep 2023 21:31:48 +0300, Eli Zaretskii <eliz@gnu.org> said:
Eli> Please install on the emacs-29 branch, and thanks.
awrhygty> output(TANGUT COMPONENTs are omitted):
>>
>> I donʼt know why the ranges in `ucs-names' donʼt cover these
>> code-points. Itʼs easy enough to change them, but theyʼre
>> explicitly commented out.
Eli> They are omitted because their names make no sense, and would just
Eli> confuse users.
OK.
awrhygty> 16FE4 KHITAN SMALL SCRIPT FILLER 0 error
awrhygty> 16FF0 VIETNAMESE ALTERNATE READING MARK CA 0 error
awrhygty> 16FF1 VIETNAMESE ALTERNATE READING MARK NHAY 0 error
awrhygty> 1B132 HIRAGANA LETTER SMALL KO 0 error
>>
>> And similarly for these 4.
Eli> These 4 should probably be included. They were excluded because they
Eli> are in the ranges that were once unused.
OK. Iʼll put a comment in admin/notes/unicode on master for the future.
>>>>> On Fri, 15 Sep 2023 14:57:36 -0400, Stefan Monnier <monnier@iro.umontreal.ca> said:
>> it only took 9 years, but this is no longer true:
>>
>> lisp/international/mule-cmds.el:
>>
>> ;; In theory this code could end up pushing an "old-name" that
>> ;; shadows a "new-name" but in practice every time an
>> ;; `old-name' conflicts with a `new-name', the newer one has a
>> ;; higher code, so it gets pushed later!
>>
>> The patch below fixes that issue.
Stefan> Please adjust the patch so it corrects the comment as well :-)
Done.
Closing.
Committed as 6bc3800000c
Robert
--
* bug#65997: 29.1; ?\N{char_name} reference is wrong, Re: bug#65997: 29.1; ?\N{char_name} reference is wrong
From: Eli Zaretskii @ 2023-09-18 11:24 UTC
To: Robert Pluim; +Cc: 65997, monnier, awrhygty
> From: Robert Pluim <rpluim@gmail.com>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, awrhygty@outlook.com,
> 65997@debbugs.gnu.org
> Date: Mon, 18 Sep 2023 11:57:11 +0200
>
> awrhygty> 16FE4 KHITAN SMALL SCRIPT FILLER 0 error
> awrhygty> 16FF0 VIETNAMESE ALTERNATE READING MARK CA 0 error
> awrhygty> 16FF1 VIETNAMESE ALTERNATE READING MARK NHAY 0 error
> awrhygty> 1B132 HIRAGANA LETTER SMALL KO 0 error
> >>
> >> And similarly for these 4.
>
> Eli> These 4 should probably be included. They were excluded because they
> Eli> are in the ranges that were once unused.
>
> OK. Iʼll put a comment in admin/notes/unicode on master for the future.
I already did that (since I had to do this just the other day ;-).
* bug#65997: 29.1; ?\N{char_name} reference is wrong
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-09-15 18:57 UTC
To: Robert Pluim; +Cc: 65997, awrhygty
> it only took 9 years, but this is no longer true:
>
> lisp/international/mule-cmds.el:
>
> ;; In theory this code could end up pushing an "old-name" that
> ;; shadows a "new-name" but in practice every time an
> ;; `old-name' conflicts with a `new-name', the newer one has a
> ;; higher code, so it gets pushed later!
>
> The patch below fixes that issue.
Please adjust the patch so it corrects the comment as well :-)
Stefan
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git