unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#65997: 29.1; ?\N{char_name} reference is wrong
@ 2023-09-15 13:02 awrhygty
  2023-09-15 15:33 ` Robert Pluim
  0 siblings, 1 reply; 6+ messages in thread
From: awrhygty @ 2023-09-15 13:02 UTC (permalink / raw)
  To: 65997


S-expressions of the form ?\N{char_name} read as the wrong character
for some character names.
Evaluating the form below inserts a list of all such characters.

(dotimes (u (1+ (max-char 'ucs)))
  (let* ((name (get-char-code-property u 'name)))
    (when (and name (not (<= #xD800 u #xDFFF)))
      (let ((u2 (condition-case err
                    (read (format "?\\N{%s}" name))
                  (error 0))))
        (unless (eq u u2)
          (insert (format "%X\t%s\t%X\t%s\n" u name u2
                          (if (= 0 u2)
                              "error"
                            (get-char-code-property u2 'name)))))))))

Output (most TANGUT COMPONENT lines are omitted):

21D	LATIN SMALL LETTER YOGH	292	LATIN SMALL LETTER EZH
438	CYRILLIC SMALL LETTER I	456	CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
44D	CYRILLIC SMALL LETTER E	454	CYRILLIC SMALL LETTER UKRAINIAN IE
3131	HANGUL LETTER KIYEOK	314B	HANGUL LETTER KHIEUKH
3142	HANGUL LETTER PIEUP	314D	HANGUL LETTER PHIEUPH
3148	HANGUL LETTER CIEUC	314A	HANGUL LETTER CHIEUCH
3200	PARENTHESIZED HANGUL KIYEOK	320A	PARENTHESIZED HANGUL KHIEUKH
3205	PARENTHESIZED HANGUL PIEUP	320C	PARENTHESIZED HANGUL PHIEUPH
3208	PARENTHESIZED HANGUL CIEUC	3209	PARENTHESIZED HANGUL CHIEUCH
3260	CIRCLED HANGUL KIYEOK	326A	CIRCLED HANGUL KHIEUKH
3265	CIRCLED HANGUL PIEUP	326C	CIRCLED HANGUL PHIEUPH
3268	CIRCLED HANGUL CIEUC	3269	CIRCLED HANGUL CHIEUCH
FFA1	HALFWIDTH HANGUL LETTER KIYEOK	FFBB	HALFWIDTH HANGUL LETTER KHIEUKH
FFB2	HALFWIDTH HANGUL LETTER PIEUP	FFBD	HALFWIDTH HANGUL LETTER PHIEUPH
FFB8	HALFWIDTH HANGUL LETTER CIEUC	FFBA	HALFWIDTH HANGUL LETTER CHIEUCH
16FE4	KHITAN SMALL SCRIPT FILLER	0	error
16FF0	VIETNAMESE ALTERNATE READING MARK CA	0	error
16FF1	VIETNAMESE ALTERNATE READING MARK NHAY	0	error
18800	TANGUT COMPONENT-001	0	error
...
18AFF	TANGUT COMPONENT-768	0	error
1B132	HIRAGANA LETTER SMALL KO	0	error
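
To check a single character rather than the whole range, the same round trip can be wrapped in a small helper. This is only a sketch: `demo-read-char-name' is a hypothetical name, not part of Emacs, and what it returns for the characters listed above depends on the Emacs version and its Unicode data.

```elisp
(defun demo-read-char-name (code)
  "Return (CODE NAME READ-BACK) for the character CODE.
NAME is the `name' property of CODE.  READ-BACK is the value the
reader gives for the character-name escape built from NAME, or nil
if CODE has no name or the reader signals an error."
  (let* ((name (get-char-code-property code 'name))
         (read-back (and name
                         (condition-case nil
                             (read (format "?\\N{%s}" name))
                           (error nil)))))
    (list code name read-back)))

;; On an affected build the third element differs from the first.
(demo-read-char-name #x21D)
```

A mismatch between the first and third elements corresponds to one line of the output above; nil in the third position corresponds to an "error" line.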


In GNU Emacs 29.1 (build 2, x86_64-w64-mingw32) of 2023-08-02 built on
 AVALON
Windowing system distributor 'Microsoft Corp.', version 10.0.19045
System Description: Microsoft Windows 10 Pro (v10.0.2009.19045.3448)

Configured using:
 'configure --with-modules --without-dbus --with-native-compilation=aot
 --without-compress-install --with-tree-sitter CFLAGS=-O2'

Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG JSON LCMS2 LIBXML2 MODULES NATIVE_COMP
NOTIFY W32NOTIFY PDUMPER PNG RSVG SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP XPM ZLIB

(NATIVE_COMP present but libgccjit not available)

Important settings:
  value of $LANG: JPN
  locale-coding-system: cp932

Major mode: Lisp Interaction

Minor modes in effect:
  highlight-changes-visible-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
None found.

Features:
(qp rect misearch multi-isearch comp comp-cstr warnings icons rx
emoji-labels emoji multisession sqlite transient format-spec edmacro
kmacro cl-extra gnutls network-stream nsm mailalias smtpmail textsec
uni-scripts url url-proxy url-privacy url-expand url-methods url-history
url-cookie generate-lisp-file url-domsuf url-util url-parse auth-source
cl-seq eieio eieio-core cl-macs json map url-vars idna-mapping
ucs-normalize uni-confusable textsec-check cl-print byte-opt gv bytecomp
byte-compile debug backtrace find-func hilit-chg wid-edit thingatpt
help-fns radix-tree help-mode pp shadow sort mail-extr emacsbug message
mailcap yank-media puny dired dired-loaddefs rfc822 mml mml-sec
password-cache epa derived epg rfc6068 epg-config gnus-util
text-property-search time-date subr-x mm-decode mm-bodies mm-encode
mail-parse rfc2231 mailabbrev gmm-utils mailheader cl-loaddefs cl-lib
sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
term/bobcat japan-util rmc iso-transl tooltip cconv eldoc paren electric
uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel dos-w32
ls-lisp disp-table term/w32-win w32-win w32-vars term/common-win
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock
font-lock syntax font-core term/tty-colors frame minibuffer nadvice seq
simple cl-generic indonesian philippine cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite emoji-zwj charscript charprop case-table
epa-hook jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button
loaddefs theme-loaddefs faces cus-face macroexp files window
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget keymap hashtable-print-readable backquote threads
w32notify w32 lcms2 multi-tty make-network-process native-compile emacs)

Memory information:
((conses 16 343615 63051)
 (symbols 48 17876 4)
 (strings 32 70082 16005)
 (string-bytes 1 1428826)
 (vectors 16 57019)
 (vector-slots 8 1745292 147352)
 (floats 8 69 384)
 (intervals 56 10985 3149)
 (buffers 984 19))






* bug#65997: 29.1; ?\N{char_name} reference is wrong
  2023-09-15 13:02 bug#65997: 29.1; ?\N{char_name} reference is wrong awrhygty
@ 2023-09-15 15:33 ` Robert Pluim
  2023-09-15 18:31   ` Eli Zaretskii
  2023-09-15 18:57   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 6+ messages in thread
From: Robert Pluim @ 2023-09-15 15:33 UTC (permalink / raw)
  To: awrhygty; +Cc: Stefan Monnier, 65997

>>>>> On Fri, 15 Sep 2023 22:02:37 +0900, awrhygty@outlook.com said:

    awrhygty> S-exps in the form of ?\N{char_name} return wrong values for some
    awrhygty> characters.
    awrhygty> The S-exp below inserts a whole list of such characters.

    awrhygty> (dotimes (u (1+ (max-char 'ucs)))
    awrhygty>   (let* ((name (get-char-code-property u 'name)))
    awrhygty>     (when (and name (not (<= #xD800 u #xDFFF)))
    awrhygty>       (let ((u2 (condition-case err
    awrhygty>                     (read (format "?\\N{%s}" name))
    awrhygty>                   (error 0))))
    awrhygty>         (unless (eq u u2)
    awrhygty>           (insert (format "%X\t%s\t%X\t%s\n" u name u2
    awrhygty>                           (if (= 0 u2)
    awrhygty>                               "error"
    awrhygty>                             (get-char-code-property u2 'name)))))))))

For a minute there I thought our hash tables were broken :-). Stefan,
it only took 9 years, but this is no longer true:

lisp/international/mule-cmds.el:

	        ;; In theory this code could end up pushing an "old-name" that
	        ;; shadows a "new-name" but in practice every time an
	        ;; `old-name' conflicts with a `new-name', the newer one has a
	        ;; higher code, so it gets pushed later!

The patch below fixes that issue.

    awrhygty> output(TANGUT COMPONENTs are omitted):

I donʼt know why the ranges in `ucs-names' donʼt cover these
code-points. Itʼs easy enough to change them, but theyʼre
explicitly commented out.

    awrhygty> 16FE4	KHITAN SMALL SCRIPT FILLER	0	error
    awrhygty> 16FF0	VIETNAMESE ALTERNATE READING MARK CA	0	error
    awrhygty> 16FF1	VIETNAMESE ALTERNATE READING MARK NHAY	0	error
    awrhygty> 1B132	HIRAGANA LETTER SMALL KO	0	error

And similarly for these 4.

Robert
-- 

diff --git a/lisp/international/mule-cmds.el b/lisp/international/mule-cmds.el
index c26898f7649..254ecae5bd5 100644
--- a/lisp/international/mule-cmds.el
+++ b/lisp/international/mule-cmds.el
@@ -3135,7 +3135,9 @@ ucs-names
 	        ;; `old-name' conflicts with a `new-name', the newer one has a
 	        ;; higher code, so it gets pushed later!
 	        (if new-name (puthash new-name c names))
-	        (if old-name (puthash old-name c names))
+                (when (and old-name
+                           (not (gethash old-name names)))
+                  (puthash old-name c names))
                 ;; Unicode uses the spelling "lamda" in character
                 ;; names, instead of "lambda", due to "preferences
                 ;; expressed by the Greek National Body" (Bug#30513).
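
The shadowing that the patch guards against can be reproduced with a toy table. The following is a minimal sketch, not the code in mule-cmds.el: `demo-build-names' and `demo-entries' are hypothetical names, and the old name given for U+0292 is an assumption inferred from the first line of the reported output.

```elisp
(defun demo-build-names (entries guard)
  "Build a name-to-code hash table from ENTRIES.
ENTRIES is a list of (CODE NEW-NAME OLD-NAME) in ascending code order.
If GUARD is non-nil, an old name never replaces an existing entry,
which mirrors the check added by the patch."
  (let ((names (make-hash-table :test #'equal)))
    (dolist (entry entries names)
      (let ((c (nth 0 entry))
            (new-name (nth 1 entry))
            (old-name (nth 2 entry)))
        (when new-name (puthash new-name c names))
        (when (and old-name
                   (or (not guard)
                       (not (gethash old-name names))))
          (puthash old-name c names))))))

;; Assumed data: U+0292's old name equals U+021D's current name.
(defvar demo-entries
  '((#x21D "LATIN SMALL LETTER YOGH" nil)
    (#x292 "LATIN SMALL LETTER EZH" "LATIN SMALL LETTER YOGH")))

;; Without the guard, the old name of U+0292 overwrites U+021D.
(gethash "LATIN SMALL LETTER YOGH" (demo-build-names demo-entries nil)) ; => #x292
;; With the guard, the earlier new-name entry is kept.
(gethash "LATIN SMALL LETTER YOGH" (demo-build-names demo-entries t))   ; => #x21D
```

Here the conflicting old name belongs to the higher code point, so it is pushed after the new name it collides with; that is exactly the case the quoted comment assumed could not happen.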






* bug#65997: 29.1; ?\N{char_name} reference is wrong
  2023-09-15 15:33 ` Robert Pluim
@ 2023-09-15 18:31   ` Eli Zaretskii
  2023-09-18  9:57     ` bug#65997: 29.1; ?\N{char_name} reference is wrong, " Robert Pluim
  2023-09-15 18:57   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 6+ messages in thread
From: Eli Zaretskii @ 2023-09-15 18:31 UTC (permalink / raw)
  To: Robert Pluim; +Cc: 65997, monnier, awrhygty

> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 65997@debbugs.gnu.org
> From: Robert Pluim <rpluim@gmail.com>
> Date: Fri, 15 Sep 2023 17:33:41 +0200
> 
> For a minute there I thought our hash tables were broken :-). Stefan,
> it only took 9 years, but this is no longer true:
> 
> lisp/international/mule-cmds.el:
> 
> 	        ;; In theory this code could end up pushing an "old-name" that
> 	        ;; shadows a "new-name" but in practice every time an
> 	        ;; `old-name' conflicts with a `new-name', the newer one has a
> 	        ;; higher code, so it gets pushed later!
> 
> The patch below fixes that issue.

Please install on the emacs-29 branch, and thanks.

>     awrhygty> output(TANGUT COMPONENTs are omitted):
> 
> I donʼt know why the ranges in `ucs-names' donʼt cover these
> code-points. Itʼs easy enough to change them, but theyʼre
> explicitly commented out.

They are omitted because their names make no sense, and would just
confuse users.

>     awrhygty> 16FE4	KHITAN SMALL SCRIPT FILLER	0	error
>     awrhygty> 16FF0	VIETNAMESE ALTERNATE READING MARK CA	0	error
>     awrhygty> 16FF1	VIETNAMESE ALTERNATE READING MARK NHAY	0	error
>     awrhygty> 1B132	HIRAGANA LETTER SMALL KO	0	error
> 
> And similarly for these 4.

These 4 should probably be included.  They were excluded because they
are in the ranges that were once unused.
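
Whether a particular name is present in the table can be checked by looking it up directly. A sketch, where `demo-name-in-ucs-names' is a hypothetical helper and the result for the names above depends on the Emacs version and its Unicode data:

```elisp
(defun demo-name-in-ucs-names (name)
  "Return the code point recorded for NAME by `ucs-names', or nil.
NAME must be spelled exactly as the table stores it (upper case)."
  (gethash name (ucs-names)))

;; nil here means the name is not covered by the ranges `ucs-names' scans.
(demo-name-in-ucs-names "HIRAGANA LETTER SMALL KO")
```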






* bug#65997: 29.1; ?\N{char_name} reference is wrong
  2023-09-15 15:33 ` Robert Pluim
  2023-09-15 18:31   ` Eli Zaretskii
@ 2023-09-15 18:57   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 6+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-09-15 18:57 UTC (permalink / raw)
  To: Robert Pluim; +Cc: 65997, awrhygty

> it only took 9 years, but this is no longer true:
>
> lisp/international/mule-cmds.el:
>
> 	        ;; In theory this code could end up pushing an "old-name" that
> 	        ;; shadows a "new-name" but in practice every time an
> 	        ;; `old-name' conflicts with a `new-name', the newer one has a
> 	        ;; higher code, so it gets pushed later!
>
> The patch below fixes that issue.

Please adjust the patch so it corrects the comment as well :-)


        Stefan







* bug#65997: 29.1; ?\N{char_name} reference is wrong
  2023-09-15 18:31   ` Eli Zaretskii
@ 2023-09-18  9:57     ` Robert Pluim
  2023-09-18 11:24       ` Eli Zaretskii
  0 siblings, 1 reply; 6+ messages in thread
From: Robert Pluim @ 2023-09-18  9:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 65997, Stefan Monnier, awrhygty

tags 65997 fixed
close 65997 29.2
quit

>>>>> On Fri, 15 Sep 2023 21:31:48 +0300, Eli Zaretskii <eliz@gnu.org> said:

    Eli> Please install on the emacs-29 branch, and thanks.

    awrhygty> output(TANGUT COMPONENTs are omitted):
    >> 
    >> I donʼt know why the ranges in `ucs-names' donʼt cover these
    >> code-points. Itʼs easy enough to change them, but theyʼre
    >> explicitly commented out.

    Eli> They are omitted because their names make no sense, and would just
    Eli> confuse users.

OK.

    awrhygty> 16FE4	KHITAN SMALL SCRIPT FILLER	0	error
    awrhygty> 16FF0	VIETNAMESE ALTERNATE READING MARK CA	0	error
    awrhygty> 16FF1	VIETNAMESE ALTERNATE READING MARK NHAY	0	error
    awrhygty> 1B132	HIRAGANA LETTER SMALL KO	0	error
    >> 
    >> And similarly for these 4.

    Eli> These 4 should probably be included.  They were excluded because they
    Eli> are in the ranges that were once unused.

OK. Iʼll put a comment in admin/notes/unicode on master for the future.

>>>>> On Fri, 15 Sep 2023 14:57:36 -0400, Stefan Monnier <monnier@iro.umontreal.ca> said:

    >> it only took 9 years, but this is no longer true:
    >> 
    >> lisp/international/mule-cmds.el:
    >> 
    >> ;; In theory this code could end up pushing an "old-name" that
    >> ;; shadows a "new-name" but in practice every time an
    >> ;; `old-name' conflicts with a `new-name', the newer one has a
    >> ;; higher code, so it gets pushed later!
    >> 
    >> The patch below fixes that issue.

    Stefan> Please adjust the patch so it corrects the comment as well :-)

Done.

Closing.
Committed as 6bc3800000c

Robert
-- 






* bug#65997: 29.1; ?\N{char_name} reference is wrong
  2023-09-18  9:57     ` bug#65997: 29.1; ?\N{char_name} reference is wrong, " Robert Pluim
@ 2023-09-18 11:24       ` Eli Zaretskii
  0 siblings, 0 replies; 6+ messages in thread
From: Eli Zaretskii @ 2023-09-18 11:24 UTC (permalink / raw)
  To: Robert Pluim; +Cc: 65997, monnier, awrhygty

> From: Robert Pluim <rpluim@gmail.com>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,  awrhygty@outlook.com,
>   65997@debbugs.gnu.org
> Date: Mon, 18 Sep 2023 11:57:11 +0200
> 
>     awrhygty> 16FE4	KHITAN SMALL SCRIPT FILLER	0	error
>     awrhygty> 16FF0	VIETNAMESE ALTERNATE READING MARK CA	0	error
>     awrhygty> 16FF1	VIETNAMESE ALTERNATE READING MARK NHAY	0	error
>     awrhygty> 1B132	HIRAGANA LETTER SMALL KO	0	error
>     >> 
>     >> And similarly for these 4.
> 
>     Eli> These 4 should probably be included.  They were excluded because they
>     Eli> are in the ranges that were once unused.
> 
> OK. Iʼll put a comment in admin/notes/unicode on master for the future.

I already did that (since I had to do this just the other day ;-).







Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
