* bug#51954: 29.0.50; puny-encode doesn't normalize
From: Lars Ingebrigtsen @ 2021-11-18 17:06 UTC
To: 51954
I'm reading
https://www.unicode.org/reports/tr36/
which says that IDNA should normalise the strings before encoding (and
lowercase, too?). This seems to agree:
https://en.wikipedia.org/wiki/Punycode
But:
(puny-encode-string "Bä.com")
=> "xn--Ba.com-xyd"
(puny-encode-string (ucs-normalize-NFKC-string "Bä.com"))
=> "xn--B.com-gra"
So I think puny-encode-string should do that first, if I'm reading TR36
right.
In GNU Emacs 29.0.50 (build 17, x86_64-pc-linux-gnu, GTK+ Version 3.24.30, cairo version 1.16.0)
of 2021-11-18 built on xo
Repository revision: 7a1e5ac8b29b731e89cc9d5b498e31bd90840b9b
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Debian GNU/Linux bookworm/sid
Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 M17N_FLT MODULES NOTIFY
INOTIFY PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS
X11 XDBE XIM XPM GTK3 ZLIB
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
From: Eli Zaretskii @ 2021-11-18 18:40 UTC
To: Lars Ingebrigtsen; +Cc: 51954
> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Thu, 18 Nov 2021 18:06:47 +0100
>
> I'm reading
>
> https://www.unicode.org/reports/tr36/
>
> which says that IDNA should normalise the strings before encoding (and
> lowercase, too?)
Yes. See also http://www.unicode.org/reports/tr46/.
> (puny-encode-string (ucs-normalize-NFKC-string "Bä.com"))
> => "xn--B.com-gra"
NFKC or NFC?
> So I think puny-encode-string should do that first, if I'm reading TR36
> right.
Agreed.
From: Lars Ingebrigtsen @ 2021-11-19 6:45 UTC
To: Eli Zaretskii; +Cc: 51954
Eli Zaretskii <eliz@gnu.org> writes:
>> (puny-encode-string (ucs-normalize-NFKC-string "Bä.com"))
>> => "xn--B.com-gra"
>
> NFKC or NFC?
NFC. I've now expanded on the doc strings of these functions, removed
the ;;;###autoloads since they're not actually used, and added two new
string-glyph-* functions (pointing to the NFC functions) for greater
discoverability.
>> So I think puny-encode-string should do that first, if I'm reading TR36
>> right.
>
> Agreed.
Now done.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
From: Eli Zaretskii @ 2021-11-19 7:44 UTC
To: Lars Ingebrigtsen; +Cc: 51954
> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 51954@debbugs.gnu.org
> Date: Fri, 19 Nov 2021 07:45:36 +0100
>
> NFC. I've now expanded on the doc strings of these functions, removed
> the ;;;###autoloads since they're not actually used
Isn't ucs-normalize used for accessing files on macOS? Their
file-coding-system uses normalization.
In any case, I wouldn't remove the autoloads: they are harmless, but
removing them could cause breakage.
From: Lars Ingebrigtsen @ 2021-11-19 7:50 UTC
To: Eli Zaretskii; +Cc: 51954
Eli Zaretskii <eliz@gnu.org> writes:
> Isn't ucs-normalize used for accessing files on macOS? Their
> file-coding-system uses normalization.
I grepped through the code base but couldn't find any usage of those
functions. (But on macOS we preload ucs-normalize.)
> In any case, I wouldn't remove the autoloads: they are harmless, but
> removing them could cause breakage.
I found it confusing to have all these unused functions autoloaded, but
if there's actually any usage out there, I hope people will complain,
and we can put them back in.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no