unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#51954: 29.0.50; puny-encode doesn't normalize
From: Lars Ingebrigtsen @ 2021-11-18 17:06 UTC
  To: 51954


I'm reading

https://www.unicode.org/reports/tr36/

which says that IDNA should normalise the strings before encoding (and
lowercase, too?)  This seems to agree:

https://en.wikipedia.org/wiki/Punycode

But:

(puny-encode-string "Bä.com")
=> "xn--Ba.com-xyd"

(puny-encode-string (ucs-normalize-NFKC-string "Bä.com"))
=> "xn--B.com-gra"

So I think puny-encode-string should do that first, if I'm reading TR36
right.
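
A minimal sketch of the proposal, assuming the intended order is: compose to NFC, lowercase, then Punycode-encode. The helper name `my-puny-encode-normalized` is hypothetical and not part of puny.el; NFC is used here because that is the form settled on later in this thread.

```elisp
(require 'puny)
(require 'ucs-normalize)

(defun my-puny-encode-normalized (string)
  "Return the Punycode form of STRING after NFC-normalizing and downcasing it.
Hypothetical helper for illustration; not part of puny.el."
  (puny-encode-string
   (downcase (ucs-normalize-NFC-string string))))

;; "a" + U+0308 (combining diaeresis) and precomposed U+00E4 look the
;; same to a reader, so they should encode identically.
(equal (my-puny-encode-normalized "Ba\u0308.com")
       (my-puny-encode-normalized "B\u00e4.com"))
;; => t
```

Without the normalization step, the two spellings can produce different "xn--" strings for what looks like the same name.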


In GNU Emacs 29.0.50 (build 17, x86_64-pc-linux-gnu, GTK+ Version 3.24.30, cairo version 1.16.0)
 of 2021-11-18 built on xo
Repository revision: 7a1e5ac8b29b731e89cc9d5b498e31bd90840b9b
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Debian GNU/Linux bookworm/sid

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 M17N_FLT MODULES NOTIFY
INOTIFY PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS
X11 XDBE XIM XPM GTK3 ZLIB

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no







* bug#51954: 29.0.50; puny-encode doesn't normalize
From: Eli Zaretskii @ 2021-11-18 18:40 UTC
  To: Lars Ingebrigtsen; +Cc: 51954

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Thu, 18 Nov 2021 18:06:47 +0100
> 
> I'm reading
> 
> https://www.unicode.org/reports/tr36/
> 
> which says that IDNA should normalise the strings before encoding (and
> lowercase, too?)

Yes.  See also http://www.unicode.org/reports/tr46/.

> (puny-encode-string (ucs-normalize-NFKC-string "Bä.com"))
> => "xn--B.com-gra"

NFKC or NFC?

> So I think puny-encode-string should do that first, if I'm reading TR36
> right.

Agreed.






* bug#51954: 29.0.50; puny-encode doesn't normalize
From: Lars Ingebrigtsen @ 2021-11-19  6:45 UTC
  To: Eli Zaretskii; +Cc: 51954

Eli Zaretskii <eliz@gnu.org> writes:

>> (puny-encode-string (ucs-normalize-NFKC-string "Bä.com"))
>> => "xn--B.com-gra"
>
> NFKC or NFC?

NFC.  I've now expanded on the doc strings of these functions, removed
the ;;;###autoloads since they're not actually used, and added two new
string-glyph-* functions (pointing to the NFC functions) for greater
discoverability.
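
For readers unsure what the NFC/NFKC distinction amounts to, here is a small sketch using the existing ucs-normalize functions. The sample strings are chosen for illustration and do not come from the bug report.

```elisp
(require 'ucs-normalize)

;; Both forms compose "a" + U+0308 into the single character U+00E4.
(string= (ucs-normalize-NFC-string "a\u0308") "\u00e4")   ; => t
(string= (ucs-normalize-NFKC-string "a\u0308") "\u00e4")  ; => t

;; Only NFKC also replaces compatibility characters: the ligature
;; U+FB01 becomes the two ASCII letters "fi".
(string= (ucs-normalize-NFC-string "\ufb01") "\ufb01")    ; => t
(string= (ucs-normalize-NFKC-string "\ufb01") "fi")       ; => t
```

So for plain accented letters the two forms agree, and they diverge only on compatibility characters such as ligatures.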

>> So I think puny-encode-string should do that first, if I'm reading TR36
>> right.
>
> Agreed.

Now done.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#51954: 29.0.50; puny-encode doesn't normalize
From: Eli Zaretskii @ 2021-11-19  7:44 UTC
  To: Lars Ingebrigtsen; +Cc: 51954

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 51954@debbugs.gnu.org
> Date: Fri, 19 Nov 2021 07:45:36 +0100
> 
> NFC.  I've now expanded on the doc strings of these functions, removed
> the ;;;###autoloads since they're not actually used

Isn't ucs-normalize used for accessing files on macOS?  Their
file-coding-system uses normalization.

In any case, I wouldn't remove the autoloads: they are harmless, but
removing them could cause breakage.






* bug#51954: 29.0.50; puny-encode doesn't normalize
From: Lars Ingebrigtsen @ 2021-11-19  7:50 UTC
  To: Eli Zaretskii; +Cc: 51954

Eli Zaretskii <eliz@gnu.org> writes:

> Isn't ucs-normalize used for accessing files on macOS?  Their
> file-coding-system uses normalization.

I grepped through the code base but couldn't find any usage of those
functions.  (But on macOS we preload ucs-normalize.)

> In any case, I wouldn't remove the autoloads: they are harmless, but
> removing them could cause breakage.

I found it confusing to have all these unused functions autoloaded, but
if there's actually any usage out there, I hope people will complain,
and we can put them back in.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




