From: Eli Zaretskii <eliz@gnu.org>
To: Lars Ingebrigtsen <larsi@gnus.org>
Cc: 51733@debbugs.gnu.org, jidanni@jidanni.org
Subject: bug#51733: 27.1; Detect impossible email addresses better
Date: Wed, 19 Jan 2022 18:58:54 +0200
Message-ID: <837davslv5.fsf@gnu.org>
In-Reply-To: <87k0ev91ba.fsf@gnus.org> (message from Lars Ingebrigtsen on Wed, 19 Jan 2022 16:45:29 +0100)
> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 51733@debbugs.gnu.org, jidanni@jidanni.org
> Date: Wed, 19 Jan 2022 16:45:29 +0100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > OK, but why do you think "Сгсе.ru" is confusable? The SLD part is
> > entirely made of single-script characters, and UTS#39 explicitly
> > allows that:
> >
> > [...] it can be perfectly legitimate to have scripts in a SLD
> > (second level domain) not be the same as scripts in a TLD (top-level
> > domain), such as:
> >
> > Cyrillic labels in a domain name with a TLD of .ru or .рф
> >
> > That's your case, isn't it?
>
> Yes, indeed. But:
>
> ---
> For some applications, it is useful to determine if a given input string has any whole-script confusable. For example, the identifier "ѕсоре" using Cyrillic characters would pass the single-script test described in Section 5.2, Restriction-Level Detection, even though it is likely to be a spoof attempt.
> ---
>
> So "Сгсе.ru" is suspicious in most contexts.

Right, but the functions we had back then didn't yet support that
part.

> > Regardless of what they are saying, I don't think the above is
> > suitable for production. I think it should be enough to see whether
> > there could be confusion with the corresponding ASCII characters from
> > confusables.txt.
>
> Yes, so that's what I've done now, but... I'd feel slightly better if I
> knew what they were actually getting at. I think they're saying that if
> "foo" is confusable with anything in any other scripts, then it's
> suspicious?

Yes, that's what they meant.

> But that sounds unworkable. For instance, "circle.ru" is
> confusable with "СігсӀе.ru", and perhaps it's suspicious to a Russian,
> but I don't see how to make a workable function from that.

They've left that to the implementation...

Anyway, I think confusable to ASCII is good enough for Emacs for now.

> So perhaps what I've implemented now is sufficient for domains.

I think it is, yes. It definitely covers a very large chunk of the
problem.
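
To make the "confusable to ASCII" idea concrete, here is a minimal
sketch in Python. It is not the Emacs implementation. The table
ASCII_LOOKALIKES and both function names are hypothetical, and the
table is a small hand-picked subset in the spirit of confusables.txt,
not the real data. A label is flagged only when it contains non-ASCII
characters and every one of them has an ASCII look-alike, so that the
whole label could be misread as an ASCII one.

```python
# Illustrative subset only (an assumption, not the real confusables.txt):
# a few Cyrillic letters that look like ASCII letters.
ASCII_LOOKALIKES = {
    "\u0421": "C",  # CYRILLIC CAPITAL LETTER ES
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0433": "r",  # CYRILLIC SMALL LETTER GHE
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
}


def ascii_skeleton(label):
    """Return the ASCII string LABEL could be mistaken for, or None.

    None means at least one non-ASCII character has no ASCII
    look-alike in the table, so the label cannot pass for ASCII.
    """
    out = []
    for ch in label:
        if ch.isascii():
            out.append(ch)
        elif ch in ASCII_LOOKALIKES:
            out.append(ASCII_LOOKALIKES[ch])
        else:
            return None
    return "".join(out)


def confusable_with_ascii(label):
    """Return True if LABEL is non-ASCII but reads like an ASCII label."""
    if label.isascii():
        return False  # plain ASCII is not a spoof of itself
    return ascii_skeleton(label) is not None


if __name__ == "__main__":
    for label in ("\u0421\u0433\u0441\u0435", "\u0455\u0441\u043e\u0440\u0435", "circle"):
        print(repr(label), ascii_skeleton(label), confusable_with_ascii(label))
```

With this table, both "Сгсе" and the UTS#39 example "ѕсоре" are
flagged, while a Cyrillic label containing a letter with no ASCII
look-alike is not. The check is deliberately one-directional: it never
flags "circle.ru" for being confusable with some Cyrillic string, which
is the unworkable case discussed above.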