From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#51733: 27.1; Detect impossible email addresses better
Date: Wed, 19 Jan 2022 18:58:54 +0200
Message-ID: <837davslv5.fsf@gnu.org>
Cc: 51733@debbugs.gnu.org, jidanni@jidanni.org
To: Lars Ingebrigtsen
In-Reply-To: <87k0ev91ba.fsf@gnus.org> (message from Lars Ingebrigtsen on Wed, 19 Jan 2022 16:45:29 +0100)

> From: Lars Ingebrigtsen
> Cc: 51733@debbugs.gnu.org, jidanni@jidanni.org
> Date: Wed, 19 Jan 2022 16:45:29 +0100
>
> Eli Zaretskii writes:
>
> > OK, but why do you think "Сгсе.ru" is confusable? The SLD part is
> > entirely made of single-script characters, and UTS#39 explicitly
> > allows that:
> >
> > [...] it can be perfectly legitimate to have scripts in a SLD
> > (second level domain) not be the same as scripts in a TLD (top-level
> > domain), such as:
> >
> > Cyrillic labels in a domain name with a TLD of .ru or .рф
> >
> > That's your case, isn't it?
>
> Yes, indeed. But:
>
> ---
> For some applications, it is useful to determine if a given input
> string has any whole-script confusable. For example, the identifier
> "ѕсоре" using Cyrillic characters would pass the single-script test
> described in Section 5.2, Restriction-Level Detection, even though it
> is likely to be a spoof attempt.
> ---
>
> So "Сгсе.ru" is suspicious in most contexts.

Right, but the functions we had back then didn't yet support that part.

> > Regardless of what they are saying, I don't think the above is
> > suitable for production. I think it should be enough to see whether
> > there could be confusion with the corresponding ASCII characters from
> > confusables.txt.
>
> Yes, so that's what I've done now, but... I'd feel slightly better if I
> knew what they were actually getting at. I think they're saying that if
> "foo" is confusable with anything in any other scripts, then it's
> suspicious?

Yes, that's what they meant.

> But that sounds unworkable. For instance, "circle.ru" is
> confusable with "СігсӀе.ru", and perhaps it's suspicious to a Russian,
> but I don't see how to make a workable function from that.

They've left that to the implementation...

Anyway, I think confusable to ASCII is good enough for Emacs for now.

> So perhaps what I've implemented now is sufficient for domains.

I think it is, yes. It definitely covers a very large chunk of the problem.
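A minimal sketch of the "confusable to ASCII" check discussed above, written in Python rather than Emacs Lisp. It assumes a tiny hand-picked excerpt of confusables.txt; the table and the names `ascii_lookalike` and `suspicious_domain` are hypothetical and are not the code that was actually installed in Emacs. A domain label is flagged only when it contains non-ASCII characters and every one of them has an ASCII lookalike.

```python
from typing import Optional

# Hypothetical excerpt of Unicode's confusables.txt: a few Cyrillic letters
# and their ASCII prototypes. A real implementation would load the full
# table from the data file instead of hard-coding it.
CONFUSABLE_TO_ASCII = {
    "\u0421": "C",  # CYRILLIC CAPITAL LETTER ES
    "\u0433": "r",  # CYRILLIC SMALL LETTER GHE
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u04c0": "l",  # CYRILLIC LETTER PALOCHKA
}


def ascii_lookalike(label: str) -> Optional[str]:
    """Return the ASCII string LABEL could be mistaken for, or None.

    LABEL counts as confusable only if it contains at least one
    non-ASCII character and every character is either ASCII or has an
    ASCII prototype in CONFUSABLE_TO_ASCII.
    """
    out = []
    saw_non_ascii = False
    for ch in label:
        if ch.isascii():
            out.append(ch)
        elif ch in CONFUSABLE_TO_ASCII:
            out.append(CONFUSABLE_TO_ASCII[ch])
            saw_non_ascii = True
        else:
            # A character with no ASCII lookalike: the label cannot pass for ASCII.
            return None
    # Pure-ASCII labels are not "confusable to ASCII"; they are ASCII.
    return "".join(out) if saw_non_ascii else None


def suspicious_domain(domain: str) -> bool:
    """Flag DOMAIN if any of its dot-separated labels could pass for ASCII."""
    return any(ascii_lookalike(label) is not None for label in domain.split("."))
```

With this check, "Сгсе.ru" is flagged because its first label reads as "Crce", while "circle.ru" (pure ASCII) and a Cyrillic label containing letters with no ASCII lookalike are not. The whole-script confusable case Lars describes, where an ASCII label would be compared against every other script, is deliberately left out.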