From: Simon Josefsson
Newsgroups: gmane.emacs.devel
Subject: Re: mail-extract-address-components extract modified full name
Date: Tue, 27 Jul 2004 18:28:09 +0200
Original-To: Stefan Monnier
Cc: emacs-devel@gnu.org
References: <87wu0sk8aa.wl%yoichi@geiin.org>
In-Reply-To: (Stefan Monnier's message of "27 Jul 2004 10:19:49 -0400")

Stefan Monnier writes:

>> like the approach you propose.  XEmacs users have reported even
>> Latin-2 problems with the current implementation (Emacs do not have
>> those problems, though, but it suggest the implementation could be
>> improved).
>
> [ The below is all "IIRC". ]
>
> The function is supposed to receive ASCII input, so it's no wonder it might
> break in other circumstances.  Why ASCII input?
>
> Because the way things are defined in the RFCs, you should split the address
> before doing the un-quoting of base64 and QP thingies.
> I.e. after unquoting, the string might not be parsable any more (because
> one of the QP chars could be a ", a \, a <, or something like that).
>
> So the usual answer is that if you call the function with non-ASCII input,
> you're not using it properly.  But of course, it's not that simple since you
> might want to call that function e.g. on an email message that is being
> written and that hasn't been QP-encoded yet.

I agree, and have been arguing the same thing when people complain
that mail-extr* cannot handle their weird input.
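
The ordering point in the quoted text (split the address first, undo
the base64/QP encoding afterwards) can be illustrated with a small
sketch.  It uses Python's standard email module rather than
mail-extr*, treats the "QP thingies" as RFC 2047 encoded-words, and
the header value and helper names are made up for the example.

```python
from email.header import decode_header, make_header
from email.utils import getaddresses, parseaddr

# Hypothetical header value: the display name "Doe, John" is Q-encoded,
# so its comma stays hidden as =2C until the name is decoded.
RAW = "=?iso-8859-1?Q?Doe=2C_John?= <john@example.org>"


def decode_words(text):
    """Decode RFC 2047 encoded-words in TEXT into a plain string."""
    return str(make_header(decode_header(text)))


def split_then_decode(raw):
    """Parse the still-encoded ASCII form, then decode only the name."""
    name, addr = parseaddr(raw)
    return decode_words(name), addr


def decode_then_split(raw):
    """Reversed order: the decoded comma now reads as a list separator."""
    return getaddresses([decode_words(raw)])


if __name__ == "__main__":
    print(split_then_decode(RAW))
    print(decode_then_split(RAW))
```

With the first order the name and address come back as one pair; with
the second, the comma uncovered by decoding is taken as an address
separator, so the result no longer describes a single mailbox.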
Unfortunately, it is a losing discussion: I can't claim that
mail-extr* is only intended for use with all-ASCII valid RFC 822
input, since that isn't what it implements.  It is just a big hack,
and could be massaged into behaving (badly) for any purpose.

One example is that BBDB reportedly uses mail-extr* to split the
e-mail addresses it stores locally, in ~/.bbdb, which naturally aren't
QP encoded.  This probably illustrates a class of applications that
deal with mail addresses but aren't proper mail readers or writers, so
it wouldn't make sense for them to use QP.

IMHO, there should be two packages:

1) Proper RFC (2)822 parser.  There is rfc822.el but it is
   insufficient, and I'm not sure it is correct -- it uses regexps a
   lot, but I recall that the "correct" 2822 grammar, expressed as
   regexps, is much more complex than what rfc822.el does.
   Naturally, it should only accept valid RFC 822 input, which is
   ASCII only.  (Incidentally, the QP encoder/decoder needs to use
   this package, since QP must only be applied to certain RFC 2822
   grammatical terminals, not all text, and I believe the current QP
   encoder/decoder doesn't do this properly.)

2) Ad-hoc approach that splits real-world textual e-mail addresses,
   including non-ASCII ones, into their components.  Might use the
   proper parser, at least partially.  Perhaps similar to what
   Katsumi Yamaoka proposed.

When these two packages exist, each current use of mail-extr* should
be investigated to find out what is really intended there.

At some point in time, I counted the number of functions in Emacs that
implement something similar to what the mail-extr* functions do
(e.g. take a textual e-mail address and split it up) and found ~5-10
versions, all with their own problems.

Sadly, I keep writing rants about the situation instead of working on
solving it...
Perhaps that is partly because it is not straightforward to solve
this; you will probably have to implement one API first, tinker with
it to get experience with it, and then rewrite it slightly, and so on.
Sounds like real work.

Perhaps someone else has a clearer vision on how to implement it, and
time to try it out.

Thanks.
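
As a rough illustration of the two-package split proposed above, here
is a minimal Python sketch, not Emacs Lisp and not a proposed API.
The names parse_strict and split_lenient are hypothetical, parseaddr
from the standard library merely stands in for a real grammar-based
RFC 2822 parser, and the regexp handles only the simplest
"Name <addr>" shape.

```python
import re
from email.utils import parseaddr


def parse_strict(header_value):
    """Hypothetical strict entry point: ASCII-only input or ValueError.

    parseaddr is only a stand-in here; a real package (1) would
    implement the full RFC 2822 grammar and reject invalid input.
    """
    if not header_value.isascii():
        raise ValueError("strict parser expects ASCII (still-encoded) input")
    name, addr = parseaddr(header_value)
    if not addr:
        raise ValueError("no address found")
    return name, addr


# Optional quoted or bare display name, then an address in angle brackets.
_NAME_ADDR = re.compile(r'^\s*"?(?P<name>[^"<]*?)"?\s*<(?P<addr>[^<>\s]+)>\s*$')


def split_lenient(text):
    """Hypothetical ad-hoc splitter for real-world text, non-ASCII allowed."""
    match = _NAME_ADDR.match(text)
    if match:
        return match.group("name").strip(), match.group("addr")
    # No angle brackets: treat the whole string as a bare address.
    return "", text.strip()
```

A caller such as an address book, which stores already-decoded names,
would use the lenient function, while code handling raw message
headers would use the strict one and decode the name afterwards.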