From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Bob Rogers
Newsgroups: gmane.emacs.devel
Subject: Re: Another issue with thingatpt
Date: Sat, 30 Dec 2006 22:08:29 -0500
Message-ID: <17815.10669.22210.825181@rgrjr.dyndns.org>
References: <003001c727be$349c5a80$0203a8c0@HomeNetbbb0> <20061225.094150.13771816.wl@gnu.org> <20061227.115042.56977126.wl@gnu.org> <17810.55182.483602.421178@rgrjr.dyndns.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Trace: sea.gmane.org 1167534526 15833 80.91.229.12 (31 Dec 2006 03:08:46 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Sun, 31 Dec 2006 03:08:46 +0000 (UTC)
Cc: emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sun Dec 31 04:08:45 2006
Return-path:
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165]) by lo.gmane.org with esmtp (Exim 4.50) id 1H0r40-0004ji-LX for ged-emacs-devel@m.gmane.org; Sun, 31 Dec 2006 04:08:44 +0100
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1H0r40-0000Xc-4X for ged-emacs-devel@m.gmane.org; Sat, 30 Dec 2006 22:08:44 -0500
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1H0r3o-0000XW-Oj for emacs-devel@gnu.org; Sat, 30 Dec 2006 22:08:32 -0500
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1H0r3m-0000XJ-8m for emacs-devel@gnu.org; Sat, 30 Dec 2006 22:08:31 -0500
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1H0r3m-0000XG-33 for emacs-devel@gnu.org; Sat, 30 Dec 2006 22:08:30 -0500
Original-Received: from [24.128.218.106] (helo=rgrjr.dyndns.org) by monty-python.gnu.org with smtp (Exim 4.52) id 1H0r3l-0003ug-Os for emacs-devel@gnu.org; Sat, 30 Dec 2006 22:08:29 -0500
Original-Received: (qmail 30580 invoked by uid 500); 31 Dec 2006 03:08:29 -0000
Original-To: Piet van Oostrum
In-Reply-To:
X-Mailer: VM 7.19 under Emacs 22.0.91.1
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:64551
Archived-At:

   From: Piet van Oostrum
   Date: Fri, 29 Dec 2006 22:23:55 +0100

   >>>>> Bob Rogers (BR) wrote:

   >BR> From: Werner LEMBERG
   >BR> Date: Wed, 27 Dec 2006 11:50:42 +0100 (CET)
   >BR> . . .
   >BR> thingatpt ignores the final `;'.
   >BR> Werner
   >BR> According to RFC3986 (aka STD066), this is wrong; ";" is legitimate
   >BR> anywhere in a path or query part, including the end. So are "." and
   >BR> ",", but thing-at-point-url-path-regexp also refuses to match these
   >BR> characters at the end of the string. Doing (ffap-string-at-point 'url)
   >BR> drops these characters plus ":", "!", and (questionably) "?".
   >BR> It may not be possible to find a tradeoff between RFC compliance and
   >BR> parsing dwimmery that would satisfy everybody. Since stripping off
   >BR> trailing punctuation is useful behavior (ISTR it's worked this way for a
   >BR> while now), I would recommend against changing it now. However, a case
   >BR> could be made for making thing-at-point and ffap-string-at-point
   >BR> consistent. Perhaps "!:;.," would be best? This is just the union of
   >BR> the two sets but without the dubious inclusion of "?".

   The way to reconcile these would be to customize it, I think. For
   example have a string variable that contains the punctuation
   characters to be included at the end. Or a regexp.

Both interfaces (ffap and thing-at-point) are already customizable,
though in different ways.
ffap-string-at-point uses ffap-string-at-point-mode-alist, which maps a
thing type symbol or mode name symbol to a list of three character sets;
the last string in each alist entry is the set of characters to exclude
at the end. On the other hand, thing-at-point uses pure regexps, but
they are constructed from each other, which makes thing-at-point harder
to customize.

Note that neither of these implementations is really mode-sensitive,
AFAICS; ffap-string-at-point-mode-alist is poorly named. If editing
something XML-like, for example, you would want the attribute in to be
parsed without dropping ANY characters at the end -- and any embedded
''' to be translated to a literal apostrophe. But even if this is TRT,
it is clearly too risky to attempt now.

But is there any objection to unifying these two implementations after
the release? And if so, which is the better implementation? I believe
the difference is only historical; ffap.el is much older than
thingatpt.el (IIRC).

   By the way, thing-at-point-url-path-regexp also disallows : inside a
   url. These would be necessary to accept IPv6 IP addresses.

It works for me (though in an emacs built two weeks ago):

    (string-match thing-at-point-url-path-regexp "http://::1/foo/bar.html")
    => 0
    (string-match thing-at-point-url-regexp "http://::1/foo/bar.html")
    => 0

Do you have an example of failure?

					-- Bob
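
For concreteness, the "string variable" idea quoted above might look
roughly like the sketch below. Every name in it is hypothetical; nothing
like this exists in thingatpt.el or ffap.el, and the default value is
only the "!:;.," set floated earlier in the thread.

```elisp
;; Hypothetical sketch: these names are made up for illustration and
;; are not part of thingatpt.el or ffap.el.
(defvar my-url-trailing-punctuation "!:;.,"
  "Characters to drop from the end of a URL found at point.
The default is the set suggested earlier in this thread.")

(defun my-url-strip-trailing-punctuation (url)
  "Return URL without trailing chars in `my-url-trailing-punctuation'."
  (let ((end (length url)))
    ;; Walk backward while the last remaining character is in the set.
    (while (and (> end 0)
                (memq (aref url (1- end))
                      (string-to-list my-url-trailing-punctuation)))
      (setq end (1- end)))
    (substring url 0 end)))

;; With the default set, the trailing semicolon is dropped.
(my-url-strip-trailing-punctuation "http://example.org/a;b;")

;; Binding the variable to "" keeps every character, which is what a
;; strict RFC 3986 reading (or an XML-like mode) would want.
(let ((my-url-trailing-punctuation ""))
  (my-url-strip-trailing-punctuation "http://example.org/a;b;"))
```

Only trailing characters are affected, so a ";" or ":" inside the path
survives either way. A mode could bind the variable locally, which would
give the mode sensitivity that neither implementation really has today.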