From: Piet van Oostrum
Newsgroups: gmane.emacs.devel
Subject: Re: Another issue with thingatpt
Date: Fri, 29 Dec 2006 22:23:55 +0100
References: <003001c727be$349c5a80$0203a8c0@HomeNetbbb0> <20061225.094150.13771816.wl@gnu.org> <20061227.115042.56977126.wl@gnu.org> <17810.55182.483602.421178@rgrjr.dyndns.org>
In-reply-to: <17810.55182.483602.421178@rgrjr.dyndns.org>

>>>>> Bob Rogers (BR) wrote:

>BR>    From: Werner LEMBERG
>BR>    Date: Wed, 27 Dec 2006 11:50:42 +0100 (CET)

>BR>    Here's another problematic URL:

>BR>      http://mousai.kanji.zinbun.kyoto-u.ac.jp/ids-find?components=&U+20207;

>BR>    thingatpt ignores the final `;'.

>BR>        Werner

>BR> According to RFC3986 (aka STD066), this is wrong; ";" is legitimate
>BR> anywhere in a path or query part, including the end.  So are "." and
>BR> ",", but thing-at-point-url-path-regexp also refuses to match these
>BR> characters at the end of the string.  Doing (ffap-string-at-point 'url)
>BR> drops these characters plus ":", "!", and (questionably) "?".

>BR> It may not be possible to find a tradeoff between RFC compliance and
>BR> parsing dwimmery that would satisfy everybody.  Since stripping off
>BR> trailing punctuation is useful behavior (ISTR it's worked this way for a
>BR> while now), I would recommend against changing it now.
>BR> However, a case could be made for making thing-at-point and
>BR> ffap-string-at-point consistent.  Perhaps "!:;.," would be best?  This
>BR> is just the union of the two sets but without the dubious inclusion of
>BR> "?".

The way to reconcile these would be to make the behavior customizable, I
think: for example, a string variable listing the punctuation characters
that may remain at the end of a URL, or a regexp.

By the way, thing-at-point-url-path-regexp also disallows ":" inside a
URL.  Colons would be needed to accept URLs containing IPv6 addresses.

-- 
Piet van Oostrum
URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
Private email: piet@vanoostrum.org
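A minimal sketch of the customizable trailing-punctuation rule proposed
above, written in Python purely for illustration.  It assumes a URL
candidate has already been matched and models only the final trimming
step.  The names URL_TRAILING_STRIP, trim_url and keep_at_end are
hypothetical; they are not thingatpt.el or ffap.el variables or functions.
The default strip set is the "!:;.," suggested in the quoted message.

```python
# Hypothetical illustration only: these names do not exist in
# thingatpt.el or ffap.el.

# Characters the heuristic strips from the end of a URL candidate.
# Default: "!:;.," as suggested in the quoted message (no "?").
URL_TRAILING_STRIP = "!:;.,"


def trim_url(candidate: str, keep_at_end: str = "",
             strip_chars: str = URL_TRAILING_STRIP) -> str:
    """Remove trailing punctuation from CANDIDATE.

    Characters listed in KEEP_AT_END are never stripped; this models a
    user option naming the punctuation that may legitimately end a URL.
    Only the end of the string is touched, so interior characters
    (for example the colons of a bracketed IPv6 host) are left alone.
    """
    strippable = "".join(c for c in strip_chars if c not in keep_at_end)
    # str.rstrip removes any trailing run of the given characters;
    # an empty argument strips nothing.
    return candidate.rstrip(strippable)


if __name__ == "__main__":
    url = "http://mousai.kanji.zinbun.kyoto-u.ac.jp/ids-find?components=&U+20207;"
    print(trim_url(url))                        # final ";" dropped
    print(trim_url(url, keep_at_end=";"))       # final ";" kept
    print(trim_url("http://[::1]:8080/path."))  # only the final "." dropped
```

Because the trimming touches only the end of the string, interior colons
survive it; admitting them in the first place would still require a
change to the matching regexp itself.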