From: Andreas Roehler
Newsgroups: gmane.emacs.devel
Subject: Re: Another issue with thingatpt
Date: Tue, 02 Jan 2007 14:34:46 +0100
Message-ID: <459A5F76.2090306@easy-emacs.de>
In-Reply-To: <17815.62022.147181.758643@rgrjr.dyndns.org>
To: Bob Rogers
Cc: emacs-devel

Bob Rogers wrote:
> From: Andreas Roehler
> Date: Sun, 31 Dec 2006 10:25:35 +0100
>
> > Both interfaces (ffap and thing-at-point) are already customizable,
> > though in different ways.
>
> There is no `defcustom'-form in thingatpt.el,
> it's done mostly with `defvar'. Wouldn't conceive that
> as customizable.
>
> Not in the sense of defcustom, no. But someone who can't "customize" it
> themselves via setq is probably not going to be able to change these
> hairy regexps and/or char-classes without shooting themselves in the
> foot. It's not just a matter of understanding Emacs regexps, but
> understanding how thing-at-point uses them.

You are probably right.

>
> In any case, it seems to me that users shouldn't need to change the
> regexp proper, since that is defined by RFC3986, just the set of
> punctuation characters to drop at the end.

Maybe I am missing something, but AFAIS the regexp in question is not
derived from the RFC in a strict sense.

Here is the description from the RFC:

;;;;;;;;;;;;;;

reserved    = gen-delims / sub-delims
gen-delims  = ":" / "/" / "?"
            / "#" / "[" / "]" / "@"
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
            / "*" / "+" / "," / ";" / "="

...

Characters that are allowed in a URI but do not have a reserved
purpose are called unreserved. These include uppercase and lowercase
letters, decimal digits, hyphen, period, underscore, and tilde.

unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

;;;;;;;;;;;;;;;

That is basically all I can find on the matter there.

> The only thing that needs to
> be customized is just the "lose the punctuation" heuristic, IMHO. And
> the definition of "punctuation" should be enlarged so that it addresses
> Slawomir's issue with parens, which are not even allowed internally.
>
> The problem mentioned originally however shouldn't occur, as
>
> ,----
> | (defvar thing-at-point-url-path-regexp
> |   "[^]\t\n \"'()<>[^`{}]*[^]\t\n \"'()<>[^`{}.,;]+"
> |   "A regular expression probably matching the host and filename or
> |   e-mail part of a URL.")
> `----
>
> includes that char. The error must reside elsewhere.
>
> Regards,
>
> Andreas Roehler
>
> It does include a ";" in the second character class, but both are
> inverted. The second set is the same as the first set with the addition
> of ".,;", which is why it refuses to match any of these characters at
> the end of the URL. This would be easier to see if the regexp were
> written this way:
>
> (defvar thing-at-point-url-path-regexp
>   (concat "[^]\t\n \"'()<>[^`{}]*"
>           "[^]\t\n \"'()<>[^`{}.,;]+")
>   "A regular expression probably matching the host and filename or e-mail part of a URL.")
>
> -- Bob

Now I see it, thanks a lot.

BTW: What about dropping the `;' from the regexp? Maybe together with
the comma, since that character is listed as a sub-delimiter too.

Other problems:

- Char ' (39, #o47, #x27) now seems to be excluded, whereas the RFC
  mentions it as a sub-delimiter too.

- (defvar thing-at-point-short-url-regexp
    (concat "[-A-Za-z0-9.]+" thing-at-point-url-path-regexp)

  misses the underscore in its bracket expression (unreserved
  according to the RFC).

Andreas
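
For anyone who wants to check the behaviour discussed above, here is a
minimal sketch that can be evaluated in *scratch*. It only illustrates
how the two inverted character classes interact. The names
`my-url-path-regexp' and `my-url-path-match' are made up for this
example and are not part of thingatpt.el; the regexp itself is copied
from the defvar quoted above, and the test strings are arbitrary.

```elisp
;; Hypothetical copy of the regexp quoted above, split into its two
;; character classes as Bob suggested.  Not part of thingatpt.el.
(defvar my-url-path-regexp
  (concat "[^]\t\n \"'()<>[^`{}]*"      ; any run of "allowed" chars
          "[^]\t\n \"'()<>[^`{}.,;]+")  ; must end on a char other than . , ;
  "Experimental copy of `thing-at-point-url-path-regexp'.")

(defun my-url-path-match (string)
  "Return the prefix of STRING matched by `my-url-path-regexp', or nil."
  ;; Anchor at the start of STRING so only the prefix is considered.
  (when (string-match (concat "\\`\\(?:" my-url-path-regexp "\\)") string)
    (match-string 0 string)))

;; A ";" inside the path is accepted, a trailing ";" is dropped.
(my-url-path-match "www.gnu.org/a;b;")        ; => "www.gnu.org/a;b"

;; The same holds for a trailing period.
(my-url-path-match "www.gnu.org/index.html.") ; => "www.gnu.org/index.html"

;; The apostrophe is excluded by both classes, so the match stops there.
(my-url-path-match "www.gnu.org/it's")        ; => "www.gnu.org/it"
```

Removing `;' and `,' from the second class would make the first call
return the whole string, which is the change asked about above.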