unofficial mirror of emacs-devel@gnu.org 
From: Bob Rogers <rogers-emacs@rgrjr.dyndns.org>
Cc: emacs-devel@gnu.org
Subject: Re: Another issue with thingatpt
Date: Sat, 30 Dec 2006 22:08:29 -0500
Message-ID: <17815.10669.22210.825181@rgrjr.dyndns.org>
In-Reply-To: <m28xgqtmz8.fsf@ordesa.lan>

   From: Piet van Oostrum <piet@cs.uu.nl>
   Date: Fri, 29 Dec 2006 22:23:55 +0100

   >>>>> Bob Rogers <rogers-emacs@rgrjr.dyndns.org> (BR) wrote:

   >BR>    From: Werner LEMBERG <wl@gnu.org>
   >BR>    Date: Wed, 27 Dec 2006 11:50:42 +0100 (CET)

   >BR>    . . .

   >BR>    thingatpt ignores the final `;'.

   >BR>        Werner

   >BR> According to RFC 3986 (aka STD 66), this is wrong; ";" is legitimate
   >BR> anywhere in a path or query part, including the end.  So are "." and
   >BR> ",", but thing-at-point-url-path-regexp also refuses to match these
   >BR> characters at the end of the string.  Doing (ffap-string-at-point 'url)
   >BR> drops these characters plus ":", "!", and (questionably) "?".

   >BR>    It may not be possible to find a tradeoff between RFC compliance and
   >BR> parsing dwimmery that would satisfy everybody.  Since stripping off
   >BR> trailing punctuation is useful behavior (ISTR it's worked this way for a
   >BR> while now), I would recommend against changing it now.  However, a case
   >BR> could be made for making thing-at-point and ffap-string-at-point
   >BR> consistent.  Perhaps "!:;.," would be best?  This is just the union of
   >BR> the two sets but without the dubious inclusion of "?".

   The way to reconcile these would be to customize it, I think. For example
   have a string variable that contains the punctuation characters to be
   included at the end. Or a regexp.

Both interfaces (ffap and thing-at-point) are already customizable,
though in different ways.  ffap-string-at-point uses
ffap-string-at-point-mode-alist, which maps a thing-type symbol or
major-mode name to a list of three character sets; the last string in
each alist entry is the set of characters to exclude at the end.
thing-at-point, on the other hand, uses plain regexps, but they are
built from one another by concatenation, so changing one (e.g.
thing-at-point-url-path-regexp) does not update the regexps already
built from it.  That makes thing-at-point harder to customize.

   Note that neither of these implementations is really mode-sensitive,
AFAICS; ffap-string-at-point-mode-alist is poorly named.  If editing
something XML-like, for example, you would want the attribute in

	<tag attr='http://...'>

to be parsed without dropping ANY characters at the end -- and any
embedded '&apos;' to be translated to a literal apostrophe.  But even if
this is TRT, it is clearly too risky to attempt now.

   But is there any objection to unifying these two implementations
after the release?  And if not, which is the better implementation?  I
believe the difference is only historical; ffap.el is much older than
thingatpt.el (IIRC).

   By the way, thing-at-point-url-path-regexp also disallows ":" inside a URL.
   Colons are necessary to accept IPv6 addresses.

It works for me (though in an Emacs built two weeks ago):

	(string-match thing-at-point-url-path-regexp "http://::1/foo/bar.html")
	    => 0
	(string-match thing-at-point-url-regexp "http://::1/foo/bar.html")
	    => 0

Do you have an example of failure?

					-- Bob


Thread overview: 10+ messages
     [not found] <003001c727be$349c5a80$0203a8c0@HomeNetbbb0>
     [not found] ` <20061225.094150.13771816.wl@gnu.org>
     [not found]   ` <htx7iwdn717.fsf@urania.kanji.zinbun.kyoto-u.ac.jp>
2006-12-27 10:50     ` Another issue with thingatpt Werner LEMBERG
2006-12-27 20:29       ` Bob Rogers
2006-12-28  6:39         ` Werner LEMBERG
2006-12-29 21:23         ` Piet van Oostrum
2006-12-31  3:08           ` Bob Rogers [this message]
2006-12-31  9:25             ` Andreas Roehler
2006-12-31 17:24               ` Bob Rogers
2007-01-02 13:34                 ` Andreas Roehler
2007-01-03 14:50                 ` Andreas Roehler
2006-12-31 20:07             ` Piet van Oostrum
