From: Andreas Roehler <andreas.roehler@easy-emacs.de>
Cc: emacs-devel <emacs-devel@gnu.org>
Subject: Re: Another issue with thingatpt
Date: Tue, 02 Jan 2007 14:34:46 +0100
Message-ID: <459A5F76.2090306@easy-emacs.de>
In-Reply-To: <17815.62022.147181.758643@rgrjr.dyndns.org>
Bob Rogers wrote:
> From: Andreas Roehler <andreas.roehler@easy-emacs.de>
> Date: Sun, 31 Dec 2006 10:25:35 +0100
>
> > Both interfaces (ffap and thing-at-point) are already customizable,
> > though in different ways.
>
> There is no `defcustom'-form in thingatpt.el,
> it's done mostly with `defvar'. I wouldn't consider that
> customizable.
>
> Not in the sense of defcustom, no. But someone who can't "customize" it
> themselves via setq is probably not going to be able to change these
> hairy regexps and/or char-classes without shooting themselves in the
> foot. It's not just a matter of understanding Emacs regexps, but
> understanding how thing-at-point uses them.
You are probably right.
>
> In any case, it seems to me that users shouldn't need to change the
> regexp proper, since that is defined by RFC3986, just the set of
> punctuation characters to drop at the end.
Maybe I am missing something, but as far as I can see the regexp in
question is not strictly derived from the RFC. Here is the relevant
description from RFC 3986:
;;;;;;;;;;;;;;
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
...
Characters that are allowed in a URI but do not have a reserved
purpose are called unreserved. These include uppercase and lowercase
letters, decimal digits, hyphen, period, underscore, and tilde.
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
;;;;;;;;;;;;;;;
That is basically all I could find on the matter there.
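For comparison, here is a rough sketch that encodes the character sets
quoted above as Emacs Lisp data, with a function that classifies a
single character. The `my-rfc3986-' names are hypothetical, made up for
this illustration; nothing like them exists in thingatpt.el.

```elisp
;; Hypothetical helpers, not part of thingatpt.el: the RFC 3986
;; reserved sets quoted above, stored as plain strings.
(defconst my-rfc3986-gen-delims ":/?#[]@")
(defconst my-rfc3986-sub-delims "!$&'()*+,;=")

(defun my-rfc3986-char-class (char)
  "Return the RFC 3986 class of CHAR as a symbol, or nil.
The result is one of `unreserved', `gen-delim' or `sub-delim'."
  (let ((s (string char)))
    (cond
     ;; unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
     ((string-match-p "\\`[-A-Za-z0-9._~]\\'" s) 'unreserved)
     ;; (append STRING nil) turns the string into a list of chars.
     ((memq char (append my-rfc3986-gen-delims nil)) 'gen-delim)
     ((memq char (append my-rfc3986-sub-delims nil)) 'sub-delim))))

(my-rfc3986-char-class ?\;)   ;; => sub-delim
(my-rfc3986-char-class ?\')   ;; => sub-delim
(my-rfc3986-char-class ?_)    ;; => unreserved
(my-rfc3986-char-class ?\")   ;; => nil
```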
> The only thing that needs to
> be customized is just the "lose the punctuation" heuristic, IMHO. And
> the definition of "punctuation" should be enlarged so that it addresses
> Slawomir's issue with parens, which are not even allowed internally.
>
> The problem mentioned originally however shouldn't occur, as
>
> ,----
> | (defvar thing-at-point-url-path-regexp
> | "[^]\t\n \"'()<>[^`{}]*[^]\t\n \"'()<>[^`{}.,;]+"
> | "A regular expression probably matching the host and filename or
> e-mail part of a URL.")
> `----
>
> includes that char. The error must reside elsewhere.
>
> Regards,
>
> Andreas Roehler
>
> It does include a ";" in the second character class, but both are
> inverted. The second set is the same as the first set with the addition
> of ".,;", which is why it refuses to match any of these characters at
> the end of the URL. This would be easier to see if the regexp were
> written this way:
>
> (defvar thing-at-point-url-path-regexp
> (concat "[^]\t\n \"'()<>[^`{}]*"
> "[^]\t\n \"'()<>[^`{}.,;]+")
> "A regular expression probably matching the host and filename or e-mail part of a URL.")
>
> -- Bob
Now I see it, thanks a lot.
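Bob's point is easy to check with a small sketch. `my-url-path-regexp'
and `my-url-path-match' are hypothetical names; the regexp is the one
quoted above, split in two as Bob suggests. Since the first class may
swallow ".,;" but the second one may not end on them, the matcher
backtracks until the last matched character is none of these.

```elisp
;; Sketch only: a hypothetical copy of the regexp quoted above.
(defconst my-url-path-regexp
  (concat "[^]\t\n \"'()<>[^`{}]*"       ; body: may contain . , ;
          "[^]\t\n \"'()<>[^`{}.,;]+"))  ; tail: may not contain . , ;

(defun my-url-path-match (string)
  "Return the part of STRING matched by `my-url-path-regexp', or nil."
  (when (string-match my-url-path-regexp string)
    (match-string 0 string)))

(my-url-path-match "www.gnu.org/software/emacs/;")
;; => "www.gnu.org/software/emacs/"   trailing ";" dropped
(my-url-path-match "example.com/a;b")
;; => "example.com/a;b"               internal ";" kept
```

So a `;' inside the path survives; only a trailing one is cut off.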
BTW: what about dropping the `;' from the second (trailing) character
class? Maybe together with the comma, since the RFC lists that
character as a sub-delimiter too.
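A minimal sketch of what that would mean, assuming only the trailing
class is changed: `;' and `,' are removed from it, while the period is
still refused at the end. The `-2' names are hypothetical. The cost is
that a URL followed by `;' or `,' in running text would then keep that
character.

```elisp
;; Hypothetical variant: only "." is still refused at the end.
(defconst my-url-path-regexp-2
  (concat "[^]\t\n \"'()<>[^`{}]*"
          "[^]\t\n \"'()<>[^`{}.]+"))

(defun my-url-path-match-2 (string)
  "Return the part of STRING matched by `my-url-path-regexp-2', or nil."
  (when (string-match my-url-path-regexp-2 string)
    (match-string 0 string)))

(my-url-path-match-2 "example.com/a;")   ;; => "example.com/a;"
(my-url-path-match-2 "example.com/a.")   ;; => "example.com/a"
```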
Other problems:
- The character ' (39, #o47, #x27) now seems to be excluded, although
  the RFC lists it as a sub-delimiter too.
- The bracket expression in
  (defvar thing-at-point-short-url-regexp
    (concat "[-A-Za-z0-9.]+" thing-at-point-url-path-regexp)
  lacks the underscore, which is unreserved according to the RFC.
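The underscore point can be seen in isolation with anchored tests of
the two bracket expressions. Both constants are hypothetical; the second
simply adds `_' and `~' from the RFC's unreserved set. This only
compares the host-part classes: in the full
`thing-at-point-short-url-regexp' the path part may still absorb an
underscore, so the effect on actual thing-at-point results is not shown
here.

```elisp
;; Host-part class as quoted above, anchored to the whole string.
(defconst my-short-url-host-class "\\`[-A-Za-z0-9.]+\\'")
;; Hypothetical class built from RFC 3986 "unreserved".
(defconst my-unreserved-host-class "\\`[-A-Za-z0-9._~]+\\'")

(string-match-p my-short-url-host-class "my_host.example.org")   ;; => nil
(string-match-p my-unreserved-host-class "my_host.example.org")  ;; => 0
```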
Andreas