From: Andreas Roehler <andreas.roehler@easy-emacs.de>
Cc: emacs-devel <emacs-devel@gnu.org>
Subject: Re: Another issue with thingatpt
Date: Tue, 02 Jan 2007 14:34:46 +0100
Message-ID: <459A5F76.2090306@easy-emacs.de>
In-Reply-To: <17815.62022.147181.758643@rgrjr.dyndns.org>

Bob Rogers wrote:
>    From: Andreas Roehler <andreas.roehler@easy-emacs.de>
>    Date: Sun, 31 Dec 2006 10:25:35 +0100
>
>    > Both interfaces (ffap and thing-at-point) are already customizable,
>    > though in different ways. 
>
>    There is no `defcustom'-form in thingatpt.el,
>    it's done mostly with `defvar'. Wouldn't conceive that
>    as customizable.
>
> Not in the sense of defcustom, no.  But someone who can't "customize" it
> themselves via setq is probably not going to be able to change these
> hairy regexps and/or char-classes without shooting themselves in the
> foot.  It's not just a matter of understanding Emacs regexps, but
> understanding how thing-at-point uses them.
You are probably right.
>
>    In any case, it seems to me that users shouldn't need to change the
> regexp proper, since that is defined by RFC3986, just the set of
> punctuation characters to drop at the end. 
Maybe I am missing something, but as far as I can see the regexp in
question is not derived from the RFC in a strict sense. Here is the
relevant description from RFC 3986:

;;;;;;;;;;;;;;

      reserved    = gen-delims / sub-delims

      gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

      sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

...


   Characters that are allowed in a URI but do not have a reserved
   purpose are called unreserved.  These include uppercase and lowercase
   letters, decimal digits, hyphen, period, underscore, and tilde.

      unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

;;;;;;;;;;;;;;;

That's basically all I can find there on this matter.
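
To compare these RFC sets with the first character class of
`thing-at-point-url-path-regexp' (the `defvar' is quoted further down
in this mail), something like the following could be evaluated in
*scratch*. It is only a sketch: `my-first-class' and
`my-rejected-chars' are names made up for this illustration and are
not part of thingatpt.el.

```elisp
;; First character class of the URL path regexp, copied from the
;; `defvar' quoted later in this mail.  Both names below are
;; illustrative only; thingatpt.el does not define them.
(defconst my-first-class "[^]\t\n \"'()<>[^`{}]")

(defun my-rejected-chars (chars)
  "Return, as a string, the characters of CHARS that `my-first-class' refuses."
  (let (rejected)
    (dolist (ch (string-to-list chars))
      ;; Test each character on its own against the class.
      (unless (string-match-p (concat "\\`" my-first-class "\\'")
                              (string ch))
        (push ch rejected)))
    (concat (nreverse rejected))))

(my-rejected-chars "!$&'()*+,;=")   ; sub-delims             => "'()"
(my-rejected-chars ":/?#[]@")       ; gen-delims             => "[]"
(my-rejected-chars "-._~")          ; unreserved punctuation => ""
```

So, of the RFC's delimiters, the class as quoted refuses only the
apostrophe, the parentheses and the square brackets.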

>  The only thing that needs to
> be customized is just the "lose the punctuation" heuristic, IMHO.  And
> the definition of "punctuation" should be enlarged so that it addresses
> Slawomir's issue with parens, which are not even allowed internally.
>
>    The problem mentioned originally however shouldn't occur, as
>
>    ,----
>    | (defvar thing-at-point-url-path-regexp
>    |   "[^]\t\n \"'()<>[^`{}]*[^]\t\n \"'()<>[^`{}.,;]+"
>    |   "A regular expression probably matching the host and filename or e-mail part of a URL.")
>    `----
>
>    includes that char. The error must reside elsewhere.
>
>    Regards,
>
>    Andreas Roehler
>
> It does include a ";" in the second character class, but both are
> inverted.  The second set is the same as the first set with the addition
> of ".,;", which is why it refuses to match any of these characters at
> the end of the URL.  This would be easier to see if the regexp were
> written this way:
>
> 	(defvar thing-at-point-url-path-regexp
> 		(concat "[^]\t\n \"'()<>[^`{}]*"
> 			"[^]\t\n \"'()<>[^`{}.,;]+")
> 	  "A regular expression probably matching the host and filename or e-mail part of a URL.")
>
> 					-- Bob
Now I see it, thanks a lot.
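
For the record, Bob's explanation can be checked with a small sketch
like the one below. The regexp is copied from his rewritten `defvar';
`my-url-path-regexp' and `my-url-path-match' are hypothetical names
chosen so that the real thingatpt variable stays untouched.

```elisp
;; Copy of the regexp as Bob wrote it above.  The names are made up
;; for this test and do not exist in thingatpt.el.
(defconst my-url-path-regexp
  (concat "[^]\t\n \"'()<>[^`{}]*"
          "[^]\t\n \"'()<>[^`{}.,;]+"))

(defun my-url-path-match (string)
  "Return the prefix of STRING matched by `my-url-path-regexp', or nil."
  (when (string-match (concat "\\`" my-url-path-regexp) string)
    (match-string 0 string)))

;; The first class accepts ".,;", so they survive inside the match;
;; the second class refuses them, so the match cannot end with one.
(my-url-path-match "www.gnu.org/a;b;")     ; => "www.gnu.org/a;b"
(my-url-path-match "www.gnu.org/a.html.")  ; => "www.gnu.org/a.html"
```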

BTW: What about dropping the `;' from the second character class of
the regexp?

Maybe together with the comma, since that character is also listed as
a sub-delimiter.
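
To see what such a change would do, here is a sketch that compares the
regexp as quoted with a variant whose second class refuses only the
period. The variant is just my assumption of how the change could
look, and all `my-...' names are invented for this example.

```elisp
;; Regexp as quoted in this thread.
(defconst my-strict-regexp
  (concat "[^]\t\n \"'()<>[^`{}]*"
          "[^]\t\n \"'()<>[^`{}.,;]+"))

;; Hypothetical variant: `;' and `,' dropped from the second class.
(defconst my-relaxed-regexp
  (concat "[^]\t\n \"'()<>[^`{}]*"
          "[^]\t\n \"'()<>[^`{}.]+"))

(defun my-match-prefix (regexp string)
  "Return the prefix of STRING matched by REGEXP, or nil."
  (when (string-match (concat "\\`" regexp) string)
    (match-string 0 string)))

(my-match-prefix my-strict-regexp  "www.gnu.org/a;b;")  ; => "www.gnu.org/a;b"
(my-match-prefix my-relaxed-regexp "www.gnu.org/a;b;")  ; => "www.gnu.org/a;b;"
(my-match-prefix my-strict-regexp  "www.gnu.org/a,")    ; => "www.gnu.org/a"
(my-match-prefix my-relaxed-regexp "www.gnu.org/a,")    ; => "www.gnu.org/a,"
(my-match-prefix my-relaxed-regexp "www.gnu.org/a.")    ; => "www.gnu.org/a"
```

The last but one line shows the price: a comma that merely follows a
URL in running text would then be taken as part of the URL.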

Other problems:

- The char ' (39, #o47, #x27) currently seems to be excluded, whereas
the RFC mentions it as a sub-delimiter too.

- (defvar thing-at-point-short-url-regexp
  (concat "[-A-Za-z0-9.]+" thing-at-point-url-path-regexp)

lacks the underscore in its bracket expression (unreserved according
to the RFC).
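
Both points can be tried out with the sketch below. The host class
with the added underscore is only my suggestion, the host names are
invented, and `my-url-path-regexp' and `my-short-url-match' are
illustrative names that do not exist in thingatpt.el.

```elisp
;; Path regexp as quoted in this thread.
(defconst my-url-path-regexp
  (concat "[^]\t\n \"'()<>[^`{}]*"
          "[^]\t\n \"'()<>[^`{}.,;]+"))

(defun my-short-url-match (host-class string)
  "Return the prefix of STRING matched by HOST-CLASS plus the path regexp."
  (when (string-match (concat "\\`" host-class my-url-path-regexp) string)
    (match-string 0 string)))

;; A leading underscore is refused by the current host class ...
(my-short-url-match "[-A-Za-z0-9.]+"  "_dmarc.example.org")  ; => nil
;; ... and accepted once "_" is added (hypothetical change).
(my-short-url-match "[-A-Za-z0-9._]+" "_dmarc.example.org")  ; => "_dmarc.example.org"
;; Further right the path part already accepts "_", so the gap only
;; shows at the very start of the match.
(my-short-url-match "[-A-Za-z0-9.]+"  "my_host.example.org") ; => "my_host.example.org"
;; The apostrophe is refused by both classes and cuts the match short.
(my-short-url-match "[-A-Za-z0-9.]+"  "example.org/o'neil")  ; => "example.org/o"
```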



Andreas
