unofficial mirror of emacs-devel@gnu.org 
* Re: Adding email address support to thingatpt.el.
  2007-02-25 19:25 Adding email address support to thingatpt.el Karl Fogel
@ 2007-02-25 10:50 ` Andreas Schwab
  2007-02-26  3:27 ` Richard Stallman
  1 sibling, 0 replies; 10+ messages in thread
From: Andreas Schwab @ 2007-02-25 10:50 UTC (permalink / raw)
  To: Karl Fogel; +Cc: emacs-devel

Karl Fogel <kfogel@red-bean.com> writes:

> +;;   Email addresses
> +(defvar thing-at-point-email-regexp
> +  "<\\{0,1\\}[-+_.~a-zA-Z][-+_.~:a-zA-Z0-9]+@[-.a-zA-Z0-9]+>\\{0,1\\}"  
      ^^^^^^^^^^                                              ^^^^^^^^^^
I'd write that as "<?" and ">?" which is easier to grok.

Andreas.
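
For readers comparing the two spellings: in Emacs regexps the interval
`\{0,1\}' and the postfix `?' both mean "zero or one of the preceding
item", so the suggested change should be purely cosmetic.  A minimal
sketch to check that on a few samples; the names my-first-match,
my-verbose-email-regexp, my-terse-email-regexp and my-regexps-agree-p
are hypothetical and not part of the patch:

```elisp
;; Hypothetical helper, not part of thingatpt.el.
(defun my-first-match (regexp string)
  "Return the text of the first match of REGEXP in STRING, or nil."
  (when (string-match regexp string)
    (match-string 0 string)))

;; The regexp as first posted, using interval syntax for the brackets.
(defconst my-verbose-email-regexp
  "<\\{0,1\\}[-+_.~a-zA-Z][-+_.~:a-zA-Z0-9]+@[-.a-zA-Z0-9]+>\\{0,1\\}")

;; The same regexp with the optional brackets written as <? and >?.
(defconst my-terse-email-regexp
  "<?[-+_.~a-zA-Z][-+_.~:a-zA-Z0-9]+@[-.a-zA-Z0-9]+>?")

(defun my-regexps-agree-p (samples)
  "Return non-nil if both regexps give the same first match on all SAMPLES."
  (let ((agree t))
    (dolist (s samples agree)
      (unless (equal (my-first-match my-verbose-email-regexp s)
                     (my-first-match my-terse-email-regexp s))
        (setq agree nil)))))
```

Evaluating (my-regexps-agree-p '("<kfogel@red-bean.com>" "no address here"))
compares the two forms on an address in brackets and on a string with
no address at all.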

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Adding email address support to thingatpt.el.
@ 2007-02-25 19:25 Karl Fogel
  2007-02-25 10:50 ` Andreas Schwab
  2007-02-26  3:27 ` Richard Stallman
  0 siblings, 2 replies; 10+ messages in thread
From: Karl Fogel @ 2007-02-25 19:25 UTC (permalink / raw)
  To: emacs-devel

I was surprised to find that thingatpt.el doesn't support email
addresses, so made the patch below.  I'd like comments before
committing, though, as I've never modified thingatpt.el before.

If we're in a freeze, this can wait.  (I can't tell if we're in a
freeze or not, from looking at the Savannah pages for Emacs.)

-Karl

2007-02-25  Karl Fogel  <kfogel@red-bean.com>

   * thingatpt.el: Add support for email addresses (`email').
   (thing-at-point, bounds-of-thing-at-point): Document `email' support.
   (thing-at-point-email-regexp): New variable.
   (`email'): Put `bounds-of-thing-at-point' and `thing-at-point'
   properties on this symbol, with lambda forms for values.

Index: lisp/thingatpt.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/thingatpt.el,v
retrieving revision 1.40
diff -u -r1.40 thingatpt.el
--- lisp/thingatpt.el	21 Jan 2007 03:53:10 -0000	1.40
+++ lisp/thingatpt.el	25 Feb 2007 10:13:49 -0000
@@ -67,7 +67,7 @@
   "Determine the start and end buffer locations for the THING at point.
 THING is a symbol which specifies the kind of syntactic entity you want.
 Possibilities include `symbol', `list', `sexp', `defun', `filename', `url',
-`word', `sentence', `whitespace', `line', `page' and others.
+`email', `word', `sentence', `whitespace', `line', `page' and others.
 
 See the file `thingatpt.el' for documentation on how to define
 a symbol as a valid THING.
@@ -124,7 +124,7 @@
   "Return the THING at point.
 THING is a symbol which specifies the kind of syntactic entity you want.
 Possibilities include `symbol', `list', `sexp', `defun', `filename', `url',
-`word', `sentence', `whitespace', `line', `page' and others.
+`email', `word', `sentence', `whitespace', `line', `page' and others.
 
 See the file `thingatpt.el' for documentation on how to define
 a symbol as a valid THING."
@@ -340,6 +340,33 @@
              (goto-char (car bounds))
            (error "No URL here")))))
 
+;;   Email addresses
+(defvar thing-at-point-email-regexp
+  "<\\{0,1\\}[-+_.~a-zA-Z][-+_.~:a-zA-Z0-9]+@[-.a-zA-Z0-9]+>\\{0,1\\}"  
+  "A regular expression probably matching an email address.
+This does not match the real name portion, only the address, optionally
+with angle brackets.")
+
+;; Haven't set 'forward-op on 'email nor defined 'forward-email' because
+;; not sure they're actually needed, and URL seems to skip them too.
+;; Note that (end-of-thing 'email) and (beginning-of-thing 'email)
+;; work automagically, though.
+
+(put 'email 'bounds-of-thing-at-point
+     (lambda ()
+       (let ((thing (thing-at-point-looking-at thing-at-point-email-regexp)))
+         (if thing
+             (let ((beginning (match-beginning 0))
+                   (end (match-end 0)))
+               (cons beginning end))))))
+
+(put 'email 'thing-at-point
+     (lambda ()
+       (let ((boundary-pair (bounds-of-thing-at-point 'email)))
+         (if boundary-pair
+             (buffer-substring-no-properties
+              (car boundary-pair) (cdr boundary-pair))))))
+
 ;;  Whitespace
 
 (defun forward-whitespace (arg)


* Re: Adding email address support to thingatpt.el.
  2007-02-25 19:25 Adding email address support to thingatpt.el Karl Fogel
  2007-02-25 10:50 ` Andreas Schwab
@ 2007-02-26  3:27 ` Richard Stallman
  2007-02-26 13:31   ` Karl Fogel
  1 sibling, 1 reply; 10+ messages in thread
From: Richard Stallman @ 2007-02-26  3:27 UTC (permalink / raw)
  To: Karl Fogel; +Cc: emacs-devel

Please suggest this again a couple of months after the 22.1 release.


* Re: Adding email address support to thingatpt.el.
  2007-02-26  3:27 ` Richard Stallman
@ 2007-02-26 13:31   ` Karl Fogel
  2007-02-26 18:00     ` Andreas Roehler
  0 siblings, 1 reply; 10+ messages in thread
From: Karl Fogel @ 2007-02-26 13:31 UTC (permalink / raw)
  To: rms; +Cc: emacs-devel

Richard Stallman <rms@gnu.org> writes:
> Please suggest this again a couple of months after the 22.1 release.

Okay, I'll try to remember to do that.

I'll incorporate Andreas Schwab's regexp suggestion in the meantime.

-Karl


* Re: Adding email address support to thingatpt.el.
  2007-02-26 13:31   ` Karl Fogel
@ 2007-02-26 18:00     ` Andreas Roehler
  2007-02-27 12:08       ` Karl Fogel
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Roehler @ 2007-02-26 18:00 UTC (permalink / raw)
  To: Karl Fogel; +Cc: emacs-devel


Just for consideration:

Email at point might be used to pick emails from a
CSV database. Then `;' and `,' as delimiters should be
possible together with or instead of angle brackets.

AFAIU rfc2822, several more chars are allowed to be
part of an email address than the regexp honours now:

,----
| addr-spec       =       local-part "@" domain
|
| local-part      =       dot-atom / quoted-string / obs-local-part
|
| domain          =       dot-atom / domain-literal / obs-domain
|
| domain-literal  =       [CFWS] "[" *([FWS] dcontent) [FWS] "]" [CFWS]
|
| dcontent        =       dtext / quoted-pair
|
| dtext           =       NO-WS-CTL /     ; Non white space controls
|
|                         %d33-90 /       ; The rest of the US-ASCII
|                         %d94-126        ;  characters not including "[",
|                                         ;  "]", or "\"
`----


__
Andreas Roehler


* Re: Adding email address support to thingatpt.el.
  2007-02-27 12:08       ` Karl Fogel
@ 2007-02-27  7:37         ` Andreas Roehler
  2007-02-27 14:49           ` Drew Adams
  2007-02-27 17:19           ` Karl Fogel
  0 siblings, 2 replies; 10+ messages in thread
From: Andreas Roehler @ 2007-02-27  7:37 UTC (permalink / raw)
  To: Karl Fogel; +Cc: emacs-devel


>> AFAIU rfc2822, several more chars are allowed to be
>> part of an email address than the regexp honours now:
>
> It's tough to know what to include.  Many characters that technically
> could be part of an email address are rarely used in practice, and
> instead appear much more often as delimiters (in certain contexts).
> So if thingatpt.el is to Do The Right Thing most often for the user,
> it probably can't comply precisely with the RFC.


If Emacs is more restrictive than RFC2822, errors and
bug-reports are ahead.

BTW: Does RFC2822 indeed require at least two chars before
the `@'? I'm not sure about it, but can't see that.

So the core of my regexp, taking the ranges from RFC, is this:

[\041-\132\136-\176]+@[\041-\132\136-\176]+


__
Andreas Roehler
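
To make the trade-off concrete, here is a rough sketch comparing the
RFC-range regexp above with the regexp from the patch.  It assumes the
`\041' style numbers are meant as octal escapes inside a Lisp string
(so the class is `!' through `Z' plus `^' through `~'); the names
my-first-match, my-rfc-range-regexp and my-practical-email-regexp are
hypothetical:

```elisp
;; Hypothetical helper, not part of thingatpt.el.
(defun my-first-match (regexp string)
  "Return the text of the first match of REGEXP in STRING, or nil."
  (when (string-match regexp string)
    (match-string 0 string)))

;; Character ranges taken from the message above, written as octal
;; string escapes: \041-\132 is `!'..`Z', \136-\176 is `^'..`~'.
;; Note that this class also contains `,', `;', `(', `)', `<', `>'
;; and `@' itself.
(defconst my-rfc-range-regexp
  "[\041-\132\136-\176]+@[\041-\132\136-\176]+")

;; The narrower regexp from the patch under discussion.
(defconst my-practical-email-regexp
  "<?[-+_.~a-zA-Z][-+_.~:a-zA-Z0-9]+@[-.a-zA-Z0-9]+>?")

(defun my-compare-email-regexps (string)
  "Return a list (RFC-MATCH PRACTICAL-MATCH) for STRING."
  (list (my-first-match my-rfc-range-regexp string)
        (my-first-match my-practical-email-regexp string)))
```

Because the RFC-derived class includes common delimiters, calling
(my-compare-email-regexps "ann@b.org,bob@d.org") lets you see how far
each regexp extends across the comma in a CSV-style line.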


* Re: Adding email address support to thingatpt.el.
  2007-02-26 18:00     ` Andreas Roehler
@ 2007-02-27 12:08       ` Karl Fogel
  2007-02-27  7:37         ` Andreas Roehler
  0 siblings, 1 reply; 10+ messages in thread
From: Karl Fogel @ 2007-02-27 12:08 UTC (permalink / raw)
  To: Andreas Roehler; +Cc: emacs-devel

Andreas Roehler <andreas.roehler@easy-emacs.de> writes:
> Just for consideration:
>
> Email at point might be used to pick emails from a
> CSV database. Then `;' and `,' as delimiters should be
> possible together with or instead of angle brackets.

It's okay, they're already treated as boundaries, because they're not
legal in the email address.  (I tested just now to make sure.)

Unless you mean they should be returned *as part of* the email
address, like ",andreas.roehler@easy-emacs.de,"?  But that wouldn't be
good -- commas and semicolons are not the same as angle brackets in
that respect.

> AFAIU rfc2822, several more chars are allowed to be
> part of an email address than the regexp honours now:

It's tough to know what to include.  Many characters that technically
could be part of an email address are rarely used in practice, and
instead appear much more often as delimiters (in certain contexts).
So if thingatpt.el is to Do The Right Thing most often for the user,
it probably can't comply precisely with the RFC.

I'm including the latest patch below, for reference, but I won't do
anything with it until after the release.

-Karl

2007-02-25  Karl Fogel  <kfogel@red-bean.com>

   * thingatpt.el: Add support for email addresses (`email').
   (thing-at-point, bounds-of-thing-at-point): Document `email' support.
   (thing-at-point-email-regexp): New variable.
   (`email'): Put `bounds-of-thing-at-point' and `thing-at-point'
   properties on this symbol, with lambda forms for values.

Index: thingatpt.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/thingatpt.el,v
retrieving revision 1.40
diff -u -r1.40 thingatpt.el
--- thingatpt.el	21 Jan 2007 03:53:10 -0000	1.40
+++ thingatpt.el	27 Feb 2007 03:07:51 -0000
@@ -67,7 +67,7 @@
   "Determine the start and end buffer locations for the THING at point.
 THING is a symbol which specifies the kind of syntactic entity you want.
 Possibilities include `symbol', `list', `sexp', `defun', `filename', `url',
-`word', `sentence', `whitespace', `line', `page' and others.
+`email', `word', `sentence', `whitespace', `line', `page' and others.
 
 See the file `thingatpt.el' for documentation on how to define
 a symbol as a valid THING.
@@ -124,7 +124,7 @@
   "Return the THING at point.
 THING is a symbol which specifies the kind of syntactic entity you want.
 Possibilities include `symbol', `list', `sexp', `defun', `filename', `url',
-`word', `sentence', `whitespace', `line', `page' and others.
+`email', `word', `sentence', `whitespace', `line', `page' and others.
 
 See the file `thingatpt.el' for documentation on how to define
 a symbol as a valid THING."
@@ -340,6 +340,33 @@
              (goto-char (car bounds))
            (error "No URL here")))))
 
+;;   Email addresses
+(defvar thing-at-point-email-regexp
+  "<?[-+_.~a-zA-Z][-+_.~:a-zA-Z0-9]+@[-.a-zA-Z0-9]+>?"
+  "A regular expression probably matching an email address.
+This does not match the real name portion, only the address, optionally
+with angle brackets.")
+
+;; Haven't set 'forward-op on 'email nor defined 'forward-email' because
+;; not sure they're actually needed, and URL seems to skip them too.
+;; Note that (end-of-thing 'email) and (beginning-of-thing 'email)
+;; work automagically, though.
+
+(put 'email 'bounds-of-thing-at-point
+     (lambda ()
+       (let ((thing (thing-at-point-looking-at thing-at-point-email-regexp)))
+         (if thing
+             (let ((beginning (match-beginning 0))
+                   (end (match-end 0)))
+               (cons beginning end))))))
+
+(put 'email 'thing-at-point
+     (lambda ()
+       (let ((boundary-pair (bounds-of-thing-at-point 'email)))
+         (if boundary-pair
+             (buffer-substring-no-properties
+              (car boundary-pair) (cdr boundary-pair))))))
+
 ;;  Whitespace
 
 (defun forward-whitespace (arg)
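
For anyone who wants to try the idea without patching thingatpt.el,
the same mechanism can be sketched under a private THING name.  This
is an illustration only: the symbol `my-email', the variable
my-email-regexp and the function my-email-at are hypothetical, chosen
so that nothing in thingatpt.el is overwritten.

```elisp
(require 'thingatpt)

;; Hypothetical copy of the regexp from the patch above.
(defvar my-email-regexp
  "<?[-+_.~a-zA-Z][-+_.~:a-zA-Z0-9]+@[-.a-zA-Z0-9]+>?"
  "A regular expression probably matching an email address.")

;; Register bounds for the hypothetical THING `my-email'.  With only
;; this property set, `thing-at-point' falls back to returning the
;; buffer text between the bounds.
(put 'my-email 'bounds-of-thing-at-point
     (lambda ()
       (when (thing-at-point-looking-at my-email-regexp)
         (cons (match-beginning 0) (match-end 0)))))

(defun my-email-at (text offset)
  "Insert TEXT into a temporary buffer and return the email at OFFSET.
OFFSET counts characters from the start of TEXT.  Return nil if point
is not on something matching `my-email-regexp'."
  (with-temp-buffer
    (insert text)
    (goto-char (+ (point-min) offset))
    (thing-at-point 'my-email)))
```

For example, (my-email-at "write to kfogel@red-bean.com today" 12)
places point inside the address and asks for the `my-email' thing
there; an offset of 0 places point on the first word instead.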


* RE: Adding email address support to thingatpt.el.
  2007-02-27  7:37         ` Andreas Roehler
@ 2007-02-27 14:49           ` Drew Adams
  2007-02-27 20:49             ` Karl Fogel
  2007-02-27 17:19           ` Karl Fogel
  1 sibling, 1 reply; 10+ messages in thread
From: Drew Adams @ 2007-02-27 14:49 UTC (permalink / raw)
  To: emacs-devel

> >> AFAIU rfc2822, several more chars are allowed to be
> >> part of an email address than the regexp honours now:
> >
> > It's tough to know what to include.  Many characters that technically
> > could be part of an email address are rarely used in practice, and
> > instead appear much more often as delimiters (in certain contexts).
> > So if thingatpt.el is to Do The Right Thing most often for the user,
> > it probably can't comply precisely with the RFC.
>
> If Emacs is more restrictive than RFC2822, errors and
> bug-reports are ahead.

I haven't followed all of this, and I have no special knowledge of this. It
sounds as if:

- The spec allows stuff that most people don't use, and that many people and
programs try to interpret as delimiters.

- If the code fits the spec completely, then many users would be
inconvenienced.

- If the code doesn't fit the spec completely, then some people will
complain and file bugs.

Why not have the code do both, with a user option to choose the behavior you
want? Then pick the default value of the option to inconvenience the fewest
users.


* Re: Adding email address support to thingatpt.el.
  2007-02-27  7:37         ` Andreas Roehler
  2007-02-27 14:49           ` Drew Adams
@ 2007-02-27 17:19           ` Karl Fogel
  1 sibling, 0 replies; 10+ messages in thread
From: Karl Fogel @ 2007-02-27 17:19 UTC (permalink / raw)
  To: Andreas Roehler; +Cc: emacs-devel

Andreas Roehler <andreas.roehler@easy-emacs.de> writes:
> If Emacs is more restrictive than RFC2822, errors and
> bug-reports are ahead.

They may be ahead either way :-).  But let's take it up after the
release.

> BTW: Does RFC2822 indeed require at least two chars before
> the `@'? I'm not sure about it, but can't see that.

No.  s{_AT_}x.org is a real email address (substitute "@" of course).
So my regexp is wrong, that should have been a "*" not a "+".  Thanks!

> So the core of my regexp, taking the ranges from RFC, is this:
> [\041-\132\136-\176]+@[\041-\132\136-\176]+

*nod*

Okay.  I'm going to sit until after the release, but let's pick it up
here then.

Best,
-Karl
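
The "+" versus "*" point can be checked directly.  A small sketch,
using a one-character local part with the "@" substituted back in as
suggested above; the names my-first-match, my-plus-email-regexp and
my-star-email-regexp are hypothetical:

```elisp
;; Hypothetical helper, not part of thingatpt.el.
(defun my-first-match (regexp string)
  "Return the text of the first match of REGEXP in STRING, or nil."
  (when (string-match regexp string)
    (match-string 0 string)))

;; As in the patch: one leading character followed by one or more
;; further characters, so the local part needs at least two characters.
(defconst my-plus-email-regexp
  "<?[-+_.~a-zA-Z][-+_.~:a-zA-Z0-9]+@[-.a-zA-Z0-9]+>?")

;; With "*" the further characters become optional, so a
;; single-character local part is accepted.
(defconst my-star-email-regexp
  "<?[-+_.~a-zA-Z][-+_.~:a-zA-Z0-9]*@[-.a-zA-Z0-9]+>?")

(defun my-compare-plus-and-star (address)
  "Return a list (PLUS-MATCH STAR-MATCH) for ADDRESS."
  (list (my-first-match my-plus-email-regexp address)
        (my-first-match my-star-email-regexp address)))
```

Calling (my-compare-plus-and-star "s@x.org") shows the difference; for
addresses with longer local parts the two regexps should agree.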


* Re: Adding email address support to thingatpt.el.
  2007-02-27 14:49           ` Drew Adams
@ 2007-02-27 20:49             ` Karl Fogel
  0 siblings, 0 replies; 10+ messages in thread
From: Karl Fogel @ 2007-02-27 20:49 UTC (permalink / raw)
  To: Drew Adams; +Cc: emacs-devel

"Drew Adams" <drew.adams@oracle.com> writes:
> I haven't followed all of this, and I have no special knowledge of this. It
> sounds as if:
>
> - The spec allows stuff that most people don't use, and that many people and
> programs try to interpret as delimiters.
>
> - If the code fits the spec completely, then many users would be
> inconvenienced.
>
> - If the code doesn't fit the spec completely, then some people will
> complain and file bugs.
>
> Why not have the code do both, with a user option to choose the behavior you
> want? Then pick the default value of the option to inconvenience the fewest
> users.

Sounds like the best plan to me, given that we must compromise between
opposing needs here.
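
One way such an option might look, as a sketch only.  Nothing here was
proposed in the thread in this form: the group and option names
(my-email-at-point, my-email-at-point-style), the two style symbols
and the helper functions are all hypothetical.  The `practical' entry
uses the "*" form agreed on earlier in the thread, and the `rfc-range'
entry uses the octal ranges suggested from RFC 2822.

```elisp
;; Hypothetical customization group for this sketch.
(defgroup my-email-at-point nil
  "Sketch of a user option for email-at-point matching."
  :group 'convenience)

;; Hypothetical user option selecting which regexp is used.
(defcustom my-email-at-point-style 'practical
  "Which email regexp to use.
`practical' treats commas, parentheses and similar characters as
delimiters; `rfc-range' accepts the wider character ranges."
  :type '(choice (const practical) (const rfc-range))
  :group 'my-email-at-point)

(defconst my-email-at-point-regexps
  '((practical . "<?[-+_.~a-zA-Z][-+_.~:a-zA-Z0-9]*@[-.a-zA-Z0-9]+>?")
    (rfc-range . "[\041-\132\136-\176]+@[\041-\132\136-\176]+"))
  "Alist mapping each style symbol to its regexp.")

(defun my-email-at-point-regexp ()
  "Return the regexp selected by `my-email-at-point-style'."
  (cdr (assq my-email-at-point-style my-email-at-point-regexps)))

(defun my-email-at-point-find (string)
  "Return the first email-like match in STRING for the current style."
  (when (string-match (my-email-at-point-regexp) string)
    (match-string 0 string)))
```

The default is set to `practical' here on the assumption that it
inconveniences the fewest users; a caller who wants the wider ranges
can bind or customize my-email-at-point-style to `rfc-range'.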

Thread overview: 10+ messages
2007-02-25 19:25 Adding email address support to thingatpt.el Karl Fogel
2007-02-25 10:50 ` Andreas Schwab
2007-02-26  3:27 ` Richard Stallman
2007-02-26 13:31   ` Karl Fogel
2007-02-26 18:00     ` Andreas Roehler
2007-02-27 12:08       ` Karl Fogel
2007-02-27  7:37         ` Andreas Roehler
2007-02-27 14:49           ` Drew Adams
2007-02-27 20:49             ` Karl Fogel
2007-02-27 17:19           ` Karl Fogel
