unofficial mirror of help-gnu-emacs@gnu.org
* Bug in mail-extract-address-components (mail-extr.el)?
From: Reiner Steib @ 2002-09-23 12:25 UTC (permalink / raw)


In Gnus, I use a `message-citation-line-function' that extracts the
full name of the previous poster with the function
`mail-extract-address-components' [1]:

,----[ C-h f mail-extract-address-components RET ]
| mail-extract-address-components is a compiled Lisp function in `mail-extr'.
| (mail-extract-address-components ADDRESS &optional ALL)
|
| Given an RFC-822 address ADDRESS, extract full name and canonical address.
| Returns a list of the form (FULL-NAME CANONICAL-ADDRESS).
| If no name can be extracted, FULL-NAME will be nil.
| [...]
`----

Recently I noticed that the function fails for the following From:
line (which seems to be valid according to RFC-822):

| From: "Harald H.-J. Bongartz" <bongie@gmx.net>

Instead of "Harald H.-J. Bongartz" I get "Harald H.":

ELISP> (require 'mail-extr)
mail-extr
ELISP> (setq email "\"Harald H.-J. Bongartz\" <bongie@gmx.net>")
"\"Harald H.-J. Bongartz\" <bongie@gmx.net>"
ELISP> (setq data (mail-extract-address-components email))
("Harald H." "bongie@gmx.net")
ELISP> (car data)
"Harald H."

The error is reproducible with Emacs 21.1 and with Emacs from CVS (as of
last week). The problem seems to be the "-":

ELISP> (car (mail-extract-address-components
  "\"Harald H. J. Bongartz\" <bongie@gmx.net>"))
"Harald H. J. Bongartz"

Is this a bug in `mail-extract-address-components' or should I use a
different function to get the full name?

Bye, Reiner.

[1] My function is based on a suggestion of François Fleuret in
    news:<s02pu3weovk.fsf@wasabi.inria.fr>
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo--- PGP key available via WWW   http://rsteib.home.pages.de/


* Re: Bug in mail-extract-address-components (mail-extr.el)?
From: Jesper Harder @ 2002-09-23 12:57 UTC (permalink / raw)


Reiner Steib <4uce.02.r.steib@gmx.net> writes:

> Recently I noticed that the function fails for the following From:
> line (which seems to be valid according to RFC-822):
>
> | From: "Harald H.-J. Bongartz" <bongie@gmx.net>
>
> Instead of "Harald H.-J. Bongartz" I get "Harald H.":
>
> Is this a bug in `mail-extract-address-components' or should I use a
> different function to get the full name?

In this particular case `gnus-extract-address-components' works better:

(gnus-extract-address-components "\"Harald H.-J. Bongartz\" <bongie@gmx.net>")
==> ("Harald H.-J. Bongartz" "bongie@gmx.net")

But usually `mail-extract-address-components' is more reliable (but also
really complicated).
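
As a minimal sketch of how that could be wired into a citation line:
extract the name with `gnus-extract-address-components' and fall back
to the raw From: value when no name is found.  The names
`my-citation-name' and `my-citation-line' are made up for illustration;
this is not Reiner's actual function, only one plausible shape for it.

```elisp
(require 'gnus-util)   ; gnus-extract-address-components
(require 'nnheader)    ; mail-header-from
(require 'message)     ; message-reply-headers, message-citation-line-function

;; Hypothetical helper: not part of Gnus.
(defun my-citation-name (from)
  "Return the full name found in the From: value FROM, or FROM itself."
  (or (car (gnus-extract-address-components from))
      from))

;; Hypothetical citation function: not part of Gnus.
(defun my-citation-line ()
  "Insert a citation line naming the author of the message being replied to."
  (when message-reply-headers
    (insert (my-citation-name (mail-header-from message-reply-headers))
            " wrote:\n\n")))

(setq message-citation-line-function #'my-citation-line)
```

Keeping the name lookup in its own function makes it easy to swap the
parser later without touching the citation code.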


* Re: Bug in mail-extract-address-components (mail-extr.el)?
From: Reiner Steib @ 2002-09-23 14:29 UTC (permalink / raw)


On Mon, Sep 23 2002, Jesper Harder wrote:

> Reiner Steib <4uce.02.r.steib@gmx.net> writes:
[...]
>> Instead of "Harald H.-J. Bongartz" I get "Harald H.":
>>
>> Is this a bug in `mail-extract-address-components' or should I use a
>> different function to get the full name?
>
> In this particular case `gnus-extract-address-components' works better:
[...]
> ==> ("Harald H.-J. Bongartz" "bongie@gmx.net")

Thanks for the hint!

> But usually `mail-extract-address-components' is more reliable (but also
> really complicated).

The code of mail-e-a-c spans more than 700 lines, whereas gnus-e-a-c
has only 27 lines. Therefore it's even more surprising that mail-e-a-c
fails for the given example (assuming it's a valid RFC-822 address),
which probably occurs quite often in real life [1]. mail-e-a-c also
fails for this:

(car (mail-extract-address-components "\"K.-H. Foo\" <foo@bar.invalid>"))
==> nil
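
To check more addresses in one go, something like the following sketch
could be used.  It runs both functions over a list of addresses and
returns the extracted names side by side.  The function name
`my-compare-name-extraction' is hypothetical, and what mail-e-a-c
returns may well differ between Emacs versions.

```elisp
(require 'mail-extr)   ; mail-extract-address-components
(require 'gnus-util)   ; gnus-extract-address-components

;; Hypothetical helper for comparing the two parsers.
(defun my-compare-name-extraction (addresses)
  "Return a list of (ADDRESS MAIL-EXTR-NAME GNUS-NAME) for each of ADDRESSES."
  (mapcar (lambda (addr)
            (list addr
                  (car (mail-extract-address-components addr))
                  (car (gnus-extract-address-components addr))))
          addresses))

;; Example call with the two addresses from this thread:
(my-compare-name-extraction
 '("\"Harald H.-J. Bongartz\" <bongie@gmx.net>"
   "\"K.-H. Foo\" <foo@bar.invalid>"))
```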

Bye, Reiner.

[1] At least in Germany such names are not so rare: Abbreviated forms
    of Karl-Heinz, ...
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo--- PGP key available via WWW   http://rsteib.home.pages.de/


* Re: Bug in mail-extract-address-components (mail-extr.el)?
From: lawrence mitchell @ 2002-09-23 14:48 UTC (permalink / raw)



[...] mail-e-a-c vs gnus-e-a-c.

Jesper Harder commented:
>> But usually `mail-extract-address-components' is more reliable (but also
>> really complicated).

To which Reiner Steib responded:
> The code of mail-e-a-c spans more than 700 lines, whereas gnus-e-a-c
> has only 27 lines. Therefore it's even more surprising that mail-e-a-c
> fails for the given example (assuming it's a valid RFC-822 address),
> which probably occurs quite often in real life [1]. mail-e-a-c also
> fails for this:

> (car (mail-extract-address-components "\"K.-H. Foo\" <foo@bar.invalid>"))
> ==> nil

mail-e-a-c also fails when the name/comment part of the email address
is a single word:

(mail-extract-address-components "lawrence <foo@bar.com>")
    => (nil "foo@bar.com")

Which, by my reading of RFC 2822, is a valid address form (ICBW).

Time for a bug report, I wonder?

-- 
lawrence mitchell <wence@gmx.li>


* Re: Bug in mail-extract-address-components (mail-extr.el)?
From: Simon Josefsson @ 2002-09-24 21:30 UTC (permalink / raw)


lawrence mitchell <wence@gmx.li> writes:

> [...] mail-e-a-c vs gnus-e-a-c.
>
> Jesper Harder commented:
>>> But usually `mail-extract-address-components' is more reliable (but also
>>> really complicated).
>
> To which Reiner Steib responded:
>> The code of mail-e-a-c spans more than 700 lines, whereas gnus-e-a-c
>> has only 27 lines. Therefore it's even more surprising that mail-e-a-c
>> fails for the given example (assuming it's a valid RFC-822 address),
>> which probably occurs quite often in real life [1]. mail-e-a-c also
>> fails for this:
>
>> (car (mail-extract-address-components "\"K.-H. Foo\" <foo@bar.invalid>"))
>> ==> nil
>
> mail-e-a-c also fails when the name/comment part of the email address
> is a single word:
>
> (mail-extract-address-components "lawrence <foo@bar.com>")
>     => (nil "foo@bar.com")
>
> Which, by my reading of RFC 2822, is a valid address form (ICBW).

This is a feature, see `mail-extr-ignore-single-names'.  I think the
default value is a bad choice though.
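
A minimal sketch of turning that behaviour off for a single call, by
binding the variable to nil around the call.  The wrapper name
`my-extract-keeping-single-names' is hypothetical; setting the variable
globally (e.g. via Customize) should work as well.

```elisp
(require 'mail-extr)   ; load first so the variable is known to be special

;; Hypothetical wrapper: not part of mail-extr.el.
(defun my-extract-keeping-single-names (address)
  "Like `mail-extract-address-components', but keep single-word names."
  (let ((mail-extr-ignore-single-names nil))
    (mail-extract-address-components address)))

;; Example call with the address from this thread:
(my-extract-keeping-single-names "lawrence <foo@bar.com>")
```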


* Re: Bug in mail-extract-address-components (mail-extr.el)?
From: Simon Josefsson @ 2002-09-24 22:06 UTC (permalink / raw)


Reiner Steib <4uce.02.r.steib@gmx.net> writes:

> Recently I noticed that the function fails for the following From:
> line (which seems to be valid according to RFC-822):
>
> | From: "Harald H.-J. Bongartz" <bongie@gmx.net>
>
> Instead of "Harald H.-J. Bongartz" I get "Harald H.":

Yes, mail-extr.el does (too) many things.  The code that fails in this
example is:

	 ;; Fixup initials
	 ((looking-at mail-extr-initial-pattern)
	  (or (eq (following-char) (upcase (following-char)))
	      (setq lower-case-flag t))
	  (forward-char 1)
	  (if (eq ?. (following-char))
	      (forward-char 1)
	    (insert ?.))
	  (or (eq ?\  (following-char))
	      (insert ?\ ))
	  (setq word-found-flag t))

> Is this a bug in `mail-extract-address-components' or should I use a
> different function to get the full name?

mail-extr is not a clean RFC 2822 parser, it is a heuristic parser.
There is no complete RFC 2822 parser in Emacs AFAIK, only several
heuristic ones.
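
One of the other parsers that ships with Gnus is ietf-drums.el.  The
sketch below calls `ietf-drums-parse-address', which returns a cons of
the form (MAILBOX . DISPLAY-NAME).  No claim is made here that it is a
complete RFC 2822 parser; it is shown only as another option to try on
the address from this thread.

```elisp
(require 'ietf-drums)

;; Parse the problematic From: value.  The result is a cons cell
;; (MAILBOX . DISPLAY-NAME), not a two-element list.
(let ((parsed (ietf-drums-parse-address
               "\"Harald H.-J. Bongartz\" <bongie@gmx.net>")))
  (list :mailbox (car parsed)
        :name    (cdr parsed)))
```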

A real RFC 2822 parser would be good to have; it would improve Gnus'
header encoding, which sometimes generates bad QP that causes mail to be
bounced...

