From: "Eric Wong (Contractor, The Linux Foundation)" <e@80x24.org>
To: meta@public-inbox.org
Cc: "Nicolás Ojeda Bär" <n.oje.bar@gmail.com>
Subject: [PATCH 32/34] import: fall back to Sender for extracting name and email
Date: Tue, 6 Mar 2018 08:42:40 +0000 [thread overview]
Message-ID: <20180306084242.19988-33-e@80x24.org> (raw)
In-Reply-To: <20180306084242.19988-1-e@80x24.org>
This seems like a reasonable course of action for old messages.
Cc: Nicolás Ojeda Bär <n.oje.bar@gmail.com>
---
lib/PublicInbox/Import.pm | 62 +++++++++++++++++++++++++++++------------------
1 file changed, 39 insertions(+), 23 deletions(-)
diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 7ba1668..664bec6 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -208,15 +208,50 @@ sub parse_date ($) {
"$ts $zone";
}
-# returns undef on duplicate
-# returns the :MARK of the most recent commit
-sub add {
- my ($self, $mime, $check_cb) = @_; # mime = Email::MIME
+sub extract_author_info ($) {
+ my ($mime) = @_;
+ my $sender = '';
my $from = $mime->header('From');
my ($email) = PublicInbox::Address::emails($from);
my ($name) = PublicInbox::Address::names($from);
+ if (!defined($name) || !defined($email)) {
+ $sender = $mime->header('Sender');
+ if (!defined($name)) {
+ ($name) = PublicInbox::Address::names($sender);
+ }
+ if (!defined($email)) {
+ ($email) = PublicInbox::Address::emails($sender);
+ }
+ }
+ if (defined $email) {
+ # quiet down wide character warnings with utf8::encode
+ utf8::encode($email);
+ } else {
+ $email = '';
+ warn "no email in From: $from or Sender: $sender\n";
+ }
+
+ # git gets confused with:
+ # "'A U Thor <u@example.com>' via foo" <foo@example.com>
+ # ref:
+ # <CAD0k6qSUYANxbjjbE4jTW4EeVwOYgBD=bXkSu=akiYC_CB7Ffw@mail.gmail.com>
+ if (defined $name) {
+ $name =~ tr/<>//d;
+ utf8::encode($name);
+ } else {
+ $name = '';
+ warn "no name in From: $from or Sender: $sender\n";
+ }
+ ($name, $email);
+}
+
+# returns undef on duplicate
+# returns the :MARK of the most recent commit
+sub add {
+ my ($self, $mime, $check_cb) = @_; # mime = Email::MIME
+ my ($name, $email) = extract_author_info($mime);
my $date_raw = parse_date($mime);
my $subject = $mime->header('Subject');
$subject = '(no subject)' unless defined $subject;
@@ -263,25 +298,6 @@ sub add {
print $w "reset $ref\n" or wfail;
}
- # quiet down wide character warnings with utf8::encode
- if (defined $email) {
- utf8::encode($email);
- } else {
- $email = '';
- warn "no email in From: $from\n";
- }
-
- # git gets confused with:
- # "'A U Thor <u@example.com>' via foo" <foo@example.com>
- # ref:
- # <CAD0k6qSUYANxbjjbE4jTW4EeVwOYgBD=bXkSu=akiYC_CB7Ffw@mail.gmail.com>
- if (defined $name) {
- $name =~ tr/<>//d;
- utf8::encode($name);
- } else {
- $name = '';
- warn "no name in From: $from\n";
- }
utf8::encode($subject);
print $w "commit $ref\nmark :$commit\n",
"author $name <$email> $date_raw\n",
--
EW
next prev parent reply other threads:[~2018-03-06 8:42 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-03-06 8:42 [v2 PATCH 00/34] duplicate handling, smaller Xapian DBs, date fixes Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 01/34] v2writable: delete ::Import obj when ->done Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 02/34] search: remove informational "warning" message Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 03/34] searchidx: add PID to error message when die-ing Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 04/34] content_id: special treatment for Message-Id headers Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 05/34] evcleanup: disable outside of daemon Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 06/34] v2writable: deduplicate detection on add Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 07/34] evcleanup: do not create event loop if nothing was registered Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 08/34] mid: add `mids' and `references' methods for extraction Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 09/34] content_id: use `mids' and `references' for MID extraction Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 10/34] searchidx: use new `references' method for parsing References Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 11/34] content_id: no need to be human-friendly Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 12/34] v2writable: inject new Message-IDs on true duplicates Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 13/34] search: revert to using 'Q' as a uniQue id per-Xapian conventions Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 14/34] searchidx: support indexing multiple MIDs Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 15/34] mid: be strict with References, but loose on Message-Id Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 16/34] searchidx: avoid excessive XNQ indexing with diffs Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 17/34] searchidxskeleton: add a note about locking Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 18/34] v2writable: generated Message-ID goes first Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 19/34] searchidx: use add_boolean_term for internal terms Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 20/34] searchidx: add NNTP article number as a searchable term Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 21/34] mid: truncate excessively long MIDs early Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 22/34] nntp: use NNTP article numbers for lookups Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 23/34] nntp: fix NEWNEWS command Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 24/34] searchidx: store the primary MID in doc data for NNTP Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 25/34] import: consolidate object info for v2 imports Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 26/34] v2: avoid redundant/repeated configs for git partition repos Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 27/34] INSTALL: document more optional dependencies Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 28/34] search: favor skeleton DB for lookup_mail Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 29/34] search: each_smsg_by_mid uses skeleton if available Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 30/34] v2writable: remove unnecessary skeleton commit Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 31/34] favor Received: date over Date: header globally Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` Eric Wong (Contractor, The Linux Foundation) [this message]
2018-03-06 8:42 ` [PATCH 33/34] scripts/import_vger_from_mbox: perform mboxrd or mboxo escaping Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:42 ` [PATCH 34/34] v2writable: detect and use previous partition count Eric Wong (Contractor, The Linux Foundation)
2018-03-06 8:53 ` [v2 PATCH 00/34] duplicate handling, smaller Xapian DBs, date fixes Eric Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://public-inbox.org/README
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180306084242.19988-33-e@80x24.org \
--to=e@80x24.org \
--cc=meta@public-inbox.org \
--cc=n.oje.bar@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).