* RFC: marking spam via refs/notes/spam to hide it
From: Konstantin Ryabitsev @ 2019-06-27 18:42 UTC (permalink / raw)
To: meta
Greetings:
I'm reluctant to delete spam because it rebases the repository -- for
large ones this can cause excessive downloads to mirrors. A thought
occurred to me -- would it make sense to just hide spam from the
frontend? E.g.:
    public-inbox-hide linux-kernel message@id
This would do the following:
- remove that message from search databases
- attach a refs/notes/spam git-note to that commit
- tell public-inbox-init/reindex to ignore this commit in the future
Seems like it would be easy to do and would give a way to remove spam
without needing to edit git history.
-K
* Re: RFC: marking spam via refs/notes/spam to hide it
From: Eric Wong @ 2019-06-27 18:52 UTC (permalink / raw)
To: Konstantin Ryabitsev; +Cc: meta
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> Greetings:
>
> I'm reluctant to delete spam because it rebases the repository -- for large
> ones this can cause excessive downloads to mirrors. A thought occurred to me
> -- would it make sense to just hide spam from the frontend? E.g.:
>
> public-inbox-hide linux-kernel message@id
>
> This would do the following:
>
> - remove that message from search databases
> - attach a refs/notes/spam git-note to that commit
> - tell public-inbox-init/reindex to ignore this commit in the future
Aside from the git note, public-inbox-learn already does that:
    public-inbox-learn spam </path/to/message
(scans everything in ~/.public-inbox/config since spam is
frequently cross-posted)
I've been using it since the earliest days of the project and
frequently need it for the git@vger mirror.
It's also wired into -watch via watchspam (but the sa-learn step
to train spamassassin is broken atm).
> Seems like it would be easy to do and would give a way to remove spam
> without needing to edit git history.
It appends to git history: v2 changes the 'm' file to a 'd'
file with the corresponding blob; v1 removes the file from the
tree. It doesn't add blobs to git history, but there'll be
new tree and commit objects. There's no rebasing at all.
public-inbox-index has always handled unindexing it in mirrors,
too.
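The append-only behavior described above can be checked on a mirror. A
minimal sketch, assuming git is installed and that the branch tip was
recorded before the spam removal; `is_append_only` is a hypothetical
helper name, not part of public-inbox. It relies only on
`git merge-base --is-ancestor`, which succeeds when the first commit is
an ancestor of the second:

```shell
#!/bin/sh
# Hypothetical helper (not part of public-inbox).
# Succeeds if commit OLD is an ancestor of commit NEW in the repository
# at GIT_DIR, i.e. history between the two was appended to, not rewritten.
# Usage: is_append_only GIT_DIR OLD NEW
is_append_only() {
    git --git-dir="$1" merge-base --is-ancestor "$2" "$3"
}
```

If this succeeds for the old and new tips, a mirror only has to fetch the
new commit and tree objects on its next update.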
* Re: RFC: marking spam via refs/notes/spam to hide it
From: Konstantin Ryabitsev @ 2019-06-27 18:57 UTC (permalink / raw)
To: Eric Wong; +Cc: meta
On Thu, Jun 27, 2019 at 06:52:36PM +0000, Eric Wong wrote:
>> I'm reluctant to delete spam because it rebases the repository -- for large
>> ones this can cause excessive downloads to mirrors. A thought occurred to me
>> -- would it make sense to just hide spam from the frontend? E.g.:
>>
>> public-inbox-hide linux-kernel message@id
>>
>> This would do the following:
>>
>> - remove that message from search databases
>> - attach a refs/notes/spam git-note to that commit
>> - tell public-inbox-init/reindex to ignore this commit in the future
>
>Aside from the git note, public-inbox-learn already does that:
>
> public-inbox-learn spam </path/to/message
>
> (scans everything in ~/.public-inbox/config since spam is
> frequently cross-posted)
Ah, that shows how carefully I read docs, I guess. :) Is it possible to
just specify a message-id, so that there's no extra step to dump the
spam message into a file?
>> Seems like it would be easy to do and would give a way to remove spam
>> without needing to edit git history.
>
>It appends to git history, v2 changes the 'm' file to a 'd'
>file with the corresponding blob; v1 removes the file from the
>tree. It doesn't add blobs to git history, but there'll be
>new tree and commit objects. There's no rebasing at all.
>
>public-inbox-index has always handled unindexing it in mirrors,
>too.
Right on, thanks! This is certainly along the same lines that I was
thinking.
-K
* Re: RFC: marking spam via refs/notes/spam to hide it
From: Eric Wong @ 2019-06-27 19:33 UTC (permalink / raw)
To: Konstantin Ryabitsev; +Cc: meta
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Thu, Jun 27, 2019 at 06:52:36PM +0000, Eric Wong wrote:
> > > I'm reluctant to delete spam because it rebases the repository -- for large
> > > ones this can cause excessive downloads to mirrors. A thought occurred to me
> > > -- would it make sense to just hide spam from the frontend? E.g.:
> > >
> > > public-inbox-hide linux-kernel message@id
> > >
> > > This would do the following:
> > >
> > > - remove that message from search databases
> > > - attach a refs/notes/spam git-note to that commit
> > > - tell public-inbox-init/reindex to ignore this commit in the future
> >
> > Aside from the git note, public-inbox-learn already does that:
> >
> > public-inbox-learn spam </path/to/message
> >
> > (scans everything in ~/.public-inbox/config since spam is
> > frequently cross-posted)
>
> Ah, that shows how carefully I read docs, I guess. :) Is it possible to just
> specify a message-id, so that there's no extra step to dump the spam message
> into a file?
Not exactly with the Message-ID arg. It would be dangerous if
somebody malicious wanted to get you to remove a legit message
by sending a spam message which reuses a Message-ID of a legit
message. I'd definitely want to verify a message is what I'd
want to remove, first.
In theory, you could: "curl $URL_MESSAGE_ID/raw | public-inbox-learn spam";
but that's still dangerous because there are/were legit bots
(and IIRC, old git-send-email) which reused Message-IDs, too.
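One way to keep the verification step is to save the message first and
look at it before training. A minimal sketch, assuming the message has
already been saved to a file (for example with
`curl -sf "$URL_MESSAGE_ID/raw" -o msg.eml`); `review_then_learn` and the
`LEARN_CMD` override are hypothetical names for illustration, not part of
public-inbox:

```shell
#!/bin/sh
# Hypothetical helper: show the identifying headers of a saved message and
# pass it to public-inbox-learn only after explicit confirmation.
# LEARN_CMD is an assumption made for dry runs; it is not a public-inbox
# setting.
LEARN_CMD=${LEARN_CMD:-public-inbox-learn}

review_then_learn() {
    msg=$1
    # Print only the header block (everything up to the first empty line),
    # filtered down to the fields worth checking by eye.
    sed -n '1,/^$/p' "$msg" | grep -i -E '^(From|Subject|Date|Message-ID):'
    printf 'Train as spam? [y/N] ' >&2
    read -r answer
    case $answer in
        y|Y) "$LEARN_CMD" spam <"$msg" ;;
        *)   echo 'skipped'; return 1 ;;
    esac
}
```

Calling `review_then_learn msg.eml` prints the headers, waits for a
"y", and otherwise leaves every inbox untouched. Because the file on disk
is what gets trained, a second message reusing the same Message-ID cannot
be swapped in between the review and the removal.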
* Re: RFC: marking spam via refs/notes/spam to hide it
From: Konstantin Ryabitsev @ 2019-06-27 19:45 UTC (permalink / raw)
To: Eric Wong; +Cc: meta
On Thu, Jun 27, 2019 at 07:33:32PM +0000, Eric Wong wrote:
>> > Aside from the git note, public-inbox-learn already does that:
>> >
>> > public-inbox-learn spam </path/to/message
>> >
>> > (scans everything in ~/.public-inbox/config since spam is
>> > frequently cross-posted)
>>
>> Ah, that shows how carefully I read docs, I guess. :) Is it possible to just
>> specify a message-id, so that there's no extra step to dump the spam message
>> into a file?
>
>Not exactly with the Message-ID arg. It would be dangerous if
>somebody malicious wanted to get you to remove a legit message
>by sending a spam message which reuses a Message-ID of a legit
>message. I'd definitely want to verify a message is what I'd
>want to remove, first.
This makes sense, thanks. I tried it out and it works to remove spam
from the frontend, but the spamc step seems to fail with a somewhat
incongruous error code:

    spamc failed with: 18944

Any pointers where I should look to figure out which part is failing?
-K
* Re: RFC: marking spam via refs/notes/spam to hide it
From: Eric Wong @ 2019-06-27 19:50 UTC (permalink / raw)
To: Konstantin Ryabitsev; +Cc: meta
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Thu, Jun 27, 2019 at 07:33:32PM +0000, Eric Wong wrote:
> > > > Aside from the git note, public-inbox-learn already does that:
> > > >
> > > > public-inbox-learn spam </path/to/message
> > > >
> > > > (scans everything in ~/.public-inbox/config since spam is
> > > > frequently cross-posted)
> > >
> > > Ah, that shows how carefully I read docs, I guess. :) Is it possible to just
> > > specify a message-id, so that there's no extra step to dump the spam message
> > > into a file?
> >
> > Not exactly with the Message-ID arg. It would be dangerous if
> > somebody malicious wanted to get you to remove a legit message
> > by sending a spam message which reuses a Message-ID of a legit
> > message. I'd definitely want to verify a message is what I'd
> > want to remove, first.
>
> This makes sense, thanks. I tried it out and it works to remove spam from
> the frontend, but spamc step seems to fail with a somewhat incongruous error
> code:
>
> spamc failed with: 18944
Oops, it looks like $? in Perl needs to be shifted (>> 8) to get the
exit code. That gives 74, which spamc(1) says is EX_IOERR.
> Any pointers where I should look to figure out which part is failing?
Anything in syslog? You can also run spamc or sa-learn on
the message directly.
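The shift mentioned above can be tried from a shell. A minimal sketch,
assuming the usual POSIX wait-status layout that Perl's $? follows (exit
code in the high byte, terminating signal number in the low seven bits);
`decode_wait_status` is a hypothetical helper name:

```shell
#!/bin/sh
# Decode a raw wait status of the kind Perl stores in $? after running
# a child process. High byte: exit code. Low 7 bits: signal number, if any.
decode_wait_status() {
    status=$1
    echo "exit=$(( status >> 8 )) signal=$(( status & 127 ))"
}

decode_wait_status 18944   # prints "exit=74 signal=0"
```

An exit code of 74 with no signal matches the EX_IOERR reading, so the
child exited on its own rather than being killed.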
* Re: RFC: marking spam via refs/notes/spam to hide it
From: Konstantin Ryabitsev @ 2019-06-27 20:18 UTC (permalink / raw)
To: Eric Wong; +Cc: meta
On Thu, Jun 27, 2019 at 07:50:11PM +0000, Eric Wong wrote:
>> This makes sense, thanks. I tried it out and it works to remove spam from
>> the frontend, but spamc step seems to fail with a somewhat incongruous error
>> code:
>>
>> spamc failed with: 18944
>
>Oops, might be $? in Perl needs to be >> 8 to get the exit code.
>That gives 74, which spamc(1) says is EX_IOERR
That's indeed 74, and poking there led me to discover that spamd needed
--allow-tell to permit this sort of thing.
All's good now, thanks!
-K