Spam through the newsgroup gateway

* Spam through the newsgroup gateway
@ 2018-10-25 19:13 Bob Proulx
  2018-10-27 17:01 ` Garreau, Alexandre
  0 siblings, 1 reply; 10+ messages in thread
From: Bob Proulx @ 2018-10-25 19:13 UTC
  To: help-gnu-emacs

Since we have been talking about the newsgroup gateway of late...

The recent spam messages just now to the mailing list came through the
newsgroup and not the mailing list.  There isn't a way to filter it
from the mailing list since it is done upstream by Mailman outside of
our control.

It is an example of a peeve of mine with the way Mailman handles
email.  When spam enters the newsgroup it is gateway'd directly to the
mailing list bypassing spam filtering.  If it went through the spam
filtering the same as other mail then I would be okay with it.  The
only direct way to stop it is to block messages from the gateway.  And
obviously we have already talked about why that isn't desired.

Public Service Announcement: Please do not reply to spam.  If a valid
message is in reply to a spam message then it refers to it and in a
sense validates it.  To talk about spam please use an independent
thread so as not to validate the original spam.

Bob

* Re: Spam through the newsgroup gateway
       [not found] <mailman.2742.1540494841.1284.help-gnu-emacs@gnu.org>
@ 2018-10-25 20:30 ` Nuno Silva
  2018-10-25 20:41   ` Bob Proulx
  2018-10-25 20:48   ` Emanuel Berg
  0 siblings, 2 replies; 10+ messages in thread
From: Nuno Silva @ 2018-10-25 20:30 UTC
  To: help-gnu-emacs

On 2018-10-25, Bob Proulx wrote:

> Since we have been talking about the newsgroup gateway of late...
>
> The recent spam messages just now to the mailing list came through the
> newsgroup and not the mailing list.  There isn't a way to filter it
> from the mailing list since it is done upstream by Mailman outside of
> our control.
>
> It is an example of a peeve of mine with the way Mailman handles
> email.  When spam enters the newsgroup it is gateway'd directly to the
> mailing list bypassing spam filtering.  If it went through the spam
> filtering the same as other mail then I would be okay with it.  The
> only direct way to stop it is to block messages from the gateway.  And
> obviously we have already talked about why that isn't desired.
>
> Public Service Announcement: Please do not reply to spam.  If a valid
> message is in reply to a spam message then it refers to it and in a
> sense validates it.  To talk about spam please use an independent
> thread so as not to validate the original spam.

Here I only saw the two recent spam messages in the Gmane group. They
did not appear in the USENET group.

-- 
Nuno Silva


* Re: Spam through the newsgroup gateway
  2018-10-25 20:30 ` Nuno Silva
@ 2018-10-25 20:41   ` Bob Proulx
  2018-10-25 20:57     ` Emanuel Berg
  2018-10-25 20:48   ` Emanuel Berg
  1 sibling, 1 reply; 10+ messages in thread
From: Bob Proulx @ 2018-10-25 20:41 UTC
  To: help-gnu-emacs

Nuno Silva wrote:
> Here I only saw the two recent spam messages in the Gmane group. They
> did not appear in the USENET group.

My guess is that they were filtered by one of the news servers in the
mesh of hosts passing news articles around, and therefore you didn't
see them on your news server.  Here is the host Path they listed.
Anyone reading from one of those news servers would have seen the
messages there.

Path: usenet.stanford.edu!e5-v6no2874978qtr.0!news-out.google.com!o27-v6ni12010qtk.1!nntp.google.com!e5-v6no2874964qtr.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
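
For illustration, a Path header like the one above can be split on "!"
to list the servers an article passed through.  This is only a minimal
Python sketch, and the helper name path_hops is made up here.  Each
relaying server prepends its own name, so the leftmost entry is the
most recent hop and the rightmost entry is the injection end of the
path.

```python
def path_hops(path_header: str) -> list[str]:
    """Split a Usenet Path header into hops, most recent hop first."""
    value = path_header.strip()
    # Accept either the bare value or the full "Path: ..." header line.
    if value.lower().startswith("path:"):
        value = value[len("path:"):].strip()
    return [hop for hop in value.split("!") if hop]


PATH = (
    "Path: usenet.stanford.edu!e5-v6no2874978qtr.0!news-out.google.com"
    "!o27-v6ni12010qtk.1!nntp.google.com!e5-v6no2874964qtr.0"
    "!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail"
)

hops = path_hops(PATH)
for hop in hops:
    print(hop)
```

Any server named in that list had the article at some point, even if
it was filtered or canceled before reaching other servers.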

It is also possible that they were posted and then an anti-spam cancel
control (possibly an automated cancel-bot) canceled the messages from
the newsgroup.  That means anyone reading the newsgroup later would
find the spam already removed.  But by then it would have already been
passed on via the gateway to the mailing list.

Just guessing...

Bob

* Re: Spam through the newsgroup gateway
  2018-10-25 20:30 ` Nuno Silva
  2018-10-25 20:41   ` Bob Proulx
@ 2018-10-25 20:48   ` Emanuel Berg
  1 sibling, 0 replies; 10+ messages in thread
From: Emanuel Berg @ 2018-10-25 20:48 UTC
  To: help-gnu-emacs

Nuno Silva wrote:

>> Since we have been talking about the
>> newsgroup gateway of late... The recent spam
>> messages just now to the mailing list came
>> through the newsgroup and not the mailing
>> list. There isn't a way to filter it from
>> the mailing list since it is done upstream
>> by Mailman outside of our control. It is an
>> example of a peeve of mine with the way
>> Mailman handles email. When spam enters the
>> newsgroup it is gateway'd directly to the
>> mailing list bypassing spam filtering. If it
>> went through the spam filtering the same as
>> other mail then I would be okay with it.
>> The only direct way to stop it is to block
>> messages from the gateway. And obviously we
>> have already talked about why that isn't
>> desired. Public Service Announcement: Please
>> do not reply to spam. If a valid message is
>> in reply to a spam message then it refers to
>> it and in a sense validates it. To talk
>> about spam please use an independent thread
>> so as not to validate the original spam.
>
> Here I only saw the two recent spam messages in
> the Gmane group. They did not appear in the
> USENET group.

I have not seen any spam in the Gmane
group either?

-- 
underground experts united
http://user.it.uu.se/~embe8573




* Re: Spam through the newsgroup gateway
  2018-10-25 20:41   ` Bob Proulx
@ 2018-10-25 20:57     ` Emanuel Berg
  2018-10-25 22:06       ` Van L
  0 siblings, 1 reply; 10+ messages in thread
From: Emanuel Berg @ 2018-10-25 20:57 UTC
  To: help-gnu-emacs

Sorry, I didn't look closely enough. I do see
the spam in the Gmane group and in the
newsgroup. The reason I didn't see it at first
is that "solution manual" is a spammer thing
that has been around for years, so I killfiled
it long ago. Yes, it is most definitely coming
from the newsgroup!

-- 
underground experts united
http://user.it.uu.se/~embe8573

* Re: Spam through the newsgroup gateway
  2018-10-25 20:57     ` Emanuel Berg
@ 2018-10-25 22:06       ` Van L
  2018-10-26 10:57         ` Emanuel Berg
  0 siblings, 1 reply; 10+ messages in thread
From: Van L @ 2018-10-25 22:06 UTC
  To: help-gnu-emacs


> "solution manual" is a spammer thing

On my mailing list reader, I can mark the thing as junk but even the junk folder barfs it back out if I try to move it there.


* Re: Spam through the newsgroup gateway
  2018-10-25 22:06       ` Van L
@ 2018-10-26 10:57         ` Emanuel Berg
  0 siblings, 0 replies; 10+ messages in thread
From: Emanuel Berg @ 2018-10-26 10:57 UTC
  To: help-gnu-emacs

Van L wrote:

>> "solution manual" is a spammer thing
>
> On my mailing list reader, I can mark the
> thing as junk but even the junk folder barfs
> it back out if I try to move it there.

The guy who wrote whatever software it is that
propagates it forever sure did a fine job :)
I hope he ended up doing something better with
his talent than the spammer career.

-- 
underground experts united
http://user.it.uu.se/~embe8573




* Re: Spam through the newsgroup gateway
  2018-10-25 19:13 Spam through the newsgroup gateway Bob Proulx
@ 2018-10-27 17:01 ` Garreau, Alexandre
  2018-11-10 22:17   ` Bob Proulx
       [not found]   ` <mailman.3846.1541888586.1284.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 10+ messages in thread
From: Garreau, Alexandre @ 2018-10-27 17:01 UTC
  To: help-gnu-emacs

On 2018-10-25 at 13:13, Bob Proulx wrote:
> Since we have been talking about the newsgroup gateway of late...
>
> The recent spam messages just now to the mailing list came through the
> newsgroup and not the mailing list.  There isn't a way to filter it
> from the mailing list since it is done upstream by Mailman outside of
> our control.
>
> It is an example of a peeve of mine with the way Mailman handles
> email.  When spam enters the newsgroup it is gateway'd directly to the
> mailing list bypassing spam filtering.  If it went through the spam
> filtering the same as other mail then I would be okay with it.  The
> only direct way to stop it is to block messages from the gateway.  And
> obviously we have already talked about why that isn't desired.
>
> Public Service Announcement: Please do not reply to spam.  If a valid
> message is in reply to a spam message then it refers to it and in a
> sense validates it.  To talk about spam please use an independent
> thread so as not to validate the original spam.

Why so?  If nothing is sent to whoever sent the mail, will they track
the mailing list or its archive to find some other mail referring to
it, take that as encouragement, and post more spam?

Otherwise, what’s the problem with validation if it’s for a single
spam?  Let’s say someone’s antispam blocked that spam: it seems normal
to me that, whenever a discussion is about some spam that has been
relayed by the list, the user either sees the aforementioned spam, to
acknowledge the problem others are experiencing (and get a sample of
it), or does not see the thread at all, as they’re not concerned.

Ideally there should be a way to attach metadata so that when you
answer something you can mark it as spam for the people seeing your
message, such as a mail header for it.

* Re: Spam through the newsgroup gateway
  2018-10-27 17:01 ` Garreau, Alexandre
@ 2018-11-10 22:17   ` Bob Proulx
       [not found]   ` <mailman.3846.1541888586.1284.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 10+ messages in thread
From: Bob Proulx @ 2018-11-10 22:17 UTC
  To: help-gnu-emacs; +Cc: Garreau, Alexandre

Alexandre Garreau wrote:
> Bob Proulx wrote:
> > Public Service Announcement: Please do not reply to spam.  If a valid
> > message is in reply to a spam message then it refers to it and in a
> > sense validates it.  To talk about spam please use an independent
> > thread so as not to validate the original spam.
>
> Why so?

The best anti-spam engines in practice are learning engines such as
Bayesian classifiers and others.  Spam characteristics change quickly
and the human senders keep trying to be sneakier than before.  We use
no fewer than three engines: SpamAssassin, Bogofilter, and CRM114.  By
far CRM114 is the best of those three.  But there are subtle
differences that keep me playing one off the other and therefore
continuing to add engines rather than remove them.

Since they are learning engines they must be trained in order to
learn.  The best training has been training on error.  When an
engine's classification is wrong it must be corrected.

All messages are fed through the anti-spam classification engines
twice.  Once on the frontend, to classify the message and determine
whether it should be automatically discarded.  And then once again
after the messages go through the mailing list, to train on any
errors.  Since the mailing lists are relatively spam free (IMNHO), I
assume that any message through the mailing list is a desired
message.  If any of the learning engines thinks otherwise then it
triggers training to learn that message as non-spam.
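
The second pass described above can be sketched roughly as follows, in
Python.  This is only an illustration: KeywordEngine is a toy stand-in
invented for the example, not how SpamAssassin, Bogofilter, or CRM114
work internally, and retrain_on_list_traffic is a hypothetical name.
The one idea it tries to show is training on error: a message that
came through the list is assumed to be wanted, so only the engines
that disagree get retrained.

```python
class KeywordEngine:
    """Toy learning engine: remembers which tokens were seen as spam or ham."""

    def __init__(self):
        self.spam_tokens = set()
        self.ham_tokens = set()

    def classify(self, message):
        tokens = set(message.lower().split())
        spam_hits = len(tokens & self.spam_tokens)
        ham_hits = len(tokens & self.ham_tokens)
        return "spam" if spam_hits > ham_hits else "ham"

    def train(self, message, label):
        tokens = set(message.lower().split())
        if label == "spam":
            self.spam_tokens |= tokens
            self.ham_tokens -= tokens
        else:
            self.ham_tokens |= tokens
            self.spam_tokens -= tokens


def retrain_on_list_traffic(engines, message):
    """Second pass: the message went through the list, so treat it as wanted.

    Only engines that got it wrong are trained (training on error).
    Returns the number of engines that were corrected.
    """
    corrected = 0
    for engine in engines:
        if engine.classify(message) == "spam":
            engine.train(message, "ham")
            corrected += 1
    return corrected


engine = KeywordEngine()
engine.train("cheap solution manual", "spam")
print(engine.classify("solution manual for sale"))
print(retrain_on_list_traffic([engine], "solution manual for sale"))
print(engine.classify("solution manual for sale"))
```

Engines that already agree with the list are left alone, which keeps
their databases from being flooded with redundant training.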

SpamAssassin knows the structure of email, what is a header and what
is the body.  Bogofilter and CRM114 have no knowledge of email
structure and process the message as a raw file, looking at tokens in
the headers and structure and dynamically learning them as indicators
or not.  For them this includes IP addresses, email addresses, and
everything else.  Anything is open to being latched onto.

Just recently, due to our conversations about the newsgroup gateway
here, I have modified this algorithm slightly.  I now look for the
newsgroup gateway header.  If a message entered through the newsgroup
then I ignore it.  There isn't anything I can do about it.  Training
on it makes no sense.  Therefore I ignore it.  No training.  But until
recently I did train on newsgroup messages too.
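
A check of that kind might look like the following Python sketch.  The
message does not say which header is tested, so the use of a
Newsgroups header here is purely an assumption for illustration, and
the function names are hypothetical.

```python
from email import message_from_string

# Assumption: messages that entered via the news gateway carry this header.
# The actual header used on the list server is not stated in this thread.
GATEWAY_HEADER = "Newsgroups"


def came_through_gateway(raw_message: str) -> bool:
    """Return True if the message has the (assumed) gateway header."""
    msg = message_from_string(raw_message)
    return msg.get(GATEWAY_HEADER) is not None


def should_train(raw_message: str) -> bool:
    """Skip training on gatewayed messages; train on everything else."""
    return not came_through_gateway(raw_message)


gated = "From: a@example.org\nNewsgroups: gnu.emacs.help\nSubject: x\n\nbody\n"
direct = "From: b@example.org\nSubject: y\n\nbody\n"
print(should_train(gated), should_train(direct))
```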

If someone replies to the spam then the email headers, the structure
of the message and, goodness forbid, any quoted text (top posting on
the entire spam is worst) may all have been associated with spam, but
when the reply comes through the mailing list they will now be
associated with non-spam.  Training the learning engines on it will
pull the database toward thinking that that type of message, spam
though it is, is desirable on the mailing list, and they will pass it
through in the future.  It will eventually correct but may take a
while, around a month for the size of the token database we keep.
From week to week the trend in spam changes.
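
To make the drift concrete, here is a deliberately simplified Python
sketch using per-token counts.  Real engines use more elaborate
statistics; the spamminess ratio and the sample texts below are
assumptions chosen only to show the direction of the effect.

```python
from collections import Counter

spam_counts = Counter()
ham_counts = Counter()


def train(text, is_spam):
    """Count each distinct token once per message, as spam or as ham."""
    target = spam_counts if is_spam else ham_counts
    target.update(set(text.lower().split()))


def spamminess(token):
    """Fraction of training messages containing the token that were spam."""
    s, h = spam_counts[token], ham_counts[token]
    return s / (s + h) if s + h else 0.5


# The original spam is learned as spam.
train("test bank solution manual download", True)
before = spamminess("manual")

# A list reply that quotes the spam is later learned as non-spam.
train("re: please stop this solution manual spam", False)
after = spamminess("manual")

print(before, after)
```

Every quoted token that gets learned as non-spam moves a little toward
"wanted", which is why a reply quoting the whole spam is the worst
case.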

> If not sending anything to whoever sent the mail, will they
> track the mailing-list or its archive to find some other mail referring
> to it, and take this as an encouragement and post more spam?

Not likely.  I think for spammers it is mostly send and forget (like a
"fire and forget" military missile).

> Otherwise, what's the problem of validation if it's for a single spam?
> Let's say someone got their antispam block that spam: it seems to me
> normal, whenever a discussion is being about some spam that has been
> relayed by the list, that the user either see the aforementioned spam,
> to aknowledge the problem other are living (and get a sample of it), or
> not to see the thread at all, as they're not concerned.

If it is a single spam it isn't the end of the world.  It is all just
incremental, because it will be used to train the learning engines.
They will recover given enough time and good later input.  But every
little bit counts!

> Ideally there should be a way to trigger metadata so that when you
> answer to something you do while marking it as spam for people seeing
> your message, like a mail header for it.

There are systems in use where the community can vote upon messages.
They usually require multiple votes, say five, from known quality
voters, and then the message is hidden.  But mostly we see those with
web page forums.  Since this is a mailing list, installing such a
thing would require training users on how to use it.
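
The voting rule itself is simple.  A minimal Python sketch, assuming a
threshold of five and a known set of trusted voters (both the names
and the data shapes are hypothetical), could be:

```python
HIDE_THRESHOLD = 5  # "say five" votes, as in the description above


def is_hidden(votes, trusted_voters, threshold=HIDE_THRESHOLD):
    """Hide a message once enough distinct trusted voters have flagged it.

    votes: iterable of voter ids that flagged the message as spam.
    trusted_voters: ids of the known quality voters.
    """
    trusted_flags = set(votes) & set(trusted_voters)
    return len(trusted_flags) >= threshold


trusted = {"ann", "bob", "cat", "dan", "eve", "fay"}
print(is_hidden(["ann", "bob", "cat", "dan", "eve"], trusted))
print(is_hidden(["ann", "bob", "cat", "dan", "stranger"], trusted))
```

Votes from unknown voters and repeated votes from the same voter do
not count toward the threshold in this sketch.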

As another data point in this area, the Debian mailing lists have an
address to which people can "bounce" spam for further training of
their anti-spam learning engines.  It also serves as a notification to
the listmaster that a new type of spam is slipping through and needs
help to be blocked (they use procmail rules, we do too).  (Mutt has a
'b'ounce mail action, other mailers may or may not.)  We could set up
something like that but one does not exist at the moment.  With some
more work it could be useful if people were to contribute to it the
spams that slip through into the mailing list.

Sorry for the long delay in answering this message.  Life and time is
what keeps everything from happening all at once.

Bob

* Re: Spam through the newsgroup gateway
       [not found]   ` <mailman.3846.1541888586.1284.help-gnu-emacs@gnu.org>
@ 2018-11-16  3:38     ` Rusi
  0 siblings, 0 replies; 10+ messages in thread
From: Rusi @ 2018-11-16  3:38 UTC
  To: help-gnu-emacs

You seem to be doing a splendid job with ML-news gateway spam
[ Compare https://groups.google.com/forum/#!forum/comp.lang.python ]

I wonder how easy it would be for you to share your know-how in
capsule/summary form?
(assuming the folks managing comp.lang.python are interested)

