Fighting the Good Fight against spam deluge (was: Senatorial (Senescent?) reflective pause)
From:
Tom Christiansen
Date:
July 31, 2008 15:16
Subject:
Fighting the Good Fight against spam deluge (was: Senatorial (Senescent?) reflective pause)
Message ID:
27231.1217542549@chthon
In-Reply-To: Message from Johan Vromans <jvromans@squirrel.nl>
of "31 Jul 2008 13:31:41 +0200." <m2hca6waqq.fsf@phoenix.squirrel.nl>
NB: Those uninterested in iterator classes or how to survive 50,000
pieces of mail a day should please skip to /Perl-spin near the
bottom for another semi-strange perl-related spin-death issue.
>Tom Christiansen <tchrist@perl.com> writes:
>> I still have a vague hunch like a module, or here even a pragma,
>> might be a good idea.
> I'd go for a nice iterator class instead of <<<<>>>> weirdness.
Hi Johan,
Agreed on all those icky bbbracketsss!
But now I'm curious--a condition for which, per Dorothy Parker, there
is no cure :). Still, I'll try to cure it by asking whether you
might mean:
(1) A class that has some sort of:
use overload "<>" => sub { ... };
(2) A tied-filehandle class that provides a special OPEN and maybe
READLINE function?
(3) Something that uses one of the Iterator:: CPAN modules as a base?
(4) Something else entirely, maybe hand-rolled iterators like
    # Pass -10, 1, 42, or even "az" if you please.
    # Defaults to 0.
    use feature "state";    # state variables need perl 5.10 or later
    sub new_iterator(;$) {
        my $start = @_ ? shift() : 0;
        return sub {
            state $count = $start;
            return $count++;
        };
    }
I know we once talked about an ITERATE method, kinda like the PROPAGATE
one, but I think 1 and 2 covered that need, and both seem to do so while
allowing the old <> look-and-feel. Still, that didn't stop people from
writing 3 and even 4.
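For what it's worth, option (1) might be sketched roughly as below. The Counter class, its new and next_value methods, and the 1..3 range are all made up for illustration; the only thing taken from the overload pragma is the "<>" key itself, which lets an object be read with the usual <$obj> syntax.

```perl
use strict;
use warnings;

package Counter;    # hypothetical class, for illustration only

# Route the <$obj> read syntax to next_value().
use overload '<>' => \&next_value;

sub new {
    my ($class, $start, $end) = @_;
    return bless { next => $start, end => $end }, $class;
}

# Hand back the next number, or undef once the range is exhausted.
sub next_value {
    my ($self) = @_;
    return if $self->{next} > $self->{end};
    return $self->{next}++;
}

package main;

my $it = Counter->new(1, 3);
my @seen;
while (defined(my $n = <$it>)) {    # explicit defined(), so a 0 would not end the loop
    push @seen, $n;
}
print "@seen\n";
```

This keeps the old <> look-and-feel at the call site while the iteration state lives in the object rather than in a closure.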
> I don't mind typing a few more characters, especially since (as
> pointed out several times now) it is functionality that often occurs
> only once in a program -- if at all.
I really do use this sort of thing a lot in sysadminny work. I have a pair
of little perl scripts to temporarily blacklist imposters, one that trails
maillog for sendmail complaints (of my own devising), and the other
daemonlog for spamd(8) [not spamd(1)!] messages.
Sometimes I run them on /var/log/{daemon,mail}log* respectively, which
includes gz files.
#!/usr/local/bin/perl
# blacklist-imposters-smtp
use File::Tail;
$ML = "/var/log/maillog";
die "need to run as superuser" unless $> == 0;
tie(*$ML, "File::Tail", "name" => $ML)
    || die "tie failed to /var/log/maillog: $!";
@handles = ( @ARGV ? *ARGV : (), *$ML );
@ARGV = map { /\.gz$/ ? "gzip -dc < $_ |" : $_ } @ARGV;
foreach $fh (@handles) {
    if (@ARGV && $fh eq "*main::ARGV") {
        warn "reading from ", join(", " => @ARGV), "\n";
    } else {
        warn "reading from $fh\n";
    }
    while (<$fh>) {
        # eg:
        # Jul 31 03:04:49 chthon sm-mta[8808]: m6V94csp008808: ruleset=check_mail, arg1=<root@perl.com>, relay=mail.myebroadband.com [58.26.29.142], reject=553 5.3.0 <root@perl.com>... Imposter!
        #
        # Jul 31 04:22:30 chthon sm-mta[8799]: m6VAMLn5008799: ruleset=check_rcpt, arg1=<cech@jhereg.perl.com>, relay=imr-d01.mx.aol.com [205.188.157.39], reject=553 5.3.0 <cech@jhereg.perl.com>... Defunct host spam rejected
        if (   /(Defunct)/      # sent to eg jhereg.perl.com
            || /(Imposter)/     # sent from eg perl.com but rcvd on ext if
           )
        {
            print "$0: Found $1 mail ";
            unless ( / \[ ( \d+\.\d+\.\d+\.\d+ ) \] /x ) {
                warn "malformed line tailed: $_";
                next;
            }
            print " blacklisting $1\n";
            system("spamdb -t -a $1") == 0
                || warn "spamdb command failed: $?";
        }
    }
}
or in blacklist-imposters-spamd:
#!/usr/local/bin/perl
# blacklist-imposters-spamd
use File::Tail;
$ML = "/var/log/daemon";
die "need to run as superuser" unless $> == 0;
tie(*$ML, "File::Tail", "name" => $ML)
    || die "tie failed to /var/log/daemon: $!";
@handles = ( @ARGV ? *ARGV : (), *$ML );
@ARGV = map { /\.gz$/ ? "gzip -dc < $_ |" : $_ } @ARGV;
foreach $fh (@handles) {
    if (@ARGV && $fh eq "*main::ARGV") {
        warn "reading from ", join(", " => @ARGV), "\n";
    } else {
        warn "reading from $fh\n";
    }
    while (<$fh>) {
        # eg:
        # Jul 30 23:07:42 chthon spamd[8872]: 121.131.215.187: To: sam@mox.perl.com
        # Jul 30 23:00:36 chthon spamd[8872]: 189.19.251.140: connected (54/46), lists: uatraps
        # Jul 31 06:58:55 chthon spamd[8872]: 122.166.2.69: From: "Mail Delivery Subsystem" <noreply@perl.com>
        # Jul 31 06:58:55 chthon spamd[8872]: 122.166.2.69: To: sales@perl.com
        # Jul 31 06:58:55 chthon spamd[8872]: 122.166.2.69: Subject: Message could not be delivered
        if ( /spamd.*: (\d+\.\d+\.\d+\.\d+): (From:.*(perl|paypal)\.com)/ ) {
            print "$0: Found imposter poser in spamd log $2, blacklisting $1\n";
            system("spamdb -t -a $1") == 0
                || warn "spamdb command failed: $?";
        }
        if ( /spamd.*: (\d+\.\d+\.\d+\.\d+): (To:.*(mox|jhereg|wraeththu)\.perl\.com)/ ) {
            print "$0: Found defunct host poser in spamd log $2, blacklisting $1\n";
            system("spamdb -t -a $1") == 0
                || warn "spamdb command failed: $?";
        }
    }
}
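To try the matching logic without root, File::Tail, or a real spamdb, a dry run along these lines might do. It is only a sketch: the two regexes and the sample lines are copied from the script and its comments above, but collecting the would-be commands in @commands instead of calling system() is my own substitution.

```perl
use strict;
use warnings;

# Sample daemon-log lines taken from the comments in the script.
my @sample = (
    'Jul 31 06:58:55 chthon spamd[8872]: 122.166.2.69: From: "Mail Delivery Subsystem" <noreply@perl.com>',
    'Jul 30 23:07:42 chthon spamd[8872]: 121.131.215.187: To: sam@mox.perl.com',
    'Jul 31 06:58:55 chthon spamd[8872]: 122.166.2.69: Subject: Message could not be delivered',
);

# Dry run: record what would be executed rather than executing it.
my @commands;
for (@sample) {
    if ( /spamd.*: (\d+\.\d+\.\d+\.\d+): (From:.*(perl|paypal)\.com)/ ) {
        push @commands, "spamdb -t -a $1";    # imposter sender
    }
    if ( /spamd.*: (\d+\.\d+\.\d+\.\d+): (To:.*(mox|jhereg|wraeththu)\.perl\.com)/ ) {
        push @commands, "spamdb -t -a $1";    # mail to a defunct host
    }
}
print "$_\n" for @commands;
```

The Subject: line matches neither pattern, so only the first two addresses end up in the list.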
I also have permanent spamdb entries with choice honeypot users--like john
or michael, sandra or susan--people who've never had accounts here but
are common names for so-called directory-scans. But that's something that
takes care of itself. These are from daemonlog:
Jul 31 01:16:06 chthon spamd[8872]: (BLACK) 220.225.252.243: <brent@maginfo.fr> -> <john@perl.com>
Jul 31 01:16:34 chthon spamd[8872]: 220.225.252.243: disconnected after 384 seconds. lists: spamd-greytrap
Jul 31 01:16:53 chthon spamd[8872]: 220.225.252.243: connected (31/29), lists: spamd-greytrap
Jul 31 01:17:49 chthon spamd[8872]: 220.225.252.243: From: brent@maginfo.fr
Jul 31 01:17:49 chthon spamd[8872]: 220.225.252.243: To: john@perl.com
Jul 31 01:17:50 chthon spamd[8872]: 220.225.252.243: Subject: hello
Jul 31 01:17:51 chthon spamd[8872]: (BLACK) 220.225.252.243: <wjm@best.com> -> <michael@perl.com>
Jul 31 01:18:55 chthon spamd[8872]: 220.225.252.243: disconnected after 387 seconds. lists: spamd-greytrap
Jul 31 01:19:06 chthon spamd[8872]: 220.225.252.243: connected (22/21), lists: spamd-greytrap
Jul 31 01:19:36 chthon spamd[8872]: 220.225.252.243: From: wjm@best.com
Jul 31 01:19:36 chthon spamd[8872]: 220.225.252.243: To: michael@perl.com
Jul 31 01:19:36 chthon spamd[8872]: 220.225.252.243: Subject:
Jul 31 01:20:28 chthon spamd[8872]: (BLACK) 220.225.252.243: <britney@teleport.com> -> <sandra@perl.com>
Jul 31 01:20:39 chthon spamd[8872]: 220.225.252.243: disconnected after 387 seconds. lists: spamd-greytrap
Jul 31 01:20:48 chthon spamd[8872]: 220.225.252.243: connected (21/21), lists: spamd-greytrap
Jul 31 01:22:17 chthon spamd[8872]: 220.225.252.243: From: britney@teleport.com
Jul 31 01:22:17 chthon spamd[8872]: 220.225.252.243: To: sandra@perl.com
Jul 31 01:22:17 chthon spamd[8872]: 220.225.252.243: Subject: hello
Jul 31 01:22:43 chthon spamd[8872]: (BLACK) 220.225.252.243: <brent@maginfo.fr> -> <john@perl.com>
For those that don't take care of themselves, Perl helps a lot with the
ad-hoc ones, and it really does simplify coding to run a simple map on
@ARGV to convert gzipped archived logs into parsable text.
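The @ARGV trick can be looked at in isolation. The sketch below only shows the rewriting step; decompress_args is a name I made up, and the file names are placeholders. The resulting "command |" strings are what the two-argument open behind <> then runs as a pipe.

```perl
use strict;
use warnings;

# Hypothetical helper: turn *.gz names into pipe-open strings and
# leave every other name untouched.
sub decompress_args {
    return map { /\.gz$/ ? "gzip -dc < $_ |" : $_ } @_;
}

# Placeholder file names, not real paths on any particular box.
my @args = decompress_args("maillog", "maillog.0.gz");
print "$_\n" for @args;
```

Since the rewritten names go through the shell, this is only sensible for file names you trust, such as rotated logs under /var/log.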
I rely on pf and spamd(8) for my front line of defense (#1), with its very
clever interaction with pf(4) and persistent tables for packet redirection.
For my second line (#2) of defense, I have sendmail configured pretty
aggressively. Besides that, I've split up the duties into a somewhat
elaborate separation of external-mta (listens externally; load-limited;
time-limited, etc) from internal-mta (listens internally, not load-limited)
from internal queue-deliverer (which runs only a few jobs at a time). It
catches things like:
Jul 31 06:44:57 chthon sm-mta[8204]: ruleset=check_relay, arg1=imsar.bu.edu.ro, arg2=127.0.0.4, relay=imsar.bu.edu.ro [217.73.165.147], reject=553 5.3.0 Spam blocked - see http://www.spamhaus.org/
I also have sendmail prime its log messages for later processing by
blacklist-imposters-smtp, with its noises about imposters or defunct hosts.
The message only makes it to spamassassin, slow as it is, after that, as
stage #3. This is from maillog, not daemon, and shows it processing your
message to me:
Jul 31 06:45:36 chthon sm-mta[26698]: m6VCjXVl026698: from=<perl5-porters-return-139016-tchrist=perl.com@perl.org>, size=1756, class=-60, nrcpts=1, msgid=<m2d4kuw7c4.fsf@phoenix.squirrel.nl>, proto=SMTP, daemon=MTA, relay=x6.develooper.com [63.251.223.186]
Jul 31 06:45:41 chthon spamd[2463]: spamd: connection from localhost [127.0.0.1] at port 21901
Jul 31 06:45:41 chthon spamd[2463]: spamd: setuid to tchrist succeeded
Jul 31 06:45:41 chthon spamd[2463]: spamd: processing message <m2d4kuw7c4.fsf@phoenix.squirrel.nl> for tchrist:101
Jul 31 06:45:47 chthon spamd[2463]: spamd: clean message (-10.6/4.5) for tchrist:101 in 6.3 seconds, 2040 bytes.
Jul 31 06:45:47 chthon spamd[2463]: spamd: result: . -10 - BAYES_00,RCVD_IN_DNSWL_HI scantime=6.3,size=2040,user=tchrist,uid=101,required_score=4.5,rhost=localhost,raddr=127.0.0.1,rport=21901,mid=<m2d4kuw7c4.fsf@phoenix.squirrel.nl>,bayes=0.000000,autolearn=ham
Jul 31 06:45:48 chthon sm-queue[3539]: m6VCjXVl026698: to="|/home/tchrist/.audit_mail tchrist", ctladdr=<tchrist@perl.com> (101/10), delay=00:00:14, xdelay=00:00:08, mailer=prog, pri=229756, dsn=2.0.0, stat=Sent
There's also a stage #4 (|.audit_mail) and even a stage #5 (sorting into
incoming folders, eg: direct, personal, p5p, etc).
Every day, between 30,000 and 60,000 or so pieces of mail, nearly *ALL*
spam, try to get delivered to me. But my load stays around
0.42 and the machine is nimble to the interactive touch, even though
it's only an old 300 MHz Pentium-2 (686) with 128M of real memory and
512K L2 cache. It took some doing to get me to that state, but I think
it's amazing it works at all, let alone with nearly no visible impact on
its 2-4 interactive users.
I do have another, related Perl-spin bug/problem, but I am pretty sure
this is some pessimal combo of input data and processing code. Here's
an example of it:
UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND
0 2463 3795 0 2 20 44920 10020 poll I ?? 14:42.02 perl5.10.0: spamd child (perl5.10.0)
Nearly 15 minutes CPU time to process one message?!
That's SpamAssassin's spamd(1) servicing a spamc(1) client. I figure
that somewhere there must be a regex in SpamAssassin that could use
nonbacktracking, whether (?>...) or possessive quantifiers or perhaps
the new "backtracking control verbs".
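For anyone who hasn't played with them, here is a toy illustration of what "nonbacktracking" buys you. It is not a SpamAssassin rule, just a made-up three-letter string showing that a possessive quantifier or an atomic group refuses to give characters back, which is exactly what stops a failing match from retrying every shorter alternative.

```perl
use strict;
use warnings;
use 5.010;    # possessive quantifiers arrived in perl 5.10

my $text = "aaa";

# Ordinary greedy a+ grabs all three letters, then gives one back so
# the trailing literal "a" can match.
my $greedy = $text =~ /^a+a$/ ? 1 : 0;

# Possessive a++ and atomic (?>a+) never give anything back, so the
# trailing "a" has nothing left to match and the attempt fails at
# once instead of trying shorter runs.
my $possessive = $text =~ /^a++a$/    ? 1 : 0;
my $atomic     = $text =~ /^(?>a+)a$/ ? 1 : 0;

print "greedy=$greedy possessive=$possessive atomic=$atomic\n";
```

This prints greedy=1 possessive=0 atomic=0. The flip side is that such constructs can change what a pattern matches, so any rule would need checking before being converted.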
But I don't know where it is, and haven't the patience to track
it down. So I just kill those off when they take too long.
Is this something others here have seen? I'd at first thought
that 5.10.0 had fixed it, but I was mistaken.
--tom
--
"What Orwell feared were those who would ban books. What Huxley feared
was that there would be no reason to ban a book, for there would be no
one who wanted to read one. Orwell feared those who would deprive us
of information. Huxley feared those who would give us so much that we
would be reduced to passivity and egoism. Orwell feared that the truth
would be concealed from us. Huxley feared the truth would be drowned
in a sea of irrelevance. Orwell feared we would become a captive
culture. Huxley feared we would become a trivial culture, ... "
--Neil Postman, foreword to "Amusing Ourselves to Death"