...making Linux just a little more fun!

<-- 2c Tips | TAG Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Knowledge Base | News Bytes -->

The Answer Gang

By Jim Dennis, Karl-Heinz Herrmann, Breen, Chris, and... (meet the Gang) ... the Editors of Linux Gazette... and You!



(!) Perl, WWW::Mechanize, and Mailman administration

From Ben Okopnik

Answered By: Jimmy O'Regan

...or, "The Evolution of a script".
This started as a complaint about Mailman's administration interface. Over the course of 5 days in November, Ben and I bashed out a script to automate the deletion of mail that was held up by Mailman (spam, in other words), and Ben taught me some Perl along the way. -- Jimmy

Yeah, I dislike the damned thing as well. I wonder if Monsieur O'Regan would be willing to cruft up a screen-scraper that would automate the procedure?

(!) [Jimmy] Sure -- I was looking for something I could set WWW::Mechanize on anyway. Does anyone have a sample setup I can be let loose on, because Mandrake seem to have done a wonderful job of fucking up everything related to email.

(!) Awesome! Thanks, Jimmy; that damn thing is a regular pain. I wish there was a way to tell Mailman to just delete every single one of them, but I've never found a way to do so. This way, I can maybe cron it up and forget about it.

See the attached file for a sample. The only things that need to happen are

a) The "Action to take" needs to be switched to "Discard", and
b) "Submit all data" needs to be triggered.

It's actually something I need to learn about at some point, so I'll be very interested in what you code up.

(!) [Jimmy] Give this script a whirl:

See attached www-mech-1.pl.txt

I changed the action of the page to submit to a simple PHP script

See attached simple-dump.php.txt

All it does is check that a username and password have been passed, and if so, regurgitate everything the script sent. It seems to work, based on the HTML in that sample. If it doesn't work, uncomment the two 'print' statments and send me the results.
(!) [Jimmy] Mailman's auth mechanism uses cookies, starting from http://linuxgazette.net/mailman/admindb/tag

See attached www-mech-2.pl.txt

(!)

#print $mech->response();

This is probably not what you want - you'll just get a hashref as a result. However, just in case it is, for some reason, I'm sending the output along (but I'll be tweaking the script so that it does produce something useful from the above.)

(!) [Jimmy] I forgot to remove that from the original, when I thought I was using basic authentication; it prints a hashref, but it also prints the HTTP status code. Not something to rely on, but it worked well enough to let me see where I was going wrong (I was forgetting to prepend 'Basic ' to the base64 encoded user/pass pair).

(!) What it looks like is that the script is pulling down the content, but then it's not doing anything with it.

(!) [Jimmy] After that form is submitted, is there any sort of 'Are you sure?' step?

(!) Nope. It just shows you a result page that essentially says "there aren't any new messages".

I only had a few minutes this morning, but - the stuff in the "if" clause never happens. I put in a print statement above it and inside it, and the one above prints stuff like

See attached annoyed-senderaction.log.txt

just fine, but nothing from the inside (which would have been prefixed with '--->'.)

Don't know why; the regex is right...

(!) [Jimmy] maybe try changing it to /(senderaction-[^">]*)/ -- it can't hurt.

(!) Ah - that got inside:

See attached inside-senderaction.log.txt

However, it still fails to delete the buggers.  :( I suspect that the normal submission process sends something more than just the radio button values to the CGI, whereas you skip everything else:

next unless $token->return_attr('type') =~ /radio/i;

At least in my limited perception; I don't know the module at all.

Around here, Ben wondered what the PHP script was for -- Jimmy

(!) Err... sorry, I've lost the context. What is this page, where does it go, and what do I need to do with it?

(!) [Jimmy] Sorry, forgot myself. That was there to make sure the script was sending the right values: '3' for reject.

(!) Oh. I don't have PHP - no way to test that; however, you've seen the output from Data::Dumper by now, and that gives you everything.

(!) [Jimmy] Well, you said already that it's not getting inside the if statement, which is strange. If it was that there was a missing value that needed to be submitted to the form, that'd be one thing, but as it is, only the default stuff is getting submitted.
Hang on... the first submit works, but that doesn't use {name => adminpw, value => "}, it's {adminpw => "}; so maybe I should have the array made up of {$regex_match => 3}. I'm not so hot with using anything other than scalars, so you may need to fix the syntax inside the if statement.

See attached www-mech-3.pl.txt

(!) [Jimmy] OK, try again:

See attached www-mech-4.pl.txt

(!)

...............

         # This may need sytax correction
         $name->{"$1"} = 3;

...............

Looks OK, although quoting is deprecated unless you need interpolation. However, it still doesn't work; see the appended output (again, from Data::Dumper.)

Looking at it, specifically the data that's sent back, I see what looks like a problem (I've added some newlines to clarify the view):

...............

$VAR12 = bless( {
                  '_content' => '
		  senderaction-%2522pearl%2Bdeleon%2522%2540genetikayos.com=0
		  &senderforwardto-%2522pearl%2Bdeleon%2522%2540genetikayos.com=tag-owner%40linuxgazette.net
		  &senderfilter-%2522pearl%2Bdeleon%2522%2540genetikayos.com=3
		  &senderaction-abcd21ruby%2540hotmail.com=0
		  &senderforwardto-abcd21ruby%2540hotmail.com=tag-owner%40linuxgazette.net
		  &senderfilter-abcd21ruby%2540hotmail.com=3

[snip]

...............

Seems like '3' is somehow getting assigned to the wrong bit; it should be on the "senderaction" statements, but is ending up on the "senderfilter".

(!) [Jimmy] I think I have it this time...

See attached www-mech-5.pl.txt

(!) Well... still doesn't work. Time for me to stop being lazy, then, and actually look it up myself. :)

...............

while (my $token = $p->get_tag('input'))
{
    next unless $token->return_attr('type') =~ /radio/i;
    if ($token->return_attr('name') =~ /(senderaction-[^">]*)/)
    {
        $name->{$1} = 3;
    }
}

# Eek! is this \%name or %name?
$mech->submit_form(form_number => 1, fields => \%name);

...............

Neither; you've never defined a %name hash. What you've got is a reference named $name pointing to an anonymous hash. "fields" does indeed expect a hashref, though. So,

$name->{$1} = 3;

should be simply

$name{$1} = 3;

and %name should be declared in a "my" somewhere; "fields" should point to "\%name".

Ahhh... now it works. Very cool!

(!) [Jimmy] Yeah, I knew there was something I wasn't getting there; thanks for the explanation. I think I still have a mark on my forehead from when I realised I was trying to send an array where a hash was expected.
So, just so I'm sure, is the final version this?

See attached www-mech-6.pl.txt

(!) I've added a little processing to make sure that an empty page doesn't cause any errors, and a little noise so it'll tell me that it's doing its job.

See attached www-mech-7.pl.txt

(!) [Jimmy] Reminds me of something I read once -- something like "a program is complete when there's nothing left to take away, not when there's nothing left to add".

(!) Yep, the Rodin school of programming. I'm certainly an adherent.

(!) [Jimmy] I'll just chalk it up to the perils of cut 'n' paste programming.

(!) No worries; that's one of the ways to learn. If you're not making mistakes, you're not learning - right? I have to keep repeating that to myself, especially since I'm teaching my first full yoga class today. :)

(!) Seems to work fine without TokeParser.

See attached admreqrm.pl.txt

(!) [Jimmy]

...............

for ( grep /^senderaction-/, split /[ \n"']/, $mech -> content() ){

...............

I only saw the 'grep' feature for the first time a few days ago (in TAG, IIRC).

(!) Eeep!

(!) [Jimmy] Well, that's the entertaining thing about Perl; it's a "language" language. I'm still at the tourist stage, but I'm thinking of moving :)
I prefer using HTML::TokeParser::Simple because of the [http://linuxgazette.net/108/misc/oregan/tp.pl.txt Google script] I wrote, which formats the HTML differently depending on the client.

(!) Ah.

(!) [Jimmy] ...though it beats me why I didn't just change the browser string to pretend to be Mozilla. I suppose I just like the idea that if someone did change the UA string, the regex would still work.
(!) [Jimmy] Heh.

...............

    print "Deleting $_\n";
    # 'uniq' action happens because hashes possess the Buddha nature

...............

(!) :) 'Tis true, though.

(!) [Jimmy] Isn't WWW::Mechanize neat?

(!) Yep - I've wondered about how to do this kind of thing in the past, and it's impressive just how easy WWW::Mechanize makes it.

(!) [Jimmy] I was wondering how to deal with cookies -- "Wow! Cookies are free!". It rocks. I still don't trust myself enough to automate my maintenance payments though.

(!) You could always have it pause and display the setup for final approval before you actually commit.

(!) [Jimmy] Heh. It's a bit much for something I only have to do once a fortnight
I later asked Ben if he'd mind me passing this thread for use in TAG -- Jimmy

(!) Fine by me, Jimmy. I enjoyed the two of us cooperating to make the beast behave, anyway.

(!) [Jimmy] So did I. I really enjoyed that "oh no, it's not..." moment when I sent you the second regex (that was typed with my paternal "I know you're up to no good" squint).

(!) I already abstracted the password string into a variable def at the top of the script, and did a little more cleanup before sending it to Rick. Latest version appended.

See attached admreqrm-2.pl.txt

(!) [Jimmy] If you're willing to put yourself through my debugging process again, it should be easy add an option to grab the content instead of deleting.

(!) Heck, I've read the module docs by now. That's how I finally hammered my end of it into shape. Actually, it looks like you should be able to do the whole task with WWW::Mechanize (TokeParser is obviously needed simply for its regex capabilities) - that might be worth looking at before we go ahead and pub the results.

(!) [Jimmy] I was thinking that too. The doubt that was lurking in the back of my mind came out of hiding: it'd need something extra -- an option to pass message ids to not delete. (Not a problem, it's just the nagging of the "you're forgetting something" thought).

(!) Something like this, you mean?

delete_this_id() unless grep /$id/, @keep;

Take a look at the regex-based script version I sent you; it would be easy enough to tweak.

(!) [Jimmy] Something like that. Can 'unless' be followed by braces, or would it need a 'do'?

(!) "do" is pretty special in Perl, and doesn't have anything to do with conditionals (although it can be used with loops.) Sure, you can have a statement block after "unless" (it's just syntactic sugar for "!if"); what you can't have is an "elseunless". :)

(!) [Jimmy] Do widzenia! (there's this cute Polish girl...)

This page edited and maintained by the Editors of Linux Gazette
HTML script maintained by Heather Stern of Starshine Technical Services, http://www.starshine.org/


Each TAG thread Copyright © its authors, 2005

Published in issue 110 of Linux Gazette January 2005

<-- 2c Tips | TAG Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Knowledge Base | News Bytes -->
Tux