perl.libwww | Postings from January 2001

Re: LWP Automated Form Submission - Question

From: Sean M. Burke
Date: January 15, 2001 14:25
Subject: Re: LWP Automated Form Submission - Question
Message ID: 3.0.6.32.20010115152443.007bae50@mail.spinn.net

At 08:43 AM 2001-01-15 -0600, Chong, Arthur wrote:
>
>I am trying to automate URL submissions into Search Engines.
>The format they seem to accept is of GET Forms, with different parameters.
>[...]
>sub get_urls {
>    
>    %url = ( 
>       altavista => { 
>          lurl =>
>           "http://add-url.altavista.digital.com/cgi-bin/newurl",
>          param => [ad => 1, 
>                     q  => $in_url ]
>         },
>[...]

Incidentally, I once did such a thing.

And there are two things I found that may be of interest to you and others:

1) I also added something that would consider the returned content and, for
search engines where this was possible, throw an alert if the search engine
said that it tried accessing the web page whose URL you'd submitted, but
found it inaccessible.  The way this was done varied from engine to engine,
but it involved things like
  freak_out() unless $ret->content =~ m<Thank you!>;
or
  freak_out() if $ret->content =~ m<Error:>;


2) I dimly remember that one search engine actually seemed to care about
the order of the form variables, and/or what characters in them were actually
%-encoded.  I don't remember which engine, nor do I remember the details of
the %-encoding thing, but the upshot was:
a) I couldn't store the form parameters as:
          param => { ad => 1, q  => $in_url }
(which is what I'm normally used to, since usually who cares about the
order?), but instead had to do what you do:
          param => [ ad => 1, q  => $in_url ]
b) I had to do something specifying what got %-encoded as the GET query was
being made.  Can't remember how or what, and this may have been to cater to
a search engine submissions-accepter that no longer exists, or one that has
since stopped caring.  This was years ago.
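A minimal sketch of both points above, in core Perl so that it runs without LWP or a network. The engine names (engine_a, engine_b), the helper names (pct_encode, ordered_query, submission_ok), and the check table are all hypothetical; the two patterns are the ones from point 1. Keeping the form pairs in an array ref preserves the order they were written in, whereas Perl makes no promise about the order in which a hash's keys come back. pct_encode escapes every byte outside the RFC 3986 "unreserved" characters, and that character class is the knob to turn if an engine is fussy about what gets %-encoded. It assumes byte strings, not wide characters.

```perl
use strict;
use warnings;

# Percent-encode every byte outside the RFC 3986 "unreserved" set
# (letters, digits, '-', '.', '_', '~'). Assumes a byte string.
sub pct_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build a GET query string from an array ref of name/value pairs.
# Unlike a hash, the array keeps the pairs in the order given.
sub ordered_query {
    my ($pairs) = @_;
    my @parts;
    for (my $i = 0; $i < @$pairs; $i += 2) {
        push @parts, pct_encode($pairs->[$i]) . '=' . pct_encode($pairs->[$i + 1]);
    }
    return join '&', @parts;
}

# Hypothetical per-engine checks on the page an engine sends back.
my %checks = (
    engine_a => sub { $_[0] =~ m<Thank you!> },   # must see a success message
    engine_b => sub { $_[0] !~ m<Error:> },       # any error message is a failure
);

# True only if there is a check for $engine and the content passes it.
sub submission_ok {
    my ($engine, $content) = @_;
    my $check = $checks{$engine} or return 0;
    return $check->($content) ? 1 : 0;
}

my $q = ordered_query([ ad => 1, q => 'http://example.com/a b' ]);
print "$q\n";    # prints ad=1&q=http%3A%2F%2Fexample.com%2Fa%20b
```

In real LWP code, the string handed to submission_ok would be $ret->content from the HTTP::Response that $ua->get(...) returns. Once the URI module is in the picture, its query_form method also keeps pairs in the order given, and URI::Escape's uri_escape takes an optional second argument naming the characters to escape, which covers the same ground as pct_encode.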


(And now a digression, from the "Don't Get Me Started" file:)

I have occasionally considered digging out this old code, prettying it up,
and making out of it a CPAN-published module-suite such that you'd call it as:

  use Vroomvroom;  # or whatever

which would go looking for what-all engines it knew how to submit to (each
a different module-file, say), and add their names to @Vroomvroom::engines.
Then one could do something like:

  $Vroomvroom::contact_email = 'mojojojo@evilmonkeys.int';
  foreach my $u (@my_urls) {
    foreach my $e (@Vroomvroom::engines) {
      print
       Vroomvroom::submit($u, $e)
        ? "okay on $e submission of $u\n"
        : "nogo on $e submission of $u: $Vroomvroom::ERROR\n"
      ;        
    }
  }

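A minimal sketch of that registry idea, with every part of it assumed: Vroomvroom is the hypothetical name used above, not a real CPAN module, and instead of going looking for per-engine module-files, each engine here simply calls a hypothetical register() function with a code ref. submit() returns true or false and leaves the reason for a failure in $Vroomvroom::ERROR, so the calling loop above works unchanged apart from building the message before printing it. The dummy engine only checks the URL scheme; a real one would make the LWP request and inspect the returned content.

```perl
use strict;
use warnings;

package Vroomvroom;    # hypothetical module, not on CPAN

our @engines;          # names of engines we can submit to
our $ERROR = '';       # why the last submit() failed
our $contact_email;    # for engines that ask for one (unused by the dummy)
my %submitter;         # engine name => code ref doing the actual submission

# In the imagined suite, each per-engine module-file would call this.
sub register {
    my ($name, $code) = @_;
    $submitter{$name} = $code;
    push @engines, $name;
    return;
}

# Returns 1 on success; on failure returns 0 and sets $ERROR.
sub submit {
    my ($url, $engine) = @_;
    $ERROR = '';
    my $code = $submitter{$engine};
    unless ($code) {
        $ERROR = "no submitter for engine '$engine'";
        return 0;
    }
    my $ok = eval { $code->($url) };    # a dying engine counts as a failure
    unless ($ok) {
        $ERROR = $@ ? "died: $@" : 'engine rejected the submission';
        return 0;
    }
    return 1;
}

package main;

# Stand-in engine: accepts only http(s) URLs.
Vroomvroom::register(dummy => sub { $_[0] =~ m{^https?://} });

$Vroomvroom::contact_email = 'mojojojo@evilmonkeys.int';
for my $u ('http://example.com/', 'gopher://example.com/') {
    for my $e (@Vroomvroom::engines) {
        my $msg = Vroomvroom::submit($u, $e)
            ? "okay on $e submission of $u\n"
            : "nogo on $e submission of $u: $Vroomvroom::ERROR\n";
        print $msg;
    }
}
```

Keeping one code ref per engine in a table means adding or dropping an engine touches only that engine's entry, which is about the only way such a suite could keep up with an ever-changing list of submission URLs.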
But I decided not to, for two reasons:
1) It sounds like real work maintaining such a thing for an indefinite
number of engine submission URLs.  And, in a very profound sense, I don't
think I could be made to care about whether they'd work; notably, I
wouldn't really notice if they broke.

2) So far, for the dozens of modules I've put in CPAN, I've gotten nothing
but intelligent email about them -- intelligent questions of varying
degrees of familiarity with my documentation, intelligent suggestions for
patches, etc.

But I have a horrible creeping feeling that if I wrote a module such as
I've described, I would discover how /stupid/ email can be.  In my
occasional contacts with the world of "professional webmasters", I have
found that the more penny-ante they get, the more likely they are to evidence an
addled-brained obsession (as opposed to healthy concern) with the question
of IS MY SITE IN THE SEARCH ENGINES????.  And if I wrote a Vroomvroom
module, as above, I would get constant email to the effect of:

  hey mr burke i used yr CPAN modules VROOM VROOM and it said it
  submit my site to ALTABISTA and then i looked their for it
  and IT WASN"T THERE.  IS YOUR MODULE BREAKEN?
  your friend,
  Habip APONGAPONGA
  (SUPER WEBMASTER, psychobillyfrikout.com! psychobillyfrikout.com!)

[The names have been changed to protect the addled-brained]

As a demonstration of how addled it is possible for brains to get, I
recently had a conversation with such a superwebmaster who said that, "of
course, automated search engine submissions just don't work!".  I asked how
this hyoomon had arrived at that conclusion -- "Some research I did!".  I
will spare you the details of what passed for research in this entity's
mind, but the upshot was that she decided that indefinite kinds of voodoo
and psychic jiu-jitsu were employed by the All-Knowing Lord High Masters Of
The Search Engines to distinguish URL-submitting HTTP sessions that come
from a real person feeding a request by hand into a browser, from ones that
come from anything else.

This raised questions in my mind, like: does it jinx it if you paste the
URL into the submission form, instead of typing it in?  Does similar
jinxing result from having several submission windows open at once?  Do you
have to light a candle for each submission, and intone "Om Mani Lycos Hum"?

But I ended up just saying that I would be happy if the creature in
question could actually show me test and control groups of URLs "manually"
submitted versus ones "automatedly" submitted, and a demonstration that the
former were in some search engine, while the latter were missing.  She
didn't seem interested in such a formality as reproducible proof.

Altho I suppose I could try this myself.

--
Sean M. Burke  sburke@cpan.org  http://www.spinn.net/~sburke/




nntp.perl.org: Perl Programming lists via nntp and http.