perl.perl5.porters
Postings from March 2000
Re: CPAN.org stats
From: Roland Giersig
March 23, 2000 05:07
Message ID: 38DA16EE.2E238CEC@alcatel.at
Ask Bjoern Hansen wrote:
> On Wed, 22 Mar 2000, Matt Sergeant wrote:
> > > The cpan.org (+ftp.perl.org etc) ftp/web server alone is throwing more
> > > than 2.5GB of Perl modules and source code out a day. (average over the
> > > last ~3 weeks)
> > Might be really cool to get a breakdown of stats for module authors...
> yes, but it's not that simple[tm].
Hmm, but doesn't sound Too Difficult[tm] either.
> People mirroring everything also fetch the never-used foobar-0.1 module.
But you can find out if somebody mirrors everything, no? A complete
download from the same IP is definitely detectable.
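As a rough illustration of that detection idea, here is a minimal Python sketch. It assumes the logs have already been parsed into (client IP, requested path) pairs and that the number of distinct files on the archive is known. The function name `find_full_mirrors`, the 90% cut-off and the sample records are hypothetical choices for illustration, not part of any existing CPAN tool.

```python
from collections import defaultdict


def find_full_mirrors(records, total_files, threshold=0.9):
    """Return the IPs that fetched at least `threshold` of all distinct files.

    records     -- iterable of (client_ip, path) pairs, already parsed from
                   the logs (hypothetical input format)
    total_files -- number of distinct files available on the archive
    threshold   -- illustrative cut-off; 0.9 means "90% of the archive"
    """
    files_per_ip = defaultdict(set)
    for ip, path in records:
        files_per_ip[ip].add(path)  # repeated fetches of one file count once
    return sorted(
        ip
        for ip, paths in files_per_ip.items()
        if len(paths) >= threshold * total_files
    )


if __name__ == "__main__":
    # Hypothetical sample records.
    log = [
        ("10.0.0.1", "Foo-1.0.tar.gz"),
        ("10.0.0.1", "Bar-0.2.tar.gz"),
        ("10.0.0.1", "foobar-0.1.tar.gz"),
        ("10.0.0.2", "Foo-1.0.tar.gz"),
    ]
    print(find_full_mirrors(log, total_files=3))  # ['10.0.0.1']
```

IPs flagged this way could then be left out before counting per-module downloads, so that foobar-0.1 does not get credit for mirror runs.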
> A module updated 3 times in a month gets downloaded 30 times. Another
> module doing the same thing hasn't been updated for 2 months but got
> downloaded 12 times. Which one is more popular?
That's in the eye of the beholder [another tm]. Just gather
and publish the raw data and leave the interpretation to somebody else.
> A module from a popular Bundle (say Bundle::CPAN) gets downloaded a
> lot, but might never be used (CPAN::WAIT maybe).
This is trickier, but again, correlating modules and timestamps
should be able to detect a Bundle download, especially when
-MCPAN is used to install it...
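One way that timestamp correlation might be sketched, assuming parsed (IP, Unix timestamp, module name) records and a known list of a bundle's members: split each client's requests into sessions wherever two consecutive requests are more than a few minutes apart, and count a session as a bundle install if it fetched every member. The 300-second gap, the all-members rule, the function names and the `Foo`/`Bar` bundle are all hypothetical.

```python
from collections import defaultdict


def sessions_by_ip(records, max_gap=300):
    """Group each client's requests into sessions.

    records -- iterable of (client_ip, unix_timestamp, module_name)
    A new session starts whenever two consecutive requests from the same
    IP are more than `max_gap` seconds apart.
    Returns a list of (client_ip, set_of_modules) pairs.
    """
    hits_by_ip = defaultdict(list)
    for ip, ts, module in records:
        hits_by_ip[ip].append((ts, module))

    sessions = []
    for ip, hits in hits_by_ip.items():
        hits.sort()  # chronological order per client
        current, last_ts = set(), None
        for ts, module in hits:
            if last_ts is not None and ts - last_ts > max_gap:
                sessions.append((ip, current))
                current = set()
            current.add(module)
            last_ts = ts
        sessions.append((ip, current))  # close the client's final session
    return sessions


def bundle_installs(records, bundle_members, max_gap=300):
    """Return IPs with at least one session that fetched every bundle member."""
    members = set(bundle_members)
    return sorted(
        {ip for ip, modules in sessions_by_ip(records, max_gap) if members <= modules}
    )


if __name__ == "__main__":
    # Hypothetical bundle with two members, Foo and Bar.
    log = [
        ("10.0.0.1", 0, "Foo"), ("10.0.0.1", 20, "Bar"),
        ("10.0.0.2", 0, "Foo"), ("10.0.0.2", 9000, "Bar"),
    ]
    print(bundle_installs(log, ["Foo", "Bar"]))  # ['10.0.0.1']
```

A real analysis would probably want a looser rule (say, most of the members within the window), since a user may already have some of a bundle's modules installed.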
> We can only get logs from some mirrors. There could be a big difference in
> how the different mirrors are used. (ftp.perl.org probably has more
> people mirroring everything than most other sites, funet too). There are a
> lot of private mirrors too.
Well, asking shouldn't hurt. I guess at least the official CPAN
mirrors would gladly provide such excerpts from their logs, or they
wouldn't be mirroring CPAN, would they?
> Modules for handling different character sets are probably more popular
> with the mirrors in Europe and Asia.
This is an interesting conjecture that could be proven by
having such statistics. It's not an issue, IMHO.
> The CPAN site referenced from search.cpan.org might have a larger number
> of downloads from users new to Perl. (assuming that more experienced users
> will use the CPAN module).
Again, not an issue, but a (dis)provable outcome.
> Point being that it's hard to make any reasonable numbers and if you do
> that, it will still be easy to read all sorts of stuff that is not there
> into them.
But that's just lies, damned lies and statistics[tm]. And there
are several things that might be interesting or just fun to
watch. For example, I'd really like to see how long it takes
for the perl-5.6.0-release news to travel around. This should
be quite evident when viewing a hits vs. time graph for that
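A hits vs. time graph could start from something as simple as the following sketch, which bins the download timestamps of a single file into hourly buckets measured from the release time and renders them as a text bar chart. The Unix-timestamp input, the one-hour bucket size, both function names and the sample times are assumptions made for illustration.

```python
from collections import Counter


def hits_per_bucket(timestamps, release_ts, bucket=3600):
    """Count downloads of one file per `bucket` seconds after release_ts.

    timestamps -- Unix timestamps of the requests for that file
    Requests before release_ts are ignored. Returns a list whose i-th
    entry is the number of hits in the i-th bucket; empty buckets are 0.
    """
    counts = Counter(
        (ts - release_ts) // bucket for ts in timestamps if ts >= release_ts
    )
    if not counts:
        return []
    return [counts[i] for i in range(max(counts) + 1)]


def text_graph(series):
    """Render a series of counts as one row of '#' characters per bucket."""
    return "\n".join(f"{i:3d} {'#' * n}" for i, n in enumerate(series))


if __name__ == "__main__":
    # Hypothetical request times, in seconds after the release announcement.
    times = [5, 40, 300, 3700, 4000, 9000]
    print(text_graph(hits_per_bucket(times, release_ts=0)))
    # prints:
    #   0 ###
    #   1 ##
    #   2 #
```

Feeding it per-mirror logs separately would show when the news reached each part of the world.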
> But if someone wants to be a CPAN statistician and get logs from around the
> world I'll be happy to provide all the logs from my site, cpu power and
> disk space to crunch the numbers and whatnot. :)
Hmm, I cannot give a full commitment right now, but I'd
really love to dig into that in my sparse spare time.
So maybe someone can send me some excerpts from their
CPAN logs and I'll see what I can come up with (promising
PS: Ahem, in hindsight, please contact me if you have some logs
for me and maybe we can arrange some anon ftp. I'd rather not
have my mailbox bursting with gigs of logfiles... ;-)