Re: new slurp module
From: Abigail
Date: October 23, 2003 05:56
Subject: Re: new slurp module
Message ID: 20031023125559.GA18577@abigail.nl

On Thu, Oct 23, 2003 at 05:45:05AM -0400, Uri Guttman wrote:
> >>>>> "A" == Abigail <abigail@abigail.nl> writes:
>
> A> What's ugly or not is very subjective. The speed argument doesn't
> A> quite convince me. How often do programs slurp in lots of files?
>
> template systems, config files, language source, etc. many types of
> files are slurped and sometimes lots of them. as i write in the article,
> slurping and then munging/parsing/whatever on the whole file can be much
> faster than classic line by line. so the speed does matter and why not
> make it as fast as possible since it is a module that could be called
> often.

I assume you were comparing File::Slurp's methods with other methods
that slurp in the whole file. As the following benchmark suggests, the
other idioms for slurping in an entire file are faster than File::Slurp
(with the exception of using `cat`):

Running test with 1 bytes
            Rate     cat   slurp      do    open sysread
cat        745/s      --    -97%    -98%    -98%    -99%
slurp    24139/s   3138%      --    -32%    -32%    -62%
do       35275/s   4632%     46%      --     -1%    -45%
open     35759/s   4697%     48%      1%      --    -44%
sysread  64056/s   8493%    165%     82%     79%      --
Running test with 10 bytes
            Rate     cat   slurp      do    open sysread
cat        623/s      --    -97%    -98%    -98%    -99%
slurp    23420/s   3657%      --    -33%    -35%    -62%
do       34825/s   5486%     49%      --     -3%    -44%
open     35896/s   5658%     53%      3%      --    -42%
sysread  62029/s   9850%    165%     78%     73%      --
Running test with 100 bytes
            Rate     cat   slurp      do    open sysread
cat       1115/s      --    -95%    -97%    -97%    -98%
slurp    23332/s   1992%      --    -33%    -34%    -62%
do       34966/s   3035%     50%      --     -1%    -42%
open     35360/s   3070%     52%      1%      --    -42%
sysread  60755/s   5347%    160%     74%     72%      --
Running test with 1000 bytes
            Rate     cat   slurp      do    open sysread
cat       1627/s      --    -93%    -95%    -95%    -97%
slurp    23354/s   1336%      --    -30%    -32%    -61%
do       33334/s   1949%     43%      --     -3%    -45%
open     34494/s   2021%     48%      3%      --    -43%
sysread  60625/s   3627%    160%     82%     76%      --
Running test with 10000 bytes
            Rate     cat   slurp      do    open sysread
cat       1013/s      --    -95%    -96%    -96%    -98%
slurp    20773/s   1950%      --    -22%    -28%    -62%
do       26700/s   2535%     29%      --     -7%    -51%
open     28789/s   2741%     39%      8%      --    -48%
sysread  54929/s   5320%    164%    106%     91%      --
Running test with 1000000 bytes
            Rate     cat   slurp      do    open sysread
cat        175/s      --    -10%    -19%    -59%    -59%
slurp      195/s     12%      --    -10%    -55%    -55%
do         217/s     24%     11%      --    -49%    -50%
open       430/s    146%    120%     98%      --     -0%
sysread    430/s    146%    121%     98%      0%      --
Running test with 10000000 bytes
            Rate     cat   slurp      do sysread    open
cat       20.2/s      --     -3%    -11%    -55%    -56%
slurp     20.7/s      3%      --     -9%    -54%    -54%
do        22.8/s     13%     10%      --    -50%    -50%
sysread   45.2/s    124%    118%     99%      --     -0%
open      45.5/s    125%    119%    100%      0%      --

This is the program that I used to create the figures above:

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark qw /cmpthese/;
use File::Slurp;

# Prepare some files.
my @sizes = (1, 10, 100, 1_000, 10_000, 1_000_000, 10_000_000);
my $base  = "/tmp/data";

foreach my $size (@sizes) {
    my $file = "$base.$size";
    open my $fh => "> $file" or die;
    print $fh " " x $size;
    close $fh or die;
}

foreach my $size (@sizes) {
    our ($r1, $r2, $r3, $r4, $r5);
    our $file = "$base.$size";
    print "Running test with $size bytes\n";
    cmpthese -10 => {
        slurp   => '$::r1 = read_file $::file;',
        do      => '$::r2 = do {local (@ARGV, $/) = $::file; <>};',
        open    => 'open my $fh => $::file or die; undef $/; $::r3 = <$fh>;',
        sysread => 'open my $fh => $::file or die;
                    sysread $fh => $::r4, -s $::file;',
        cat     => '$::r5 = `cat $::file`;',
    };
    die '$r1 ne $r2' if $r1 ne $r2;
    die '$r2 ne $r3' if $r2 ne $r3;
    die '$r3 ne $r4' if $r3 ne $r4;
    die '$r4 ne $r5' if $r4 ne $r5;
    die 'Wrong size' if length ($r1) != $size;
    print "\n";
}

END {unlink map {"$base.$_"} @sizes}

__END__

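
A note on reading the tables: as far as I understand the output of cmpthese, each percentage says how much faster (or, if negative) slower the row's method is than the column's method. The sketch below recomputes two cells from the printed rates of the 1-byte run. The helper name relative_percent is made up for this illustration and is not part of Benchmark, and since the printed rates are already rounded, recomputed values can differ from the table by a point or two.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Benchmark): the percentage by which
# $row_rate exceeds $col_rate. A negative result means the row is slower.
sub relative_percent {
    my ($row_rate, $col_rate) = @_;
    return 100 * ($row_rate / $col_rate - 1);
}

# Rates taken from the 1-byte run: sysread 64056/s, slurp 24139/s.
printf "sysread vs slurp: %.0f%%\n", relative_percent(64056, 24139);
printf "slurp vs sysread: %.0f%%\n", relative_percent(24139, 64056);
```

The two results correspond to the 165% and -62% cells in the first table.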
Frankly, I don't see much reason for a File::Slurp addition to the
core. The current idioms to slurp in a whole file are small (in chars),
and faster than File::Slurp. Granted, File::Slurp has a write_file
method, but I think reading entire files at once is much more common
than writing them.
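
For reference, the three pure-Perl idioms timed above can be written out as small subroutines. This is a sketch under some assumptions, not the benchmark code itself: the sub names (slurp_do, slurp_open, slurp_sysread) and the temporary file are made up for illustration, `local $/` is used where the benchmark used a global `undef $/` so that the change does not leak to the caller, and a single sysread is assumed to return the whole file, which is usually true for ordinary local files but is not guaranteed in general.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Idiom 1: localize @ARGV and $/ in one go, then read via the diamond
# operator. $/ ends up undef, so one read returns the whole file.
sub slurp_do {
    my ($file) = @_;
    return scalar do { local (@ARGV, $/) = $file; <> };
}

# Idiom 2: explicit open, with $/ undefined for the rest of the sub only.
sub slurp_open {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    local $/;
    return scalar <$fh>;
}

# Idiom 3: one sysread for the size reported by -s.
# Assumption: the single call returns the complete file.
sub slurp_sysread {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $text = '';
    defined sysread($fh, $text, -s $fh) or die "Cannot read $file: $!";
    return $text;
}

# Small demonstration on a throwaway file.
my ($out, $name) = tempfile(UNLINK => 1);
print $out "first line\nsecond line\n";
close $out or die "Cannot close $name: $!";

my $via_do      = slurp_do($name);
my $via_open    = slurp_open($name);
my $via_sysread = slurp_sysread($name);

print "idioms agree\n"
    if $via_do eq $via_open and $via_open eq $via_sysread;
```

Each sub body is a line or two, which is the point being made about the size of the existing idioms.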
> the reason to put it in core is to have a standard (not just cpan)
> module to do slurping. it simplifies the operation, makes it more
> maintainable (no multiple idioms to remember) and is faster. many other
> modules are in core with less than that.
You don't have to remember multiple idioms; one is enough. As you can
see, the current idioms are short enough to be not significantly harder
to remember than 'use File::Slurp; $text = read_file $file'. Unless you
mean you have to remember multiple idioms because you are a maintainer
and you have to maintain code written by different people, using different
idioms. In that case, the last thing you want is to have to remember yet
another idiom (the current idioms don't go away).
Abigail