develooper Front page | perl.beginners | Postings from February 2009

Re: processing large datafiles

Thread Previous
From:
Chas. Owens
Date:
February 17, 2009 13:06
Subject:
Re: processing large datafiles
Message ID:
58ce48dc0902171306s2c5ff2a5jc8fe827772df62c4@mail.gmail.com
On Tue, Feb 17, 2009 at 12:39, Jim Gibson <jimsgibson@gmail.com> wrote:
snip
>> open(MAP,"file.map") || die;
>
> It is better to use lexical variables for file handles, say why the open
> fails if it does not succeed, and use the low-precedence 'or' instead of
> '||':
>
> open( my $map, "file.map") or die("Can't open file.map: $!");
snip

If you are using the low precedence or, you don't need the
parenthesis; and you should always use the three argument version of
open instead of the two argument version to avoid possible code
injection and weird filename attacks:

open my $map, "<", "file.map"
    or die "could not open file.map: $!";

snip
>> if ($snp[0] =~ /Chromosome/) {next};
>
> The eq operator will be faster than a regular expression:
>
>    if( $type eq 'Chromosome' ) {
snip

These are not equivalent statements (the regex is not anchored).
Also, avoid making statements about efficiency without a benchmark to
back it up.  Regexes can be faster than string comparisons*.

snip
> close(OUT1) and die("Error writing to $out1: $!");
snip

That should be

close OUT1
    or die "could not close OUT1: $!";

The close function returns true if "returning true only if IO buffers
are successfully flushed and closes the system file descriptor".
Since or uses short-circuit logic, if the first statement is true it
doesn't execute the second statement.

* and here is the benchmark and the code used to generate the benchmark:
equal:
0
regex:
0
string of length 1
            Rate equal regex
equal 13435080/s    --  -44%
regex 23842620/s   77%    --
string of length 10
            Rate equal regex
equal 20579528/s    --   -5%
regex 21575636/s    5%    --
string of length 100
            Rate regex equal
regex 10922666/s    --  -23%
equal 14277994/s   31%    --
string of length 1000
            Rate regex equal
regex 14448882/s    --  -13%
equal 16657660/s   15%    --

#!/usr/bin/perl

use strict;
use warnings;

use List::Util;
use Benchmark;

my $s = "foo";

my %subs = (
	regex => sub {
		return $s =~ /^Chromosome$/ ? 1 : 0;
	},
	equal => sub {
		return $s eq "Chromosome" ? 1 : 0;
	},
);

for my $sub (sort keys %subs) {
	print "$sub:\n", $subs{$sub}->(), "\n";
}

for my $n (1, 10, 100, 1_000) {
	$s = rstr($n);
	print "string of length $n\n";
	Benchmark::cmpthese(-1, \%subs);
}

sub rstr {
	return join '', map { ("a" .. "z")[int rand 26] } 1 .. int rand shift
}


-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.

Thread Previous


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About