
Re: file into memory

From: Paul Johnson
Date: December 28, 2002 16:38
Subject: Re: file into memory
Message ID: 20021229003838.GC1072@pjcj.net
On Tue, Dec 24, 2002 at 10:44:29AM -0800, R. Joseph Newton wrote:

> > > would like to suck the whole file into memory and process
> > > each line from there.

Assuming we are still talking about the line as given:

@xx = <FILE>;
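
Spelled out a little, here is a minimal sketch of that slurp-then-process
approach; the filename and the per-line work are just placeholders:

    open(FILE, "<", "data.txt") or die "Can't open data.txt: $!";
    my @xx = <FILE>;    # whole file in memory, one line per element
    close(FILE);

    for my $line (@xx) {
        chomp $line;
        # ... per-line processing goes here ...
    }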

> Bob Showalter wrote:

> > I'm not sure what that accomplishes for you, but...
> 
> Really?  In that case, you may be vastly underestimating the cost of
> I/O on performance.  Bear in mind that memory has a response time in
> the 10 ns range, while disk accesses take milliseconds, during which
> your process goes into the wait queue.  There is every reason in the
> world to seek to bring in the maximum usable data with each access.

But there are also some reasons not to.  Specifically, if your program
runs fast enough and well enough, and trying to bring in more data at
once makes it harder to write, understand or maintain, then I would
suggest that doing so would be counterproductive.

> Remember: while your logical file pointer may remain fixed, patiently
> awaiting your next get, the system has moved on, served other users,
> and may have moved the physical RW head many tracks away.  I think
> it's a very good rule of thumb to stick to one I/O channel at a time,
> and to do no more processing during access than is absolutely
> necessary to ensure data integrity.

In the absence of other constraints, that seems reasonable.  However, I
see two problems.  Firstly, there is a lot of C code behind the simple
line "@xx = <FILE>;".  Processing a line at a time may mean more
interrupts (or it may not), but slurping does not guarantee that there
will be none.

The second problem is more serious.  You may have a very big file to
slurp in.  Either way, you will need spare RAM to match the size of
your file, plus a bit for perl.  That RAM might not be available, and
you will then need to use swap space.  At this point your solution
probably has worse characteristics than a line-by-line approach, both
for itself and for other processes.
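
For comparison, the line-by-line version of the same sketch only ever
holds the current line in memory (again, the filename and the processing
are placeholders):

    open(FILE, "<", "data.txt") or die "Can't open data.txt: $!";
    while (my $line = <FILE>) {
        chomp $line;
        # ... per-line processing; memory use stays flat however big the file is
    }
    close(FILE);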

> Thrashing does not make for good programming.

Right.  But I see more scope for thrashing with a more memory-intensive
solution.

Of course, all this is heavily dependent on the program you are running,
the other processes running, the hardware, its configuration, the OS
etc.  These issues are rarely black and white.

-- 
Paul Johnson - paul@pjcj.net
http://www.pjcj.net
