perl.libwww | Postings from August 2003

TreeBuilder cgi memory problems

August 7, 2003 12:34

I'm seeing a potential TreeBuilder memory problem when using it to parse
a large HTML table (> 2K rows): the memory allocation grows to about 20M
on my server and never goes down, even after I've finished with the HTML
and TreeBuilder structures. The Perl script runs as a CGI, and after a
while Apache gives up with the following line in the error log: "Out of
Memory !!"

I've used the Solaris "top" utility and put some print statements in to
monitor the CGI while it runs. The memory allocation grows to 20M and never
goes below that, even after leaving the TreeBuilder portion of the CGI and
after cleaning up the structures. I do build a couple of arrays and hash
tables during the TreeBuilder processing but get rid of those (except for
one), and I also get rid of the original HTML being parsed by TreeBuilder.
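
The kind of monitoring described above could also be done from inside the CGI. The following is only a minimal sketch: it assumes a POSIX-style `ps` that accepts `-o vsz=` and `-p`, and `mem_kb` and `report_mem` are hypothetical helper names, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: return this process's virtual size in KB as
# reported by ps, or undef if ps is missing or gives nothing usable.
sub mem_kb {
    my $out = `ps -o vsz= -p $$`;
    return undef unless defined $out && $out =~ /(\d+)/;
    return $1;
}

# Hypothetical helper: log a labelled reading to STDERR (the Apache
# error log when running as a CGI) and return the reading.
sub report_mem {
    my ($label) = @_;
    my $kb = mem_kb();
    printf STDERR "%-20s %s KB\n", $label, defined $kb ? $kb : 'unknown';
    return $kb;
}

report_mem('start');
my @big = (1) x 500_000;    # allocate something sizeable
report_mem('after allocation');
undef @big;                 # hand the storage back to perl's allocator
report_mem('after release');
```

Calling `report_mem` before the parse, after the parse, and after the tree is deleted would give readings comparable to those read off "top".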

Basically the CGI fetches a remote HTML page and uses TreeBuilder to parse
info from the td cells in each row of a table in the HTML. It saves some of
the info it finds (after doing some date/time sorting of the data), then
releases the TreeBuilder structure and the original HTML that was read in.

Here's a snippet of the Perl CGI (any assistance is most appreciated):

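use LWP::Simple;          # get() is assumed to come from LWP::Simple
use HTML::TreeBuilder;
use Time::Local;          # timegm()

$MainPage = get($myURL);          # Get the main page
$MainPage =~ s/\"//g;             # Double quotes to nothing
$MainPage =~ s/\'//g;             # Single quotes to nothing
$MainPage =~ s/\s+/ /g;           # Compress spaces/tabs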

$tree = HTML::TreeBuilder->new;     # Create empty TreeBuilder tree
$tree->parse($MainPage);            # TreeBuilder real work done here
$tree->eof;                         # Tell the parser the input is complete
foreach $row ($tree->find_by_tag_name('tr')) {   # Search in each table row
    @Entries = ();
    foreach $child ($row->content_list) {
        if (ref $child and $child->tag eq 'td') {    # Look at td's
            push(@Entries, $child->as_text);   # Assumed: this step is elided in the post
        }
    }
    if ($Entries[0] !~ /[0-9]/) {     # Skip the row if first td isn't an item num
        next;                         # Assumed: the post elides this branch
    }
    push(@InActive, $Entries[0]);

    $item = "$Entries[0] $Entries[4]";
    $end_time = $Entries[2];
    ($end_time, $junk) = split(/$PDT/, $end_time, 2);
    ($mon, $RestOfTime) = split(/-/, $end_time, 2);
    $mon = $Months{$mon};
    ($mday, $RestOfTime) = split(/-/, $RestOfTime, 2);
    $year = substr($RestOfTime,0,2);
    $year = (($year + 2000) - 1900);
    $RestOfTime = substr($RestOfTime,-8,8);
    ($hours, $RestOfTime) = split(/:/, $RestOfTime, 2);
    ($min, $sec) = split(/:/, $RestOfTime, 2);
    # Convert the date/time into GMT epoch seconds for later sorting
    $end_time = timegm($sec,$min,$hours,$mday,$mon,$year);

    while ($Ending_Times{$end_time}) {   # Bump end_time till it's unique
        $end_time += 1;
    }
    $Ending_Times{$end_time} = $item;
}

$MainPage = '';  # Recover some storage
$junk = '';
$tree = $tree->delete;  # Now that we're done with it, we must destroy the tree storage

foreach $key (sort { $a <=> $b } keys(%Ending_Times)) {   # Numeric sort on the epoch keys
   push(@ItemList, $Ending_Times{$key});    # Put info into @ItemList
}

%Ending_Times=();    # Recover some storage
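
The date handling in the snippet above is easier to check in isolation. The following is a minimal sketch of the same steps, not the original code. It assumes the end-time cell looks like `Aug-07-03 12:34:56 PDT` (a format inferred from the splits, not confirmed) and that `%Months` maps month abbreviations to 0..11. `to_epoch`, `add_unique`, and the three sample rows are hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Assumed contents of %Months: month abbreviation => 0-based month number
my %Months;
@Months{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

# Hypothetical helper: turn "Mon-DD-YY HH:MM:SS ZONE" into epoch seconds.
# As in the snippet, the zone label is ignored and the clock time is
# treated as GMT, which is enough for sorting.
sub to_epoch {
    my ($text) = @_;
    my ($mon, $mday, $yy, $h, $m, $s) =
        $text =~ /^\s*([A-Za-z]{3})-(\d{1,2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2})/
        or return;
    return unless exists $Months{$mon};
    return timegm($s, $m, $h, $mday, $Months{$mon}, 2000 + $yy);
}

# Hypothetical helper: store $item under $epoch, bumping the key by one
# second until it is free, so two items ending together both survive.
sub add_unique {
    my ($table, $epoch, $item) = @_;
    $epoch++ while exists $table->{$epoch};
    $table->{$epoch} = $item;
    return $epoch;
}

# Invented sample rows: [ item text, end-time text ]
my %Ending_Times;
for my $row (['1002 Lamp',  'Aug-08-03 09:00:00 PDT'],
             ['1001 Chair', 'Aug-07-03 12:34:56 PDT'],
             ['1003 Desk',  'Aug-07-03 12:34:56 PDT']) {
    my ($item, $when) = @$row;
    my $epoch = to_epoch($when);
    next unless defined $epoch;
    add_unique(\%Ending_Times, $epoch, $item);
}

# Numeric sort on the epoch keys gives the items in ending order
my @ItemList = map { $Ending_Times{$_} } sort { $a <=> $b } keys %Ending_Times;
print "$_\n" for @ItemList;
```

Keeping the conversion in one small function makes it possible to test the date parsing without fetching or parsing any HTML.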
