develooper Front page | perl.libwww | Postings from September 2001

memory leaking and URI

Thread Next
From:
Eduardo M. Cavalcanti
Date:
September 27, 2001 01:30
Subject:
memory leaking and URI
Message ID:
5.1.0.14.0.20010926191148.00aa9a90@mail.organox.com.br
Hi,
We have been using LWP::UserAgent , HTML::TreeBuilder, HTTP::Request (and 
dependencies of them) for doing web content retrieving and filtering. PERL 
and these modules have of been of greatest value for our task, thanks.

One of the scripts that we created started consuming a huge, absurd ammount 
of RAM. It continued after we optimized the creation and disposal of 
HTML::Treebuilder objects. We tracked the problem down to the URI 
bundle.   Substituting 3 files with ones from an old version solved the 
problem. The files are: Escape.pm, WithBase.pm, _server.pm. I donĀ“t know 
which is the old version, but the new one was URI 1.17.

Probably this directly related to the problem described and solved in
http://archive.develooper.com/libwww@perl.org/msg02782.html

Follows a dump of a comparison of files of the different versions of URI.

Please email if more detail is needed.

-------------------------------------------------------------
C:\works\uri_cmp>fc .\good\* .\bad\*
Comparing files .\GOOD\data.pm and .\BAD\data.pm
FC: no differences encountered

Comparing files .\GOOD\Escape.pm and .\BAD\Escape.pm
***** .\GOOD\Escape.pm
#
# $Id: Escape.pm,v 3.16 2000/08/16 18:45:23 gisle Exp $
#
***** .\BAD\Escape.pm
#
# $Id: Escape.pm,v 3.19 2001/08/24 17:25:43 gisle Exp $
#
*****

***** .\GOOD\Escape.pm
This module provides functions to escape and unescape URI strings as
defined by RFC 2396.  URIs consist of a restricted set of characters,
denoted as C<uric> in RFC 2396.  The restricted set of characters
***** .\BAD\Escape.pm
This module provides functions to escape and unescape URI strings as
defined by RFC 2396 (and updated by RFC 2732).
URIs consist of a restricted set of characters,
denoted as C<uric> in RFC 2396.  The restricted set of characters
*****

***** .\GOOD\Escape.pm
   "A" .. "Z", "a" .. "z", "0" .. "9",
   ";", "/", "?", ":", "@", "&", "=", "+", "$", ",",   # reserved
   "-", "_", ".", "!", "~", "*", "'", "(", ")"
***** .\BAD\Escape.pm
   "A" .. "Z", "a" .. "z", "0" .. "9",
   ";", "/", "?", ":", "@", "&", "=", "+", "$", ",", "[", "]",   # reserved
   "-", "_", ".", "!", "~", "*", "'", "(", ")"
*****

***** .\GOOD\Escape.pm
The default set of characters to be escaped is all those which are
I<not> part of the C<uric> character class shown above.

***** .\BAD\Escape.pm
The default set of characters to be escaped is all those which are
I<not> part of the C<uric> character class shown above as well as the
reserved characters.  I.e. the default is:

   "^A-Za-z0-9\-_.!~*'()"

*****

***** .\GOOD\Escape.pm

Copyright 1995-2000 Gisle Aas.

***** .\BAD\Escape.pm

Copyright 1995-2001 Gisle Aas.

*****

***** .\GOOD\Escape.pm
@EXPORT_OK = qw(%escapes);
$VERSION = sprintf("%d.%02d", q$Revision: 3.16 $ =~ /(\d+)\.(\d+)/);

***** .\BAD\Escape.pm
@EXPORT_OK = qw(%escapes);
$VERSION = sprintf("%d.%02d", q$Revision: 3.19 $ =~ /(\d+)\.(\d+)/);

*****

***** .\GOOD\Escape.pm
             # Because we can't compile the regex we fake it with a cached sub
             $subst{$patn} =
               eval "sub {\$_[0] =~ s/([$patn])/\$escapes{\$1}/g; }";
             Carp::croak("uri_escape: $@") if $@;
***** .\BAD\Escape.pm
             # Because we can't compile the regex we fake it with a cached sub
             (my $tmp = $patn) =~ s,/,\\/,g;
             $subst{$patn} =
               eval "sub {\$_[0] =~ s/([$tmp])/\$escapes{\$1}/g; }";
             Carp::croak("uri_escape: $@") if $@;
*****

***** .\GOOD\Escape.pm
     } else {
         # Default unsafe characters. (RFC 2396 ^uric)
         $text =~ s/([^;\/?:@&=+\$,A-Za-z0-9\-_.!~*'()])/$escapes{$1}/g;
     }
***** .\BAD\Escape.pm
     } else {
         # Default unsafe characters.  RFC 2732 ^(uric - reserved)
         $text =~ s/([^A-Za-z0-9\-_.!~*'()])/$escapes{$1}/g;
     }
*****

Comparing files .\GOOD\file.pm and .\BAD\file.pm
FC: no differences encountered

Comparing files .\GOOD\ftp.pm and .\BAD\ftp.pm
FC: no differences encountered

Comparing files .\GOOD\gopher.pm and .\BAD\gopher.pm
FC: no differences encountered

Comparing files .\GOOD\Heuristic.pm and .\BAD\Heuristic.pm
FC: no differences encountered

Comparing files .\GOOD\http.pm and .\BAD\http.pm
***** .\GOOD\http.pm
         }
         $$other =~ s/(%[0-9A-F]{2})/$unreserved_escape{$1} || $1/ge;
         $other->path("/") if $slash_path;
***** .\BAD\http.pm
         }
         $$other =~ s/(%[0-9A-F]{2})/exists $unreserved_escape{$1} ?
                                            $unreserved_escape{$1} : $1/ge;
         $other->path("/") if $slash_path;
*****

Comparing files .\GOOD\https.pm and .\BAD\https.pm
FC: no differences encountered

Comparing files .\GOOD\ldap.pm and .\BAD\ldap.pm
FC: no differences encountered

Comparing files .\GOOD\mailto.pm and .\BAD\mailto.pm
FC: no differences encountered

Comparing files .\GOOD\news.pm and .\BAD\news.pm
FC: no differences encountered

Comparing files .\GOOD\nntp.pm and .\BAD\nntp.pm
FC: no differences encountered

Comparing files .\GOOD\pop.pm and .\BAD\pop.pm
FC: no differences encountered

Comparing files .\GOOD\rlogin.pm and .\BAD\rlogin.pm
FC: no differences encountered

Comparing files .\GOOD\rsync.pm and .\BAD\rsync.pm
FC: no differences encountered

Comparing files .\GOOD\snews.pm and .\BAD\snews.pm
FC: no differences encountered

Comparing files .\GOOD\telnet.pm and .\BAD\telnet.pm
FC: no differences encountered

Comparing files .\GOOD\URL.pm and .\BAD\URL.pm
FC: no differences encountered

Comparing files .\GOOD\WithBase.pm and .\BAD\WithBase.pm
***** .\GOOD\WithBase.pm

sub base {
***** .\BAD\WithBase.pm

sub can {                                  # override UNIVERSAL::can
     my $self = shift;
     $self->SUPER::can(@_) || (
       ref($self)
       ? $self->[0]->can(@_)
       : undef
     )
}

sub base {
*****

***** .\GOOD\WithBase.pm
     if (@_) { # set
         my $new_base = @_;
         $new_base = $new_base->abs if ref($new_base);  # ensure absoluteness
***** .\BAD\WithBase.pm
     if (@_) { # set
         my $new_base = shift;
         $new_base = $new_base->abs if ref($new_base);  # ensure absoluteness
*****

***** .\GOOD\WithBase.pm
     my $self = shift;
     bless [$self->[0]->clone, $self->[0]], ref($self);
}
***** .\BAD\WithBase.pm
     my $self = shift;
     my $base = $self->[1];
     $base = $base->clone if ref($base);
     bless [$self->[0]->clone, $base], ref($self);
}
*****

Comparing files .\GOOD\_foreign.pm and .\BAD\_foreign.pm
FC: no differences encountered

Comparing files .\GOOD\_generic.pm and .\BAD\_generic.pm
FC: no differences encountered

Comparing files .\GOOD\_login.pm and .\BAD\_login.pm
FC: no differences encountered

Comparing files .\GOOD\_query.pm and .\BAD\_query.pm
FC: no differences encountered

Comparing files .\GOOD\_segment.pm and .\BAD\_segment.pm
FC: no differences encountered

Comparing files .\GOOD\_server.pm and .\BAD\_server.pm
***** .\GOOD\_server.pm
         $tmp = "" unless defined $tmp;
         my $ui;
         $ui = $1 if $tmp =~ s/^([^@]*@)//;
         $tmp =~ s/^[^:]*//;        # get rid of old host
         my $new = shift;
         if (defined $new) {
             $new =~ s/[@]/%40/g;   # protect @
             $tmp = ($new =~ /:/) ? $new : "$new$tmp";
         }
         $tmp = "$ui$tmp" if defined $ui;
         $self->authority($tmp);
     }
     return undef if !defined($old) || $old !~ /^(?:[^@]*@)?([^:]*)/;
     return uri_unescape($1);
}
***** .\BAD\_server.pm
         $tmp = "" unless defined $tmp;
         my $ui = ($tmp =~ /^([^@]*@)/) ? $1 : "";
         my $port = ($tmp =~ /(:\d+)$/) ? $1 : "";
         my $new = shift;
         $new = "" unless defined $new;
         if (length $new) {
             $new =~ s/[@]/%40/g;   # protect @
             $port = $1 if $new =~ s/(:\d+)$//;
         }
         $self->authority("$ui$new$port");
     }
     return undef unless defined $old;
     $old =~ s/^[^@]*@//;
     $old =~ s/:\d+$//;
     return uri_unescape($old);
}
*****

Comparing files .\GOOD\_userpass.pm and .\BAD\_userpass.pm
FC: no differences encountered


Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About