perl.perl5.porters | Postings from January 2005

Storable, XML::LibXML, and caching large DOM trees

From: Douglas Webb
Date: January 4, 2005 10:16
Subject: Storable, XML::LibXML, and caching large DOM trees
Message ID: 58A1E8BE7D402C4FABD75BB6D08E0C021E2920@mx2003nyc.wkhmr.loc

Hello, and thank you in advance for any assistance you can provide me.
 
My problem is simple to state: I have a 42MB XML document which I need to
parse with XML::LibXML and transform with XML::LibXSLT in two seconds or
less, without using excessive memory. (I'm delivering a web page, the server
already has the most memory it can have, and there may be concurrent
requests.)
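
For reference, the parse-and-transform step described above can be sketched roughly as follows. This is a minimal sketch, not the production code: the inline XML and stylesheet are tiny hypothetical stand-ins for the real 42MB document and its stylesheet, and the Time::HiRes timing is only there to show where the two-second budget would be measured.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;
use Time::HiRes qw(time);

# Hypothetical stand-ins for the real document and stylesheet.
my $xml = '<items><item>a</item><item>b</item></items>';
my $xsl = <<'XSL';
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="count(/items/item)"/>
  </xsl:template>
</xsl:stylesheet>
XSL

my $parser = XML::LibXML->new;
my $xslt   = XML::LibXSLT->new;

my $start = time;

# Parsing builds the complete DOM tree in memory.
my $source = $parser->parse_string($xml);

# The stylesheet is itself parsed as XML, then compiled by libxslt.
my $stylesheet = $xslt->parse_stylesheet($parser->parse_string($xsl));

# The transform produces a second tree, which is then serialized.
my $result = $stylesheet->transform($source);
my $output = $stylesheet->output_string($result);

my $elapsed = time - $start;
printf "output=%s elapsed=%.3fs\n", $output, $elapsed;
```

With the real file, parse_file would replace parse_string. The source tree and the result tree are both in memory at the same time during the transform.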
 
Ideally, I need to do this without changing the XML file (the content is not
under my control).
Caching the DOM tree in memory isn't an option; this is just one file out of
hundreds that I need to handle. (The average size is around 1MB; other than
this file, the next largest is 10MB.)
 
My httpd processes are using up to 500MB of RAM after a number of requests.
In a test script, parsing this file uses about 160MB of RAM, which is in
line with the libxml2 documentation. I'm assuming that most of the extra RAM
in my live environment is being used by libxslt.
 
What I'd like to do is parse the document once, then use Storable to dump it
to disk. 
I tried this:
 
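use XML::LibXML; use Storable qw(nstore);
my $parser = XML::LibXML->new;
my $doc = $parser->parse_file('xml');   # parse the source document into a DOM tree
nstore $doc, 'cache';                   # write $doc to the file 'cache'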
 
But all I got was the Perl wrapper object, not the real DOM tree, which
presumably is held internally by libxml2.
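
To make the wrapper-object observation concrete, here is a minimal sketch that inspects what the Perl side actually holds. It assumes a small inline document in place of the real file. The exact internal layout of the object is an implementation detail of XML::LibXML, so the script prints it rather than assuming it.

```perl
use strict;
use warnings;
use XML::LibXML;
use Scalar::Util qw(blessed reftype);

# Small hypothetical document standing in for the real file.
my $doc = XML::LibXML->new->parse_string('<root><child/></root>');

# The Perl-level value is a blessed reference; the nodes themselves are
# allocated by libxml2 and are not visible as Perl data structures.
my $class = blessed($doc);
my $type  = reftype($doc);
print "class=$class reftype=$type\n";

# A portable representation of the tree is its serialized XML text,
# but loading that back means parsing it all over again.
my $xml_text = $doc->toString;
my $copy     = XML::LibXML->new->parse_string($xml_text);
my $root     = $copy->documentElement->nodeName;
print "round-tripped root element: $root\n";
```

Serializing with toString and re-parsing keeps the content intact, but it does not avoid the parse cost, which is the part that needs to be cached.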
 
Is there a way to cache the DOM tree, or am I out of luck? I didn't see
anything in the libxml2 documentation about this; just a suggestion to use
the streaming interface for large files (which I don't think would allow me
to do the XSLT transform, even if the Perl binding supported that
interface).
 
Thanks for your help.
Douglas Webb
 
 






