perl.beginners | Postings from August 2009

Re: HTML::TreeBuilder encode symbols as html entities

Roman Makurin
August 14, 2009 06:58
On Fri, Aug 14, 2009 at 5:35 PM, Shawn H. Corey<> wrote:
> Roman Makurin wrote:
>> The dump result contains HTML-encoded entities:
>> <h4> @
>>  <a class="a01" href="hidden_url" rel="bookmark"
>> title="&#x421;&#x441;&#x44B;&#x43B;&#x43A;&#x430; ">@
>> All the HTML entities are valid Unicode code points of the characters. But why
>> does HTML::TreeBuilder convert the characters to entities?
> Because some browsers do not understand Unicode.  Or they didn't.
>> If I just do
>> print $content, $/;
>> everything is OK: all characters are printed as characters, not as HTML-encoded entities.
> Yes, this output goes to your screen, not to a browser, so it is encoded in
> a way that makes it readable.

I have used this approach with lots of UTF-8 encoded pages, and the problem
arises only with this page. Why does HTML::TreeBuilder produce
human-readable output in some cases and not in others?

I really don't understand it :(
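For comparison, a minimal sketch that isolates the encoding step. HTML::Element's as_HTML method (inherited by HTML::TreeBuilder trees) takes an optional first argument listing the characters to encode; when it is omitted, every "unsafe" character, including all non-ASCII characters, is written as an entity. The sample markup is hypothetical, modelled on the link in the dump quoted above, and the sketch assumes the page has already been decoded into a Perl character string.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # the literal below contains Cyrillic text
use HTML::TreeBuilder;

binmode STDOUT, ':encoding(UTF-8)';    # print characters, not raw bytes

# Hypothetical sample markup; in practice this would be the decoded page.
my $content = '<h4><a class="a01" href="hidden_url" rel="bookmark" '
            . 'title="Ссылка">Ссылка</a></h4>';

my $tree = HTML::TreeBuilder->new_from_content($content);

# No argument: all unsafe characters (including non-ASCII) become entities.
my $encoded = $tree->as_HTML;

# Encode only the characters that must be escaped in markup.
my $readable = $tree->as_HTML('<>&');

print $encoded,  "\n";
print $readable, "\n";

$tree->delete;    # release the tree's circular references
```

If the first line shows `&#x421;...` and the second shows the Cyrillic text, the difference comes from the encoding argument passed to as_HTML, not from the page content.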

> --
> Just my 0.00000002 million dollars worth,
>  Shawn
> Programming is as much about organization and communication
> as it is about coding.
> I like Perl; it's the only language where you can bless your
> thingy.

If you think of MS-DOS as mono, and Windows as stereo,
 then Linux is Dolby Digital and all the music is free...
