
Re: HTML::TreeBuilder encode symbols as html entities

From: Roman Makurin
Date: August 14, 2009 06:58
Subject: Re: HTML::TreeBuilder encode symbols as html entities
Message ID: 46e5b4ee0908140657me7d5773k9b76a0054da335@mail.gmail.com

On Fri, Aug 14, 2009 at 5:35 PM, Shawn H. Corey<shawnhcorey@gmail.com> wrote:
> Roman Makurin wrote:
>>
>> dump result is html encoded entities:
>>
>> <h4> @0.1.5.1
>>  <a class="a01" href="hidden_url" rel="bookmark"
>> title="&#x421;&#x441;&#x44B;&#x43B;&#x43A;&#x430; ">@0.1.5.1.0
>>
>> All the HTML entities are valid Unicode code points of the symbols. But why
>> does HTML::TreeBuilder convert symbols to entities?
>
> Because some browsers do not understand Unicode.  Or they didn't.
>
>>
>> If I just do
>> print $content, $/;
>> everything is ok, all symbols are symbols not html encoded entities.
>
> Yes, this output is to your screen, not to a browser, so it's encoded in
> a way that makes it readable.
>

I have used this scheme with lots of UTF-8 encoded pages, and the problem
arises only with this page. Why does HTML::TreeBuilder produce
human-readable output in some cases and not in others?

I really don't understand it :(
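
For reference, here is a minimal sketch that reproduces the difference. It assumes the page is already decoded into a Perl character string; the sample markup and the variable names are made up for illustration and are not taken from the real page. It relies on the documented optional first argument of as_HTML (the set of characters to encode) and on decode_entities from HTML::Entities. Whether dump can be influenced the same way is not something this sketch shows.

```perl
use strict;
use warnings;
use utf8;                                   # this source file contains UTF-8 literals
use HTML::TreeBuilder;
use HTML::Entities qw(decode_entities);

binmode STDOUT, ':encoding(UTF-8)';         # print characters as UTF-8 bytes

# Hypothetical sample markup standing in for the real page.
my $html = '<h4><a class="a01" href="hidden_url" rel="bookmark" '
         . 'title="Ссылка ">text</a></h4>';

my $tree = HTML::TreeBuilder->new_from_content($html);
my $link = $tree->look_down( _tag => 'a' );

# The attribute value is stored in the tree as plain characters.
my $stored = $link->attr('title');

# With no argument, as_HTML encodes every "unsafe" character,
# which includes all non-ASCII characters.
my $default = $link->as_HTML;

# Passing an explicit set limits encoding to those characters only.
my $restricted = $link->as_HTML('<>&');

# Entities that were already produced can be turned back into characters.
my $decoded = decode_entities('&#x421;&#x441;&#x44B;&#x43B;&#x43A;&#x430; ');

print "stored:     $stored\n";
print "default:    $default\n";
print "restricted: $restricted\n";
print "decoded:    $decoded\n";

$tree->delete;                              # free the tree explicitly
```

If this behaves the same way on the real page, the entities are most likely added at output time rather than stored in the tree, so the choice of output call would matter more than the page itself.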

>
> --
> Just my 0.00000002 million dollars worth,
>  Shawn
>
> Programming is as much about organization and communication
> as it is about coding.
>
> I like Perl; it's the only language where you can bless your
> thingy.
>



-- 
If you think of MS-DOS as mono, and Windows as stereo,
 then Linux is Dolby Digital and all the music is free...
