perl.perl5.porters | Postings from October 2011

Re: The "Unicode Bug"

From: Mons Anderson
Date: October 17, 2011 07:33
Subject: Re: The "Unicode Bug"
Message ID: 201110171833.56223.inthrax@gmail.com

On Monday 17 October 2011 17:43:13 Tom Christiansen wrote:
> > Ok, I'll try to explain.
> > I write xml parser.
> > It should parse byte streams.
>
> Ah, so you panic on code points that are larger than 255?  That's not
> very friendly.  However can you know how to interpret those bytes?
>
> --tom

No, I don't panic.
Everything works correctly with both flagged and unflagged scalars whose 
Unicode characters are greater than 255.

The problem, from Eric's point of view, is with downgraded scalars holding 
values in the range 7f-ff.
Once those are upgraded, everything is OK, since their byte buffer then seems 
to be the correct UTF-8 sequence for the corresponding characters.

Because a downgraded \xb2 (\262 without the flag) compares equal in Perl to an 
upgraded \xb2 (\302\262 + UTF flag internally), Eric says that the parser must 
handle them equally.
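
A minimal sketch of that situation, assuming only the core utf8:: functions 
and bytes::length (the variable names are mine, purely for illustration). It 
builds the same one-character string in both internal forms and shows that 
Perl treats them as equal even though the buffers differ:

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length only; byte semantics stay disabled

# Two copies of the same one-character string, code point 0xB2.
my $down = "\xb2";
my $up   = "\xb2";

utf8::downgrade($down);    # buffer holds the single byte \262, UTF8 flag off
utf8::upgrade($up);        # buffer holds the bytes \302\262, UTF8 flag on

# At the Perl level the two scalars are the same string.
my $same_string = ( $down eq $up ) ? 1 : 0;

# Internally they differ in both flag and buffer length.
my $flag_down = utf8::is_utf8($down) ? 1 : 0;
my $flag_up   = utf8::is_utf8($up)   ? 1 : 0;
my $len_down  = bytes::length($down);
my $len_up    = bytes::length($up);

printf "eq=%d  flags: down=%d up=%d  buffer bytes: down=%d up=%d\n",
    $same_string, $flag_down, $flag_up, $len_down, $len_up;
```

So a parser that looks straight at the byte buffer sees two different inputs 
for what Perl considers one and the same string.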


-- 
Vladimir Perepelitsa aka Mons Anderson
<inthrax@gmail.com> / #99779956


