develooper Front page | perl.perl5.porters | Postings from February 2012

Re: RFC & PROPOSAL: add perlunicook.pod to std docset

From: Tom Christiansen
Date: February 29, 2012 11:58
Subject: Re: RFC & PROPOSAL: add perlunicook.pod to std docset
Message ID: 14983.1330545471@chthon

And still more bugs.  If you run pod2html on a pod file that
has code points in the 128-255 range, such as pod/perl5158delta.pod,
then you get invalid output:

    <meta http-equiv="content-type" content="text/html; charset=utf-8" />

    ...


    <p>Abhijit Menon-Sen, Alan Haggai Alavi, Alexandr Ciornii, Andy Dougherty, Brian Fraser, Chris &#39;BinGOs&#39; Williams, Craig A. Berry, Darin McBride, Dave Rolsky, David Golden, David Leadbeater, David Mitchell, Dominic Hargreaves, Eric Brine, Father Chrysostomos, Florian Ragwitz, H.Merijn Brand, Juerd Waalboer, Karl Williamson, Leon Timmermans, Marc Green, Max Maischein, Nicholas Clark, Paul Evans, Rafael Garcia-Suarez, Rainer Tammer, Reini Urban, Ricardo Signes, Robin Barker, Shlomi Fish, Steffen M?ller, Todd Rinaldo, Tony Cook, Yves Orton, Zefram, ?var Arnfj?r? Bjarmason.</p>

It's emitting those code points as raw single bytes (Latin-1) but claiming they're UTF-8.  They aren't.

    $ blead -C0 utils/pod2html < pod/perl5158delta.pod > /tmp/foo.html

    $ perl -C0 -S tcgrep '\P{ASCII}' /tmp/foo.html | uniquote -v
    uniquote: read failure: utf8 "\xFC" does not map to Unicode at standard input line 1
    Exit 1

    $ perl -C0 -S tcgrep '\P{ASCII}' /tmp/foo.html | uniquote -b
    <p>Abhijit Menon-Sen, Alan Haggai Alavi, Alexandr Ciornii, Andy Dougherty, Brian Fraser, Chris &#39;BinGOs&#39; Williams, Craig A.
    Berry, Darin McBride, Dave Rolsky, David Golden, David Leadbeater, David Mitchell, Dominic Hargreaves, Eric Brine, Father Chrysostomos,
    Florian Ragwitz, H.Merijn Brand, Juerd Waalboer, Karl Williamson, Leon Timmermans, Marc Green, Max Maischein, Nicholas Clark, Paul
    Evans, Rafael Garcia-Suarez, Rainer Tammer, Reini Urban, Ricardo Signes, Robin Barker, Shlomi Fish, Steffen M\xFCller, Todd Rinaldo,
    Tony Cook, Yves Orton, Zefram, \xC6var Arnfj\xF6r\xF0 Bjarmason.</p>
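
The bytes uniquote flagged (\xFC, \xC6, \xF6, \xF0) are exactly the ISO-8859-1 encodings of ü, Æ, ö and ð. A minimal sketch of that check using the core Encode module, assuming the string below is a faithful copy of the offending bytes; it only illustrates the diagnosis and is not pod2html's code:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Copy of the offending output bytes: one byte per non-ASCII character.
my $bytes = "Steffen M\xFCller, \xC6var Arnfj\xF6r\xF0 Bjarmason";

# Strict UTF-8 decoding croaks on the lone 0xFC, much as uniquote did.
# LEAVE_SRC keeps Encode from modifying $bytes in place.
my $is_utf8 = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;

# Read as ISO-8859-1, the same bytes give back the intended names.
my $text = decode('ISO-8859-1', $bytes);

binmode STDOUT, ':encoding(UTF-8)';
print "valid UTF-8: ", ($is_utf8 ? "yes" : "no"), "\n";
print "as Latin-1: $text\n";
```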

You can't emit Latin-1 bytes like those if you declare charset=utf-8 in the meta http-equiv tag.
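
One way to keep the bytes and the declared charset in agreement is to write the HTML through a UTF-8 encoding layer. This is a hypothetical sketch, not pod2html's actual structure: `$names` stands in for text already decoded from the pod, and an in-memory filehandle replaces the real output file so the resulting bytes can be inspected.

```perl
use strict;
use warnings;

# Hypothetical stand-in for character data already decoded from the pod.
my $names = "Steffen M\x{FC}ller, \x{C6}var Arnfj\x{F6}r\x{F0} Bjarmason";

# The :encoding(UTF-8) layer turns every character into UTF-8 bytes,
# matching the charset=utf-8 that the meta tag announces.
open my $out, '>:encoding(UTF-8)', \my $html or die "open: $!";
print {$out} "<p>$names.</p>\n";
close $out or die "close: $!";

# $html now holds bytes; the u-umlaut is the pair C3 BC, not a lone FC.
```

With the layer in place, the output is genuinely UTF-8, so the meta tag's claim holds.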

--tom



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org