
Re: Re-gaining 'O's

From: H.Merijn Brand
Date: August 25, 2008 22:58
Subject: Re: Re-gaining 'O's
Message ID: 20080826075831.3224902f@pc09.procura.nl

On Mon, 25 Aug 2008 18:03:48 -0700, Russ Allbery <rra@stanford.edu>
wrote:

> "H.Merijn Brand" <h.m.brand@xs4all.nl> writes:
> 
> > We're now running in circles.
> >
> > I'm not arguing that that is not (or is) the correct solution, but I
> > want the CORE test suite to PASS for above combination. Currently all
> > smokes are spitting 'F' for FAIL, where in my explanation it should be
> > PASS.
> >
> > Can you make the CORE tests PASS without altering your POV on the
> > module itself?
> 
> ...I'm sorry, I had completely misunderstood your original message.
> Please ignore all my previous contributions to this thread; I was
> hopelessly confused.
> 
> Okay, I can now reproduce this.  What's happening is that the output from
> the script is being double-converted, so already Unicode output is being
> converted to Unicode again.  For some reason, Perl doesn't think that the
> string that Pod::Man is writing is already in Unicode and is therefore
> assuming that it's in a legacy character set and reconverting to Unicode
> on output to the file handle.
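
The double conversion described above is not specific to Perl. The following is a minimal Python sketch of the effect, not code from Pod::Man or Pod::Simple. It assumes the "legacy character set" is Latin-1; the message does not name one. The sample word is taken from the test data.

```python
# Illustration only: what "already Unicode output converted to Unicode again"
# does to the bytes. Latin-1 as the assumed legacy charset is an assumption.
text = "Beyonc\u00e9"  # "Beyoncé" as a character string

# Correct: encode the character string to UTF-8 exactly once.
once = text.encode("utf-8")

# Double conversion: the UTF-8 bytes are mistaken for Latin-1 characters
# and then encoded to UTF-8 a second time.
twice = once.decode("latin-1").encode("utf-8")

print(once)                   # b'Beyonc\xc3\xa9'
print(twice)                  # b'Beyonc\xc3\x83\xc2\xa9'
print(twice.decode("utf-8"))  # BeyoncÃ©
```

The single "é" becomes two characters because each byte of its UTF-8 form is re-encoded as if it were a character of its own.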
> 
> I think this is a Pod::Simple bug, since it's responsible for handling the
> character set of the input and handing processors UTF-8.  I think it's
> doing that but not tagging the strings properly so that Perl knows that
> they're UTF-8.  Looking at the Pod::Simple source, it apparently pays no
> attention to PERL_UNICODE and only realizes that its input is UTF-8 if
> there's a BOM in the file.  The following patch fixes the test case:

Hmm, not in my case: it makes that combination PASS, but breaks the
other combinations, so I'll wait for the rest of the patch.

TEST for PERL_UNICODE=--undef-- LANG=C
1..7
ok 1
ok 2
not ok 3
Expected
========
.SH "BEYONCÉ"
.IX Header "BEYONCÉ"
Beyoncé!  Beyoncé!  Beyoncé!!
.PP
.Vb 3
\&    Beyoncé!  Beyoncé!
\&      Beyoncé!  Beyoncé!
\&        Beyoncé!  Beyoncé!
.Ve
.PP
Older versions did not convert Beyoncé in verbatim.

Output
======
.SH "BEYONC�
.IX Header "BEYONC�
Beyonc�  Beyonc�  Beyonc�!
.PP
.Vb 3
\&    Beyonc�  Beyonc�
\&      Beyonc�  Beyonc�
\&        Beyonc�  Beyonc�
.Ve
.PP
Older versions did not convert Beyonc�in verbatim.

ok 4
ok 5
ok 6
ok 7
TEST for PERL_UNICODE=--undef-- LANG=en_US.utf8
1..7
ok 1
ok 2
not ok 3
Expected
========
.SH "BEYONCÉ"
.IX Header "BEYONCÉ"
Beyoncé!  Beyoncé!  Beyoncé!!
.PP
.Vb 3
\&    Beyoncé!  Beyoncé!
\&      Beyoncé!  Beyoncé!
\&        Beyoncé!  Beyoncé!
.Ve
.PP
Older versions did not convert Beyoncé in verbatim.

Output
======
.SH "BEYONC�
.IX Header "BEYONC�
Beyonc�  Beyonc�  Beyonc�!
.PP
.Vb 3
\&    Beyonc�  Beyonc�
\&      Beyonc�  Beyonc�
\&        Beyonc�  Beyonc�
.Ve
.PP
Older versions did not convert Beyonc�in verbatim.

ok 4
ok 5
ok 6
ok 7
TEST for PERL_UNICODE= LANG=C
1..7
ok 1
ok 2
not ok 3
Expected
========
.SH "BEYONCÉ"
.IX Header "BEYONCÉ"
Beyoncé!  Beyoncé!  Beyoncé!!
.PP
.Vb 3
\&    Beyoncé!  Beyoncé!
\&      Beyoncé!  Beyoncé!
\&        Beyoncé!  Beyoncé!
.Ve
.PP
Older versions did not convert Beyoncé in verbatim.

Output
======
.SH "BEYONC�
.IX Header "BEYONC�
Beyonc�  Beyonc�  Beyonc�!
.PP
.Vb 3
\&    Beyonc�  Beyonc�
\&      Beyonc�  Beyonc�
\&        Beyonc�  Beyonc�
.Ve
.PP
Older versions did not convert Beyonc�in verbatim.

ok 4
ok 5
ok 6
ok 7
TEST for PERL_UNICODE= LANG=en_US.utf8
1..7
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
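
The four runs above cover PERL_UNICODE unset or set to the empty string, crossed with LANG=C or LANG=en_US.utf8. The harness that produced them is not shown in this message. The following Python sketch is one hypothetical way to build the same four environments; the function name `env_matrix` and the commented-out test command are assumptions, not the actual smoke setup.

```python
import itertools
import os

# Sentinel meaning "remove the variable from the environment entirely".
UNDEF = None


def env_matrix(base_env):
    """Yield (label, env) pairs for each PERL_UNICODE x LANG combination."""
    perl_unicode_values = [UNDEF, ""]   # unset vs. set but empty
    lang_values = ["C", "en_US.utf8"]
    for pu, lang in itertools.product(perl_unicode_values, lang_values):
        env = dict(base_env)            # copy, so the caller's env is untouched
        env["LANG"] = lang
        if pu is UNDEF:
            env.pop("PERL_UNICODE", None)
        else:
            env["PERL_UNICODE"] = pu
        shown = "--undef--" if pu is UNDEF else pu
        yield f"PERL_UNICODE={shown} LANG={lang}", env


if __name__ == "__main__":
    for label, env in env_matrix(os.environ):
        print("TEST for", label)
        # Hypothetical: run the test script here with this environment, e.g.
        # subprocess.run(["perl", "t/man-options.t"], env=env)
```

Keeping "unset" distinct from "empty" matters here, because the results above differ between those two cases under LANG=en_US.utf8.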

> --- a/t/man-options.t
> +++ b/t/man-options.t
> @@ -91,6 +91,8 @@ __DATA__
>  ###
>  utf8 1
>  ###
> +=encoding utf-8
> +
>  =head1 BEYONCÉ
>  
>  Beyoncé!  Beyoncé!  Beyoncé!!
> 
> by telling Pod::Simple explicitly that the input is UTF-8.  perlpod does
> sort of imply that this is required:
> 
>    "=encoding encodingname"
>        This command is used for declaring the encoding of a document.
>        Most users won’t need this; but if your encoding isn’t US-ASCII or
>        Latin-1, then put a "=encoding encodingname" command early in the
>        document so that pod formatters will know how to decode the
>        document.
> 
> so this is a valid patch to use.  perlpodspec goes on at greater length:
> 
>    "=encoding encodingname"
>        This command, which should occur early in the document (at least
>        before any non-US-ASCII data!), declares that this document is
>        encoded in the encoding encodingname, which must be an encoding
>        name that Encoding recognizes.  (Encoding’s list of supported
>        encodings, in Encode::Supported, is useful here.)  If the Pod
>        parser cannot decode the declared encoding, it should emit a
>        warning and may abort parsing the document altogether.
> 
>        A document having more than one "=encoding" line should be
>        considered an error.  Pod processors may silently tolerate this if
>        the not-first "=encoding" lines are just duplicates of the first
>        one (e.g., if there’s a "=use utf8" line, and later on another
>        "=use utf8" line).  But Pod processors should complain if there are
>        contradictory "=encoding" lines in the same document (e.g., if
>        there is a "=encoding utf8" early in the document and "=encoding
>        big5" later).  Pod processors that recognize BOMs may also complain
>        if they see an "=encoding" line that contradicts the BOM (e.g., if
>        a document with a UTF-16LE BOM has an "=encoding shiftjis" line).
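
The rules quoted above can be sketched in a few lines. This is a minimal Python illustration and not Pod::Simple's algorithm. The function names `declared_encoding` and `decode_pod` are hypothetical, the Latin-1 fallback is an assumption drawn from the perlpod wording, and Python's codec registry stands in for Encode. The sketch does not handle BOMs and does not skip verbatim blocks.

```python
import codecs
import re

# An "=encoding name" command at the start of a line (raw bytes, not yet decoded).
ENCODING_RE = re.compile(rb"^=encoding[ \t]+(\S+)[ \t]*\r?$", re.MULTILINE)


def declared_encoding(pod_bytes):
    """Return the canonical codec name declared by =encoding, or None.

    Raises ValueError for an unknown encoding or contradictory declarations.
    Duplicate declarations of the same encoding are silently tolerated.
    """
    names = [m.decode("ascii") for m in ENCODING_RE.findall(pod_bytes)]
    if not names:
        return None
    try:
        canonical = {codecs.lookup(name).name for name in names}
    except LookupError as exc:
        raise ValueError(f"unknown encoding: {exc}") from exc
    if len(canonical) > 1:
        raise ValueError(f"contradictory =encoding lines: {names}")
    return canonical.pop()


def decode_pod(pod_bytes, default="latin-1"):
    """Decode POD source, falling back to an assumed default if undeclared."""
    return pod_bytes.decode(declared_encoding(pod_bytes) or default)
```

With no declaration, UTF-8 input is decoded as Latin-1 and the accented characters come out wrong, which is the situation the "=encoding utf-8" line in the test data avoids.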
> 
> I think it's debatable whether this is the correct behavior for
> Pod::Simple; it seems to me that if PERL_UNICODE is set and we're in a
> UTF-8 locale, Pod::Simple should assume all input is Unicode, since that's
> kind of what that setting says.  But I will include the test case patch
> anyway in the next release of Pod::Man since given the current
> specification it's required for Unicode input to be recognized properly.
> 
> I'm very sorry for my fairly useless previous responses when I didn't
> understand what you were asking.

No need for apologies, let's go make Perl better :)

-- 
H.Merijn Brand          Amsterdam Perl Mongers  http://amsterdam.pm.org/
using & porting perl 5.6.2, 5.8.x, 5.10.x, 5.11.x on HP-UX 10.20, 11.00,
11.11, 11.23, and 11.31, SuSE 10.1, 10.2, and 10.3, AIX 5.2, and Cygwin.
http://mirrors.develooper.com/hpux/           http://www.test-smoke.org/
http://qa.perl.org      http://www.goldmark.org/jeff/stupid-disclaimers/
