
Re: [PATCH] lib/Pod/Html.pm plus a funky UT8-8 regex bug

From:
demerphq
Date:
March 21, 2007 04:27
Subject:
Re: [PATCH] lib/Pod/Html.pm plus a funky UT8-8 regex bug
Message ID:
9b18b3110703210426n3bc125afk7a3764d45f5b149f@mail.gmail.com
On 3/21/07, Jarkko Hietaniemi <jhi@iki.fi> wrote:
> The attached patch fixes (or at least papers over) the htmlview.t failure
> that was recently introduced by change #30584 [1] (or rather, a bug that
> was unearthed by #30584).  The failure was seen at least in Tru64 and
> HP-UX and only under UTF-8 locales [2], [3].  The failure is (at least
> in Tru64, I haven't seen the details in HP-UX) that two of the html item
> anchors don't turn out as expected:
>
> # ! <li><strong><a name="mat" class="item">Mat&lt;!&gt;</a></strong>
> # ! <li><strong><a name="mat___" class="item">Mat&lt;!&gt;</a></strong>
>
> # ! <li><strong><a name="mat2" class="item">Mat</a></strong>
> # ! <li><strong><a name="mat" class="item">Mat</a></strong>
>
> The patch doesn't fix the bug, it just changes the regex so that the bug
> is not hit.  The bug itself requires demerphq :-)  I tried looking at
> whether the [[:punct:]] would be different under locales "C" and
> "fi_FI.UTF-8", but not that easy -- the [[:punct:]] are identical.
> (Also note how in the second case '2' gets lost.)
>
> NOTE that perl -C is *not* used, and Pod::Html does no utf8 stuff,
> and the $text in fragment_id_readable() does *not* have UTF8 bit on,
> but still the locale being a UTF-8 locale changes how things match.
>
> Attached are my best attempts at finding what is different, namely
> the re debug traces from inside the fragment_id_readable() $text
> substitute statements, one with locale "C" and one with a UTF-8 locale
> when matching with $text as 'Mat<!>':

I don't see how to replicate these results so that I can investigate further.

Could you help me out by giving me the debug output from:

  'Mat<!>'=~/[[:punct:]\s]+/

in both cases (the "C" locale and the UTF-8 locale), please?

and/or

  $str = 'Mat<!>';
  $str =~ s/[[:punct:]\s]+//g;

both with and without "use locale"?

I can't see any reason why this wouldn't work as expected, and when I
try it here it does work as expected. :-()

Cheers,
Yves


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
