
[perl #112140] Pod::HTML should use a proper Unicode-aware definition of "word character"

Nicholas Clark
March 30, 2012 08:54
# New Ticket Created by  Nicholas Clark 
# Please include the string:  [perl #112140]
# in the subject line of all future correspondence about this issue. 
# <URL: >

Pod::Html has this:

    use locale; # make \w work right in non-ASCII lands

It was added in 1998 by this commit:

commit 3ec0728814aeaba716081748626d6940892a1796
Author: Fyodor Krasnov <>
Date:   Tue Nov 24 22:00:36 1998 +0300

    Pod::Html and Pod::Text were not locale-savvy:
    for example in =head1 all non-ASCII-\w-runs were
    turned into underscores in NAME tags.  This could
    result in several NAME tags becoming identical.
    Reported by:
    Subject: pod2html vs Russian Characters
    Message-Id: <>
    p4raw-id: //depot/cfgperl@2435

The code referenced is this:

# similar to htmlify, but turns non-alphanumerics into underscores
sub anchorify {
    my ($anchor) = @_;
    $anchor = htmlify($anchor);
    $anchor =~ s/\W/_/g;
    return $anchor;
}

At first glance it would seem better to replace that \W with a POSIX character
class or Unicode property that reliably expresses the intent.
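
To illustrate the difference, here is a minimal sketch. Both subs are
hypothetical stand-ins written for this example, not Pod::Html's code, and
they skip the htmlify step. The first treats only ASCII as word characters,
which is roughly the behaviour the 1998 commit message describes. The second
spells out the intent with Unicode properties; the exact set of properties
chosen is an assumption about what "word character" should mean here.

```perl
use strict;
use warnings;
use utf8;    # this file contains literal Cyrillic strings

# Hypothetical helper: only ASCII letters, digits and "_" survive,
# so every non-ASCII character becomes an underscore.
sub anchorify_ascii {
    my ($anchor) = @_;
    $anchor =~ s/[^A-Za-z0-9_]/_/g;
    return $anchor;
}

# Hypothetical helper: keeps letters, combining marks, numbers and
# connector punctuation (such as "_") from any script.
sub anchorify_unicode {
    my ($anchor) = @_;
    $anchor =~ s/[^\p{L}\p{M}\p{N}\p{Pc}]/_/g;
    return $anchor;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $heading ("ОПЦИИ", "ВЫВОД", "NAME and more") {
    printf "%s: ascii=%s unicode=%s\n",
        $heading, anchorify_ascii($heading), anchorify_unicode($heading);
}
```

With the ASCII-only version the two five-letter Russian headings both
collapse to the same run of underscores, which is the duplicate NAME tag
problem from the original report. The property-based version leaves them
distinct and does not depend on the locale in effect.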

However, with the refactor to use Pod::Simple::XHTML, &anchorify is no longer
used by any code within Pod::Html, and the only external user on CPAN* seems
to be installhtml. Hence it's not clear whether a better plan is to deprecate
the function (and similarly htmlify, as it's unused).

Nicholas Clark

* There are several copies and derivatives of Pod::Html on CPAN - I couldn't
  spot anything using Pod::Html::anchorify
