
[perl #41685] [PATCH] v5.8.8 pod2html -- Add new option --prelink to treat <pre>..</pre> URLs

From: Jari Aalto
Date: March 3, 2007 14:43
Subject: [perl #41685] [PATCH] v5.8.8 pod2html -- Add new option --prelink to treat <pre>..</pre> URLs
Message ID: rt-3.6.HEAD-2051-1172946661-1068.41685-75-0@perl.org
# New Ticket Created by  Jari Aalto 
# Please include the string:  [perl #41685]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=41685 >



This is a bug report for perl from jaalto@cante.cante.net,
generated with the help of perlbug 1.35 running under perl v5.8.8.


-----------------------------------------------------------------
[Please enter your report here]

FOREWORD

The perlpod documentation at http://perldoc.perl.org/perlpod.html reads:

    Verbatim Paragraph 
    ======================

    Verbatim paragraphs are usually used for presenting a codeblock or
    other text which does not require any special parsing or
    formatting, and which shouldn't be wrapped.

    A verbatim paragraph is distinguished by having its first
    character be a space or a tab. (And commonly, all its lines begin
    with spaces and/or tabs.) It should be reproduced exactly, with
    tabs assumed to be on 8-column boundaries. There are no special
    formatting codes, so you can't italicize or anything like that. A
    \ means \, and nothing else.

The key here is "It should be reproduced exactly". 

THE PROBLEM

The current pod2html, however, converts anything that looks like a
URL inside a verbatim paragraph into a clickable link. An example:

    =pod

    See this command line example:

        wget http://www.example.com/path/index.html

    =cut

It is surprising to see "http://www.example.com/path/index.html" turned
into a clickable link in what is meant to be a literal command line.

SOLUTION

A verbatim paragraph is best treated "as is", with no automatic URL
conversion, because it mainly contains code examples, command lines
and similar literal text.

The attached patch turns off link creation in <pre>..</pre> and adds a
new option, --prelink, that restores the old behavior, in which URLs
found even inside <pre>..</pre> are converted into links.
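
As a rough illustration of the intended behavior, here is a standalone
sketch. It is not the patch itself: the matcher is a cut-down stand-in for
the URL regex in Pod::Html (fewer schemes, no handling of pre-quoted
links), and the names make_hrefs_sketch and process_pre_sketch are made up
for this example. The $prelink argument plays the role of the proposed
--prelink option.

```perl
use strict;
use warnings;

# Cut-down stand-in for the URL matcher; illustration only, not the
# regex that Pod::Html actually uses.
sub make_hrefs_sketch {
    my ($text) = @_;
    my $any  = '\w/#~:.?+=&%@!\-;';   # characters allowed inside a URL
    my $punc = '.:!?\-;';             # trailing punctuation kept outside the link
    $text =~ s{
        \b ( (?:http|ftp|mailto) : (?!:) [$any]+? )
        (?= [$punc]* (?: [^$any] | \z ) )
    }{<a href="$1">$1</a>}gix;
    return $text;
}

# Hypothetical wrapper: verbatim text is returned untouched unless the
# caller opts in, which is the default the patch proposes.
sub process_pre_sketch {
    my ($text, $prelink) = @_;
    $text = make_hrefs_sketch($text) if $prelink;
    return $text;
}

my $verbatim = "    wget http://www.example.com/path/index.html\n";
print process_pre_sketch($verbatim, 0);   # default: reproduced exactly
print process_pre_sketch($verbatim, 1);   # opt-in: URL wrapped in <a href=...>
```

With the flag off the line comes back unchanged; with it on, only the URL
is wrapped in an anchor and the rest of the line is left alone.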

Jari


[ /usr/share/perl/5.8.8/Pod/Html.pm ]

=== modified file 'Html.pm'
--- Html.pm     2007-03-03 17:02:49 +0000
+++ Html.pm     2007-03-03 18:22:12 +0000
@@ -143,6 +143,13 @@
 Specify the HTML file to create.  Output goes to STDOUT if no outfile
 is specified.

+=item prelink
+
+    --prelink
+
+Read text in <pre>..</pre> areas and convert all URLs found there to
+<a href=...> links.
+
 =item podpath

     --podpath=name:...:name
@@ -221,7 +228,7 @@
 my($Dircache, $Itemcache);
 my @Begin_Stack;
 my @Libpods;
-my($Htmlroot, $Htmldir, $Htmlfile, $Htmlfileurl);
+my($Htmlroot, $Htmldir, $Htmlfile, $Htmlfileurl, $Prelink);
 my($Podfile, @Podpath, $Podroot);
 my $Css;

@@ -287,6 +294,7 @@
     $Quiet = 0;                        # not quiet by default
     $Verbose = 0;              # not verbose by default
     $Doindex = 1;              # non-zero if we should generate an index
+    $Prelink = 0;              # non-zero if URLs in <pre>..</pre> become links
     $Backlink = '';            # text for "back to top" links
     $Listlevel = 0;            # current list depth
     @Listend = ();             # the text to use to end the list.
@@ -633,6 +641,7 @@
                    page names like those that appear in L<> links.
   --outfile      - filename for the resulting html file (output sent to
                    stdout by default).
+  --prelink      - Convert URLs in <pre>..</pre> to clickable links.
   --podpath      - colon-separated list of directories containing library
                    pods (empty by default).
   --podroot      - filesystem base directory from which all relative paths
@@ -651,7 +660,8 @@
 sub parse_command_line {
     my ($opt_backlink,$opt_cachedir,$opt_css,$opt_flush,$opt_header,$opt_help,
        $opt_htmldir,$opt_htmlroot,$opt_index,$opt_infile,$opt_libpods,
-       $opt_netscape,$opt_outfile,$opt_podpath,$opt_podroot,$opt_quiet,
+       $opt_netscape,$opt_outfile,$opt_prelink,
+        $opt_podpath,$opt_podroot,$opt_quiet,
        $opt_recurse,$opt_title,$opt_verbose,$opt_hiddendirs);

     unshift @ARGV, split ' ', $Config{pod2html} if $Config{pod2html};
@@ -670,6 +680,7 @@
                            'libpods=s'  => \$opt_libpods,
                            'netscape!'  => \$opt_netscape,
                            'outfile=s'  => \$opt_outfile,
+                           'prelink'    => \$opt_prelink,
                            'podpath=s'  => \$opt_podpath,
                            'podroot=s'  => \$opt_podroot,
                            'quiet!'     => \$opt_quiet,
@@ -695,6 +706,7 @@
     $Podfile  = $opt_infile   if defined $opt_infile;
     $HiddenDirs = $opt_hiddendirs if defined $opt_hiddendirs;
     $Htmlfile = $opt_outfile  if defined $opt_outfile;
+    $Prelink  = $opt_prelink  if defined $opt_prelink;
     $Podroot  = $opt_podroot  if defined $opt_podroot;
     $Quiet    = $opt_quiet    if defined $opt_quiet;
     $Recurse  = $opt_recurse  if defined $opt_recurse;
@@ -1284,6 +1296,55 @@
     pop( @Begin_Stack );
 }

+sub make_hrefs {
+    my $rest = shift;
+
+    # Look for embedded URLs and make them into links.  We don't
+    # relativize them since they are best left as the author intended.
+
+    my $urls = '(' . join ('|', qw{
+                http
+                telnet
+               mailto
+               news
+                gopher
+                file
+                wais
+                ftp
+            } )
+        . ')';
+
+    my $ltrs = '\w';
+    my $gunk = '/#~:.?+=&%@!\-';
+    my $punc = '.:!?\-;';
+    my $any  = "${ltrs}${gunk}${punc}";
+
+    $rest =~ s{
+       \b                      # start at word boundary
+       (                       # begin $1  {
+           $urls :             # need resource and a colon
+           (?!:)               # Ignore File::, among others.
+           [$any] +?           # followed by one or more of any valid
+                               #   character, but be conservative and
+                               #   take only what you need to....
+       )                       # end   $1  }
+       (?=
+           &quot; &gt;         # maybe pre-quoted '<a href="...">'
+       |                       # or:
+           [$punc]*            # 0 or more punctuation
+           (?:                 #   followed
+               [^$any]         #   by a non-url char
+           |                   #   or
+               $               #   end of the string
+           )                   #
+       |                       # or else
+           $                   #   then end of the string
+        )
+      }{<a href="$1">$1</a>}igox;
+
+    $rest;
+}
+
 #
 # process_pre - indented paragraph, made into <pre></pre>
 #
@@ -1337,48 +1398,8 @@
                  "$1$url" ;
               }xeg;

-    # Look for embedded URLs and make them into links.  We don't
-    # relativize them since they are best left as the author intended.
-
-    my $urls = '(' . join ('|', qw{
-                http
-                telnet
-               mailto
-               news
-                gopher
-                file
-                wais
-                ftp
-            } )
-        . ')';
-
-    my $ltrs = '\w';
-    my $gunk = '/#~:.?+=&%@!\-';
-    my $punc = '.:!?\-;';
-    my $any  = "${ltrs}${gunk}${punc}";
-
-    $rest =~ s{
-       \b                      # start at word boundary
-       (                       # begin $1  {
-           $urls :             # need resource and a colon
-           (?!:)               # Ignore File::, among others.
-           [$any] +?           # followed by one or more of any valid
-                               #   character, but be conservative and
-                               #   take only what you need to....
-       )                       # end   $1  }
-       (?=
-           &quot; &gt;         # maybe pre-quoted '<a href="...">'
-       |                       # or:
-           [$punc]*            # 0 or more punctuation
-           (?:                 #   followed
-               [^$any]         #   by a non-url char
-           |                   #   or
-               $               #   end of the string
-           )                   #
-       |                       # or else
-           $                   #   then end of the string
-        )
-      }{<a href="$1">$1</a>}igox;
+
+    $rest = make_hrefs($rest) if $Prelink;

     # text should be as it is (verbatim)
     $$text = $rest;




[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=medium
---
Site configuration information for perl v5.8.8:

Configured by Debian Project at Wed Dec  6 23:17:41 UTC 2006.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.6.18.3, archname=i486-linux-gnu-thread-multi
    uname='linux saens 2.6.18.3 #1 smp sat nov 25 13:39:52 est 2006 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.8 -Dsitearch=/usr/local/lib/perl/5.8.8 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.1.2 20061115 (prerelease) (Debian 4.1.1-20)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.3.6.so, so=so, useshrplib=true, libperl=libperl.so.5.8.8
    gnulibc_version='2.3.6'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    

---
@INC for perl v5.8.8:
    /home/jaalto/var/lib/code/perl
    /etc/perl
    /usr/local/lib/perl/5.8.8
    /usr/local/share/perl/5.8.8
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8
    /usr/share/perl/5.8
    /usr/local/lib/site_perl
    /usr/local/lib/perl/5.8.7
    /usr/local/share/perl/5.8.7
    /usr/local/lib/perl/5.8.4
    /usr/local/share/perl/5.8.4
    .

---
Environment for perl v5.8.8:
    HOME=/home/jaalto
    LANG (unset)
    LANGUAGE (unset)
    LC_ALL=en_US
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/bin:/home/jaalto/var/link/bin:/sbin:/bin:/usr/bin:/usr/sbin:/usr/share/bin:/usr/bin/X11:/usr/games
    PERL5LIB=/home/jaalto/var/lib/code/perl
    PERL_BADLANG (unset)
    SHELL=/bin/bash

