perl.perl5.porters | Postings from August 2001

perlpodspec, draft 1

From: Sean M. Burke
Date: August 11, 2001 03:36
Subject: perlpodspec, draft 1
Message ID: 3.0.6.32.20010811043605.007ad750@mail.spinn.net
(CC any replies to pod-people@perl.org too; sorry if this is a repost.)

=head1 NAME

perlpodspec - Plain Old Documentation: specification and notes

=head1 DESCRIPTION

This document contains detailed notes on the pod markup language.  Most
people will only need to read L<perlpod|perlpod> to know how to write
pod, but this document may answer some incidental questions about
parsing and rendering pod.

=head1 Pod Definitions

Pod is embedded in files, typically Perl source files -- although you
can write a file that's nothing but pod.

A B<line> in a file consists of zero or more non-newline characters,
terminated by either a newline or the end of the file.

A B<newline sequence> is usually a platform-dependent concept, but
pod parsers should understand it to mean any of CR (ASCII 13), LF
(ASCII 10), or a CRLF (ASCII 13 followed immediately by ASCII 10), in
addition to any other system-specific meaning.  The first CR/CRLF/LF
sequence in the file may be used as the basis for identifying the
newline sequence for parsing the rest of the file.
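
As an illustration only, the newline rule can be sketched in Perl
like this.  The subroutine names C<split_pod_lines> and
C<detect_newline> are hypothetical, not taken from any existing pod
module.  CRLF comes first in the alternation so that a CR-LF pair is
counted as a single newline rather than as two.

```perl

  use strict;
  use warnings;

  # Split raw pod source into lines, accepting CR, LF, or CRLF as the
  # newline sequence.  CRLF is tried first so it counts as one newline.
  sub split_pod_lines {
      my ($text) = @_;
      my @lines = split /\r\n|\r|\n/, $text, -1;
      # A final newline terminates the last line rather than starting a
      # new one, so drop the empty field that split leaves after it.
      pop @lines if @lines && $lines[-1] eq '' && $text =~ /[\r\n]\z/;
      return @lines;
  }

  # Report the newline convention from the first CR, CRLF, or LF in the
  # text (undef if there is none), so the rest can be parsed by it.
  sub detect_newline {
      my ($text) = @_;
      return $text =~ /(\r\n|\r|\n)/ ? $1 : undef;
  }

```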

A B<blank line> is a line consisting entirely of zero or more spaces
(ASCII 32) or tabs (ASCII 9), and terminated by a newline or end-of-file.
A B<non-blank line> is a line containing one or more characters other
than space or tab (and terminated by a newline or end-of-file).

(I<Note:> Many older pod parsers did not accept a line consisting of
spaces/tabs and then a newline as a blank line -- the only lines they
considered blank were lines consisting of I<no characters at all>,
terminated by a newline.)

B<Whitespace> is used in this document as a blanket term for spaces,
tabs, and newline sequences.

A B<pod parser> is a module meant for parsing pod (regardless of
whether this involves calling callbacks or building a parse tree or
directly formatting it).  A B<pod formatter> (or B<pod translator>)
is a module or program that converts pod to some other format (HTML,
plaintext, TeX, PostScript, RTF).  A B<pod processor> might be a
formatter or translator, or might be a program that does something
else with the pod (like word-counting it, scanning for index points,
etc.).

Pod content is contained in B<pod blocks>.  A pod block starts with a
line that matches C<m/^=[a-zA-Z]/>, and continues up to the next line
that matches C<m/^=cut/> -- or up to the end of the file, if there is
no C<m/^=cut/> line.
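
A minimal Perl sketch of the pod-block rule, using a hypothetical
C<extract_pod_blocks> subroutine that takes a file's lines and returns
each block as an array reference.  The definition above does not say
what to do with a stray "=cut" line outside any pod block.  This
sketch skips such a line, which is an assumption.

```perl

  use strict;
  use warnings;

  # Collect pod blocks from a list of lines.  A block starts at a line
  # matching /^=[a-zA-Z]/ and runs up to the next /^=cut/ line, or to
  # the end of the input if there is none.  Each block is returned as an
  # array ref of its lines, without the closing "=cut" line.
  sub extract_pod_blocks {
      my @lines = @_;
      my (@blocks, $current);
      for my $line (@lines) {
          if ($current) {
              if ($line =~ /^=cut/) {
                  push @blocks, $current;
                  undef $current;
              } else {
                  push @$current, $line;
              }
          } elsif ($line =~ /^=[a-zA-Z]/ && $line !~ /^=cut/) {
              # Start a new block.  The !~ test skips a stray "=cut"
              # outside pod (an assumption; the text leaves it open).
              $current = [$line];
          }
      }
      push @blocks, $current if $current;    # block left open at EOF
      return @blocks;
  }

```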

=for comment
 The current perlsyn says:
 [beginquote]
   Note that pod translators should look at only paragraphs beginning
   with a pod directive (it makes parsing easier), whereas the compiler
   actually knows to look for pod escapes even in the middle of a
   paragraph.  This means that the following secret stuff will be ignored
   by both the compiler and the translators.
      $a=3;
      =secret stuff
       warn "Neither POD nor CODE!?"
      =cut back
      print "got $a\n";
   You probably shouldn't rely upon the warn() being podded out forever.
   Not all pod translators are well-behaved in this regard, and perhaps
   the compiler will become pickier.
 [endquote]
 I think that those paragraphs should just be removed; paragraph-based
 parsing seems to have been largely abandoned, because of the hassle
 with non-empty blank lines messing up what people meant by "paragraph".
 Even if the "it makes parsing easier" bit were especially true,
 it wouldn't be worth the confusion of having perl and pod2whatever
 actually disagree on what can constitute a pod block.

Within a pod block, there are B<pod paragraphs>.  A pod paragraph
consists of one or more consecutive non-blank lines of text, and
paragraphs are separated from each other by one or more blank lines.

For purposes of pod processing, there are four types of paragraphs in
a pod block:

=over

=item *

A B<command paragraph>.  The first line of this paragraph must match
C<m/^=[a-zA-Z]/>.  Command paragraphs are typically one line, as in:

  =head1 NOTES

  =item *

But they may span several (non-blank) lines:

  =for comment
  Hm, I wonder what it would look like if
  you tried to write a BNF for POD from this.

I<Some> command paragraphs allow interior sequences in their content
(i.e., after the part that matches C<m/^=[a-zA-Z]\S*\s*/>), as in:

  =head1 Did You Remember to C<use strict;>?

(In other words, the pod processor for "head1" will apply the same
processing to "Did You Remember to CE<lt>use strict;>?" that it would
to an ordinary paragraph: interior sequences (like "CE<lt>...>") are
parsed, and presumably formatted appropriately, and whitespace in the
form of literal spaces and/or tabs is not significant.)

=item *

A B<verbatim paragraph>.  The first line of this paragraph must begin
with a literal space or tab, and this paragraph must not be inside a
"=begin I<identifier>", ... "=end I<identifier>" sequence where
"I<identifier>" begins with something other than a colon (":").

Whitespace I<is> significant (although a parser should support the
optional expansion of tabs to spaces), and interior sequences are not
parsed.

=item *

An B<ordinary paragraph>.  An ordinary paragraph is distinguished by
the fact that the first line matches neither C<m/^=[a-zA-Z]/> nor
C<m/^[ \t]/>, I<and> by not being inside a "=begin I<identifier>",
... "=end I<identifier>" sequence, where "I<identifier>" begins with
something other than a colon (":").

=item *

A B<data paragraph>.  This is a paragraph that I<is> inside a "=begin
I<identifier>" ... "=end I<identifier>" sequence where
"I<identifier>" does I<not> begin with a literal colon (":").  In
some sense, a data paragraph is not part of pod at all, since it's
not subject to pod parsing; but it is specified here, since pod
parsers need to be able to call an event for it, or store it in some
form in a parse tree, or at least just parse I<around> it.

=back
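
A minimal sketch of splitting a pod block into paragraphs and sorting
them into the four types.  It assumes the block is already a list of
lines, and the subroutine names are hypothetical.  The list above does
not say whether command paragraphs inside a data region are still
commands.  This sketch treats them as commands, since at least the
closing "=end" must be recognized.

```perl

  use strict;
  use warnings;

  # Group the lines of one pod block into paragraphs.  A blank line is
  # zero or more spaces/tabs, and any run of blank lines ends a paragraph.
  sub split_paragraphs {
      my @lines = @_;
      my (@paras, @cur);
      for my $line (@lines, '') {            # trailing '' flushes the last one
          if ($line =~ /^[ \t]*$/) {
              push @paras, join("\n", @cur) if @cur;
              @cur = ();
          } else {
              push @cur, $line;
          }
      }
      return @paras;
  }

  # Tag each paragraph as command, verbatim, ordinary, or data.  Inside a
  # "=begin X" ... "=end X" region where X lacks a ":" prefix, non-command
  # paragraphs are data.  Commands are always recognized (so "=end" is).
  sub classify_paragraphs {
      my (@open, @out);                      # @open: identifiers of open =begin regions
      for my $para (@_) {
          if ($para =~ /^=([a-zA-Z]\S*)\s*(\S*)/) {
              my ($cmd, $id) = ($1, $2);
              push @open, $id if $cmd eq 'begin';
              pop @open if $cmd eq 'end' && @open;
              push @out, ['command', $para];
          } elsif (@open && $open[-1] !~ /^:/) {
              push @out, ['data', $para];
          } elsif ($para =~ /^[ \t]/) {
              push @out, ['verbatim', $para];
          } else {
              push @out, ['ordinary', $para];
          }
      }
      return @out;
  }

```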

=head1 Notes on Implementing Pod Processors

In the remainder of this document, "must" / "must not", "should" /
"should not", and "may" have their conventional (cf. RFC 2119)
meanings: "X must do Y" means that if X doesn't do Y, it's against
this specification, and should really be fixed.  "X should do Y"
means that it's recommended, but X may fail to do Y, if there's a
good reason.  "X may do Y" is merely a note that X can do Y at will
(although it is up to the reader to detect any connotation of "and I
think it would be I<nice> if X did Y" versus "it wouldn't really
I<bother> me if X did Y").

=over

=item *

Pod formatters should tolerate lines in verbatim blocks that are of
any length, even if that means having to break them to avoid text
running off the side of the page.

=item *

Pod parsers must recognize I<all> three of the well-known newline
formats: CR, LF, and CRLF.  See L<perlport|perlport>.

=item *

Pod parsers should accept input lines that are of any length.

=item *

Since Perl recognizes a Unicode Byte Order Mark at the start of files
as signaling that the file is encoded in UTF-16 (whether big-endian or
little-endian), pod parsers should do the same.  Otherwise, the
character encoding should be understood as being UTF-8.

=item *

Pod processors must treat a "=for [label] [content...]" paragraph as
meaning the same thing as a "=begin [label]" paragraph, content, and
an "=end [label]" paragraph.  (The parser may conflate these two
constructs, or may leave them distinct, in the expectation that the
formatter will nevertheless treat them the same.)

=item *

When rendering pod to a format that allows comments (i.e., to nearly
any format other than plaintext), a pod formatter must insert comment
text identifying its name and version number, and the name and
version numbers of any modules it might be using to process the pod.
Minimal examples:

  %% POD::Pod2PS v3.14159, using Pod::Parser v1.92
  
  <!-- Pod::HTML v3.14159, using Pod::Parser v1.92 -->
  
  {\doccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08}
  
  .\" Pod::Man version 3.14159, using Pod::Parser version 1.92

Formatters may also insert additional comments, including: the
release date of the pod formatter program, the contact address for
the author(s) of the formatter, the current time, the name of the input
file, the formatting options in effect, the version of Perl used, etc.

Formatters may also choose to note errors/warnings as comments, in
addition to, or instead of, emitting them otherwise (as messages to
STDERR, or by C<die>ing).

=item *

Pod parsers I<may> emit warnings or error messages ("Unknown sequence
EE<lt>zslig>!") to STDERR, but I<must> allow I<suppressing> all such
STDERR output, instead reporting errors/warnings in some other way,
whether by triggering a callback, or noting errors in some attribute
of the document object, or some similarly unobtrusive mechanism.

=item *

In paragraphs where interior sequences (like EE<lt>...>, BE<lt>...>)
are understood (i.e., I<not> verbatim paragraphs, but I<including>
ordinary paragraphs, and command paragraphs that produce renderable
text, like "=head1"), literal whitespace should generally be considered
"insignificant", in that one literal space has the same meaning as any
(nonzero) number of literal spaces, literal newlines, and literal tabs
(as long as this produces no blank lines, since those would terminate
the paragraph).  Pod parsers should compact literal whitespace in each
processed paragraph, but may provide an option for overriding this
(since some processing tasks do not require it), or may follow
additional special rules (for example, specially treating
period-space-space or period-newline sequences).

=item *

Pod parsers should not, by default, try to coerce apostrophe (') and
quote (") into smart quotes (little 9's, 66's, 99's, etc.), nor try to
turn backtick (`) into anything but a single backtick character
(distinct from an openquote character!), nor "--" into anything but
two minus signs.  They I<must never> do any of those things to text
in CE<lt>...> sequences, and never I<ever> to text in verbatim
paragraphs.

=item *

When rendering pod to a format that has two kinds of hyphens (-), one
which is a nonbreaking hyphen, and one which is a breakable hyphen
(as in "object-oriented", which can be split across lines as
"object-", newline, "oriented"), formatters are encouraged to
generally translate "-" to a nonbreaking hyphen, but may apply
heuristics to convert some of these to breaking hyphens.

=item *

Pod formatters should make reasonable efforts to keep words of Perl
code from being broken across lines.  For example, "Foo::Bar" in some
formatting systems is seen as eligible for being broken across lines
as "Foo::" newline "Bar" or even "Foo::-" newline "Bar".  This should
be avoided where possible, either by disabling all line-breaking in
mid-word, or by wrapping particular words with internal punctuation
in "don't break this across lines" codes (which in some formats may
not be a single code, but might be a matter of inserting non-breaking
zero-width spaces between every pair of characters in a word).

=item *

Pod parsers should expand tabs in verbatim paragraphs as they are
processed, before passing them to the formatter or other processor.
Parsers may also allow an option for overriding this.

=item *

Pod parsers should remove newlines from the end of paragraphs before
passing them to the formatter or other processor.

=item *

Pod parsers, when reporting errors, should make some effort to report
an approximate line number ("Nested EE<lt>>'s in Paragraph #52, near
line 633 of Thing/Foo.pm!"), instead of merely noting the paragraph
number ("Nested EE<lt>>'s in Paragraph #52 of Thing/Foo.pm!").  Where
this is problematic, the paragraph number should at least be
accompanied by an excerpt from the paragraph ("Nested EE<lt>>'s in
Paragraph #52 of Thing/Foo.pm, which begins 'Read/write accessor for
the CE<lt>interest rate> attribute...'").

=item *

Pod parsers, when processing a series of verbatim paragraphs one
after another, should consider them to be one large verbatim
paragraph that happens to contain blank lines.  I.e., these two
lines, which have a blank line between them:

	use Foo;

	print Foo->VERSION

should be unified into one paragraph ("\tuse Foo;\n\n\tprint
Foo->VERSION") before being passed to the formatter or other
processor.  Parsers may also allow an option for overriding this.

While this might be too cumbersome to implement in event-based pod
parsers, it is straightforward for parsers that return parse trees.

=item *

Pod formatters, where feasible, are advised to avoid splitting short
verbatim paragraphs (under twelve lines, say) across pages.

=item *

Pod parsers must treat a line with only spaces and/or tabs on it as a
"blank line" such as separates paragraphs.  (Some older parsers
recognized only two adjacent newlines as a "blank line" but would not
recognize a newline, a space, and a newline, as a blank line.  This
is noncompliant behavior.)

=item *

Authors of pod formatters/processors should make every effort to
avoid writing their own pod parser.  There are already several in
CPAN, with a wide range of interface styles -- and one of them,
Pod::Parser, comes with modern versions of Perl.

=item *

Characters in pod documents may be conveyed either as literals, or by
number in EE<lt>n> sequences, or by an equivalent mnemonic, as in
EE<lt>eacute> which is equivalent to EE<lt>233>.

Characters in the range 32-126 refer to US-ASCII characters, which
all pod formatters must render faithfully.  Characters in the ranges
0-31 and 127-159 should not be used, except for the literal sequences
for newline (13, 13 10, or 10), and tab (9).

Characters in the range 160-255 refer to Latin-1 characters (also
defined there by Unicode, with the same meaning).  Characters above
255 should be understood to refer to Unicode characters.  Be warned
that some formatters cannot reliably render anything outside 32-126;
and many are able to handle 32-126 and 160-255, but nothing above
255.

=item *

Besides the well-known "EE<lt>lt>" and "EE<lt>gt>" sequences for
less-than and greater-than, pod parsers must understand "EE<lt>sol>"
for "/" (solidus, slash), and "EE<lt>verbar>" for "|" (vertical bar,
pipe).  Pod parsers should also understand "EE<lt>lchevron>" and
"EE<lt>rchevron>" as legacy codes for characters 171 and 187, i.e.,
"left-pointing double angle quotation mark" = "left pointing
guillemet" and "right-pointing double angle quotation mark" = "right
pointing guillemet".  (These look like little "<<" and ">>", and they
are now preferably expressed with the HTML/XHTML codes "EE<lt>laquo>"
and "EE<lt>raquo>".)

=item *

Pod parsers should understand all "EE<lt>html>" sequences as defined
in the entity declarations in the most recent XHTML specification at
C<www.W3.org>.  Pod parsers must understand at least the entities
that define characters in the range 160-255 (Latin-1).  Pod parsers,
when faced with some unknown "EE<lt>I<identifier>>" sequence,
shouldn't simply replace it with the empty string (by default, at least),
but may pass it through as a sequence consisting of a literal E,
less-than, I<identifier>, greater-than.  Or pod parsers may offer the
alternative option of representing such unknown
"EE<lt>I<identifier>>" sequences as an unprocessed E sequence with
content "I<identifier>" (as opposed to all the known E sequences
which normally get expanded to the literal characters that they stand
for), in the hopes that it may have special meaning to a given
formatter -- or that the pod formatter may pass along a warning about
this unknown sequence.

=item *

Pod parsers must also support the XHTML sequences "EE<lt>quot>" for
character 34 (doublequote, "), "EE<lt>amp>" for character 38
(ampersand, &), and "EE<lt>apos>" for character 39 (apostrophe, ').

=item *

It is up to individual pod formatters to display good judgment when
confronted with an unrenderable character (which is distinct from an
unknown EE<lt>thing> sequence that we couldn't resolve to anything,
renderable or not).  It is good practice to map Latin letters with
diacritics (like "EE<lt>eacute>"/"EE<lt>233>") to the corresponding
unaccented US-ASCII letter (like a simple character 101, "e"), but
clearly this is often not feasible, and an unrenderable character may
be represented as "?", or the like.  A pod formatter may also note,
in a comment, a list of what unrenderable characters were
encountered.

=item *

EE<lt>...> may freely appear in any interior sequence (other than
EE<lt>...> or a ZE<lt>>!).  That is, "XE<lt>The EE<lt>euro>1,000,000
Solution>" is valid, as is "LE<lt>The EE<lt>euro>1,000,000
Solution|Million::Euros>".

=item *

In parsing pod, a notably tricky part is the correct parsing of
(potentially nested!) interior sequences.  Implementors should
consult the code in the C<parse_text> routine in Pod::Parser as an
example of a correct implementation.

=item *

Some pod formatters output to formats that implement nonbreaking
spaces as an individual character (which I'll call "NBSP"), and
others output to formats that implement nonbreaking spaces just as
spaces wrapped in a "don't break this across lines" code.  Note that
at the level of pod, both sorts of codes can occur: pod can contain a
NBSP character (whether as a literal, or as a "EE<lt>160>" or
"EE<lt>nbsp>" sequence); and pod can contain "SE<lt>foo
IE<lt>barE<gt> baz>" sequences, where "mere spaces" (character 32) in
such sequences are taken to represent nonbreaking spaces.  Pod
parsers should consider supporting the optional parsing of "SE<lt>foo
IE<lt>barE<gt> baz>" as if it were
"fooI<NBSP>IE<lt>barE<gt>I<NBSP>baz", and, going the other way, the
optional parsing of groups of words joined by NBSP's as if each group
were in a SE<lt>...> sequence, so that formatters may use the
representation that maps best to what the output format demands.

=item *

If you think that you want to add a new command to pod (like, say, a
"=biblio" command), consider whether you could get the same effect
with a for or begin/end sequence: "=for biblio ..." or "=begin
biblio" ... "=end biblio".  Pod processors that don't understand
"=for biblio", etc., will simply ignore it, whereas they may complain
loudly if they see "=biblio".

=back
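
The byte-order-mark rule in the list above can be sketched with the
core Encode module.  The subroutine name C<decode_pod_bytes> is
hypothetical.  Stripping a UTF-8 byte order mark as well is an
assumption not stated above.  If left in place, that mark would decode
to U+FEFF in front of the first command.

```perl

  use strict;
  use warnings;
  use Encode qw(decode);

  # Turn raw pod bytes into a character string.  A UTF-16 byte order
  # mark selects big- or little-endian UTF-16; otherwise assume UTF-8.
  sub decode_pod_bytes {
      my ($bytes) = @_;
      if ($bytes =~ s/^\xFE\xFF//) { return decode('UTF-16BE', $bytes) }
      if ($bytes =~ s/^\xFF\xFE//) { return decode('UTF-16LE', $bytes) }
      $bytes =~ s/^\xEF\xBB\xBF//;           # assumption: drop a UTF-8 BOM too
      return decode('UTF-8', $bytes);
  }

```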
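
The list above includes three whitespace rules: compacting whitespace
in ordinary paragraphs, expanding tabs in verbatim paragraphs, and
merging adjacent verbatim paragraphs.  They might be sketched as
follows.  Tab expansion uses the core Text::Tabs module, which assumes
tab stops every 8 columns unless C<$Text::Tabs::tabstop> is changed.
The subroutine names are hypothetical.  C<merge_verbatim> assumes
paragraphs arrive as [type, text] pairs.  Since the original number of
separating blank lines is not kept, it puts back exactly one.

```perl

  use strict;
  use warnings;
  use Text::Tabs qw(expand);

  # Ordinary paragraphs (and renderable commands such as "=head1"): any
  # run of spaces, tabs, and newlines means the same as one space.
  sub compact_whitespace {
      my ($text) = @_;
      $text =~ s/[ \t\r\n]+/ /g;
      $text =~ s/^ //;
      $text =~ s/ \z//;
      return $text;
  }

  # Verbatim paragraphs: drop trailing newlines, then expand tabs line
  # by line (8-column stops by default); other whitespace is kept as-is.
  sub prepare_verbatim {
      my ($text) = @_;
      $text =~ s/\n+\z//;
      return join "\n", expand(split /\n/, $text, -1);
  }

  # Merge runs of adjacent verbatim paragraphs into one, putting back a
  # single blank line between them.  Input and output: [type, text] pairs.
  sub merge_verbatim {
      my @out;
      for my $p (@_) {
          if ($p->[0] eq 'verbatim' && @out && $out[-1][0] eq 'verbatim') {
              $out[-1][1] .= "\n\n" . $p->[1];
          } else {
              push @out, [@$p];
          }
      }
      return @out;
  }

```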
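
The list above describes how to resolve the content of a single
EE<lt>...> sequence; the following sketch shows one way to do it.  The
C<%E_NAMES> table is only a small illustrative subset (a real parser
would carry the full XHTML entity list), and the names are
hypothetical.  Resolution should happen after interior sequences have
been parsed.  Otherwise a "<" produced by "EE<lt>lt>" could be mistaken
for the start of another sequence.

```perl

  use strict;
  use warnings;

  # Illustrative subset of the named escapes discussed above; a real
  # parser would carry the full XHTML entity tables.
  my %E_NAMES = (
      lt   => 60,  gt   => 62,  sol  => 47,  verbar => 124,
      quot => 34,  amp  => 38,  apos => 39,
      lchevron => 171, rchevron => 187, laquo => 171, raquo => 187,
      nbsp => 160, eacute => 233,
  );

  # Resolve the content of one E<...> sequence.  Decimal numbers map to
  # that character; known names map through %E_NAMES; unknown names are
  # passed through as literal "E<name>" text rather than dropped.
  sub resolve_escape {
      my ($content) = @_;
      return chr($content) if $content =~ /^[0-9]+\z/;
      return chr($E_NAMES{$content}) if exists $E_NAMES{$content};
      return "E<$content>";
  }

```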

=head1 About LE<lt>...> Sequences

As you can tell from a glance at L<perlpod|perlpod>, the LE<lt>...>
sequence is the most complex sequence in POD.  These points will
hopefully clarify what it means and how processors should deal with
it.

=over

=item *

In parsing an LE<lt>...> sequence, pod parsers must note four
attributes:

=over

=item 1

The link-text.  If there is none, this must be undef.  (E.g., in
"LE<lt>Perl Functions|perlfunc>", the link-text is "Perl Functions".
In "LE<lt>Time::HiRes>" and even "LE<lt>|Time::HiRes>", there is no
link text.  Note that link text may contain formatting.)

=item 2

The link-text -- but if there was none, then the text that we'll
infer in its place.  (E.g., for "LE<lt>Getopt::Std>", the inferred
link text is "Getopt::Std".)

=item 3

The name, or undef if none.  (E.g., in "LE<lt>Perl
Functions|perlfunc>", the name -- also sometimes called the page --
is "perlfunc".  In "LE<lt>/CAVEATS>", the name is undef.)

=item 4

The section (AKA "item" in older perlpods), or undef if none.  E.g.,
in "LE<lt>Getopt::Std/DESCRIPTION>", "DESCRIPTION" is the section.
(Note that this is not the same as a manpage section like the "5" in
"man 5 crontab".  "Section Foo" in the pod sense means the part of the
text that's introduced by the heading or item whose text is "Foo".)

=back

Pod parsers may also note additional attributes including:

=over

=item 5

A flag for whether item 3 (if present) is a URL (like
"http://lists.perl.org" is), in which case there should be no section
attribute; a pod name (like "perldoc" and "Getopt::Std" are); or
possibly a man page name (like "crontab(5)" is).

=item 6

The original LE<lt>...> content, before EE<lt>...> sequences
are expanded.

=back

For example:

  L<Foo::Bar>
    =>  undef,                          # link text
        "Foo::Bar",                     # possibly inferred link text
        "Foo::Bar",                     # name
        undef,                          # section
        'pod',                          # what sort of link
        "Foo::Bar"                      # original content

  L<Perlport's section on NL's|perlport/Newlines>
    =>  "Perlport's section on NL's",   # link text
        "Perlport's section on NL's",   # possibly inferred link text
        "perlport",                     # name
        "Newlines",                     # section
        'pod',                          # what sort of link
        "Perlport's section on NL's|perlport/Newlines" # orig. content

  L<perlport/Newlines>
    =>  undef,                          # link text
        'section "Newlines" in perlport', # possibly inferred link text
        "perlport",                     # name
        "Newlines",                     # section
        'pod',                          # what sort of link
        "perlport/Newlines"             # original content

  L<crontab(5)/DESCRIPTION>
    =>  undef,                          # link text
        "DESCRIPTION in crontab(5)",    # possibly inferred link text
        "crontab(5)",                   # name
        "DESCRIPTION",                  # section
        'man',                          # what sort of link
        "crontab(5)/DESCRIPTION"        # original content

  L</DESCRIPTION>
    =>  undef,                          # link text
        "DESCRIPTION",                  # possibly inferred link text
        undef,                          # name
        "DESCRIPTION",                  # section
        'pod',                          # what sort of link
        "/DESCRIPTION"                  # original content

  L<http://www.perl.org/>
    =>  undef,                          # link text
        "http://www.perl.org/",         # possibly inferred link text
        "http://www.perl.org/",         # name
        undef,                          # section
        'url',                          # what sort of link
        "http://www.perl.org/"          # original content

Note that you can distinguish URL-links from anything else by the
fact that they match C<m/^\w+\:[^:]\S+$/>.  So
C<LE<lt>http://www.perl.comE<gt>> is a URL, but
C<LE<lt>HTTP::ResponseE<gt>> isn't.

=item *

In case of LE<lt>...> sequences with no "text|" part in them,
formatters have exhibited great variation in actually displaying the
link or cross reference.  For example, LE<lt>crontab(5)> might render
as "the C<crontab(5)> manpage", or "in the C<crontab(5)> manpage",
or just "C<crontab(5)>".

It is recommended that processors use as little wording as possible,
in these cases where the link text has to be inferred.  Render
"LE<lt>Foo::Bar>" as just "Foo::Bar", and render
LE<lt>http://www.perl.org> as just "http://www.perl.org".  Section
links, which are less commonly used, should probably be rendered as
follows: Render "LE<lt>/Constructor Methods>" as "Constructor
Methods", and render "LE<lt>Foo::Bar/Constructor Methods>" as
"Constructor Methods in Foo::Bar".

Of course, if authors provide LE<lt>text|...>, then this avoids the
problem entirely; but the above formatting conventions mean authors
won't have to keep using absurd formatting, as in: "Whatever can't
be done with LE<lt>LWP::Simple|LWP::Simple> should be done with
LE<lt>LWP::UserAgent|LWP::UserAgent>."

=item *

Note that section names might contain markup.  I.e., if a section
starts with:

  =head2 About the C<-M> Operator

(or "=item About the CE<lt>-M> Operator"), then a link to it would
look like this:

  L<somedoc/About the C<-M> Operator>

Formatters may choose to ignore the markup for purposes of resolving
the link and use only the renderable characters in the section name,
as in:

  <h2><a name="About_the_-M_Operator">About the <code>-M</code>
  Operator</a></h2>
  
  ...
  
  <a href="somedoc#About_the_-M_Operator">"About the <code>-M</code>
  Operator" in somedoc</a>

=item *

Authors wanting to link to a particular (absolute) URL must do so
only with "LE<lt>scheme:...>" sequences (like
LE<lt>http://www.perl.org>), and must not attempt "LE<lt>Some Site
Name|scheme:...>" sequences.  This restriction avoids many problems
in parsing and rendering LE<lt>...> sequences.

=item *

In an LE<lt>text|...> sequence, the text part may contain interior
sequences for formatting.

For LE<lt>...> sequences without a "text|" part, only EE<lt>...> and
ZE<lt>> sequences may occur, and no other interior sequences.  That
is, you should not use "LE<lt>BE<lt>Foo::Bar>>".

Authors must not nest LE<lt>...> sequences.  Anything like "LE<lt>The
LE<lt>Foo::Bar> man page>" may be treated as an error.

=item *

Note that pod authors may use formatting sequences inside the "text"
part of "LE<lt>text|name>" (and so on for LE<lt>text|/"sec">).

In other words, this is valid:

  Go read L<the docs on C<$.>|perlvar/"$.">

Some output formats that do allow rendering "LE<lt>...>" sequences as
hypertext might not allow the link-text to be formatted; in that case,
formatters will have to just ignore that formatting.

=back
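
A minimal sketch of breaking the content of an LE<lt>...> sequence
into the attributes listed above.  The subroutine name and hash keys
are hypothetical, and nested interior sequences in the link text are
not handled.  The inferred link text follows the "Constructor Methods
in Foo::Bar" convention recommended above.  For
"LE<lt>perlport/Newlines>", it therefore yields "Newlines in perlport"
rather than the wording shown in the example table.

```perl

  use strict;
  use warnings;

  # Split the raw content of an L<...> sequence into the attributes listed
  # above (inferred text included).  Simplification: the content is
  # assumed not to contain nested interior sequences.
  sub parse_link {
      my ($content) = @_;
      my %link = (text => undef, name => undef, section => undef,
                  type => 'pod', original => $content);

      if ($content =~ m/^\w+:[^:]\S+$/) {     # the URL test given above
          @link{qw(name type)} = ($content, 'url');
      } else {
          my $target = $content;
          if ($content =~ /^([^|]*)\|(.*)\z/s) {
              $link{text} = length $1 ? $1 : undef;   # "L<|Foo>": no text
              $target = $2;
          }
          my ($name, $section) = $target =~ m{/}
              ? split(m{/}, $target, 2)
              : ($target, undef);
          $section =~ s/^"(.*)"\z/$1/s if defined $section;   # L<perlvar/"$.">
          $link{name}    = length $name ? $name : undef;
          $link{section} = $section;
          $link{type}    = 'man'
              if defined $link{name} && $link{name} =~ /^[^\s(]+\(\w+\)$/;
      }

      # Inferred link text, following the "Section in Name" convention.
      $link{inferred} =
            defined $link{text}                           ? $link{text}
          : defined $link{section} && defined $link{name} ? "$link{section} in $link{name}"
          : defined $link{section}                        ? $link{section}
          :                                                 $link{name};
      return \%link;
  }

```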
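
One way a formatter might derive an anchor name such as
"About_the_-M_Operator" from a section heading is sketched below, using
a hypothetical C<section_anchor> subroutine.  It covers only the simple
case in the example above.  Formatting codes are removed but their text
is kept, except that XE<lt>...> index entries are dropped entirely.
Runs of whitespace become underscores.  EE<lt>...> escapes are not
resolved properly.

```perl

  use strict;
  use warnings;

  # Derive an anchor name from a section heading: formatting codes are
  # removed innermost-first (keeping their text, but dropping X<...>
  # index entries), then whitespace runs become underscores.  E<...>
  # escapes are reduced to their bare names, a known shortcoming here.
  sub section_anchor {
      my ($text) = @_;
      1 while $text =~ s/([A-Z])<([^<>]*)>/$1 eq 'X' ? '' : $2/e;
      $text =~ s/^\s+|\s+\z//g;
      $text =~ s/\s+/_/g;
      return $text;
  }

```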




=head1 About =over...=back Regions

"=over"..."=back" regions are used for various kinds of list-like
structures.  (I use the term "region" here simply as a collective
term for everything from the "=over" to the matching "=back".)

=over

=item *

The non-zero I<number> in "=over I<number>" ... "=back" is used for
giving the formatter a clue as to how many "spaces" (or tab stops, or
ems, etc.) it should tab over, although many formatters will have to
convert this to an absolute measurement that may not exactly match
with the size of spaces in the current font.  Other formatters may
completely ignore the number.  It is unclear what number a numberless
"=over" equates to.

=item *

Authors of pod formatters are reminded that "=over" ... "=back" may
map to several different constructs in your output format.  For
example, in converting pod to (X)HTML, it can map to any of
<ul>...</ul>, <ol>...</ol>, or <dl>...</dl> (and possibly even
<blockquote>...</blockquote>).  Similarly, "=item" can map to <li> or
<dt>.

=item *

Each "=over" ... "=back" region should be one of the following:

=over

=item *

An "=over" ... "=back" region containing only "=item *" commands,
each followed by some number of ordinary/verbatim paragraphs, other
nested "=over" ... "=back" regions, "=for..." paragraphs, and
"=begin"..."=end" regions.

(Pod processors should tolerate a bare "=item" as if it were "=item
*".)  Whether "*" is rendered as a literal asterisk, an "o", or as
some kind of real bullet character, is left up to the pod formatter,
and may depend on the level of indenting and/or nesting.

=item *

An "=over" ... "=back" region containing only
C<m/^=item\s+\d+\.?\s*$/> lines, each one (or each group of them)
followed by some number of ordinary/verbatim paragraphs, other nested
"=over" ... "=back" regions, or "=for..." paragraphs, and
"=begin"..."=end" sequences.  Note that the numbers must start at one
in each region, and must proceed in order and without skipping
numbers.

(Pod processors must tolerate lines like "=item 1" as if they were
"=item 1.", with the period.)

=item *

An "=over" ... "=back" region containing only "=item [text]"
commands, each one (or each group of them) followed by some number of
ordinary/verbatim paragraphs, other nested "=over" ... "=back"
regions, or "=for..." paragraphs, and "=begin"..."=end" regions.

The text in the "=item [text]" paragraph should not match
C<m/^=item\s+\d+\.?\s*$/> or C<m/^=item\s+\*\s*$/>, nor should it
match just C<m/^=item\s*$/>.

=item *

Some parsers may also support an additional kind of "=over"
... "=back" region: one with no "=item" paragraphs, but I<only>
ordinary/verbatim paragraphs, and possibly also some nested "=over"
... "=back" regions, "=for..." paragraphs, and "=begin"..."=end"
regions.  Such an itemless "=over" ... "=back" region in pod is
equivalent in meaning to a "<blockquote>...</blockquote>" element in
HTML.

=back

=item *

While pod processors should try to tolerate any amount of text in the
"=item [text...]" paragraph, users should be advised that using more
than sixty-five renderable characters may not come out well in some
formats.

=item *

No "=over" ... "=back" region can contain headings.  Processors may
treat such a heading as an error.

=item *

Processors must tolerate an "=over" list that goes off the end of the
document (i.e., which has no matching "=back"), but they may warn
about such a list.

=back
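
Since the region types above differ only in their "=item" lines, a
processor can tell them apart with a check like the following minimal
sketch (subroutine names hypothetical).  As suggested above, a bare
"=item" counts as a bullet.  Numbered regions are checked for
starting at 1 and not skipping.

```perl

  use strict;
  use warnings;

  # Classify an "=over" region by its "=item" lines: 'bullet', 'number',
  # 'text', 'blockquote' (no items at all), or 'mixed' (not allowed).
  sub over_region_kind {
      my @items = @_;
      return 'blockquote' unless @items;
      my %seen;
      for my $item (@items) {
          my $kind = $item =~ /^=item\s*(?:\*\s*)?$/ ? 'bullet'   # bare "=item" too
                   : $item =~ /^=item\s+\d+\.?\s*$/  ? 'number'
                   :                                   'text';
          $seen{$kind}++;
      }
      my @kinds = keys %seen;
      return @kinds == 1 ? $kinds[0] : 'mixed';
  }

  # Numbered regions must start at 1 and count up without skipping.
  sub numbers_in_order {
      my $expect = 1;
      for my $item (@_) {
          my ($n) = $item =~ /^=item\s+(\d+)/ or return 0;
          return 0 unless $n == $expect++;
      }
      return 1;
  }

```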




=head1 About Data Paragraphs

Data paragraphs are typically used for inlining non-pod data that is
to be used (typically passed through) when rendering the document to
a specific format:

  =begin rtf
  
  \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}
  
  =end rtf

The same effect could, incidentally, be achieved with a single "=for"
paragraph:

  =for rtf \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}

(Although that is not formally a data paragraph, it has the same
meaning as one.)

Another example of a data paragraph:

  =begin html
  
  I like <em>PIE</em>!
  
  <hr>Especially pecan pie!
  
  =end html

If these were ordinary paragraphs, the pod parser would try to
expand the "EE<lt>/em>" (in the first paragraph) as an interior
sequence, just like "EE<lt>lt>" or "EE<lt>eacute>".  But since this
is in a "=begin I<identifier>"..."=end I<identifier>" region I<and>
the identifier "html" doesn't have a ":" prefix, the contents
of this region are stored as data paragraphs, instead of being
processed as ordinary paragraphs (or, if they began with spaces
and/or tabs, as verbatim paragraphs).

As a further example: At time of writing, no "biblio" identifier is
supported, but suppose some processor were written to recognize it as
a way of (say) denoting a bibliographic reference (necessarily
containing interior sequences in ordinary paragraphs).  The fact that
"biblio" paragraphs were meant for ordinary processing would be
indicated by prefacing each "biblio" identifier with a colon:

  =begin :biblio

  Wirth, Niklaus.  1976.  I<Algorithms + Data Structures =
  Programs.>  Prentice-Hall, Englewood Cliffs, NJ.

  =end :biblio

This would signal to the parser that paragraphs in this begin...end
region are subject to normal handling as ordinary/verbatim paragraphs
(while still tagged as meant only for processors that understand the
"biblio" identifier).  The same effect could be had with:

  =for :biblio
  Wirth, Niklaus.  1976.  I<Algorithms + Data Structures =
  Programs.>  Prentice-Hall, Englewood Cliffs, NJ.

The ":" on these identifiers means simply "process this stuff
normally, even though the result will be for some special target".
I suggest that parser APIs report "biblio" as the target identifier,
but also report that it had a ":" prefix.  (And similarly, with the
above "html", report "html" as the target identifier, and note the
I<lack> of a ":" prefix.)
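
A parser API along the lines suggested above might report the target
identifier and the ":" prefix separately.  It might also rewrite each
"=for" paragraph into the equivalent begin/content/end sequence.  The
following sketch does both; the subroutine names and hash keys are
hypothetical.

```perl

  use strict;
  use warnings;

  # Break a "=begin", "=end", or "=for" paragraph into its parts, reporting
  # the target identifier without its ":" and a separate colon flag.
  sub parse_target {
      my ($para) = @_;
      my ($cmd, $id, $rest) = $para =~ /^=(begin|end|for)[ \t]+(\S+)\s*(.*)\z/s
          or return;
      my $colon = $id =~ s/^://;
      return { command => $cmd, target => $id,
               colon => ($colon ? 1 : 0), content => $rest };
  }

  # Rewrite "=for X content" as the equivalent "=begin X", content,
  # "=end X" paragraphs; other paragraphs pass through unchanged.
  sub expand_for {
      my ($para) = @_;
      my $t = parse_target($para);
      return ($para) unless $t && $t->{command} eq 'for';
      my $id = ($t->{colon} ? ':' : '') . $t->{target};
      return ("=begin $id", $t->{content}, "=end $id");
  }

```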

=head1 SEE ALSO

L<perlpod>, L<perlsyn/"PODs: Embedded Documentation">,
L<podchecker>

=head1 AUTHOR

Sean M. Burke

=cut


--
Sean M. Burke    sburke@cpan.org    http://www.spinn.net/~sburke/


