Re: [Fwd: Re: [RFC] A more extensible/flexible POD (ROUGH-DRAFT)]

From: Sam Vilain
Date: March 17, 2005 22:02
Subject: Re: [Fwd: Re: [RFC] A more extensible/flexible POD (ROUGH-DRAFT)]
Message ID: 423A6ED4.4070808@vilain.net

Damian Conway wrote:
> [No, I'm not back; I'm just passing by. But I feel that I need to 
> comment on this whole issue]

Thanks!  This message has lots of useful information that I would have 
otherwise probably missed.

It seems that the basic premise of the POD document object model gels 
well with that early design document, so I look forward to being able to 
flesh out the details.

Using ^=\s to delimit a line starting with a = will interfere with the 
Kwid method of:

  = Heading

  foo

Which I was imagining would be converted to a DOM tree that when 
represented in the "Normative XML" would look like:

  <sect1>
    <title>Heading</title>
    <para>foo</para>
  </sect1>

That's sort of DocBook style, and in fact I was thinking that for the 
internal representation, DocBook node names could be used where there is 
no other better alternative.  Of course, non-documentation things like 
Test fragments or inclusions of external entities, like UML diagrams 
won't have a representation in DocBook :-).

The uses of a leading = in a paragraph are fairly uncommon.  For 
instance, when quoting POD you would simply indent it a bit to make it 
verbatim and there is no issue.

I see a middle ground; that is, `=` quoting is only allowed if it 
directly follows the initial POD marker;

  =head1 Foo
  =
  = =head1
  = =
  = = =head1 That's just getting ridiculous

Which I see as represented by;

  <sect1>
    <title>Foo</title>
    <para>=head1
  =
  = =head1 That's just getting ridiculous</para>
  </sect1>

Which of course would lose the ='s.  But that's OK, because if you 
wanted verbatim you could have just indented the block.
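
As a rough illustration of that unquoting rule, here is a minimal Python 
sketch.  The function name `unquote_block` and the list-of-lines input 
are assumptions made for the example, not part of any existing POD or 
Kwid tool.  It strips one level of `=` quoting from lines that start 
with `=` followed by whitespace, and treats a bare `=` line as a 
paragraph break.  The marker line itself is passed through unparsed.

```python
def unquote_block(lines):
    """Split a `=`-quoted block into paragraphs, removing one quoting level.

    Hypothetical helper: `lines` is the marker line followed by the
    lines that directly follow it, without trailing newlines.
    """
    paragraphs, current = [], []
    for line in lines:
        if line == "=":
            # A bare `=` (no whitespace after it) ends the current paragraph.
            if current:
                paragraphs.append("\n".join(current))
                current = []
        elif len(line) > 1 and line[0] == "=" and line[1].isspace():
            # `=` plus whitespace: drop the two-character quoting prefix.
            current.append(line[2:])
        else:
            # Anything else (such as the marker line) is kept as-is.
            current.append(line)
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs


example = [
    "=head1 Foo",
    "=",
    "= =head1",
    "= =",
    "= = =head1 That's just getting ridiculous",
]
paragraphs = unquote_block(example)
```

With the example input, the second paragraph comes out as the three 
lines shown inside `<para>` above, and the first is the still-unparsed 
`=head1 Foo` marker line.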

If you wanted to lead a normal paragraph with it, you'd just use the 
normally implicit =para (equivalent to =pod):

  =para
  =
  = = This is what a Kwid =head1 looks like

As for going with =kwid to denote the starting of kwid, I have so far 
been pessimistically assuming that something like `=dialect kwid`, or 
`=use kwid` (as described in the design doc you attached) would be 
required.  However, we could allow `=unknown`, where `unknown` is an 
unknown keyword, to try to load Pod::Dialect::unknown, and hope like 
hell it provides the Role of Pod::Dialect.
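
To make that fallback concrete, here is a small Python sketch of the 
lookup.  Everything in it is a stand-in: `PodDialect` plays the part of 
the Pod::Dialect role, the `DIALECTS` table stands in for actually 
loading a module by name, and the set of core keywords is only 
illustrative.

```python
class PodDialect:
    """Stand-in for the Pod::Dialect role (hypothetical interface)."""

    def parse_chunk(self, chunk):
        raise NotImplementedError


class Kwid(PodDialect):
    """Toy dialect; a real one would parse Kwid markup here."""

    def parse_chunk(self, chunk):
        return ("kwid", chunk)


# Illustrative only: keywords the core parser handles itself.
CORE_KEYWORDS = {"head1", "para", "pod", "list", "end"}

# Stand-in for module loading: keyword -> whatever was "loaded".
# "bogus" maps to a class that does not provide the role.
DIALECTS = {"kwid": Kwid, "bogus": dict}


def resolve_keyword(keyword):
    """Return None for core keywords, else a dialect instance."""
    if keyword in CORE_KEYWORDS:
        return None
    candidate = DIALECTS.get(keyword)
    if candidate is None:
        raise LookupError(f"no dialect found for ={keyword}")
    # "Hope like hell" becomes an explicit check that the role is provided.
    if not (isinstance(candidate, type) and issubclass(candidate, PodDialect)):
        raise TypeError(f"={keyword} does not provide the PodDialect role")
    return candidate()


dialect = resolve_keyword("kwid")
```

The design point is only that an unknown keyword is tried as a dialect 
name after the core keywords, and that the loaded thing is checked for 
the role before it is trusted with any chunks.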

While the `^=` escaping is “active”, the presence or absence of 
whitespace following the initial `=` will delimit breaks in paragraphs. 
This has to be so, otherwise the previous example would have been:

  <sect1>
    <title>Foo

  =head1
  =
  = =head1 That's just getting ridiculous
  </title>
  </sect1>

Which is just plain silly.  This follows what people are used to with 
POD - blank lines must be truly empty, not merely free of non-whitespace 
characters (an increasingly vague concept these days).

So, the POD processing happens in 3 levels (note: the first isn't really 
mentioned in perlpodspec.kwid, which is a bug);

=list
- chunkification from the original source, into POD paragraphs, which 
may or may not include an initial `^=foo` marker.  At *this* level, the 
only escaping that happens is the `^=` escaping.

That's all that needs to happen while the code is being read, and for 
most code that is how the POD will remain, in memory, somewhere 
intermingled with the Parse Tree for the code, so that the code can 
still be spat back out by the P6 equivalent of `B::Deparse`.

- parsing of these raw chunks into a real POD DOM.  Tired XML 
veterans, please don't get upset by the use of the term "DOM"; I think 
the last thing anyone wants is to have studlyCaps functions like 
`getElementById` and `createTextNode`.  It is the tree concept itself 
which is important, and this pre-dates XML anyway.

Strictly speaking, this step actually converts POD paragraph chunk 
events into POD DOM events.  These can be used to build a real DOM, for 
instance if you need to do an XPath style query for a link (I was amazed 
that someone's actually gone and built Pod::XPath!), or they might 
simply be passed onto the next stage by an output processor with no 
intermediate tree being built.

So, at this point, dialects get hooks to perform custom mutation of POD 
paragraph events into DOM events, and the arbitrator of this process 
ensures that the output events are well "balanced" by spitting out 
closing tags where it has to.  They can store state in their parser 
object, but none of this state will be preserved past the parsing stage.
However, the nodes that they "spit out" after this point may still not 
be "core" POD, such as for includes or out-of-band objects.  These hooks 
will be sufficient to allow them to hijack subsequent chunks that would 
otherwise be served to other dialects, ie, they can choose to 
"arbitrate" subsequent chunks.

I'm aiming to make it so that it is possible for dialects to be "round 
trip safe", by being able to go back from this DOM state to the original 
POD paragraph chunks.  This would require dialects to "play nice" of 
course, but is a potential option to help things like smart text 
editors automatically syntax highlight POD dialects :).

Linking will be in terms of this intermediate tree, so you won't be able 
to link to included portions of manual pages :).  I'm not sure whether 
that matters.

- "output ready" form may also either be a stream of events or a DOM 
tree.  In this mode, all of the events from the first stage are simply 
fed through a loopback preprocessor, which asks Dialects to convert 
their non-core nodes to core nodes, or drop them, or whatever.  At this 
point, the structure can have handles to out of band objects like 
images, etc - that can't be converted to XML.  Again, dialects are 
capable of arbitrating the loopback process for any events that *follow* 
theirs.

Of course, documents that are not in a dialect (and do not have nodes 
that `=include` and suchlike) will not need any pre-processing to be 
ready for “output”.

=end list
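
The three levels above can be sketched roughly as follows, in Python. 
This is a toy under stated assumptions, not the planned implementation: 
the function names, the `(kind, value)` event tuples and the 
`sect1`/`title`/`para` mapping are all invented for the example.  It 
ignores `^=` escaping, dialect hooks and the loopback step entirely, and 
any marker other than `=head1` is simply treated as a paragraph.

```python
import re
from xml.sax.saxutils import escape


def chunkify(source):
    """Level 1: split source into (marker, text) POD paragraph chunks.

    Only truly empty lines separate paragraphs; a line holding just
    spaces does not.
    """
    chunks = []
    for para in re.split(r"\n{2,}", source.strip("\n")):
        match = re.match(r"=(\w+)\s*", para)
        if match:
            chunks.append((match.group(1), para[match.end():].strip()))
        else:
            chunks.append((None, para.strip()))
    return chunks


def dom_events(chunks):
    """Level 2: turn chunk events into balanced DOM events."""
    open_elements = []  # stack of elements awaiting a closing event
    for marker, text in chunks:
        if marker == "head1":
            # The arbitrator closes whatever is still open before a new section.
            while open_elements:
                yield ("end", open_elements.pop())
            open_elements.append("sect1")
            yield ("start", "sect1")
            name = "title"
        else:
            name = "para"
        if text:
            yield ("start", name)
            yield ("text", text)
            yield ("end", name)
    while open_elements:
        yield ("end", open_elements.pop())


def render(events):
    """Level 3: consume events directly; no intermediate tree is built."""
    out = []
    for kind, value in events:
        if kind == "start":
            out.append(f"<{value}>")
        elif kind == "end":
            out.append(f"</{value}>")
        else:
            out.append(escape(value))
    return "".join(out)


xml = render(dom_events(chunkify("=head1 Foo\n\nfoo\n")))
```

Because `dom_events` is a generator, the output stage here pulls events 
one at a time, which is the "no intermediate tree" case; collecting the 
same events into a tree would be the other option.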

If there is anything that you think is ghastly wrong with the above 
picture, let me know of course, but I don't think it's actually all that 
much different from what has to go on under the hood in a Pod parser or 
markup tool, anyway.  In particular, MarkOv - as the author of the most 
comprehensive POD markup system there is, this means you!  :-)

There is a big question about inline styles still open, and how 
converting paragraph bodies to a series of POD events works (clearly, 
this is essential for single-paragraph Kwid list blocks, etc) - but I'm 
hoping the answer will just smack me in the face as I start to work with 
ingy on the prototype implementation, and specifying the details of what 
node types the POD DOM and/or DTD allows.

I've done plenty of planning for this now, and it's even looking 
hopeful!  So time for me to keep quiet until I've built something :-).

Sam.
