
Re: Fixing opening/closing tags.

From: Bill Moseley
Date: January 6, 2002 09:24
Subject: Re: Fixing opening/closing tags.
Message ID: 3.0.3.32.20020106092436.02931ff0@pop3.hank.org

At 05:41 PM 01/06/02 +0100, Reinier Post wrote:
>>    <b>This <em>is something -- really</em> -- awkward</b> without doubt
>> 
>> Ends up:
>> 
>>    <b>This <em>is something</em></b>
>>    <b><em>really</em></b>
>>    <b>awkward</b> without doubt
>
>This looks complex enough to merit an exact specification before you
>look for solutions.  As far as I can see you want to do two separate
>transformations:
>
> 1) all '--' within text content are replaced with "\n"
>
> 2a) (purely technical) all text content elements containing "\n" are split
>    such that the "\n" ends up in a separate text element that I'll call
>    "a newline element"
> 2b) all elements containing a newline element as child are split on the
>    newline element, pushing the newline element one level up, unless
>    such a split is invalid according to the DTD
>where 2b is applied until it no longer applies.

Yes, I think that's correct (you are using \n instead of a double dash as
the split point).

In simple terms, I'm taking a string that may contain some (correctly
balanced) markup, splitting it on /\s*--\s*/, and using each resulting
part as the text of a link:

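my @tag_stack;   # tags still open at the end of the previous part
my @links;

for my $part ( split /\s*--\s*/, $orig_string ) {
    # close any tags left open in this part; reopen them in the next
    my $text = balance_tags( $part, \@tag_stack );
    my $href = build_href( $part );
    push @links, qq[<a href="$href">$text</a>];
}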
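
A minimal sketch of what balance_tags() could look like, to make the loop above concrete. It is only one possible shape, under several assumptions not stated in the thread: the input markup is correctly balanced, there are no void elements such as <br>, attributes are dropped when a tag is reopened, and the stack holds bare tag names.

```perl
use strict;
use warnings;

# Hypothetical balance_tags(): $stack holds the names of the tags that
# were still open when the previous part ended.
sub balance_tags {
    my ( $part, $stack ) = @_;

    # Reopen everything that was open at the end of the previous part.
    my $prefix = join '', map "<$_>", @$stack;

    # Walk the tags in this part and keep the stack in sync.
    while ( $part =~ m{<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>}g ) {
        my ( $closing, $name ) = ( $1, lc $2 );
        if ($closing) {
            pop @$stack;    # assumes balanced input: closes the top of the stack
        }
        else {
            push @$stack, $name;    # attributes are not remembered
        }
    }

    # Close, innermost first, whatever is still open at the end of this part.
    my $suffix = join '', map "</$_>", reverse @$stack;

    return $prefix . $part . $suffix;
}

my @tag_stack;
my $orig_string =
    '<b>This <em>is something -- really</em> -- awkward</b> without doubt';

my @out;
for my $part ( split /\s*--\s*/, $orig_string ) {
    push @out, balance_tags( $part, \@tag_stack );
}
print "$_\n" for @out;
```

Run on the example string quoted at the top of this message, this prints the three balanced lines shown there. Text outside any tag (" without doubt") simply stays with the last part.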



>Example: 
>
>   <b>shouting: <em>hello\nworld</em></b>

...
>and after another 2b
>
>       +- "shouting: "
>       |
>    b -+
>       |
>       + em -+- "hello"
>
>    "\n"
>
>    b -+- em -+- "world"

Yep.
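
The tree-level splitting in the quoted steps 2a and 2b can be sketched without committing to any parser API. The code below is an illustration only: it uses nested array refs of the form [ $tag, @children ] as a stand-in for a parse tree (not HTML::TreeBuilder objects), splits on "--" rather than "\n", ignores attributes, and does not implement the "unless such a split is invalid according to the DTD" check. The names split_node() and to_html() are made up for this example.

```perl
use strict;
use warnings;

# A node is a plain string (text) or an array ref [ $tag, @children ].
# split_node() returns a list of pieces; each piece is an array ref of
# the nodes that fall between two consecutive "--" split points.
sub split_node {
    my ($node) = @_;

    if ( !ref $node ) {
        # Text: one piece per segment; -1 keeps trailing empty segments.
        my @segments = split /\s*--\s*/, $node, -1;
        @segments = ('') unless @segments;
        return map { [ $_ eq '' ? () : $_ ] } @segments;
    }

    my ( $tag, @children ) = @$node;
    my @pieces = ( [] );
    for my $child (@children) {
        my ( $first, @rest ) = split_node($child);
        push @{ $pieces[-1] }, @$first;    # continues the current piece
        push @pieces, @rest;               # each further piece follows a split
    }

    # Push the split one level up: wrap every non-empty piece in its own
    # copy of this element.
    return map { @$_ ? [ [ $tag, @$_ ] ] : [] } @pieces;
}

# Serialize a node back to markup.
sub to_html {
    my ($node) = @_;
    return $node unless ref $node;
    my ( $tag, @children ) = @$node;
    return "<$tag>" . join( '', map { to_html($_) } @children ) . "</$tag>";
}

# <b>shouting: <em>hello -- world</em></b>
my $tree = [ 'b', 'shouting: ', [ 'em', 'hello -- world' ] ];

my @lines;
for my $piece ( split_node($tree) ) {
    push @lines, join '', map { to_html($_) } @$piece;
}
print "$_\n" for @lines;
```

For the quoted example this yields one <b> holding "shouting: " plus <em>hello</em>, and a second <b> holding <em>world</em>, which matches the final tree in the diagram above.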

So my next question is how to make those transformations with
HTML::TreeBuilder.  I don't see much problem writing that balance_tags()
sub above, but it would be nice to see how to do it with TreeBuilder.

Thanks,





-- 
Bill Moseley
mailto:moseley@hank.org
