
RFC: qr/\b{sbA} \b{sbB}/

From: Karl Williamson
Date: January 23, 2016 05:53
Subject: RFC: qr/\b{sbA} \b{sbB}/
Message ID: 56A31536.60800@khwilliamson.com

The Unicode sentence boundary definition is designed for word-processing 
text, where the word processor wraps lines for display without inserting 
a hard newline.  Any hard \n will cause a double space in the output. 
Thus a hard \n in the text is interpreted as a paragraph separator.

But there is another type of text where \n is simply a line terminator 
and isn't a paragraph separator at all.  The Unicode algorithm doesn't 
work on such text, because it reports a sentence boundary after every \n.
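
To make the problem concrete, here is a minimal sketch using the existing 
\b{sb} (available since Perl 5.22).  The sample string is made up for 
illustration; it is one hard-wrapped sentence followed by a second one.

```perl
use v5.22;        # \b{sb} was added in Perl 5.22; this also enables say
use warnings;

# Made-up sample: the first sentence is hard-wrapped with "\n".
my $text = "Is this one\nsentence? Yes.";

# \b{sb} is zero-width, so split cuts the string without consuming anything.
my @parts = split /\b{sb}/, $text;

for my $part (@parts) {
    (my $shown = $part) =~ s/\n/\\n/g;   # make the newline visible
    say "[$shown]";
}
```

Under the Unicode rules the \n ends a paragraph, so the wrapped sentence 
should come back in two pieces rather than one.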

It occurred to me that it would be easy to have two types of sentence 
boundaries, one for each type of text.  I'm calling them sbA and sbB for 
the time being.  Optionally, there could be an re pragma to switch the 
meaning of plain \b{sb} to one or the other.
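
Until something like that exists, the line-terminator interpretation can 
be roughly approximated with today's \b{sb} by preprocessing the text.  
This is only a sketch of one possible workaround, not the proposed 
semantics: it assumes that a lone \n is a line terminator and that a 
blank line still separates paragraphs, and the variable names are 
invented for the example.

```perl
use v5.22;        # \b{sb} was added in Perl 5.22
use warnings;

# Made-up sample: the first sentence is hard-wrapped with "\n".
my $text = "Is this one\nsentence? Yes.";

# Assumption: a single "\n" is just a line terminator, so turn it into a
# space.  Runs of two or more "\n" (blank lines) are left untouched.
(my $flat = $text) =~ s/(?<!\n)\n(?!\n)/ /g;

# Now the ordinary Unicode sentence boundary no longer sees a paragraph
# separator in the middle of the wrapped sentence.
my @sentences = split /\b{sb}/, $flat;

say scalar(@sentences), " sentences";
```

The obvious cost is that the boundaries are found in the modified copy, 
not in the original string, which is part of why built-in support would 
be nicer.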

Do you think this is a good/bad/indifferent idea?
