Re: RFC: what to do about bitwise string operators

From: Father Chrysostomos
Date: May 4, 2015 21:20
Subject: Re: RFC: what to do about bitwise string operators
Message ID: 4588CF93-5963-499F-B968-B1F891EF1EDF@cpan.org

On May 2, 2015, at 11:12 AM, Karl Williamson <public@khwilliamson.com> wrote:

> When | & ^ ~ are executed on strings, or when the new |. &. ^. ~. operators are run, the internal representation of those strings is relied on (and hence exposed).  This means different behaviors will often result on EBCDIC vs ASCII platforms.
> 
> More importantly, whether a string is in UTF-8 or not may affect the result.  There is no such problem if the string consists solely of ASCII characters on ASCII machines (or, on EBCDIC machines, of the ASCII-equivalent characters plus the C1 controls), which is why people may not have been bitten much by this in the past.
> 
> So what to do if the string has non-ASCII characters and is in UTF-8?  I see the following possibilities:
> 
> A) no change from current behavior, document it better.  (This is what will happen in v5.22)
> 
> B) warn
> 
> C) Do the operation on the underlying code points (that is, effectively convert to U32 or U64 before the operation and convert back at the end)
> 
> D) Downgrade if possible and leave the result downgraded, or possibly upgrade the result.  I suppose we would warn if it is not possible to downgrade
> 
> E) **Your ideas here**
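
To make the concern concrete, here is a minimal sketch of why the stored form matters. It imitates the two storage forms of the same character with explicit byte strings (one byte versus its UTF-8 encoding) and XORs each with the same mask. The variable names are only illustrative, and the sketch makes no claim about what any particular perl does with a UTF-8-flagged operand; it only shows that the same character gives different answers once the operator sees the bytes.

```perl
use strict;
use warnings;

my $e_acute = "\xE9";              # U+00E9, one character

my $one_byte  = $e_acute;          # stored as the single byte E9
my $two_bytes = $e_acute;
utf8::encode($two_bytes);          # now the two bytes C3 A9 (UTF-8 for U+00E9)

my $mask = "\x01";

# String XOR works byte by byte; the shorter operand is padded with zero bytes.
my $from_one_byte  = $one_byte  ^ $mask;   # E9 ^ 01        -> E8
my $from_two_bytes = $two_bytes ^ $mask;   # C3 ^ 01, A9 ^ 00 -> C2 A9

print unpack("H*", $from_one_byte),  "\n";   # e8
print unpack("H*", $from_two_bytes), "\n";   # c2a9
```

Read back as characters, the first result is U+00E8 and the second (C2 A9 decoded as UTF-8) is U+00A9, so the outcome depends on the storage form rather than on the character itself.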

I vote for D, since it closely matches what is done elsewhere.  Alternatively, we could croak for any characters > 255, which I think we do for some system calls already.  In either case the message would be ‘Wide character in whatever’.
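
A rough sketch of what D with the croak variant might look like at the Perl level, assuming a hypothetical helper named bytewise_or (not a core function, and not how the operator itself would be patched in C): downgrade both operands, croak with a ‘Wide character’ message if that is impossible, then apply the ordinary string operator.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper for illustration only: OR two strings after
# downgrading both, croaking on any character above 255.
sub bytewise_or {
    my ($left, $right) = @_;       # copies, so the caller's strings are untouched
    for my $operand ($left, $right) {
        # With a true second argument, utf8::downgrade returns false
        # instead of dying when the string cannot be downgraded.
        utf8::downgrade($operand, 1)
            or croak "Wide character in bytewise_or";
    }
    return $left | $right;         # both operands are strings, so this is the string OR
}

my $plain    = "\xE9";             # stored as one byte
my $upgraded = "\xE9";
utf8::upgrade($upgraded);          # same character, stored as UTF-8 internally

my $same = bytewise_or($plain, " ") eq bytewise_or($upgraded, " ");
print $same ? "same\n" : "different\n";

my $ok = eval { bytewise_or("\x{100}", "a"); 1 };
print "croaked: $@" unless $ok;
```

With the downgrade in front, the result no longer depends on how the operand happened to be stored, and anything above 255 fails loudly instead of silently producing a representation-dependent answer.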

Sorry for being silent of late.  I have suddenly had an extra workload that used up all my spare time, but things seem to be quieting down a little now.

