perl.perl5.porters | Postings from May 2015

RFC: what to do about bitwise string operators

From: Karl Williamson
Date: May 2, 2015 18:12
Subject: RFC: what to do about bitwise string operators
Message ID: 55451380.2000909@khwilliamson.com

When | & ^ ~ are executed on strings, or when the new |. &. ^. ~. 
operators are run, the internal representation of those strings is 
relied on (and hence exposed).  This means different behaviors will 
often result on EBCDIC vs ASCII platforms.
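
As a small illustration of what "relying on the internal representation" means, here is a minimal sketch contrasting the numeric and string forms of |. It assumes an ASCII platform and that the "bitwise" feature is not enabled, so | picks string semantics when both operands are strings. On an EBCDIC platform the characters have different ordinals, so the same expression would not be expected to give the same string.

```perl
use strict;
use warnings;

# Numeric operands: ordinary integer OR.
my $num = 65 | 32;          # 97

# String operands: each character's ordinal is ORed with the ordinal of
# the character at the same position in the other string.
# On ASCII: 0x41 | 0x20 = 0x61 ("a"), 0x42 | 0x20 = 0x62 ("b").
my $str = "AB" | "  ";      # "ab" on an ASCII platform

printf "%d %s\n", $num, $str;
```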

More importantly, whether a string is in UTF-8 or not may affect the 
result.  There is no such problem if the string is composed solely of 
ASCII characters (on ASCII machines) or of ASCII-equivalent characters 
plus the C1 controls (on EBCDIC machines), which is why people may not 
have been bitten much by this in the past.
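
To make the "in UTF-8 or not" distinction concrete, here is a minimal sketch (ASCII platform assumed) showing that one and the same string value can have two internal representations.  It uses only utf8::upgrade, utf8::is_utf8 and bytes::length.  No bitwise result is shown, since what the operators do with the upgraded form depends on the perl version.

```perl
use strict;
use warnings;
use bytes ();   # load bytes::length without enabling byte semantics

my $native   = "\xE9";      # one character, stored as one byte
my $upgraded = $native;
utf8::upgrade($upgraded);   # same character, now stored as UTF-8

# The two strings compare equal as far as Perl-level semantics go ...
print "equal\n" if $native eq $upgraded;

# ... but their internal representations differ.
printf "native:   utf8=%d bytes=%d\n",
    (utf8::is_utf8($native) ? 1 : 0), bytes::length($native);
printf "upgraded: utf8=%d bytes=%d\n",
    (utf8::is_utf8($upgraded) ? 1 : 0), bytes::length($upgraded);
```

An operator that works on the stored bytes sees one byte in the first case and two in the second, even though the strings are equal.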

So what to do if the string has non-ASCII characters and is in UTF-8?  I 
see the following possibilities:

A) no change from current behavior, document it better.  (This is what 
will happen in v5.22)

B) warn

C) Do the operation on the underlying code points (that is, effectively 
convert to U32 or U64 before the operation, and convert back at the end)

D) Downgrade if possible and leave the result downgraded, or possibly 
upgrade the result.  I suppose we would warn if it is not possible to 
downgrade

E) **Your ideas here**
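
For discussion, options C and D can be approximated in pure Perl for the | case.  This is only a rough sketch, not a proposed implementation: the names or_by_code_point and or_after_downgrade are made up for this illustration, and a real change would live in the core, not in user code.  The sketch for C pads the shorter operand with zero code points, matching the documented behavior of | on strings of unequal length.

```perl
use strict;
use warnings;

# Option C (hypothetical helper): OR the code points, ignoring how the
# strings happen to be stored internally.
sub or_by_code_point {
    my ($left, $right) = @_;
    my @l = map { ord } split //, $left;
    my @r = map { ord } split //, $right;
    my $n = @l > @r ? @l : @r;
    # Missing positions in the shorter operand count as zero.
    return join '', map { chr( ($l[$_] // 0) | ($r[$_] // 0) ) } 0 .. $n - 1;
}

# Option D (hypothetical helper): downgrade both operands if possible,
# then do the usual string OR; warn and return nothing otherwise.
sub or_after_downgrade {
    my ($left, $right) = @_;    # work on copies of the arguments
    for my $operand ($left, $right) {
        # A true second argument makes utf8::downgrade return false
        # rather than die when a code point is above 0xFF.
        unless (utf8::downgrade($operand, 1)) {
            warn "cannot downgrade operand: code point above 0xFF\n";
            return;
        }
    }
    return "$left" | "$right";
}
```

With or_after_downgrade, an upgraded "\xE9" ORed with "\x10" gives "\xF9" regardless of how the first operand was stored, while an operand containing "\x{100}" produces the warning and no result.  With or_by_code_point, "\x{100}" ORed with "\x{01}" gives "\x{101}".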
