perl.perl5.porters | Postings from May 2015

Re: RFC: what to do about bitwise string operators

Father Chrysostomos
May 4, 2015 21:20

On May 2, 2015, at 11:12 AM, Karl Williamson wrote:

> When | & ^ ~ are executed on strings, or when the new |. &. ^. ~. operators are run, the internal representation of those strings is relied on (and hence exposed).  This means the results will often differ between EBCDIC and ASCII platforms.
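To make the EBCDIC vs ASCII point concrete, here is a small Python sketch. It assumes a string ^ behaves as a plain byte-wise XOR over the stored bytes, with the shorter operand padded with zero bytes (perlop describes this padding for | and ^). The helper xor_bytes is hypothetical and only models that behavior; it is not perl's implementation. Python's cp037 codec stands in for an EBCDIC platform.

```python
# Hypothetical helper (not a Perl API): XOR two byte strings, padding the
# shorter operand with zero bytes, as perlop describes for string ^.
def xor_bytes(a: bytes, b: bytes) -> bytes:
    n = max(len(a), len(b))
    a = a.ljust(n, b"\x00")
    b = b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


# The same two characters, stored under two different platform encodings.
ascii_result = xor_bytes("1".encode("ascii"), "A".encode("ascii"))
ebcdic_result = xor_bytes("1".encode("cp037"), "A".encode("cp037"))

print(ascii_result.hex())   # 70  (0x31 ^ 0x41, which is 'p' in ASCII)
print(ebcdic_result.hex())  # 30  (0xF1 ^ 0xC1; 'p' in cp037 is 0x97)
```

The characters are the same, but because the operator sees the bytes rather than the characters, the two platforms produce different results.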
> More importantly, whether a string is stored in UTF-8 may affect the result.  There is no such problem if the string consists solely of ASCII characters (on ASCII machines), or of the ASCII-equivalent characters plus the C1 controls (on EBCDIC machines), which is why people may not have been bitten much by this in the past.
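The UTF-8 problem can be sketched the same way. Under the same assumption (a byte-wise XOR with zero padding, modeled by the hypothetical helper xor_bytes), the single character U+00E9 gives different answers depending on whether it is stored as one Latin-1 byte or as two UTF-8 bytes, while ASCII-only operands give the same answer either way.

```python
# Hypothetical helper (not a Perl API): byte-wise XOR with zero padding.
def xor_bytes(a: bytes, b: bytes) -> bytes:
    n = max(len(a), len(b))
    a = a.ljust(n, b"\x00")
    b = b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


s = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
t = "A"

# Same characters, two internal representations.
as_latin1 = xor_bytes(s.encode("latin-1"), t.encode("latin-1"))
as_utf8 = xor_bytes(s.encode("utf-8"), t.encode("utf-8"))

print(as_latin1.hex())  # a8    (0xE9 ^ 0x41)
print(as_utf8.hex())    # 82a9  (0xC3 ^ 0x41, then 0xA9 ^ 0x00)

# ASCII-only operands are stored identically either way, so results match.
same = (xor_bytes("Hi".encode("latin-1"), "  ".encode("latin-1"))
        == xor_bytes("Hi".encode("utf-8"), "  ".encode("utf-8")))
print(same)  # True
```

Note that the UTF-8 result is not even well-formed UTF-8, because 0x82 is a continuation byte and cannot start a character.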
> So what to do if the string has non-ASCII characters and is in UTF-8?  I see the following possibilities:
> A) no change from current behavior, document it better.  (This is what will happen in v5.22.)
> B) warn
> C) Do the operation on the underlying code points (that is, effectively convert each character to a U32 or U64 before the operation, and convert back at the end)
> D) Downgrade if possible and leave the result downgraded, or possibly upgrade the result.  I suppose we warn if it is not possible to downgrade
> E) **Your ideas here**
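Option C can be sketched in Python by working on code points instead of stored bytes. The function name xor_code_points is hypothetical, and Python's unbounded integers stand in for the U32/U64 conversion mentioned in the list. Because it never looks at the encoding, the result no longer depends on whether a string happens to be UTF-8.

```python
# Hypothetical sketch of option C: XOR code point by code point,
# padding the shorter string with U+0000 (mirroring the zero-byte padding
# that perlop describes for string ^).
def xor_code_points(a: str, b: str) -> str:
    n = max(len(a), len(b))
    a = a.ljust(n, "\x00")
    b = b.ljust(n, "\x00")
    return "".join(chr(ord(x) ^ ord(y)) for x, y in zip(a, b))


print(hex(ord(xor_code_points("\u00e9", "A"))))  # 0xa8
print(hex(ord(xor_code_points("\u0100", "A"))))  # 0x141
```

For characters up to 255 this agrees with a byte-wise operation on the downgraded (one byte per character) string, so options C and D differ only for wider characters. This sketch does not cover ~, whose code-point complement in a U32 or U64 would produce very large values. It also cannot represent results above U+10FFFF, because Python's chr rejects them, whereas perl allows such code points.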

I vote for D, since it closely matches what is done elsewhere.  Alternatively, we could croak on any character above 255, which I think we already do for some system calls.  In either case the message would be ‘Wide character in whatever’.
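A minimal Python sketch of option D combined with the croak alternative might look like the example below. Here a raised ValueError plays the role of perl's croak. The function name xor_downgraded, the op_name parameter, and the exact message text are hypothetical; the message is only modeled on the ‘Wide character in whatever’ wording above.

```python
# Hypothetical sketch of option D with the "croak" variant: downgrade both
# operands to one byte per character, fail on anything above 255, and
# leave the result downgraded.
def xor_downgraded(a: str, b: str, op_name: str = "bitwise xor") -> str:
    for ch in a + b:
        if ord(ch) > 0xFF:
            # Stand-in for croak; the wording is illustrative only.
            raise ValueError(f"Wide character in {op_name}")
    # Every character fits in one byte, so Latin-1 is a lossless downgrade.
    ab = a.encode("latin-1")
    bb = b.encode("latin-1")
    n = max(len(ab), len(bb))
    ab = ab.ljust(n, b"\x00")
    bb = bb.ljust(n, b"\x00")
    # Decode back as Latin-1: the result stays one byte per character.
    return bytes(x ^ y for x, y in zip(ab, bb)).decode("latin-1")


print(hex(ord(xor_downgraded("\u00e9", "A"))))  # 0xa8
try:
    xor_downgraded("\u0100", "A")
except ValueError as err:
    print(err)  # Wide character in bitwise xor
```

Replacing the raise with a warning would give the "warn if not possible to downgrade" form of D, but the sketch would then need some fallback behavior for wide characters, which the thread leaves open.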

Sorry for being silent of late.  I suddenly had an extra workload that used up all my spare time, but things seem to be quieting down a little now.
