perl.perl5.porters | Postings from June 2015

Re: RFC: what to do about bitwise string operators

Ricardo Signes
June 12, 2015 19:15
* Karl Williamson <> [2015-05-02T14:12:16]
> When | & ^ ~ are executed on strings, or when the new |. &. ^. ~. operators
> are run, the internal representation of those strings is relied on (and
> hence exposed).  This means different behaviors will often result on EBCDIC
> vs ASCII platforms.
> More importantly, whether a string is in UTF-8 or not may affect the result.
> There is no such problem if the string is comprised solely of ASCII
> characters (on ASCII machines or ASCII-equivalent characters plus the C1
> controls on EBCDIC machines), which is why people may not have been bitten
> much by this in the past.

Karl and I discussed this at YAPC.

My current thinking:

A string's codepoints should be treated as octets and operated upon bitwise.
There should be no "Unicode bug."  "😊" & "😟" should raise an exception.

  String with code points over 0xFF may not be used as bit strings on %s side
  of %s operator

  ("left", "&.")

On the other hand, ("😊" | "😟") should return "😐".

(That's a joke.  Please don't.)
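
To make the proposed rule concrete, here is a minimal sketch in Python (not Perl, and not the actual implementation) that models it: each character's code point is treated as one octet, and any operand containing a code point over 0xFF is rejected with the message above. The function name `bitwise_string_op` and the use of `ValueError` are assumptions made for illustration. The length handling (`&` truncates to the shorter operand, `|` and `^` pad the shorter one with zero bits) follows what perlop documents for the existing string operators.

```python
import operator
from itertools import zip_longest

# Hypothetical model of the proposed semantics; names are illustrative only.
OPS = {"&.": operator.and_, "|.": operator.or_, "^.": operator.xor}


def bitwise_string_op(left: str, right: str, op: str) -> str:
    """Apply a bitwise string operator code point by code point."""
    fn = OPS[op]

    # Reject operands that cannot be viewed as a sequence of octets.
    for side, operand in (("left", left), ("right", right)):
        if any(ord(ch) > 0xFF for ch in operand):
            raise ValueError(
                "String with code points over 0xFF may not be used as "
                f"bit strings on {side} side of {op} operator"
            )

    a = [ord(ch) for ch in left]
    b = [ord(ch) for ch in right]

    if op == "&.":
        # AND acts as if the longer operand were truncated.
        pairs = zip(a, b)
    else:
        # OR and XOR act as if the shorter operand were zero-padded.
        pairs = zip_longest(a, b, fillvalue=0)

    return "".join(chr(fn(x, y)) for x, y in pairs)


print(bitwise_string_op("AB", "  ", "|."))  # 0x41|0x20, 0x42|0x20
```

Because the function works on code points rather than on any encoded byte sequence, the result does not depend on whether a string happens to be stored internally as UTF-8, which is the property the proposal is after.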

