
Re: RFC: what to do about bitwise string operators

From: Ricardo Signes
Date: June 12, 2015 19:15
Subject: Re: RFC: what to do about bitwise string operators
Message ID: 20150612191506.GA30452@cancer.codesimply.com

* Karl Williamson <public@khwilliamson.com> [2015-05-02T14:12:16]
> When | & ^ ~ are executed on strings, or when the new |. &. ^. ~. operators
> are run, the internal representation of those strings is relied on (and
> hence exposed).  This means different behaviors will often result on EBCDIC
> vs ASCII platforms.
> 
> More importantly, whether a string is in UTF-8 or not may affect the result.
> There is no such problem if the string is comprised solely of ASCII
> characters (on ASCII machines or ASCII-equivalent characters plus the C1
> controls on EBCDIC machines), which is why people may not have been bitten
> much by this in the past.
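
To make the quoted concern concrete, here is a minimal sketch. It does not poke at perl's internals; it only assumes that the same character can be held either as one octet or as its two-octet UTF-8 encoding, and models the latter explicitly with utf8::encode. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# One character, U+00E9, viewed two ways.
my $as_octet = "\xE9";        # a single octet: E9
my $as_utf8  = "\xE9";
utf8::encode($as_utf8);       # now two octets: C3 A9

my $mask = "\x0F";

# String "&" works position by position and truncates to the shorter operand.
my $r_octet = $as_octet & $mask;   # E9 & 0F
my $r_utf8  = $as_utf8  & $mask;   # C3 & 0F (the A9 is dropped)

printf "octet form: %s\n", unpack("H*", $r_octet);   # prints 09
printf "UTF-8 form: %s\n", unpack("H*", $r_utf8);    # prints 03
```

Same character, same mask, two different answers, depending only on which bytes the operator happens to see.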

Karl and I discussed this at YAPC.

My current thinking:

A string's code points should be treated as octets and operated upon bitwise.
There should be no "Unicode bug."  "😊" & "😟" should raise an exception.
Probably:

  String with code points over 0xFF may not be used as bit strings on %s side
  of %s operator

  ("left", "&.")
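
As a rough sketch of that behavior in plain Perl: checked_and below is a hypothetical helper, not a builtin and not how perl would implement it. It only models the "&." case, treats each code point as an octet, and dies with the draft wording above when either operand has a code point over 0xFF. That wording is the draft from this message, not necessarily what any perl release prints.

```perl
use strict;
use warnings;

# Hypothetical helper modelling the proposal for "&." only.
sub checked_and {
    my ($left, $right) = @_;

    # Refuse any operand holding a code point above 0xFF.
    for my $pair ([ left => $left ], [ right => $right ]) {
        my ($side, $str) = @$pair;
        die sprintf(
            "String with code points over 0xFF may not be used as bit strings on %s side of %s operator\n",
            $side, "&.",
        ) if $str =~ /[^\x00-\xFF]/;
    }

    # Every code point now fits in an octet, so AND them position by
    # position, truncating to the shorter operand as string "&" does.
    my $len = length($left) < length($right) ? length($left) : length($right);
    return join '', map {
        chr(ord(substr($left, $_, 1)) & ord(substr($right, $_, 1)))
    } 0 .. $len - 1;
}

print checked_and("AB", "a"), "\n";            # prints A (0x41 & 0x61 = 0x41)

eval { checked_and("\x{1F60A}", "\x{1F61F}") };
print "died: $@" if $@;                        # complains about the left side
```

The point of checking up front is that the result never depends on how the string is stored, only on its code points.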

On the other hand, ("😊" | "😟") should return "😐".

(That's a joke.  Please don't.)

-- 
rjbs
