develooper Front page | perl.perl5.porters | Postings from May 2015

Re: RFC: what to do about bitwise string operators

Paul "LeoNerd" Evans
May 2, 2015 19:00
On Sat, 2 May 2015 19:35:25 +0100
Zefram wrote:

> Karl Williamson wrote:
> >So what to do if the string has non-ASCII characters and is in UTF-8?
> I reckon the bitwise string ops should be defined to operate on
> logical octets [\x00-\xff].  Whether an octet string is stored in
> upgraded form shouldn't affect the logical result.  It should be *as
> if* the inputs are downgraded internally, but with upgraded inputs
> it's OK for that logical result to be output in upgraded form.  If
> there's a non-octet in the string, croak.
> >C) Do the operation on the underlying code points (that is
> >effectively convert to U32 or U64 before the operation, and convert
> >back at the end)
> That would imply that ~"\xaa" would be "\x{ffffff55}", rather than the
> present "\x55".  If done consistently, it would be very surprising to
> most existing users of the bitwise ops.  If only done for strings that
> contain non-octets, then the behaviour is inconsistent, which is also
> surprising.  See [perl #63574] which discussed this issue six years
> ago, with a statement of "we decided on the inconsistent behaviour".
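
To make the "as if downgraded" rule concrete, here is a minimal sketch of it in pure Perl. The helper name octet_and and its error message are hypothetical, made up for illustration; only utf8::upgrade, utf8::downgrade and croak are real calls.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: string AND under the "logical octets" rule.
sub octet_and {
    my ($x, $y) = @_;
    for my $s ($x, $y) {
        # With a true second argument, utf8::downgrade returns false
        # (rather than dying) if the string holds a code point > 0xFF.
        utf8::downgrade($s, 1)
            or croak "Non-octet character in bitwise and";
    }
    return $x & $y;    # both operands are now plain octet strings
}

my $plain    = "\xaa\xff";
my $upgraded = "\xaa\xff";
utf8::upgrade($upgraded);    # same logical string, stored as UTF-8

# The internal representation should not change the logical result.
print octet_and($plain, "\x0f\x0f") eq octet_and($upgraded, "\x0f\x0f")
    ? "same result\n"
    : "different result\n";
```

Calling octet_and("\x{301}", "a") croaks, since U+0301 cannot be downgraded to an octet.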

We already have plenty of precedent for "Wide character" warnings:

  $ perl -E 'syswrite STDOUT, "\x{301}"'
  Wide character in syswrite at -e line 1.

So it could quite easily be argued that the bitwise ops should do
something similar; a hypothetical

  Wide character in bitwise and (&.) at -e line 1.
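
As a rough sketch of what that check might look like if written in pure Perl (the helper name warn_if_wide is made up for illustration, and what the op should then return for a wide operand is deliberately left open):

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if every operand is an octet string,
# otherwise emits a syswrite-style warning and returns 0.
sub warn_if_wide {
    my ($opname, @operands) = @_;
    # Any code point above 0xFF counts as a wide character.
    return 1 unless grep { /[^\x00-\xff]/ } @operands;
    # Report the caller's location, as the builtin warnings do.
    my (undef, $file, $line) = caller;
    warn "Wide character in $opname at $file line $line.\n";
    return 0;
}

my ($x, $y) = ("\x{301}", "\x0f");

# Only perform the AND when both operands are octet strings; the
# result for the wide case is an open question, so leave it undef here.
my $result = warn_if_wide('bitwise and (&.)', $x, $y) ? $x & $y : undef;
```

Running this prints a warning of the form shown above, with the real file name and line number in place of "-e line 1".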

Paul "LeoNerd" Evans
