
[perl #84670] unpack(C => ...) on string with UTF8 FLAG without <usebytes> may return value more than 255

From: Karl Williamson via RT
Date: April 16, 2019 04:07
Subject: [perl #84670] unpack(C => ...) on string with UTF8 FLAG without <usebytes> may return value more than 255
Message ID: rt-4.0.24-16947-1555387616-707.84670-15-0@perl.org

On Mon, 28 Feb 2011 15:06:10 -0800, perl5-porters@ton.iguana.be wrote:
> In article <rt-3.6.HEAD-24085-1298398878-801.84670-15-0@perl.org>,
>         "Eric Brine via RT" <perlbug-followup@perl.org> writes:
> > On Tue Feb 22 10:13:05 2011, ikegami@adaelis.com wrote:
> >> You didn't say what you expect it to do. I suppose it could throw an
> >> exception, but the current behaviour is quite reasonable to me.
> >
> > $ perl -we'printf "%02X\n", unpack "N", "\0\0\0\x{442}"'
> > Character(s) in 'N' format wrapped in unpack at -e line 1.
> > 42
> >
> > $ perl -wle'printf "%02X\n", unpack "C", "\x{442}"'
> > 442
> >
> > I suppose the latter could do like the former (warn and "& 0xFF" the
> > input), but the latter's behaviour is so much more useful.
> 
> Actually when I made the unicode pack/unpack patch the "C" format was
> seen as a possible backward incompatibility problem and on p5p I was
> asked to add another character to mean "full single character
> semantics", which became the "W" (word) character. But I only did
> that for pack, it seems:
> 
> perl -wle 'print ord pack("C", 1000)'
> Character in 'C' format wrapped in pack at -e line 1.
> 232
> 
> perl -wle 'print ord pack("W", 1000)'
> 1000
> 
> So in pack the "C" format basically works "modulo 256".
> 
> I think it's entirely reasonable to have the same behaviour for unpack,
> so that
> 
> unpack "C", "\x{442}" would give 66 (1090 % 256) together with a
> format wrap warning
> (notice that it still won't give 209, the first byte of the internal
> UTF-8 encoding, which is a nonsense answer corresponding to internal
> details)
> 
> The admittedly much more sane behaviour of returning 1090 would still
> be available with W:
> 
> unpack "W", "\x{442}" would give 1090
> 
> This would be completely in line with what is documented (in
> perldoc -f pack):
> 
> C   An unsigned char (octet) value.
> W   An unsigned char value (can be greater than 255).
> 
> "W" was always meant as the Unicode-sane version of "C".
> 
> I can make a patch if people agree with this...

No one responded to this.  It looks ok to me.
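
For reference, the semantics proposed above could be sketched roughly as below. This is only an illustration and not a patch: proposed_unpack_C is a hypothetical helper name (nothing in core is called that), and the warning text is an assumption modelled on the existing pack message quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): emulates the *proposed*
# unpack "C" behaviour for the first character of a string.
sub proposed_unpack_C {
    my ($str) = @_;
    return unless length $str;

    # "W" yields the full character value, which may exceed 255.
    my $value = unpack "W", $str;

    if ($value > 0xFF) {
        # Assumed wording, mirroring the pack warning.
        warn "Character in 'C' format wrapped in unpack\n";
        $value &= 0xFF;    # same as $value % 256 for non-negative values
    }
    return $value;
}

print proposed_unpack_C("\x{442}"), "\n";    # 66, plus a warning on STDERR
print scalar(unpack "W", "\x{442}"), "\n";   # 1090
```

Under this sketch "C" never exposes the internal UTF-8 bytes; it only truncates the character value, which matches what pack "C" already does.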
-- 
Karl Williamson

---
via perlbug:  queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=84670


