perl.perl5.porters | Postings from July 2017
Re: Behavior of bitwise ops on unencountered wide characters
From: Graham Knop
Date: July 13, 2017 10:52
Message ID: CAM=m89Fwjpu3fA_D=KfzwwRYt89s8QYLLfEA1JK82VV4pZ-h-w@mail.gmail.com
On Wed, Jul 12, 2017 at 7:45 PM, demerphq <demerphq@gmail.com> wrote:
> On 12 July 2017 at 19:02, Karl Williamson <public@khwilliamson.com> wrote:
>> On 07/12/2017 04:50 PM, Sawyer X wrote:
>>>
>>>
>>>
>>> On 07/11/2017 01:09 PM, Karl Williamson wrote:
>>>>
>>>> On 07/10/2017 11:12 PM, Father Chrysostomos wrote:
>>>>>
>>>>> Karl Williamson wrote:
>>>>>>
>>>>>> I don't yet have a fully formulated opinion on this, but one question I
>>>>>> would have is "How is this different from division by 0" that people
>>>>>> seem to deal ok with.
>>>>>
>>>>>
>>>>> Fatal division by zero is ancient. Fatalizing bitwise operations on
>>>>> utf8 breaks stuff.
>>>>>
>>>>> As I suggested in another thread (I seem to have been ignored), it
>>>>> would be *much* kinder to users to make it a warning. (Wide character
>>>>> in blah blah blah.) That way users who care can fatalize it, or
>>>>> suppress it. You have the best of all three worlds.
>>>>>
>>>>
>>>> I believe I've referred to your suggestion in some thread. It is the
>>>> minimum we should do. And others believe it should be deprecated.
>>>
>>>
>>> There is a specific cost here that Graham noted. This method is currently
>>> used to determine if a variable is a number without loading "B", which
>>> isn't cheap. While it is a simple argument of "users shouldn't care,"
>>> serializations (like JSON) need to be able to map them to their right
>>> type. It would be nice if there was a way to do this without B.
>>>
>>
>> It would be good to have some alternative that requires only a cheaply
>> loaded, or internal module, something named like "Internals" that provides a
>> clear access path for the things we have determined warrant it, such as
>> Graham's use case. He had to explain to me how it worked, and he had to
>> explain to Yves as well.
>
> The problem is he isn't really correct. I have been down this path
> before in Sereal and it hurts. See below.
>
>> That demonstrates it is non-obvious. When the
>> tools aren't available, people will do clever, but non-maintainable things
>> to get what they need. But it is best to furnish the tools when it becomes
>> known that they would be useful.
>
> Unfortunately, I never replied to Graham, which I should have.
>
> On 16 June 2017 at 13:04, Graham Knop <haarg@haarg.org> wrote:
>> On Thu, Jun 15, 2017 at 3:55 AM, demerphq <demerphq@gmail.com> wrote:
>>> On 9 June 2017 at 11:17, Graham Knop <haarg@haarg.org> wrote:
>>>> The result of ($var ^ "") can tell you the
>>>> status of the internal flags for whether a value is a valid number, which
>>>> is needed for serialization.
>>>
>>> You mean it can tell you if something is a number that has never been
>>> stringified right?
>>>
>>> Can you explain this one a bit more?
>>
>> $number & "" -> 0
>> $string & "" -> ""
>>
>> The flags being checked are SVp_IOK or SVp_NOK. A number that has
>> been stringified will still register as a number based on this check.
>
> This is not always the case. These kinds of checks are inherently problematic.
>
> $ perl -MDevel::Peek -le'$s="0e1"; 0+$s; print $s & ""; Dump($s);'
> 0
> SV = PVNV(0x1710550) at 0x1731b28
> REFCNT = 1
> FLAGS = (IOK,NOK,POK,pIOK,pNOK,pPOK)
> IV = 0
> NV = 0
> PV = 0x1720d80 "0e1"\0
> CUR = 3
> LEN = 16
>
> $ perl -MDevel::Peek -le'$s=" 10 "; 0+$s; print $s & ""; Dump($s);'
> 0
> SV = PVIV(0x1da3f60) at 0x1da9b28
> REFCNT = 1
> FLAGS = (IOK,POK,pIOK,pPOK)
> IV = 10
> PV = 0x1d98d80 " 10 "\0
> CUR = 4
> LEN = 16
>
> $ perl -MDevel::Peek -le'$s="000"; 0+$s; print $s & ""; Dump($s);'
> 0
> SV = PVIV(0x17c2f60) at 0x17c8b28
> REFCNT = 1
> FLAGS = (IOK,POK,pIOK,pPOK)
> IV = 0
> PV = 0x17b7d80 "000"\0
> CUR = 3
> LEN = 16
>
> I am sorry, I very very very much sympathise with your desire to tell
> numbers from strings, but you simply can't reliably use our flags to
> do it. At least currently.
>
> Our current flags do NOT provide a way to track the origin type of a
> variable. It is that simple. We have discussed how we could change the
> meaning of the flags so we /could/ track the origin type, but we have
> not done so, and any code that tries to do so is inherently flawed.
>
> cheers,
> Yves
>
> Another example, I think that the output of these two could
> legitimately change if we were to optimise things:
>
> $ perl -MDevel::Peek -le'$s=1; $s.""; print $s & ""; Dump($s);'
> 0
> SV = PVIV(0x217bf50) at 0x2181b18
> REFCNT = 1
> FLAGS = (IOK,POK,pIOK,pPOK)
> IV = 1
> PV = 0x2170d70 "1"\0
> CUR = 1
> LEN = 16
>
>
> $ perl -MDevel::Peek -le'$s=1; $s.=""; print $s & ""; Dump($s);'
>
> SV = PVIV(0x222cf50) at 0x2232b18
> REFCNT = 1
> FLAGS = (POK,pPOK)
> IV = 1
> PV = 0x2221d70 "1"\0
> CUR = 1
> LEN = 16
>
The check actually in use does not rely on the flags alone. The full check is:
    no warnings 'numeric';
    if (length((my $dummy = '') & $value)
        && 0 + $value eq $value
        && $value * 0 == 0) {
        ...;
    }
The dummy variable is a workaround for perl 5.8, where a "" & $value
operation would set numeric flags on the literal "" string, so the check
would only work once.
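As an illustration only, here is a minimal runnable sketch that wraps the check above in a sub. The name looks_like_json_number and the sample values are hypothetical, not taken from any module. Reading the final "$value * 0 == 0" clause as an Inf/NaN filter is an assumption, since the message does not say what it is for. On perls where string bitwise ops on code points above 0xFF are fatal, passing a wide-character string to this sub may die, which is the issue this thread is about.

```perl
use strict;
use warnings;

# Hypothetical name; a sketch of the heuristic quoted above, not a CPAN API.
# Returns 1 if $value looks like a number that was created as a number, else 0.
sub looks_like_json_number {
    my ($value) = @_;    # the copy keeps the IOK/NOK/POK flags
    return 0 unless defined $value;
    no warnings 'numeric';
    my $is_number =
        # Fresh '' on every call (the perl 5.8 workaround). The & is numeric,
        # giving "0" (length 1), only if $value has pIOK or pNOK set;
        # otherwise it is a string & giving "" (length 0).
        length((my $dummy = '') & $value)
        # Strings that were merely used as numbers ("000", " 10 ", "0e1")
        # fail here: they do not survive numification unchanged.
        && 0 + $value eq $value
        # Assumed purpose: reject Inf and NaN (Inf * 0 is NaN, NaN != 0).
        && $value * 0 == 0;
    return $is_number ? 1 : 0;
}

# The strings from the Devel::Peek examples, each used once as a number.
my @numified = ("0e1", " 10 ", "000");
{
    no warnings 'numeric';
    for my $s (@numified) { my $n = 0 + $s }    # $s aliases each element
}

my $stringified = 1;
my $copy = "$stringified";    # $stringified now has both IOK and POK
my $appended = 1;
$appended .= "";              # $appended now has only POK

for my $v (42, "42", @numified, $stringified, $appended) {
    my $kind = looks_like_json_number($v) ? "number" : "string";
    print "[$v] $kind\n";
}
```

Under these assumptions, the round-trip clause is what rejects the "000", " 10 " and "0e1" cases that fool a flags-only test, while a number that was merely stringified is still reported as a number.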
This is obviously not perfect, because perl's string model doesn't
carry the original type. That's one of the reasons I haven't pushed
for an API for this, because I know it has inherent flaws.
But we don't have the luxury of ignoring the number/string distinction
in JSON. Some way of controlling the types is required. The
mechanism I've come up with covers most cases of unintentional type
changes, allows for intentional type changes, and will round trip
properly.
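A small illustration of why an encoder has to look at these flags at all: the core JSON::PP module (used here only as an example; the message does not name an encoder) emits the same value 1 as a JSON number or as a JSON string depending on how the scalar was created or last modified.

```perl
use strict;
use warnings;
use JSON::PP ();

my $json = JSON::PP->new;    # default options: compact output

my $num = 1;        # created as an integer
my $str = "1";      # created as a string
my $appended = 1;
$appended .= "";    # only POK is left set, as in the Devel::Peek dump above

print $json->encode([$num]), "\n";         # [1]
print $json->encode([$str]), "\n";         # ["1"]
print $json->encode([$appended]), "\n";    # ["1"]
```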
The alternative to checking these flags with the bitwise ops is to
check them with B. There is no alternate option that involves not
checking the flags.
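For comparison, a minimal sketch of the B route: it reads the same SVp_IOK and SVp_NOK flags through B::svref_2object. The helper name has_numeric_flags is hypothetical. It reproduces only the flag test, so, as Yves's "000" example shows, a string that has merely been used as a number still reports numeric flags; the extra round-trip clauses in the full check are what filter that case out.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper, not part of B: returns 1 if the scalar has the private
# integer or floating-point flag (SVp_IOK or SVp_NOK) set, else 0.
sub has_numeric_flags {
    # \$_[0] aliases the caller's scalar, so its own flags are inspected.
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    my $numeric = $flags & (B::SVp_IOK() | B::SVp_NOK());
    return $numeric ? 1 : 0;
}

my $numified = "000";
{ no warnings 'numeric'; my $n = 0 + $numified; }    # now carries pIOK

my $appended = 1;
$appended .= "";    # now POK only

for my $v (42, "42", $numified, $appended) {
    my $kind = has_numeric_flags($v) ? "numeric flags" : "no numeric flags";
    print "[$v] $kind\n";
}
```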
I plan to work around the original issue, so you are welcome to break
bitwise ops on wide characters if you really want to.