
Re: [perl #126310] no "Malformed UTF-8 character" warning on single-quoted strings under "use utf8"

From:
Karl Williamson
Date:
October 13, 2016 18:49
Subject:
Re: [perl #126310] no "Malformed UTF-8 character" warning on single-quoted strings under "use utf8"
Message ID:
67259b20-7f35-8c76-63a6-c53ff713e750@khwilliamson.com
On 09/16/2016 04:44 PM, Father Chrysostomos via RT wrote:
> On Fri Sep 16 13:34:55 2016, khw wrote:
>> On 09/16/2016 06:46 AM, Florian Schlichting wrote:
>>> Hi Karl,
>>>
>>> Father Chrysostomos wrote:
>>>> On Wed Aug 31 20:35:02 2016, khw wrote:
>>>>> Is the attached like what you mean?
>>>>
>>>> Yes, that would work.
>>>>
>>>> It would be nice, too, if we could add the `near such and such' that
>>>> yyerror normally does. Maybe yyerror could have an extra option to croak
>>>> instead of calling qerror. It already has a flags field.
>>>
>>> thanks for looking into this issue. I tested your patch and can confirm
>>> that it correctly treats single and double quotes the same:
>>>
>>> % ./perl -C0 -le 'print qq(print "\xB0C";)' | ./perl -I'lib'
>>> -Mutf8 -CS -l
>>> Malformed utf8 at - line 1.
>>>
>>> % ./perl -C0 -le 'print qq(print \x27\xB0C\x27;)' | ./perl -I'lib'
>>> -Mutf8 -CS -l
>>> Malformed utf8 at - line 1.
>>>
>>>
>>> However, I feel a little uneasy about dying altogether. Currently Perl
>>> issues just a warning ("Malformed UTF-8 character") and that seems to be
>>> the approach with UTF-8 issues encountered in other places in toke.c as
>>> well. Most of the time, these will be strings displayed to the user, and
>>> they will mostly still be legible even with a few characters garbled or
>>> skipped. Don't you think "complain and carry on" is what users would
>>> expect?
>>>
>>> Florian
>>>
>>>
>>
>> But we are running into segfaults because of trying to keep going in the
>> face of malformed UTF-8.  I'm thinking the lesson should be to give up
>> when we find it, and this is a reasonable place to start.  There are
>> places where malformed UTF-8 is fatal.
>
> I agree.  If perl keeps going, then even if it does not crash, it will die on those malformed strings later.
>

blead now has improved diagnostics for when malformations occur.  I am
thinking that these should be turned on unconditionally when this error
occurs, as we are going to immediately die anyway.  Any opposition?
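
For anyone following the thread without the test case at hand, here is a
minimal sketch of why the byte sequence B0 43 (the "\xB0C" in Florian's
one-liners) counts as malformed.  It is not taken from toke.c or utf8.c:
the function name first_malformed is hypothetical, and it applies the
strict Unicode rules, which are narrower than what perl itself accepts.

```c
#include <stddef.h>

/* Hypothetical helper, NOT perl's implementation.  Returns the offset of
 * the first byte that starts a malformed sequence under strict UTF-8
 * rules, or -1 if the whole buffer is well formed. */
long first_malformed(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need;                         /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */

        if (c < 0x80) {                      /* plain ASCII */
            i++;
            continue;
        } else if (c < 0xC2) {
            return (long)i;                  /* stray continuation byte, or
                                                overlong lead C0/C1 */
        } else if (c < 0xE0) {
            need = 1;
        } else if (c < 0xF0) {
            need = 2;
            if (c == 0xE0) lo = 0xA0;        /* reject overlong forms */
            if (c == 0xED) hi = 0x9F;        /* reject surrogates */
        } else if (c < 0xF5) {
            need = 3;
            if (c == 0xF0) lo = 0x90;        /* reject overlong forms */
            if (c == 0xF4) hi = 0x8F;        /* reject above U+10FFFF */
        } else {
            return (long)i;                  /* F5..FF never appear */
        }

        if (len - i <= need)                 /* sequence is truncated */
            return (long)i;
        if (s[i + 1] < lo || s[i + 1] > hi)
            return (long)i;
        for (size_t k = 2; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)   /* must look like 10xxxxxx */
                return (long)i;

        i += need + 1;
    }
    return -1;
}
```

Given the two bytes B0 43 this returns 0, because B0 is a continuation
byte with no lead byte before it; given C2 B0 43 (a correctly encoded
U+00B0 followed by "C") it returns -1.  A caller can then choose between
warning and carrying on or giving up at the reported offset, which is the
choice being debated above.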
