perl.perl5.porters | Postings from October 2016

Re: [perl #126310] no "Malformed UTF-8 character" warning on single-quoted strings under "use utf8"

Karl Williamson
October 13, 2016 18:49
On 09/16/2016 04:44 PM, Father Chrysostomos via RT wrote:
> On Fri Sep 16 13:34:55 2016, khw wrote:
>> On 09/16/2016 06:46 AM, Florian Schlichting wrote:
>>> Hi Karl,
>>> Father Chrysostomos wrote:
>>>> On Wed Aug 31 20:35:02 2016, khw wrote:
>>>>> Is the attached like what you mean?
>>>> Yes, that would work.
>>>> It would be nice, too, if we could add the `near such and such' that
>>>> yyerror normally does. Maybe yyerror could have an extra option to
>>>> croak instead of calling qerror. It already has a flags field.
>>> thanks for looking into this issue. I tested your patch and can
>>> confirm that it correctly treats single and double quotes the same:
>>> % ./perl -C0 -le 'print qq(print "\xB0C";)' | ./perl -I'lib' -Mutf8 -CS -l
>>> Malformed utf8 at - line 1.
>>> % ./perl -C0 -le 'print qq(print \x27\xB0C\x27;)' | ./perl -I'lib' -Mutf8 -CS -l
>>> Malformed utf8 at - line 1.
>>> However, I feel a little uneasy about dying altogether. Currently Perl
>>> issues just a warning ("Malformed UTF-8 character") and that seems to be
>>> the approach with UTF-8 issues encountered in other places in toke.c as
>>> well. Most of the time, these will be strings displayed to the user, and
>>> they will mostly still be legible even with a few characters garbled or
>>> skipped. Don't you think "complain and carry on" is what users would
>>> expect?
>>> Florian
>> But we are running into segfaults because of trying to keep going in the
>> face of malformed UTF-8.  I'm thinking the lesson should be to give up
>> when we find it, and this is a reasonable place to start.  There are
>> places where malformed UTF-8 is fatal.
> I agree.  If perl keeps going, then even if it does not crash, it will die on those malformed strings later.

blead now has improved diagnostics for when malformations occur.  I am
thinking that these should be turned on unconditionally when this error
occurs, as we are going to immediately die anyway.  Any opposition?
