On 09/16/2016 06:46 AM, Florian Schlichting wrote:
> Hi Karl,
>
> Father Chrysostomos wrote:
>> On Wed Aug 31 20:35:02 2016, khw wrote:
>>> Is the attached like what you mean?
>>
>> Yes, that would work.
>>
>> It would be nice, too, if we could add the `near such and such' that
>> yyerror normally does. Maybe yyerror could have an extra option to croak
>> instead of calling qerror. It already has a flags field.
>
> thanks for looking into this issue. I tested your patch and can confirm
> that it correctly treats single and double quotes the same:
>
> % ./perl -C0 -le 'print qq(print "\xB0C";)' | ./perl -I'lib' -Mutf8 -CS -l
> Malformed utf8 at - line 1.
>
> % ./perl -C0 -le 'print qq(print \x27\xB0C\x27;)' | ./perl -I'lib' -Mutf8 -CS -l
> Malformed utf8 at - line 1.
>
>
> However, I feel a little uneasy about dying altogether. Currently Perl
> issues just a warning ("Malformed UTF-8 character") and that seems to be
> the approach with UTF-8 issues encountered in other places in toke.c as
> well. Most of the time, these will be strings displayed to the user, and
> they will mostly still be legible even with a few characters garbled or
> skipped. Don't you think "complain and carry on" is what users would
> expect?
>
> Florian
>

But we are running into segfaults because of trying to keep going in the
face of malformed UTF-8. I'm thinking the lesson should be to give up
when we find it, and this is a reasonable place to start. There are
places where malformed UTF-8 is fatal.
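
To make the "give up when we find it" idea concrete, here is a minimal C sketch. It is not taken from toke.c or from the patch under discussion, and the function names are hypothetical. It illustrates one general way that carrying on past malformed UTF-8 can go wrong: a lead byte may claim more bytes than remain in the buffer, so a scanner that trusts the claimed length can step past the end. The sketch instead reports the offset of the first malformed sequence so the caller can stop there. It is a simplified check (it does not reject overlong 3- and 4-byte forms or surrogates) and it assumes standard UTF-8 rather than Perl's extended encoding.

```c
#include <stddef.h>

/* Hypothetical helper: how many bytes a sequence claims to occupy,
 * judged only from its lead byte. Returns 0 when the byte cannot
 * start a sequence (for example a stray continuation byte like 0xB0). */
static size_t utf8_claimed_len(unsigned char lead)
{
    if (lead < 0x80)                  return 1;  /* ASCII */
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

/* Hypothetical scanner: returns the offset of the first malformed
 * sequence, or len when the whole buffer passes this simplified check.
 * It never reads at or beyond buf + len. */
size_t utf8_first_malformed(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        size_t n = utf8_claimed_len(buf[i]);

        /* Stop instead of trusting a length that overruns the buffer. */
        if (n == 0 || n > len - i)
            return i;

        /* Every byte after the lead must look like 10xxxxxx. */
        for (size_t k = 1; k < n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return i;
        }
        i += n;
    }
    return len;
}
```

With the two bytes from the test case above (0xB0 followed by `C`), this returns offset 0, because 0xB0 is a continuation byte with no lead byte before it. A caller following the fatal approach would report the error at that point rather than skipping the byte and continuing.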