develooper Front page | perl.perl5.porters | Postings from June 2013

Re: [perl #113824] Regexp error messages are not UTF8-clean

From: demerphq
Date: June 18, 2013 10:11
Subject: Re: [perl #113824] Regexp error messages are not UTF8-clean
Message ID: CANgJU+X7eL1njtVi2j2i2sGQkSb9=OK-R1bZKW0tkG7xs8e0pQ@mail.gmail.com
On 18 June 2013 12:06, Leon Timmermans <fawaka@gmail.com> wrote:
> On Tue, Jun 18, 2013 at 11:53 AM, demerphq <demerphq@gmail.com> wrote:
>> This is a Perl API fail. I do not see how it can be fixed without
>> grievous trauma. Apparently much of our internal error message
>> handling code is not UTF8 safe.
>>
>> See the code for vFAIL() in regcomp.c which calls Perl_croak() which
>> calls vcroak().
>>
>> The interface for Perl_croak() and friends does not support UTF8 at all.
>> They accept only a char* pointer, and have no facility for a UTF8
>> flag.
>>
>> We could fix the direct problem by rewriting all the code in the regex
>> engine which uses UTF8, but imo that is just a bandage. The real
>> problem is that our core APIs were never modernized to work properly with
>> Unicode.
>>
>> IMO, this ticket should be closed as a "won't fix", or merged with a
>> ticket which relates to our internal error reporting APIs lacking
>> proper Unicode support and fixed as part of resolving THAT ticket.
>>
>> Also IMO, if we want to really fix this stuff we should just bite the
>> bullet, deprecate ALL of the char * only internal APIs and switch to
>> something that ALWAYS includes a utf8 flag. Across the board.
>
> You can croak an SV actually. It's ugly (setting $@ to the error and
> then croaking NULL), but possible.

But that is a bandage. This is a pervasive problem, and IMO it should be
addressed systematically, not on a per-incident basis.

cheers
Yves


--
perl -Mre=debug -e "/just|another|perl|hacker/"



nntp.perl.org: Perl Programming lists via nntp and http.