perl.perl5.porters | Postings from December 2010

Re: [perl #58182] Inconsistent and wrong handling of 8th bit set chars with no locale

December 2, 2010 06:34
On 23 September 2008 18:03, Dave Mitchell wrote:
> On Mon, Sep 22, 2008 at 09:55:23PM +0200, Juerd Waalboer wrote:
>> It's a bug. A known and old bug, but it must be fixed some time.
> Here's a general suggestion related to fixing Unicode-related issues.
> A well-known issue is that the SVf_UTF8 flag means two different things:
>    1) whether the 'sequence of integers' are stored one per byte, or use
>    the variable-length utf-8 encoding scheme;
>    2) what semantics apply to that sequence of integers.
> We also have various bodges, such as attaching magic to cache utf8
> indexes.
> All this stems from the fact that there's no space in an SV to store all
> the information we want. So....
> How about we remove the SVf_UTF8 flag from SvFLAGS and replace it with an
> Extended String flag. This flag indicates that prepended to the SvPVX
> string is an auxiliary structure (cf the hv_aux struct) that contains all the
> extra needed unicodish info, such as encoding, charset, locale, cached
> indexes etc etc. This would then both allow us to disambiguate the meaning of
> SVf_UTF8 (in the aux structure there would be two different flags for the
> two meanings) and provide room for future enhancements (eg
> space for a UTF32 flag should someone wish to implement that storage
> format).
> Just a thought...
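
To make the quoted proposal concrete, here is a rough C sketch of a header prepended to the string body, with the two meanings of SVf_UTF8 split into separate flags. It only illustrates the layout idea. Every name in it (xstr_aux, XSTR_STORAGE_UTF8, XSTR_UNICODE_RULES, xstr_new and so on) is hypothetical and does not exist in the perl source, and the field set is an assumption, not what an actual implementation would need.

```c
/* Hypothetical sketch only: none of these names exist in the perl source. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* The two meanings currently folded into SVf_UTF8, as independent flags. */
#define XSTR_STORAGE_UTF8  0x01u  /* body bytes use variable-length UTF-8 */
#define XSTR_UNICODE_RULES 0x02u  /* Unicode semantics apply to the code points */

/* Auxiliary header stored immediately before the string body. */
typedef struct {
    unsigned flags;        /* storage and semantics flags */
    size_t   byte_len;     /* length of the body in bytes */
    size_t   cached_char;  /* cached character index ... */
    size_t   cached_byte;  /* ... and the byte offset it corresponds to */
} xstr_aux;

/* Allocate header and body in one block; return a pointer to the body,
 * which is what a plain string pointer would see. */
char *xstr_new(const char *bytes, size_t len, unsigned flags)
{
    xstr_aux *aux = malloc(sizeof *aux + len + 1);
    if (aux == NULL)
        return NULL;
    aux->flags = flags;
    aux->byte_len = len;
    aux->cached_char = 0;
    aux->cached_byte = 0;

    char *body = (char *)(aux + 1);  /* body starts right after the header */
    memcpy(body, bytes, len);
    body[len] = '\0';
    return body;
}

/* Step back from the body pointer to recover the prepended header. */
xstr_aux *xstr_get_aux(char *body)
{
    assert(body != NULL);
    return (xstr_aux *)body - 1;
}

/* Free the whole block, header included. */
void xstr_free(char *body)
{
    if (body != NULL)
        free(xstr_get_aux(body));
}
```

With a layout like this, a string could be stored as UTF-8 without implying Unicode semantics, or the reverse, and code that only wants the bytes would still get an ordinary NUL-terminated pointer.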


perl -Mre=debug -e "/just|another|perl|hacker/"
