On 23 September 2008 18:03, Dave Mitchell <davem@iabyn.com> wrote:
> On Mon, Sep 22, 2008 at 09:55:23PM +0200, Juerd Waalboer wrote:
>> It's a bug. A known and old bug, but it must be fixed some time.
>
> Here's a general suggestion related to fixing Unicode-related issues.
>
> A well-known issue is that the SVf_UTF8 flag means two different things:
>
> 1) whether the 'sequence of integers' are stored one per byte, or use
> the variable-length utf-8 encoding scheme;
>
> 2) what semantics apply to that sequence of integers.
>
> We also have various bodges, such as attaching magic to cache utf8
> indexes.
>
> All this stems from the fact that there's no space in an SV to store all
> the information we want. So....
>
> How about we remove the SVf_UTF8 flag from SvFLAGS and replace it with an
> Extended String flag. This flag indicates that prepended to the SvPVX
> string is an auxiliary structure (cf the hv_aux struct) that contains all the
> extra needed unicodish info, such as encoding, charset, locale, cached
> indexes etc etc. This then both allows us to disambiguate the meaning of
> SVf_UTF8 (in the aux structure there would be two different flags for the
> two meanings), but would also provide room for future enhancements (eg
> space for a UTF32 flag should someone wish to implement that storage
> format).
>
> Just a thought...

++

yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
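
For illustration only, here is a rough standalone C sketch of the layout Dave describes: an auxiliary header stored immediately in front of the string bytes, with two separate flags for the two meanings currently folded into SVf_UTF8. Every name in it (xstr_aux, XSTR_STORAGE_UTF8, XSTR_SEMANTICS_UNI, xstr_new and friends) is made up for this sketch and does not exist in the perl source; it uses plain malloc rather than perl's allocator and ignores SVs entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flags: one per meaning that SVf_UTF8 carries today. */
enum {
    XSTR_STORAGE_UTF8  = 1u << 0, /* 1) bytes hold variable-length utf-8 */
    XSTR_SEMANTICS_UNI = 1u << 1  /* 2) Unicode semantics apply */
};

/* Hypothetical auxiliary header, stored directly before the string bytes. */
typedef struct xstr_aux {
    uint32_t flags;             /* storage and semantics flags */
    size_t   cached_char_index; /* cached character offset ... */
    size_t   cached_byte_index; /* ... and the byte offset it maps to */
} xstr_aux;

/* Allocate header + string in one block; return a pointer to the string
 * bytes, so callers that only want the bytes never see the header. */
char *xstr_new(const char *bytes, size_t len, uint32_t flags)
{
    char *base = malloc(sizeof(xstr_aux) + len + 1);
    if (base == NULL)
        return NULL;

    xstr_aux *aux = (xstr_aux *)base;
    aux->flags = flags;
    aux->cached_char_index = 0;
    aux->cached_byte_index = 0;

    char *pv = base + sizeof(xstr_aux);
    memcpy(pv, bytes, len);
    pv[len] = '\0';
    return pv;
}

/* Step back from the string pointer to the header in front of it. */
xstr_aux *xstr_get_aux(char *pv)
{
    assert(pv != NULL);
    return (xstr_aux *)(pv - sizeof(xstr_aux));
}

/* Free the whole block, header included. */
void xstr_free(char *pv)
{
    if (pv != NULL)
        free(pv - sizeof(xstr_aux));
}
```

With something like this, a string stored one integer per byte but wanting Unicode semantics would be created with only XSTR_SEMANTICS_UNI set, a combination the single SVf_UTF8 bit cannot express. A real implementation would also need the Extended String flag in SvFLAGS to say whether the header is present at all.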