perl.perl5.porters | Postings from December 2000

UTF8 flag and sv_utf8_upgrade

From: Nick Ing-Simmons
Date: December 12, 2000 04:30
Subject: UTF8 flag and sv_utf8_upgrade
Message ID: 200012121230.MAA11096@mikado.tiuk.ti.com
Nick Ing-Simmons <nik@tiuk.ti.com> writes:
>B. When I fix A in the Tk sources, it core dumps in a manner which suggests
>   that something has caused a heap overrun.

While debugging this, I discovered a "feature" of sv_utf8_upgrade:

void
Perl_sv_utf8_upgrade(pTHX_ register SV *sv)
{
    char *s, *t;
    bool hibit;

    if (!sv || !SvPOK(sv) || SvUTF8(sv))
	return;

    /* This function could be much more efficient if we had a FLAG in SVs
     * to signal if there are any hibit chars in the PV.
     */
    for (s = t = SvPVX(sv), hibit = FALSE; t < SvEND(sv) && !hibit; t++)
	if (*t & 0x80)
	    hibit = TRUE;

    if (hibit) {
	STRLEN len = SvCUR(sv) + 1; /* Plus the \0 */
	SvPVX(sv) = (char*)bytes_to_utf8((U8*)s, &len);
	SvCUR(sv) = len - 1;
	SvLEN(sv) = len; /* No longer know the real size. */
	SvUTF8_on(sv);
	Safefree(s); /* No longer using what was there before. */
    }
}
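
For readers unfamiliar with what bytes_to_utf8() does here: it treats each byte of the old buffer as one 8-bit character and re-encodes it as UTF8, so bytes below 0x80 are copied unchanged and each high-bit byte becomes two bytes. Below is a minimal standalone sketch of that scan-then-convert logic, assuming an ASCII platform and Latin-1 input. The names has_hibit() and latin1_to_utf8() are hypothetical and are not part of the perl API.

```c
#include <assert.h>
#include <stdlib.h>

/* Return 1 if any byte in s[0..len) has the high bit (0x80) set. */
static int has_hibit(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (s[i] & 0x80)
            return 1;
    return 0;
}

/* Hypothetical helper: re-encode len bytes of Latin-1 as UTF8 into a
 * freshly malloc'd, NUL-terminated buffer. *out_len receives the new
 * length excluding the NUL. Returns NULL if allocation fails. */
static unsigned char *latin1_to_utf8(const unsigned char *s, size_t len,
                                     size_t *out_len)
{
    unsigned char *buf, *d;

    assert(s != NULL && out_len != NULL);
    /* Worst case: every byte needs two bytes, plus one for the NUL. */
    buf = malloc(2 * len + 1);
    if (buf == NULL)
        return NULL;
    d = buf;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = s[i];
        if (c < 0x80) {
            *d++ = c;                                  /* ASCII: copied as-is */
        } else {
            *d++ = (unsigned char)(0xC0 | (c >> 6));   /* lead byte */
            *d++ = (unsigned char)(0x80 | (c & 0x3F)); /* continuation byte */
        }
    }
    *d = '\0';
    *out_len = (size_t)(d - buf);
    return buf;
}
```

For example, the Latin-1 byte 0xE9 ("é") becomes the two-byte sequence 0xC3 0xA9, while ASCII bytes pass through untouched. That is why a buffer with no high-bit bytes is already valid UTF8 as it stands.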

Tk wants UTF8, so every time it sees an SV without SvUTF8 set, it calls
sv_utf8_upgrade(). The code above then scans the string, sees that it is
all ASCII, and returns without turning on SvUTF8. A few C statements later
we do it again, and again, and ...
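
The repeated cost can be modelled outside perl. The sketch below uses a hypothetical struct, toy_sv, with a utf8 flag standing in for SvUTF8 and a counter of how often the buffer was scanned. toy_upgrade() follows the control flow of the function quoted above (minus the SvPOK check and the actual re-encoding), including the fact that an all-ASCII string leaves the flag off. It is an illustration under those assumptions, not perl code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an SV: a byte buffer plus a UTF8 flag. */
struct toy_sv {
    const unsigned char *pv;
    size_t cur;
    int utf8;       /* plays the role of SvUTF8 */
    unsigned scans; /* how many times the buffer has been scanned */
};

/* Follows the quoted sv_utf8_upgrade(): return early if the flag is set,
 * otherwise scan for high-bit bytes. Like the original, it sets the flag
 * only when a high-bit byte was found. */
static void toy_upgrade(struct toy_sv *sv)
{
    int hibit = 0;

    assert(sv != NULL);
    if (sv->utf8)
        return;
    sv->scans++;
    for (size_t i = 0; i < sv->cur && !hibit; i++)
        if (sv->pv[i] & 0x80)
            hibit = 1;
    if (hibit)
        sv->utf8 = 1; /* re-encoding of the buffer omitted in this sketch */
}
```

Calling toy_upgrade() three times on "hello" scans the buffer three times and leaves utf8 at 0, whereas a buffer containing 0xE9 is scanned once and every later call returns immediately.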

Is there any reason NOT to turn on SvUTF8 once we have established
that the string is valid UTF8, even if only because it has no high-bit chars?
Should perl do this, or should Tk?
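
On the perl side, the proposal would amount to setting the flag in both branches, since a buffer with no high-bit bytes is byte-for-byte identical to its UTF8 encoding. The sketch below applies that change to the same kind of hypothetical toy_sv model; it only illustrates the proposal and is not perl's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an SV: a byte buffer plus a UTF8 flag. */
struct toy_sv {
    const unsigned char *pv;
    size_t cur;
    int utf8;       /* plays the role of SvUTF8 */
    unsigned scans; /* how many times the buffer has been scanned */
};

/* Proposed variant: after the scan, the buffer is valid UTF8 either way
 * (pure ASCII already is; high-bit bytes would be re-encoded), so the
 * flag is set unconditionally and later calls return at once. */
static void toy_upgrade_always_flag(struct toy_sv *sv)
{
    int hibit = 0;

    assert(sv != NULL);
    if (sv->utf8)
        return;
    sv->scans++;
    for (size_t i = 0; i < sv->cur && !hibit; i++)
        if (sv->pv[i] & 0x80)
            hibit = 1;
    if (hibit) {
        /* re-encode the buffer here (omitted in this sketch) */
    }
    sv->utf8 = 1;
}
```

With this variant, three calls on "hello" scan the buffer only once. Whether marking pure-ASCII strings as UTF8 has costs elsewhere in perl is exactly the question posed above; the sketch only shows the saving in repeated scans.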



-- 
Nick Ing-Simmons <nik@tiuk.ti.com>
Via, but not speaking for: Texas Instruments Ltd.



