Re: Another Unicode s/// buglet?

Jarkko Hietaniemi
June 26, 2002 09:57
On Wed, Jun 26, 2002 at 05:43:07PM +0100, Hugo van der Sanden wrote:
> SADAHIRO Tomoyuki wrote:
> :With Perl 5.8.0 RC2 (or plus Change 17353),
> :there is something strange.
> :
> :In $unicode =~ s/$regex/$bytes/,
> :$bytes is not upgraded,
> :and a malformed Unicode string is generated.
> :
> :$unicode =~ s/$regex/$bytes/e is ok, though.
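
[A minimal sketch of the case reported above, for illustration only. The
strings and the pattern are invented placeholders, not taken from the
original report. On a perl that includes the fix, all three checks should
report "ok"; the report says 5.8.0 RC2 produced a malformed string in the
plain s/// case.]

```perl
use strict;
use warnings;

# Target with a character above 0xFF, so perl stores it as UTF-8 internally.
my $unicode = "\x{263A}-placeholder";
# Replacement with one Latin-1 character, stored as a single byte (not upgraded).
my $bytes = "\xE9";

# Plain substitution: the replacement must be upgraded to match the target.
(my $plain = $unicode) =~ s/placeholder/$bytes/;
# Same substitution with /e, which the report says already worked.
(my $evald = $unicode) =~ s/placeholder/$bytes/e;

my $expected = "\x{263A}-\x{E9}";
print "plain: ", ($plain eq $expected   ? "ok" : "not ok"), "\n";
print "/e:    ", ($evald eq $expected   ? "ok" : "not ok"), "\n";
print "valid: ", (utf8::valid($plain)   ? "ok" : "not ok"), "\n";
```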
> As far as I can tell, this is missing code rather than buggy code:
> coping with a non-utf8 replacement string does not seem to have
> been catered for in this class of cases.
> Attached patch passes all existing tests here, as well as some new ones.

Patches passing over the Atlantic... I already patched this with
#17358 (and plugging a leak with #17362).  But I gladly took your new
tests :-)

> Due to the current RC status, I've taken the simplest approach I could
> see, but there may be higher performance alternatives: the upgrade is
> done regardless of whether the replacement string is ever needed, and
> since it is not done in place, the upgrade will be repeated each time
> it is needed. That means if you expect to perform the same substitution
> on many utf8 strings, it would probably be faster if you ensure that
> the replacement string is utf8.
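
[A sketch of the workaround suggested in the paragraph above: upgrade the
replacement string once, up front, so that s/// has no conversion left to
do on each use. The data and pattern are made up for illustration, and
whether this is measurably faster depends on the perl in use.]

```perl
use strict;
use warnings;

my $replacement = "\xE9";      # byte string, not UTF-8 internally
utf8::upgrade($replacement);   # converted in place, once; same characters

# Several UTF-8 targets that all receive the same replacement.
my @targets = ("\x{263A} one", "\x{263A} two", "\x{263A} three");

# s/// modifies each element in place through the loop alias.
s/\s\w+\z/$replacement/ for @targets;

print utf8::is_utf8($replacement) ? "replacement is utf8\n"
                                  : "replacement is bytes\n";
```

utf8::upgrade() changes only the internal representation; the string
still compares equal to the original "\xE9".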

> +	    SV* sv = sv_newmortal();
> +	    SvSetMagicSV(sv, dstr);

Hmmm, I don't do anything special with magic.

> +	    sv_utf8_upgrade(sv);
> +	    c = SvPV(sv, clen);
> +	    doutf8 = TRUE;
> +	}

$jhi++; #
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
