perl.perl5.porters | Postings from March 2003

Re: [perl #21395] rcatline doesn't grok utf8

Nicholas Clark
March 1, 2003 10:24
On Sat, Mar 01, 2003 at 05:45:19AM -0000, Adrian Enache wrote:
> On Sat, Mar 01, 2003 at 11:21:34AM +0900, Inaba Hiroto wrote:
> > Enache Adrian wrote:
> > > Sorry for the hasty patch. I just skipped the scalar-utf8/file-utf8 case.
> s/scalar-utf8/scalar-non_utf8/

> The only thing I see is: create a mortal SV, do all the sv_gets() job
> on it as if it were the actual SV to append to, sv_utf8_upgrade() it and
> then concatenate the real & the mortal SVs. That's horrible :-(

I think that there are 4 basic combinations (obviously)

                        file handle
                    non-utf8    utf8
scalar  non-utf8       1          2
        utf8           3          4

but these can be subdivided depending on whether the scalar contains
characters in the range 128-255 or 256+.
E.g. for maximum efficiency, if the file handle is non-utf8 but the scalar is
marked as utf8, it might be worthwhile to check whether the scalar can be
downgraded to 8 bit, because that saves having to upgrade all the
incoming data. However, there are slight internal differences between the
same data stored as 8 bit and as utf8 (specifically whether accented characters
count as letters in regexps), so it may not be a good idea to do this.
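
To make the downgrade check concrete, here is a rough standalone sketch in C.
It does not use the Perl API (the core has its own sv_utf8_upgrade() and
sv_utf8_downgrade()); the names can_downgrade(), downgrade() and upgrade() are
made up for illustration, and the input is assumed to be well-formed UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if the UTF-8 buffer holds only code points
 * 0..255, so it could be stored as one byte per character. */
int can_downgrade(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (s[i] < 0x80)
            continue;                         /* ASCII, 1 byte either way */
        if (s[i] == 0xC2 || s[i] == 0xC3) {   /* lead byte of U+0080..U+00FF */
            if (i + 1 >= len || (s[i + 1] & 0xC0) != 0x80)
                return 0;                     /* truncated or malformed */
            i++;                              /* skip the continuation byte */
            continue;
        }
        return 0;                             /* code point is 256 or above */
    }
    return 1;
}

/* Hypothetical helper: convert UTF-8 to 8 bit. Caller must have checked
 * can_downgrade() first; out needs room for len bytes. Returns new length. */
size_t downgrade(const unsigned char *s, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            out[n++] = s[i];
        } else {
            /* two-byte sequence: 110000xx 10yyyyyy -> xxyyyyyy */
            out[n++] = (unsigned char)(((s[i] & 0x03) << 6) | (s[i + 1] & 0x3F));
            i++;
        }
    }
    return n;
}

/* Hypothetical helper: convert 8 bit data to UTF-8. out needs room for
 * 2 * len bytes. Returns new length. */
size_t upgrade(const unsigned char *s, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            out[n++] = s[i];
        } else {
            out[n++] = (unsigned char)(0xC0 | (s[i] >> 6));
            out[n++] = (unsigned char)(0x80 | (s[i] & 0x3F));
        }
    }
    return n;
}
```

The check is a single pass over the scalar, whereas upgrading has to touch
every incoming line, which is where the possible saving would come from.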

Whatever we do, remember that 5.6.x and earlier didn't even use rcatline, so
they always read into a new scalar then appended.
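
For reference, the read-then-append approach over the four combinations in
the table above could be sketched roughly like this. It is a standalone
illustration, not the code in the core: the Scalar struct, append_line() and
latin1_to_utf8() are hypothetical names, and the line is assumed to have been
read into a temporary buffer already.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Perl scalar: bytes plus a "this is utf8" flag. */
typedef struct {
    unsigned char *buf;
    size_t len;
    int utf8;
} Scalar;

/* Encode len 8 bit bytes as UTF-8 into a freshly malloc'd buffer. */
static unsigned char *latin1_to_utf8(const unsigned char *s, size_t len,
                                     size_t *out_len)
{
    unsigned char *out = malloc(2 * len + 1);
    size_t n = 0;
    if (!out)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            out[n++] = s[i];
        } else {
            out[n++] = (unsigned char)(0xC0 | (s[i] >> 6));
            out[n++] = (unsigned char)(0x80 | (s[i] & 0x3F));
        }
    }
    *out_len = n;
    return out;
}

/* Append a line to sv. Returns 0 on success, -1 if allocation fails. */
int append_line(Scalar *sv, const unsigned char *line, size_t n, int line_utf8)
{
    unsigned char *tmp = NULL;
    size_t ulen;

    if (!sv->utf8 && line_utf8) {
        /* case 2: scalar is 8 bit, file is utf8, so upgrade the scalar */
        unsigned char *u = latin1_to_utf8(sv->buf, sv->len, &ulen);
        if (!u)
            return -1;
        free(sv->buf);
        sv->buf = u;
        sv->len = ulen;
        sv->utf8 = 1;
    } else if (sv->utf8 && !line_utf8) {
        /* case 3: scalar is utf8, file is 8 bit, so upgrade the new data */
        tmp = latin1_to_utf8(line, n, &ulen);
        if (!tmp)
            return -1;
        line = tmp;
        n = ulen;
    }

    /* cases 1 and 4, and cases 2 and 3 after conversion: plain byte append */
    unsigned char *grown = realloc(sv->buf, sv->len + n + 1);
    if (!grown) {
        free(tmp);
        return -1;
    }
    memcpy(grown + sv->len, line, n);
    sv->buf = grown;
    sv->len += n;
    free(tmp);
    return 0;
}
```

Case 3 is the one where a downgrade check on the scalar could avoid
converting every incoming line, subject to the regexp caveat above.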

Nicholas Clark
