Dan Kogai wrote:
> Porters,
>
> In the recent discussion on various Perl-related mailing lists in
> Japanese, I have discovered that the encoding pragma does not work
> with multibyte encodings such as Shift_JIS, which use the 0x00-0x7f
> range in the 2nd byte. Though I have not tested it, I am pretty sure
> Big5 is also prone to this.
<sample script hex dump skipped>
> The perl script is a valid perl script in Shift_JIS, but the quoted
> character (U+80fd, \x94\x5c in Shift_JIS) uses \x5c in the 2nd byte,
> mangling the script. The encoding pragma needs to be parsable
> ASCII-wise.
> Fortunately, the encoding pragma offers a different approach via
> Filter=>1.
...

The attached patch (for bleadperl @18609) is an attempt to fix this
problem without the Filter=>1 option. It does the following:

- Modifies method_decode (Encode/Encode.xs) and do_encode
  (Encode/encengine.c) to take a terminator argument.
- Adds a method cat_decode to the Encoding object, which takes
  destination, source, offset and terminator as arguments.
  (Implemented in these packages: Encode::XS, Encode::utf8 and
  Encode::JP::JIS7.)
- Adds a function sv_cat_decode(), which uses the cat_decode method to
  append a decoded UTF-8 string, given an offset and a terminator.
- When scan_str() parses input with PL_encoding set, it uses
  sv_cat_decode() with PL_linestr + offset and the specified terminator.
- In fact, I started writing this patch in response to

    Subject: Re: [PATCH] [perl #16823] quote-operators don't work with utf8-delimiters
    Date: Sun, 1 Dec 2002 18:01:51 +0200
    From: Jarkko Hietaniemi <jhi@iki.fi>

  so parsing under `use utf8' is also changed in scan_str().

Though it is unrelated to the main intent, the patch also modifies
sv_recode_to_utf8() to:

- Change !DO_UTF8(sv) to !SvUTF8(sv) && !IN_BYTES
- Add save_re_context()
- Retract my useless code which checks UTF8_IS_INVARIANT

-- 
Inaba Hiroto <inaba@st.rim.or.jp>
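For readers who want to see the failure concretely, here is a minimal sketch in C of why a byte-wise search for the closing delimiter goes wrong on the Shift_JIS bytes \x94\x5c quoted above. It is not the code from the patch: the patch has the Encoding object decode up to the terminator (cat_decode), whereas this sketch uses a hard-coded lead-byte table. The function names find_term_bytewise, find_term_sjis and sjis_is_lead are hypothetical, and the lead-byte ranges (0x81-0x9F, 0xE0-0xFC) are an assumption covering the usual Shift_JIS/CP932 layout.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed Shift_JIS lead-byte ranges (hypothetical helper, not from the patch). */
static int sjis_is_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Byte-wise scan: treats every 0x5C as a backslash that escapes the
 * next byte. Returns the index of the first unescaped terminator,
 * or -1 if none is found. */
long find_term_bytewise(const unsigned char *buf, size_t len,
                        unsigned char term)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x5C) {   /* looks like a backslash */
            i++;                /* skip the "escaped" byte */
            continue;
        }
        if (buf[i] == term)
            return (long)i;
    }
    return -1;
}

/* Encoding-aware scan: a lead byte consumes its trail byte first, so a
 * 0x5C or a delimiter value in trail position is never misread. */
long find_term_sjis(const unsigned char *buf, size_t len,
                    unsigned char term)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if (sjis_is_lead(buf[i])) {
            i++;                /* skip the trail byte */
            continue;
        }
        if (buf[i] == 0x5C) {   /* a real backslash escape */
            i++;
            continue;
        }
        if (buf[i] == term)
            return (long)i;
    }
    return -1;
}
```

Given the bytes 0x94 0x5C 0x22 0x3B (the quoted character followed by `";`), the byte-wise scan treats 0x5C as an escape for the closing quote and runs past the end of the string, while the encoding-aware scan finds the quote at index 2. A table-driven check like this only covers one encoding, which is presumably why delegating to the Encoding object is the more general approach.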