[perl #22261] Unrecognised BOM when reading a file larger than 1k with encoding(UTF-16)
From: Jeremy Devenport
Date: May 21, 2003 08:21
Subject: [perl #22261] Unrecognised BOM when reading a file larger than 1k with encoding(UTF-16)
Message ID: rt-22261-57895.19.9802527015977@bugs6.perl.org
# New Ticket Created by Jeremy Devenport
# Please include the string: [perl #22261]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt2/Ticket/Display.html?id=22261 >
This is a bug report for perl from jeremyd713@hotmail.com,
generated with the help of perlbug 1.34 running under perl v5.8.0.
-----------------------------------------------------------------
[Please enter your report here]
The following code fails with perl 5.8.0:
    # This will succeed until input.txt is >1k
    open IN, "<:raw:encoding(utf16)", "input.txt";
    while (<IN>) {
        # do nothing
    }
    close IN;
UTF-16:Unregognised BOM 4f00 at 23924.pl line 3, <IN> line 27.
The typo (Unregognised) is fixed in 5.8.x, but the error itself still occurs.
This bug makes it tricky to work with UTF-16 files (the predominant flavor
of Unicode on Windows).
Changing the :encoding from UTF-16 to UTF-16LE will make the error go away
but then the BOM will actually show up in the text.
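
The workaround described above can be sketched as follows. This is only an illustration, not a fix for the underlying bug: the helper names utf16_encoding_for and read_utf16_lines are hypothetical, and the sketch assumes the file really does start with a UTF-16 BOM. It looks at the first two bytes to choose an explicit byte order, reads the file with that encoding, and then removes the BOM character (U+FEFF) that shows up at the start of the text. Only the UTF-16LE case is confirmed by this report to avoid the error.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original report): pick an explicit
# UTF-16 byte order by looking at the first two bytes of the file.
sub utf16_encoding_for {
    my ($path) = @_;
    open(my $raw, '<:raw', $path) or die "Cannot open $path: $!";
    my $got = read($raw, my $bom, 2);
    close($raw);
    die "Cannot read a BOM from $path" unless defined $got && $got == 2;
    return 'UTF-16LE' if $bom eq "\xff\xfe";
    return 'UTF-16BE' if $bom eq "\xfe\xff";
    die "No UTF-16 BOM found in $path";
}

# Hypothetical helper: read all lines with the explicit encoding, then strip
# the BOM character that the explicit encodings leave in the decoded text.
sub read_utf16_lines {
    my ($path) = @_;
    my $enc = utf16_encoding_for($path);
    open(my $in, "<:raw:encoding($enc)", $path) or die "Cannot open $path: $!";
    my @lines = <$in>;
    close($in);
    $lines[0] =~ s/^\x{FEFF}// if @lines;
    return @lines;
}
```

A call such as my @lines = read_utf16_lines("input.txt"); would then replace the open/while loop in the failing example.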
It looks like none of the current tests catch this because none of them
store more than one buffer's worth of data in their test files. The test below
demonstrates the bug on my system (I'm not sure if it's written correctly for
big-endian or EBCDIC systems). Note that it only fails if $count is set to 512
or greater (making the file larger than 1k).
#!./perl -w
BEGIN {
    if ($ENV{'PERL_CORE'}){
        chdir 't';
        unshift @INC, '../lib';
    }
    unless (find PerlIO::Layer 'perlio') {
        print "1..0 # Skip: not perlio\n";
        exit 0;
    }
}
print "1..4\n";
my $utf16 = "utf16$$";
my $utf8 = "utf8$$";
my $count = 512;
# write a BOM and then $count UTF-16 'A' characters
if (open(UTF, ">$utf16")) {
    binmode(UTF, ":bytes");
    print UTF "\xff\xfe" . ("\x41\x00" x $count);
    close UTF or die "Could not close: $!";
}
{
    use Encode;
    open(my $i,'<:encoding(UTF-16)',$utf16);
    print "ok 1\n";
    open(my $o,'>:utf8',$utf8);
    print "ok 2\n";
    print $o readline($i);
    print "ok 3\n";
    close($o) or die "Could not close: $!";
    close($i);
}
if (open(UTF, "<$utf8")) {
    binmode(UTF, ":bytes");
    print "not " unless <UTF> eq 'A' x $count;
    print "ok 4\n";
    close UTF;
}
END {
    unlink($utf16, $utf8);
}
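
The 512 threshold follows from the layout of the test file: a 2-byte BOM plus 2 bytes per 'A'. The short sketch below just works through that arithmetic. The helper name utf16_test_file_size is hypothetical, and the 1k figure is taken from the observation in this report, not from reading the PerlIO source.

```perl
use strict;
use warnings;

# Size in bytes of the test file: a 2-byte BOM plus 2 bytes per 'A'.
sub utf16_test_file_size {
    my ($count) = @_;
    return 2 + 2 * $count;
}

for my $count (511, 512) {
    printf "count=%d size=%d bytes\n", $count, utf16_test_file_size($count);
}
# prints:
# count=511 size=1024 bytes
# count=512 size=1026 bytes
```

With $count at 511 the file is exactly 1024 bytes, so 512 is the first value that pushes it past 1k.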
[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=high
---
Site configuration information for perl v5.8.0:
Configured by jeremyd at Mon May 19 22:33:21 PDT 2003.
Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
Platform:
osname=openbsd, osvers=3.2, archname=OpenBSD.i386-openbsd
uname='openbsd badger.internal 3.2 badger#1 i386 '
config_args=''
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef
usemultiplicity=undef
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-fno-strict-aliasing -I/usr/local/include',
optimize='-O2',
cppflags='-fno-strict-aliasing -I/usr/local/include'
ccversion='', gccversion='2.95.3 20010125 (prerelease)',
gccosandvers='openbsd3.2'
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib
libs=-lgdbm -lm -lc -lutil
perllibs=-lm -lc -lutil
libc=/usr/lib/libc.so.28.5, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=define, ccdlflags=' '
cccdlflags='-DPIC -fPIC ', lddlflags='-shared -fPIC -L/usr/local/lib'
Locally applied patches:
---
@INC for perl v5.8.0:
/home/jeremyd/myperl/lib/5.8.0/OpenBSD.i386-openbsd
/home/jeremyd/myperl/lib/5.8.0
/home/jeremyd/myperl/lib/site_perl/5.8.0/OpenBSD.i386-openbsd
/home/jeremyd/myperl/lib/site_perl/5.8.0
/home/jeremyd/myperl/lib/site_perl
.
---
Environment for perl v5.8.0:
HOME=/home/jeremyd
LANG (unset)
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/jeremyd/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/games:.
PERL_BADLANG (unset)
SHELL=/usr/local/bin/bash