[ID 20010628.002] uc (and lc) of same character differs if it is utf8 encoded
From: Nicholas Clark
Date: June 28, 2001 06:11
Subject: [ID 20010628.002] uc (and lc) of same character differs if it is utf8 encoded
Message ID: 20010628141137.P59620@plum.flirble.org
This is a bug report for perl from nick@talking.bollo.cx,
generated with the help of perlbug 1.33 running under perl v5.7.1.
-----------------------------------------------------------------
[Please enter your report here]
I'm assuming it's a bug that uc() for accented characters in the range 196-255
differs depending on whether they happen to be UTF8 encoded or not.
I shouldn't be able to detect the internal state of UTF8 encoding in any
way from a perl script, should I?
The difference is certainly present in 5.6.1, and I assume it is in everything
after 5.005.
Is this the suggested way to supply a "test case" with bug reports?
On 5.6.1 and bleadperl the following gives ok, not ok.
(i.e. perl reports that the first two scalars are equal, yet uc() gives different
results)
5.005_03 reports ok, ok; but uc doesn't change either lower case character,
as 5.005_03 isn't assuming that they are e acutes.
I would expect that a Unicode-aware perl should give ok, ok, but I'm not sure how
this is reconciled with the desire to have uc() give the same backwards
compatible result as 5.005_03.
#!/usr/local/bin/perl -w
{
  my ($e_accute_utf) = my ($e_accute) = chr 0xE9;
  $e_accute_utf .= chr 300;
  chop $e_accute_utf;
  my $E_accute = uc $e_accute;
  my $E_accute_utf = uc $e_accute_utf;
  if ($e_accute_utf eq $e_accute) {
    print "ok\n";
  } else {
    print "not ok # '$e_accute_utf' ne '$e_accute'\n";
  }
  if ($E_accute_utf eq $E_accute) {
    print "ok # '$E_accute_utf' eq '$E_accute'\n";
  } else {
    print "not ok # '$E_accute_utf' ne '$E_accute'\n";
  }
}
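
The same comparison can be sketched without the append-and-chop trick. This is only an illustration, and it assumes a perl of 5.8.1 or later, where utf8::upgrade() and utf8::is_utf8() are available; neither function is used in the script above, and describe() is a hypothetical helper name. It prints the internal encoding flag next to the code points before and after uc(), so that any difference between the two forms is visible as numbers rather than as raw bytes on the terminal.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: perl 5.8.1 or later, so that utf8::upgrade() and
# utf8::is_utf8() exist. describe() is a hypothetical helper.
sub describe {
    my ($label, $str) = @_;
    # Report the internal UTF-8 flag, the code point of the first
    # character, and the code point of the first character after uc().
    return sprintf "%-8s utf8=%d ord=0x%02X uc=0x%02X",
        $label, (utf8::is_utf8($str) ? 1 : 0), ord($str), ord(uc $str);
}

my $bytes    = chr 0xE9;     # e acute, stored as a single byte
my $upgraded = chr 0xE9;
utf8::upgrade($upgraded);    # same character, now stored UTF-8 encoded

print describe('bytes',    $bytes),    "\n";
print describe('upgraded', $upgraded), "\n";
print +($bytes eq $upgraded ? "ok" : "not ok"), " # the two scalars compare equal\n";
```

If the two uc= values differ while the last line still prints ok, the script is observing the behaviour described in this report.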
[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=medium
---
Site configuration information for perl v5.7.1:
Configured by nclark at Thu Jun 28 09:57:50 BST 2001.
Summary of my perl5 (revision 5.0 version 7 subversion 17) configuration:
Platform:
osname=linux, osvers=2.2.19pre17, archname=i686-linux
uname='linux nclark 2.2.19pre17 #2 wed may 2 13:59:30 gmt 2001 i686 unknown '
config_args='-Dusedevel -Dcf_email=nick@talking.bollo.cx -Ubincompat5005 -Uinc_version_list -Uversiononly -Uuselongdouble -Uuse64bitint -de -Dcc=gcc-3.0'
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
Compiler:
cc='gcc-3.0', ccflags ='-Wall -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2',
cppflags='-Wall -fno-strict-aliasing -I/usr/local/include'
ccversion='', gccversion='3.0 20010402 (Debian prerelease)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=4, usemymalloc=n, prototype=define
Linker and Libraries:
ld='gcc-3.0', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lnsl -lgdbm -ldbm -ldb -ldl -lm -lc -lcrypt -lutil
perllibs=-lnsl -ldl -lm -lc -lcrypt -lutil
libc=/lib/libc-2.2.3.so, so=so, useshrplib=false, libperl=libperl.a
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'
Locally applied patches:
DEVEL10995
---
@INC for perl v5.7.1:
/usr/local/lib/perl5/5.7.1/i686-linux
/usr/local/lib/perl5/5.7.1
/usr/local/lib/perl5/site_perl/5.7.1/i686-linux
/usr/local/lib/perl5/site_perl/5.7.1
/usr/local/lib/perl5/site_perl
.
---
Environment for perl v5.7.1:
HOME=/home/nclark
LANG=C
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/nclark/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11/bin:/usr/bin/X11:/usr/contrib/bin:/usr/games:/usr/sbin:/usr/ucb:/sbin:/usr/etc:/data3/src/emacs/bin/i386-unknown-bsdi2.1/
PERL_BADLANG (unset)
SHELL=/bin/bash