[ID 20010628.002] uc (and lc) of same character differs if it is utf8 encoded

Nicholas Clark
June 28, 2001 06:11
Message ID:
This is a bug report for perl from,
generated with the help of perlbug 1.33 running under perl v5.7.1.

[Please enter your report here]

I'm assuming it's a bug that uc() for accented characters in the range 196-255
differs depending on whether they happen to be UTF8 encoded or not.
I shouldn't be able to detect the internal state of UTF8 encoding in any
way from a perl script, should I?

The difference is certainly present in 5.6.1, and I assume it is in everything
after 5.005.

Is this the suggested way to supply a "test case" with bug reports?

On 5.6.1 and bleadperl the following gives ok, not ok
(i.e. perl reports that the first two scalars are equal, yet uc() gives different
results for them).

5.005_03 reports ok, ok; but uc doesn't change either lower case character,
as 5.005_03 isn't assuming that they are e acutes.

I would expect that Unicode aware perl should give ok,ok, but I'm not sure how
this is reconciled with the desire to have uc() give the same backwards
compatible result as 5.005_03.

#!/usr/local/bin/perl -w

  my ($e_accute_utf) = my ($e_accute) = chr 0xE9;
  $e_accute_utf .= chr 300;
  chop $e_accute_utf;
  my $E_accute = uc $e_accute;
  my $E_accute_utf = uc $e_accute_utf;

  # Test 1: the two lower case scalars should compare equal.
  if ($e_accute_utf eq $e_accute) {
    print "ok\n";
  } else {
    print "not ok # '$e_accute_utf' ne '$e_accute'\n";
  }
  # Test 2: their upper case forms should compare equal too.
  if ($E_accute_utf eq $E_accute) {
    print "ok # '$E_accute_utf' eq '$E_accute'\n";
  } else {
    print "not ok # '$E_accute_utf' ne '$E_accute'\n";
  }
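
The script above forces the UTF8 encoding indirectly, by appending chr 300 and
chopping it off again. As a rough sketch of the same comparison with the
internal state made explicit, the following assumes a perl that provides
utf8::upgrade and utf8::is_utf8 (neither is used in the report above, so treat
them as an assumption about the perl being tested). The names $plain and
$upgraded are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two copies of the same one-character string, e acute (code point 0xE9).
my $plain    = chr 0xE9;
my $upgraded = $plain;

# Assumption: utf8::upgrade is available. It switches the internal
# representation to UTF-8 without changing the string's value.
utf8::upgrade($upgraded);

# Report the internal encoding of each copy.
printf "internally UTF-8: plain=%d upgraded=%d\n",
    (utf8::is_utf8($plain) ? 1 : 0), (utf8::is_utf8($upgraded) ? 1 : 0);

# The two scalars should compare equal regardless of representation.
printf "strings equal:    %s\n", ($plain eq $upgraded ? "yes" : "no");

# If the two code points printed here differ, uc() is sensitive to the
# internal encoding, which is the behaviour this report describes.
printf "uc() code points: plain=0x%02X upgraded=0x%02X\n",
    ord(uc($plain)), ord(uc($upgraded));
```

Printing the code points rather than the characters keeps the output readable
whatever encoding the terminal uses.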

[Please do not change anything below this line]
Site configuration information for perl v5.7.1:

Configured by nclark at Thu Jun 28 09:57:50 BST 2001.

Summary of my perl5 (revision 5.0 version 7 subversion 1) configuration:
    osname=linux, osvers=2.2.19pre17, archname=i686-linux
    uname='linux nclark 2.2.19pre17 #2 wed may 2 13:59:30 gmt 2001 i686 unknown '
    config_args='-Dusedevel -Ubincompat5005 -Uinc_version_list -Uversiononly -Uuselongdouble -Uuse64bitint -de -Dcc=gcc-3.0'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    cc='gcc-3.0', ccflags ='-Wall -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    cppflags='-Wall -fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='3.0 20010402 (Debian prerelease)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='gcc-3.0', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldbm -ldb -ldl -lm -lc -lcrypt -lutil
    perllibs=-lnsl -ldl -lm -lc -lcrypt -lutil
    libc=/lib/, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

@INC for perl v5.7.1:

Environment for perl v5.7.1:
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PERL_BADLANG (unset)
