
[perl #33027] encode_entities, unicode and why   =~ /\s/ == false

From: Brian Fraser via RT
Date: May 24, 2012 18:41
Subject: [perl #33027] encode_entities, unicode and why   =~ /\s/ == false
Message ID: rt-3.6.HEAD-7788-1337910065-1570.33027-15-0@perl.org

On Mon Dec 13 06:23:27 2004, tieben wrote:
> 
> This is a bug report for perl from btietze@epo.org,
> generated with the help of perlbug 1.35 running under perl v5.8.5.
> 
> ./utest.pl
> #!/usr/bin/perl -w
> # Fr Dez  3 14:30:04 CET 2004
> # utest.pl
> # used for: demonstration of perl weird behavior
> # 	why is the return value of decode_entities(" ") somehow unicode but not unicode?
> # 	or why does /\s/ not recognize that magic char?
> use strict;
> use HTML::Entities;
> # main
> ###
> open THISFILE, $0; print "$0\n", <THISFILE>, "\n";
> my @cells;
> format STDOUT =
> | @<<<<<<<<< | @>>>>>> | @>>>>>> | @>>>>>> | @>>>>>> | @>>>>>> || @>>>>>> | @>>>>>> | @>>>>>> | @>>>>>> | @>>>>>>
> @cells
> .
> @cells = ("STRING", "utf8", "bool", "empty", "ws", "non ws", "utf8",
>    "bool", "empty", "ws", "non ws");
> write;
> foreach (
>     decode_entities("&nbsp"),
>     decode_entities("/&nbsp/"),
>     decode_entities(" "),
>     decode_entities(""),
>     "\n",
> ){
>     @cells = pt($_);
>     write;
> }
> # SUBS
> ###
> sub pt{
>     my $str = shift;
>     my @ret;
>     push @ret,	'"' . encode_entities($str) . '"',	# STRING
> 	utf8::is_utf8($str)	? "true": "false",	# utf8
> 		$str    	? "true": "false",	# bool
> 		$str=~/^$/	? "true": "false",	# empty
> 		$str=~/\s+/	? "true": "false",	# ws
> 		$str=~/\S+/	? "true": "false";	# non ws
>     utf8::upgrade($str);
>     push @ret,	utf8::is_utf8($str) ?  "true": "false",	#utf8
> 		$str    	? "true": "false",	# bool
> 		$str=~/^$/	? "true": "false",	# empty
> 		$str=~/\s+/	? "true": "false",	# ws
> 		$str=~/\S+/	? "true": "false";	# non ws
>     return @ret;
> }
> #	vim:    sw=4:ts=8:ai:
> 
> | STRING     |    utf8 |    bool |   empty |      ws |  non ws ||    utf8 |    bool |   empty |      ws |  non ws
> | "&nbsp;"   |   false |    true |   false |   false |    true ||    true |    true |   false |    true |   false
> | "/&nbsp;/" |   false |    true |   false |   false |    true ||    true |    true |   false |    true |    true
> | " "        |   false |    true |   false |    true |   false ||    true |    true |   false |    true |   false
> | ""         |   false |   false |    true |   false |   false ||    true |   false |    true |   false |   false
> | "          |   false |    true |    true |    true |   false ||    true |    true |    true |    true |   false
> ---
> Flags:
>     category=core
>     severity=medium
> ---
> This perlbug was built using Perl v5.8.5 - Fri Oct  1 23:29:33 UTC
>    2004
> It is being executed now by  Perl v5.8.5 - Fri Oct  1 23:24:00 UTC
>    2004.
> 
> Site configuration information for perl v5.8.5:
> 
> Configured by abuild at Fri Oct  1 23:24:00 UTC 2004.
> 
> Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
>   Platform:
>     osname=linux, osvers=2.6.8.1, archname=i586-linux-thread-multi
>     uname='linux g168 2.6.8.1 #1 smp thu jul 1 15:23:45 utc 2004 i686
>    i686 i386 gnulinux '
>     config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr
>    -Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm
>    -Duseshrplib=true -Doptimize=-O2 -march=i586 -mcpu=i686
>    -fmessage-length=0 -Wall -Wall -pipe'
>     hint=recommended, useposix=true, d_sigaction=define
>     usethreads=define use5005threads=undef useithreads=define
>    usemultiplicity=define
>     useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
>     use64bitint=undef use64bitall=undef uselongdouble=undef
>     usemymalloc=n, bincompat5005=undef
>   Compiler:
>     cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS
>    -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE
>    -D_FILE_OFFSET_BITS=64',
>     optimize='-O2 -march=i586 -mcpu=i686 -fmessage-length=0 -Wall
>    -Wall -pipe',
>     cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS
>    -fno-strict-aliasing -pipe'
>     ccversion='', gccversion='3.3.4 (pre 3.3.5 20040809)',
>    gccosandvers=''
>     intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
>     d_longlong=define, longlongsize=8, d_longdbl=define,
>    longdblsize=12
>     ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
>    lseeksize=8
>     alignbytes=4, prototype=define
>   Linker and Libraries:
>     ld='cc', ldflags =''
>     libpth=/lib /usr/lib /usr/local/lib
>     libs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
>     perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
>     libc=, so=so, useshrplib=true, libperl=libperl.so
>     gnulibc_version='2.3.3'
>   Dynamic Linking:
>     dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E
>    -Wl,-rpath,/usr/lib/perl5/5.8.5/i586-linux-thread-multi/CORE'
>     cccdlflags='-fPIC', lddlflags='-shared'
> 
> Locally applied patches:
> 
> 
> ---
> @INC for perl v5.8.5:
>     /usr/lib/perl5/5.8.5/i586-linux-thread-multi
>     /usr/lib/perl5/5.8.5
>     /usr/lib/perl5/site_perl/5.8.5/i586-linux-thread-multi
>     /usr/lib/perl5/site_perl/5.8.5
>     /usr/lib/perl5/site_perl
>     /usr/lib/perl5/vendor_perl/5.8.5/i586-linux-thread-multi
>     /usr/lib/perl5/vendor_perl/5.8.5
>     /usr/lib/perl5/vendor_perl
>     .
> 
> ---
> Environment for perl v5.8.5:
>     HOME=/home/ben
>     LANG=de_DE.UTF-8
>     LANGUAGE (unset)
>     LD_LIBRARY_PATH (unset)
>     LOGDIR (unset)
>     PATH=/home/ben/bin:/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/opt/kde3/bin:/usr/lib/jvm/jre/bin
>     PERL_BADLANG (unset)
>     SHELL=/bin/bash

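This is an instance of the Unicode Bug: the decoded no-break space (U+00A0)
sits in a string without the UTF8 flag, so /\s/ matches it only after
utf8::upgrade. In recent releases of Perl (5.14 and later), it's fixed by
the unicode_strings feature (or the /u flag on regexen), so I'm marking
this as resolved.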
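
For anyone landing on this ticket later, here is a minimal sketch of the
behaviour and of the two fixes mentioned above. It is not taken from the
original report: it assumes "\xA0" is a fair stand-in for what
decode_entities("&nbsp;") returned there (U+00A0 in a non-UTF8 string, per
the first "utf8" column of the table), so that it runs without
HTML::Entities. The variable names are made up for illustration, and it
needs Perl 5.14 or later because of the /u flag.

```perl
use strict;
use warnings;
# Deliberately no "use v5.12" or later here: that would switch on
# unicode_strings for the whole file and hide the bug being shown.

# Assumption: stand-in for decode_entities("&nbsp;") from the report,
# i.e. U+00A0 NO-BREAK SPACE stored without the UTF8 flag.
my $nbsp = "\xA0";

# Default rules on a non-UTF8 string: native byte semantics, so \s
# does not match U+00A0 (the Unicode Bug).
my $native = ($nbsp =~ /\s/) ? "true" : "false";

# /u forces Unicode rules regardless of the UTF8 flag.
my $with_u = ($nbsp =~ /\s/u) ? "true" : "false";

# unicode_strings gives the same semantics for a whole lexical scope.
my $with_feature;
{
    use feature 'unicode_strings';
    $with_feature = ($nbsp =~ /\s/) ? "true" : "false";
}

# The workaround visible in the report's right-hand columns: upgrade
# the internal representation, after which \s matches.
my $upgraded = $nbsp;
utf8::upgrade($upgraded);
my $after_upgrade = ($upgraded =~ /\s/) ? "true" : "false";

print "native=$native u=$with_u feature=$with_feature upgraded=$after_upgrade\n";
```

On a 5.14+ perl with no locale pragma in effect, only the "native" case
should come out false, which matches the "ws" columns of the first row in
the report.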


---
via perlbug:  queue: perl5 status: new
https://rt.perl.org:443/rt3/Ticket/Display.html?id=33027


