[perl #25407] Erroneous change of text encoding by Parser
From:
Himanshu Garg
Date:
January 29, 2004 19:27
Subject:
[perl #25407] Erroneous change of text encoding by Parser
Message ID:
rt-3.0.8-25407-72404.8.8223640782563@perl.org
# New Ticket Created by Himanshu Garg
# Please include the string: [perl #25407]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=25407 >
To: perlbug@perl.org
Subject: Erroneous change of text encoding by HTML::Parser
Reply-To: himanshu@students.iiit.net
Message-Id: <5.8.0_3176_1075195949@anu127>
This is a bug report for perl from himanshu@students.iiit.net,
generated with the help of perlbug 1.34 running under perl v5.8.0.
The following program extracts (Arabic UTF-8 encoded) text from a string.
#################################################################
use HTML::Parser;

# Set standard output to UTF-8
binmode(STDOUT, ":utf8");

# Create parser object; the text handler receives each text segment
my $p = HTML::Parser->new( api_version => 3, text_h => [\&text, "text"] );
$p->parse( "<html> <body> þçéóêçà</body> </html>");
$p->eof;    # signal end of document so any buffered text is flushed

# Text handler: print the extracted text unchanged
sub text
{
    my ($txt) = @_;
    print $txt;
}
#################################################################
However, the text it prints comes out in a different encoding from the input.
As a result, a string that was legible in a browser with Arabic support
becomes illegible.
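One possible explanation, offered as an assumption and not confirmed anywhere in this report: the script has no "use utf8" and does not decode its input, so the string literal is a plain byte string, and the ":utf8" layer on STDOUT treats each byte as a Latin-1 character and encodes it a second time. The sketch below illustrates that effect with the core Encode module only. The sample word and the variable names are hypothetical and do not come from the original program.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: the UTF-8 bytes of the Arabic word "salam",
# written as escapes so this script stays pure ASCII.
my $bytes = "\xD8\xB3\xD9\x84\xD8\xA7\xD9\x85";

# Roughly what a :utf8 output layer does to an undecoded byte string:
# every byte is taken as one Latin-1 character and encoded again.
my $double_encoded = encode('UTF-8', $bytes);

# Decoding first turns the bytes into characters; encoding those
# characters on output then reproduces the original bytes.
my $chars      = decode('UTF-8', $bytes);
my $round_trip = encode('UTF-8', $chars);

printf "input bytes:          %d\n", length $bytes;
printf "double-encoded bytes: %d\n", length $double_encoded;
printf "decoded characters:   %d\n", length $chars;
printf "round trip matches:   %s\n", $round_trip eq $bytes ? "yes" : "no";
```

If this is what is happening, decoding the input before handing it to the parser (or adding "use utf8" for literals in the source) would be the thing to try; whether the installed HTML::Parser handles decoded strings correctly is a separate question.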
I am unable to pinpoint the source of the problem, because the following file:
/usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi/HTML/Parser.pm
does not appear to contain any parsing code.
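A likely reason the .pm file looks empty of parsing logic is that HTML::Parser does its parsing in compiled C code (XS), which the .pm file only loads; the architecture-specific directory in the path above points the same way. The following is a minimal sketch for checking this on a given installation. The helper name xs_loader_lines is hypothetical, and List::Util is used only as a stand-in default because it ships with perl; pass HTML::Parser as the first argument where it is installed.

```perl
use strict;
use warnings;

# Hypothetical helper: load a module, then return the file it was loaded
# from together with any lines that hand control to compiled (XS) code.
sub xs_loader_lines {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    require $file;                            # loading fills in %INC
    my $path = $INC{$file};
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @hits = grep { /\b(?:XSLoader|DynaLoader|bootstrap)\b/ } <$fh>;
    close $fh;
    return ($path, @hits);
}

# Default to a core module so the sketch runs without extra installs.
my $module = @ARGV ? $ARGV[0] : 'List::Util';
my ($path, @hits) = xs_loader_lines($module);
print "$module loaded from: $path\n";
print "  $_" for @hits;
```

Any XSLoader, DynaLoader or bootstrap lines that are printed mark the place where the Perl file hands off to the shared object that contains the actual implementation.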
Thank You
Himanshu.
---
Flags:
category=library
severity=medium
---
Site configuration information for perl v5.8.0:
Configured by bhcompile at Sun Sep 1 23:55:07 EDT 2002.
Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
Platform:
osname=linux, osvers=2.4.18-11smp, archname=i386-linux-thread-multi
uname='linux daffy.perf.redhat.com 2.4.18-11smp #1 smp thu aug 15 06:41:59 edt 2002 i686 i686 i386 gnulinux '
config_args='-des -Doptimize=-O2 -march=i386 -mcpu=i686 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr'
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
optimize='-O2 -march=i386 -mcpu=i686',
cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -I/usr/include/gdbm'
ccversion='', gccversion='3.2 20020822 (Red Hat Linux Rawhide 3.2-5)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='gcc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lnsl -lgdbm -ldb -ldl -lm -lpthread -lc -lcrypt -lutil
perllibs=-lnsl -ldl -lm -lpthread -lc -lcrypt -lutil
libc=/lib/libc-2.2.92.so, so=so, useshrplib=true, libperl=libperl.so
gnulibc_version='2.2.92'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic -Wl,-rpath,/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE'
cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'
Locally applied patches:
---
@INC for perl v5.8.0:
/usr/lib/perl5/5.8.0/i386-linux-thread-multi
/usr/lib/perl5/5.8.0
/usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.0
/usr/lib/perl5/site_perl
/usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.0
/usr/lib/perl5/vendor_perl
.
---
Environment for perl v5.8.0:
HOME=/home/guest
LANG=en_US.UTF-8
LANGUAGE (unset)
LC_ALL=en_US.UTF-8
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/guest/bin:/usr/java/j2sdk1.4.0_04/bin
PERL_BADLANG (unset)
SHELL=/bin/bash