perl.libwww | Postings from January 2006

LWP and UTF-8

Mattias Holmlund
January 3, 2006 10:35

I am trying to use WWW::Mechanize against a site that is UTF-8 encoded. 
It seems to work OK, but I get warnings about UTF-8 on STDERR. If I use 
LWP::Simple to download the page, the same message appears, so I assume 
that the problem is in LWP.

A simple test program that shows the behaviour:
#!/usr/bin/perl -w
use strict;
use LWP::Simple qw/get/;
my $data = get( "" );

This program prints the following message to STDERR:

Parsing of undecoded UTF-8 will give garbage when decoding entities at 
/usr/share/perl5/LWP/ line 114.

Using WWW::Mechanize to access the site gives the same message, reported 
from several different source files.
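
To illustrate what "undecoded UTF-8" means in the message quoted above, here is a minimal sketch using only the core Encode module. The sample string is hypothetical and not taken from the site in question. It only shows the difference between the raw bytes of a response and the decoded character string; it does not by itself silence the warning inside LWP.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: the UTF-8 bytes for "cafe &amp; bar" with an
# e-acute, as they would arrive over the wire (0xC3 0xA9 is U+00E9).
my $bytes = "caf\xC3\xA9 &amp; bar";

# Undecoded: Perl treats every byte as one character, so the
# e-acute is counted as two characters.
my $byte_len = length $bytes;

# Decoded: the two bytes collapse into the single character U+00E9.
my $text     = decode('UTF-8', $bytes);
my $char_len = length $text;

print "bytes: $byte_len, characters: $char_len\n";
```

A parser that expands entities such as &amp; into characters has to know which of these two forms it has been given, which is presumably why it complains when it is handed the undecoded form.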

The site works just fine in Firefox and it includes a 
charset-specification in the Content-Type:
mattias@rob:~/development/trangsel$ wget -S 
           => `'
Connecting to||:443... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Tue, 03 Jan 2006 18:29:26 GMT
  Server: IBM_HTTP_Server
  Connection: close
  Content-Type: text/html; charset=UTF-8
  Content-Language: sv-SE
Length: unspecified [text/html]

    [ <=>                                 ] 5,302         --.--K/s

19:29:40 (82.63 KB/s) - `' saved [5302]

How can I avoid this warning, both in LWP and in WWW::Mechanize? I am 
running Perl 5.8.7, LWP 5.803 and WWW::Mechanize 1.12 on Debian testing.

/Mattias