LWP and UTF-8

From: Mattias Holmlund
Date: January 3, 2006 10:35
Subject: LWP and UTF-8
Message ID: 43BAC3F4.1080507@holmlund.se

Hi,

I am trying to use WWW::Mechanize against a site that is UTF-8 encoded. 
It seems to work OK, but I get error messages about UTF-8 on STDERR. If 
I use LWP::Simple to download the page, the same message appears, so I 
assume that the problem is in LWP.

A simple test program that shows the behaviour:
#!/usr/bin/perl -w
use strict;
use LWP::Simple qw/get/;
my $data = get( "https://www.trangselskatt.vv.se/cts/open/loginPrompt.do" );

This program prints the following message to STDERR:

Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/share/perl5/LWP/Protocol.pm line 114.

Using WWW::Mechanize to access the site gives the same error message, 
reported from several different source files.

The site works just fine in Firefox, and it includes a charset 
specification in the Content-Type header:
mattias@rob:~/development/trangsel$ wget -S "https://www.trangselskatt.vv.se/cts/open/loginPrompt.do"
--19:29:39--  https://www.trangselskatt.vv.se/cts/open/loginPrompt.do
           => `loginPrompt.do.1'
Resolving www.trangselskatt.vv.se... 129.35.37.5
Connecting to www.trangselskatt.vv.se|129.35.37.5|:443... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Tue, 03 Jan 2006 18:29:26 GMT
  Server: IBM_HTTP_Server
  Connection: close
  Content-Type: text/html; charset=UTF-8
  Content-Language: sv-SE
Length: unspecified [text/html]

    [ <=>                                 ] 5,302         --.--K/s

19:29:40 (82.63 KB/s) - `loginPrompt.do.1' saved [5302]

How can I avoid this error message, both in LWP and in WWW::Mechanize? I 
am running Perl 5.8.7, LWP 5.803 and WWW::Mechanize 1.12 on Debian testing.

/Mattias
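
A possible workaround, sketched under assumptions and not confirmed for the setup described above: the message comes from the HTML parser that LWP runs over the raw response bytes to pick up <head> elements, so turning that step off with parse_head(0) on an LWP::UserAgent object may silence it. The body can then be decoded according to the charset in the Content-Type header with decoded_content, assuming the installed LWP provides that method. The hand-built response and the sample title text are illustrative stand-ins, so the sketch runs without contacting the site.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;
use Encode qw(encode);

# Assumption: with head parsing off, LWP no longer feeds the undecoded
# bytes of the response to HTML::HeadParser while reading the body.
my $ua = LWP::UserAgent->new;
$ua->parse_head(0);

# A real request would then look like this (not executed here):
#   my $res = $ua->get("https://www.trangselskatt.vv.se/cts/open/loginPrompt.do");

# Illustrative stand-in for the server's reply: UTF-8 bytes plus the
# same Content-Type header that wget reported.
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=UTF-8' ],
    encode('UTF-8', "<title>Tr\x{e4}ngselskatt</title>"),
);

# decoded_content turns the bytes into a Perl character string,
# using the charset named in the Content-Type header.
my $text = $res->decoded_content;

printf "%d bytes on the wire, %d characters after decoding\n",
    length($res->content), length($text);
```

WWW::Mechanize is a subclass of LWP::UserAgent, so the same parse_head(0) call should be available on a Mechanize object. Mechanize also parses pages itself, so messages reported from other source files may need the content decoded before it is parsed.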