develooper Front page | perl.beginners | Postings from May 2008

Re: parsing CSV files with control and extended ASCII characters

From: David Newman
Date: May 6, 2008 07:49
Subject: Re: parsing CSV files with control and extended ASCII characters

On 3/20/08 5:05 PM, Gunnar Hjalmarsson wrote:
> David Newman wrote:
>> I have some CSV input files that contain control and extended ASCII 
>> characters,
> 
> <snip>
> 
>> The Text::CSV or Tie::Handle::CSV modules don't like these characters; 
>> the snippets below both return errors when they get to one.
> 
> <snip>
> 
>> my $csv = Text::CSV->new();
> 
> In the docs for Text::CSV, that way of creating a new object is 
> mentioned at the top of the SYNOPSIS section. The solution to your 
> problem is stated right after that.
> 
> So, the usual recommendation:
> 
> "Read the docs for the module you are using."
> 
> is very much applicable. ;-)

<time passes, seasons change, children grow up>

OK, thanks for this polite RTFM.

However, it doesn't answer the root question, namely how to parse text 
that contains Western European characters such as accented letters and 
umlauts.

I see from the Text::CSV documentation that this module handles only 
characters between 0x20 and 0x7e. I also see there is a binary mode for 
any character, but the documentation does not describe whether the 
module parses binary-mode characters the same way as ASCII characters.
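
For illustration, here is a minimal sketch of what binary mode looks like in practice. It assumes the binary attribute is passed to the constructor as a hash reference, which is how the Text::CSV docs show it. The sample line is made up, with Latin-1 bytes standing in for the accented characters.

```perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 asks the parser to accept bytes outside 0x20-0x7e in fields.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot create Text::CSV object: " . Text::CSV->error_diag();

# Hypothetical sample line: Latin-1 bytes for e-acute (0xE9) and u-umlaut (0xFC).
my $line = "Ren\xE9,M\xFCnchen,42";

$csv->parse($line)
    or die "Parse failed: " . $csv->error_diag();

# The separator and quoting rules are applied as usual; the high bytes
# are simply carried through inside the fields.
my @fields = $csv->fields();
printf "%d fields\n", scalar @fields;
```

In this sketch the fields come back as the same raw bytes that went in, so any decoding to characters is still up to the caller.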

This seems like a fairly standard problem. What's the "right" way (or, 
given Perl culture, "a" way) to handle text outside the 0x20 to 0x7e range?
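
One possible way, sketched under the assumption that the input is ISO-8859-1 (the real encoding of the files is not stated here): open the input with an encoding layer so Perl decodes the bytes to characters, and let a binary-mode Text::CSV object read line by line. The in-memory string below is only a stand-in for a real file.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot create Text::CSV object: " . Text::CSV->error_diag();

# Stand-in for a real file: raw Latin-1 bytes held in a scalar (hypothetical data).
my $bytes = "name,city\nRen\xE9,M\xFCnchen\n";

# The encoding layer decodes the bytes to Perl characters as they are read.
# ISO-8859-1 is an assumption; use whatever encoding the files really have.
open my $fh, '<:encoding(iso-8859-1)', \$bytes
    or die "Cannot open input: $!";

my @rows;
while (my $row = $csv->getline($fh)) {
    push @rows, $row;    # each $row is an array reference of fields
}
close $fh;

# Encode on the way out so the accented characters print cleanly.
binmode STDOUT, ':encoding(UTF-8)';
print join(' | ', @$_), "\n" for @rows;
```

For a file on disk, the same open call would take a filename in place of the scalar reference.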

Many thanks!

dn





