perl.beginners | Postings from February 2002

Re: Parsing a .csv file

Dave Benware
February 11, 2002 18:45
Steven Arbitman wrote:
> Hi all,
> I know parsing a comma-separated value file should be easy:
> @array = split /,/;  # just split the line on commas
> However, my input csv file looks like this:
> Name,"street,city,state,zip",phone,email,"comments, may have commas, 2"
> Note, not all fields have quotes, only those which contain commas have
> quotes.
> Even if I could get the input revised to split the address into several
> different fields (which I know would be a good idea), the comments remain a
> problem.
> I can solve the problem using the substr function to examine the incoming
> text char by char, replacing commas outside quotes with something else
> (tabs), and leaving commas inside quotes, then splitting the line on tabs:
>         $len = length ();
>         for ($in_quotes=$i=0; $i<$len; $i++) {
>                 if (substr($_,$i,1) eq "," and !$in_quotes) {
>                         substr($_,$i,1) = "\t";
>                 } elsif (substr($_,$i,1) eq '"') {
>                         substr($_,$i,1)= " ";
>                         if ($in_quotes) {$in_quotes = 0;}
>                         else {$in_quotes = 1;}
>                 }
>         }
>         @infields = split /\t/;
> This has got to be the slowest most inelegant way possible, but I don't see
> another.  Is there a better way?

Where does the input come from?

I wrote a CSV flat-file database that takes its input from HTML forms.  In
that case, I translated ALL commas in each value to a control character
(null, I think), like this:  $value =~ s/,/\x00/g;  I did this when first
parsing the form data from STDIN.
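
A minimal sketch of that encoding step, assuming the form values have
already been decoded into a hash.  The field names, the sample values and
the encode_field helper are hypothetical, and the null byte is simply the
placeholder I believe I used:

```perl
use strict;
use warnings;

# Hypothetical helper: hide literal commas so the stored record can
# later be split on ',' without ambiguity.
sub encode_field {
    my ($value) = @_;
    $value =~ s/,/\x00/g;
    return $value;
}

# Hypothetical form data; in the real script these values came from STDIN.
my %form = (
    name     => 'Name',
    address  => 'street,city,state,zip',
    comments => 'comments, may have commas, 2',
);

# Join the encoded values; the only commas left are the field separators.
my $record = join ',', map { encode_field($form{$_}) } qw(name address comments);
```

This only works if you control how the file is written; it does not help
with a file that already arrives with quoted fields.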

Of course, to parse the CSV file I have to translate those control
characters back to commas again *after* splitting the record.
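
The reading side might look roughly like this.  It is a sketch, not my
original code: the sample record is made up, and it assumes the null byte
was the placeholder and never occurs in real data:

```perl
use strict;
use warnings;

# Hypothetical stored record: commas inside fields were replaced by \x00
# when the record was written.
my $record = "Name,street\x00city\x00state\x00zip,555-1234";

# Split first (only separator commas remain), then restore the commas
# in each field.  The -1 limit keeps trailing empty fields.
my @fields = map { my $f = $_; $f =~ s/\x00/,/g; $f } split /,/, $record, -1;
```

The order matters: restoring the commas before the split would bring back
the original ambiguity.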

