perl.beginners | Postings from February 2002

RE: Parsing a .csv file

From: Timothy Johnson
Date: February 11, 2002 18:53
Subject: RE: Parsing a .csv file
Message ID: C0FD5BECE2F0C84EAA97D7300A500D5002580F3B@SMILEY

This is probably not the best way either, but here's how I handled a similar
situation.

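use strict;
use warnings;

open(my $in, '<', 'myfile.csv') or die "Can't open myfile.csv: $!";
while (my $line = <$in>) {
    chomp $line;
    my @newarray;
    my @array = split /"/, $line;    # First split by quotes
    for my $i (0 .. $#array) {       # Visit every piece, even empty or "0" ones
        if ($i % 2 == 0) {           # Even pieces are outside quotes: split further by commas
            push @newarray, split /,/, $array[$i];
        } else {                     # Odd pieces are inside quotes: leave intact
            push @newarray, $array[$i];
        }
    }
    foreach (@newarray) {
        print "$_\n" if $_ ne '';    # Remove empty strings left by adjacent commas/quotes
    }
}
close($in);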

Of course, this assumes that your data never contains legitimately empty
fields, since the final loop discards every empty string.
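If empty fields do matter, one alternative (a sketch, not the method used above) is to match each field in turn with a \G-anchored regex instead of splitting twice. The helper name split_csv_line is hypothetical, and it assumes quoted fields never contain a double-quote character themselves:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the fields of
# one chomped CSV line, keeping empty fields. Assumes quoted fields never
# contain a double-quote character.
sub split_csv_line {
    my ($line) = @_;
    my @fields;
    # Each field is either "quoted text" ($1) or a run of non-comma
    # characters ($2), followed by a comma or the end of the line ($3).
    while ($line =~ /\G(?:"([^"]*)"|([^,]*))(,|$)/g) {
        push @fields, defined $1 ? $1 : $2;
        last if $3 eq '';    # no comma matched: this was the last field
    }
    return @fields;
}

# Example: the second field is empty but is still returned.
print join('|', split_csv_line('Name,,"a, b",x')), "\n";    # prints "Name||a, b|x"
```

Because every match has to end at a comma or at the end of the line, two adjacent commas still yield an (empty) entry instead of being dropped.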


-----Original Message-----
From: Steven Arbitman [mailto:info@starbits.com]
Sent: Monday, February 11, 2002 6:23 PM
To: beginners@perl.org
Subject: Parsing a .csv file


Hi all,

I know parsing a comma-separated value file should be easy:
@array = split /,/;  # just split the line on commas

However, my input csv file looks like this:
Name,"street,city,state,zip",phone,email,"comments, may have commas, 2"

Note that not all fields are quoted; only those that contain commas are.
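Since only quoted fields contain commas and the quotes come in pairs, a comma is a field separator exactly when an even number of double quotes follows it on the line. A minimal sketch of a split built on that observation, assuming balanced, unescaped quotes; split_outside_quotes is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: split one line on the commas that sit outside
# double quotes. Assumes quotes are balanced and never escaped.
sub split_outside_quotes {
    my ($line) = @_;
    # A comma is a separator when an even number of quotes follows it.
    # The LIMIT of -1 keeps trailing empty fields.
    my @fields = split /,(?=(?:[^"]*"[^"]*")*[^"]*$)/, $line, -1;
    s/^"(.*)"$/$1/s for @fields;    # strip each field's surrounding quotes
    return @fields;
}

my $line = 'Name,"street,city,state,zip",phone,email,"comments, may have commas, 2"';
print "$_\n" for split_outside_quotes($line);
```

The lookahead rescans the rest of the line at every comma, which is harmless for lines of ordinary length.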

Even if I could get the input revised to split the address into several
different fields (which I know would be a good idea), the comments remain a
problem.

I can solve the problem by using the substr function to examine the incoming
text character by character, replacing commas outside quotes with something
else (tabs) and leaving commas inside quotes alone, and then splitting the
line on tabs:

	my $len = length;                   # length of the current line in $_
	my $in_quotes = 0;
	for (my $i = 0; $i < $len; $i++) {
		my $c = substr($_, $i, 1);
		if ($c eq ',' and !$in_quotes) {
			substr($_, $i, 1) = "\t";   # comma outside quotes: make it a tab
		} elsif ($c eq '"') {
			substr($_, $i, 1) = " ";    # blank the quote, keeping $len valid
			$in_quotes = !$in_quotes;   # toggle the quoted state
		}
	}
	my @infields = split /\t/;

This has got to be the slowest, most inelegant way possible, but I don't see
another. Is there a better way?
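One possible answer, offered as a sketch: Perl ships with the core module Text::ParseWords, whose parse_line function splits a string on a delimiter pattern while respecting quoted sections. The sample line below is the one from this message; reading the file line by line would wrap the call in the same while loop used above.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);    # core module

my $line = 'Name,"street,city,state,zip",phone,email,"comments, may have commas, 2"';

# parse_line(DELIMITER, KEEP, LINE): DELIMITER is a regex; a false KEEP
# strips the surrounding quotes (and backslash escapes) from each field.
my @fields = parse_line(',', 0, $line);
print "$_\n" for @fields;
```

Be aware that parse_line also treats single quotes and backslashes as quoting and escape characters, so comments containing apostrophes may not split as expected; its documentation covers the details. For full CSV, including doubled "" quotes inside fields, the CPAN modules Text::CSV and Text::CSV_XS are built for the job.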

Thanks,
Steve

