perl.beginners | Postings from January 2002
RE: Count Words
From: Booher Timothy B 1stLt AFRL/MNAC
Date: January 22, 2002 08:27
Subject: RE: Count Words
Message ID: 30C9E24891FFD411B68A009027724CB701A22DDE@eg-002-015.eglin.af.mil
Wow -- that is really cool. I am going to go review hashes. How crazy
compact!
thanks a lot,
Tim
-----Original Message-----
From: Peter Scott [mailto:Peter@PSDT.com]
Sent: Tuesday, January 22, 2002 9:40 AM
To: Booher Timothy B 1stLt AFRL/MNAC; 'beginners@perl.org'
Subject: Re: Count Words
At 08:59 AM 1/22/02 -0600, Booher Timothy B 1stLt AFRL/MNAC wrote:
>I am trying to write a perl script to count the words (not counting
>duplicates) in a file based on the following definition of word:
>
>"A word is any collection of characters separated by white space or
>punctuation characters such as {.!?,}"
>
>I have a lot of ideas, but also the suspicion that someone else has done
>this before. Here is my basic approach.
>
>--> create two-dimensional array with following axes
>    {x = word.length, y = word.string}
>--> read line
>    --> read first word
>    --> compare word against entire column of similar-sized words
>        if found then promote word one higher in column
>        else add word to the bottom of the column and increment word count
>
>Any ideas on a more efficient approach -- anything else out there that does
>this?
Whoa, sounds like someone hasn't met hashes yet.
Hashes are the first coolest thing you encounter when learning Perl (unless
you've come from awk, which I don't think you have).
If we accept the set of word characters as being defined by \w, your
problem can be solved with this code:
    my %word;
    while (<>) {
        $word{$_}++ for /(\w+)/g;
    }
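Put together as a complete script with "use strict" and "use warnings", plus a line reporting the number of distinct words (the "not counting duplicates" figure the question asked for), the snippet might look like the sketch below. The sub name count_words and the sample text are illustrative assumptions; the sub takes a filehandle and the demo reads from an in-memory string so it runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count occurrences of each run of word characters (\w+) read from a
# filehandle. Returns a reference to a hash mapping word => count.
sub count_words {
    my ($fh) = @_;
    my %word;
    while (<$fh>) {
        $word{$_}++ for /(\w+)/g;
    }
    return \%word;
}

# Illustrative input: open a read handle on an in-memory string.
my $text = "the cat sat\non the mat\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my $counts = count_words($fh);
close $fh;

# The number of distinct words is simply the number of keys in the hash.
my $distinct = keys %$counts;
print "distinct words: $distinct\n";    # prints "distinct words: 5"
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

For real files, the original "while (<>)" loop can be used unchanged; scalar(keys %word) then gives the distinct-word count.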
Somewhat simpler than you were imagining? Here's how it works:
    my %word;
Declare the hash (since the code is going to run with "use strict").
    while (<>) {
While we can read a line from either the files named on the command line or
standard input, put the line into the variable $_.
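Spelled out, "while (<FH>)" is shorthand for assigning each line to $_ and stopping when readline returns undef at end of file. A minimal sketch of that explicit form, assuming an in-memory filehandle (the names $in and @seen are illustrative) in place of the command-line files that <> would read:

```perl
use strict;
use warnings;

# Illustrative input held in memory so the sketch runs on its own.
my $text = "first line\nsecond line\n";
open my $in, '<', \$text or die "Cannot open: $!";

my @seen;
# The explicit form of "while (<$in>)": each line, newline included,
# is assigned to $_, and the loop ends when readline returns undef.
while (defined($_ = <$in>)) {
    chomp;               # with no argument, chomp works on $_
    push @seen, $_;
}
close $in;

print scalar(@seen), " lines read\n";    # prints "2 lines read"
```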
    for /(\w+)/g;
Loop over all groups of consecutive word characters in $_, putting each one
into a temporary $_.
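To see exactly what that loop iterates over, this small sketch captures the list a /g match with a capture group returns in list context for one sample line (the sample string is illustrative). Because \w matches only letters, digits and underscore, an apostrophe splits words such as "Don't" in two; that is the price of accepting \w as the definition of a word character.

```perl
use strict;
use warnings;

my $line = "Don't panic, it's only Perl!";

# In list context, a /g match with a capture group returns every capture,
# one per successful match, from left to right.
my @words = $line =~ /(\w+)/g;

print join('|', @words), "\n";    # prints "Don|t|panic|it|s|only|Perl"
```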
    $word{$_}++
Increment the count stored in the hash corresponding to that word. If
there isn't one there yet, Perl creates one (undefined, which ++ treats as
0), then adds 1 to it.
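A quick sketch of that behaviour, using an illustrative key 'perl': the element does not exist until it is first incremented, and ++ on the missing (undefined) value starts from 0 without an "uninitialized value" warning, even under "use warnings".

```perl
use strict;
use warnings;

my %word;

# The key 'perl' has not been seen yet; exists() does not create it.
print exists($word{perl}) ? "exists\n" : "missing\n";    # prints "missing"

# ++ on an undefined hash element treats it as 0 and does not warn.
$word{perl}++;
$word{perl}++;

print "perl: $word{perl}\n";                             # prints "perl: 2"
```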
After the end of the loop you can dump the concordance with something like:
    print "$_: $word{$_}\n" for sort keys %word;
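If you would rather follow the question's own definition (words separated by white space or the punctuation set {.!?,}), a variant can split on that character class instead of matching \w; under that rule an apostrophe is not a separator, so "Don't" stays one word. A sketch under that assumption, with illustrative sample lines, which also reports the distinct-word count and lists words by descending frequency:

```perl
use strict;
use warnings;

my @lines = ("Hello, world! Don't panic.\n", "Hello again, world?\n");

my %word;
for my $line (@lines) {
    # Split on runs of white space or the punctuation set {.!?,} from the
    # question. A line starting with a separator would yield an empty
    # first field, so empty strings are filtered out.
    $word{$_}++ for grep { length } split /[\s.!?,]+/, $line;
}

# Distinct words, then a listing sorted by count (highest first),
# with ties broken by string comparison.
my $distinct = keys %word;
print "distinct words: $distinct\n";    # prints "distinct words: 5"
for my $w (sort { $word{$b} <=> $word{$a} or $a cmp $b } keys %word) {
    print "$w: $word{$w}\n";
}
```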
--
Peter Scott
Pacific Systems Design Technologies
http://www.perldebugged.com