develooper Front page | perl.beginners | Postings from January 2002

using regular expressions to find sequences of items in data

From:
jon hans
Date:
January 6, 2002 02:43
Subject:
using regular expressions to find sequences of items in data
Message ID:
20020106070706.1201.qmail@web14607.mail.yahoo.com
#!/usr/bin/perl
#######################################################

I am trying to find all of the recurring sequences of
items in my data, excluding the sub-sequences.

Maybe I am missing the obvious. I have a little Perl
exposure but am not an expert Perl programmer. I have
hacked together some code that does some of what I
would like to do, but I know that there must be a much
better way of doing this. I just don't have any ideas
right now, having had only a couple of hours of sleep
in the last couple of days. :+( Am I looking at this
all wrong? There should be some regular expression(s)
that would make this more maintainable and elegant.
:-)

I have used an array of items called @datalist and a
hash called %frequency that holds a count of how often
each item occurs in the data list. I used tr to strip
any special characters from the data and then split on
white space into the @datalist array.
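
For readers following along, here is a minimal sketch of how @datalist and %frequency might be built as described. The input string and the exact set of characters kept by tr are assumptions; the post does not show either.

```perl
use strict;
use warnings;

# Hypothetical input; the real data source is not shown in the post.
my $raw = "the cat sat. the cat ran! the dog sat.";

# Delete every character that is not a letter, digit, or whitespace.
# (/c complements the list, /d deletes the matched characters.)
( my $clean = $raw ) =~ tr/a-zA-Z0-9 \t\n//cd;

# split ' ' splits on runs of whitespace and ignores leading whitespace.
my @datalist = split ' ', $clean;

# Count how often each item occurs in the data list.
my %frequency;
$frequency{$_}++ for @datalist;

print "$_ => $frequency{$_}\n" for sort keys %frequency;
```

Built this way, every item in @datalist has a count of at least 1 in %frequency.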

I would appreciate some help with this. Thanks

JH

#######################################################


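# find frequency of all sequences of the given size
my $first = 0;        # start position of the current window
my $currentseq = '';  # current sequence, items separated by spaces
my %current;          # sequence => number of times seen
# size of sequence to look for
my $sizeof = 10;

# stop once a full window no longer fits in the data list
while ( $first + $sizeof <= @datalist ) {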


# every item in the window must have a nonzero count in %frequency
   my @window = @datalist[ $first .. $first + $sizeof - 1 ];
   if ( !grep { !$frequency{$_} } @window ) {

# put a sequence together with a space separating items
      $currentseq = join ' ', @window;
# increment count of sequence for the current one
      ++$current{ $currentseq }; 
   }
# next position in the data list
   ++$first; 
}
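
The loop above is a sliding window over the data list. As a rough sketch, the same idea can be wrapped in a small sub so that it works for any window size. The name count_sequences and the sample data are hypothetical and not part of the original code.

```perl
use strict;
use warnings;

# count_sequences is a hypothetical helper, not from the original post.
# It returns a hash ref mapping each space-joined run of $size consecutive
# items to the number of positions in @$items where that run starts.
sub count_sequences {
    my ( $items, $size ) = @_;
    my %count;
    # The range is empty when fewer than $size items are available.
    for my $first ( 0 .. @$items - $size ) {
        my $seq = join ' ', @{$items}[ $first .. $first + $size - 1 ];
        $count{$seq}++;
    }
    return \%count;
}

# Made-up sample data for illustration only.
my @datalist = qw(a b c d a b c e);
my $pairs    = count_sequences( \@datalist, 2 );
print "$_ => $pairs->{$_}\n" for sort keys %$pairs;
```

With this sample, "a b" and "b c" each occur twice and the other three pairs occur once.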


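foreach my $currentsequence ( keys %current ) {
# if no multiples remove sequence
   if ( $current{$currentsequence} < 2 ) {
      delete $current{$currentsequence};
      next;
   }

   my $numberof = $current{$currentsequence};

# %lastseq is not filled in anywhere above; presumably it holds the
# counts left over from a previous pass with a smaller $sizeof
   foreach my $shorter ( keys %lastseq ) {
# if the number of times the smaller sequence occurs is
# the same, then the shorter sequence is not needed
# (the padding spaces make index() match whole items only)
      if ( index( " $currentsequence ", " $shorter " ) >= 0
           && $lastseq{$shorter} == $numberof ) {
         delete $lastseq{$shorter};
      }
   }
}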

#######################################################
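
Putting the pieces together, one possible way to get all recurring sequences without the redundant sub-sequences is sketched below. This is not the regular-expression solution asked about; it uses plain window counting for every size and then prunes. The name recurring_sequences is hypothetical, items are assumed to contain no spaces, and single items (size 1) are deliberately left out.

```perl
use strict;
use warnings;

# recurring_sequences is a hypothetical name, not from the original post.
# Assumes items contain no spaces, since sequences are joined with ' '.
sub recurring_sequences {
    my @items = @_;
    my %kept;    # sequence => count, for every sequence seen at least twice

    for my $size ( 2 .. scalar @items ) {
        my %count;
        for my $first ( 0 .. @items - $size ) {
            $count{ join ' ', @items[ $first .. $first + $size - 1 ] }++;
        }
        my @repeated = grep { $count{$_} >= 2 } keys %count;
        # If nothing repeats at this size, nothing longer can repeat either.
        last unless @repeated;
        $kept{$_} = $count{$_} for @repeated;
    }

    # Drop a sequence when a longer one contains it and occurs exactly as
    # often. Compare against a snapshot so deletions do not affect lookups.
    my %all = %kept;
    for my $short ( keys %all ) {
        for my $long ( keys %all ) {
            next if length $long <= length $short;
            # Padding with spaces makes index() match whole items only.
            if ( $all{$long} == $all{$short}
                && index( " $long ", " $short " ) >= 0 )
            {
                delete $kept{$short};
                last;
            }
        }
    }
    return %kept;
}

# Made-up sample data for illustration only.
my %found = recurring_sequences(qw(a b c d a b c e b c));
print "$_ => $found{$_}\n" for sort keys %found;
```

For the sample list this leaves "a b c" (2 times) and "b c" (3 times). "a b" is dropped because it only ever appears inside "a b c", while "b c" stays because it also occurs once on its own.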

