Front page | perl.beginners |
Postings from May 2011
Re: Help with regular expressions
From: Kenneth Wolcott
Date: May 9, 2011 14:35
Subject: Re: Help with regular expressions
Message ID: BANLkTimTPfRLa_dvR8ZAy-181JP8Hz=iqA@mail.gmail.com
On Mon, May 9, 2011 at 12:04, Sandip Bhattacharya <sandipb@foss-community.com> wrote:
> On Mon, May 9, 2011 at 11:44 PM, Tiago Hori <tiago.hori@gmail.com> wrote:
> > I am trying to write a small script to parse bibliographic references like
> > this:
> >
> > Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
> > reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
> >
> > What I want to be able to do eventually is parse each name separately and
> > associate that with the title. I am not sure how yet, but I haven't even
> > got there.
>
> I took a stab at this. It might not be perfect or catch all possible
> variations. In any case, unless you have rules for the text in
> these entries, it is very difficult to catch them all.
>
> =========================================================
> #!/usr/bin/perl
> #
>
> use strict;
> use warnings;
>
> my $text = <<END;
> Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
> reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
> END
>
> my @authors=();
>
> # Extract authors
> # Assuming each author is composed of one or more matches of:
> # <SPACE>* WORD, <SPACE>* (ALPHABET PERIOD)+
> if (my @matches = $text =~ m/(\s*\w+,\s*(\w\.)+),/gs) {
>     while (@matches) {
>         my $match = shift @matches;
>         my @comps = map {s/^ +//;s/ +$//;$_} (split ",", $match);
>         push @authors, join " ",@comps[1,0];
>         # Discard the inner (\w\.) capture that follows each full match
>         shift @matches;
>     }
> }
>
> # Extract title
> # Everything from the first period followed by a space to the next period.
> # Authors should have periods followed by either a letter or a comma
> # for this to work
> if ($text =~ m/\. (.*?)\./s) {
>     my $title = $1;
>     $title =~ s/\n/ /g;
>     foreach (@authors) {
>         print "$title: $_\n";
>     }
> }
> =====================================================================
>
> $ ./match_2.pl
> The effect of stress on reproduction in Atlantic cod: M.J. Morgan
> The effect of stress on reproduction in Atlantic cod: C.E. Wilson
> The effect of stress on reproduction in Atlantic cod: L.W. Crim
>
> All, please let me know if there is a way to combine both the regexes.
> I had a brain coredump before I gave up.
>
> Thanks,
> Sandip
>
Hasn't someone already solved this problem? Isn't there a CPAN module that
performs standardized bibliographic reference formatting/parsing? I haven't
looked at CPAN; did either of you? If a CPAN module doesn't exist, one
should!
Ken Wolcott
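
On Sandip's question about combining both regexes: one possible approach
is a single pattern that captures the whole author block, the year, and
the title in one pass, followed by a small loop that splits the author
block into names. This is only a rough sketch, not something from the
thread. The sub name parse_reference is made up for illustration, and it
assumes every entry looks like "Surname, I.N., ..., YEAR. Title. Rest",
with single-word surnames and no period inside the title.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread). Assumes the layout
#   Surname, I.N., Surname, I.N., YEAR. Title. Journal ...
# Returns (title, year, author1, author2, ...), or nothing if no match.
sub parse_reference {
    my ($text) = @_;
    $text =~ s/\s*\n\s*/ /g;     # join wrapped lines into one line
    $text =~ s/^\s+|\s+$//g;     # trim leading and trailing whitespace

    # One pattern for the author block, the year, and the title.
    my ($author_block, $year, $title) = $text =~ m{
        ^
        ( (?: \w+ , \s* (?: \w \. )+ , \s* )+ )   # Surname, I.N., repeated
        ( \d{4} ) \. \s+                          # four-digit year, period
        ( .+? ) \.                                # title, up to next period
    }x or return;

    # Split the author block into "I.N. Surname" strings.
    my @authors;
    while ($author_block =~ m/(\w+),\s*((?:\w\.)+),/g) {
        push @authors, "$2 $1";
    }
    return ($title, $year, @authors);
}

my $text = <<'END';
Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
END

my ($title, $year, @authors) = parse_reference($text);
print "$title: $_\n" for @authors;
```

For the sample entry this should print the same three lines as Sandip's
script. The non-capturing groups (?: ... ) avoid the extra capture that
the original loop has to shift away. Hyphenated or multi-word surnames,
or a title containing a period, would need a looser pattern.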