Re: Help with regular expressions
From: Sandip Bhattacharya
Date: May 9, 2011 12:05
Subject: Re: Help with regular expressions
Message ID: BANLkTik9snE8j7w94tobEbQwbkZY1-4guQ@mail.gmail.com
On Mon, May 9, 2011 at 11:44 PM, Tiago Hori <tiago.hori@gmail.com> wrote:
> I am trying to write a small script to parse bibliographic references like
> this:
>
> Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
> reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
>
> What I want to be able to do eventually is parse each name separately and
> associate that with the title. I am not sure how yet, but I haven't even got
> there.
I took a stab at this. It is not perfect and will not catch all possible
variations. In any case, unless the text in these entries follows fixed
rules, it is very difficult to catch them all.
=========================================================
#!/usr/bin/perl
#
use strict;
use warnings;
my $text = <<END;
Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
END
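my @authors = ();
# Extract authors
# Assuming each author is composed of one or more matches of:
# <SPACE>* WORD, <SPACE>* (ALPHABET PERIOD)+
# The regex has two capture groups, so with /g in list context it returns
# both captures for every match; the second one is discarded below.
if (my @matches = $text =~ m/(\s*\w+,\s*(\w\.)+),/gs) {
    while (@matches) {
        my $match = shift @matches;    # the "Surname, I.I." string
        my @comps = map { s/^ +//; s/ +$//; $_ } (split ",", $match);
        push @authors, join " ", @comps[1,0];    # initials first, then surname
        shift @matches;                # drop the capture of the inner group
    }
}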
# Extract title
# Everything from the first period followed by a space to the next period.
# Authors should have periods followed by either a letter or a comma
# for this to work
if ($text =~ m/\. (.*?)\./s) {
    my $title = $1;
    $title =~ s/\n/ /g;
    foreach (@authors) {
        print "$title: $_\n";
    }
}
=====================================================================
$ ./match_2.pl
The effect of stress on reproduction in Atlantic cod: M.J. Morgan
The effect of stress on reproduction in Atlantic cod: C.E. Wilson
The effect of stress on reproduction in Atlantic cod: L.W. Crim
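A possible simplification of the author extraction, as a sketch only: if
the initials group is made non-capturing with (?:...) and the surname and
initials are captured separately, each match can be handled in a plain
while loop and nothing has to be shifted away. The helper name
extract_authors is made up for this example, and the same assumptions
about the entry format apply.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): returns the authors
# of one reference as "Initials Surname" strings.
sub extract_authors {
    my ($text) = @_;
    my @authors;
    # $1 = surname, $2 = initials; (?:\w\.)+ groups without capturing,
    # so only the two outer groups are captured per match.
    while ($text =~ m/\s*(\w+),\s*((?:\w\.)+),/g) {
        push @authors, "$2 $1";
    }
    return @authors;
}

my $text = <<'END';
Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
END

print "$_\n" for extract_authors($text);
```

Like the original, this only recognises single-word surnames followed by
dotted initials.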
All, please let me know if there is a way to combine the two regexes.
I had a brain coredump before I gave up.
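One direction that might work, as a rough sketch only: a repeated capture
group keeps just its last iteration, so a single regex cannot return every
author on its own. It can, however, capture the whole author list, the
year and the title in one match, and a second small loop can then split
the author list. The helper name parse_reference and the assumed layout
(AUTHORS, four-digit YEAR, period, TITLE, period) are assumptions made for
this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parses one "AUTHORS, YEAR. TITLE. REST" entry.
# Returns a hash reference, or undef if the entry does not fit the layout.
sub parse_reference {
    my ($text) = @_;
    # $1 = whole author list, $2 = four-digit year,
    # $3 = title, i.e. everything up to the next period
    return unless $text =~
        m/^\s*((?:\w+,\s*(?:\w\.)+,\s*)+)(\d{4})\.\s+(.*?)\./s;
    my ($author_list, $year, $title) = ($1, $2, $3);
    $title =~ s/\s*\n\s*/ /g;    # join a title that wraps across lines

    my @authors;
    # Split the captured author list into "Initials Surname" strings.
    while ($author_list =~ m/(\w+),\s*((?:\w\.)+),/g) {
        push @authors, "$2 $1";
    }
    return { authors => \@authors, year => $year, title => $title };
}

my $text = <<'END';
Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
END

if (my $ref = parse_reference($text)) {
    print "$ref->{title}: $_\n" for @{ $ref->{authors} };
}
```

Anchoring on the year makes the title match less dependent on how the
periods in the author initials are spaced, but it fails for entries
without a four-digit year.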
Thanks,
Sandip