postcode regex problem - again.
From: Gary Stainburn
Date: December 18, 2002 07:43
Subject: postcode regex problem - again.
Message ID: 200212181537.59274.gary.stainburn@ringways.co.uk
Hi folks.
I'm trying to find the UK postcode somewhere inside an address, extract it,
and place it in a specific position. The data file I'm processing is
generated by a COBOL program and is fixed-length format text.
I've almost got it, but as you can see from the output, it's not quite right.
The split happens after the first letter of the postcode, not before it.
The postcode is of the format XX99 9XX, where the space is optional and the
leading 'XX' and '99' may each be a single character. The final 9XX is always
one digit followed by two letters,
e.g. WF10 5QQ, M5 5QQ.
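The format described above can be written down as a standalone pattern. This is only a rough sketch: it assumes the letters are always uppercase A-Z, and $postcode_re and looks_like_postcode are illustrative names that do not appear in pcoderun.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical pattern for the XX99 9XX shape described above:
# 1-2 letters, 1-2 digits, an optional space, 1 digit, 2 letters.
# Assumption: uppercase A-Z only.
my $postcode_re = qr/[A-Z]{1,2}\d{1,2}\s?\d[A-Z]{2}/;

# Hypothetical helper: true when the whole string has that shape.
sub looks_like_postcode {
    my ($text) = @_;
    return $text =~ /^$postcode_re$/ ? 1 : 0;
}

for my $candidate ('WF10 5QQ', 'M5 5QQ', 'LS85QP', 'LEEDS') {
    my $verdict = looks_like_postcode($candidate) ? 'matches' : 'does not match';
    print "$candidate: $verdict\n";
}
```

Using [A-Z] rather than \D for the letters matters here, because \D also matches spaces, commas and other punctuation.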
TT$ pcoderun <slexport.txt 2>&1 >slexport.gps|head
in=' LS8 5QP','' out=' L','S8 5QP'
in=' LS9 8HE','' out=' L','S9 8HE'
in=' LS28 6QW','' out=' L','S28 6QW'
in=' LS28 7UH','' out=' L','S28 7UH'
in='CO DURHAM DL1 2BL','' out='CO DURHAM D','L1 2BL'
in=' LS11 7NW','' out=' L','S11 7NW'
in=' BS2 0EQ','' out=' B','S2 0EQ'
in='LEEDS, LS12 6BN','' out='LEEDS, L','S12 6BN'
in=' LS11 0DS.','' out=' L','S11 0DS'
in='PRESTON, PR5 8AT.','' out='PRESTON, P','R5 8AT'
TT$ cat pcoderun
#!/usr/bin/perl -w
my $template="A40A30A30A30A30A10A9"; # matches COBOL file descriptor
while(<STDIN>) {
  my ($head,$addr1,$addr2,$addr3,$addr4,$pcode,$rest)=unpack($template,$_);
  if ($addr4) {
    splitit(\$addr4,\$pcode);
  } elsif ($addr3) {
    splitit(\$addr3,\$pcode);
  } elsif ($addr2) {
    splitit(\$addr2,\$pcode);
  }
  print pack($template,$head,$addr1,$addr2,$addr3,$addr4,$pcode,$rest),"\n";
}
sub splitit {
  my ($line,$pcode)=@_;
  if ($$line=~/^(.*)(\D{1,2}\d{1,2}\s{0,1}\d\D{2})\s*/) {
    print STDERR "in='$$line','$$pcode' out='$1','$2'\n";
    $$line=$1;
    $$pcode=$2;
  }
}
TT$
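The late split in the output is consistent with how the pattern is built: (.*) is greedy and \D{1,2} is satisfied by a single non-digit, so the first capture keeps the first postcode letter. What follows is a minimal sketch of one possible adjustment, not a tested replacement for splitit. It assumes the postcode letters are always uppercase A-Z, and split_postcode is a hypothetical helper name.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: returns (address_without_postcode, postcode),
# or an empty list when no postcode-shaped text is found.
# Assumption: postcode letters are uppercase A-Z.
sub split_postcode {
    my ($line) = @_;
    # [A-Z] cannot match a space or comma, and the \b before the
    # postcode stops the greedy (.*) from keeping its first letter.
    if ($line =~ /^(.*)\b([A-Z]{1,2}\d{1,2}\s?\d[A-Z]{2})\b/) {
        return ($1, $2);
    }
    return;
}

for my $addr (' LS8 5QP', 'CO DURHAM DL1 2BL', 'PRESTON, PR5 8AT.') {
    my ($rest, $pcode) = split_postcode($addr);
    print "in='$addr' out='$rest','$pcode'\n";
}
```

For the first sample line this gives out=' ','LS8 5QP' instead of out=' L','S8 5QP'. Making the first group non-greedy, as in (.*?), would be another option, but with \D left in place a leading space could still end up inside the captured postcode.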
--
Gary Stainburn
This email does not contain private or confidential material as it
may be snooped on by interested government parties for unknown
and undisclosed purposes - Regulation of Investigatory Powers Act, 2000