perl.beginners | Postings from July 2007

Re: parsing a line

From: Chas Owens
Date: July 2, 2007 10:10
Subject: Re: parsing a line
Message ID: 58ce48dc0707021010y60146574q2c6108e072eaab02@mail.gmail.com

On 7/1/07, alok nath <aloknathlight@yahoo.com> wrote:
>
> Hi Chas,
>   Can you please explain the portion ( ([\w ]*\w)\s*= )of the regex.?
>   And why its stored in $s.Can it be directly stored in hash my %rec.
> Thanks
> Alok
snip

First off, the results of the regex are not being stored in $s; the
regex is being applied to the string in $s.  Therefore the results are
being stored directly into %rec.
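
The code from the earlier message is not quoted here, so the following is
only a minimal sketch of that arrangement.  It assumes the three attributes
sit on one line in $s and that the pattern is applied with /g in list
context; the sample line and the /g usage are assumptions for illustration.
Each match returns a key and a value, and the flat list of captures is
assigned straight to %rec:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# assumed sample input, pieced together from the test strings in this message
my $s = '<Test Description = "Test 1" ID =  "ID A1" DirAbsolute = "C:/perl"/>';

# the regex is applied to $s; in list context with /g it returns every
# (key, value) capture pair, and that list populates %rec directly
my %rec = $s =~ /\s*([\w ]*\w)\s*=\s*"(.*?)"/g;

print "[$_] => [$rec{$_}]\n" for sort keys %rec;
```

Nothing is ever written to $s here; it is only read by the match operator.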

As to what the regex does: whenever you run up against a regex that
you don't understand, it is wise to break the regex into its simplest
parts and analyze how those parts interact with the input.  The script
at the end of this message does that and outputs

\s*:    [ ]
[\w ]*: [Test Descriptio]
\w:     [n]
\s*:    [ ]
=:      [=]
\s*:    [ ]
":      ["]
.*?:    [Test 1]
":      ["]

\s*:    [  ]
[\w ]*: [I]
\w:     [D]
\s*:    [  ]
=:      [=]
\s*:    [  ]
":      ["]
.*?:    [ID A1]
":      ["]

\s*:    [ ]
[\w ]*: [DirAbsolut]
\w:     [e]
\s*:    [ ]
=:      [=]
\s*:    [ ]
":      ["]
.*?:    [C:/perl]
":      ["]

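We can see from this that the pattern [\w ]*\w holds the key and the
pattern .*? holds the value, and that the constants match the
constants.  One caveat about the whitespace lines: \s* appears three
times in @parts and the script stores the captures in a hash keyed by
pattern, so every \s* line shows only the last capture (the whitespace
after the =).  The leading \s* actually matches nothing in these
strings.  One might reasonably ask why we are using [\w ]*\w instead
of just [\w ]+, since \w is already part of the character class.  Here
is what the output would look like if we did that: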

\s*:    [ ]
[\w ]+: [Test Description ]
\s*:    [ ]
=:      [=]
\s*:    [ ]
":      ["]
.*?:    [Test 1]
":      ["]

\s*:    [  ]
[\w ]+: [ID ]
\s*:    [  ]
=:      [=]
\s*:    [  ]
":      ["]
.*?:    [ID A1]
":      ["]

\s*:    [ ]
[\w ]+: [DirAbsolute ]
\s*:    [ ]
=:      [=]
\s*:    [ ]
":      ["]
.*?:    [C:/perl]
":      ["]

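Ah-ha, [\w ]+ swallows the space after the key, which leaves the \s*
before the = with nothing to match (the listing cannot show that
directly, because every \s* line prints the capture from the \s* after
the =).  This causes a problem if we are expecting to have
"DirAbsolute" rather than "DirAbsolute ".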
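
The difference can also be checked on the key alone.  This is a minimal
sketch, not part of the original script; the variable names $key_star and
$key_plus are made up for illustration:

```perl
#!/usr/bin/perl

use strict;
use warnings;

my $s = 'DirAbsolute = "C:/perl"/>';

# key pattern that must end in a word character
my ($key_star) = $s =~ /\s*([\w ]*\w)\s*=\s*".*?"/;

# key pattern that is allowed to end in a space
my ($key_plus) = $s =~ /\s*([\w ]+)\s*=\s*".*?"/;

print "[\\w ]*\\w: [$key_star]\n";
print "[\\w ]+:   [$key_plus]\n";
```

The first key comes back as "DirAbsolute" and the second as "DirAbsolute ",
trailing space included.  The full script used for the breakdowns above
follows.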

#!/usr/bin/perl

use strict;
use warnings;

#strings to test
my @a = (
        '<Test Description = "Test 1" ',
        'ID =  "ID A1" ',
        'DirAbsolute = "C:/perl"/>',
);

#regex broken down into individual parts
my @parts = (
        q(\s*),
        q([\w ]*),
        q(\w),
        q(\s*),
        q(=),
        q(\s*),
        q("),
        q(.*?),
        q(")
);

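#wrap each part in its own capture group
my $regex = join '', map { "($_)" } @parts;

for my $s (@a) {
        my %h;
        #hash slice: repeated keys (\s* and ") keep only their last capture
        @h{@parts} = $s =~ /$regex/;
        print  "\n", map { "$_:\t[$h{$_}]\n" } @parts;
}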
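
Because @parts repeats \s* and ", the hash slice above keeps only the last
capture for each repeated pattern.  A possible variant, sketched here as an
alternative rather than as the script that produced the listings, pairs each
part with its capture by position so that every group is reported
separately.  The @report array is a hypothetical addition that only collects
the captures for inspection:

```perl
#!/usr/bin/perl

use strict;
use warnings;

#strings to test
my @a = (
        '<Test Description = "Test 1" ',
        'ID =  "ID A1" ',
        'DirAbsolute = "C:/perl"/>',
);

#regex broken down into individual parts
my @parts = ( q(\s*), q([\w ]*), q(\w), q(\s*), q(=), q(\s*), q("), q(.*?), q(") );

#wrap each part in its own capture group
my $regex = join '', map { "($_)" } @parts;

my @report;    #one array ref of captures per test string
for my $s (@a) {
        #captures come back in the same order as @parts
        my @caps = $s =~ /$regex/;
        push @report, \@caps;
        print "\n", map { "$parts[$_]:\t[$caps[$_]]\n" } 0 .. $#parts;
}
```

With this version the leading \s* shows [], and for the ID line the \s*
before the = shows a single space while the one after it shows two.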
