new tutorial perlglobtut, translated
From: Wolfgang Laun
Date: March 3, 2007 14:44
Subject: new tutorial perlglobtut, translated
Message ID: 17de7ee80703031038h42d720a3td713ee294680cd63@mail.gmail.com
This is the translation.
-Wolfgang
=head1 NAME
=for roff .nh
perlglobtut - A tutorial on Perl's global variables and their implementation
as stashes and typeglobs
=head1 DESCRIPTION
=for roff .nh
This page describes Perl's global (package level) variables and their
implementation. The implementation of local (or lexical) variables and
the related description of I<scopes>, of the directives C<my> and C<our>
and of the function C<local> can be found in L<perlscopetut|perlscopetut>.
=for html <a name="fn_1"></a>
HINT: This text contains footnotes, marked like this: L<[1]|/item__5b1_5d>.
In browsers that permit navigation by following references, the text of
the footnote can be reached by clicking on the number in brackets, and
from there another reference points back into the main text.
=head2 Global Variables
=for roff .nh
I<Global variables> have been around since Perl's start -- if you don't
declare a variable, it'll be automatically created as global. Such
variables are accessible from I<anywhere> in the entire program, even
from code that's been loaded from some other Perl file. (Some modules
use such variables for configuration or debugging.) The visibility
of global names is somewhat restricted by the concept of I<namespace>,
as each I<package> is associated with a namespace of its own. But
prefixing the package name to the name of the global variable is
sufficient to make it accessible from anywhere else.
In spite of its name, the function C<local> does not influence a
variable's identity or visibility -- the variable is still reachable
from anywhere. C<local> just temporarily saves the variable's value,
establishes a new container for another value and thereby provides for
the automatic restoration of the previous one at some later time.
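The following sketch illustrates this behaviour of C<local>. The names C<$level>, C<show> and C<with_local> are made up for the illustration; the point is only that a subroutine called from within the scope of C<local> sees the temporary value of the very same global.

```perl
use strict;
use warnings;

our $level = 'outer';           # a global: $main::level

sub show { return $level }      # reads whatever value the global holds now

sub with_local {
    local $level = 'inner';     # old value is saved and restored on scope exit
    return show();              # the callee sees the temporary value
}

my @seen = (show(), with_local(), show());
print "@seen\n";                # prints "outer inner outer"
```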
=head3 Stashes and packages
=for roff .nh
Generally speaking, a variable identifies a place in the computer's
memory by a name. The first step during the access to a variable is the
translation of the name into some address value.
Perl provides a standard mechanism for associating a name with some
other value: the I<hash>. It may not come as a surprise that the
lookup of storage via global names is implemented by using a hash.
There is a separate hash for each package namespace, its keys being
the names of all variables, functions or handles existing within
the package. Such hashes function as symbol tables, and that's why
they are called I<symbol table hashes> or, briefly, I<stashes> (pun
intended).
A global variable always belongs to exactly one package, but may be
used from within any other package by prefixing the package
name of its "home" package. The stash name and the package name
are identical, but if you want to access the stash as a hash,
you have to append two colons, enabling the parser to distinguish
the stash name from an ordinary hash. Therefore, the name of the
stash that comes with the package C<Foo> is C<Z<>%Foo::>. (Note the
initial "%", indicating that this is a hash.) In code that doesn't
follow any C<package> directive, i.e., in the I<default package>,
all globals go into the stash called C<Z<>%main::>.
=for html <a name="fn_2"></a>
At compile time, a C<package> directive defines a new package and
establishes the accompanying hash if this doesn't exist yet. Any
new symbols that are neither fully qualified with an explicitly
provided package name nor explicitly made lexical will be entered
as new stash elements L<[2]|/item__5b2_5d>.
=for html <a name="fn_3"></a>
Code generated for execution records which package (or stash) to
access for a certain variable L<[3]|/item__5b3_5d>.
Just like a hash may contain references to other hashes, a stash may,
in addition to symbols, contain references to other stashes. This
provides for the realization of package hierarchies, e.g.,
Alpha::Beta::Gamma
Each of these stashes functions independently, and variables within
these stashes have no connection to each other. The stashes are
connected so that the top level stash contains a reference to the
subordinate stash. In the cited example, C<Z<>%Alpha::> contains an
element (with the key C<Beta::>) pointing to C<Z<>%Alpha::Beta::>, and
this stash has an element (with the key C<Gamma::>) referencing
C<Z<>%Alpha::Beta::Gamma::>. This permits a recursive search in
symbol tables (see the example in the section
L<Accessing symbol tables|/"Accessing symbol tables">).
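The chaining of stashes can be checked with a few C<exists> tests. This is a minimal sketch; the package C<Alpha::Beta::Gamma> and the variable C<$depth> exist only for the sake of the illustration. Note that the keys of the elements pointing to sub-stashes end in two colons.

```perl
use strict;
use warnings;

{
    package Alpha::Beta::Gamma;   # hypothetical package hierarchy
    our $depth = 3;
}

# Each stash holds an entry, keyed with trailing colons, for the next level.
my @found = (
    exists $main::{'Alpha::'}        ? 1 : 0,
    exists $Alpha::{'Beta::'}        ? 1 : 0,
    exists $Alpha::Beta::{'Gamma::'} ? 1 : 0,
);
print "@found\n";                 # prints "1 1 1"
```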
=head3 Stash elements
=for roff .nh
The keys of I<elements> within a stash are I<symbol names>, referring
to variables, functions or handles within the package that is
associated with the stash. These elements can be accessed just like
the elements in any other hash. If, for instance, there is an element
C<test> in the package C<Foo>, one may write
$Foo::{test}
Notice that the colons are not separators between the names of the package
and the variable; they are I<part of the name> of the stash C<Z<>%Foo::>.
In contrast, the expression
$Foo{test}
refers to the element C<test> of an "ordinary" hash C<Foo> (in some
arbitrary package).
Symbol elements in the default package C<Z<>%main::> can be accessed
likewise. Thus, its element C<test> would be accessed by writing
$main::{test}
Notice that the colons at the end of the name aren't significant except
for hashes, where they indicate that the hash is a stash. For all
other Perl data types they have no special meaning whatsoever; the scalar
C<$hello::> and the array C<Z<>@hello::> aren't different from any other
variables (except, maybe, by their unusual name). The colons are just
part of the name, and therefore C<$hello::> and C<$hello> are two
distinct variables.
Perl doesn't prohibit the access to stashes and their elements. They are
treated just like any other hash so that the functions C<each>, C<exists>,
C<keys>, C<values> and C<delete> can be applied in the usual way. This
lets you search and even modify symbol tables from within the Perl
program they belong to. Several modules make good use of this possibility.
(Examples can be found in the section
L<Accessing symbol tables|/"Accessing symbol tables">).
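As a small illustration of read-only access, the sketch below lists the symbols of a package and looks at one stash element. The package C<Foo> and its contents are assumptions made up for this example.

```perl
use strict;
use warnings;

{
    package Foo;                  # hypothetical package
    our $test = 'scalar';
    our @list = (1, 2);
    sub hello { return 'hi' }
}

# keys and exists work on a stash as on any other hash.
my @names = sort keys %Foo::;
print "symbols in Foo: @names\n";
print "test is present\n"  if exists $Foo::{test};
print "nosuch is absent\n" unless exists $Foo::{nosuch};

# The element for a variable is a typeglob; it stringifies as its name.
my $entry = '' . $Foo::{test};
print "$entry\n";                 # prints "*Foo::test"
```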
It should be fairly obvious that modifications are somewhat dangerous.
After
undef %main::;
or
%main:: = (NO => 'MORE');
your program is likely to suffer. Also, deleting stash elements with
C<delete> should be done with utmost care. Perl, as usual, lets you
do as you please, while expecting you to know what you are doing.
=head3 Typeglobs
=for roff .nh
In contrast to most other programming languages, Perl permits using
the same name for a variable and a function. So it's possible for
a scalar variable C<$test>, an array C<@test>, a hash C<%test>, a
function C<test> and even an IO handle C<test> to exist next to each
other, without any mutual influence.
This is achieved by a component called a B<typeglob>. You may imagine
this as a structure with six fields (or I<slots>), each of which is
dedicated to one of the data types available in Perl.
=over 3
=item *
Scalars (including references)
=item *
Arrays
=item *
Hashes (including stashes)
=item *
Functions
=item *
IO handles (for files, directories, pipes or sockets)
=item *
Format handles
=back
=for html <a name="fn_4"></a>
Frequently only one slot is occupied, while all others remain void
L<[4]|/item__5b4_5d>. If, however, a script contains a scalar and
an array with identical names, then these two variables are stored
in the pertaining slots of the very same typeglob. No other relationship
than having the same name is implied by this neighbourhood. They
might just as well have different names. It is up to the programmer to
use or to avoid identical names for different objects L<[5]|/item__5b5_5d>.
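A short sketch may make the coexistence of equally named objects more tangible. It declares a scalar, an array, a hash and a function, all called C<test> (names chosen only for this illustration), and reports which slots of the typeglob C<*test> are occupied. The slot access syntax C<*test{...}> used here is explained further below.

```perl
use strict;
use warnings;

our $test = 'a scalar';
our @test = (1, 2, 3);
our %test = (key => 'value');
sub test { return 'a function' }

# Report which slots of *test hold something.
# (The SCALAR slot always yields a reference, so it always counts as occupied.)
my %state;
for my $slot (qw(SCALAR ARRAY HASH CODE IO FORMAT)) {
    $state{$slot} = defined(*test{$slot}) ? 'occupied' : 'empty';
    printf "%-6s %s\n", $slot, $state{$slot};
}
```

With the declarations shown, the first four slots are reported as occupied, and the C<IO> and C<FORMAT> slots as empty.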
=for html <a name="fn_5"></a>
What is the connection between stash elements and typeglobs? You
may have guessed the answer: I<typeglobs> are stash element I<values>.
Somewhat simplified, this is what happens when the variable
C<$Foo::test> is accessed:
=over 3
=item *
=for html <a name="fn_6"></a>
Stash C<Z<>%Foo::> is searched for an element with the key C<test>.
The value of the element is a typeglob L<[6]|/item__5b6_5d>.
=item *
Seeing that there is a "$" in front and neither an array selector ("[...]")
nor a hash key ("{...}") after, the scalar slot of the typeglob is selected.
=back
The I<sigil> ("$", "@", or "%") in variable names acts like a selector for
the typeglob slot. With handles, where there is no sigil, the slot is
implied from the context wherein the name occurs. The same goes for
functions which may be called with the sigil "&", which is frequently
omitted in favour of C<name()>.
The typeglob itself can be accessed by prefixing the sigil "*". Thus,
all three expressions below access the same typeglob in the package C<main>:
$main::{var} *main::var *var
The first one is only evaluated at runtime because it represents an
access to a hash element whose contents aren't available at compile time.
The other two are evaluated during compilation, producing the address
of the typeglob. This will be faster at runtime, but it is less flexible.
(Notice that the key in the first expression could come from an
arbitrary expression evaluating to C<'var'>.)
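The equivalence of the three expressions can be checked as sketched below. The variables C<$key>, C<@globs>, C<@names> and C<$value> are introduced only for this illustration.

```perl
use strict;
use warnings;

our $var = 42;

my $key   = 'v' . 'ar';                 # the key may come from any expression
my @globs = ($main::{$key}, *main::var, *var);
my @names = map { "$_" } @globs;        # a typeglob stringifies as its name
print "$_\n" for @names;                # prints "*main::var" three times

# Dereference the stash element and pick the scalar slot.
my $value = ${ *{ $main::{$key} }{SCALAR} };
print "$value\n";                       # prints "42"
```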
=head3 Typeglobs and references
=for roff .nh
In the previous subsection we have explained how typeglobs provide fields
or I<slots> for Perl's six different types of values. This isn't quite
true since a slot, if occupied, contains a I<reference> pointing to the
place where the value really is stored. These references aren't any
different from Perl's references to scalars, arrays, hashes, functions
or handles. Since the slots can be accessed, we can retrieve or even
replace these references. This is described in the paragraphs below.
=head4 Assigning typeglobs
=for roff .nh
It's possible to assign a reference to a typeglob. This is not quite
the same as assigning a reference to a scalar, because Perl selects
the appropriate slot according to the type of the reference. (Assigning
anything to a scalar always affects the scalar slot.)
$x = 3;           ${*x} = 3;           *x = \do{3};
@y = (1, 2);      @{*y} = (1, 2);      *y = [1, 2];
%z = (s => 3);    %{*z} = (s => 3);    *z = {s => 3};
sub a {...};                           *a = sub {...};
Columns one and two have the same effect: a (new) value is assigned to
a variable. Column two illustrates the access to the typeglob, resulting
in what you may imagine as a generic reference. The sigil and the
braces cause dereferencing through the appropriate slot of the
typeglob, and the right hand side is assigned to the variable that's
there at the other end of the reference (or a newly created variable).
(There is no entry in the last line of column two, because there is
no way of assigning anything to what is pointed to by C<&{*a}> --
this is code!)
Before discussing what the third column does, the strange construct
C<\do{3}> warrants an explanation. It produces a reference to the
anonymous scalar variable containing the result of the do block.
Another bizarre looking way of achieving this would be C<< \sub{4}->() >>,
creating a reference to the result of the call of an anonymous subroutine.
(Why can't we use C<\3>? We'll discuss this a little later.)
The third column also uses an access to the typeglob, yielding a
generic reference on the left hand side. If there is a reference value
on the right hand side (and there had better be), the appropriate slot,
according to type, is replaced with that reference. You might be
tempted to think that this isn't different from what goes on in the
other two columns, but consider this sequence of actions:
our $x = 1;
our $xref = \$x;
print "\$x=$x, \$\$xref=$$xref\n"; # $x=1, $$xref=1
${*x} = 2;
print "\$x=$x, \$\$xref=$$xref\n"; # $x=2, $$xref=2
*x = \do{3};
print "\$x=$x, \$\$xref=$$xref\n"; # $x=3, $$xref=2 (!)
The subtle difference is that assigning a reference to a typeglob
replaces the reference in the slot, but leaves intact whatever the old
reference pointed to. This doesn't make any difference when the
assignment destroys the last reference to the old value, but
otherwise, watch out!
It is indeed possible to create a reference to a scalar by writing
something like C<Z<>\3> or C<Z<>\"ABC">. But if you assign this
reference value to a slot in a typeglob:
*x = \3; # makes $x read-only!
variable C<$x> stops being variable. Referencing a constant returns
a reference to something in read-only storage.
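The effect can be observed by trapping the error with C<eval>, as in this small sketch (the helper variables C<$ok> and C<$error> are assumptions of the example):

```perl
use strict;
use warnings;

our $x = 1;
*x = \3;                          # the scalar slot now points at a constant

my $ok    = eval { $x = 4; 1 };   # attempt to modify $x
my $error = $ok ? '' : $@;
print "assignment failed: $error" unless $ok;
print "\$x is still $x\n";        # prints "$x is still 3"
```

The trapped message reads "Modification of a read-only value attempted", followed by the location.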
The usual way of defining a subroutine using C<sub> followed by the
name and the body block is shown in the last line of the first column.
The remarkable possibility of a dynamic subroutine definition that
may then be called with a constant name is illustrated in the last
line of column three, where an anonymous subroutine reference is
slotted into a typeglob, making it callable as C<a()>. Indeed, you may
think of a declaration
sub mysub {
...
}
as being processed by the Perl compiler like
BEGIN {
*mysub = sub { ... };
}
=for html <a name="fn_7"></a>
although the first version is somewhat faster, because it is processed
at compile time L<[7]|/item__5b7_5d>.
The technique of generating functions at run time by assigning a
code reference to a typeglob is quite popular and is, for instance,
used by L<wrapper functions|/"Creating wrappers">.
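As a minimal sketch of this technique, the loop below generates one function per name in a list. The tag names and the generated functions are made up for this illustration; since the typeglob is addressed by a string, C<strict 'refs'> has to be disabled for the assignment.

```perl
use strict;
use warnings;

# Generate one tag-wrapping function per (hypothetical) tag name.
for my $tag (qw(em strong code)) {
    no strict 'refs';                        # the glob is named by a string
    *{"main::$tag"} = sub { return "<$tag>" . join('', @_) . "</$tag>" };
}

my $html = strong('typeglobs') . ' and ' . em('stashes');
print "$html\n";    # prints "<strong>typeglobs</strong> and <em>stashes</em>"
```

Each anonymous subroutine is a closure over its own C<$tag>, so the generated functions don't interfere with each other.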
Another point that's not to be missed is the assignment of a typeglob
to another typeglob:
*alpha = *beta;
Its effect is that you'll get another typeglob so that both typeglobs
share their slots. If there is an assignment to C<$alpha>, it also
affects C<$beta>, and vice versa. The same is true for C<Z<>@beta>,
C<Z<>%beta>, C<Z<>&beta> and the handles C<beta>. This technique is
known as I<aliasing> and it is the foundation for modules such as
L<Exporter|Exporter>, which map symbols (usually subroutine names)
of one package into another package so that foreign, imported subroutines
can be called just like local functions, without prefixing them with their
home package name. (Also see the section about
L<aliasing and importing|/"Aliasing and importing">).
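A short sketch of the effect of such an assignment, using only the names C<alpha> and C<beta> from the text:

```perl
use strict;
use warnings;

our ($beta, @beta) = ('b', 1, 2);
our ($alpha, @alpha);

*alpha = *beta;          # alias every slot of *alpha to *beta

$alpha = 'changed';      # really modifies $beta
push @alpha, 3;          # really modifies @beta
print "$beta @beta\n";   # prints "changed 1 2 3"
```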
If you try to assign anything that's not a reference to a typeglob:
*hugo = 'test';
Perl assumes that some alias assignment is taking place and will do
*hugo = *test;
The compiler generates code for what is written, i.e., the assignment
of a string value to a typeglob, and the switch to an alias assignment
happens only at run time. So, not even C<use warnings> will give you a
warning if you assign a non-reference value to a typeglob.
=for html <a name="fn_8"></a>
Typeglobs cannot be declared with C<our>, and that's why they can be
accessed without declaration and without package name even when
C<use strict 'vars'> is in effect L<[8]|/item__5b8_5d>.
Finally we'll have a quick look at a statement like
undef *bar;
which propagates to all slots of the typeglob, i.e., it is equivalent to
undef $bar;
undef @bar;
undef %bar;
...
Here the "*" acts like a wildcard character, resulting in a command
"undefine everything called C<bar>".
Notice well that deleting a stash element isn't the same as setting
all slots to undef. If you do a
delete $Foo::{bar};
the stash element is gone, but a subsequent access (within package C<Foo>) as in
print $bar;
will work nonetheless, because the transformation of the name (or
stash key) that's used in the print statement into a typeglob address
has been done during compilation. Here, the stash isn't required, and
deleting the C<bar> entry has no effect.
Using a symbolic reference, however, is quite another thing. It
requires the stash for lookup at run time, and therefore
my $name = 'bar';
print ${$name};
cannot access C<$bar> any more.
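Both cases can be tried out in one small script. This is a sketch only; for brevity it works in package C<main> rather than in C<Foo>, and the variables C<$direct> and C<$symbolic> are assumptions of the example.

```perl
use strict;
use warnings;

our $bar = 'still here';

delete $main::{bar};                 # remove the stash element at run time

# This access was bound to the typeglob during compilation.
my $direct = $bar;

# A symbolic reference looks the name up in the stash at run time and
# ends up with a fresh, empty typeglob.
my $symbolic = do { no strict 'refs'; my $name = 'bar'; ${$name} };

print "direct:   $direct\n";
print "symbolic: ", (defined($symbolic) ? $symbolic : '(undefined)'), "\n";
```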
If such a deletion happens in a C<BEGIN> block, the program is likely
to fail. Once more: modifying a stash, or a stash element, in a C<BEGIN>
block should be done with utmost consideration.
Typeglobs are there for storing references, and that's why trying to
assign C<undef> to a typeglob, as in
*bar = undef;
is ignored and elicits a warning (if enabled):
Undefined value assigned to typeglob
Valid, however, is the assignment
*bar = \undef;
=for html <a name="fn_9"></a>
and you are invited to figure out what will happen here (but you
may look it up in footnote L<[9]|/item__5b9_5d>).
=head4 Fetching typeglob contents
=for roff .nh
The references contained within typeglobs can be retrieved and assigned
to some scalar, or used in some other way, just like any other reference.
However, the construct
$globref = *glob;
wouldn't do, as it doesn't indicate I<which> reference (or
which slot) is to be fetched. To resolve this, Perl provides
standard hash keys for selecting the six distinct slots of a typeglob:
$scalarref = *glob{SCALAR};
$arrayref = *glob{ARRAY};
$hashref = *glob{HASH};
$coderef = *glob{CODE};
$handleref = *glob{IO};
$formatref = *glob{FORMAT};
All of these assignments result in a reference of the indicated category,
which can then be dereferenced in the usual way. So, for instance,
@{$arrayref}
is an access to the array whose reference was taken from the array slot
of the typeglob. In other words, the following expressions are identical:
\@ary *ary{ARRAY}
and therefore these are, too:
@ary @{*ary{ARRAY}}
While the first expression in both pairs is usually used, the second one
illustrates what happens behind the scenes.
If some slot in a typeglob isn't occupied, an access to it returns
C<undef>, except for the C<SCALAR> slot, which returns a I<reference> to
C<undef>. This special case has to be considered in all algorithms
that access a typeglob. (See the first example in the section
L<Accessing symbol tables|/"Accessing symbol tables">.)
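The following sketch checks these statements for a typeglob with only an array in it. The result variables C<$same>, C<$hash> and C<$scalar> are made up for the illustration.

```perl
use strict;
use warnings;

our @ary = (1, 2, 3);

# Both expressions yield a reference to the very same array.
my $same   = \@ary == *ary{ARRAY}  ? 'same array' : 'different arrays';
# There is no %ary, so the HASH slot is unoccupied.
my $hash   = defined(*ary{HASH})   ? 'occupied'   : 'empty';
# The SCALAR slot yields a reference although $ary is never used.
my $scalar = ref(*ary{SCALAR});

print "$same\n";                                   # prints "same array"
print "HASH slot: $hash\n";                        # prints "HASH slot: empty"
print "SCALAR slot holds a $scalar reference\n";
```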
The close relationship between typeglobs and references permits us to
use a typeglob in places where a reference is expected (but in
general not the other way round).
There are some more hash keys that can be applied to a typeglob:
$name = *glob{NAME};
$package = *glob{PACKAGE};
$globref = *glob{GLOB};
These return the symbol name (i.e., the key of the stash element
containing the typeglob), the package (or stash) name and a reference
to the typeglob itself, respectively. These three typeglob "slots"
can only be used as rvalues, i.e., you cannot assign to them. The
parser simply refuses to do any explicit assignment to a typeglob slot.
At first glance, these last three typeglob entries don't appear to be
particularly interesting. (C<\*glob> is the same as C<*glob{GLOB}>,
and C<'glob'> is returned by C<*glob{NAME}>, so what?) But they are
important whenever a typeglob is passed as an argument to a subroutine,
where package and symbol name can be retrieved from these slots.
(See the section about
L<using stashes and typeglobs|/"Using stashes and typeglobs">).
Notice that, even though the syntax looks like an access to a hash,
a typeglob isn't a hash, so that it isn't legal to use any other "key"
except one of these "hardwired" ones. Other values (such as
C<Z<>*glob{XYZ}>) simply return C<undef>.
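As a sketch of how C<NAME> and C<PACKAGE> may be put to use, the hypothetical subroutine C<glob_name> below rebuilds the fully qualified symbol name from a typeglob reference passed as its argument. The package C<Foo> and the array C<@list> are likewise assumptions of the example.

```perl
use strict;
use warnings;

{
    package Foo;                 # hypothetical package
    our @list = (1, 2, 3);
}

# Build the fully qualified symbol name from a typeglob reference.
sub glob_name {
    my $globref = shift;
    return *{$globref}{PACKAGE} . '::' . *{$globref}{NAME};
}

my @names = (glob_name(\*Foo::list), glob_name(\*STDOUT));
print "$_\n" for @names;         # prints "Foo::list" and "main::STDOUT"
```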
A typeglob itself is treated like a scalar in assignments. That's why
%bar = ( key11 => 'val1' );
$glob = *bar;
print $glob; # prints "*main::bar" (in package main)
is perfectly legal. The scalar C<$Z<>glob> now contains the
typeglob C<*Z<>bar>, and you can use
*{$glob}
to get back at the typeglob or use something like
$slot = 'HASH';
my %hash = %{*{$glob}{$slot}};
to retrieve the original hash (i.e., C<%Z<>bar>) value via the typeglob.
Since typeglobs can be stored in scalar values, they can just as well
be stored as values in hashes or arrays, or passed as subroutine
arguments.
It's important to realize that a typeglob copied to a scalar isn't
quite the same as the real thing. It is a I<fake copy> from which all
slots can be retrieved just like from the original typeglob, but any
effort to assign anything to one of its slots will destroy the entire
copy.
$value = *glob; # $value is a fake copy of *glob
print $value; # prints "*main::glob"
*{$value} = [1,2,3]; # try to change the ARRAY slot
print $value; # prints "ARRAY(0x8630810)", an array reference
The original typeglob (here: C<Z<>*glob>) isn't affected, of course.
If you want to modify typeglobs via some handy scalar, you should
consider using I<glob references>, as discussed in the next paragraph.
Another option would be to reconvert the fake copy into a typeglob proper:
*newglob = $value;
This duplicates the typeglob, and it's now possible to access the
contents of the original typeglob via the new one. This is identical
to the previously described method of I<aliasing>, i.e., both typeglobs
share their slots, which now may be modified via the copy, too.
It is also possible to obtain a reference to a typeglob by applying
the referencing operator C<\> to the typeglob, or by accessing the
C<GLOB> slot:
$globref = \*glob;
$globref = *glob{GLOB};
Both expressions return a reference to the typeglob. Using typeglob
references avoids fake copies. They are also useful as subroutine
arguments. The function C<ref()> lets you distinguish this reference
category from any other:
sub useGlobRef {
my $arg = shift;
die( "not a glob reference" ) unless ref( $arg ) eq 'GLOB';
print "a glob reference to ", *{$arg}, "\n";
# ...
}
It is, as usual, important to ensure that the reference has the correct
type before dereferencing, to avoid a fatal runtime error. It would be
possible to declare such a subroutine with the argument prototype C<Z<>*>:
sub useGlobRef(*) {
...
}
Now a typeglob as well as a typeglob reference would I<always> be
passed in as a typeglob reference.
We notice, again, that the syntax for dereferencing a typeglob reference
and for accessing a typeglob via a fake copy is the same:
$globvar = *glob; # scalar contains a fake copy
$globref = \*glob; # scalar contains a typeglob reference
*newglob = *{$globvar}; # or
*newglob = *{$globref};
=head3 Using stashes and typeglobs
=for roff .nh
So much about the theory behind stashes and typeglobs. How can they
be put to good use?
The following sections discuss possible uses and present examples:
=over 3
=item *
L<Accessing symbol tables|/"Accessing symbol tables">
=item *
L<Aliasing and importing|/"Aliasing and importing">
=item *
L<Creating wrappers|/"Creating wrappers">
=item *
L<Working with filehandles and formats|/"Working with filehandles and formats">
=back
Another reason for using typeglobs in Perl 4 was argument passing
by reference, because the referencing and dereferencing operators
(C<\> and C<< -> >>) weren't available. Instead, the close affinity
between typeglobs and references was exploited. Passing an array by
reference to a subroutine, for instance, was written like this:
sub my_func {
*list = shift;
print join ',', @list; # prints "1,2,3"
@list = (2,4,6);
}
@val = (1,2,3);
my_func(*val);
print join ',', @val; # prints "2,4,6"
Passing the typeglob to the subroutine includes passing all possible
references contained therein. Any changes to the array accessed via
the copied typeglob C<list> affect the array C<@val>.
This coding style became obsolete in Perl 5 by the introduction of true
references and shouldn't be used anymore. Maybe it is still around in
legacy code, so that knowing about this possibility might be useful.
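For comparison, here is a sketch of the same subroutine written with a true array reference. It keeps the names of the legacy example, except that the joined list is returned instead of printed inside the subroutine.

```perl
use strict;
use warnings;

sub my_func {
    my $list   = shift;                  # an array reference
    my $before = join ',', @$list;
    @$list = (2, 4, 6);                  # modifies the caller's array
    return $before;
}

my @val    = (1, 2, 3);
my $before = my_func(\@val);
print "$before\n";                       # prints "1,2,3"
print join(',', @val), "\n";             # prints "2,4,6"
```

In contrast to the typeglob version, only the array is passed, and no global C<*list> is clobbered.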
=head4 Accessing symbol tables
=for roff .nh
Accessing symbol tables can be useful for testing and debugging.
As symbol tables are implemented as hashes, Perl's functions for
hashes can be applied as usual, letting you iterate through symbol
tables and process their entries. The Perl debugger uses this
in its C<V>- and C<X>-commands.
The following example demonstrates a recursive iteration through stashes,
while exploiting the hierarchical structure of packages as described in the
section about L<Stashes and packages|/"Stashes and packages">. (The line
numbers are used for referencing in the explanation below.)
1 my @slots = qw(SCALAR ARRAY HASH CODE IO FORMAT);
2 sub stash_dump {
3 my $stash = shift || 'main::';
4 $stash .= '::' unless $stash =~ /::$/;
5 my $recflg = shift;
6 show_globs($stash, 0, $recflg);
7 }
8 sub show_globs {
9 no strict 'refs';
10 my ($stash, $index, $recflg) = @_;
11 foreach my $glob (values %{$stash}) {
12 my $name = *{$glob}{NAME};
13 next if $name eq 'main::';
14 my $fullname = $stash . $name;
15 foreach my $slot (@slots) {
16 my $text = ' ' x $index . '*' . $fullname . "{$slot}\n";
17 if ($slot eq 'SCALAR') {
18 print $text if defined ${$glob};
19 }
20 else {
21 print $text if defined *{$glob}{$slot};
22 }
23 }
24 show_globs($fullname,$index+1,1) if $name =~ /::$/ && $recflg;
25 }
26 }
Subroutine C<stash_dump> prints all typeglobs and their occupied slots
from the stash the name of which is passed as the first argument.
(C<Z<>main::> is used as the default.) If the second argument is true,
it will recursively process stashes of sub-packages as well.
Line 1 defines a list of Perl's data types as they occur in typeglobs.
Reduce the list if you aren't interested in some of them.
The C<no strict 'refs'> in line 9 is important if an overall
C<use strict 'refs'> is in effect, because line 11 uses
a symbolic reference to get the stash (C<Z<>%{$stash}>). The same
bypass is needed whenever typeglobs are to be accessed in this manner:
my $name = 'ENV';
print %{*{$name}{HASH}}; # print %ENV
With C<use strict 'refs'>, this snippet produces the runtime error
Can't use string ("ENV") as a symbol ref while "strict refs" in use
This is a frequent experience when digging around in symbol tables and
stashes. We'll see some more examples later on where it is necessary
to disable C<strict 'refs'> temporarily.
=for html <a name="fn_10"></a>
Another remarkable line is number 13 -- it avoids an infinite loop.
The default package is commonly referred to as 'main', but this is
just an alias for ''. You can retrieve its one and only stash
by C<%{'::'}> or by C<%{'main::'}>, but the latter will find it by
accessing
*{\%{'::'}->{'main::'}}{HASH}
This self referencing property must be handled as shown
L<[10]|/item__5b10_5d>.
Finally, the difference between processing a scalar slot and any other
slot deserves some attention. Lines 17 through 22 show that the
(already mentioned) peculiarity of a scalar slot of being I<always>
occupied makes it necessary to dereference this entry to decide whether
the scalar there is defined or not. This means that you cannot
distinguish a non-existing scalar from an existing undefined one.
With all other slots it is sufficient to check whether they are defined.
No printing of variable I<values> is implemented since this isn't
possible (for code and handles) or requires intricate handling (for
arrays and hashes). Even scalars may be tricky if they contain
non-graphic characters. The interested reader is referred to
C<Data::Dumper>.
Talking about non-graphic characters: even names of variables as they
occur in a stash may contain oddities. The variable C<${^UNICODE}>,
for instance, has C<Z<>^U> as its initial character (0x15 on ASCII
systems). Be prepared for this and similar effects if you dump
the stash C<%{'main::'}>.
Another example illustrates the deletion of symbol tables. The full
version can be found in the module C<Symbol.pm>.
1 sub delete_package {
2 my $pkg = shift;
3 $pkg = "main::$pkg" unless $pkg =~ /^main::/;
4 $pkg .= '::' unless $pkg =~ /::$/;
5 my ($stem,$leaf) = $pkg =~ /(.*::)(\w+::)$/;
6 my $stem_symtab = *{$stem}{HASH};
7 return unless exists $stem_symtab->{$leaf};
8 my $leaf_symtab = *{$stem_symtab->{$leaf}}{HASH};
9 foreach my $name (keys %$leaf_symtab) {
10 undef *{$pkg . $name};
11 }
12 %$leaf_symtab = ();
13 delete $stem_symtab->{$leaf};
14 }
In lines 6 and 8 we see how a stash is accessed via a hash reference
that's taken straight from some slot in a typeglob. Let's have a closer
look:
my $stem_symtab = *{$stem}{HASH};
This retrieves the HASH slot from a typeglob, the name of which is
in C<$stem>, the result being a hash reference. In line 7, the
dereferencing operation checks whether the specified element, i.e.,
the package contained within this stash, really exists:
return unless exists $stem_symtab->{$leaf};
If it doesn't exist, the subroutine returns -- a non-existent package
doesn't have to be deleted. Otherwise the stash that this element
points to is retrieved:
my $leaf_symtab = *{$stem_symtab->{$leaf}}{HASH};
By eliminating the auxiliary variable C<$stem_symtab> we obtain the
expression
my $leaf_symtab = *{*{$stem}{HASH}->{$leaf}}{HASH};
Analyzing this step by step we see:
=over 3
=item *
C<Z<>*{$stem}> returns the typeglob of the stash, whose name is in C<$stem>.
=item *
C<Z<>*{$stem}{HASH}> accesses the HASH slot of this typeglob, yielding
a hash reference.
=item *
C<< *{$stem}{HASH}->{$leaf} >> dereferences the hash element whose name
is in C<$leaf>. The value of this element is another typeglob, which
is accessed by
=item *
C<< *{*{$stem}{HASH}->{$leaf}} >>. From that typeglob, the expression
=item *
C<< *{*{$stem}{HASH}->{$leaf}}{HASH} >> takes its HASH slot, which
returns yet another hash reference -- this time it's the one of the
desired stash. In the loop starting in line 9 the elements of this
hash are visited.
=back
Setting C<$stem> to C<"main::"> and C<$leaf> to C<"Alpha::">, these
lines can be written as:
6 my $stem_symtab = *main::{HASH};
7 return unless exists $stem_symtab->{'Alpha::'};
8 my $leaf_symtab = *{$stem_symtab->{'Alpha::'}}{HASH};
This ought to show clearly what's going on.
In line 10 the typeglobs in C<$leaf_symtab> are set to undef, one by
one. This also deletes all variables with the corresponding names.
In line 12 the stash is set to an empty hash before it is (in line 13)
finally deleted from the parent symbol table.
The subroutine is short, but exhaustive. It deletes all the data contained
in the specified package and in all of its sub-packages. Use this with
caution.
The function C<gensym> from the same module is another interesting
example. It returns a reference to a new typeglob, which may then be
used in place of a filehandle:
package Symbol;
my $genseq = 0;
my $genpkg = "Symbol::";
sub gensym () {
my $name = "GEN" . $genseq++;
my $ref = \*{$genpkg . $name};
delete $$genpkg{$name};
$ref;
}
The function creates a unique name for a new typeglob (C<Symbol::GEN>,
followed by a number). Taking the reference of the typeglob with this
name creates the typeglob in the stash of the package C<Symbol>. The
newly created stash element is deleted at once, but the typeglob remains
alive, because C<$ref> still contains the reference. The typeglob
becomes I<anonymous>. The reference is returned to the caller.
Both the creation and the deletion of the typeglob occur at runtime,
once for each call of C<gensym()>.
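A possible use of C<gensym> is sketched below. To keep the example self-contained, it reads from an in-memory file (a reference to the scalar C<$data>, which is an assumption of the example) rather than from a file on disk.

```perl
use strict;
use warnings;
use Symbol qw(gensym);

my $data = "first line\nsecond line\n";

my $fh = gensym();                        # reference to an anonymous typeglob
open($fh, '<', \$data) or die "cannot open in-memory file: $!";
my $line = <$fh>;                         # reads "first line\n"
close $fh;

my $kind = ref $fh;
print "${kind}: $line";                   # prints "GLOB: first line"
```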
=head4 Aliasing and importing
=for roff .nh
As already mentioned briefly, I<aliasing> is the term used for accessing
the same data by different names. After the typeglob assignment
*beta = *alpha;
any variable named C<alpha> may be referred to as C<beta>. This isn't
particularly appealing, but as soon as we consider different packages,
the importance of aliasing begins to surface.
package Alpha;
sub testfunc {
...
}
*main::testfunc = *testfunc;
After the assignment, the subroutine C<testfunc()> can be called in the
package C<main> without its home package name C<Alpha>.
This method, however, has a considerable side effect: it
causes not only the subroutine C<testfunc> to be available in C<main>
but also all variables, of any type, and handles and formats as well!
Thus, if both package C<Alpha> and package C<main> happen to sprout
their own scalar C<$testfunc>, this assignment is bound to cause problems
because now these two are merged as well, presumably inadvertently.
That's why there is also a weaker version of aliasing, which is known
as I<partial> aliasing. The method isn't entirely new -- instead of
assigning the entire typeglob, only the slot for the specific type
(most frequently the code slot) is copied. Returning to the previous
example, this would be better written as
*main::testfunc = *testfunc{CODE};
or, using the somewhat more lucid referencing operator C<\>:
*main::testfunc = \&testfunc;
This will only change the code slot of C<Z<>*main::testfunc>, assigning
the value C<Z<>*Alpha::testfunc{CODE}> to it. From now on, both code
slots refer to the very same subroutine. All other slots of both typeglobs
remain as they were, so that, for instance, C<$main::testfunc> and
C<$Alpha::testfunc> still are two different variables.
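A short, self-contained sketch of partial aliasing follows. Both packages live in one file here, which is an assumption made only to keep the example runnable; the scalar values are invented.

```perl
use strict;
use warnings;

{
    package Alpha;
    our $testfunc = 'scalar of Alpha';
    sub testfunc { return 'Alpha::testfunc called' }
}

our $testfunc = 'scalar of main';

# Partial aliasing: only the code slot of *main::testfunc is set.
*main::testfunc = \&Alpha::testfunc;

my $result      = testfunc();
my $same_code   = \&main::testfunc == \&Alpha::testfunc;
my $same_scalar = \$main::testfunc == \$Alpha::testfunc;

print "$result\n";
print "main scalar: $testfunc\n";
print "code shared: ", ($same_code ? 'yes' : 'no'),
      ", scalar shared: ", ($same_scalar ? 'yes' : 'no'), "\n";
```

Replacing the assignment by C<*main::testfunc = *Alpha::testfunc;> would merge the two scalars as well.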
And this is the principle of operation of the famous module C<Exporter>,
whose C<import()> method does this little sleight of slot with all
symbol names in array C<Z<>@EXPORT> and the symbol names in C<Z<>@EXPORT_OK>
as requested by the client. The code below is copied from
C<Exporter/Heavy.pm>:
1 foreach $sym (@imports) {
2 (*{"${callpkg}::$sym"} = \&{"${pkg}::$sym"}, next)
3 unless $sym =~ s/^(\W)//;
4 $type = $1;
5 *{"${callpkg}::$sym"} =
6 $type eq '&' ? \&{"${pkg}::$sym"} :
7 $type eq '$' ? \${"${pkg}::$sym"} :
8 $type eq '@' ? \@{"${pkg}::$sym"} :
9 $type eq '%' ? \%{"${pkg}::$sym"} :
10 $type eq '*' ? *{"${pkg}::$sym"} :
11 do {require Carp; Carp::croak("Can't export symbol: $type$sym")};
12 }
=for html <a name="fn_11"></a>
In addition to this core functionality of C<Exporter>, the module
offers several advanced features: export tags, exporting into some
other module than the caller, version check, blocking symbols against
export, etc., but it all boils down to the code shown here
L<[11]|/item__5b11_5d>.
What happens here? Well, the array C<Z<>@imports> contains the list
of all names (of subroutines and variables) that are to be exported.
Subroutines may be given with or without the sigil '&'. The variable
C<$pkg> contains the name of the package from which things are
exported, and C<$callpkg> the name of the package that receives them.
The C<foreach> loop processes C<Z<>@imports> element by element.
The code in lines 2 and 3 tests whether a sigil precedes the name.
This isn't required (and rarely done) with subroutines. If it is missing,
partial aliasing is done for code references. This takes care of
situations like
use MyModule qw(xx yy zz);
where subroutine names are written without C<Z<>&>.
In all other cases, the regular expression grabs and strips the sigil,
so that the code in lines 5 through 10 can do the partial aliasing
for the suitable type of reference.
It shouldn't come as a surprise that these lines might just as
well be written as
5 *{$callpkg.'::'.$sym} =
6 $type eq '&' ? *{$pkg.'::'.$sym}{CODE} :
7 $type eq '$' ? *{$pkg.'::'.$sym}{SCALAR} :
8 $type eq '@' ? *{$pkg.'::'.$sym}{ARRAY} :
9 $type eq '%' ? *{$pkg.'::'.$sym}{HASH} :
10 $type eq '*' ? *{$pkg.'::'.$sym} :
11 do {require Carp; Carp::croak("Can't export symbol: $type$sym")};
=for html <a name="fn_12"></a>
This is just another illustration for the Perl mantra TIMTOWTDI
L<[12]|/item__5b12_5d>.
=for html <a name="fn_13"></a>
But stop! Why was line 10 written without a C<Z<>{GLOB}>? Try to find
the answer before looking it up L<[13]|/item__5b13_5d>.
If Exporter exports a scalar, it'll execute
*{"${callpkg}::$sym"} = \${"${pkg}::$sym"};
Assuming an export of variable C<$foo> from C<My_Module> into C<main>,
this becomes
*main::foo = \$My_Module::foo;
The newly created typeglob in the target package is marked as I<imported>.
This import is restricted by type, i.e., it applies (in this example)
to a scalar C<$foo> and not to some C<foo> of any other type.
It's certainly possible to import arrays, hashes and functions, too,
but this must be designed by the module's author, by adding the
appropriate names (including the sigil) to the lists contained in
C<Z<>@EXPORT> or C<Z<>@EXPORT_OK>.
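As an illustration, here is a sketch of a module that offers a subroutine, a scalar and an array for import. The module is kept in the same file, so the C<BEGIN> blocks stand in for what C<use My_Module qw(...)> would do with a separate file; the names C<greet> and C<@list> are invented for this example, and C<use Exporter 'import'> assumes a reasonably recent Exporter.

```perl
use strict;
use warnings;

{
    package My_Module;
    use Exporter 'import';

    # Must be set at compile time, before import() is called below.
    BEGIN { our @EXPORT_OK = qw(greet $foo @list) }

    our $foo  = 'scalar from My_Module';
    our @list = (1, 2, 3);
    sub greet { return 'hello from My_Module' }
}

# Stands in for: use My_Module qw(greet $foo @list);
BEGIN { My_Module->import(qw(greet $foo @list)) }

my $greeting = greet();
print "$greeting\n";
print "$foo\n";
print "@list\n";
```

All three names are usable without a package prefix, even under C<strict 'vars'>, because each was imported with its own sigil.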
If, later on, the Perl parser encounters the expression
$foo
in the package C<main>, it discovers that this scalar has been marked
as imported in the typeglob and permits this unqualified usage of the
name. That's why it is possible to refer to these names without a
package name even though C<strict 'vars'> is in effect.
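The effect of the import mark can be tried out with a hand-written C<import()>. This is only a sketch: the C<import()> shown is a hypothetical minimal one, not Exporter's, and the string C<eval> is used merely to capture the compile-time error for the array that was never imported.

```perl
use strict;
use warnings;

{
    package My_Module;
    our $foo = 42;

    # Hypothetical hand-written import(): sets only the scalar slot
    # of the caller's typeglob "foo".
    sub import {
        my $callpkg = caller;
        no strict 'refs';
        *{"${callpkg}::foo"} = \$My_Module::foo;
    }
}

# Compile-time call, as "use My_Module;" would do for a module file.
BEGIN { My_Module->import }

# $foo is accepted unqualified: its scalar is marked as imported.
my $seen = $foo;

# @foo was never imported, so compiling this string fails under strict.
my $ok    = eval q{ my $n = @foo; 1 };
my $error = $@;

print "seen: $seen\n";
print "array accepted: ", ($ok ? 'yes' : 'no'), "\n";
```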
=for html <a name="fn_14"></a>
If a program is compiled with C<use strict> or C<use strict 'vars'>,
every global variable must have been imported so that it can be
accessed without its home package name. In addition to real imports,
there are the compiler pragmas C<vars> and C<subs> L<[14]|/item__5b14_5d>.
Looking at the C<import()> method of C<vars.pm>, we find this code
snippet:
$sym = "${callpack}::$sym" unless $sym =~ /::/;
*$sym = ( $ch eq "\$" ? \$$sym
: $ch eq "\@" ? \@$sym
: $ch eq "\%" ? \%$sym
: $ch eq "\*" ? \*$sym
: $ch eq "\&" ? \&$sym);
This shouldn't be quite so strange by now. C<$ch> contains the sigil and
C<$sym> the name of the symbol that's to be imported, to which the
package name of the caller may have to be prefixed to arrive at a
fully qualified name. To import a scalar C<$foo> into the package
C<main>, for instance, one writes
use vars '$foo';
which results in
$sym = "main::foo";
*{$sym} = \${$sym};
which boils down to
*main::foo = \$main::foo;
=for html <a name="fn_15"></a>
This is essentially the same as what we have seen in the code taken
from Exporter, the difference being that here target and home package
are I<identical>. In other words, the module C<Exporter> and the pragmas
C<vars> and C<subs> are based on the same principle: they both import
symbols -- only the source packages differ L<[15]|/item__5b15_5d>.
The code implementing the pragma C<subs> is a simplified version of
C<vars.pm>, optimized for the import of subroutines. C<subs> is
one way of I<declaring> subroutines so that they may be called in
a location preceding their definition.
func;
...
sub func {print "!"}
The parser has no way of telling what C<"func;"> means -- the symbol
table doesn't contain a corresponding typeglob yet. It'll interpret
this expression as an unquoted string literal and flag it, either
with
Unquoted string "func" may clash with future reserved word
or, if C<use strict> is in effect, with
Bareword "func" not allowed while "strict subs" in use
Adding
use subs 'func';
near the beginning of the program makes the name C<func> known as a
subroutine, so that the parser may interpret it correctly. (Notice
that a declaration can also be achieved by C<sub func;>, to which
a prototype may be added.)
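A minimal sketch of such a predeclaration follows; the array C<@arg_counts> exists only to make the two calls observable.

```perl
use strict;
use warnings;
use subs 'func';      # predeclare func before its definition

my @arg_counts;

func;                 # parsed as a call without arguments
func 'a', 'b';        # parsed as a list operator call

sub func { push @arg_counts, scalar @_; return }

print "argument counts: @arg_counts\n";
```

Without the C<use subs> line, the first call would not compile under C<strict 'subs'>.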
After passing this code snippet to the Perl compiler:
use strict 'vars';
use vars '$alpha';
@alpha = (1,2,3);
the error message
Variable "@alpha" is not imported
shouldn't come as a surprise any more. The compiler has found the
array name C<Z<>@alpha> in the last line. The declaration in the
preceding line did create a typeglob for C<alpha>, but only the
scalar was marked as imported, not the array. Since imports are
specific to types, not only the scalar but also the array must be
declared prior to use:
use vars qw($alpha @alpha);
If the second line is omitted from the above example so that nothing
at all is declared, then there's no typeglob at all existing yet,
and the compiler terminates with the well-known messages
Global symbol "$alpha" requires explicit package name
Global symbol "@alpha" requires explicit package name
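These compile-time checks can be explored without aborting the program by compiling small code strings with C<eval>. The helper C<compile_error> below is a made-up name for this sketch, and the exact wording of the messages varies somewhat between Perl versions.

```perl
use strict;
use warnings;

# Compile and run a code string; return the error message, '' on success.
sub compile_error {
    my ($code) = @_;
    my $ok = eval "$code; 1";
    return $ok ? '' : $@;
}

my $declared   = compile_error(q{use strict 'vars'; use vars '$alpha'; $alpha = 1});
my $wrong_type = compile_error(q{use strict 'vars'; use vars '$beta'; @beta = (1, 2, 3)});
my $undeclared = compile_error(q{use strict 'vars'; $gamma = 1});

print "declared scalar:  ", ($declared eq '' ? "ok\n" : $declared);
print "array not listed: $wrong_type";
print "nothing declared: $undeclared";
```

The "not imported" notice for C<@beta> is issued as a warning on STDERR, while the fatal "Global symbol" message ends up in C<$@>.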
Notice that some code (in package C<main>) like this
use strict 'vars';
$main::alpha = 2;
print $alpha;
=for html <a name="fn_16"></a>
makes the compiler terminate, after emitting the C<not imported> message.
Using a fully qualified name in line 2 does create the typeglob C<Z<>*alpha>,
but I<the scalar isn't marked as imported>. The situation here is
the same as in the example where the array was used but only the scalar
was imported. Any import will I<only> work if a reference of the
appropriate type is assigned to the corresponding slot of the typeglob
L<[16]|/item__5b16_5d>.
This explains the global effect of C<vars> and C<subs>, which both
manipulate typeglobs, and these (together with any marks they
contain) are visible in the entire program, not just in the scope of
some block. That's why there is no such thing as C<no vars> or C<no subs>:
it just doesn't make any sense to revoke an import, all the more so
because this is only relevant during the compilation stage with
C<strict 'vars'> being turned on.
Handles may I<always> be accessed with an unqualified name, because there
is no way of declaring them. Therefore it's not necessary to import them,
and there are no marks provided for registering them as imports.
=head4 Creating wrappers
=for roff .nh
A I<wrapper> is a subroutine written around some other subroutine, to
enhance its function or to adapt it for a special environment. Ideally,
neither the code calling the original subroutine nor the subroutine
itself are to be aware of the wrapper getting in between. For Perl
in particular this means that a call to the built-in function C<caller>
should return the same results, with and without the wrapper being
in place.
This can be achieved by manipulating the subroutine's typeglob. Here
is a subroutine which we'd like to wrap:
sub hello {
print 'Args: ', join(',', @_), "\n",
'Caller: ', join(',', caller), "\n\n";
}
And this is the code generating the wrapper:
1 sub create_wrapper {
2 no strict 'refs';
3 no warnings 'redefine';
4 my $name = caller . '::' . shift;
5 my $oldsub = *{$name}{CODE} or die "Can't find subroutine '$name'!\n";
6 my $newsub = sub {
7 my ($pkg, $file, $line) = caller;
8 print STDERR "WRAPPER: Hi! $name(@_) was called:\n",
9 "WRAPPER: from '$pkg', file '$file', line $line\n\n";
10 goto &$oldsub;
11 };
12 *{$name} = $newsub;
13 return $oldsub;
14 }
Most of this ought to be familiar to you by now. The essential lines
are number 5 and number 12. In line 5 the subroutine reference to the
one whose name is passed as the argument to C<create_wrapper> is
retrieved from the code slot of its typeglob. (The very same code
reference might just as well be obtained by
$oldsub = \&{$name};
=for html <a name="fn_17"></a>
but see L<[17]|/item__5b17_5d>). If the specified subroutine doesn't
exist, the slot isn't occupied and we get C<undef> and call C<die()>
with an appropriate message. If all goes well, the wrapper is
established as an anonymous subroutine that prints its caller and
uses
goto &$oldsub;
=for html <a name="fn_18"></a>
to transfer control to the original subroutine L<[18]|/item__5b18_5d>.
The trick here is that the stack remains as it is, so that the
original subroutine isn't aware of the interlude in the wrapper.
(The very same technique is frequently used in connection with
C<AUTOLOAD>.)
=for html <a name="fn_19"></a>
Line 12 contains an assignment replacing the contents of the code
slot with the reference of the wrapper subroutine. Now the new
subroutine, wrapping the original, is callable with the same name
as the original L<[19]|/item__5b19_5d>.
Two more things deserve our attention.
=over 3
=item *
Since we're juggling, once more, with symbolic references (in lines
5 and 12), the C<no strict 'refs'> is essential. Moreover, Perl is
very suspicious when something is assigned to a code slot and would
warn with a message
Subroutine ... redefined
=for html <a name="fn_20"></a>
if C<use warnings> (or the switch C<-w>) is on L<[20]|/item__5b20_5d>.
But C<create_wrapper> isn't intended to be used mischievously, and
that's why the C<no warnings> pragma is there to avoid these warnings.
In legacy code older than Perl 5.6, one might find
local $^W = 0;
which isn't quite as precise, since it turns off all warnings.
=item *
Wrapping (or completely redefining) a subroutine at runtime isn't
restricted to old-fashioned subroutines. You may also wrap methods,
including C<AUTOLOAD> and C<DESTROY>, which, of course, are nothing
but subroutines with a specific calling convention. This means that
the behaviour of autoloading code and the one of C<DESTROY> can be
influenced without changing the packages that use them. Moreover
it's possible to add methods to classes dynamically at runtime, or
to modify existing methods.
Wrapping won't work with reserved I<block> names like, e.g., C<BEGIN>
because these names are recognized by the parser and stored internally
in some other manner. A typeglob represents exactly one name, but there
may be more than one C<BEGIN> block.
=back
The following example shows a call of C<create_wrapper> in the file
F<cre_wrp.pl>:
hello(qw(a b c));
my $oldfunc = create_wrapper('hello');
hello(qw(d e f));
{
no warnings 'redefine';
*hello = $oldfunc;
}
hello(qw(g h i));
This is the output:
Args: a,b,c
Caller: main,cre_wrp.pl,1
WRAPPER: Hi! main::hello(d e f) was called:
WRAPPER: from 'main', file 'cre_wrp.pl', line 3
Args: d,e,f
Caller: main,cre_wrp.pl,3
Args: g,h,i
Caller: main,cre_wrp.pl,8
In the first line the original subroutine is called, printing the
arguments and the caller. Then the wrapper is created, and another call
of C<hello()> produces the output of the wrapper code before the
output from C<hello()> itself. Notice that the results produced by
C<caller> -- apart from the line number -- are the same in both cases,
so it's a "real" wrapper.
The code from line 4 on shows how to revert the effect of C<create_wrapper>.
The return value from the call in line 2, a reference of the original
subroutine, is saved in the scalar C<$oldfunc>. The code in the block
assigns this scalar to the typeglob, reestablishing the old state.
Thus, the next call to C<hello()> doesn't activate the wrapper, merely
the original subroutine. Again, C<no warnings 'redefine';> cleanly
suppresses the warning about some subroutine being redefined.
This technique could be extended by saving the second code reference
internally and by providing two subroutines like C<wrapper_on> and
C<wrapper_off> to toggle between the two versions of calling with or
without the wrapper, respectively.
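One possible shape of that extension is sketched here. The helper names C<prepare_wrapper> and C<install>, the two hashes and the C<@trace> array are assumptions of this sketch; the wrapper records its calls in C<@trace> instead of printing to STDERR so that the effect is easy to check.

```perl
use strict;
use warnings;

my (%original, %wrapper);   # code references, keyed by qualified name
my @trace;                  # filled by the wrappers

# Hypothetical helper: builds the wrapper but does not install it yet.
sub prepare_wrapper {
    my $name = caller() . '::' . shift;
    no strict 'refs';
    my $oldsub = *{$name}{CODE} or die "Can't find subroutine '$name'!\n";
    $original{$name} = $oldsub;
    $wrapper{$name}  = sub {
        push @trace, "$name(@_)";
        goto &$oldsub;
    };
    return $name;
}

# Store a code reference in the code slot of the named typeglob.
sub install {
    my ($name, $code) = @_;
    no strict 'refs';
    no warnings 'redefine';
    *{$name} = $code;
    return;
}

sub wrapper_on  { install($_[0], $wrapper{$_[0]}) }
sub wrapper_off { install($_[0], $original{$_[0]}) }

sub hello { return 'Args: ' . join(',', @_) }

my $name = prepare_wrapper('hello');
wrapper_on($name);
my $first = hello(qw(a b));     # passes through the wrapper
wrapper_off($name);
my $second = hello(qw(c d));    # original only

print "$first\n$second\n";
print "traced: @trace\n";
```

Keeping both code references in hashes allows any number of subroutines to be toggled independently.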
=head4 Working with filehandles and formats
=for roff .nh
In addition to the types discussed so far, there are two more:
I<filehandles> and I<formats>. One thing that differentiates them
from the others is that they don't have a distinctive sigil in their
names; you use a so-called I<bareword> to refer to them. This implies
that they cannot occur anywhere except in such places where the parser
is expecting them. This, for one thing, excludes them from being
used as subroutine arguments (except in built-in functions, of course).
=for html <a name="fn_21"></a>
This isn't much of a problem with formats, which aren't much in use
anyway L<[21]|/item__5b21_5d>. It is, however, a nuisance that
filehandles cannot be passed as arguments or used as return values,
and that C<local> cannot be applied to them. As soon as you realize
that a typeglob can be used in all places where a filehandle is expected,
and that the IO slot of the typeglob is used to access the filehandle,
then the restrictions imposed by the bareword naming of handles cease
to be a problem L<[22]|/item__5b22_5d>.
=for html <a name="fn_22"></a>
To pass the handle of some already opened file to a subroutine you can
write
open FH,'input.dat';
process(*FH);
sub process {
my $fh = shift;
while (<$fh>) {
...
}
}
To the subroutine C<process()> we pass the typeglob with the same name
as the filehandle. Within the subroutine, this is copied to a scalar,
which in turn is used to access the typeglob from where the IO slot
is taken to get the original file handle. An alternative would be
this (inferior) possibility
sub process {
*SUBFH = shift; # not recommended
while (<SUBFH>) {...
This copies the typeglob into another one, so that wherever a filehandle
may be used as a bareword this second filehandle is written, as usual,
as a bareword. This alternative is inferior to the previous one because
the typeglob used in the subroutine is global, violating a sound
principle of good coding practice by creating the possibility of
unexpected name clashes.
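The recommended form, where the typeglob is copied into a lexical scalar, can be tried out as follows. To keep the sketch self-contained it reads from an in-memory file (which assumes Perl 5.8 or later) rather than from F<input.dat>; the subroutine name C<count_lines> is invented.

```perl
use strict;
use warnings;

# Count the lines readable from whatever handle was passed in.
sub count_lines {
    my $fh = shift;              # typeglob or glob reference in a lexical
    my $count = 0;
    while (defined(my $line = <$fh>)) {
        $count++;
    }
    return $count;
}

my $data = "one\ntwo\nthree\n";
open FH, '<', \$data or die "Can't open in-memory file: $!";

my $by_glob = count_lines(*FH);      # pass the typeglob itself
seek(FH, 0, 0) or die "Can't rewind: $!";
my $by_ref  = count_lines(\*FH);     # pass a reference to it
close FH;

print "$by_glob lines via *FH, $by_ref lines via \\*FH\n";
```

Passing a reference to the typeglob works just as well as passing the typeglob itself.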
Here is yet another variant, again not recommended:
$name = 'FH'; # not recommended
open $name,'>output.dat'; # same as open FH,'>output.dat';
print $name 'Hello world!'; # same as print FH 'Hello world!';
=for html <a name="fn_23"></a>
This simply uses a symbolic reference L<[23]|/item__5b23_5d>; the scalar
that's used as the first argument in C<open> contains a string which is
interpreted as a handle (or typeglob) name. Using symbolic references
in connection with IO handles is dangerous because there is no connection
between the string scalar storing the handle name and the handle itself.
Consider:
sub process {
my $fh = 'FH';
open $fh,'>temp.tmp';
print $fh 'Hello world!';
}
The lexical variable that's used as a filehandle is declared locally.
What happens when the subroutine terminates? Will the filehandle be
closed automatically? Certainly not -- C<$fh> just stores a string,
and the global typeglob C<*FH> together with its filehandle remain in
existence, keeping the file open, most likely until the program terminates.
If you do want to achieve an automatic close of a file when a
subroutine terminates, you can I<localize> the typeglob that's used
as a handle:
sub process {
local *FH;
open FH,'>temp.tmp';
...
} # temp.tmp is now closed
Localizing a typeglob creates temporary copies for I<all> slots,
replacing the original ones. If some file was already open via the
handle C<FH>, it'll remain open and untouchable within C<process>.
A new one is created in the IO slot and used in the open of the
temporary file. As soon as the subroutine terminates, Perl has to
restore the original slots. This is where Perl detects that the
temporary slot is still busy and that the file has to be closed
before it can be released.
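The following sketch shows both effects: the outer handle survives the call untouched, and the temporary one is closed on exit. In-memory files (Perl 5.8 or later) replace F<temp.tmp> here, purely as an assumption to avoid touching the file system.

```perl
use strict;
use warnings;

my $outer = "outer line\n";
open FH, '<', \$outer or die "Can't open outer handle: $!";

my $inner = '';
sub process {
    local *FH;                                  # fresh, temporary slots
    open FH, '>', \$inner or die "Can't open inner handle: $!";
    print FH "written inside process\n";
}   # temporary handle is closed here, the original slots are restored

process();

my $line = <FH>;        # reads from the handle opened before the call
close FH;

print $line;
print $inner;
```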
More information on localizing -- particularly in connection with
typeglobs -- can be found in L<perlscopetut|perlscopetut>.
Perl 5.6 introduced I<autovivification> for IO handles. This process
of creating things by reference is well-known for arrays and hashes:
$top->[4]->{gamma}->[76] = 42;
This sets the 77th element of an anonymous array that's being pointed
to by the element with the key C<gamma> of an anonymous hash, and this,
in turn, is referenced by the fifth element of another anonymous array,
the reference of which is finally stored in C<$top>. Autovivification
lets you execute this assignment without requiring you to create all
intermediate levels. It is sufficient that C<$top> contains an array
reference; the hash and the other array are created automatically.
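A quick check of what has been created by that single assignment might look like this (the variable names are chosen for the illustration):

```perl
use strict;
use warnings;

my $top = [];                          # only the outermost array exists
$top->[4]->{gamma}->[76] = 42;

my $hash_kind  = ref $top->[4];                 # the autovivified hash
my $array_kind = ref $top->[4]{gamma};          # the autovivified array
my $length     = scalar @{ $top->[4]{gamma} };  # elements 0 .. 76

print "$hash_kind, $array_kind, $length elements\n";
```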
Since Perl 5.6, it is now possible to execute
open $fh,'>temp.tmp';
=for html <a name="fn_24"></a>
with the scalar C<$fh> still being undefined. No strings are attached
to an undefined scalar (and so there is no string that could be used
as a symbolic reference), and that's why a typeglob with a suitably
initialized IO slot is brought into existence. The scalar C<$fh>
is assigned the resulting typeglob reference L<[24]|/item__5b24_5d>,
providing the access to the filehandle. It isn't quite as anonymous
as the arrays or hashes in the above example, because you can access
the NAME slot of the typeglob, and you could even use that funny name
stored there to access the typeglob's other slots:
print *{$fh}{NAME}; # prints "$fh"
Returning to using autovivified file handles, we notice that in
this block
{
open my $fh,'>temp.tmp';
print $fh 'Hello world!';
...
} # temp.tmp is now closed
the lexical scalar C<$fh> is discarded upon exit. If it held the last
reference to the typeglob, the IO handle is closed and the typeglob
is dissolved.
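The implicit close can be observed indirectly: buffered output reaches the file although C<close> is never called. The sketch below assumes the core module C<File::Temp> for a scratch directory; the path name is arbitrary.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);      # scratch directory, removed at exit
my $path = "$dir/temp.tmp";

my $kind;
{
    open my $fh, '>', $path or die "Can't open $path: $!";
    print $fh 'Hello world!';
    $kind = ref $fh;                   # what open() stored in $fh
}   # no explicit close: $fh goes out of scope here

my $size = -s $path;                   # bytes that reached the file
print "handle was a $kind reference, file holds $size bytes\n";
```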
Autovivification doesn't just happen in an C<open> call but is also
supported for all other built-in functions where a handle is initialized:
C<opendir>, C<pipe>, C<sysopen>, C<socket> and C<accept>.
If you're using an older Perl version you don't have autovivification,
but handle references are available. You'll just have to do a little
extra work to achieve essentially the same thing. Let's try this:
my $fh = \*FH; # save a glob reference
delete $main::{FH}; # delete the stash element (?)
open $fh,'>x.x'; # use reference as a filehandle
=for html <a name="fn_25"></a>
First we create a reference to the typeglob C<FH>. Then the typeglob's
entry is deleted from the stash L<[25]|/item__5b25_5d>, and the scalar
can be used just like a handle. But there is a hidden snag attached to
this approach. What if you need more than one handle? Reusing C<FH> is
bound to give you the same reference over and over again, which is
definitely not what we want. For the correct solution, let's study the
function C<Symbol::geniosym()>, which contains (somewhat simplified)
this code:
sub geniosym {
my $name = 'GEN'. $genseq++;
my $sym = \*{'Symbol::' . $name}; # no strict 'refs'
delete $Symbol::{$name};
select(select $sym);
return *{$sym}{IO};
}
=for html <a name="fn_26"></a>
The first three lines create a typeglob in the stash of the package
C<Symbol> and destroy it right away, after the reference has been
saved in C<$sym>. The C<select> call forces the initialization of
the IO slot, so that the last line returns an IO handle reference in
mint condition L<[26]|/item__5b26_5d>. This has the additional
advantage that the detour through the typeglob isn't necessary. The
most noteworthy thing in this subroutine, however, is that the
typeglob is created I<dynamically>, pulling a new one out of the hat
each time C<geniosym()> is being called.
To summarize, C<open> (and all other functions that initialize
a filehandle) can be called with a scalar that has been set up by:
$fh = undef; open($fh,'x.x'); # autovivifying
$fh = \*GLOB; open($fh,'x.x'); # glob reference
$fh = *GLOB; open($fh,'x.x'); # typeglob
$fh = *GLOB{IO}; open($fh,'x.x'); # IO (handle) reference
=for html <a name="fn_27"></a>
Methods 1 and 4 are available since Perl 5.6 L<[27]|/item__5b27_5d>.
The autovivifying technique of method 1 results, as we have seen,
in a glob reference, just as in method 2.
Working with a glob reference stored in some lexical scalar is the simplest
and safest way to handle filehandles. Legacy code from before Perl 5.6
had to resort to typeglobs, so that knowing about this alternative
is still useful.
=head2 Footnotes
=for roff .nh
=over 6
=item [1]
This is an example footnote.
=for html Back to the <a href="#fn_1">text</a>.
=item [2]
There are exceptions to this rule: the symbols C<ARGV>, C<ARGVOUT>, C<ENV>,
C<INC>, C<SIG>, C<STDERR>, C<STDIN> and C<STDOUT> are I<always> located
in the stash C<Z<>%main::>, even when the current C<package> is a different one.
If these symbols should be stored in any stash, the symbol name always
has to be fully qualified. The special meaning that's associated with
these variables, however, remains restricted to those in the stash
C<Z<>%main::>.
Notice that we're referring to I<symbols>. The aforesaid is, for instance,
true for all the variables C<Z<>$ENV>, C<Z<>@ENV>, C<Z<>%ENV>, C<Z<>&ENV>,
the filehandle C<ENV> and the format C<ENV>, i.e., everything that has
C<ENV> in its name.
The same is true for all "special" variables, i.e., the ones with
names containing special characters or not beginning with an alphabetic
character. The parser doesn't accept a package name in connection with
these variables so that they are necessarily restricted to the stash
C<%Z<>main::>.
=for html <a href="#fn_2">Back</a>
=item [3]
Stash elements are created during compilation, and the generated code
uses their addresses to access their contents. An access via the stash
is only required if the name is constructed at runtime, e.g.:
my $package = 'Foo';
my $name = 'bar';
${$package.'::'.$name} = 'bletch';
It's obvious that symbolic references cannot be resolved at compile time.
=for html <a href="#fn_3">Back</a>
=item [4]
In the current implementation, the scalar slot of a typeglob is I<always>
occupied, even when there is no scalar variable of that name. If that is
so, the slot still contains a reference to a scalar that is C<undef>.
=for html <br><a href="#fn_4">Back</a>
=item [5]
Use this possibility with utmost restraint while considering that
someone else may have to maintain your code later on.
=for html <br><a href="#fn_5">Back</a>
=item [6]
This happens during compilation; at runtime the address produced by the
compiler is used to access the typeglob.
=for html <br><a href="#fn_6">Back</a>
=item [7]
Moreover, there are considerable differences in the handling of lexical
variables that are declared at module level but used inside of some
subroutine. See L<perlscopetut|perlscopetut>.
=for html <br><a href="#fn_7">Back</a>
=item [8]
The same is true for subroutine and handle names. A C<use strict 'vars'>
refers to variables only -- it does what it says.
=for html <br><a href="#fn_8">Back</a>
=item [9]
The typeglob is assigned a reference to C<undef>, a scalar value, and that's
why it is a scalar reference, bound to override whatever is stored in the
typeglob's scalar slot. In other words, the following statements have the
same effect:
*hugo = \undef;
$hugo = undef;
=for html <br><a href="#fn_9">Back</a>
=item [10]
The hash slot of the typeglob of C<Z<>$main::{main::}> contains a
reference to C<Z<>%main::>. Otherwise it wouldn't be possible to get
at the top level symbol table by writing
*main::{HASH}
Because a typeglob is always stored in an I<element> of a stash, there
is bound to be a hash entry for the element C<main::>, i.e., C<Z<>%main::>.
That's why the typeglob C<Z<>*main> could be referred to as the
hash element
$main::{'main::'}
Whew! Notice that the key has to be a quoted string to avoid a parser
hiccup -- it would simply swallow the colons. Even a parser has its
limits.
=for html <br><a href="#fn_10">Back</a>
=item [11]
Keep in mind that this code is executed in the context of a C<BEGIN>
block, i.e., I<before> the variables established by the partial aliasing
are accessed.
=for html <br><a href="#fn_11">Back</a>
=item [12]
I<There Is More Than One Way To Do It> -- one of the mantras of the
Perl Community, frequently cited to allude to the fact that Perl is
a rich (or not orthogonal) language that provides ways and means to
solve one problem by different techniques. (This doesn't mean that
all are equally good.)
=for html <br><a href="#fn_12">Back</a>
=item [13]
The expression
*{$pkg.'::'.$sym}{GLOB}
returns a I<reference> to the typeglob. That's not what is required here,
because the Exporter should perform a I<full aliasing> whenever something
like C<*symnam> has been specified. As we have seen, this is achieved
by assigning a typeglob to another one. Therefore, the original code
lacks a referencing operator preceding the C<Z<>'*'>.
Current Perl versions don't distinguish between the assignments
*alpha = *beta;
*alpha = \*beta;
and the compiler generates identical code, and the second assignment
does the same thing at runtime. This behaviour isn't documented and is
subject to change in future versions. Exporter doesn't rely on this,
and you shouldn't either.
=for html <br><a href="#fn_13">Back</a>
=item [14]
Perl 5.6 introduced the possibility of defining global variables with
C<our>. See L<perlsub|perlsub> and L<perlscopetut|perlscopetut> for
details.
=for html <br><a href="#fn_14">Back</a>
=item [15]
There's a subtle difference between C<Exporter.pm> and C<vars.pm>. The
latter assigns a I<reference> to the target typeglob if the declared
thing is a typeglob (e.g., C<use vars '*hugo'>). The Exporter performs
I<direct aliasing>. But, as explained in L<[13]|/item__5b13_5d>, this
doesn't make any difference, at least up to now. A future version may
change this, requiring C<vars.pm> to follow suit.
Both modules are alike with respect to their code executing as a
C<BEGIN> block, i.e., during compilation, which influences the
compiler's behaviour by aliasing or importing.
=for html <br><a href="#fn_15">Back</a>
=item [16]
Moreover, something has to be I<imported>, meaning that the currently
established package has to I<differ> from the one that's about to
receive the goods. That's why some code like this wouldn't work:
use strict 'vars';
package main;
BEGIN {
*main::foo = \$main::foo;
}
$foo = 2;
If we change C<package main> to C<package yours> and insert
C<package main;> in front of C<Z<>$foo = 2;>, then it would work.
You don't have to worry about this with the pragmas C<use vars> or
C<use subs>, because they are elaborated while the package they apply
to is active.
=for html <br><a href="#fn_16">Back</a>
=item [17]
Even though the two forms
*glob = *func{CODE};
*glob = \&func;
achieve the same result, different steps are executed (so that the
generated code differs). The first form retrieves the code slot of
a typeglob, and assigns the reference that's stored there. The second
retrieves C<&func> and creates a reference to the subroutine, and
assigns this value. The second form produces, at run time, the very
same value that has already been stored by the compiler into the code
slot of the typeglob.
Usually the second form is used, even though the first one reflects
more precisely what's actually going on and is slightly faster as it
just copies a stored value.
=for html <br><a href="#fn_17">Back</a>
=item [18]
Actually, this isn't quite correct. A precise interpretation of
goto &$oldsub;
would be: "Call the subroutine that is referenced by C<$oldsub> (as a hard
or symbolic reference) and go to that place that's pointed to by the
code reference returned by the call." That's hardly what you want, so
the parser exhibits intelligence by interpreting it as
goto \&$oldsub;
which is indeed what you expect. This goes to show that one might have
written, just as well:
goto $oldsub;
Curiously, the literature tends to stick to the aforementioned incorrect
version, which, therefore, was adopted in the example.
=for html <br><a href="#fn_18">Back</a>
=item [19]
As a matter of fact, C<create_wrapper> creates a I<closure>, i.e., a
subroutine reference accessing a lexical variable (C<Z<>$oldsub>) that's
declared in some containing scope. This encloses the variable's instance
existing at the time the subroutine reference was created, along with
the subroutine, into the package representing the closure. The closure
may be created repeatedly, but each has its very own instance of
C<Z<>$oldsub>. More about this topic can be found in
L<perlref|perlref> and L<perlscopetut|perlscopetut>.
Most of the time a closure contains an anonymous subroutine. But here,
aimed at replacing some existing, named subroutine, it is given a
name by the assignment of its code reference to a typeglob.
=for html <br><a href="#fn_19">Back</a>
=item [20]
And you do use that, don't you? ;-)
=for html <br><a href="#fn_20">Back</a>
=item [21]
Wrongfully. Perl's built-in formatting system that is controlled by the
variables C<$.>, C<$%>, C<$=>, C<$->, C<$~>, C<$^>, C<$:> and C<$^L> and the
statement C<format> (for declaring) and C<write> (for executing) is top notch.
Among other things, Perl owes the "r" in its name (I<Practical Extraction
and B<R>eport Language>) to the L<format feature|perlform>. The only
restriction is that formats have to be declared (like packages, subroutines
and lexical variables), at compile time.
=for html <br><a href="#fn_21">Back</a>
=item [22]
The possibilities that are due to I<autovivifying> (since Perl 5.6)
will be discussed later on. We'll talk about the ways and means
available in older Perl versions first.
=for html <br><a href="#fn_22">Back</a>
=item [23]
Which quickly becomes obvious if C<strict 'refs'> is in effect.
=for html <br><a href="#fn_23">Back</a>
=item [24]
You can see this by executing:
print ref $fh; # prints 'GLOB' (a glob reference)
=for html <br><a href="#fn_24">Back</a>
=item [25]
Assuming that the code is executed in the package C<main>.
=for html <br><a href="#fn_25">Back</a>
=item [26]
You may be surprised by the output of
use Symbol 'geniosym';
my $sym = geniosym();
print ref $sym; # prints 'IO::Handle'
The implementation has it so that an C<IO::Handle>, the thing that's
being pointed to by the blessed reference in C<$sym>, is an I<object>.
This enables us to use IO Handles as object instances, and therefore
the following statements are (almost) the same:
print $sym 'Hello';
$sym->print('Hello');
Using typeglobs instead, we can write
print FH 'Hello'; # or: print {*FH} 'Hello';
FH->print('Hello'); # or: *FH->print('Hello');
Contrary to appearance, C<FH> is not a class name but an object instance.
Taking the IO slot from the typeglobs C<*FH>, we get an object and call
its method C<print>.
The object-oriented approach has the benefit of greater flexibility,
as methods can be added by subclassing, thereby extending the set
of available functions for an IO handle far beyond what's available
through Perl's built-in functions. Moreover, it does away with the
necessity of selecting a "current" filehandle prior to calling certain
functions or accessing variables related to output.
=for html <br><a href="#fn_26">Back</a>
=item [27]
In Perl versions older than 5.6 it is possible to pass a handle
reference that hasn't been used yet to open, provided that it has
been I<initialized> via the typeglob:
select (select *GLOB); # initialize
my $fh = *GLOB{IO}; # get the IO reference
open $fh,'x.x'; # pass to open()
The C<select> function creates a handle structure and stores a pointer
to it into the IO slot of the given typeglob. After this, it can be
stored into a scalar and passed to C<open()>.
You won't need this with all the other options being around, but this
explains why C<Symbol::geniosym> (which is available for older Perl
versions as well) has to use this trick.
=for html <br><a href="#fn_27">Back</a>
=back
=head1 SEE ALSO
L<perlsub|perlsub>,
L<perlref|perlref>,
L<perldata|perldata>,
L<perlmod|perlmod>,
L<perlform|perlform>,
L<perlscopetut|perlscopetut>.
=head1 AUTHOR AND COPYRIGHT
Copyright (c) 1998, 2000, 2006 by Ferry Bolhar F<bol@adv.magwien.gv.at>.
All rights reserved.
This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.
Any document derived from this documentation must contain this
copyright notice in full.