develooper Front page | perl.perl5.porters | Postings from February 2011

[perl #85026] deleting elements in a HASH iterator

From:
perl @ ton . iguana . be
Date:
February 28, 2011 00:27
Subject:
[perl #85026] deleting elements in a HASH iterator
Message ID:
rt-3.6.HEAD-28241-1298829225-1617.85026-75-0@perl.org
# New Ticket Created by  perl@ton.iguana.be 
# Please include the string:  [perl #85026]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=85026 >



This is a bug report for perl from perl@ton.iguana.be,
generated with the help of perlbug 1.39 running under perl 5.10.1.


-----------------------------------------------------------------
[Please describe your issue here]

When debuging a core dump in some XS code i reduced it to what seems to
indicate a long standing bug in deleting elements in a HASH that is
being iterated over.

Internally a perl HASH is an array of single linked chains of entries.
Deleting an element means removing the correct chain entry by replacing
the pointer to the removed entry with a pointer to the next entry and then
freeing the deleted entry

However, if the deleted element is the current entry the deleted entry is
kept after removing it from the chain and the LAZYDEL flag is set. Only
on the next iteration is the element actually removed and the iterator is
set to the next entry.

However, if you delete the current iterator and then delete the next
element in the same chain the "next" pointer of the iterator is not updated
because the iterator is not on the chain anymore. That means that when the
next iteration looks up the iterator next pointer it will point to the freed
memory of the second element.

This is easily demonstrated with the next program. It first uses Devel::Peek
to find a chain with at least two elements, then iterates there and deletes
these two elements. The next iteration then causes a freed memory warning
(in my real code more had happened with the memory at that point and I ended
up with a core dump).

========================================================
#!/usr/bin/perl -w
use strict;
use Devel::Peek;
use File::Temp qw(tempdir);

my %hash = ("a".."z");

my $tmp_dir = tempdir(CLEANUP => 1);

sub riter {
    local *OLDERR;
    open(OLDERR, ">&STDERR") || die "Can't dup STDERR: $!";
    open(STDERR, ">", "$tmp_dir/dump") ||
        die "Could not open '$tmp_dir/dump' for write: $^E";
    Dump(\%hash);
    open(STDERR, ">&OLDERR") || die "Can't dup OLDERR: $!";
    open(my $fh, "<", "$tmp_dir/dump") ||
        die "Could not open '$tmp_dir/dump' for read: $^E";
    local $/;
    my $dump = <$fh>;
    my ($riter) = $dump =~ /^\s*RITER\s*=\s*(\d+)/m or
        die "No plain RITER in dump '$dump'";
    return $riter;
}

my @riters;
while (my $key = each %hash) {
    push @{$riters[riter()]}, $key;
}

my ($first_key, $second_key);
my $riter = 0;
for my $chain (@riters) {
    if ($chain && @$chain >= 2) {
        $first_key  = $chain->[0];
        $second_key = $chain->[1];
        last;
    }
    $riter++;
}
$first_key ||
    die "No 2 element chains. Please retry with a different intial HASH";
$| = 1;

# Ok all preparation is done
print <<"EOF"
Found keys '$first_key' and '$second_key' on chain $riter
Will now iterato to key '$first_key' then delete '$first_key' and '$second_key'.
EOF
;
1 until $first_key eq each %hash;
delete $hash{$first_key};
delete $hash{$second_key};

print "Now iterating into freed memory\n";
1 for each %hash;
print "Done (not reached)\n";
========================================================

Most of that code above is to analyze the HASH chains. But once you know the
keys on a 2-element (or more) chain the problem can be demonstrated directly
too. E.g. on my system the HASH function works out so that the following code:

perl -wle '%a=("a".."z"); each %a; delete $a{w}; delete $a{e}; 1 for each %a'

will give:

Use of freed value in iteration at -e line 1

The fix is not so clear. Ideas would be:
   - Don't remove lazydel elements from the entry chain. Then its next
     pointer would always be upto date. But all iteration code would have
     to be fixed to skip the deleted iterator and getting the
     PL_sv_placeholder code right could be a bit tricky for restricted
     hashes. Still, I think this would be the cleanest fix, but it will
     break existing code that thinks it knows howe to iterate hash entries
     by itself
   - Fix hv_free_ent
     This is good because it would put all the fixup in one place. But it
     would implicitely assume that by the time an entry is passed to
     hv_free_ent its next pointer should still be valid. It also means doing
     the extra test even though the place it gets called is one where the
     lazydel problem can't happen (which is quite common)
   - Fix the places where the delete is done
     Drawback is that you may never forget to do the lazydel fixup in at any
     place where the entry chain gets shortened
     Anyways, this is what I used in my sample fix.
   - Always delete the current entry but if it is the one in the iterator, 
     replace EITER by the next address , and LAZYDEL just means that 
     the EITER poiner is already increased. That still needs code to update
     the EITER pointer if THAT entry is deleted, but it still leads to simpler
     code needing less HeNEXT()

Looking at the delete code I see another strange thing. If the delete of the
current iterator is done with the G_DISCARD flag, the corresponding value is
not freed and survives until the lazy deleted entry gets removed on the next
hash iteration. This is easily demonstrated like this:

perl -wle '
sub DESTROY { print "DESTROY" }
%a=(a=>bless[]);
each %a;
delete $a{a};
print "END"
'

This prints:

END
DESTROY

notice the difference with:

perl -wle '
sub DESTROY { print "DESTROY" }
%hash = (a => bless[]);
each %hash;
$dummy = delete $hash{a}; $dummy = 0;
print "END"
'

This prints:

DESTROY
END

This is easily solved by always replacing the deleted entry value with
&PL_sv_placeholder. It actually simplifies the code a bit except for the
fact that the mro_method_changed from free_hash_ent now has to be done
in hv_delete

A further note: while debugging this issue it was annoying that 
Devel::Peek::Dump doesb't actually dump the HASH elements when an
iterator is active. Also added is a patch that does the iteration to
dump the HASH contents by iterating over it by hand (not disturbing
any active iterator). With that it also doesn't activate hash magic 
during iteration, which I think is a feature

Next a few proposed patches solving all this. They are for perl 5.10.1,
(a quick look at blead indicates the code there seems unchanged, so I think
all this is still relevant)

Patch for deleting the element after a delete iterator
============================================================
--- /home/ton/src/perl-5.10.1/hv.c	2009-06-10 14:36:34.000000000 +0200
+++ hv.c	2011-02-27 14:46:28.622503735 +0100
@@ -1060,8 +1060,12 @@
 	    }
 	    if (SvOOK(hv) && entry == HvAUX(hv)->xhv_eiter /* HvEITER(hv) */)
 		HvLAZYDEL_on(hv);
-	    else
+	    else {
+		if (SvOOK(hv) && HvLAZYDEL(hv) &&
+		    entry == HeNEXT(HvAUX(hv)->xhv_eiter))
+		    HeNEXT(HvAUX(hv)->xhv_eiter) = HeNEXT(entry);
 		hv_free_ent(hv, entry);
+	    }
 	    xhv->xhv_keys--; /* HvTOTALKEYS(hv)-- */
 	    if (xhv->xhv_keys == 0)
 	        HvHASKFLAGS_off(hv);
@@ -1611,8 +1615,12 @@
 		    HvFILL(hv)--; /* This linked list is now empty.  */
 		if (entry == HvEITER_get(hv))
 		    HvLAZYDEL_on(hv);
-		else
+		else {
+		    if (SvOOK(hv) && HvLAZYDEL(hv) &&
+			entry == HeNEXT(HvAUX(hv)->xhv_eiter))
+			HeNEXT(HvAUX(hv)->xhv_eiter) = HeNEXT(entry);
 		    hv_free_ent(hv, entry);
+		}
 
 		if (--items == 0) {
 		    /* Finished.  */
============================================================

Patch to delete the current iterator value when using G_DISCARD:
============================================================
--- hv.c.old	2011-02-27 16:18:34.666306057 +0100
+++ hv.c	2011-02-27 17:11:05.540720971 +0100
@@ -1034,12 +1034,17 @@
         if (k_flags & HVhek_FREEKEY)
             Safefree(key);
 
-	if (d_flags & G_DISCARD)
-	    sv = NULL;
-	else {
-	    sv = sv_2mortal(HeVAL(entry));
-	    HeVAL(entry) = &PL_sv_placeholder;
-	}
+	if (d_flags & G_DISCARD) {
+	    sv = HeVAL(entry);
+	    if (sv) {
+		/* deletion of method from stash */
+		if (isGV(sv) && isGV_with_GP(sv) && GvCVu(sv) && HvNAME_get(hv))
+		    mro_method_changed_in(hv);
+		SvREFCNT_dec(sv);
+		sv = NULL;
+	    }
+	} else sv = sv_2mortal(HeVAL(entry));
+	HeVAL(entry) = &PL_sv_placeholder;
 
 	/*
 	 * If a restricted hash, rather than really deleting the entry, put
@@ -1047,13 +1052,11 @@
 	 * we can still access via not-really-existing key without raising
 	 * an error.
 	 */
-	if (SvREADONLY(hv)) {
-	    SvREFCNT_dec(HeVAL(entry));
-	    HeVAL(entry) = &PL_sv_placeholder;
+	if (SvREADONLY(hv))
 	    /* We'll be saving this slot, so the number of allocated keys
 	     * doesn't go down, but the number placeholders goes up */
 	    HvPLACEHOLDERS(hv)++;
-	} else {
+	else {
 	    *oentry = HeNEXT(entry);
 	    if(!*first_entry) {
 		xhv->xhv_fill--; /* HvFILL(hv)-- */
============================================================

Patch to iterate a hash by hand during do_sv_dump:
============================================================
--- /home/ton/src/perl-5.10.1/dump.c	2009-07-09 17:00:51.000000000 +0200
+++ dump.c	2011-02-27 14:59:18.516122042 +0100
@@ -1744,29 +1744,41 @@
 			   dumpops, pvlim);
 	    }
 	}
-	if (nest < maxnest && !HvEITER_get(sv)) { /* Try to preserve iterator */
-	    HE *he;
+	if (nest < maxnest) {
 	    HV * const hv = MUTABLE_HV(sv);
-	    int count = maxnest - nest;
+	    STRLEN i;
+	    HE *he;
 
-	    hv_iterinit(hv);
-	    while ((he = hv_iternext_flags(hv, HV_ITERNEXT_WANTPLACEHOLDERS))
-                   && count--) {
+	    if (HvARRAY(hv)) {
+		int count = maxnest - nest;
+		for (i=0; i <= HvMAX(hv); i++) {
+		    for (he = HvARRAY(hv)[i]; he; he = HeNEXT(he)) {
+			U32 hash;
+			SV * keysv;
+			const char * keypv;
+			SV * elt;
 		STRLEN len;
-		const U32 hash = HeHASH(he);
-		SV * const keysv = hv_iterkeysv(he);
-		const char * const keypv = SvPV_const(keysv, len);
-		SV * const elt = hv_iterval(hv, he);
+
+			if (count-- <= 0) goto DONEHV;
+
+			hash = HeHASH(he);
+			keysv = hv_iterkeysv(he);
+			keypv = SvPV_const(keysv, len);
+			elt = HeVAL(he);
 
 		Perl_dump_indent(aTHX_ level+1, file, "Elt %s ", pv_display(d, keypv, len, 0, pvlim));
 		if (SvUTF8(keysv))
 		    PerlIO_printf(file, "[UTF8 \"%s\"] ", sv_uni_display(d, keysv, 6 * SvCUR(keysv), UNI_DISPLAY_QQ));
+			if (HvEITER_get(hv) == he)
+			    PerlIO_printf(file, "[CURRENT] ");
 		if (HeKREHASH(he))
 		    PerlIO_printf(file, "[REHASH] ");
-		PerlIO_printf(file, "HASH = 0x%"UVxf"\n", (UV)hash);
+			PerlIO_printf(file, "HASH = 0x%"UVxf"\n", (UV) hash);
 		do_sv_dump(level+1, file, elt, nest+1, maxnest, dumpops, pvlim);
 	    }
-	    hv_iterinit(hv);		/* Return to status quo */
+		}
+	      DONEHV:;
+	    }
 	}
 	break;
     case SVt_PVCV:
============================================================

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=medium
---
Site configuration information for perl 5.10.1:

Configured by Debian Project at Mon Jul 12 12:19:34 UTC 2010.

Summary of my perl5 (revision 5 version 10 subversion 1) configuration:
   
  Platform:
    osname=linux, osvers=2.6.24-27-server, archname=x86_64-linux-gnu-thread-multi
    uname='linux yellow 2.6.24-27-server #1 smp fri mar 12 01:23:09 utc 2010 x86_64 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.1 -Dsitearch=/usr/local/lib/perl/5.10.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.10.1 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.4.4', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.12.so, so=so, useshrplib=true, libperl=libperl.so.5.10.1
    gnulibc_version='2.12'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector'

Locally applied patches:
    DEBPKG:debian/arm_thread_stress_timeout - http://bugs.debian.org/501970 Raise the timeout of ext/threads/shared/t/stress.t to accommodate slower build hosts
    DEBPKG:debian/cpan_config_path - Set location of CPAN::Config to /etc/perl as /usr may not be writable.
    DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
    DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
    DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
    DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories.
    DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
    DEBPKG:debian/extutils_hacks - Various debian-specific ExtUtils changes
    DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
    DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
    DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
    DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
    DEBPKG:debian/m68k_thread_stress - http://bugs.debian.org/495826 Disable some threads tests on m68k for now due to missing TLS.
    DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
    DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy
    DEBPKG:debian/perl_synopsis - http://bugs.debian.org/278323 Rearrange perl.pod
    DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
    DEBPKG:debian/use_gdbm - Explicitly link against -lgdbm_compat in ODBM_File/NDBM_File. 
    DEBPKG:fixes/assorted_docs - http://bugs.debian.org/443733 [384f06a] Math::BigInt::CalcEmu documentation grammar fix
    DEBPKG:fixes/net_smtp_docs - http://bugs.debian.org/100195 [rt.cpan.org #36038] Document the Net::SMTP 'Port' option
    DEBPKG:fixes/processPL - http://bugs.debian.org/357264 [rt.cpan.org #17224] Always use PERLRUNINST when building perl modules.
    DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
    DEBPKG:fixes/pod2man-index-backslash - http://bugs.debian.org/521256 Escape backslashes in .IX entries
    DEBPKG:debian/disable-zlib-bundling - Disable zlib bundling in Compress::Raw::Zlib
    DEBPKG:fixes/kfreebsd_cppsymbols - http://bugs.debian.org/533098 [3b910a0] Add gcc predefined macros to $Config{cppsymbols} on GNU/kFreeBSD.
    DEBPKG:debian/cpanplus_definstalldirs - http://bugs.debian.org/533707 Configure CPANPLUS to use the site directories by default.
    DEBPKG:debian/cpanplus_config_path - Save local versions of CPANPLUS::Config::System into /etc/perl.
    DEBPKG:fixes/kfreebsd-filecopy-pipes - http://bugs.debian.org/537555 [16f708c] Fix File::Copy::copy with pipes on GNU/kFreeBSD
    DEBPKG:fixes/anon-tmpfile-dir - http://bugs.debian.org/528544 [perl #66452] Honor TMPDIR when open()ing an anonymous temporary file
    DEBPKG:fixes/abstract-sockets - http://bugs.debian.org/329291 [89904c0] Add support for Abstract namespace sockets.
    DEBPKG:fixes/hurd_cppsymbols - http://bugs.debian.org/544307 [eeb92b7] Add gcc predefined macros to $Config{cppsymbols} on GNU/Hurd.
    DEBPKG:fixes/autodie-flock - http://bugs.debian.org/543731 Allow for flock returning EAGAIN instead of EWOULDBLOCK on linux/parisc
    DEBPKG:fixes/archive-tar-instance-error - http://bugs.debian.org/539355 [rt.cpan.org #48879] Separate Archive::Tar instance error strings from each other
    DEBPKG:fixes/positive-gpos - http://bugs.debian.org/545234 [perl #69056] [c584a96] Fix \\G crash on first match
    DEBPKG:debian/devel-ppport-ia64-optim - http://bugs.debian.org/548943 Work around an ICE on ia64
    DEBPKG:fixes/trie-logic-match - http://bugs.debian.org/552291 [perl #69973] [0abd0d7] Fix a DoS in Unicode processing [CVE-2009-3626]
    DEBPKG:fixes/hppa-thread-eagain - http://bugs.debian.org/554218 make the threads-shared test suite more robust, fixing failures on hppa
    DEBPKG:fixes/crash-on-undefined-destroy - http://bugs.debian.org/564074 [perl #71952] [1f15e67] Fix a NULL pointer dereference when looking for a DESTROY method
    DEBPKG:fixes/tainted-errno - http://bugs.debian.org/574129 [perl #61976] [be1cf43] fix an errno stringification bug in taint mode
    DEBPKG:patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.10.1-12 in patchlevel.h

---
@INC for perl 5.10.1:
    /etc/perl
    /usr/local/lib/perl/5.10.1
    /usr/local/share/perl/5.10.1
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.10
    /usr/share/perl/5.10
    /usr/local/lib/site_perl
    .

---
Environment for perl 5.10.1:
    HOME=/home/ton
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/ton/bin:/home/ton/bin.SampleSetup:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:.
    PERL_BADLANG (unset)
    SHELL=/bin/bash




nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About