develooper Front page | perl.perl5.porters | Postings from July 2001

[PATCH] Re: Bug in File::Find

July 27, 2001 08:57
[PATCH] Re: Bug in File::Find

Nicholas Clark wrote:
>On Thu, Jul 26, 2001 at 11:26:48AM -0700, B. Cowgill wrote:
>> I believe I have discovered a bug in File::Find. It only happens for me
>> when running perl on a linux box while trying to find files within
>> a directory which is mounted with smbmount.  The samba server is a Win2K
>> box.
>> Is this the right place to send a bug report?  If so, let me know and
>> I'll send all the details, with a sample program and test plan.
>Looking at a samba mount here I see
>drwxr-xr-x    1 nclark   nclark        512 Jul 18 12:09 Customers
>drwxr-xr-x    1 nclark   nclark        512 Jul 26 12:26 Development
>              ^ link count is 1
>By default File::Find is assuming that it can rely on a directory's link
>count being 2 + number of sub directories, which holds true on nearly all
>Unix file system types.
>You can turn this assumption off by setting $File::Find::dont_use_nlink = 1,
>which should work on File::Find at least as far back as 5.004_05.
>There was a long thread about this link count assumption starting at

>There's a summary of that thread in "This Week on perl5-porters" archived

From there I read:

   field of stat was inaccurate; one suggestion was just to set the
   dont_use_nlink configuration variable, but that's horrendously slow.
   Tels provided a patch to File::Find which avoids using nlink where
   possible, but Andy pointed out that it isn't a general solution,
   because it could get expensive to stat every directory to make sure
   you haven't passed a filesystem boundary and need to change your nlink

I am not aware of Andy's message; if someone has a copy, please forward it
to me. (Or did I forget it? Oh well...)

If I remember correctly, my solution does not slow it down except for one
"if ()". There are no additional stats because:

* if the nlink count is >= 2, all is well and we proceed as usual.
* if it is below 2 (i.e. 1 or 0), we have a filesystem with a wrong nlink
count and behave as if the dont_use_nlink variable were temporarily set to 1.
In this case the slow method is used, but it would have to be used anyway.

Summary: File systems with correct nlinks are traversed fast; upon
entering one without, we slow down and get fast again once the slow
filesystem is left. This decision is made on a per-directory basis, so
having two slow filesystems (like /cdrom and /mnt/samba/server) in your
tree works.

Unless I overlooked something really big, there would be no slowdown.
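The per-directory decision described above can be sketched as a pure function of the link count. The helper name `pick_method` is hypothetical; the real patch lives inside File::Find's traversal loop, where the count would come from `(lstat $dir)[3]`.

```perl
#!/usr/bin/perl -w
# Sketch of the per-directory heuristic (hypothetical helper sub, not
# the actual File::Find internals): trust nlink only when it looks sane.
use strict;

sub pick_method {
    my ($nlink) = @_;
    # On a well-behaved Unix filesystem a directory's nlink is
    # 2 + number of subdirectories, i.e. always >= 2: we can stop
    # stat()ing entries once all subdirectories have been found.
    return "fast" if defined $nlink && $nlink >= 2;
    # nlink of 0 or 1 (samba, ISO9660, ...): fall back to stat()ing
    # every entry, just as $File::Find::dont_use_nlink = 1 would.
    return "slow";
}

print pick_method(5), " ", pick_method(1), "\n";   # fast slow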

Here is a benchmark done on my system. I mounted some random cdrom. The
times are from after caching (e.g. the second run) to keep HD access times
from skewing the benchmark:

        root@null:/root/ > cat
        #!/usr/bin/perl -w

        use File::Find;

        my $c = 0;
        find (\&wanted, "/usr", "/cdrom");

        print "counted $c\n";

        sub wanted { $c++; }

        count   time    method used             patched
        42809   0.507   w/o cdrom mounted       no
        42812   0.499   mounted                 no
        42859   2.747   dont_use_nlink = 1      no
        42859   0.507   dont_use_nlink = 0      yes
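The timings in the table above can also be taken in-process rather than with an external time(1) call, for example with Time::HiRes (a CPAN module on a 2001-era perl; core since 5.8):

```perl
#!/usr/bin/perl -w
# Timing a find() run with Time::HiRes, to reproduce the table above.
use strict;
use File::Find;
use Time::HiRes qw(gettimeofday tv_interval);

my $c = 0;
my $t0 = [gettimeofday];
find(sub { $c++ }, ".");
printf "counted %d files in %.3f seconds\n", $c, tv_interval($t0);
```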

The interesting thing is that 2-8 seconds after the run the HD activity
goes through the roof. Probably Linux updating all the files' last access
time stamps ;o)

Notes: The times are too small to measure any slowdowns accurately, but
even _if_ there are slowdowns, the patched version is still faster than
(and as correct as) the dont_use_nlink method. I remember that on my old
system the slowdown due to $dont_use_nlink=1 was much greater, but that
system was a lot slower (CPU-wise) than the one I have now.

The patch is appended and applies against perl@11446. What is left is to
possibly remove the dont_use_nlink = 1 default for specific platforms like
OS/2, DOS etc. But I doubt that they will have correct nlinks anytime in
the future, and they would thus profit from this patch.

   Andreas got in touch with the author of Linux's tmpfs who
   provided a patch to it. Alan Cox bitched that applications demanding
   traditional Unix semantics for nlink were buggy, but applied the patch
   anyway, so tmpfs and File::Find will play nice again at some
   unspecified point in the future.

This is of course a good solution to increase File::Find's speed, but it
won't help all those people who have file systems with a wrong nlink count
as part of their file tree (not as the whole). Specifically, as the author
of any script using File::Find, you would be _forced_ to use either my
patch or dont_use_nlink, because you _never_ know when a user of your
script will encounter such a filesystem. There are plenty of them: MSDOS,
ISO9660 CD-ROMs, Samba and I believe AFS, and probably more. Just imagine
what happens when an admin burns part of his filesystem to a CD-R and
mounts it on his server, which you have mounted somehow. Your script using
File::Find would suddenly leave that part out. And $dont_use_nlink=1 is
not really an intelligent solution ;)

>but it does seem that even the most recent development File::Find is still
>making this link count assumption.

Richard Soderberg said he missed my email with the patch. It seems either
there were more arguments against it, or he simply forgot, or his system
crashed again ;)


Te"Why is the length of a discussion inversely proportional to the patch's
size? ;)"ls

-- 
perl -MMath::String -e 'print \
Math::String->from_number("215960156869840440586892398248"),"\n"'
Thief - The Dark Project | My current Perl projects | Fight for your right to link.
PGP key available on or via email


