[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <19045.49265.255525.260784@samba.org>
Date: Tue, 21 Jul 2009 23:19:45 +1000
From: tridge@...ba.org
To: Pavel Machek <pavel@....cz>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
James Bottomley <James.Bottomley@...senPartnership.com>,
Martin Steigerwald <Martin@...htvoll.de>,
Jan Engelhardt <jengelh@...ozas.de>,
Theodore Tso <tytso@....edu>,
Rusty Russell <rusty@...tcorp.com.au>, john.lanza@...ux.com,
OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>,
linux-fsdevel@...r.kernel.org,
Dave Kleikamp <shaggy@...ux.vnet.ibm.com>, corbet@....net,
jcm@...masters.org, torvalds@...ux-foundation.org
Subject: Re: CONFIG_VFAT_FS_DUALNAMES regressions
Hi Pavel,
> > - Similarly, there is a small chance that chkdsk on Windows will
> > rename one file in a directory if they happen to have the same 11
> > byte dummy values. The probability of this happening is
> > approximately 80x lower than with the previous patch.
>
> The "small chance" seems to be 90% for 30000 files in directory. And
> no, it is probably not 80x lower. Do the math.
Perhaps you could double check my math?
The number of combinations available with the current patch is
35^6 * 7 * 8 for files (for directories there are more combinations,
but lets ignore that for now).
That comes to 102942875000 combinations.
Now we can do the exponential birthday approximation, which is:
p = 1.0 - exp(-(n * (n-1)) / (2 * m))
where n is the number of entries in the directory, and m is the total
number of combinations.
That comes to about 0.0052 for 32767 files in a directory
(ie. maximully full), or about 0.5%.
With the previous patch we had 2^30 combinations, which came to 0.393,
or about 39%. So the new patch has about 75x lower chance of a single
collision than the old one.
Similarly for 100 files, the old patch gave a probability of
4.6*10^-6, whereas the new one gives 4.8x10^-8, which is about 96x
lower chance of a single collision.
My '80x' was very rough of course, but its close enough to be a
reasonable rule of thumb I think. What matters here is orders of
magnitude, not precise numbers.
The chance of an actual bluescreen is much lower again, as a
bluescreen happens when two files that have the same 8.3 entry are
accessed in quick succession. Paul did some modelling of that, which
is where the numbers I gave in my last email came from.
I do agree that it would be much better if the chance was zero, but I
also think that the chances of a problem due to a collision are not
really very high, and it may well be lost in the noise of the (rather
common) bluescreens of the fastfat driver on WinXP.
Please do check my maths though. I'll be extremely surprised if you
get anything like 90% as I have run it for days with randomised
testing, and even with the lower chances of the last patch I couldn't
make it happen. I could make it happen by greatly reducing the amount
of randomness in the patch, which is why I know it can happen, but the
probability drops off pretty rapidly with more bits.
Still, my combinatorics is a little rusty, so please correct me if I
have made a mistake.
Cheers, Tridge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists