Message-ID: <20070103115632.GA3062@elf.ucw.cz>
Date: Wed, 3 Jan 2007 12:56:32 +0100
From: Pavel Machek <pavel@....cz>
To: Miklos Szeredi <miklos@...redi.hu>
Cc: bhalevy@...asas.com, arjan@...radead.org,
mikulas@...ax.karlin.mff.cuni.cz, jaharkes@...cmu.edu,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
nfsv4@...f.org
Subject: Re: Finding hardlinks
Hi!
> > > the use of a good hash function. The chance of an accidental
> > > collision is infinitesimally small. For a set of
> > >
> > > 100 files: 0.00000000000003%
> > > 1,000,000 files: 0.000003%
> >
> > I do not think we want to play with probability like this. I mean...
> > imagine 4G files, 1KB each. That's 4TB disk space, not _completely_
> > unreasonable, and collision probability is going to be ~100% due to
> > birthday paradox.
> >
> > You'll still want to back up your 4TB server...
>
> Certainly, but tar isn't going to remember all the inode numbers.
> Even if you solve the storage requirements (not impossible) it would
> have to do (4e9^2)/2=8e18 comparisons, which computers don't have
> enough CPU power just yet.
Storage requirements would be 16GB of RAM... that's small enough. If
you sort, you'll only need about 32*2^32 comparisons (n*log2(n) with
n = 2^32), and that's doable.
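
The sorting argument can be sketched roughly as follows. This is only an
illustration in Python, not code from tar or the kernel; the function
names are hypothetical, and the keys stand in for whatever per-file
identifier (inode number or hash) is being compared. After sorting, equal
keys sit next to each other, so duplicates fall out of a single linear pass.

```python
from itertools import groupby
from math import log2


def find_duplicate_keys(keys):
    """Return, in ascending order, every key that occurs more than once.

    Hypothetical helper: 'keys' stands in for per-file identifiers.
    """
    ordered = sorted(keys)  # roughly n*log2(n) comparisons
    # groupby() collects runs of equal adjacent keys in the sorted list.
    return [key for key, run in groupby(ordered) if sum(1 for _ in run) > 1]


def pairwise_comparisons(n):
    """Comparisons needed to check every pair of n keys against each other."""
    return n * (n - 1) // 2


def sort_comparisons(n):
    """Order-of-magnitude comparison count for a comparison sort of n keys."""
    return n * log2(n)


if __name__ == "__main__":
    n = 2**32  # the "4G files" case discussed above
    print("all pairs:", pairwise_comparisons(n))
    print("sorting:  ", sort_comparisons(n))
    print(find_duplicate_keys([5, 3, 5, 1, 3, 9]))
```

For n = 2^32 the all-pairs count is on the order of 8e18, while the
sorting estimate is 32*2^32, which is where the two figures in this
exchange come from.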
I do not claim it is _likely_. You'd need hardlinks, as you
noticed. But the system should work, not "work with high probability",
and I believe we should solve this in the long term.
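
For anyone who wants to check the collision figures quoted above, here is
a minimal sketch of the usual birthday approximation,
p ~= 1 - exp(-n*(n-1) / (2*N)) with N = 2^bits. The hash width is not
stated in this message, so 'bits' is an assumption; the quoted figures
for 100 and 1,000,000 files are roughly what this approximation gives
for 64 bits. The function name is hypothetical.

```python
import math


def collision_probability(n_files, bits):
    """Approximate chance that at least two of n_files share a hash value.

    Uses the birthday approximation 1 - exp(-n*(n-1) / (2 * 2**bits)).
    'bits' is the assumed width of the hash or identifier; it is a
    parameter here because the thread does not fix it.
    """
    space = 2**bits
    exponent = n_files * (n_files - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for n in (100, 10**6, 2**32):
        for bits in (32, 64):
            p = collision_probability(n, bits)
            print(f"{n:>12} files, {bits}-bit: {p * 100:.3g}%")
```

The result for the 4G-file case depends strongly on the assumed width,
so it is worth running with the width actually in use.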
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html