Message-ID: <AANLkTin=EeQx4pEPk9ST27kcRpDP65NQvL1c1m8UcRmO@mail.gmail.com>
Date: Thu, 28 Oct 2010 11:54:55 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: "Ted Ts'o" <tytso@....edu>, Ingo Molnar <mingo@...e.hu>,
Linus Torvalds <torvalds@...ux-foundation.org>,
git@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Minimum git commit abbrev length (Was Re: -tip: origin tree build
failure (was: [GIT PULL] ext4 update) for 2.6.37)
On Thu, Oct 28, 2010 at 11:28 AM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> Yes. The default of 7 (I think) comes from fairly early in git
> development, when seven hex digits was a lot (it covers about 250+
> million hash values). Back then I thought that 65k revisions was a lot
> (it was what we were about to hit in BK), and each revision tends to
> be about 5-10 new objects or so, so a million objects was a big
> number.
>
> These days, the kernel isn't even the largest git project, and even
> the kernel has about 220k revisions (_much_ bigger than the BK tree
> ever was) and we are approaching two million objects. At that point,
> seven hex digits is still unique for a lot of them, but when we're
> talking about just two orders of magnitude difference between number
> of objects and the hash size, there _will_ be hash collisions. It's no
> longer even close to unrealistic - it happens all the time.

Hmm. In fact, in the kernel, we currently have about twelve thousand
objects that end up having collisions in 7 hex digits. Even in the old
historical BK kernel tree, we have over a thousand objects that
collide (each bucket in both cases gets just two objects; there are as
yet no multiple collisions, which is what you'd expect with a good
hash). You can see this with

    git rev-list --objects --all | cut -c1-7 | sort | uniq -dc

and in fact git itself has a few collisions (but currently just 44
objects ending up sharing 22 SHA1 buckets in 7 digits).
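
The one-liner above can be repeated for several prefix lengths. The
following is only a sketch: the function names count_dup_buckets and
report_dup_buckets are made up for illustration, and it assumes a POSIX
shell with git, cut, sort, uniq, wc, tr and mktemp available. It counts
buckets (prefixes shared by more than one object), not objects.

```shell
# count_dup_buckets LEN (hypothetical helper): reads one object id per line
# on stdin and prints how many LEN-character prefixes occur more than once.
count_dup_buckets() {
    cut -c1-"$1" | sort | uniq -d | wc -l | tr -d ' '
}

# report_dup_buckets LEN... (hypothetical helper): must be run inside a git
# repository. Lists every reachable object once, then reports the number of
# duplicate buckets for each requested prefix length.
report_dup_buckets() {
    ids=$(mktemp) || return 1
    # rev-list prints "<id> <path>"; keep only the id column.
    git rev-list --objects --all | cut -d' ' -f1 > "$ids"
    for len in "$@"; do
        printf '%2d digits: %s buckets with duplicates\n' \
            "$len" "$(count_dup_buckets "$len" < "$ids")"
    done
    rm -f "$ids"
}
```

Something like "report_dup_buckets 7 8 9 10 11 12" would then print one
line per prefix length; the ids are written to a temporary file so that
rev-list only has to walk the history once.
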
With each digit, you'd expect the collisions to decrease by a factor
of 16, and that is indeed exactly what happens. For my current kernel
tree I get:

 -  7 digits: 5823 buckets with duplicates (i.e. 11646 objects that aren't unique)
 -  8 digits: 406
 -  9 digits: 30
 - 10 digits: 1
 - 11 digits: 0

so 12 hex digits is indeed pretty safe for the kernel, and is likely
to remain so until the kernel history grows by a factor of 16.
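
The factor of 16 per digit matches a simple birthday-style estimate: if
the ids are uniformly distributed, the expected number of colliding
pairs among n objects at k hex digits is roughly n*(n-1)/2 divided by
16^k. A minimal sketch of that arithmetic follows; expected_pairs is a
hypothetical helper name, and it assumes an awk that supports the ^
operator (POSIX awk does).

```shell
# expected_pairs N DIGITS (hypothetical helper): estimate how many pairs
# among N uniformly distributed ids share the same DIGITS-hex-digit prefix.
# This is an approximation, not a count taken from a real repository.
expected_pairs() {
    awk -v n="$1" -v k="$2" \
        'BEGIN { printf "%.2f\n", n * (n - 1) / 2 / 16 ^ k }'
}
```

For example, "expected_pairs 2000000 7" gives about 7450, the same order
of magnitude as the 5823 buckets measured above for a tree that is still
somewhat short of two million objects, and every extra digit divides the
estimate by 16.
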
Linus