Message-ID: <AANLkTikV6=FCQ+-Z=fAi8SH90M-izCUEBisinqGwo=DU@mail.gmail.com>
Date:	Tue, 18 Jan 2011 21:56:38 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	nobody <darwinskernel@...il.com>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.38-rc1

On Tue, Jan 18, 2011 at 9:42 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> When pulling from 2.6.37 to 2.6.38-rc1, it should look something like this:
>
>  remote: Counting objects: 84898, done.
>  remote: Compressing objects: 100% (14274/14274), done.
>  Receiving objects: 100% (71245/71245), 21.07 MiB | 26.53 MiB/s, done.
>  remote: Total 71245 (delta 59086), reused 67779 (delta 56042)
>  Resolving deltas: 100% (59086/59086), completed with 7395 local objects.
>
> ie you got 21.07MiB for the whole change between 2.6.37 and 2.6.38-rc1.

Btw, what may confuse you a bit is that the on-disk representation of
the newly received pack ends up being about 69MB, ie the 21MiB of
network traffic almost tripled in size as a result of that "resolving
deltas" thing. That's because git pack-files are designed to always be
stand-alone, so on disk, the pack-file will always contain the base
objects needed to expand all the deltas.

But on the wire, we don't do that, which is why you have that
"Resolving deltas" phase - it's a purely local phase where it takes
the "pure delta" pack that came over the wire, and creates the
well-formed pack that doesn't have any deltas that depend on external
objects.
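
To make that local phase concrete, here is a toy Python model. It is a
sketch only, not git's pack format or code: the names complete_thin_pack
and pack_size, the object ids and the byte sizes are all made up for
illustration. The one idea taken from the text above is that base objects
missing from the received pack get added from what you already have
locally, so the result stands alone.

```python
# Toy model only: real git packs are binary files with compressed entries.
# Here a pack is a dict of object id -> (base id or None, size in bytes).

def complete_thin_pack(thin_pack, local_store):
    """Return a self-contained copy of thin_pack.

    local_store maps object id -> size of the full object we already have.
    """
    completed = dict(thin_pack)
    for base, _size in thin_pack.values():
        if base is not None and base not in completed:
            # The delta's base is external to the pack: add it as a
            # full (non-delta) object taken from the local store.
            completed[base] = (None, local_store[base])
    return completed


def pack_size(pack):
    """Total size of all entries in the pack."""
    return sum(size for _base, size in pack.values())


# Hypothetical objects and sizes, chosen only to show the growth.
local_store = {"base1": 4000, "base2": 6000}
thin = {
    "new1": ("base1", 120),  # delta against an object we already have
    "new2": ("new1", 80),    # delta against another object in the pack
    "new3": ("base2", 200),  # delta against a second local object
    "new4": (None, 1500),    # brand-new full object
}
full = complete_thin_pack(thin, local_store)
print("on the wire:", pack_size(thin), "on disk:", pack_size(full))
```

With these made-up numbers the pack grows from 1900 to 11900 bytes, purely
because the two external bases were pulled in.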

And that expansion happens every time you pull: so if you do daily
pulls, each of those pulls, fairly small on the wire, gets expanded so
that the resulting pack is stand-alone. Which means that you often end
up having the same (or very similar) base objects duplicated across the
packs.

So I can well imagine that if you do a pull every day, over two weeks
your .git/objects/pack directory will have new packs that together are
500MB in size due to all of that. That's why git likes doing some GC
on its data every once in a while - it will repack all those
individual packs into one big pack, which avoids all that duplication
of base objects.
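
The duplication argument can be sketched with the same kind of toy model.
Again this is only an illustration under stated assumptions: total_size,
repack and the sizes are hypothetical, and the sketch only removes
identical objects, whereas a real repack can also store similar objects
as deltas against each other.

```python
# Toy model only: each pack is a dict of object id -> size in bytes.

def total_size(packs):
    """Sum of sizes over all packs, counting duplicated objects each time."""
    return sum(size for pack in packs for size in pack.values())


def repack(packs):
    """Merge all packs into one, keeping a single copy of each object id."""
    merged = {}
    for pack in packs:
        merged.update(pack)
    return merged


# Three hypothetical daily pulls, each expanded to carry the same base.
daily_packs = [
    {"base": 5000, "day1-delta": 100},
    {"base": 5000, "day2-delta": 150},
    {"base": 5000, "day3-delta": 90},
]
before = total_size(daily_packs)
after = total_size([repack(daily_packs)])
print("before:", before, "after:", after)
```

Here the three packs add up to 15340 bytes, and the single merged pack to
5340, because the base is stored once instead of three times.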

And why do we expand the packs and make them stand on their own? Why
don't we just keep all the object data as deltas against objects in
other packs, the way we pass data around on the network? The reason is
simply robustness. You can get into various nasty situations (like
circular delta dependencies) if you allow deltas between different
packs. So the only time we allow a so-called "thin pack" (ie the pack
is full of deltas against objects external to the pack) is for the
ephemeral pack that is transferred during a "pull" or "fetch". In that
situation we end up doing lots of extra sanity checking, and because
it's ephemeral you never get into the whole situation where deltas in
different packs could refer to each other (because by the time it's a
real pack, it will have been expanded out to be self-sufficient).
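
A small sketch of the circular case, assuming a made-up representation
where each pack maps an object id to its delta base (or None for a full
object). The helpers lookup_base and delta_chain are hypothetical and only
show why a base chain that crosses packs can fail to terminate, while a
chain inside one self-sufficient pack always ends at a full object.

```python
# Toy model only: a pack is a dict of object id -> base id (None = full object).

def lookup_base(obj_id, packs):
    """Find obj_id in any pack and return its delta base (or None)."""
    for pack in packs:
        if obj_id in pack:
            return pack[obj_id]
    raise KeyError(obj_id)


def delta_chain(obj_id, packs):
    """Follow delta bases until a full object; raise ValueError on a cycle."""
    chain = []
    while obj_id is not None:
        if obj_id in chain:
            raise ValueError("circular delta: " + " -> ".join(chain + [obj_id]))
        chain.append(obj_id)
        obj_id = lookup_base(obj_id, packs)
    return chain


# One self-sufficient pack: B is a delta against A, A is stored in full.
ok_packs = [{"A": None, "B": "A"}]
print(delta_chain("B", ok_packs))

# Two packs with deltas against each other: nothing can ever be expanded.
bad_packs = [{"A": "B"}, {"B": "A"}]
try:
    delta_chain("A", bad_packs)
except ValueError as err:
    print(err)
```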

So do use "git gc" every once in a while to avoid unnecessary pack
duplication issues (it also makes object indexing much faster etc).
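
If you want to see how many packs have piled up before deciding to gc,
"git count-objects -v" prints the pack count and total pack size. Below is
a minimal Python sketch that parses that output. The helper names
(parse_count_objects, pack_stats, should_gc) and the max_packs threshold
of 10 are assumptions made for this example, not anything git defines.

```python
import subprocess


def parse_count_objects(text):
    """Parse `git count-objects -v` output ("key: number" lines) into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[key.strip()] = int(value)
    return stats


def pack_stats(repo="."):
    """Run `git count-objects -v` in repo and return the parsed numbers."""
    out = subprocess.run(
        ["git", "-C", repo, "count-objects", "-v"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_count_objects(out)


def should_gc(stats, max_packs=10):
    """Arbitrary rule for this sketch: gc once there are more than max_packs."""
    return stats.get("packs", 0) > max_packs
```

Calling should_gc(pack_stats()) inside a repository returns True once more
than ten packs have accumulated, at which point running "git gc" by hand
is reasonable.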

                  Linus