linux-kernel - Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

Message-ID: <xmqqtwkymxzj.fsf@gitster.mtv.corp.google.com>
Date:	Tue, 23 Feb 2016 22:19:28 -0800
From:	Junio C Hamano <gitster@...ox.com>
To:	"H. Peter Anvin" <hpa@...or.com>
Cc:	ebiederm@...ssion.com, Fengguang Wu <fengguang.wu@...el.com>,
	Xiaolong Ye <xiaolong.ye@...el.com>, git@...r.kernel.org,
	ying.huang@...el.com, philip.li@...el.com, julie.du@...el.com,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Christoph Hellwig <hch@....de>,
	Dan Carpenter <dan.carpenter@...cle.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

"H. Peter Anvin" <hpa@...or.com> writes:

> Personally, as a maintainer, I would love to see the tree ID and
> ideally also the commit ID a series is based on.  The commit ID is
> in some ways less useful since it is non-recreatable (and
> therefore will never match for anything but the first patch of a
> series), but could be useful to the maintainer.

I admit that the very first "applies-to" proposal I made a long time
ago was based on a tree object name, not a commit object name like
the proposal under discussion here, but I doubt that a tree object
name is that much more useful than a commit object name in this
context.

Below, I assume that you are envisioning that the "base tree"
recorded in a patch does not necessarily name a public, well-known
tree (e.g. a tree-ish that already appears in Linus's tree for those
who work with his tree, or other relevant trees like linux-next or
net tree) [*1*]. It would name an unknown tree that results by
applying a set of well-known patches in-flight on a public
well-known commit.  In that set-up, because you cannot guess
committer identity and timestamp that are used by the patch
submitter when these in-flight patches are applied to prepare the
base for these private commits, a commit object name is useless, but
it may still be possible for you to independently compute these
trees that would result from a set of well-known in-flight patches.
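
To illustrate why the commit object name cannot be recomputed while the
tree object name can, here is a minimal Python sketch of how Git derives
a SHA-1 object name. It assumes a SHA-1 repository; the parent ID, the
identities, and the timestamps are hypothetical placeholders, not values
from this thread.

```python
import hashlib

# Well-known object name of the empty tree in a SHA-1 repository.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def git_object_id(obj_type, body):
    """Object name = SHA-1 over "<type> <size>\\0" followed by the body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree_id, parent_id, author, committer, message):
    """Serialize a single-parent commit the way Git stores it."""
    lines = [
        f"tree {tree_id}",
        f"parent {parent_id}",
        f"author {author}",
        f"committer {committer}",
        "",
        message,
    ]
    return ("\n".join(lines) + "\n").encode()


# Hypothetical inputs: same tree, same parent, same author and message.
PARENT = "0" * 40  # placeholder standing in for a public base commit
AUTHOR = "A U Thor <author@example.com> 1456000000 -0800"

# Only the committer line differs between the two people applying the patch.
submitter_commit = git_object_id("commit", commit_body(
    EMPTY_TREE, PARENT, AUTHOR,
    "Patch Submitter <submitter@example.com> 1456100000 -0800",
    "in-flight patch A"))
maintainer_commit = git_object_id("commit", commit_body(
    EMPTY_TREE, PARENT, AUTHOR,
    "Some Maintainer <maintainer@example.com> 1456200000 -0800",
    "in-flight patch A"))

print(submitter_commit == maintainer_commit)  # the commit names differ
print(git_object_id("tree", b"") == EMPTY_TREE)  # the tree name is reproducible
```

Both commits record the same tree, so anybody who applies the same
patches to the same base arrives at the same tree object name, but not
at the same commit object name.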

But I do not think "it may be possible" above translates to
usefulness in practice.

Suppose we have only three well-known in-flight patches that are
unrelated and independent, and you somehow know that the patch
submitter built the first patch in the series by working on either a
recently tagged commit (say v4.4) or a result of applying some of
these in-flight patches on top of that commit.  Even with these
three commits, the base tree the patch submitter based his or her
work on could be v4.4 itself, v4.4 plus one of the three patches
(v4.4+A, v4.4+B, v4.4+C, three possibilities in total), v4.4 plus
two of the three patches in some order (v4.4+AB, v4.4+BC, v4.4+CA,
three possibilities in total) or v4.4 plus all of the three patches,
so there are 8 possible top-level tree objects in total.  Unless you
are doing something unusual [*2*], even if you have all of these
three well-known in-flight patches in your repository, you would
have only a subset of them (you would certainly have v4.4, and v4.4
plus all three patches, but you would likely have only one path
between these two points, that's four commits recording four trees,
out of possible 8).
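
The counting above can be sketched in Python. This is only an
illustration: the names "A", "B", "C" and the helper functions are
hypothetical, and it relies on the stated assumption that the patches
are independent, so the resulting tree depends on which patches are
applied and not on their order.

```python
from itertools import combinations


def candidate_bases(patches):
    """Every subset of the in-flight patches that could sit on top of the tag."""
    return [
        frozenset(chosen)
        for size in range(len(patches) + 1)
        for chosen in combinations(patches, size)
    ]


def trees_on_linear_history(applied_in_order):
    """Patch sets reachable along one path from the tag to the fully patched tip."""
    return [
        frozenset(applied_in_order[:n])
        for n in range(len(applied_in_order) + 1)
    ]


patches = ["A", "B", "C"]
all_bases = candidate_bases(patches)        # 2**3 == 8 possible trees
in_repo = trees_on_linear_history(patches)  # 4 trees: v4.4, +A, +AB, +ABC
missing = [base for base in all_bases if base not in in_repo]

print(len(all_bases), len(in_repo), len(missing))
```

With n in-flight patches the number of candidates is 2**n, while a
single linear history records only n + 1 of them.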

In the real world, of course you have far more than three well-known
in-flight patches, so even though in theory trees may have better
chance to be "figured out", I do not think it is practical to even
attempt to "figure out" an unknown state given a tree object name.

So assuming that it is a good idea to add some information to a
patch that identifies the whole tree it applies to, I think it is
sensible to (1) limit that identifiable set of tree-ishes only to
well-known public ones, and (2) use the commit object name, not the
tree object name, for the purpose of identifying these tree-ishes.

If I understand Fengguang's plan correctly, a new work based on a
public well-known base tree-ish plus other patches in-flight is to
be accompanied by the identifier for that well-known base tree-ish
and some identifiers for these in-flight patches, i.e. the robot
will be told to check out the well-known base tree-ish, apply the
prerequisite patches and then the patches for the work are applied
on top to be evaluated.  So the above two limitations I placed in
the previous paragraph would not hurt the identifiability of the
"base" tree-ish, I would think.
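
The sequence the robot would follow, as I understand it, can be sketched
as below. This is not the actual implementation of the robot; the
function name, the placeholder commit, and the file names are
hypothetical, and the sketch only builds the command lines ("git
checkout --detach" and "git am") without running them.

```python
def plan_commands(base_commit, prerequisite_patches, series_patches):
    """Return the Git command lines, in order, needed to rebuild the test tree."""
    # 1. Start from the public, well-known base commit.
    commands = [["git", "checkout", "--detach", base_commit]]
    # 2. Apply the in-flight prerequisite patches the submitter depended on.
    for patch in prerequisite_patches:
        commands.append(["git", "am", patch])
    # 3. Apply the patches of the work that is to be evaluated.
    for patch in series_patches:
        commands.append(["git", "am", patch])
    return commands


plan = plan_commands(
    "0123456789abcdef0123456789abcdef01234567",  # hypothetical base commit
    ["prereq-0001.patch"],                       # hypothetical file names
    ["0001-new-work.patch", "0002-new-work.patch"],
)
for argv in plan:
    print(" ".join(argv))
```

Because the base is named by a commit that everybody already has, the
first step never needs to guess; only the prerequisite patches have to
be located by their identifiers.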

[Footnote]

*1* If you limit the bases to these well known ones, then there is
no practical difference between commit and tree, because we can
assume as a maintainer you would have these commits so you would
have both, and once located, a commit would be easier to reason
about (e.g. run "log" to see what changes there are between it and
well-known tags).

*2* By "something unusual" I mean you prepare the permutations of
in-flight patches in your repository, to make it possible to find
any of the 8 tree objects in this scenario.