linux-kernel - email as a bona fide git transport

lists.openwall.net: Open Source and information security mailing list archives
Message-ID: <b9fb52b8-8168-6bf0-9a72-1e6c44a281a5@oracle.com>
Date:   Wed, 16 Oct 2019 12:22:54 +0200
From:   Vegard Nossum <vegard.nossum@...cle.com>
To:     workflows@...r.kernel.org, Git Mailing List <git@...r.kernel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Konstantin Ryabitsev <konstantin@...uxfoundation.org>,
        Eric Wong <e@...24.org>
Subject: email as a bona fide git transport

(cross-posted to git, LKML, and the kernel workflows mailing lists.)

Hi all,

I've been following Konstantin Ryabitsev's quest for better development
and communication tools for the kernel [1][2][3], and I would like to
propose a relatively straightforward idea which I think could bring a
lot to the table.

Step 1:

* git send-email needs to include parent SHA1s and generally all the
   information needed to recreate the commit exactly when applied
   (author and committer names, emails, and timestamps, as well as the
   commit message) so that all the SHA1s remain the same

* git am (or an alternative command) needs to recreate the commit
   perfectly when applied, including applying it to the correct parent
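To make "all the information needed" concrete: a commit's SHA1 is computed
over its raw object, which holds the tree, the parent(s), the author and
committer lines (name, email, timestamp, and timezone), any optional headers
such as encoding or gpgsig, and the message. A minimal sketch for looking at
that object; the helper name show_commit_fields is hypothetical, not part of
git or of the attached patches:

```shell
# Hypothetical helper (not part of git): print the header of a raw commit
# object, i.e. every line before the first blank line. These are the fields,
# together with the message, that an email would have to carry for the
# receiving side to reproduce the same SHA1.
show_commit_fields () {
    # 'git cat-file commit' prints the raw object; sed deletes everything
    # from the first empty line (header/message separator) to the end.
    git cat-file commit "${1:-HEAD}" | sed '/^$/,$d'
}
```

For example, 'show_commit_fields HEAD' in any repository prints lines
starting with tree, parent (absent for a root commit), author, and committer.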

Having both of these would give an exact mapping between email and git;
essentially, email just becomes a transport for git. There are a lot of
advantages to this, in particular that you get a stable way to refer to
a patch or commit even though it was posted to a mailing list, and there
is no need for "changeset IDs" or similar, since you can just use the
git SHA1, which is unique, unambiguous, and stable.

As a rough proof of concept I've attached 3 git patches which implement
this. There are issues to work out like exact format, encodings, mail
mangling, error handling, etc., but hopefully the git community can
help out here. (Improvement suggestions are welcome!)

Step 2:

* A bot that follows LKML (and other lists) and imports patchsets into
   a git repository hosted on git.kernel.org

* The bot can add git notes with URLs to lore (and/or other mailing
   list archives) and store them in e.g. refs/notes/lore,
   refs/notes/lkml, etc.

   (For those who don't use git notes yet: they are essentially small
   bits of information you can add to a commit without changing its SHA1,
   and you can configure tools like 'git log' to show these at the bottom
   of a commit. Notes can also exist in a repo completely separate from
   the commits they attach data to, so there is _zero_ overhead for those
   who don't want to use this.)

* Maintainers can pull patchsets directly from this bot-maintained
   repo, OR they can continue to apply patches from their inbox (the
   result should be the same either way), OR they can stay with the old
   process (at least for a while) and simply not get the benefits of the
   new one.
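The notes part of Step 2 already works with stock git. A minimal sketch of
what the bot and a reader might run, assuming the bot already knows the
archive URL for a commit; the function names, the "Link:" prefix, and the
example URL are illustrative choices, not an existing convention.
'git notes --ref=lore' reads and writes refs/notes/lore (git prepends
refs/notes/ to a short name), and 'git log --notes=lore' shows those notes
below the commit message.

```shell
# Hypothetical bot step: attach an archive URL to a commit as a note stored
# in refs/notes/lore. The commit object itself, and hence its SHA1, is not
# touched; the note lives in a separate notes history. -f replaces an
# existing lore note on the same commit.
add_lore_note () {
    local commit=$1 url=$2
    git notes --ref=lore add -f -m "Link: $url" "$commit"
}

# Reader side: show a commit together with its lore note, if there is one.
show_with_lore () {
    git log -1 --notes=lore "${1:-HEAD}"
}
```

To share the notes, the bot would push refs/notes/lore; readers who want
them fetch that ref explicitly (e.g. 'git fetch origin
refs/notes/lore:refs/notes/lore') and can set notes.displayRef to
refs/notes/lore so that plain 'git log' shows them. Readers who never fetch
the ref see no difference.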

Step 3:

* Instead of describing a patchset in a separate introduction email, we
   can create a merge commit whose parents are the parent of the first
   commit in the series and the last commit in the series, and put the
   patchset description in the merge commit's message [5]. This means
   the patchset description also becomes part of git history.

   (This would require git send-email/am to support sending and
   applying merge commits, at least those which have the same tree as
   one of their parents. This is _not_ yet supported by my proposed git
   patches.)

* Stable SHA1s mean we can refer to previous versions of a patchset by
   SHA1 rather than by archive links. I propose a new changelog tag for
   this, maybe "Previous:", or maybe even a full list of "v1:", "v2:",
   etc. with a SHA1 or ref. Note that these SHA1s do *not* need to exist
   in Linus's repo, but those who want them can pull those branches from
   the bot-maintained repo on git.kernel.org.
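Such a patchset merge can be inspected with ordinary git commands. A minimal
sketch, assuming the parent order used by the script in [5] (first parent =
base, second parent = last patch); the function names are hypothetical. The
first function checks the property mentioned above, that the merge has the
same tree as one of its parents (here the last patch), so the merge itself
introduces no changes; the second lists the patches in the series.

```shell
# Hypothetical check: succeed only if MERGE has exactly two parents and the
# same tree as its second parent (the last patch of the series), i.e. the
# merge only carries the patchset description and adds no changes.
is_patchset_merge () {
    local merge=$1
    # must have a second parent...
    git rev-parse -q --verify "$merge^2" >/dev/null || return 1
    # ...but no third one
    if git rev-parse -q --verify "$merge^3" >/dev/null; then
        return 1
    fi
    [ "$(git rev-parse "$merge^{tree}")" = "$(git rev-parse "$merge^2^{tree}")" ]
}

# List the patches in the series, oldest first: commits reachable from the
# last patch (second parent) but not from the base (first parent).
patchset_patches () {
    git log --reverse --oneline "$1^1..$1^2"
}
```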

Advantages:

- we can keep using email to post patches/patchsets

- the process is opt-in (but should be encouraged) for both authors and
   maintainers, and the transition can happen over time

- there is a central repo for convenience, but development does not
   depend on it and it is not a single point of failure: much like
   Linus's repo, it can be moved, and somebody else can even replicate
   it from scratch using nothing but the mailing list archives

- allows quickly looking up, from within git, the email discussion for a
   patch/patchset, and vice versa

- allows diffing between versions of a single logical patchset

- patchset descriptions naturally become part of the changelog that ends
   up in Linus's tree
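For diffing versions, git range-diff (available since Git 2.19) already
compares two commit ranges patch by patch. A minimal sketch, assuming both
versions are patchset merge commits built as in [5] (first parent = base,
second parent = last patch); the function name is hypothetical:

```shell
# Hypothetical helper: compare version OLD of a patchset with version NEW.
# Each range is "base..last patch", taken from the merge's two parents, so
# the two versions may sit on different bases (e.g. after a rebase).
patchset_range_diff () {
    local old=$1 new=$2
    git range-diff "$old^1..$old^2" "$new^1..$new^2"
}
```

With a "Previous-version:" line as written by the script in [5], OLD would be
the SHA1 recorded in NEW's message.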

Disadvantages:

- requires patching git

- requires a bot to continuously create branches for patchsets sent to
   mailing lists

- increased storage/bandwidth for git.kernel.org (?)

- may need a couple of new wrapper scripts to automate patchset
   construction/versioning

Thoughts?


Vegard

PS: Eric Wong described something that comes quite close to this idea, 
but AFAICT without actually recreating commits exactly. I've included 
the link for completeness. [4]


[1]: https://lwn.net/Articles/793037/ "Ryabitsev: Patches carved into developer sigchains"

[2]: https://lwn.net/Articles/799134/ "Defragmenting the kernel development process"

[3]: https://lore.kernel.org/workflows/20190924182536.GC6041@hmswarspite.think-freely.org/

[4]: https://lore.kernel.org/workflows/20191008003931.y4rc2dp64gbhv5ju@dcvr/

[5]: To create this merge commit one could use something like this (bash):

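# usage: patchset BASE [PREVIOUS_VERSION]
# BASE is the parent of the first patch; HEAD is the last patch.
# PREVIOUS_VERSION is the merge commit of an earlier version of the patchset.
# Prints the SHA1 of the new merge commit.
patchset () {
     local start=$1
     local prev=$2

     # construct tentative commit message
     local commit_editmsg
     commit_editmsg="$(git rev-parse --git-dir)/COMMIT_EDITMSG"
     (
         if [ -z "$prev" ]
         then
             # first version: title placeholder plus the list of commits
             echo 'Patchset title'
             echo
             echo Commits:
             echo
             git log --oneline "$start..HEAD"
         else
             # new version: start from the previous version's description
             git show --format=format:%B --no-patch "$prev"
             echo "Previous-version: $(git rev-parse "$prev")"
         fi
     ) > "${commit_editmsg}"

     # $EDITOR is left unquoted so that values like "emacs -nw" work
     ${EDITOR:-vi} "${commit_editmsg}"

     # the merge gets HEAD's tree, so it adds no changes of its own
     local merge
     merge=$(git commit-tree -p "$start" -p HEAD -F "${commit_editmsg}" \
             "$(git rev-parse 'HEAD^{tree}')")
     echo "$merge"
}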

This opens the editor so you can edit the patchset description, then
creates a merge commit that encompasses the patches in the patchset and
prints its SHA1 (use the range sha1^-, e.g. 'git log sha1^-', to view
the patches in it).

Attachments:
- 0001-format-patch-add-complete.patch (text/x-patch, 3744 bytes)
- 0002-mailinfo-collect-commit-metadata-from-mail.patch (text/x-patch, 5778 bytes)
- 0003-am-add-exact.patch (text/x-patch, 5726 bytes)