linux-kernel - Re: [GIT PULL] virtio: last minute fixup

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20220511125140.ormw47yluv4btiey@meerkat.local>
Date:   Wed, 11 May 2022 08:51:40 -0400
From:   Konstantin Ryabitsev <konstantin@...uxfoundation.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Nathan Chancellor <nathan@...nel.org>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        KVM list <kvm@...r.kernel.org>,
        virtualization@...ts.linux-foundation.org,
        Netdev <netdev@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        mie@...l.co.jp
Subject: Re: [GIT PULL] virtio: last minute fixup

On Tue, May 10, 2022 at 04:50:47PM -0700, Linus Torvalds wrote:
> > For what it's worth, as someone who is frequently tracking down and
> > reporting issues, a link to the mailing list post in the commit message
> > makes it much easier to get these reports into the right hands, as the
> > original posting is going to have all relevant parties in one location
> > and it will usually have all the context necessary to triage the
> > problem.
> 
> Honestly, I think such a thing would be trivial to automate with
> something like just a patch-id lookup, rather than a "Link:".

I'm not sure that's quite reliable, and I'm speaking from experience of
running git-patchwork-bot, which attempts to match commits to patches.
Patch-id has these important disadvantages:

1. git-patch-id can be fragile: if the maintainer changes things like add
   curly braces, rename a variable, or edit a code comment for clarity, the
   patch-id stops matching. This happens routinely with git-patchwork-bot,
   and patchwork uses an even laxer algorithm than git-patch-id. In fact, I
   had to hack git-patchwork-bot to fall back on Link: tags to match by
   message-id to address some of the maintainers' complaints.

2. git-patch-id doesn't include author/date/commit message: which can actually
   be important for establishing provenance and attribution and can confuse
   automation. E.g. an author submits a patch as part of a large series, gets
   told to break it apart, then submits it as part of a different series.
   Automated processes trying to match commits to submissions won't be able to
   tell from which series the commit came from.

Cregit folks (cregit.linuxsources.org) have encountered all of these and I
know from talking to them that they are quite happy to have a way to match
commit provenance to the exact messages in the archives.

> Wouldn't it be cool if you had some webby interface to just go from
> commit SHA1 to patch ID to a lore.kernel.org lookup of where said
> patch was done?

Yes, it's https://cregit.linuxsources.org/ and it's... okay. :) It certainly
doesn't manage to match all commits to patches despite having access to all of
lore.kernel.org archives.

> My argument here really is that "find where this commit was posted" is
> 
>  (a) not generally the most interesting thing
> 
>  (b) doesn't even need that "Link:" line.
> 
> but what *is* interesting, and where the "Link:" line is very useful,
> is finding where the original problem that *caused* that patch to be
> posted in the first place.

I think the disconnect here is that you're approaching this from the
perspective of a human being, while what many want is a dumb and reliable way
to match commits to ML submissions, which will allow improving unattended
automation.

> So that whole "searching is often an option" is true for pretty much
> _any_ Link:, but I think that for the whole "original submission" it's
> so mindless and can be automated that it really doesn't add much real
> value at all.

Believe me, I've tried, and I really, really like having a fool-proof way to
match commits directly to the exact ML submissions. :( Even a 99%-reliable
fuzzy matching algorithm has enough of a failure rate that causes maintainers
to get annoyed -- I have many "git-patchwork-bot missed this commit"
complaints in the queue to prove this.

I think we should simply disambiguate the trailer added by tooling like b4.
Instead of using Link:, it can go back to using Message-Id, which is already
standard with git -- it's trivial for git.kernel.org to link them to
lore.kernel.org.

Before:

    Signed-off-by: Main Tainer <main.tainer@...ux.dev>
    Link: https://lore.kernel.org/r/CAHk-=wgAk3NEJ2PHtb0jXzCUOGytiHLq=rzjkFKfpiuH-SROgA@mail.gmail.com

After:

    Signed-off-by: Main Tainer <main.tainer@...ux.dev>
    Message-Id: <CAHk-=wgAk3NEJ2PHtb0jXzCUOGytiHLq=rzjkFKfpiuH-SROgA@...l.gmail.com>

This would allow people to still use Link: for things like linking to actual
ML discussions. I know this pollutes commits a bit, but I would argue that
this is a worthwhile trade-off that allows us to improve our automation and
better scale maintainers.

-K