linux-kernel - Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag

Message-ID: <20250724210609.GV11202@pendragon.ideasonboard.com>
Date: Fri, 25 Jul 2025 00:06:09 +0300
From: Laurent Pinchart <laurent.pinchart@...asonboard.com>
To: Kees Cook <kees@...nel.org>
Cc: Konstantin Ryabitsev <konstantin@...uxfoundation.org>,
	linux@...blig.org, corbet@....net, workflows@...r.kernel.org,
	josh@...htriplett.org, linux-doc@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag

On Thu, Jul 24, 2025 at 01:45:35PM -0700, Kees Cook wrote:
> On Thu, Jul 24, 2025 at 03:07:17PM -0400, Konstantin Ryabitsev wrote:
> > On Thu, Jul 24, 2025 at 06:54:39PM +0100, linux@...blig.org wrote:
> > > From: "Dr. David Alan Gilbert" <linux@...blig.org>
> > > 
> > > It seems right to require that code which is automatically
> > > generated is disclosed in the commit message.
> > 
> > I'm not sure that's the case. There is a lot of automatically generated
> > content being added to the kernel all the time -- such as auto-formatted code,
> > documentation, and unit tests generated by non-AI tooling. We've not required
> > indicating this usage before, so I'm not sure it makes sense to start doing it
> > now.
> > 
> > Furthermore, merely indicating the tool doesn't really say anything about how
> > it was used (e.g. what version, what prompt, what context, etc.) If anything,
> > this information needs to live in the cover letter of the submission. I would
> > suggest we investigate encouraging contributors to disclose this there, e.g.:
> > 
> > | ---
> > | This patch series was partially generated with "InsensitiveClod o4 Hokus"
> > | and then heavily modified to remove the parts where it went completely off
> > | the deep end.
> > 
> > I am also not opposed to having a more standard cover letter footer that would
> > allow an easier way to query this information via public-inbox services, e.g.:
> > 
> > | generated-by: insensitive clod o4 hokus
> > 
> > However, I don't really think this belongs in the commit trailers.

I think there's often value in having the information in individual
patches instead of (or in addition to) the cover letter, as it's
common for different patches in a series to be generated differently.
Standardizing on one option or the other may be overkill at this point
though. Especially when it comes to code generated by LLMs, how (and
whether) to report that information should be governed by the issues we
want to address, and I don't think there's a consensus on those yet.

One issue that is often mentioned is copyright infringement. We go to
great lengths today to ensure that code is fit for inclusion in the
kernel from a legal point of view, with the certificate of origin and
the SoB line. It would then seem to make sense to also report, per
commit, whether code was generated by an LLM if we want to extend the
copyright paper trail (for whatever purpose it will be used later).

> I agree; I'm not sure I see a benefit in creating a regularized trailer
> for this. What automation/tracking is going to key off of it?

We may find/invent use cases for automation later, in which case we can
revisit usage of a standardized trailer. However, I already see an
important manual use case for the information: knowing how a patch was
created helps reviewers. If I'm told a patch was generated by Coccinelle
(especially if the semantic patch is included in the commit message
too), I will pay attention to different types of mistakes than for a
manually written patch.

> It's
> a detail of patch creation methodology, so the commentary about how
> something was created is best put in the prose areas, like we already
> do for Coccinelle or other scripts. It's a bit buried in the Researcher
> Guidelines[1], but we have explicitly asked for details about tooling:
> 
>   When sending patches produced from research, the commit logs should
>   contain at least the following details, so that developers have
>   appropriate context for understanding the contribution.
>   ...
>   Specifically include details about any testing, static or dynamic
>   analysis programs, and any other tools or methods used to perform the
>   work.
> 
> Maybe that needs to be repeated in SubmittingPatches?
> 
> -Kees
> 
> [1] https://docs.kernel.org/process/researcher-guidelines.html

-- 
Regards,

Laurent Pinchart
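
For illustration only: the "generated-by:" cover letter footer suggested
earlier in the thread is not an established kernel convention, but if
something like it were adopted, querying it could look roughly like the
sketch below. The footer name is taken from the example quoted above; the
function name and the decision to ignore quoted lines are assumptions made
for this sketch.

```python
import re

# Hypothetical footer: "generated-by:" is only the example name used in
# the thread, not an agreed-upon tag. The pattern is anchored at the start
# of a line, so quoted copies ("> generated-by: ...") are ignored.
FOOTER_RE = re.compile(
    r"^generated-by:[ \t]*(\S.*?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_generated_by(message_body: str) -> list[str]:
    """Return the tool descriptions from unquoted generated-by footers."""
    return FOOTER_RE.findall(message_body)


if __name__ == "__main__":
    cover_letter = (
        "This series reworks the foo driver.\n"
        "---\n"
        "generated-by: insensitive clod o4 hokus\n"
    )
    print(find_generated_by(cover_letter))
```

A free-form value is kept as a single string here because, as noted in the
thread, there is no consensus yet on what details (version, prompt, context)
such a line should carry.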