linux-kernel - Re: [PATCH] [v2] Documentation: Provide guidelines for tool-generated content

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c8d9f4fc-332f-4df8-9620-e0e2aa6dc0e9@lucifer.local>
Date: Mon, 10 Nov 2025 16:51:20 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>, linux-kernel@...r.kernel.org,
        Steven Rostedt <rostedt@...dmis.org>,
        Dan Williams <dan.j.williams@...el.com>, Theodore Ts'o <tytso@....edu>,
        Sasha Levin <sashal@...nel.org>, Jonathan Corbet <corbet@....net>,
        Kees Cook <kees@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Miguel Ojeda <ojeda@...nel.org>, Shuah Khan <shuah@...nel.org>,
        Christian Brauner <brauner@...nel.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        "workflows@...r.kernel.org" <workflows@...r.kernel.org>,
        "ksummit@...ts.linux.dev" <ksummit@...ts.linux.dev>,
        Dan Carpenter <dan.carpenter@...aro.org>,
        Julia Lawall <julia.lawall@...ia.fr>,
        James Bottomley <James.Bottomley@...senpartnership.com>,
        Mark Brown <broonie@...nel.org>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Jiri Kosina <kosina@...il.com>
Subject: Re: [PATCH] [v2] Documentation: Provide guidelines for
 tool-generated content

+re-cc various - email is a pain!

On Mon, Nov 10, 2025 at 08:35:12AM -0800, Dave Hansen wrote:
> On 11/10/25 02:48, Lorenzo Stoakes wrote:
> ...
> > It also seems slightly odd to produce this in advance of the maintainer's
> > summit, as I felt there was some agreement that the topic should be discussed
> > there?
>
> The TAB discussions have been ongoing and this document was mostly put
> together before the ksummit thread even launched off. This patch just
> suffered from being put on the back burner.

Ah that's useful context!

I mean I (obviously) feel this document is very necessary/useful so it's
nice that this was already an ongoing thing and we're all aligned anyway.

>
> > I think stating that we will NOT accept series that are generated without
> > understanding would be very beneficial in all respects, rather than leaving
> > it somehow implied.
>
> I actually don't think that's a tooling-specific requirement.
>
> If you're posting a series, you should understand it. I've seen quite a
> few cases where folks will pick up someone else's work, forward port it,
> and post it again without a clear understanding of the series.
>
> "Understand and be able to defend what you contribute" is certainly a
> good rule. It's also concise enough to have this document touch in it.
>
> Would that suffice?

That's great thanks. And yes absolutely it applies to everything, but
obviously LLMs are a realm where a person's capacity to exceed their
understanding is amplified.

>
> >> +Guidelines
> >> +==========
> >> +
> >> +First, read the Developer's Certificate of Origin:
> >> +Documentation/process/submitting-patches.rst . Its rules are simple
> >> +and have been in place for a long time. They have covered many
> >> +tool-generated contributions.
> >> +
> >> +Second, when making a contribution, be transparent about the origin of
> >> +content in cover letters and changelogs. You can be more transparent
> >> +by adding information like this:
> >> +
> >> + - What tools were used?
> >> + - The input to the tools you used, like the coccinelle source script.
> >
> > Not sure repeatedly using coccinelle as an example is helpful, as
> > coccinelle is far less of an issue than LLM tooling, perhaps for the
> > avoidance of doubt, expand this to include references to that?
> >
> >> + - If code was largely generated from a single or short set of
> >> +   prompts, include those prompts in the commit log. For longer
> >> +   sessions, include a summary of the prompts and the nature of
> >> +   resulting assistance.
> >
> > Maybe worth saying send it in a cover letter if a series, but perhaps
> > pedantic.
>
> Do we have a good short term that means "commit logs or cover letter"?
> "Changelogs" maybe? But, yeah, we don't want people reading this and
> avoiding putting stuff in cover letters.

Yeah it's maybe not worth specifying to be honest, it might just add
confusion.

>
> >> + - Which portions of the content were affected by that tool?
> >> +
> >> +As with all contributions, individual maintainers have discretion to
> >> +choose how they handle the contribution. For example, they might:
> >> +
> >> + - Treat it just like any other contribution
> >> + - Reject it outright
> >> + - Review the contribution with extra scrutiny
> >> + - Suggest a better prompt instead of suggesting specific code changes
> >> + - Ask for some other special steps, like asking the contributor to
> >> +   elaborate on how the tool or model was trained
> >> + - Ask the submitter to explain in more detail about the contribution
> >> +   so that the maintainer can feel comfortable that the submitter fully
> >> +   understands how the code works.
> >
> > OK I wrote something suggesting you add this and you already have :) that's
> > great. Let me go delete that request :)
> >
> > However I'm not sure the 'as with all contributions' is right though - as a
> > maintainer in mm I don't actually feel that we can reject outright without
> > having to give significant explanation as to why.
> >
> > And I think that's often the case - people (rightly) dislike blanket NAKs
> > and it's a terrible practice, which often (also rightly) gets pushback from
> > co-maintainers or others in the community.
> >
> > So I think perhaps it'd also be useful to very explicitly say that
> > maintainers may say no summarily in instances where the review load would
> > simply be too much to handle large clearly-AI-generated and
> > clearly-unfiltered series.
> >
> > Another point to raise perhaps is that - even in the cases where the
> > submitter is carefully reviewing generated output - that submitters must be
> > reasonable in terms of the volume they submit. This is perhaps hand wavey
> > but mentioning it would be great not least for the ability for maintainers
> > to point at the doc and reference it.
>
> How about we expand this bullet a bit?
>
> - Review the contribution with extra scrutiny
>
> to
>
> - Treat the contribution specially like reviewing with extra scrutiny,
>   or at a lower priority than human-generated content.
>
> That's a good match for the "Treat it just like any other contribution"
> bullet. Maintainers can either treat it normally _or_ specially.

That sounds good.

I do still think an explicit point about volume is important though, at
least to underline it.

Something like:

	Please be wary of the volume of submitted patches - sending an
	unreasonable number of generated patches is more likely to result
	in maintainers rejecting them or deprioritising review.

Perhaps?

>
> >> diff --git a/Documentation/process/index.rst b/Documentation/process/index.rst
> >> index aa12f26601949..e1a8a31389f53 100644
> >> --- a/Documentation/process/index.rst
> >> +++ b/Documentation/process/index.rst
> >> @@ -68,6 +68,7 @@ beyond).
> >>     stable-kernel-rules
> >>     management-style
> >>     researcher-guidelines
> >> +   generated-content
> >>
> >>  Dealing with bugs
> >>  -----------------
> >
> > I guess this is a WIP?

Cheers, Lorenzo