Message-ID: <db93c591-8a1e-45a4-a33e-a0578054a8cf@lucifer.local>
Date: Mon, 10 Nov 2025 11:15:33 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Dave Hansen <dave.hansen@...ux.intel.com>
Cc: linux-kernel@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
Dan Williams <dan.j.williams@...el.com>, Theodore Ts'o <tytso@....edu>,
Sasha Levin <sashal@...nel.org>, Jonathan Corbet <corbet@....net>,
Kees Cook <kees@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Miguel Ojeda <ojeda@...nel.org>, Shuah Khan <shuah@...nel.org>,
Christian Brauner <brauner@...nel.org>,
Vlastimil Babka <vbabka@...e.cz>,
"workflows@...r.kernel.org" <workflows@...r.kernel.org>,
"ksummit@...ts.linux.dev" <ksummit@...ts.linux.dev>,
Dan Carpenter <dan.carpenter@...aro.org>,
Julia Lawall <julia.lawall@...ia.fr>,
James Bottomley <James.Bottomley@...senpartnership.com>,
Mark Brown <broonie@...nel.org>,
"Paul E. McKenney" <paulmck@...nel.org>,
Jiri Kosina <kosina@...il.com>
Subject: Re: [PATCH] [v2] Documentation: Provide guidelines for
tool-generated content
+cc potentially interested parties.
Apologies if I missed anybody, just scanned through quickly.
On Mon, Nov 10, 2025 at 10:48:04AM +0000, Lorenzo Stoakes wrote:
> I think it would have been helpful to ping those engaged in the discussion in
> this area in related threads, e.g. [0] and [1].
>
> [0]: https://lore.kernel.org/ksummit/49f1a974-e1e6-4be5-864e-5e0f905e1a8f@paulmck-laptop/T/#m30873ef3dc9bd2c4c95547e81efff3085474f2d9
> [1]: https://lore.kernel.org/all/7e7f485e-93ad-4bc4-9323-f154ce477c39@lucifer.local/
>
> I'm not sure what the process was that led to this, but it feels rather as if
> the community were excluded here.
>
> It also seems slightly odd to produce this in advance of the maintainers'
> summit, as I felt there was some agreement that the topic should be discussed
> there?
>
> Obviously there may be very good reasons for this but it'd be good for them to
> be clarified and those who engaged in these discussions to be cc'd also (or at
> least ping on threads linking!)
>
> On Wed, Nov 05, 2025 at 03:15:14PM -0800, Dave Hansen wrote:
> > In the last few years, the capabilities of coding tools have exploded.
> > As those capabilities have expanded, contributors and maintainers have
> > more and more questions about how and when to apply those
> > capabilities.
> >
> > The shiny new AI tools (chatbots, coding assistants and more) are
> > impressive. Add new Documentation to guide contributors on how to
> > best use kernel development tools, new and old.
>
> As others have pointed out, this is strangely gleeful, can we please drop it?
>
> As mentioned in the msummit thread I have a great concern about how the press
> might report on this kind of change, as I fear that a 'kernel accepts AI
> patches' story might result in a large influx of AI patches from enthusiastic
> people which will have a direct impact on maintainer workload.
>
> I don't think comments like this help in that respect.
>
> In general I feel that a more restrictive/pessimistic document that can later be
> made less pessimistic/restrictive is a better approach than a broad one on this
> basis.
>
> >
> > Note, though, there are fundamentally no new or unique rules in this
> > new document. It clarifies expectations that the kernel community has
>
> Hmm, I'm not sure the conflation of pre-existing tooling which always required
> some degree of understanding vs. a technique which can simply generate entire
> patch sets with commentary included is justified.
>
> While I _do_ like the idea that basic principles that already existed still
> exist for LLMs (that's a powerful notion), I wonder if we do in fact need
> some new rules here.
>
> I think saying this also pushes back on the concept of maintainer-by-maintainer
> policy as 'it's just like it always was' doesn't suggest that it warrants a
> higher level of scrutiny.
>
> > had for many years. For example, researchers are already asked to
> > disclose the tools they use to find issues in
> > Documentation/process/researcher-guidelines.rst. This new document
> > just reiterates existing best practices for development tooling.
>
> Ironically that document is considerably more strident and firm than this
> one :)
>
> >
> > In short: Please show your work and make sure your contribution is
> > easy to review.
>
> I wonder whether we need to be very explicit in stating - please do not
> generate patches in large volume with no involvement from you and
> _emphasise_ that human involvement is _necessary_.
>
> In discussion with kernel colleagues who use AI extensively, there is a
> very clear pattern that a key part of usefully making use of this tooling
> is for there to be an 'expert in the loop' who reviews what is generated to
> ensure it is correct.
>
> I therefore think we either _should_ have a specific rule for LLM-generated
> content or should (and it really makes sense actually) have a broad
> 'generated content' rule that - you _must_ have a thorough understanding of
> what you are doing such that you can review and filter the generated
> output.
>
> I think stating that we will NOT accept series that are generated without
> understanding would be very beneficial in all respects, rather than leaving
> it somehow implied.
>
> Being soft or vague here is likely to cause maintainer headaches IMO
> (though of course only so many people will read a doc, being able to point
> at the document in reply as a maintainer is useful too).
>
> >
> > Signed-off-by: Dave Hansen <dave.hansen@...ux.intel.com>
> > Cc: Steven Rostedt <rostedt@...dmis.org>
> > Cc: Dan Williams <dan.j.williams@...el.com>
> > Cc: Theodore Ts'o <tytso@....edu>
> > Cc: Sasha Levin <sashal@...nel.org>
> > Cc: Jonathan Corbet <corbet@....net>
> > Cc: Kees Cook <kees@...nel.org>
> > Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
> > Cc: Miguel Ojeda <ojeda@...nel.org>
> > Cc: Shuah Khan <shuah@...nel.org>
> >
> > --
> >
> > This document was a collaborative effort from all the members of
> > the TAB. I just reformatted it into .rst and wrote the changelog.
> >
> > Changes from v1:
> > * Rename to generated-content.rst and add to documentation index.
> > (Jon)
> > * Rework subject to align with the new filename
> > * Replace commercial names with generic ones. (Jon)
> > * Be consistent about punctuation at the end of bullets for whole
> > sentences. (Miguel)
> > * Formatting sprucing up and minor typos (Miguel)
> > ---
> > Documentation/process/generated-content.rst | 94 +++++++++++++++++++++
> > Documentation/process/index.rst | 1 +
> > 2 files changed, 95 insertions(+)
> > create mode 100644 Documentation/process/generated-content.rst
> >
> > diff --git a/Documentation/process/generated-content.rst b/Documentation/process/generated-content.rst
> > new file mode 100644
> > index 0000000000000..5e8ff44190932
> > --- /dev/null
> > +++ b/Documentation/process/generated-content.rst
> > @@ -0,0 +1,94 @@
> > +============================================
> > +Kernel Guidelines for Tool Generated Content
> > +============================================
> > +
> > +Purpose
> > +=======
> > +
> > +Kernel contributors have been using tooling to generate contributions
> > +for a long time. These tools are constantly becoming more capable and
> > +undoubtedly improve developer productivity. At the same time, reviewer
> > +and maintainer bandwidth is a very scarce resource. Understanding
>
> This is absolutely the key issue here imo, maintainer bandwidth. Glad this
> is in the opener.
>
> > +which portions of a contribution come from humans versus tools is
> > +critical to maintain those resources and keep kernel development
> > +healthy.
>
> Agreed entirely.
>
> > +
> > +The goal here is to clarify community expectations around tools. This
> > +lets everyone become more productive while also maintaining high
> > +degrees of trust between submitters and reviewers.
>
> Also very good.
>
> > +
> > +Out of Scope
> > +============
> > +
> > +These guidelines do not apply to tools that make trivial tweaks to
> > +preexisting content. Nor do they pertain to AI tooling that helps with
> > +menial tasks. Some examples:
> > +
> > + - Spelling and grammar fix ups, like rephrasing to imperative voice
> > + - Typing aids like identifier completion, common boilerplate or
> > + trivial pattern completion
> > + - Purely mechanical transformations like variable renaming
> > + - Reformatting, like running Lindent, ``clang-format`` or
> > + ``rust-fmt``
> > +
> > +Even if your tool use is out of scope you should still always consider
> > +if it would help reviewing your contribution if the reviewer knows
> > +about the tool that you used.
>
> This is great, I agree very much that we have to be reasonable about these
> uses.
>
> The final sentence is also great.
>
> > +
> > +In Scope
> > +========
> > +
> > +These guidelines apply when a meaningful amount of content in a kernel
> > +contribution was not written by a person in the Signed-off-by chain,
> > +but was instead created by a tool.
>
> Yes, perhaps useful actually using the term 'meaningful amount' rather than
> trying to be absolutely explicit about what this entails.
>
> Also allows for maintainer discretion.
>
> > +
> > +Detection of a problem is also a part of the development process; if a
> > +tool was used to find a problem addressed by a change, that should be
> > +noted in the changelog. This not only gives credit where it is due, it
> > +also helps fellow developers find out about these tools.
> > +
> > +Some examples:
> > + - Any tool-suggested fix such as ``checkpatch.pl --fix``
> > + - Coccinelle scripts
> > + - A chatbot generated a new function in your patch to sort list entries.
> > + - A .c file in the patch was originally generated by a LLM but cleaned
> > + up by hand.
> > + - The changelog was generated by handing the patch to a generative AI
> > + tool and asking it to write the changelog.
> > + - The changelog was translated from another language.
> > +
> > +If in doubt, choose transparency and assume these guidelines apply to
> > +your contribution.
>
> Yes agreed.
>
> > +
> > +Guidelines
> > +==========
> > +
> > +First, read the Developer's Certificate of Origin:
> > +Documentation/process/submitting-patches.rst . Its rules are simple
> > +and have been in place for a long time. They have covered many
> > +tool-generated contributions.
> > +
> > +Second, when making a contribution, be transparent about the origin of
> > +content in cover letters and changelogs. You can be more transparent
> > +by adding information like this:
> > +
> > + - What tools were used?
> > + - The input to the tools you used, like the coccinelle source script.
>
> Not sure repeatedly using coccinelle as an example is helpful, as
> coccinelle is far less of an issue than LLM tooling, perhaps for the
> avoidance of doubt, expand this to include references to that?
>
> > + - If code was largely generated from a single or short set of
> > + prompts, include those prompts in the commit log. For longer
> > + sessions, include a summary of the prompts and the nature of
> > + resulting assistance.
>
> Maybe worth saying send it in a cover letter if a series, but perhaps
> pedantic.
>
> > + - Which portions of the content were affected by that tool?
> > +
> > +As with all contributions, individual maintainers have discretion to
> > +choose how they handle the contribution. For example, they might:
> > +
> > + - Treat it just like any other contribution
> > + - Reject it outright
> > + - Review the contribution with extra scrutiny
> > + - Suggest a better prompt instead of suggesting specific code changes
> > + - Ask for some other special steps, like asking the contributor to
> > + elaborate on how the tool or model was trained
> > + - Ask the submitter to explain in more detail about the contribution
> > + so that the maintainer can feel comfortable that the submitter fully
> > + understands how the code works.
>
> OK I wrote something suggesting you add this and you already have :) that's
> great. Let me go delete that request :)
>
> However I'm not sure the 'as with all contributions' is right though - as a
> maintainer in mm I don't actually feel that we can reject outright without
> having to give significant explanation as to why.
>
> And I think that's often the case - people (rightly) dislike blanket NAKs
> and it's a terrible practice, which often (also rightly) gets pushback from
> co-maintainers or others in the community.
>
> So I think perhaps it'd also be useful to very explicitly say that
> maintainers may say no summarily in instances where the review load of
> handling large, clearly-AI-generated and clearly-unfiltered series would
> simply be too much.
>
> Another point to raise perhaps is that - even in the cases where the
> submitter is carefully reviewing generated output - submitters must be
> reasonable in terms of the volume they submit. This is perhaps hand-wavy
> but mentioning it would be great, not least for the ability for maintainers
> to point at the doc and reference it.
>
> > diff --git a/Documentation/process/index.rst b/Documentation/process/index.rst
> > index aa12f26601949..e1a8a31389f53 100644
> > --- a/Documentation/process/index.rst
> > +++ b/Documentation/process/index.rst
> > @@ -68,6 +68,7 @@ beyond).
> > stable-kernel-rules
> > management-style
> > researcher-guidelines
> > + generated-content
> >
> > Dealing with bugs
> > -----------------
>
> I guess this is a WIP?
>
> > --
> > 2.34.1
> >
> >
>
> Thanks, Lorenzo