linux-kernel - Re: [PATCH] [v2] Documentation: Provide guidelines for tool-generated content

Message-ID: <11eaf7fa-27d0-4a57-abf0-5f24c918966c@lucifer.local>
Date: Mon, 10 Nov 2025 10:48:04 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Dave Hansen <dave.hansen@...ux.intel.com>
Cc: linux-kernel@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
        Dan Williams <dan.j.williams@...el.com>, Theodore Ts'o <tytso@....edu>,
        Sasha Levin <sashal@...nel.org>, Jonathan Corbet <corbet@....net>,
        Kees Cook <kees@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Miguel Ojeda <ojeda@...nel.org>, Shuah Khan <shuah@...nel.org>
Subject: Re: [PATCH] [v2] Documentation: Provide guidelines for
 tool-generated content

I think it would have been helpful to ping those engaged in the discussion in
this area in related threads, e.g. [0] and [1].

[0]: https://lore.kernel.org/ksummit/49f1a974-e1e6-4be5-864e-5e0f905e1a8f@paulmck-laptop/T/#m30873ef3dc9bd2c4c95547e81efff3085474f2d9
[1]: https://lore.kernel.org/all/7e7f485e-93ad-4bc4-9323-f154ce477c39@lucifer.local/

I'm not sure what the process was that led to this, but it feels rather as if
the community were excluded here.

It also seems slightly odd to produce this in advance of the maintainers'
summit, as I felt there was some agreement that the topic should be discussed
there?

Obviously there may be very good reasons for this, but it'd be good for them to
be clarified and for those who engaged in these discussions to be cc'd also (or
at least pinged on those threads with a link!)

On Wed, Nov 05, 2025 at 03:15:14PM -0800, Dave Hansen wrote:
> In the last few years, the capabilities of coding tools have exploded.
> As those capabilities have expanded, contributors and maintainers have
> more and more questions about how and when to apply those
> capabilities.
>
> The shiny new AI tools (chatbots, coding assistants and more) are
> impressive.  Add new Documentation to guide contributors on how to
> best use kernel development tools, new and old.

As others have pointed out, this is strangely gleeful. Can we please drop it?

As mentioned in the msummit thread I have a great concern about how the press
might report on this kind of change, as I fear that a 'kernel accepts AI
patches' story might result in a large influx of AI patches from enthusiastic
people which will have a direct impact on maintainer workload.

I don't think comments like this help in that respect.

In general I feel that a more restrictive/pessimistic document that can later be
made less pessimistic/restrictive is a better approach than a broad one on this
basis.

>
> Note, though, there are fundamentally no new or unique rules in this
> new document. It clarifies expectations that the kernel community has

Hmm, I'm not sure the conflation of pre-existing tooling which always required
some degree of understanding vs. a technique which can simply generate entire
patch sets with commentary included is justified.

While I _do_ like the idea that basic principles that already existed still
exist for LLMs (that's a powerful notion), I wonder if we do in fact need
some new rules here.

I think saying this also pushes back on the concept of maintainer-by-maintainer
policy as 'it's just like it always was' doesn't suggest that it warrants a
higher level of scrutiny.

> had for many years. For example, researchers are already asked to
> disclose the tools they use to find issues in
> Documentation/process/researcher-guidelines.rst. This new document
> just reiterates existing best practices for development tooling.

Ironically that document is considerably more strident and firm than this
one :)

>
> In short: Please show your work and make sure your contribution is
> easy to review.

I wonder whether we need to be very explicit in stating - please do not
generate patches in large volume with no involvement from you and
_emphasise_ that human involvement is _necessary_.

In discussion with kernel colleagues who use AI extensively, there is a
very clear pattern that a key part of usefully making use of this tooling
is for there to be an 'expert in the loop' who reviews what is generated to
ensure it is correct.

I therefore think we either _should_ have a specific rule for LLM-generated
content or should (and it really makes sense actually) have a broad
'generated content' rule that - you _must_ have a thorough understanding of
what you are doing such that you can review and filter the generated
output.

I think stating that we will NOT accept series that are generated without
understanding would be very beneficial in all respects, rather than leaving
it somehow implied.

Being soft or vague here is likely to cause maintainer headaches IMO
(though of course there are only so many who will read a doc etc., being
able to point at the document in reply as a maintainer is useful too).

>
> Signed-off-by: Dave Hansen <dave.hansen@...ux.intel.com>
> Cc: Steven Rostedt <rostedt@...dmis.org>
> Cc: Dan Williams <dan.j.williams@...el.com>
> Cc: Theodore Ts'o <tytso@....edu>
> Cc: Sasha Levin <sashal@...nel.org>
> Cc: Jonathan Corbet <corbet@....net>
> Cc: Kees Cook <kees@...nel.org>
> Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
> Cc: Miguel Ojeda <ojeda@...nel.org>
> Cc: Shuah Khan <shuah@...nel.org>
>
> --
>
> This document was a collaborative effort from all the members of
> the TAB. I just reformatted it into .rst and wrote the changelog.
>
> Changes from v1:
>  * Rename to generated-content.rst and add to documentation index.
>    (Jon)
>  * Rework subject to align with the new filename
>  * Replace commercial names with generic ones. (Jon)
>  * Be consistent about punctuation at the end of bullets for whole
>    sentences. (Miguel)
>  * Formatting sprucing up and minor typos (Miguel)
> ---
>  Documentation/process/generated-content.rst | 94 +++++++++++++++++++++
>  Documentation/process/index.rst             |  1 +
>  2 files changed, 95 insertions(+)
>  create mode 100644 Documentation/process/generated-content.rst
>
> diff --git a/Documentation/process/generated-content.rst b/Documentation/process/generated-content.rst
> new file mode 100644
> index 0000000000000..5e8ff44190932
> --- /dev/null
> +++ b/Documentation/process/generated-content.rst
> @@ -0,0 +1,94 @@
> +============================================
> +Kernel Guidelines for Tool Generated Content
> +============================================
> +
> +Purpose
> +=======
> +
> +Kernel contributors have been using tooling to generate contributions
> +for a long time. These tools are constantly becoming more capable and
> +undoubtedly improve developer productivity. At the same time, reviewer
> +and maintainer bandwidth is a very scarce resource. Understanding

This is absolutely the key issue here imo, maintainer bandwidth. Glad this
is in the opener.

> +which portions of a contribution come from humans versus tools is
> +critical to maintain those resources and keep kernel development
> +healthy.

Agreed entirely.

> +
> +The goal here is to clarify community expectations around tools. This
> +lets everyone become more productive while also maintaining high
> +degrees of trust between submitters and reviewers.

Also very good.

> +
> +Out of Scope
> +============
> +
> +These guidelines do not apply to tools that make trivial tweaks to
> +preexisting content. Nor do they pertain to AI tooling that helps with
> +menial tasks. Some examples:
> +
> + - Spelling and grammar fix ups, like rephrasing to imperative voice
> + - Typing aids like identifier completion, common boilerplate or
> +   trivial pattern completion
> + - Purely mechanical transformations like variable renaming
> + - Reformatting, like running Lindent, ``clang-format`` or
> +   ``rust-fmt``
> +
> +Even if your tool use is out of scope you should still always consider
> +if it would help reviewing your contribution if the reviewer knows
> +about the tool that you used.

This is great, I agree very much that we have to be reasonable about these
uses.

The final sentence is also great.

> +
> +In Scope
> +========
> +
> +These guidelines apply when a meaningful amount of content in a kernel
> +contribution was not written by a person in the Signed-off-by chain,
> +but was instead created by a tool.

Yes, it is perhaps actually useful to use the term 'meaningful amount' rather
than trying to be absolutely explicit about what this entails.

Also allows for maintainer discretion.

> +
> +Detection of a problem is also a part of the development process; if a
> +tool was used to find a problem addressed by a change, that should be
> +noted in the changelog. This not only gives credit where it is due, it
> +also helps fellow developers find out about these tools.
> +
> +Some examples:
> + - Any tool-suggested fix such as ``checkpatch.pl --fix``
> + - Coccinelle scripts
> + - A chatbot generated a new function in your patch to sort list entries.
> + - A .c file in the patch was originally generated by a LLM but cleaned
> +   up by hand.
> + - The changelog was generated by handing the patch to a generative AI
> +   tool and asking it to write the changelog.
> + - The changelog was translated from another language.
> +
> +If in doubt, choose transparency and assume these guidelines apply to
> +your contribution.

Yes agreed.

> +
> +Guidelines
> +==========
> +
> +First, read the Developer's Certificate of Origin:
> +Documentation/process/submitting-patches.rst . Its rules are simple
> +and have been in place for a long time. They have covered many
> +tool-generated contributions.
> +
> +Second, when making a contribution, be transparent about the origin of
> +content in cover letters and changelogs. You can be more transparent
> +by adding information like this:
> +
> + - What tools were used?
> + - The input to the tools you used, like the coccinelle source script.

Not sure repeatedly using coccinelle as an example is helpful, as
coccinelle is far less of an issue than LLM tooling, perhaps for the
avoidance of doubt, expand this to include references to that?

> + - If code was largely generated from a single or short set of
> +   prompts, include those prompts in the commit log. For longer
> +   sessions, include a summary of the prompts and the nature of
> +   resulting assistance.

Maybe worth saying send it in a cover letter if a series, but perhaps
pedantic.

> + - Which portions of the content were affected by that tool?
> +
> +As with all contributions, individual maintainers have discretion to
> +choose how they handle the contribution. For example, they might:
> +
> + - Treat it just like any other contribution
> + - Reject it outright
> + - Review the contribution with extra scrutiny
> + - Suggest a better prompt instead of suggesting specific code changes
> + - Ask for some other special steps, like asking the contributor to
> +   elaborate on how the tool or model was trained
> + - Ask the submitter to explain in more detail about the contribution
> +   so that the maintainer can feel comfortable that the submitter fully
> +   understands how the code works.

OK I wrote something suggesting you add this and you already have :) that's
great. Let me go delete that request :)

However, I'm not sure the 'as with all contributions' is right - as a
maintainer in mm I don't actually feel that we can reject outright without
having to give significant explanation as to why.

And I think that's often the case - people (rightly) dislike blanket NAKs
and it's a terrible practice, which often (also rightly) gets pushback from
co-maintainers or others in the community.

So I think perhaps it'd also be useful to very explicitly say that
maintainers may say no summarily in instances where the review load of
handling large, clearly-AI-generated and clearly-unfiltered series would
simply be too much.

Another point to raise perhaps is that - even in the cases where the
submitter is carefully reviewing generated output - submitters must be
reasonable in terms of the volume they submit. This is perhaps hand-wavy,
but mentioning it would be great, not least so that maintainers are able
to point at the doc and reference it.

> diff --git a/Documentation/process/index.rst b/Documentation/process/index.rst
> index aa12f26601949..e1a8a31389f53 100644
> --- a/Documentation/process/index.rst
> +++ b/Documentation/process/index.rst
> @@ -68,6 +68,7 @@ beyond).
>     stable-kernel-rules
>     management-style
>     researcher-guidelines
> +   generated-content
>
>  Dealing with bugs
>  -----------------

I guess this is a WIP?

> --
> 2.34.1
>
>

Thanks, Lorenzo