linux-kernel - Re: [RFC 2/2] AI: Add initial set of rules and docs

Message-ID: <202507251341.C933489@keescook>
Date: Fri, 25 Jul 2025 13:53:57 -0700
From: Kees Cook <kees@...nel.org>
To: Sasha Levin <sashal@...nel.org>
Cc: workflows@...r.kernel.org, linux-doc@...r.kernel.org,
	linux-kernel@...r.kernel.org, rostedt@...dmis.org,
	konstantin@...uxfoundation.org, corbet@....net,
	josh@...htriplett.org
Subject: Re: [RFC 2/2] AI: Add initial set of rules and docs

On Fri, Jul 25, 2025 at 01:53:58PM -0400, Sasha Levin wrote:
> Add rules based on our existing documentation.

I'd still like this not in Documentation/, but I obviously defer to Jon.

> Require AI to identify itself in the commit message.
> 
> Signed-off-by: Sasha Levin <sashal@...nel.org>
> ---
>  Documentation/AI/main.md | 70 ++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 68 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/AI/main.md b/Documentation/AI/main.md
> index 959ba50568f57..ca59e52f54445 100644
> --- a/Documentation/AI/main.md
> +++ b/Documentation/AI/main.md
> @@ -1,5 +1,71 @@
>  # Linux Kernel Development AI Instructions
>  
> -This is the Linux kernel repository. When working with this codebase, you must follow the following rules:
> +This is the Linux kernel repository. When working with this codebase, you must follow the Linux kernel development processes and coding standards.
>  
> -- [ TODO ]
> +## Essential Documentation References
> +
> +### Core Development Process
> +- **Documentation/process/howto.rst** - Start here! The comprehensive guide on how to become a Linux kernel developer
> +- **Documentation/process/development-process.rst** - Detailed information on how the kernel development process works
> +- **Documentation/process/submitting-patches.rst** - Essential guide for getting your code into the kernel
> +- **Documentation/process/submit-checklist.rst** - Checklist to review before submitting code

Instead of hard-coded paths, I would recommend just discussing the topic
areas it is expected to find and ingest. :) (e.g. redo the "Key
principles" list you have later to be more specific about the topic
areas, and adjust the prompting to induce the requirement to find and
read each topic.)

> +
> +### Coding Standards and Style
> +- **Documentation/process/coding-style.rst** - Linux kernel coding style (MUST READ)
> +  - Use tabs (8 characters) for indentation
> +  - 80-character line limit preferred
> +  - Specific formatting rules for switch statements, functions, etc.
> +- **Documentation/process/programming-language.rst** - Language requirements and standards
> +
> +### What NOT to Do
> +- **Documentation/process/deprecated.rst** - Deprecated interfaces and features to avoid
> +  - Do not use BUG() or BUG_ON() - use WARN() instead
> +  - Avoid deprecated APIs listed in this document
> +- **Documentation/process/volatile-considered-harmful.rst** - Why volatile is usually wrong

And the reason I want to avoid such specifics is that, even in the
example above, this ends up being hyperspecific. Why summarize
deprecated.rst? Just say "Find and read the notes on deprecated APIs and
language features".

> +
> +### Patch Submission Process
> +- **Documentation/process/5.Posting.rst** - How to post patches properly
> +- **Documentation/process/email-clients.rst** - Email client configuration for patches
> +- **Documentation/process/applying-patches.rst** - How patches are applied
> +
> +### Legal and Licensing
> +- **Documentation/process/license-rules.rst** - Linux kernel licensing rules
> +  - Kernel is GPL-2.0 only with syscall exception
> +  - All files must have proper SPDX license identifiers

The only stuff I think should be in this kind of area is commentary
about how an Agent differs from a human: "You are not a legal entity;
you cannot sign the DCO", which you get into below.

> +
> +### Specialized Topics
> +- **Documentation/process/adding-syscalls.rst** - How to add new system calls
> +- **Documentation/process/stable-kernel-rules.rst** - Rules for stable kernel patches
> +- **Documentation/process/security-bugs.rst** - Handling security issues
> +- **Documentation/process/handling-regressions.rst** - Dealing with regressions
> +
> +### Maintainer Guidelines
> +- **Documentation/process/maintainers.rst** - Working with subsystem maintainers
> +- **Documentation/process/maintainer-handbooks.rst** - Subsystem-specific guidelines
> +
> +## Key Principles
> +1. Read and follow the documentation before making changes
> +2. Respect the existing code style and conventions
> +3. Test thoroughly before submitting
> +4. Write clear, descriptive commit messages
> +5. Never break userspace (the #1 rule)
> +6. Identify yourself as AI in commits (see below)

Everything except #6 is already expected of human devs, so I think you
only need the last item.

> +
> +## AI Attribution Requirement
> +When creating commits, you MUST identify yourself as an AI assistant by including the following tag in the commit message:
> +
> +```
> +Co-developed-by: $AI_NAME $AI_MODEL $AI_VERSION

If we're going to go with Co-developed-by: here, then I think we need to
explicitly say "do not include an email", and we must update
checkpatch.pl to not yell about the missing S-o-b when it finds a C-d-b.
(Perhaps it can skip the check when there is no email address in the
C-d-b line?)
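
A rough illustration of the exemption suggested above, written as a
Python sketch rather than checkpatch.pl's actual Perl. The function name
and the "<user@host>" email pattern are assumptions for illustration
only. It keeps the existing rule that a Co-developed-by: with an email
must be immediately followed by a Signed-off-by: carrying the same
address, and skips C-d-b lines that have no email at all (the proposed
Agent-attribution case).

```python
import re

# Assumed shape of an email in a trailer: "<user@host>".
EMAIL_RE = re.compile(r"<[^<>@\s]+@[^<>\s]+>")


def missing_signoffs(commit_message: str) -> list[str]:
    """Return Co-developed-by values lacking a matching Signed-off-by.

    Hypothetical helper: a Co-developed-by with no email address is
    treated as Agent attribution and is exempt from the S-o-b check.
    """
    lines = [line.strip() for line in commit_message.splitlines()]
    problems = []
    for i, line in enumerate(lines):
        if not line.startswith("Co-developed-by:"):
            continue
        value = line[len("Co-developed-by:"):].strip()
        email = EMAIL_RE.search(value)
        if email is None:
            # Proposed exemption: no email, so no S-o-b is expected.
            continue
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        # A human co-author's S-o-b must come right after the C-d-b
        # and carry the same email address.
        if not (nxt.startswith("Signed-off-by:") and email.group(0) in nxt):
            problems.append(value)
    return problems
```

With this logic, "Co-developed-by: Claude claude-3-opus-20240229"
followed directly by the human's Signed-off-by: produces no complaint,
while a human C-d-b with a missing or mismatched S-o-b still does.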

> +```
> +
> +For example:
> +- `Co-developed-by: Claude claude-3-opus-20240229`
> +- `Co-developed-by: GitHub-Copilot GPT-4 v1.0.0`
> +- `Co-developed-by: Cursor gpt-4-turbo-2024-04-09`
> +
> +This transparency helps maintainers and reviewers understand that AI was involved in the development process.
> +
> +### Signed-off-by Restrictions
> +AI assistants MUST NOT add a Signed-off-by tag pointing to themselves. The Signed-off-by tag represents a legal certification by a human developer that they have the right to submit the code under the open source license. 

Hello trailing whitespace my old friend.

"Unless explicitly told otherwise, Agents must never have trailing
whitespace on any line and all files must have a final newline
character." :)
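
The whitespace rule suggested above is easy to verify mechanically. A
minimal Python sketch, assuming "whitespace" means spaces and tabs; the
function name is hypothetical and this is only an illustration, not a
replacement for checkpatch.pl:

```python
def whitespace_problems(text: str) -> list[str]:
    """Report trailing spaces/tabs per line and a missing final newline."""
    problems = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        # Only spaces and tabs count as trailing whitespace here.
        if line != line.rstrip(" \t"):
            problems.append(f"line {lineno}: trailing whitespace")
    # A non-empty file must end with a newline character.
    if text and not text.endswith("\n"):
        problems.append("missing final newline")
    return problems
```

For example, the quoted Signed-off-by Restrictions paragraph above would
be flagged for its trailing space.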

> +
> +Only the human user running the AI assistant should add their Signed-off-by tag to commits. The AI's contribution is acknowledged through the Co-developed-by tag as described above.

And can we please not use the term "AI"? I think "Agent" is the better
generic term as it could include other things?

-Kees

-- 
Kees Cook