linux-kernel - Re: [RFC 2/2] AI: Add initial set of rules and docs

Message-ID: <aIQA8oizbtK4zSTL@lappy>
Date: Fri, 25 Jul 2025 18:10:58 -0400
From: Sasha Levin <sashal@...nel.org>
To: Kees Cook <kees@...nel.org>
Cc: workflows@...r.kernel.org, linux-doc@...r.kernel.org,
	linux-kernel@...r.kernel.org, rostedt@...dmis.org,
	konstantin@...uxfoundation.org, corbet@....net,
	josh@...htriplett.org
Subject: Re: [RFC 2/2] AI: Add initial set of rules and docs

On Fri, Jul 25, 2025 at 01:53:57PM -0700, Kees Cook wrote:
>On Fri, Jul 25, 2025 at 01:53:58PM -0400, Sasha Levin wrote:
>> Add rules based on our existing documentation.
>
>I'd still like this not in Documentation/, but I obviously defer to Jon.
>
>> Require AI to identify itself in the commit message.
>>
>> Signed-off-by: Sasha Levin <sashal@...nel.org>
>> ---
>>  Documentation/AI/main.md | 70 ++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 68 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/AI/main.md b/Documentation/AI/main.md
>> index 959ba50568f57..ca59e52f54445 100644
>> --- a/Documentation/AI/main.md
>> +++ b/Documentation/AI/main.md
>> @@ -1,5 +1,71 @@
>>  # Linux Kernel Development AI Instructions
>>
>> -This is the Linux kernel repository. When working with this codebase, you must follow the following rules:
>> +This is the Linux kernel repository. When working with this codebase, you must follow the Linux kernel development processes and coding standards.
>>
>> -- [ TODO ]
>> +## Essential Documentation References
>> +
>> +### Core Development Process
>> +- **Documentation/process/howto.rst** - Start here! The comprehensive guide on how to become a Linux kernel developer
>> +- **Documentation/process/development-process.rst** - Detailed information on how the kernel development process works
>> +- **Documentation/process/submitting-patches.rst** - Essential guide for getting your code into the kernel
>> +- **Documentation/process/submit-checklist.rst** - Checklist to review before submitting code
>
>Instead of hard-coded paths, I would recommend just discussing the topic
>areas it is expected to find and ingest. :) (e.g. redo the "Key
>principles" list you have later to be more specific about the topic
>areas and adjust the prompting to induce the requirement to find and
>read each topic.)

I'm very open to changing these parts. Ideally we can rewrite them in a
way that is easy for the agent to process, rather than optimizing for
human readability.

>> +
>> +### Coding Standards and Style
>> +- **Documentation/process/coding-style.rst** - Linux kernel coding style (MUST READ)
>> +  - Use tabs (8 characters) for indentation
>> +  - 80-character line limit preferred
>> +  - Specific formatting rules for switch statements, functions, etc.
>> +- **Documentation/process/programming-language.rst** - Language requirements and standards
>> +
>> +### What NOT to Do
>> +- **Documentation/process/deprecated.rst** - Deprecated interfaces and features to avoid
>> +  - Do not use BUG() or BUG_ON() - use WARN() instead
>> +  - Avoid deprecated APIs listed in this document
>> +- **Documentation/process/volatile-considered-harmful.rst** - Why volatile is usually wrong
>
>And the reason I want to avoid such specifics is that even as an example
>above, this ends up being hyperspecific. Why summarize the
>deprecated.rst? Just say "Find and read the notes on deprecated APIs and
>language features"

When we're explicit with rules, the agent is less likely to ignore them
(and go "whoops I messed up!" later).

It's a balance we need to find, but I suspect we can fine-tune it once
we see how various agents respond to the rules.

>> +### Patch Submission Process
>> +- **Documentation/process/5.Posting.rst** - How to post patches properly
>> +- **Documentation/process/email-clients.rst** - Email client configuration for patches
>> +- **Documentation/process/applying-patches.rst** - How patches are applied
>> +
>> +### Legal and Licensing
>> +- **Documentation/process/license-rules.rst** - Linux kernel licensing rules
>> +  - Kernel is GPL-2.0 only with syscall exception
>> +  - All files must have proper SPDX license identifiers
>
>The only stuff I think should be in this kind of area is a commentary
>about how an Agent differs from a human. "You are not a legal entity;
>you cannot sign the DCO", which you get into below.

I was thinking that if we explicitly call out the GPL requirement, an
agent will avoid searching online resources and embedding code that is
not licensed under the GPL.
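
To make the SPDX rule quoted above concrete, here is a minimal sketch in
Python of a check that a file carries an identifier. It assumes the tag
sits on one of the first two lines (the second line covering scripts that
start with a shebang), and spdx_expression() is a hypothetical helper, not
an existing kernel tool. It only extracts the expression; it does not
validate it against the licenses the kernel accepts.

```python
import re

# Assumption: the tag looks like "// SPDX-License-Identifier: GPL-2.0" or
# "/* SPDX-License-Identifier: GPL-2.0 */" and sits in the first two lines.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*(?:\*/)?\s*$")


def spdx_expression(text):
    """Return the SPDX expression from the first two lines, or None."""
    for line in text.splitlines()[:2]:
        match = SPDX_RE.search(line)
        if match:
            return match.group(1)
    return None
```

A caller could flag any new file for which spdx_expression() returns None
before the patch is ever posted.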

>> +### Specialized Topics
>> +- **Documentation/process/adding-syscalls.rst** - How to add new system calls
>> +- **Documentation/process/stable-kernel-rules.rst** - Rules for stable kernel patches
>> +- **Documentation/process/security-bugs.rst** - Handling security issues
>> +- **Documentation/process/handling-regressions.rst** - Dealing with regressions
>> +
>> +### Maintainer Guidelines
>> +- **Documentation/process/maintainers.rst** - Working with subsystem maintainers
>> +- **Documentation/process/maintainer-handbooks.rst** - Subsystem-specific guidelines
>> +
>> +## Key Principles
>> +1. Read and follow the documentation before making changes
>> +2. Respect the existing code style and conventions
>> +3. Test thoroughly before submitting
>> +4. Write clear, descriptive commit messages
>> +5. Never break userspace (the #1 rule)
>> +6. Identify yourself as AI in commits (see below)
>
>Everything except #6 is already expected of human devs, so I think just
>the last item.
>
>> +
>> +## AI Attribution Requirement
>> +When creating commits, you MUST identify yourself as an AI assistant by including the following tag in the commit message:
>> +
>> +```
>> +Co-developed-by: $AI_NAME $AI_MODEL $AI_VERSION
>
>If we're going to go with Co-developed-by: here, then I think we need to
>explicitly say "do not include an email", and we must update
>checkpatch.pl to not yell about the missing S-o-b when it finds a C-d-b.
>(Perhaps it can skip the check when there is no email address in the
>C-d-b line?)
>
>> +```
>> +
>> +For example:
>> +- `Co-developed-by: Claude claude-3-opus-20240229`
>> +- `Co-developed-by: GitHub-Copilot GPT-4 v1.0.0`
>> +- `Co-developed-by: Cursor gpt-4-turbo-2024-04-09`
>> +
>> +This transparency helps maintainers and reviewers understand that AI was involved in the development process.
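
As a rough illustration of the checkpatch.pl idea raised above, the sketch
below (in Python rather than Perl, purely for readability) skips the
"Co-developed-by must be followed by Signed-off-by" check when the
Co-developed-by line carries no email address. This is not how
checkpatch.pl behaves today; check_trailers() and the example.org
identities are hypothetical, and the rule itself is only the proposal from
this thread.

```python
import re

CDB_RE = re.compile(r"^Co-developed-by:\s*(.*)$")
EMAIL_RE = re.compile(r"<[^<>\s]+@[^<>\s]+>")


def check_trailers(message):
    """Return a list of problems found in a commit message's tags."""
    lines = message.splitlines()
    problems = []
    for index, line in enumerate(lines):
        match = CDB_RE.match(line)
        if not match:
            continue
        identity = match.group(1).strip()
        if not EMAIL_RE.search(identity):
            # Proposed rule: no address means an agent, so no S-o-b is expected.
            continue
        next_line = lines[index + 1].strip() if index + 1 < len(lines) else ""
        if next_line != f"Signed-off-by: {identity}":
            problems.append(f"Co-developed-by: {identity} lacks a matching Signed-off-by")
    # A human submitter still has to sign off on the commit.
    if not any(line.startswith("Signed-off-by:") for line in lines):
        problems.append("no Signed-off-by from a human submitter")
    return problems


agent_commit = (
    "subsys: fix thing\n\n"
    "Co-developed-by: Claude claude-3-opus-20240229\n"
    "Signed-off-by: Jane Dev <jane@example.org>\n"
)
print(check_trailers(agent_commit))
```

With this rule an agent line without an address passes, while a human
Co-developed-by line still needs its own Signed-off-by right after it.
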
>> +
>> +### Signed-off-by Restrictions
>> +AI assistants MUST NOT add a Signed-off-by tag pointing to themselves. The Signed-off-by tag represents a legal certification by a human developer that they have the right to submit the code under the open source license.
>
>Hello trailing whitespace my old friend.
>
>"Unless explicitly told otherwise, Agents must never have trailing
>whitespace on any line and all files must have a final newline
>character." :)
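
The rule quoted above is easy to check mechanically. A minimal sketch in
Python, assuming the file content is already available as a string;
whitespace_problems() is a hypothetical helper and not part of any
existing kernel script.

```python
def whitespace_problems(text):
    """List trailing-whitespace and missing-final-newline problems."""
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        # A stray carriage return at end of line is also reported here.
        if line != line.rstrip():
            problems.append(f"line {number}: trailing whitespace")
    if text and not text.endswith("\n"):
        problems.append("missing final newline")
    return problems
```

An agent could run something like this over every file it touched before
creating the commit.
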
>
>> +
>> +Only the human user running the AI assistant should add their Signed-off-by tag to commits. The AI's contribution is acknowledged through the Co-developed-by tag as described above.
>
>And can we please not use the term "AI"? I think "Agent" is the better
>generic term as it could include other things?

Ack

-- 
Thanks,
Sasha