linux-kernel - Re: Follow-up on Linux-kernel code accessibility

Message-ID: <4ea21f17-7531-4aa1-a162-4d7cd7645e72@paulmck-laptop>
Date: Thu, 8 Jan 2026 17:40:46 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Gabriele Paoloni <gpaoloni@...hat.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Kate Stewart <kstewart@...uxfoundation.org>,
	Chuck Wolber <chuckwolber@...il.com>,
	"Julia.Lawall@...ia.fr" <Julia.Lawall@...ia.fr>,
	Dmitry Vyukov <dvyukov@...gle.com>,
	Mark Rutland <mark.rutland@....com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Shuah Khan <skhan@...uxfoundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: Follow-up on Linux-kernel code accessibility

On Tue, Jan 06, 2026 at 06:05:28PM +0000, Lorenzo Stoakes wrote:
> Sorry been on leave!

No problem!  At least one of us has priorities straight.  ;-)

> Sorry to fork the thread but going to take me a while to catch up with it.
> 
> On Thu, Dec 18, 2025 at 11:49:21AM -0800, Paul E. McKenney wrote:
> > Hello!
> >
> > Just following up on some Linux Plumbers Conference discussions on the
> > accessibility of Linux-kernel code to people ranging from novices to
> > the developers and maintainers of the code in question.  I am adding
> > Lorenzo on CC not because he was involved with these discussions (at
> > least as far as I know), but rather because I am using some of his work
> > in my follow-up analysis.
> >
> > The Linux kernel's mm system weighs in at about 200KLoC, and Lorenzo
> > wrote a book on its design that weighs in at about 1300 pages, or
> > about 150 LoC/page.  This suggests that the Linux-kernel scheduler,
> > which weighs in at about 70KLoC and has similar heuristics/workload
> > challenges as does mm, would require a 430-page textbook to provide a
> > similar level of design detail.  By this methodology, RCU would require
> > "only" 190 pages, presumably substituting its unfamiliarity for sched's
> > and mm's deeply heuristic and workload-dependent nature.
> 
> Well - keep in mind my book explicitly and intentionally excludes a _great
> deal_ of topics (simply because I didn't have the time or capacity to cover
> more), and even when exploring the code, I made liberal use of 'X is out of
> scope' here to a. make it readable without being distracted constantly, and
> b. again for time/capacity reasons.
> 
> And of course, I focused on only one architecture for anything
> arch-specific (x86-64) with similar excuses^Wreasoning so there's that as a
> multiplier too.
> 
> Overall I suspect what I cover is really only 10% of mm, as well or not
> otherwise as I did.
> 
> So I'd x10 the LoC there ;)

Fair enough.  And if you have covered 10% of mm, you are probably ahead
of the coverage of Documentation/RCU/Design.  ;-)
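
For reference, the page-count estimate discussed above can be reproduced
with a few lines of Python.  This is only a rough sketch: the function
name and the coverage parameter are illustrative assumptions, and the
10% figure is Lorenzo's own guess, read here as "the book's 1300 pages
describe about a tenth of mm's 200KLoC".  Plain division of the quoted
numbers gives about 154 LoC/page and about 455 pages for the scheduler,
a bit above the 430 quoted, so all of these are order-of-magnitude
figures at best.

```python
def estimate_pages(target_loc, ref_loc, ref_pages, coverage=1.0):
    """Scale a reference book's page count to another subsystem.

    coverage is the (assumed) fraction of the reference code base that
    the reference book actually describes.
    """
    covered_loc = ref_loc * coverage
    return round(target_loc * ref_pages / covered_loc)


# Figures quoted in the thread (all approximate).
MM_LOC = 200_000      # mm subsystem
MM_PAGES = 1300       # length of the mm book
SCHED_LOC = 70_000    # scheduler

# Book assumed to cover all of mm.
print(estimate_pages(SCHED_LOC, MM_LOC, MM_PAGES))

# Book assumed to cover only about 10% of mm.
print(estimate_pages(SCHED_LOC, MM_LOC, MM_PAGES, coverage=0.10))
```

Under the 10% assumption the estimate grows by a factor of ten, which
only strengthens the point that comments alone cannot carry this much
design detail.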

> > Sadly, this data does not support the hypothesis that we can create
> > comments that will provide understanding to people taking random dives
> > into the Linux kernel's source code.  In contrast to code that is closely
> > associated with a specific type of mechanical device, Linux-kernel
> > code requires the reader to possess a great deal of abstract and global
> > conceptual/workload information.
> >
> > This is not to say that the Linux kernel's internal documentation
> > (including its comments) cannot or should not be improved.
> > They clearly should.  It instead means that a necessary part of any
> > instant-understanding methodology for the Linux kernel includes active
> > software assistance, for example, Anthropic's Claude LLM or IBM's (rather
> > older but less readily accessible) Analysis and Renovation Catalyst (ARC).
> > I am not denigrating other options, but rather restricting myself to
> > tools with which I have personal experience.
> 
> In my view AI is useful in the hands of an expert who can determine when it
> tells the truth or not.
> 
> So you have a catch-22 there that's unresolvable by such tooling in my
> opinion, and developers relying on that from the start are likely to not
> have the right mental muscles exercised in my opinion.
> 
> I think there's definitely a place for AI, but I feel like this is not
> it. And I think we'd do people a disservice by suggesting it.

Is AI really worse than a randomly selected human-generated comment in
the Linux kernel?  Especially one of the older comments?  I agree that
the best possible human-generated comment will likely beat the current
crop of LLMs, but we are definitely not oversupplied with anything
resembling best possible comments here.  ;-)

> A big idea in my book is to get people familiar with the concepts and the
> code presented together so they can end up reading the code and
> understanding it on the basis of the book having tied the two together and
> shown 'hey it's not so bad you can extract meaning from this!'
> 
> That way, they can take the inevitably out of date contents and update to
> the latest kernel with the skills developed (and of course many of the
> concepts will remain valid).

Makes sense to me!

							Thanx, Paul

> > And one reason for continued but reasonable emphasis on internal
> > documentation, including comments, is that the aforementioned tools
> > ingest that documentation.  ;-)
> >
> > Thoughts?
> >
> > And in the meantime, happy holidays for those celebrating them!
> >
> > 							Thanx, Paul
> 
> Cheers, Lorenzo