Message-ID: <20250912100645.15c79351@foz.lan>
Date: Fri, 12 Sep 2025 10:06:45 +0200
From: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To: Jonathan Corbet <corbet@....net>
Cc: Jani Nikula <jani.nikula@...ux.intel.com>, Linux Doc Mailing List
 <linux-doc@...r.kernel.org>, Björn Roy Baron
 <bjorn3_gh@...tonmail.com>, Alex Gaynor <alex.gaynor@...il.com>, Alice Ryhl
 <aliceryhl@...gle.com>, Boqun Feng <boqun.feng@...il.com>, Gary Guo
 <gary@...yguo.net>, Trevor Gross <tmgross@...ch.edu>,
 linux-kernel@...r.kernel.org, rust-for-linux@...r.kernel.org
Subject: Re: [PATCH v4 08/19] tools/docs: sphinx-build-wrapper: add a
 wrapper for sphinx-build

On Thu, 11 Sep 2025 13:47:54 -0600
Jonathan Corbet <corbet@....net> wrote:

> Jani Nikula <jani.nikula@...ux.intel.com> writes:
> 
> > On Thu, 11 Sep 2025, Jonathan Corbet <corbet@....net> wrote:  
> >> A couple of times I have looked into using intersphinx, making each book
> >> into an actually separate book.  The thing I always run into is that
> >> doing a complete docs build, with working references, would require
> >> building everything twice.  This is probably worth another attempt one
> >> of these years...  

There are a couple of different use-case scenarios for building docs.

1) The first and most important one is to produce book(s) for people
   to use. This is usually done by some automation, and the result is
   published in places like:
	- https://docs.kernel.org/

   and in subsystem-specific places like:
	- https://linuxtv.org/downloads/v4l-dvb-apis-new/

For scenario (1), taking twice the time to build is not an issue, as
nobody will be sitting in a chair waiting for the build to finish.

In this scenario, SPHINXDIRS is important for subsystem-specific docs.
For instance, for media, we use SPHINXDIRS to pick parts of 3 different
books:

	- Documentation/admin-guide/media/
	- Documentation/driver-api/media/
	- Documentation/userspace-api/media/

What the media automation does, once per day, is:

	# Non-essential parts of index.rst dropped
	cat <<END >Documentation/media/index.rst
	================================
	Linux Kernel Media Documentation
	================================

	.. toctree::

	        admin-guide/index
	        driver-api/index
	        userspace-api/index
	END

	rsync -uAXEHlaSx -W --inplace --delete Documentation/admin-guide/media/ Documentation/media/admin-guide
	rsync -uAXEHlaSx -W --inplace --delete Documentation/driver-api/media/ Documentation/media/driver-api
	rsync -uAXEHlaSx -W --inplace --delete Documentation/userspace-api/media/ Documentation/media/userspace-api

	make SPHINXDIRS='media' CSS='$CSS' DOCS_THEME='$DOCS_THEME' htmldocs
	make SPHINXDIRS='media' pdfdocs
	make SPHINXDIRS='media' epubdocs

2) CI tests. Here, taking more time usually is not a problem, except
   when CI is used before pushing stuff, and the developer has to wait
   for it to finish before pushing.

Still, for scenario (2), a build time increase is problematic: if the
build now takes twice the time, such a change will require twice the
resources, which may increase costs.

3) Developers who touched docs. They want a way to quickly build and
   verify the output for their changes.

Here, any time increase is problematic, and SPHINXDIRS plays an important
role by allowing them to build only the touched documents.

For instance, when I was developing the Netlink YAML plugin, I had to
run, dozens of times:

	make SPHINXDIRS=Documentation/netlink/specs/ htmldocs

If I had to build the entire documentation every time, the development
time would increase from days to weeks.

Looking at these three scenarios, the only one where intersphinx is
useful is (1).

From my PoV, we should support intersphinx, but it should be optional.
Also, one has to specify where intersphinx will resolve otherwise
unresolved symbols. So, we would need something like:

	make SPHINXREFMAP=intersphinx_mapping.py htmldocs

where intersphinx_mapping.py would be a file containing the intersphinx
configuration. We would add a default map at Documentation/, while
letting it be overridden if some subsystem has different requirements,
uses a different CSS template, or does not use alabaster.
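For illustration only (the file name comes from the example above; the
entries are assumptions on my side), such a file could simply follow the
standard Sphinx intersphinx format, mapping a book name to its published
base URL plus an optional local objects.inv inventory (None means "fetch
objects.inv from the base URL"):

```python
# Hypothetical contents of Documentation/intersphinx_mapping.py.
# Standard Sphinx intersphinx format: name -> (base URL, inventory).
intersphinx_mapping = {
    "admin-guide": ("https://docs.kernel.org/admin-guide/", None),
    "driver-api": ("https://docs.kernel.org/driver-api/", None),
    "userspace-api": ("https://docs.kernel.org/userspace-api/", None),
}
```

The wrapper would then only need to exec() or import the file pointed to
by SPHINXREFMAP and merge the dict into the Sphinx configuration.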

> > I think the main factor in that should be whether it makes sense from
> > overall documentation standpoint, not the technical details.

Agreed.

> > Having several books might make sense. It might even be helpful in
> > organizing the documentation by audiences. But having the granularity of
> > SPHINXDIRS with that would be overkill. 

On the contrary. SPHINXDIRS granularity is very important for scenario (3).

> > And there needs to be a book to
> > bring them together, and link to the other books, acting as the landing
> > page.  
> 
> Well, I think that the number of existing directories needs to be
> reduced rather further.  I made progress in that direction by coalescing
> all the arch docs under Documentation/arch/.  I would like to do
> something similar with all the device-specific docs, creating
> Documentation/devices/.  Then we start to get to a reasonable number of
> books.

I don't think reducing the number of books should be the goal. Instead,
the goal should be a clear and coherent organization, focused on the
audience that will actually use them.

After the reorg, we may have fewer books. That's fine. But it is also
fine if we end up with more books.

I lost the battle years ago, but I still believe that, at least for
some subsystems like media, i2c, DRM, security and others, a 
subsystem-specific book could be better. After all, the audience for
such subsystems is very specialized.

> > I believe it should be possible to generate the intersphinx inventory
> > without generating the full html or pdf documentation. So I don't think
> > it's actually two complete docs builds. It might speed things up to have
> > a number of independent documentation builds.  
> 
> That's a good point, I hadn't looked into that part.  The builder phase
> takes a lot of the time, if that could be cut out things would go
> faster. 

Indeed, but we need to double-check that .doctree cache expiration will
happen the right way for all books affected by a partial build.

During this merge window, I sent an RFC patch in the middle of a comment
with some conf.py logic to detect Sphinx cache expiration. I remember I
added a comment asking whether we should upstream it or not, but, as
nobody answered, I ended up forgetting about it.

If we're willing to experiment with that, I recommend looking at that
patch and adding a variant of it, enabled via V=1 or via some debug
parameter.

The goal would be to check that a change to a file causes cache
expiration, and thus a rebuild, for all books using it.
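As a rough sketch of what such a debug check could look like (this is an
assumption on my side, not the actual RFC patch: function name and
directory layout are hypothetical), one can compare each .rst source's
mtime against its cached .doctree and report the ones Sphinx should be
rebuilding:

```python
import os

def stale_doctrees(srcdir, doctreedir):
    """Sketch only: list .rst sources whose cached .doctree is
    missing or older than the source, i.e. where the Sphinx cache
    should expire on the next build."""
    stale = []
    for root, _dirs, files in os.walk(srcdir):
        for name in files:
            if not name.endswith(".rst"):
                continue
            src = os.path.join(root, name)
            rel = os.path.relpath(src, srcdir)
            cached = os.path.join(doctreedir,
                                  rel[:-len(".rst")] + ".doctree")
            if (not os.path.exists(cached)
                    or os.path.getmtime(cached) < os.path.getmtime(src)):
                stale.append(rel)
    return sorted(stale)
```

Printing that list when V=1 is set would make it easy to see, after a
partial SPHINXDIRS build, which sources each book still considers out of
date.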

> > As to the working references, IIUC partial builds with SPHINXDIRS
> > doesn't get that part right if there are references outside of the
> > designated dirs, leading to warnings.  
> 
> That is true.  My point though is that, to get the references right with
> a *full* build, a two-pass approach is needed though, as you suggest,
> perhaps the first pass could be faster.

How fast? During development, SPHINXDIRS means a couple of seconds:

	$ make clean; time make SPHINXDIRS="peci" htmldocs
	...
	real    0m1,373s
	user    0m1,348s

Even more complex builds, picking more than one book, like this:

	$ make clean; time make SPHINXDIRS="driver-api/media/ userspace-api/media/" htmldocs
	...
	real    0m11,801s
	user    0m31,381s
	sys     0m6,880s

still stay in the seconds range. Can the intersphinx first pass have a
similar build time?

Thanks,
Mauro
