linux-kernel - Re: [PATCH RFC 1/2] docs: Add documentation generation for Kconfig symbols

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250407104429.45c0ba77@sal.lan>
Date: Mon, 7 Apr 2025 10:47:55 +0800
From: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To: "Nícolas F. R. A. Prado" <nfraprado@...labora.com>
Cc: Jonathan Corbet <corbet@....net>, Masahiro Yamada
 <masahiroy@...nel.org>, Nathan Chancellor <nathan@...nel.org>, Nicolas
 Schier <nicolas.schier@...ux.dev>, kernel@...labora.com,
 linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-kbuild@...r.kernel.org, Mauro Carvalho Chehab <mchehab@...nel.org>
Subject: Re: [PATCH RFC 1/2] docs: Add documentation generation for Kconfig
 symbols

Em Fri, 04 Apr 2025 10:02:52 -0400
Nícolas F. R. A. Prado <nfraprado@...labora.com> escreveu:

> Add the contents of all Kconfig files to the Documentation to both
> increase their visibility and allow for cross-referencing throughout the
> documentation. In order to achieve this:
> * Add a new script 'kconfig2rst' that converts a Kconfig file into a
>   reStructuredText document.
> * Add an extra step to the documentation building that runs the script
>   for every Kconfig in the source tree, generating a documentation page
>   for each one.
> * Add a new "Kconfig symbols" page in the documentation, that is listed
>   on the "Kernel Build System" page, which contains an index of all
>   Kconfig files and their Kconfig symbols, linking to the corresponding
>   pages.
> 
> The generated documentation pages have the config symbols as sections
> with labels that can be referenced from anywhere in the documentation.
> The exceptions are configs that appear multiple times. Those don't get
> labels, as that would generate 'duplicate label' warnings from sphinx.
> To allow this, a list of configs that appear more than once is embedded
> in the kconfig2rst script. When a config appears more than once in the
> same Kconfig file, a count is appended in the section to prevent
> sphinx's auto-labeling to cause the same warning.
> 
> The paths in 'source' directives in the Kconfig files are turned into
> links to the generated documentation page to allow for navigation to
> included Kconfig files.
> 
> Config symbols on 'depends'/'select'/etc lines are prepended by
> 'CONFIG_' to allow them to be cross-referenced by automarkup, though no
> cross-references are created in this commit.

Despite the huge increase on the time to produce documentation, I'm not
sure how worth is to have it, as there are already cross-reference
services doing something somewhat similar, like:

	https://elixir.bootlin.com

Yet, I didn't test this series yet. So, not sure yet about its
value.

Anyway, it follows some comments about the current implementation.

After addressed on a v2, I intend to test and see how it behaves.

> 
> Signed-off-by: Nícolas F. R. A. Prado <nfraprado@...labora.com>
> ---
>  Documentation/.gitignore       |   2 +
>  Documentation/Config/index.rst |  17 +++
>  Documentation/Makefile         |  12 +-
>  Documentation/kbuild/index.rst |   2 +
>  scripts/kconfig2rst.py         | 336 +++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 368 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/.gitignore b/Documentation/.gitignore
> index d6dc7c9b8e25020f1f3b28811df2291c38695d5f..2fc70a398dc874fcb83834cb6337f602c64a070a 100644
> --- a/Documentation/.gitignore
> +++ b/Documentation/.gitignore
> @@ -1,3 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  output
>  *.pyc
> +Config/
> +!Config/index.rst
> diff --git a/Documentation/Config/index.rst b/Documentation/Config/index.rst
> new file mode 100644
> index 0000000000000000000000000000000000000000..2abaa9844dd2a9f57bed0a8d050da3538865b1a5
> --- /dev/null
> +++ b/Documentation/Config/index.rst
> @@ -0,0 +1,17 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============
> +Kconfig symbols
> +===============
> +
> +.. toctree::
> +   :glob:
> +   :maxdepth: 2
> +
> +   *
> +   */*
> +   */*/*
> +   */*/*/*
> +   */*/*/*/*
> +   */*/*/*/*/*
> +   */*/*/*/*/*/*

That sounds weird, hard to maintain and probably slow.

Better to have a Sphinx extension instead, with a decent implementation
of glob. The Python's one is slow, on my tests with the Kernel tree.
I worked on something that worked fine for kernel-doc.py:

	https://lore.kernel.org/linux-doc/12a54f1b8f4afd2e70a87195a2aa34f96d736b77.1740387599.git.mchehab+huawei@kernel.org/

Perhaps this script could import the class from it, once such
series gets merged. It could make sense to split it on a separate file
if we're going to re-use its code.

> diff --git a/Documentation/Makefile b/Documentation/Makefile
> index 63094646df2890a788542a273e4a828a844b2932..74ebc5303b47f0837a9ab31d39b5464af5f17995 100644
> --- a/Documentation/Makefile
> +++ b/Documentation/Makefile
> @@ -115,7 +115,7 @@ $(YNL_INDEX): $(YNL_RST_FILES)
>  $(YNL_RST_DIR)/%.rst: $(YNL_YAML_DIR)/%.yaml $(YNL_TOOL)
>  	$(Q)$(YNL_TOOL) -i $< -o $@
>  
> -htmldocs texinfodocs latexdocs epubdocs xmldocs: $(YNL_INDEX)
> +htmldocs texinfodocs latexdocs epubdocs xmldocs: $(YNL_INDEX) kconfigdocs
>  
>  htmldocs:
>  	@$(srctree)/scripts/sphinx-pre-install --version-check
> @@ -182,9 +182,19 @@ endif # HAVE_SPHINX
>  refcheckdocs:
>  	$(Q)cd $(srctree);scripts/documentation-file-ref-check
>  
> +KCONFIG_DOC_DIR=$(srctree)/Documentation/Config
> +KCONFIGS := $(shell find $(srctree) -name Kconfig -type f)
> +KCONFIGS_RST := $(patsubst %, $(KCONFIG_DOC_DIR)/%.rst, $(KCONFIGS))
> +
> +$(KCONFIGS_RST): $(KCONFIGS)
> +	$(Q)cd $(srctree); $(foreach var,$^,$(shell mkdir -p $(KCONFIG_DOC_DIR)/$(shell dirname $(var)); scripts/kconfig2rst.py $(var) >$(KCONFIG_DOC_DIR)/$(var).rst))
> +
> +kconfigdocs: $(KCONFIGS_RST)
> +
>  cleandocs:
>  	$(Q)rm -f $(YNL_INDEX) $(YNL_RST_FILES)
>  	$(Q)rm -rf $(BUILDDIR)
> +	$(Q)rm -rf $(filter-out %index.rst,$(wildcard $(KCONFIG_DOC_DIR)/*))
>  	$(Q)$(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/userspace-api/media clean
>  
>  dochelp:
> diff --git a/Documentation/kbuild/index.rst b/Documentation/kbuild/index.rst
> index 3731ab22bfe745c5c51963cffe58fb652dadf88c..47a1d9753a9fb7b55b8a7141da8123ca97b15cfb 100644
> --- a/Documentation/kbuild/index.rst
> +++ b/Documentation/kbuild/index.rst
> @@ -15,6 +15,8 @@ Kernel Build System
>      makefiles
>      modules
>  
> +    /Config/index
> +
>      headers_install
>  
>      issues
> diff --git a/scripts/kconfig2rst.py b/scripts/kconfig2rst.py
> new file mode 100755
> index 0000000000000000000000000000000000000000..5af073a1c669ac43c95bb7af00099dcd9473a6ae
> --- /dev/null
> +++ b/scripts/kconfig2rst.py
> @@ -0,0 +1,336 @@
> +#!/usr/bin/env python3
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright 2025 Collabora Ltd
> +
> +import sys
> +import re
> +import os
> +
> +import argparse
> +
> +BASE_PATH_DEFAULT = "Documentation/Config/"
> +CFG_LEN = 60

> +RE_indentation = r"^[ \t]*"
> +in_help_txt = False
> +help_txt = ""

Better to follow Python's standards and keep all constants on uppercase.

I would place the main code inside a class, with the non-const data
inside the class, as it makes the code cleaner and helps using it as
both a Sphinx extension and as a standalone command (which is useful
for testing it).

> +
> +# These configs appear more than once, thus we don't generate labels or xrefs to
> +# them to avoid duplicate label warnings from Sphinx
> +REPEATED_CONFIGS = [
> +    "32BIT",
> +    "4KSTACKS",
> +    "64BIT",
> +    "A",
> +    "ADVANCED_OPTIONS",
> +    "ALPHA_LEGACY_START_ADDRESS",
> +    "ARCH_AIROHA",
> +    "ARCH_ALPINE",
> +    "ARCH_BCM2835",
> +    "ARCH_BCM_IPROC",
> +    "ARCH_BRCMSTB",
> +    "ARCH_DEFAULT_CRASH_DUMP",
> +    "ARCH_FLATMEM_ENABLE",
> +    "ARCH_FORCE_MAX_ORDER",
> +    "ARCH_HAS_ADD_PAGES",
> +    "ARCH_HAS_CACHE_LINE_SIZE",
> +    "ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION",
> +    "ARCH_HAS_ILOG2_U32",
> +    "ARCH_HAS_ILOG2_U64",
> +    "ARCH_HIBERNATION_HEADER",
> +    "ARCH_HIBERNATION_POSSIBLE",
> +    "ARCH_HISI",
> +    "ARCH_MAY_HAVE_PC_FDC",
> +    "ARCH_MEMORY_PROBE",
> +    "ARCH_MMAP_RND_BITS_MAX",
> +    "ARCH_MMAP_RND_BITS_MIN",
> +    "ARCH_MMAP_RND_COMPAT_BITS_MAX",
> +    "ARCH_MMAP_RND_COMPAT_BITS_MIN",
> +    "ARCH_MTD_XIP",
> +    "ARCH_OMAP",
> +    "ARCH_PKEY_BITS",
> +    "ARCH_PROC_KCORE_TEXT",
> +    "ARCH_R9A07G043",
> +    "ARCH_RENESAS",
> +    "ARCH_ROCKCHIP",
> +    "ARCH_SELECT_MEMORY_MODEL",
> +    "ARCH_SELECTS_CRASH_DUMP",
> +    "ARCH_SELECTS_KEXEC_FILE",
> +    "ARCH_SPARSEMEM_DEFAULT",
> +    "ARCH_SPARSEMEM_ENABLE",
> +    "ARCH_SUNXI",
> +    "ARCH_SUPPORTS_CRASH_DUMP",
> +    "ARCH_SUPPORTS_CRASH_HOTPLUG",
> +    "ARCH_SUPPORTS_KEXEC",
> +    "ARCH_SUPPORTS_KEXEC_FILE",
> +    "ARCH_SUPPORTS_KEXEC_JUMP",
> +    "ARCH_SUPPORTS_KEXEC_PURGATORY",
> +    "ARCH_SUPPORTS_KEXEC_SIG",
> +    "ARCH_SUPPORTS_UPROBES",
> +    "ARCH_SUSPEND_POSSIBLE",
> +    "ARCH_UNIPHIER",
> +    "ARCH_VIRT",
> +    "AUDIT_ARCH",
> +    "B",
> +    "BCH_CONST_M",
> +    "BCH_CONST_T",
> +    "BUILTIN_DTB",
> +    "BUILTIN_DTB_NAME",
> +    "C",
> +    "CC_HAVE_STACKPROTECTOR_TLS",
> +    "CHOICE_B",
> +    "CHOICE_C",
> +    "CMDLINE",
> +    "CMDLINE_BOOL",
> +    "CMDLINE_EXTEND",
> +    "CMDLINE_FORCE",
> +    "CMDLINE_FROM_BOOTLOADER",
> +    "CMDLINE_OVERRIDE",
> +    "CMM",
> +    "COMPAT",
> +    "COMPAT_VDSO",
> +    "CORE",
> +    "CORE_BELL_A",
> +    "CORE_BELL_A_ADVANCED",
> +    "CPU_BIG_ENDIAN",
> +    "CPU_HAS_FPU",
> +    "CPU_HAS_PREFETCH",
> +    "CPU_LITTLE_ENDIAN",
> +    "CRYPTO_CHACHA20_NEON",
> +    "CRYPTO_JITTERENTROPY_MEMORY_BLOCKS",
> +    "CRYPTO_JITTERENTROPY_MEMORY_BLOCKSIZE",
> +    "CRYPTO_JITTERENTROPY_OSR",
> +    "CRYPTO_JITTERENTROPY_TESTINTERFACE",
> +    "CRYPTO_NHPOLY1305_NEON",
> +    "DEBUG_ENTRY",
> +    "DMA_NONCOHERENT",
> +    "DMI",
> +    "DRAM_BASE",
> +    "DUMMY",
> +    "DUMMY_CONSOLE",
> +    "EARLY_PRINTK",
> +    "EFI",
> +    "EFI_STUB",
> +    "FIT_IMAGE_FDT_EPM5",
> +    "FIX_EARLYCON_MEM",
> +    "FPU",
> +    "GENERIC_BUG",
> +    "GENERIC_BUG_RELATIVE_POINTERS",
> +    "GENERIC_CALIBRATE_DELAY",
> +    "GENERIC_CSUM",
> +    "GENERIC_HWEIGHT",
> +    "GENERIC_ISA_DMA",
> +    "GENERIC_LOCKBREAK",
> +    "HAS_IOMEM",
> +    "HAVE_SMP",
> +    "HAVE_TCM",
> +    "HEARTBEAT",
> +    "HIGHMEM",
> +    "HOTPLUG_CPU",
> +    "HW_PERF_EVENTS",
> +    "HZ",
> +    "HZ_100",
> +    "HZ_1000",
> +    "HZ_1024",
> +    "HZ_128",
> +    "HZ_250",
> +    "HZ_256",
> +    "ILLEGAL_POINTER_VALUE",
> +    "IRQSTACKS",
> +    "ISA",
> +    "ISA_DMA_API",
> +    "KASAN_SHADOW_OFFSET",
> +    "KERNEL_MODE_NEON",
> +    "KERNEL_START",
> +    "KERNEL_START_BOOL",
> +    "KUSER_HELPERS",
> +    "KVM",
> +    "KVM_GUEST",
> +    "L1_CACHE_SHIFT",
> +    "LEDS_EXPRESSWIRE",
> +    "LOCKDEP_SUPPORT",
> +    "LOWMEM_SIZE",
> +    "LOWMEM_SIZE_BOOL",
> +    "MACH_LOONGSON32",
> +    "MACH_LOONGSON64",
> +    "MACH_TX49XX",
> +    "MAGIC_SYSRQ",
> +    "MATH_EMULATION",
> +    "MCOUNT",
> +    "MMU",
> +    "NODES_SHIFT",
> +    "NO_IOPORT_MAP",
> +    "NR_CPUS",
> +    "NR_CPUS_DEFAULT",
> +    "NR_CPUS_RANGE_END",
> +    "NUMA",
> +    "PAGE_OFFSET",
> +    "PANIC_TIMEOUT",
> +    "PARAVIRT",
> +    "PARAVIRT_SPINLOCKS",
> +    "PARAVIRT_TIME_ACCOUNTING",
> +    "PFAULT",
> +    "PGTABLE_LEVELS",
> +    "PHYSICAL_ALIGN",
> +    "PHYSICAL_START",
> +    "PID_IN_CONTEXTIDR",
> +    "PM",
> +    "POWERPC64_CPU",
> +    "PRINT_STACK_DEPTH",
> +    "RANDOMIZE_BASE",
> +    "RANDOMIZE_BASE_MAX_OFFSET",
> +    "RELOCATABLE",
> +    "SBUS",
> +    "SCHED_CLUSTER",
> +    "SCHED_HRTICK",
> +    "SCHED_MC",
> +    "SCHED_OMIT_FRAME_POINTER",
> +    "SCHED_SMT",
> +    "SERIAL_CONSOLE",
> +    "SMP",
> +    "STACKPROTECTOR_PER_TASK",
> +    "STACKTRACE_SUPPORT",
> +    "SWAP_IO_SPACE",
> +    "SYS_SUPPORTS_APM_EMULATION",
> +    "SYS_SUPPORTS_NUMA",
> +    "SYS_SUPPORTS_SMP",
> +    "TASK_SIZE",
> +    "TASK_SIZE_BOOL",
> +    "TCP_CONG_CUBIC",
> +    "TIME_LOW_RES",
> +    "UNWINDER_FRAME_POINTER",
> +    "UNWINDER_GUESS",
> +    "UNWINDER_ORC",
> +    "USE_OF",
> +    "VMSPLIT_1G",
> +    "VMSPLIT_2G",
> +    "VMSPLIT_3G",
> +    "VMSPLIT_3G_OPT",
> +    "X",
> +    "X86_32",
> +    "X86_64",
> +    "XEN",
> +    "XEN_DOM0",
> +    "XIP_KERNEL",
> +    "XIP_PHYS_ADDR",
> +    "ARCH_BCM",
> +    "VIRTUALIZATION",
> +]

Maintaining this sounds a nightmare, as new (eventually duplicated)
symbols may happen anytime.

The best here sounds to do something similar to what I did with
get_abi.py: parse them all altogether, dynamically detecting
duplication. IMO, it also makes sense to have dereference pages
for such duplicated symbols pointing to all occurrences of them.


> +
> +
> +def print_title(title):
> +    heading = "=" * len(title)
> +    print(heading)
> +    print(title)
> +    print(heading)
> +    print()
> +
> +
> +parser = argparse.ArgumentParser(
> +    prog="kconfig2rst", description="Convert a Kconfig file into ReStructuredText"
> +)
> +
> +parser.add_argument("kconfig", help="Path to input Kconfig file")
> +parser.add_argument(
> +    "--base-doc-path",
> +    default=BASE_PATH_DEFAULT,
> +    help="Base path of generated rST files for usage in 'source' links",
> +)
> +args = parser.parse_args()
> +
> +print_title(args.kconfig)
> +
> +line_accum = ""
> +continued_line = False
> +
> +repeated_config_count = {}
> +
> +with open(args.kconfig) as f:
> +    for il in f:

This won't handle directories. Better to use my glob function.

Also, calling lines as as "il" sounds weird for me. I would just
call it "line".

> +        # If line ends with \, accumulate it and handle full line
> +        if re.search(r"\\\n$", il):

Better to use endswith("\\\n"),, as it is faster. We can also use
removesuffix(), as I guess the minimal Python version is now 3.9.

> +            continued_line = True
> +            line_accum += il[:-2]  # accumulate without backslash and newline
> +            continue
> +
> +        if continued_line:
> +            continued_line = False
> +            l = line_accum + il
> +            line_accum = ""
> +        else:
> +            l = il
> +
> +        if in_help_txt:
> +            if l == "\n":
> +                help_txt += l
> +                continue
> +            if first_line_help_txt:
> +                help_txt_indentation = re.match(RE_indentation, l).group(0).expandtabs()

Please compile all regular expressions, to make it faster.

> +                first_line_help_txt = False
> +            # Consider any line with same or more indentation as part of help text
> +            if (
> +                help_txt_indentation
> +                in re.match(RE_indentation, l).group(0).expandtabs()
> +            ):
> +                help_txt += l
> +                continue
> +            else:
> +                in_help_txt = False
> +                print(help_txt)
> +                help_txt = ""
> +        else:
> +            l = re.sub(r"[*]", r"\*", l)  # Escape asterisks
> +
> +        if re.match(r"^[ \t]*#.*", l):
> +            # Skip comments
> +            continue

I would strip comments first, as I guess Kconfig syntaxe allow to use
comments after any texts, like:

	config SYMBOL # some comment

> +
> +        if re.match(r"^[ \t]*help", l):
> +            in_help_txt = True
> +            first_line_help_txt = True
> +            print("* help::\n")
> +            continue
> +
> +        m = re.match("^[ \t]*(menu)?config (?P<cfgname>[A-Za-z0-9_]+)", l)

Better to accept multiple spaces after config, as it would be valid to
have:

	config    FOO

> +        if m:
> +            section_name = f"\nCONFIG_{m.group('cfgname')}"
> +            underline = f"\n{'='*CFG_LEN}\n"
> +            if m.group("cfgname") in REPEATED_CONFIGS:
> +                repeated_config_count[m.group("cfgname")] = (
> +                    repeated_config_count.get(m.group("cfgname"), 0) + 1
> +                )
> +                if repeated_config_count[m.group("cfgname")] > 1:
> +                    section_name += f"({repeated_config_count[m.group('cfgname')]})"
> +                print(section_name + underline)
> +            else:
> +                print(f"\n.. _CONFIG_{m.group('cfgname')}:\n\n" + section_name + underline)
> +            continue
> +
> +        m = re.match(
> +            r"^[ \t]*(def_bool|def_tristate|depends on|select|range|visible if|imply|default|prompt|bool|tristate|string|hex|int|modules)( \"(.*)\")?(?P<expr> [^\"]*)?",
> +            l,
> +        )

I would place the valid matches on an array and do a join to create the
compiled regex to match them. This would make easier to maintain as 
Kconfig syntax add more notations.

> +        if m:
> +            expr = m.group('expr') if m.group('expr') else ''
> +            not_expr = l
> +            if expr:
> +                expr = re.sub(r'[A-Z0-9_]{2,}', rf" CONFIG_\g<0> ", expr)
> +                not_expr = l[:m.start('expr')]
> +            print("* " + not_expr.lstrip() + expr.rstrip())
> +            continue
> +
> +        m = re.match(r'^[ \t]*source "(.*)"', l)
> +        if m:
> +            # Format Kconfig file paths as Documentation/... so they can be turned
> +            # into links by the automarkup plugin
> +            print(f"\nsource {args.base_doc_path + m.group(1)}.rst\n")
> +            continue
> +
> +        m = re.match(r"[^ \t]*choice|endchoice|comment|menu|endmenu|if|endif", l)

Same here.

> +        if m:
> +            print("\n" + l.strip() + "\n")
> +            continue
> +
> +        print(l.strip())
> +
> +if help_txt:
> +    print(help_txt)  # Flush any pending help text
>