linux-kernel - Re: [PATCH v2 09/39] scripts/kernel-doc.py: add a Python parser

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <87v7sy29rh.fsf@trenco.lwn.net>
Date: Mon, 24 Feb 2025 16:38:58 -0700
From: Jonathan Corbet <corbet@....net>
To: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>, Linux Doc Mailing
 List <linux-doc@...r.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>, "Gustavo A. R. Silva"
 <mchehab+huawei@...nel.org>, Mauro Carvalho Chehab
 <mchehab+huawei@...nel.org>, Kees Cook <mchehab+huawei@...nel.org>,
 linux-hardening@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 09/39] scripts/kernel-doc.py: add a Python parser

Mauro Carvalho Chehab <mchehab+huawei@...nel.org> writes:

> Maintaining kernel-doc has been a challenge, as there aren't many
> Perl developers among maintainers. Also, the logic there is too
> complex. Having lots of global variables and using plain functions
> doesn't help.
>
> Rewrite the script in Python, placing most global variables
> inside classes. This should help with maintaining the script in the
> long term.

[...]

> diff --git a/scripts/kernel-doc.py b/scripts/kernel-doc.py
> new file mode 100755
> index 000000000000..5cf5ed63f215
> --- /dev/null
> +++ b/scripts/kernel-doc.py
> @@ -0,0 +1,2757 @@
> +#!/usr/bin/env python3
> +# pylint: disable=R0902,R0903,R0904,R0911,R0912,R0913,R0914,R0915,R0917,R1702
> +# pylint: disable=C0302,C0103,C0301
> +# pylint: disable=C0116,C0115,W0511,W0613
> +# Copyright(c) 2025: Mauro Carvalho Chehab <mchehab@...nel.org>.
> +# SPDX-License-Identifier: GPL-2.0

The SPDX tag is supposed to be up top, right under the shebang line.
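The kernel's license rules (Documentation/process/license-rules.rst) place the SPDX identifier on the first line that can hold a comment. For a script, line 1 is the `#!` interpreter line, so the identifier goes on line 2. The sketch below illustrates this with a hypothetical checker. `spdx_placement_ok` is not an existing kernel tool, and both headers are abbreviated.

```python
# Hypothetical helper: checks that a script's SPDX tag is on the first line
# that can hold a comment (line 2 when line 1 is a "#!" shebang).
SPDX_PREFIX = "# SPDX-License-Identifier:"


def spdx_placement_ok(source: str) -> bool:
    lines = source.splitlines()
    # Skip the shebang, which must stay on line 1 for the script to run.
    idx = 1 if lines and lines[0].startswith("#!") else 0
    return idx < len(lines) and lines[idx].startswith(SPDX_PREFIX)


# Abbreviated ordering from the patch: SPDX tag after the pylint lines.
PATCH_HEADER = (
    "#!/usr/bin/env python3\n"
    "# pylint: disable=C0302,C0103,C0301\n"
    "# SPDX-License-Identifier: GPL-2.0\n"
)

# Ordering requested in review: SPDX tag directly under the shebang.
FIXED_HEADER = (
    "#!/usr/bin/env python3\n"
    "# SPDX-License-Identifier: GPL-2.0\n"
    "# pylint: disable=C0302,C0103,C0301\n"
)
```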

I also think you should give consideration to preserving the other
copyright notices in the Perl version.  A language translation doesn't
remove existing copyrights...who knows how much creativity went into
some of those regexes?

> +# TODO: implement warning filtering
> +
> +"""
> +kernel_doc
> +==========
> +
> +Print formatted kernel documentation to stdout
> +
> +Read C language source or header FILEs, extract embedded
> +documentation comments, and print formatted documentation
> +to standard output.
> +
> +The documentation comments are identified by the "/**"
> +opening comment mark.
> +
> +See Documentation/doc-guide/kernel-doc.rst for the
> +documentation comment syntax.
> +"""
> +
> +import argparse
> +import logging
> +import os
> +import re
> +import sys
> +
> +from datetime import datetime
> +from pprint import pformat
> +
> +from dateutil import tz
> +
> +# Local cache for regular expressions
> +re_cache = {}
> +
> +
> +class Re:

So I have to say this bugs me a bit ... the class is fine, but the
one-letter, case-only difference from the standard "re" module is just
going to make the code harder for others to approach.  "kern_re" or
something like that?  Or even "kre" if you really want it to be as short
as possible.
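For illustration, here is a minimal sketch of what a renamed wrapper might look like. It assumes the class only needs to do four things: compile a pattern, optionally cache it, concatenate patterns, and expose match/search/sub. The name `KernRe` follows the CapWords convention of PEP 8 rather than "kern_re". That name and every method here are hypothetical; this is not the patch's actual code.

```python
import re

# Module-level cache of compiled patterns, keyed by (pattern, flags).
_re_cache = {}


class KernRe:
    """Hypothetical thin wrapper around a compiled regular expression."""

    def __init__(self, pattern, cache=True, flags=0):
        self.pattern = pattern
        self.flags = flags
        key = (pattern, flags)
        if cache and key in _re_cache:
            self.regex = _re_cache[key]          # reuse the compiled pattern
        else:
            self.regex = re.compile(pattern, flags)
            if cache:
                _re_cache[key] = self.regex
        self.last_match = None

    def __add__(self, other):
        # Concatenate two patterns into a new wrapper; flags are OR-ed.
        return KernRe(self.pattern + other.pattern,
                      flags=self.flags | other.flags)

    def match(self, string):
        self.last_match = self.regex.match(string)
        return self.last_match

    def search(self, string):
        self.last_match = self.regex.search(string)
        return self.last_match

    def sub(self, repl, string, count=0):
        return self.regex.sub(repl, string, count=count)

    def group(self, num):
        # Group from the most recent match()/search(), or None if none matched.
        return self.last_match.group(num) if self.last_match else None
```

Because the cache key includes the flags, the same pattern compiled with different flags gets its own entry. Whatever the final name, it only needs to be visually distinct from the standard `re` module.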

> +    """
> +    Helper class to simplify regex declaration and usage,
> +
> +    It calls re.compile for a given pattern. It also allows adding
> +    regular expressions and define sub at class init time.
> +
> +    Regular expressions can be cached via an argument, helping to speedup
> +    searches.
> +    """

[...]

> +
> +class KernelDoc:
> +    # Parser states
> +    STATE_NORMAL        = 0        # normal code
> +    STATE_NAME          = 1        # looking for function name
> +    STATE_BODY_MAYBE    = 2        # body - or maybe more description
> +    STATE_BODY          = 3        # the body of the comment
> +    STATE_BODY_WITH_BLANK_LINE = 4 # the body which has a blank line
> +    STATE_PROTO         = 5        # scanning prototype
> +    STATE_DOCBLOCK      = 6        # documentation block
> +    STATE_INLINE        = 7        # gathering doc outside main block
> +
> +    st_name = [
> +        "NORMAL",
> +        "NAME",
> +        "BODY_MAYBE",
> +        "BODY",
> +        "BODY_WITH_BLANK_LINE",
> +        "PROTO",
> +        "DOCBLOCK",
> +        "INLINE",
> +    ]

So these ... kind of look like enums?
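As a sketch of what the enum suggestion could mean, the states could be declared with Python's standard `enum.IntEnum`. The class name `ParserState` is hypothetical. The values and comments are copied from the quoted hunk.

```python
from enum import IntEnum


class ParserState(IntEnum):
    """Hypothetical replacement for the STATE_* constants plus st_name."""
    NORMAL = 0                  # normal code
    NAME = 1                    # looking for function name
    BODY_MAYBE = 2              # body - or maybe more description
    BODY = 3                    # the body of the comment
    BODY_WITH_BLANK_LINE = 4    # the body which has a blank line
    PROTO = 5                   # scanning prototype
    DOCBLOCK = 6                # documentation block
    INLINE = 7                  # gathering doc outside main block
```

With this, the parallel `st_name` list becomes unnecessary, since `ParserState(3).name` yields "BODY". `IntEnum` members also still compare equal to the old integer values, so existing comparisons keep working during a transition.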

That's kind of it for nits ... I do have one wish that will be kind of hard
to grant overall ... for the long-term maintenance of this code, it
would be really nice if every non-trivial regex were described by a
comment explaining what it is trying to do.  It's not reasonable to
expect that as a condition for accepting this rewrite, but it sure would
be a nice goal to be working toward.
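One way to work toward that goal is to write non-trivial patterns with `re.VERBOSE`, so that each piece carries its own comment. The example below is hypothetical and simplified. It matches a kernel-doc function name line such as " * my_func() - Do something useful", but it is not one of the script's actual regexes and ignores cases the real parser handles.

```python
import re

# Hypothetical, simplified example of a self-documenting regex: matches the
# name line of a kernel-doc function comment.
DOC_FUNC_NAME = re.compile(r"""
    ^\s*\*\s*                  # leading " * " of a comment continuation line
    (?P<name>[A-Za-z_]\w*)     # C identifier: the function name
    \s*(?:\(\))?               # optional "()" after the name
    \s*-\s*                    # the " - " separator
    (?P<desc>.*?)              # brief description (lazy, may be empty)
    \s*$                       # ignore trailing whitespace
""", re.VERBOSE)
```

In verbose mode, unescaped whitespace in the pattern is ignored and `#` starts a comment. Literal spaces or `#` characters therefore have to be escaped or written as `\s`.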

Thanks,

jon