linux-kernel - Re: [PATCH 6/7] docs: kdoc: tighten up the pointer-to-function case

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250813110214.4da8553c@foz.lan>
Date: Wed, 13 Aug 2025 11:02:14 +0200
From: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To: Jonathan Corbet <corbet@....net>
Cc: linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, Akira Yokosawa
 <akiyks@...il.com>
Subject: Re: [PATCH 6/7] docs: kdoc: tighten up the pointer-to-function case

On Tue, 12 Aug 2025 17:22:35 -0600
Jonathan Corbet <corbet@....net> wrote:

> Mauro Carvalho Chehab <mchehab+huawei@...nel.org> writes:
> 
> > On Tue, 12 Aug 2025 13:57:47 -0600
> > Jonathan Corbet <corbet@....net> wrote:
> >  
> >> -                r = KernRe(r'[^\(]+\(\*?\s*([\w\[\].]*)\s*\)')
> >> +                r = KernRe(r'[^\(]+\(\*?\s*' r'([\w\[\].]*)' r'\s*\)')  
> >
> > Heh, it took me a couple of seconds to understand this concat, as I haven't
> > seem concat pattern like that before (maybe except for some old C book
> > I read a millennium ago that I barely remember).  So, IMO, it became harder
> > to understand this way. I would either remove the extra two ' r' from the
> > string or write it as:
> >
> >                r = KernRe(r'[^\(]+\(\*?\s*' 
> > 			  r'([\w\[\].]*)'
> > 			  r'\s*\)')  
> 
> By remove the " r" you mean glom the pieces back together into a single
> string?  These long regexes are hard to make sense of, I do think it
> helps to break them into logical chunks.

At least for me, it doesn't make much difference: I still need to
read the entire thing to understand what it does. The extra ' r' in
the middle of it, adds extra noise.

> Certainly I can split it across multiple lines if that helps.

It is OK to split on multiple lines. 

Btw, I tried to use re.X:
	https://docs.python.org/3/library/re.html#re.X

on one of the most complex regexes, which would be the standard way to 
document complex regular expression (Perl had some using x flag), but
it didn't work.

Still, I think that we should aim to have something similar to it,
e.g.:

                r = KernRe(r'[^\(]+\(\*?\s*'  # some explanation
 			   r'([\w\[\].]*)'    # some explanation
 			   r'\s*\)')          # some explanation  

for the most complex ones.

---

Side note: reading just this out of the context - as one would do
when reviewing such patch makes is hard for one to tell exactly
what c prototype patterns the above is trying to catch.

Adding some explanation for the pattern may help, but, IMO, we should 
go one step further and create something like the enclosed unittest
logic (placed at tools/docs or tools/docs_unittest).

Before merging kernel-doc patches (and/or inside a CI job), run the
unit test to check if this won't break anything.

I suspect that creating something like this won't be that hard,
as we can modify kernel-doc command line to generate a list of
all cases it currently handles, storing the original kernel-doc
macro. Then, ask some LLM to clean it up, dropping cases that
have the same pattern.


Thanks,
Mauro

---

The code below is just a prototype. On its current version, the
tests run, but them all fail. My main goal here is just to give you
a more precise example of the kind of tool I'm thinking. If you
think it is worth, I can work on it later on.

Btw, I have somewhere on my internal scripts a replacement method for 
unittest.main() that provides a better summary when verbose is disabled
and have a command line argument to enable verbose. If we're willing
to do that, I'll add it to the final patchset.


#!/usr/bin/env python3
# SPDX-License-Identifier: GPL-2.0
# Copyright(c) 2025: Mauro Carvalho Chehab <mchehab@...nel.org>.
# Generated with the help of DeepSeek / Deepthink (R1)

import argparse
import logging
import os
import tempfile
import unittest
import sys

from textwrap import dedent

# Import KernelDoc parser class
SRC_DIR = os.path.dirname(os.path.realpath(__file__))
LIB_DIR = "../../scripts/lib/kdoc"

sys.path.insert(0, os.path.join(SRC_DIR, LIB_DIR))

from kdoc_parser import KernelDoc


TEST_CASES = [
    # 1. Basic function parsing
    {
        "input": dedent("""
            /**
             * simple_function - Basic function
             * @param: Description
             */
            void simple_function(int param)
        """),
        "expected": {
            "identifier": "simple_function",
            "return_type": "void",
            "func_macro": False
        }
    },

    # 2. Static prefix stripping (pattern: r"^static +")
    {
        "input": dedent("""
            /**
             * static_fn - Static function
             * @arg: String argument
             */
            static int static_fn(char *arg)
        """),
        "expected": {
            "identifier": "static_fn",
            "return_type": "int",
            "func_macro": False
        }
    },

    # 3. Function macro (#define)
    {
        "input": dedent("""
            /**
             * macro_func() - Macro function
             * @a: First param
             */
            #define macro_func(a) do_something(a)
        """),
        "expected": {
            "identifier": "macro_func",
            "return_type": "",
            "func_macro": True
        }
    },

    # 4. Complex return type
    {
        "input": dedent("""
            /**
             * complex_return - Long return type
             * Returns: Pointer to foo structure
             */
            struct foo *complex_return(void)
        """),
        "expected": {
            "identifier": "complex_return",
            "return_type": "struct foo *",
            "func_macro": False
        }
    },

    # 5. __attribute_const__ stripping (pattern: r"__attribute_const__ +")
    {
        "input": dedent("""
            /**
             * const_attr - Function with const attribute
             */
            __attribute_const__ int const_attr(void)
        """),
        "expected": {
            "identifier": "const_attr",
            "return_type": "int",
            "func_macro": False
        }
    },

    # 6. __printf attribute stripping (pattern: r"__printf\s*\(\s*\d*\s*,\s*\d*\s*\) +")
    {
        "input": dedent("""
            /**
             * printf_fn - Printf-style function
             * @fmt: Format string
             * @...: Variable arguments
             */
            __printf(2, 3) int printf_fn(const char *fmt, ...)
        """),
        "expected": {
            "identifier": "printf_fn",
            "return_type": "int",
            "func_macro": False
        }
    },

    # 7. Multiple prefixes (static + inline)
    {
        "input": dedent("""
            /**
             * multi_prefix - Multiple prefixes
             */
            static __always_inline int multi_prefix(int a)
        """),
        "expected": {
            "identifier": "multi_prefix",
            "return_type": "int",
            "func_macro": False
        }
    },

    # 8. DECL_BUCKET_PARAMS handling
    {
        "input": dedent("""
            /**
             * bucket_fn - Bucket params function
             */
            DECL_BUCKET_PARAMS(a, b) void bucket_fn(void)
        """),
        "expected": {
            "identifier": "bucket_fn",
            "return_type": "void",
            "func_macro": False
        }
    },

    # 9. Function pointer parameter
    {
        "input": dedent("""
            /**
             * fp_param - Function with callback
             * @cb: Callback function
             */
            void fp_param(int (*cb)(int))
        """),
        "expected": {
            "identifier": "fp_param",
            "return_type": "void",
            "func_macro": False
        }
    },

    # 10. Malformed prototype (error case)
    {
        "input": dedent("""
            /**
             * bad_func - Bad prototype
             */
            int bad_func(missing_paren int a
        """),
        "expected": {
            "expected_error": "cannot understand function prototype: 'int bad_func(missing_paren int a'"
        }
    },

    # 11. Name mismatch (error case)
    {
        "input": dedent("""
            /**
             * expected_name - Wrong prototype
             */
            void actual_name(void)
        """),
        "expected": {
            "expected_error": "expecting prototype for expected_name(). Prototype was for actual_name() instead"
        }
    },

    # 12. No kernel-doc comment (edge case)
    {
        "input": "void undocumented_func(void)",
        "expected": {
            "expected_error": "missing kernel-doc comment"
        }
    },

    # Add a lot more here
]

class TestKernelDocParsing(unittest.TestCase):
    """Dynamically generated test cases for kernel-doc parsing"""

    def setUp(self):
        self.config = argparse.Namespace()
        self.config.verbose = False
        self.config.werror = False
        self.config.wreturn = False
        self.config.wshort_desc = False
        self.config.wcontents_before_sections = False
        self.config.log = logging.getLogger("kernel-doc-test")
        self.config.src_tree = None

def make_test_function(input_str, expected):
    """Factory function to create kernel-doc test cases with file handling"""
    def test(self):
        # Create temp file with test content
        with tempfile.NamedTemporaryFile(mode='w+', delete=False) as tmp:
            tmp.write(input_str)
            tmp_path = tmp.name

        try:
            # Initialize parser with the temp file
            doc = KernelDoc(self.config, tmp_path)

            # FIXME: this part still need adjustments
            export_table, entries = doc.parse_kdoc()

            # Verify parsing results
            if "expected_error" in expected:
                self.fail(f"Expected error but parsing succeeded: {input_str}")

            # Find our test entry (assuming single entry for simplicity)
            test_entry = next((e for e in entries if e.identifier == expected["identifier"]), None)
            self.assertIsNotNone(test_entry, f"Expected entry not found for: {input_str}")

            # Validate results
            self.assertEqual(test_entry.return_type, expected["return_type"],
                            f"Return type mismatch for: {input_str}")
            self.assertEqual(getattr(test_entry, 'func_macro', False), expected["func_macro"],
                            f"Function macro flag mismatch for: {input_str}")

        except Exception as e:
            if "expected_error" not in expected:
                raise
            self.assertIn(expected["expected_error"], str(e),
                         f"Expected error not found in: {str(e)}")
        finally:
            # Clean up temp file
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)

    return test

# Dynamically add test methods to the TestCase class
# This has to be outside __main__
for i, test_case in enumerate(TEST_CASES):
    test_name = f"test_case_{i+1:02d}"
    if "identifier" in test_case["expected"]:
        test_name += f"_{test_case['expected']['identifier']}"
    elif "expected_error" in test_case["expected"]:
        test_name += "_error_case"

    test_func = make_test_function(test_case["input"], test_case["expected"])
    setattr(TestKernelDocParsing, test_name, test_func)

if __name__ == "__main__":
    unittest.main()