linux-kernel - Re: [PATCH 10/12] docs: kdoc: further rewrite_struct

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250804151511.73ffb949@foz.lan>
Date: Mon, 4 Aug 2025 15:15:11 +0200
From: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To: Jonathan Corbet <corbet@....net>
Cc: linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, Akira Yokosawa
 <akiyks@...il.com>
Subject: Re: [PATCH 10/12] docs: kdoc: further rewrite_struct_members()
 cleanup

Em Fri, 01 Aug 2025 16:52:33 -0600
Jonathan Corbet <corbet@....net> escreveu:

> Mauro Carvalho Chehab <mchehab+huawei@...nel.org> writes:
> 
> > Em Thu, 31 Jul 2025 18:13:24 -0600
> > Jonathan Corbet <corbet@....net> escreveu:
> >  
> >> Get rid of some single-use variables and redundant checks, and generally
> >> tighten up the code; no logical change.
> >> 
> >> Signed-off-by: Jonathan Corbet <corbet@....net>
> >> ---
> >>  scripts/lib/kdoc/kdoc_parser.py | 89 ++++++++++++++++-----------------
> >>  1 file changed, 42 insertions(+), 47 deletions(-)
> >> 
> >> diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
> >> index 20e0a2abe13b..2b7d7e646367 100644
> >> --- a/scripts/lib/kdoc/kdoc_parser.py
> >> +++ b/scripts/lib/kdoc/kdoc_parser.py
> >> @@ -673,73 +673,68 @@ class KernelDoc:
> >>          while tuples:
> >>              for t in tuples:
> >>                  newmember = ""
> >> -                maintype = t[0]
> >> -                _ids = t[5]s
> >> -                content = t[3]  
> >
> > The reason I opted for this particular approach...  
> >> -
> >> -                oldmember = "".join(t)
> >> -
> >> -                for s_id in s_ids.split(','):
> >> +                oldmember = "".join(t) # Reconstruct the original formatting
> >> +                #
> >> +                # Pass through each field name, normalizing the form and formatting.
> >> +                #
> >> +                for s_id in t[5].split(','):  
> >
> > ... is that it is easier to understand and to maintain:
> >
> > 	for s_id in s_ids.split(','):
> >
> > than when magic numbers like this are used:
> >
> > 	for s_id in t[5].split(','):  
> 
> Coming into this code, I had a different experience, and found the
> variables to just be a layer of indirection I had to pass through to get
> to the capture groups and see what was really going on.  That was part
> of why I put the group numbers in the comments next to that gnarly
> regex, to make that mapping more direct and easier to understand.
> 
> I will not insist on this change either - at least not indefinitely :)
> I do feel, though, that adding a step between the regex and its use just
> serves to obscure things.
> 
> (And yes, I don't really think that named groups make things better.
> I've found those useful in situations where multiple regexes are in use
> and the ordering of the groups may vary, but otherwise have generally
> avoided them).

I'd say that, when the magic number is within up to 3-lines hunk
distance - e.g. if all of them will appear at the same hunk, it is 
probably safe to use, but when it gets far away, it makes more harm 
than good.

Perhaps one alternative would do something like:

	tuples = struct_members.findall(members)
        if not tuples:
            break

	maintype, -, -, content, -, s_ids = tuples

(assuming that we don't need t[1], t[2] and t[4] here)

Btw, on this specific case, better to use non-capture group matches
to avoid those "empty" spaces, e.g. (if I got it right):

	# Curly Brackets are not captured
        struct_members = KernRe(type_pattern +	        # Capture main type
				r'([^\{\};]+)' +
				r'(?:\{)' +
				r'(?:[^\{\}]*)' +	# Capture content
				r'(?:\})' +
				r'([^\{\}\;]*)(\;)')	# Capture IDs
	...
	tuples = struct_members.findall(members)
        if not tuples:
            break

	maintype, content, s_ids = tuples

Btw, a cleanup like the above is, IMHO, another good reason why not using
magic numbers: people may end fixing match groups to use non-capture
matches, but end forgetting to fix some hidden magic numbers. It is hard
for a reviewer to see it if the affected magic numbers aren't within the
3 lines range of a default unified diff, and may introduce hidden bugs.

Thanks,
Mauro