linux-kernel - Re: [PATCH 10/12] docs: kdoc: further rewrite_struct

Message-ID: <20250806110538.35bcc127@foz.lan>
Date: Wed, 6 Aug 2025 11:05:38 +0200
From: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To: Jonathan Corbet <corbet@....net>
Cc: linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, Akira Yokosawa
 <akiyks@...il.com>
Subject: Re: [PATCH 10/12] docs: kdoc: further rewrite_struct_members()
 cleanup

On Tue, 05 Aug 2025 16:46:10 -0600
Jonathan Corbet <corbet@....net> wrote:

> Mauro Carvalho Chehab <mchehab+huawei@...nel.org> writes:
> 
> > Perhaps one alternative would do something like:
> >
> > 	tuples = struct_members.findall(members)
> > 	if not tuples:
> > 	    break
> >
> > 	maintype, _, _, content, _, s_ids = tuples
> >
> > (assuming that we don't need t[1], t[2] and t[4] here)
> >
> > Btw, on this specific case, better to use non-capture group matches
> > to avoid those "empty" spaces, e.g. (if I got it right):  
> 
> The problem is this line here:
> 
>                 oldmember = "".join(t) # Reconstruct the original formatting
> 
> The regex *has* to capture the entire match string so that it can be
> reconstructed back to its original form, which we need to edit the full
> list of members later on.
> 
> This code could use a deep rethink, but it works for now :)

well, we can still do:

	for t in tuples:
	    maintype, _, _, content, _, s_ids = t
	    oldmember = "".join(t)

This way, we name the relevant fields and still reconstruct
the original form.

IMO, this is a lot better than using t[0], t[3], t[5] in the code,
as the names make it clear what each one actually captured.
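
A minimal sketch of that loop, using plain re instead of KernRe so it can
be run standalone. The regex and the sample string are made up for
illustration: the regex here has seven capture groups, so the unpacking
has seven names, and the real kernel-doc pattern may differ.

```python
import re

# Hypothetical stand-in for the kernel-doc regex. Every piece of the
# match is captured, so joining the groups gives back the matched text.
struct_members = re.compile(
    r'(struct|union)'   # maintype
    r'([^\{\};]+)'      # text between the type and the opening brace
    r'(\{)'
    r'([^\{\}]*)'       # content
    r'(\})'
    r'([^\{\};]*)'      # s_ids
    r'(;)'
)

# Made-up input: one nested struct between two plain members.
members = "int a; struct { int x; int y; } pos, size; int b;"

results = []
for t in struct_members.findall(members):
    # Name the groups we care about; "_" marks the ones we ignore.
    maintype, _, _, content, _, s_ids, _ = t
    oldmember = "".join(t)  # the text exactly as the regex matched it
    results.append((maintype, content.strip(), s_ids.strip(), oldmember))
```

Since oldmember is built from the whole tuple, it can still be used to
locate the member inside the original string.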

-

Btw, re.findall() doesn't return match objects, which is inconsistent
with the rest of the re API. Looking at the docs today(*), there is an
alternative: re.finditer(). We could add it to the KernRe class and use
it so that the code works with Match instances. Something like:

	# Original regex expression
	res = Re.finditer(...)

	# Not much difference here. Probably not worth using it
	for match in res:
	    oldmember = "".join(match.groups())
	    maintype, _, _, content, _, s_ids = match.groups()

Or alternatively:

	res = Re.finditer(...)

	# Not much difference here. Probably not worth using it
	for match in res:
	    oldmember = "".join(match.groups())

	    # then replace in the code below (this needs named groups,
	    # i.e. (?P<name>...), in the regex):
	    #	maintype -> match.group('maintype')
	    #	content -> match.group('content')
	    #	s_ids -> match.group('s_ids')

No idea about performance differences between findall and finditer.
(*) https://docs.python.org/3/library/re.html
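
A rough sketch of the named-group variant, again with plain re rather
than KernRe (which has no finditer() today). The pattern, the group names
quals/open/close/end and the input string are assumptions made for the
example. One thing it shows is that a Match object already carries the
whole matched text as group(0), so the join becomes optional.

```python
import re

# Hypothetical pattern: same shape as before, but with named groups.
struct_members = re.compile(
    r'(?P<maintype>struct|union)'
    r'(?P<quals>[^\{\};]+)'
    r'(?P<open>\{)'
    r'(?P<content>[^\{\}]*)'
    r'(?P<close>\})'
    r'(?P<s_ids>[^\{\};]*)'
    r'(?P<end>;)'
)

# Made-up input with two nested members.
members = "struct { int x; } pos; union { int i; float f; } val;"

found = []
for match in struct_members.finditer(members):
    # group(0) is the full matched text, no join needed.
    oldmember = match.group(0)
    # With every piece captured, the join gives the same string.
    assert oldmember == "".join(match.groups())

    found.append((match.group('maintype'),
                  match.group('content').strip(),
                  match.group('s_ids').strip(),
                  oldmember))
```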

btw, one possibility that would avoid having tuples and t is
to do something like:

	struct_members = KernRe("(" +			# group 1: the entire pattern
				type_pattern +		# Capture main type (assumes one group)
				r'(?:[^\{\};]+)' +
				r'(?:\{)' +
				r'([^\{\}]*)' +		# Capture content
				r'(?:\})' +
				r'([^\{\};]*)' +	# Capture IDs
				r'(?:;)' +
				")")

	res = struct_members.finditer(line)
	for match in res:
	    oldmember, maintype, content, s_ids = match.groups()

(disclaimer notice: none of the above was tested)
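
To check the idea of an outer group plus non-capturing inner groups, here
is a small standalone sketch with plain re. It assumes type_pattern
contributes exactly one capture group, and the input string is invented;
neither is taken from the kernel-doc sources.

```python
import re

# Assumption: the type pattern has exactly one capturing group.
type_pattern = r'(struct|union)'

struct_members = re.compile(
    "(" +                  # group 1: the entire member
    type_pattern +         # group 2: main type
    r'(?:[^\{\};]+)' +     # not captured
    r'(?:\{)' +
    r'([^\{\}]*)' +        # group 3: content
    r'(?:\})' +
    r'([^\{\};]*)' +       # group 4: IDs
    r'(?:;)' +
    ")"
)

# Made-up input: one nested struct followed by a plain member.
members = "struct { int x; int y; } pos; int tail;"

parsed = []
for match in struct_members.finditer(members):
    # Exactly four groups, so the unpacking needs no placeholders.
    oldmember, maintype, content, s_ids = match.groups()
    parsed.append((oldmember, maintype, content, s_ids))
```

Here group 1 always equals match.group(0), so the outer group is only
needed if the code has to keep working on tuples of groups.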
	

Thanks,
Mauro