Message-ID: <20250806110538.35bcc127@foz.lan>
Date: Wed, 6 Aug 2025 11:05:38 +0200
From: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To: Jonathan Corbet <corbet@....net>
Cc: linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, Akira Yokosawa
<akiyks@...il.com>
Subject: Re: [PATCH 10/12] docs: kdoc: further rewrite_struct_members()
cleanup
On Tue, 05 Aug 2025 16:46:10 -0600
Jonathan Corbet <corbet@....net> wrote:
> Mauro Carvalho Chehab <mchehab+huawei@...nel.org> writes:
>
> > Perhaps one alternative would do something like:
> >
> > tuples = struct_members.findall(members)
> > if not tuples:
> > break
> >
> > maintype, -, -, content, -, s_ids = tuples
> >
> > (assuming that we don't need t[1], t[2] and t[4] here)
> >
> > Btw, on this specific case, better to use non-capture group matches
> > to avoid those "empty" spaces, e.g. (if I got it right):
>
> The problem is this line here:
>
> oldmember = "".join(t) # Reconstruct the original formatting
>
> The regex *has* to capture the entire match string so that it can be
> reconstructed back to its original form, which we need to edit the full
> list of members later on.
>
> This code could use a deep rethink, but it works for now :)
Well, we can still do:

    for t in tuples:
        maintype, _, _, content, _, s_ids = t
        oldmember = "".join(t)

This way, we'll be naming the relevant parameters and still
reconstructing the original form.
IMO, this is a lot better than using t[0], t[3], t[5] in the code,
as the names make it clear what each one actually captured.
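
To make the idea concrete, here is a minimal self-contained sketch using plain re instead of KernRe. The pattern is a simplified stand-in and the input string is made up, so both are assumptions rather than the real kernel-doc code. The stand-in has seven capture groups, so the unpacking carries one extra placeholder for the trailing ";":

```python
import re

# Simplified stand-in pattern (an assumption, not the real kernel-doc regex).
# Every piece of the match sits in a capture group, so "".join(t)
# rebuilds the matched text exactly.
struct_members = re.compile(
    r'(struct|union)'   # main type
    r'([^\{\};]+)'      # text between the type and the opening brace
    r'(\{)'
    r'([^\{\}]*)'       # content
    r'(\})'
    r'([^\{\};]*)'      # IDs
    r'(;)'
)

# Made-up input, for illustration only
members = "int a; struct { int x; int y; } pos; int b;"

results = []
for t in struct_members.findall(members):
    # Name the groups we care about, throw away the rest
    maintype, _, _, content, _, s_ids, _ = t
    oldmember = "".join(t)  # reconstruct the original text
    results.append((maintype, content.strip(), s_ids.strip(), oldmember))

print(results)
```

The named variables document what each group holds, while the join still sees the whole tuple.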
-
Btw, re.findall() returns plain strings or tuples instead of match
objects, which is inconsistent with the rest of the re API. Looking
at the docs today(*), there is an alternative: re.finditer().
We could add it to the KernRe class and use it in a way that gives us
a Match instance. Something like:

    # Original regex expression
    res = Re.finditer(...)

    # Not much difference here. Probably not worth using it
    for match in res:
        oldmember = "".join(match.groups())
        maintype, _, _, content, _, s_ids = match.groups()

Or alternatively:

    res = Re.finditer(...)

    # Not much difference here. Probably not worth using it
    for match in res:
        oldmember = "".join(match.groups())

        # replace in the code below (this needs named groups in the
        # regex, e.g. (?P<maintype>...)):
        # maintype -> match.group('maintype')
        # content -> match.group('content')
        # s_ids -> match.group('s_ids')

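
A small sketch of the named-group variant, again with plain re, a simplified stand-in pattern and a made-up input (all assumptions, not the kernel-doc code). One thing worth noting: with a Match object, match.group(0) already holds the full matched text, so in this sketch the "".join() and the extra capture groups are not needed:

```python
import re

# Stand-in pattern with named groups (hypothetical, for illustration).
# Only the parts we want to look at are captured.
pattern = re.compile(
    r'(?P<maintype>struct|union)'
    r'[^\{\};]+'
    r'\{(?P<content>[^\{\}]*)\}'
    r'(?P<s_ids>[^\{\};]*);'
)

# Made-up input
members = "int a; union { int i; float f; } u; int b;"

found = []
for match in pattern.finditer(members):
    oldmember = match.group(0)  # entire matched text, no join needed
    found.append((oldmember,
                  match.group('maintype'),
                  match.group('content'),
                  match.group('s_ids')))

print(found)
```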
No idea about performance differences between findall and finditer.
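
If someone wants to check, a quick timeit sketch along these lines should do. The pattern and input are made-up stand-ins, and the numbers will depend on the machine and on the real regex, so take any result with a grain of salt:

```python
import re
import timeit

# Stand-in pattern and input (assumptions, not the kernel-doc ones)
pattern = re.compile(
    r'(struct|union)([^\{\};]+)(\{)([^\{\}]*)(\})([^\{\};]*)(;)'
)
members = "struct { int x; int y; } pos; " * 50

def with_findall():
    # findall() builds the whole list of tuples up front
    return ["".join(t) for t in pattern.findall(members)]

def with_finditer():
    # finditer() yields one Match object at a time
    return ["".join(m.groups()) for m in pattern.finditer(members)]

# Both approaches must produce the same reconstructed strings
same_result = with_findall() == with_finditer()

t_findall = timeit.timeit(with_findall, number=200)
t_finditer = timeit.timeit(with_finditer, number=200)
print(f"same result: {same_result}")
print(f"findall:  {t_findall:.4f}s")
print(f"finditer: {t_finditer:.4f}s")
```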
(*) https://docs.python.org/3/library/re.html
Btw, one possibility that would avoid having tuples and t is
to do something like:

    struct_members = KernRe("(" +                  # group 1: the entire pattern
                            type_pattern +         # Capture main type
                            r'(?:[^\{\};]+)' +
                            r'(?:\{)' +
                            r'([^\{\}]*)' +        # Capture content
                            r'(?:\})' +
                            r'([^\{\};]*)(?:;)' +  # Capture IDs
                            ")")

    res = struct_members.finditer(line)
    for match in res:
        oldmember, maintype, content, s_ids = match.groups()

(disclaimer notice: none of the above was tested)
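
For what it is worth, here is a runnable sketch of the outer-group idea with plain re standing in for KernRe. It assumes type_pattern has exactly one capture group, and the input line is made up. With that assumption, groups() yields exactly four values: the whole match, the main type, the content and the IDs:

```python
import re

# Assumption: type_pattern contains exactly one capture group
type_pattern = r'(struct|union)'

struct_members = re.compile(
    "(" +                   # group 1: the entire pattern
    type_pattern +          # group 2: main type
    r'(?:[^\{\};]+)' +
    r'(?:\{)' +
    r'([^\{\}]*)' +         # group 3: content
    r'(?:\})' +
    r'([^\{\};]*)(?:;)' +   # group 4: IDs
    ")"
)

# Made-up input line
line = "struct { int x; } a, b;"

rows = []
for match in struct_members.finditer(line):
    oldmember, maintype, content, s_ids = match.groups()
    rows.append((oldmember, maintype, content, s_ids))

print(rows)
```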
Thanks,
Mauro