linux-kernel - Re: [RFC] scripts: kernel-doc: improve parsing for kernel-doc comments syntax

Message-ID: <87y2djgrsk.fsf@meer.lwn.net>
Date:   Thu, 15 Apr 2021 15:29:47 -0600
From:   Jonathan Corbet <corbet@....net>
To:     Aditya Srivastava <yashsri421@...il.com>
Cc:     yashsri421@...il.com, lukas.bulwahn@...il.com,
        linux-kernel-mentees@...ts.linuxfoundation.org,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC] scripts: kernel-doc: improve parsing for kernel-doc
 comments syntax

Aditya Srivastava <yashsri421@...il.com> writes:

> Currently kernel-doc does not identify some cases of probable
> kernel-doc comments, e.g. a pointer used as the declaration type for an
> identifier, space-separated identifiers, etc.
>
> Some examples of these cases in files are:
> i)" *  journal_t * jbd2_journal_init_dev() - creates and initialises a journal structure"
> in fs/jbd2/journal.c
>
> ii) "*      dget, dget_dlock -      get a reference to a dentry" in
> include/linux/dcache.h
>
> iii) "  * DEFINE_SEQLOCK(sl) - Define a statically allocated seqlock_t"
> in include/linux/seqlock.h
>
> Also improve identification of non-kernel-doc comments. For example,
>
> i) " *	The following functions allow us to read data using a swap map"
> in kernel/power/swap.c does follow the kernel-doc-like syntax, but the
> content inside does not adhere to the expected format.
>
> Improve parsing by adding support for these probable attempts to write
> kernel-doc comments.
>
> Suggested-by: Jonathan Corbet <corbet@....net>
> Link: https://lore.kernel.org/lkml/87mtujktl2.fsf@meer.lwn.net
> Signed-off-by: Aditya Srivastava <yashsri421@...il.com>
> ---
>  scripts/kernel-doc | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
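
To make the cases in the commit message concrete, here is a minimal
standalone sketch that runs the pattern removed by this patch against the
four quoted comment lines. The pattern is copied from the "-" line of the
diff; the helper name old_matches and the test harness are hypothetical and
are not part of kernel-doc.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern removed by this patch, copied from the diff.
my $old = qr/^\s*\*\s*([\w\s]+?)(\(\))?\s*([-:].*)?$/;

# Comment lines quoted in the commit message.
my @lines = (
    ' *  journal_t * jbd2_journal_init_dev() - creates and initialises a journal structure',
    '*      dget, dget_dlock -      get a reference to a dentry',
    '  * DEFINE_SEQLOCK(sl) - Define a statically allocated seqlock_t',
    " *\tThe following functions allow us to read data using a swap map",
);

# Hypothetical helper: 1 if the old pattern accepts the line, else 0.
sub old_matches { return $_[0] =~ $old ? 1 : 0 }

printf "%-5s %s\n", (old_matches($_) ? 'match' : 'miss'), $_ for @lines;
```

The first three lines are reported as "miss" (the pointer star, the comma
and the "(sl)" argument list fall outside what the pattern allows), while
the plain-prose line from kernel/power/swap.c is reported as "match".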

OK, I've applied this, but I have a couple of comments...

> diff --git a/scripts/kernel-doc b/scripts/kernel-doc
> index 888913528185..37665aa41e6b 100755
> --- a/scripts/kernel-doc
> +++ b/scripts/kernel-doc
> @@ -2110,17 +2110,25 @@ sub process_name($$) {
>      } elsif (/$doc_decl/o) {
>  	$identifier = $1;
>  	my $is_kernel_comment = 0;
> -	if (/^\s*\*\s*([\w\s]+?)(\(\))?\s*([-:].*)?$/) {
> +	my $decl_start = qr{\s*\*};

I appreciate the attempt to make the regexes a bit more comprehensible,
but we can do better yet, methinks.  This $decl_start is very much like
$doc_com defined globally.

It would really help a lot if we could at least take the incredible mass
of regexes in this program and boil them down to a smaller, unique set
that is used throughout.  kernel-doc might still make brains explode,
but perhaps the blast radius would be a bit smaller.
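
As a rough illustration of that idea, the sketch below builds one check out
of a few named pieces instead of introducing a new local pattern. It assumes
$doc_com is defined roughly as shown (check the script itself for the real
definition); $name and looks_like_decl are hypothetical names used only for
this example, and this is not the code that was applied.

```perl
use strict;
use warnings;

# Assumed to mirror the global kernel-doc definition; verify in the script.
my $doc_com = '\s*\*\s*';

# Shared building blocks, defined once and reused.
my $fn_type = qr{\w+\s*\*\s*};   # from the patch: "foo * bar() - desc"
my $name    = qr{\w+};           # hypothetical identifier pattern

# Hypothetical helper: does the line look like "name() - description",
# optionally preceded by a pointer return type?
sub looks_like_decl {
    my ($line) = @_;
    return $line =~ /^$doc_com(?:$fn_type)?$name\s*\(\)\s*[-:]/ ? 1 : 0;
}

print looks_like_decl(' * foo() - desc'), "\n";
```

Reusing $doc_com for the comment prefix means a later change to how
comment leaders are recognised only has to be made in one place.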

> +	my $fn_type = qr{\w+\s*\*\s*}; # i.e. pointer declaration type, foo * bar() - desc

Some of the lines in this change go waaaaay beyond the 80-character
limit; please try not to do that.  I fixed up the offending comments
this time around.
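
One possible way to keep such lines short, sketched here as a suggestion
rather than as what was actually applied, is Perl's /x modifier: whitespace
and "#" comments inside the pattern are ignored, so the explanation can sit
next to each piece on its own line. The helper is_pointer_type is
hypothetical.

```perl
use strict;
use warnings;

# Same pattern as $fn_type in the patch, split over short lines with /x.
my $fn_type = qr{
    \w+ \s*     # type name, such as journal_t
    \* \s*      # pointer star, as in "foo * bar() - desc"
}x;

# Hypothetical helper: 1 if the whole string is a pointer type prefix.
sub is_pointer_type { return $_[0] =~ /\A$fn_type\z/ ? 1 : 0 }

print is_pointer_type('journal_t * '), "\n";
```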

Thanks,

jon