linux-kernel - RE: [PATCH] rcu-tasks: Check the atomic variable trc_n_readers_need

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <PH0PR11MB5880D6B11469266144C58032DA1F9@PH0PR11MB5880.namprd11.prod.outlook.com>
Date:   Wed, 30 Mar 2022 22:30:11 +0000
From:   "Zhang, Qiang1" <qiang1.zhang@...el.com>
To:     "paulmck@...nel.org" <paulmck@...nel.org>
CC:     "frederic@...nel.org" <frederic@...nel.org>,
        "rcu@...r.kernel.org" <rcu@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] rcu-tasks: Check the atomic variable
 trc_n_readers_need_end again when wait timeout


On Wed, Mar 30, 2022 at 07:20:14PM +0800, Zqiang wrote:
> When the trc_wait waitqueue timeout, the atomic variable 
> trc_n_readers_need_end need to be checked again, perhaps the 
> conditions have been established at this time, avoid invalid stall 
> information output.
>
>But are you actually seeing this invalid stall information?  Either way, please seem my comments and question below.
>
>Please don't get me wrong, we do have similar checks for normal vanilla RCU stall warnings, for example, this statement in print_other_cpu_stall() in kernel/rcu/tree_stall.h:
>
>	pr_err("INFO: Stall ended before state dump start\n");
>
>However, the approach used there did benefit from significant real-world experience with its absence.  ;-)
>
>							Thanx, Paul
>
> Signed-off-by: Zqiang <qiang1.zhang@...el.com>
> ---
>  kernel/rcu/tasks.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h index 
> 65d6e21a607a..b73a2b362d6b 100644
> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -1544,7 +1544,7 @@ static void rcu_tasks_trace_postgp(struct rcu_tasks *rtp)
>  				trc_wait,
>  				atomic_read(&trc_n_readers_need_end) == 0,
>  				READ_ONCE(rcu_task_stall_timeout));

>If I understand correctly, this patch is intended to handle a situation where the wait_event_idle_exclusive_timeout() timed out, but where the value of trc_n_readers_need_end decreased to zero just at this point.

Yes,  this patch is intended to handle a situation.

>If so, please see my question below.  If not, please show me the exact sequence of events leading up to the problem.

> -		if (ret)
> +		if (ret || !atomic_read(&trc_n_readers_need_end))
>  			break;  // Count reached zero.

>Couldn't the value of trc_n_readers_need_end decrease to zero right here, still resulting in invalid stall information?

The value of trc_n_readers_need_end decrease to zero right here, indicates that the current grace period has been completed.
Even if we go to print some task with condition 't->trc_reader_special.b.need_qs' as true,  and these task  will not affect the current grace period. Is my understanding correct?

Thanks
Zqiang

>  		// Stall warning time, so make a list of the offenders.
>  		rcu_read_lock();
> --
> 2.25.1
>