lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220902115228.GV24324@pengutronix.de>
Date:   Fri, 2 Sep 2022 13:52:28 +0200
From:   Sascha Hauer <sha@...gutronix.de>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     rcu@...r.kernel.org, linux-kernel@...r.kernel.org,
        kernel-team@...com, rostedt@...dmis.org,
        Matthew Wilcox <willy@...radead.org>,
        Zhouyi Zhou <zhouzhouyi@...il.com>, kernel@...gutronix.de
Subject: Re: [PATCH rcu 04/32] rcu-tasks: Drive synchronous grace periods
 from calling task

On Thu, Sep 01, 2022 at 10:33:04AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 01, 2022 at 10:27:25AM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 01, 2022 at 12:36:25PM +0200, Sascha Hauer wrote:
> > > Hi Paul,
> > > 
> > > On Mon, Jun 20, 2022 at 03:53:43PM -0700, Paul E. McKenney wrote:
> > > > This commit causes synchronous grace periods to be driven from the task
> > > > invoking synchronize_rcu_*(), allowing these functions to be invoked from
> > > > the mid-boot dead zone extending from when the scheduler was initialized
> > > > to to point that the various RCU tasks grace-period kthreads are spawned.
> > > > This change will allow the self-tests to run in a consistent manner.
> > > > 
> > > > Reported-by: Matthew Wilcox <willy@...radead.org>
> > > > Reported-by: Zhouyi Zhou <zhouzhouyi@...il.com>
> > > > Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
> > > 
> > > This commit (appeared in mainline as 4a8cc433b8bf) breaks booting my
> > > ARMv7 based i.MX6ul board when CONFIG_PROVE_RCU is enabled. Reverting
> > > this patch on v6.0-rc3 makes my board boot again. See below for a boot
> > > log. The last message is "Running RCU-tasks wait API self tests", after
> > > that the board hangs. Any idea what goes wrong here?
> > 
> > New one on me!
> > 
> > Is it possible to get a stack trace of the hang, perhaps via
> > one form or another of sysrq-T?  Such a stack trace would likely
> > include synchronize_rcu_tasks(), synchronize_rcu_tasks_rude(), or
> > synchronize_rcu_tasks_trace() followed by synchronize_rcu_tasks_generic(),
> > rcu_tasks_one_gp(), and one of rcu_tasks_wait_gp(),
> > rcu_tasks_rude_wait_gp(), or rcu_tasks_wait_gp().
> 
> If there is no chance of sysrq-T, kernel debuggers, kernel crash
> dumps, or any other source of the stack trace, please decorate the
> code path with printk() or similar and let me know where it goes.
> Under normal circumstances, this code path is not sensitive to performance
> perturbations of the printk() persuasion.

Some unrelated bug I was searching for made me turn on early console
output with the "earlycon" parameter. It turned out that when I remove
this parameter then my board boots fine.

I then realized that even with earlycon enabled my board boots fine
when I remove the call to

	pr_info("Running RCU-tasks wait API self tests\n");

Given that I am not sure how useful it is to add more printk. I did that
anyway like this:

> static void rcu_tasks_one_gp(struct rcu_tasks *rtp, bool midboot)
> {
> 	int needgpcb;
> 
> 	printk("%s: mutex_lock... midboot: %d\n", __func__, midboot);
> 	mutex_lock(&rtp->tasks_gp_mutex);
> 	printk("%s: mutex_locked midboot: %d\n", __func__, midboot);
> 
> 	// If there were none, wait a bit and start over.
> 	if (unlikely(midboot)) {
> 		needgpcb = 0x2;
> 	} else {
> 		printk("%s: set_tasks_gp_state(RTGS_WAIT_CBS)...\n", __func__);
> 		set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
> 		printk("%s: rcuwait_wait_event...\n", __func__);
> 		rcuwait_wait_event(&rtp->cbs_wait,
> 				   (needgpcb = rcu_tasks_need_gpcb(rtp)),
> 				   TASK_IDLE);
>		printk("%s: rcuwait_wait_event done\n", __func__);
> 	}
>

What I see then is:

[    0.156362] synchronize_rcu_tasks_generic: rcu_tasks_one_gp....
[    0.162087] rcu_tasks_one_gp: mutex_lock... midboot: 1
[    0.167386] rcu_tasks_one_gp: mutex_lock... midboot: 0
[    0.172489] rcu_tasks_one_gp: mutex_locked midboot: 0
[    0.177535] rcu_tasks_one_gp: set_tasks_gp_state(RTGS_WAIT_CBS)...
[    0.183525] rcu_tasks_one_gp: rcuwait_wait_event...

Here the board hangs. After some time I get:

[  254.493010] random: crng init done

But that's it.

> 
> > At this point in the boot sequence, there is only one online CPU,
> > correct?

Yes, it's a single core system.

Sascha

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ