lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Tue, 29 Dec 2020 15:21:34 +0100
From:   Frederic Weisbecker <frederic@...nel.org>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Neeraj Upadhyay <neeraju@...eaurora.org>,
        Joel Fernandes <joel@...lfernandes.org>,
        Josh Triplett <josh@...htriplett.org>, rcu@...r.kernel.org
Subject: Re: [PATCH 00/19] rcu/nocb: De-offload and re-offload support v4

On Wed, Dec 16, 2020 at 08:59:30AM -0800, Paul E. McKenney wrote:
> On Fri, Nov 13, 2020 at 01:13:15PM +0100, Frederic Weisbecker wrote:
> > 
> > Frederic Weisbecker (19):
> >       rcu/nocb: Turn enabled/offload states into a common flag
> >       rcu/nocb: Provide basic callback offloading state machine bits
> >       rcu/nocb: Always init segcblist on CPU up
> >       rcu/nocb: De-offloading CB kthread
> >       rcu/nocb: Don't deoffload an offline CPU with pending work
> >       rcu/nocb: De-offloading GP kthread
> >       rcu/nocb: Re-offload support
> >       rcu/nocb: Shutdown nocb timer on de-offloading
> >       rcu: Flush bypass before setting SEGCBLIST_SOFTIRQ_ONLY
> >       rcu/nocb: Set SEGCBLIST_SOFTIRQ_ONLY at the very last stage of de-offloading
> >       rcu/nocb: Only cond_resched() from actual offloaded batch processing
> >       rcu/nocb: Process batch locally as long as offloading isn't complete
> >       rcu/nocb: Locally accelerate callbacks as long as offloading isn't complete
> >       tools/rcutorture: Support nocb toggle in TREE01
> 
> I applied the above, with the usual commit-log wordsmithing.
> 
> >       rcutorture: Remove weak nocb declarations
> >       rcutorture: Export nocb (de)offloading functions
> 
> These I folded into the rcutorture commit, as you suggested.

Good!

> 
> >       cpu/hotplug: Add lockdep_is_cpus_held()
> >       timer: Add timer_curr_running()
> 
> I applied these two.
> 
> >       rcu/nocb: Detect unsafe checks for offloaded rdp
> 
> This one didn't apply, probably due to recent changes in -rcu.  Could
> you please take a look?

Ok I'm going to rebase it.

> I believe that the following enhancements will be needed:
> 
> o	Forbid toggling of offlined CPUs.  This is a simple rule that
> 	results in deterministic success or failure.  The current setup
> 	(which I freely admit that I suggested) will fail very rarely,
> 	only if a newly offlined offloaded CPU is toggled before its
> 	remaining callbacks are invoked.  The current state is therefore
> 	an accident waiting to happen.
> 
> 	This will entail fixing a few comments as well.

Sure, I'm always glad to remove lines!

> 
> o	Drain the bypass early in the de-offloading process and prohibit
> 	queuing onto the bypass anywhere in the toggling process.
> 	Then the only CPUs permitted to use the bypass are those that
> 	are fully offloaded.  This approach allows rcu_core() and the
> 	rcuog kthreads to safely manipulate callbacks and grace periods
> 	concurrently.  (Famous last words!)  This might address the
> 	rcutorture writer stall warnings I am seeing.
> 
> 	This will entail fixing a few comments as well.

Ok, I'm checking that.

> 
> o	Avoid double write to rdp->nocb_cb_sleep in nocb_cb_wait().
> 	Unless there is some reason why this is absolutely required.
> 	It will cause confusion as it is.

Ok.

> 
> o	What bad thing happens if we de-offload the CPU corresponding to
> 	the nocb GP kthread?  It does work fine when that CPU is offlined.
> 
> 	Coincidence or not, all but one of the rcutorture writer stalls
> 	is followed by a message refusing to de-offload this CPU.

It probably started with a brainfart that I elaborated a bit too far. Let me check
that again.

> 
> o	The nocb_gp_update_state() function's return value seems backwards.
> 	What am I missing here?

Indeed I should rename it to nocb_gp_check_rdp() or something.

> 
> o	__rcu_nocb_rdp_deoffload(): Why move rcu_nocb_lock_irqsave()
> 	across a comment?

Because the comment only concerns the locking?

> 
> o	We need better debug.  For example, "Can't deoffload an rdp GP
> 	leader (yet)".	Well, which CPU?  For another example, we need
> 	deoffload state in rcutorture writer-stall dumps.

Ok.

> 
> I intend to do the following, and, if feasible, fold them into the
> original commits:
> 
> o	Make rcu_segcblist_is_offloaded() use "&&" instead of a
> 	nested "if" statement.
> 
> o	nocb_cb_wait() needs the "^^^ Ensure CB invocation follows
> 	_sleep test" to be adjusted so that it is properly attached to
> 	its smp_load_acquire().
> 
> o	nocb_cb_wait() -- alphabetical order for local variables, please!
> 	Yeah, I know, others like inverted tree for reasons that escape me
> 	completely.

Ah, ok noted.

Thanks!

> 							Thanx, Paul
> 
> >  include/linux/cpu.h                                |   1 +
> >  include/linux/rcu_segcblist.h                      | 119 +++++-
> >  include/linux/rcupdate.h                           |   4 +
> >  include/linux/timer.h                              |   2 +
> >  kernel/cpu.c                                       |   7 +
> >  kernel/rcu/rcu_segcblist.c                         |  13 +-
> >  kernel/rcu/rcu_segcblist.h                         |  45 ++-
> >  kernel/rcu/rcutorture.c                            |   3 -
> >  kernel/rcu/tree.c                                  |  49 ++-
> >  kernel/rcu/tree.h                                  |   2 +
> >  kernel/rcu/tree_plugin.h                           | 416 +++++++++++++++++++--
> >  kernel/time/timer.c                                |  13 +
> >  .../selftests/rcutorture/configs/rcu/TREE01.boot   |   4 +-
> >  13 files changed, 614 insertions(+), 64 deletions(-)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ