Message-ID: <20150930050833.GA4412@linux.vnet.ibm.com>
Date:	Tue, 29 Sep 2015 22:08:33 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Petr Mladek <pmladek@...e.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Oleg Nesterov <oleg@...hat.com>, Tejun Heo <tj@...nel.org>,
	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Josh Triplett <josh@...htriplett.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jiri Kosina <jkosina@...e.cz>, Borislav Petkov <bp@...e.de>,
	Michal Hocko <mhocko@...e.cz>, linux-mm@...ck.org,
	Vlastimil Babka <vbabka@...e.cz>,
	live-patching@...r.kernel.org, linux-api@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC v2 00/18] kthread: Use kthread worker API more widely

On Mon, Sep 21, 2015 at 03:03:41PM +0200, Petr Mladek wrote:
> My intention is to make it easier to manipulate kthreads. This RFC tries
> to use the kthread worker API more widely. It is based on comments on the
> first attempt; see https://lkml.org/lkml/2015/7/28/648 and
> the list of changes below.
> 
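> For reference, a minimal sketch of the kthread worker API as it
> exists before this series (names are local to this example):
> 
>     #include <linux/kthread.h>
> 
>     static struct kthread_worker my_worker;
>     static struct kthread_work my_work;
>     static struct task_struct *my_worker_task;
> 
>     static void my_work_fn(struct kthread_work *work)
>     {
>             /* Runs in the worker kthread's context. */
>     }
> 
>     static int __init my_init(void)
>     {
>             init_kthread_worker(&my_worker);
>             my_worker_task = kthread_run(kthread_worker_fn,
>                                          &my_worker, "my_worker");
>             if (IS_ERR(my_worker_task))
>                     return PTR_ERR(my_worker_task);
> 
>             init_kthread_work(&my_work, my_work_fn);
>             queue_kthread_work(&my_worker, &my_work);
>             return 0;
>     }
> 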
> 1st..8th patches: improve the existing kthread worker API
> 
> 9th, 12th, 17th patches: convert three kthreads into the new API,
>      namely: khugepaged, ring buffer benchmark, RCU gp kthreads[*]
> 
> 10th, 11th patches: fix potential problems in the ring buffer
>       benchmark; also sent separately
> 
> 13th patch: small fix for RCU kthread; also sent separately;
>      being tested by Paul
> 
> 14th..16th patches: preparation steps for the RCU threads
>      conversion; they are needed _only_ if we split GP start
>      and QS handling into separate works[*]
> 
> 18th patch: a possible improvement of the kthread worker API;
>      it adds an extra parameter to the create*() functions, so I
>      would rather keep it in this draft
>      
> 
> [*] IMPORTANT: I tried to split RCU GP start and QS state handling
>     into separate works this time. But there is a problem with
>     a race in rcu_gp_kthread_worker_poke(). It might queue
>     the wrong work. This can be detected and fixed by the work
>     itself, but that is a bit ugly. An alternative solution is to
>     do both operations in one work, but then we sleep too much
>     in the work, which is ugly as well. Any ideas are appreciated.
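> 
> A rough sketch of the race, with made-up names for the two works
> (the real decision happens in rcu_gp_kthread_worker_poke()):
> 
>     /* Pick the work to queue based on the current GP state... */
>     if (gp_start_needed)
>             queue_kthread_work(worker, &gp_start_work);
>     else
>             queue_kthread_work(worker, &qs_work);
>     /*
>      * ...but the GP state can change between the check and the
>      * queueing, so the wrong work may get queued.  The work then
>      * has to detect the mismatch and requeue the right one.
>      */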

I think that the kernel is trying really hard to tell you that splitting
up the RCU grace-period kthreads in this manner is not such a good idea.

So what are we really trying to accomplish here?  I am guessing something
like the following:

1.	Get each grace-period kthread to a known safe state within a
	short time of having requested a safe state.  If I recall
	correctly, the point of this is to allow no-downtime kernel
	patches to the functions executed by the grace-period kthreads.

2.	At the same time, if someone suddenly needs a grace period
	at some point in this process, the grace-period kthreads are
	going to have to wake back up and handle the grace period.
	Or do you have some tricky way to guarantee that no one is
	going to need a grace period beyond the time you freeze
	the grace-period kthreads?

3.	The boost kthreads should not be a big problem because failing
	to boost simply lets the grace period run longer.

4.	The callback-offload kthreads are likely to be a big problem,
	because in systems configured with them, they need to be running
	to invoke the callbacks, and if the callbacks are not invoked,
	the grace period might just as well have failed to end.

5.	The per-CPU kthreads are in the same boat as the callback-offload
	kthreads.  One approach is to offline all the CPUs but one, and
	that will park all but the last per-CPU kthread.  But handling
	that last per-CPU kthread would likely be "good clean fun"...

6.	Other requirements?

One approach would be to simply say that the top-level rcu_gp_kthread()
function cannot be patched, and arrange for the grace-period kthreads
to park at some point within this function.  Or is there some requirement
that I am missing?
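
To make that concrete, here is a minimal sketch, not taken from the
patch set, of parking at a known safe state with the existing
kthread_park() API; do_one_grace_period() is a made-up stand-in for
the real grace-period loop body:

	static int rcu_gp_kthread(void *arg)
	{
		for (;;) {
			/* The known safe state for live patching. */
			if (kthread_should_park())
				kthread_parkme();
			if (kthread_should_stop())
				break;
			do_one_grace_period();	/* hypothetical helper */
		}
		return 0;
	}

The live-patching side would then kthread_park() each grace-period
kthread before patching and kthread_unpark() it afterwards.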

							Thanx, Paul

> Changes against v1:
> 
> + remove wrappers to manipulate the scheduling policy and priority
> 
> + remove questionable wakeup_and_destroy_kthread_worker() variant
> 
> + do not check for chained work when draining the queue
> 
> + allocate struct kthread_worker in create_kthread_worker() and
>   use simpler checks for a running worker
> 
> + add support for delayed kthread works and use them instead
>   of waiting inside the works (see the sketch below this list)
> 
> + rework the "unrelated" fixes for the ring buffer benchmark
>   as discussed in the 1st RFC; also sent separately
> 
> + convert also the consumer in the ring buffer benchmark
> 
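> The delayed-work sketch mentioned above.  Roughly, a delayed kthread
> work lets the work re-arm itself with a timeout instead of sleeping;
> the names below are illustrative, not necessarily the final API
> (msecs_to_jiffies() and the worker are existing pieces):
> 
>     static struct delayed_kthread_work my_dwork;   /* illustrative */
> 
>     init_delayed_kthread_work(&my_dwork, my_work_fn);
>     /* Instead of msleep() inside the work, queue with a delay: */
>     queue_delayed_kthread_work(&my_worker, &my_dwork,
>                                msecs_to_jiffies(100));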
> 
> I have tested this patch set on top of Linus's tree as of 4.3-rc2.
> 
> Petr Mladek (18):
>   kthread: Allow to call __kthread_create_on_node() with va_list args
>   kthread: Add create_kthread_worker*()
>   kthread: Add drain_kthread_worker()
>   kthread: Add destroy_kthread_worker()
>   kthread: Add pending flag to kthread work
>   kthread: Initial support for delayed kthread work
>   kthread: Allow to cancel kthread work
>   kthread: Allow to modify delayed kthread work
>   mm/huge_page: Convert khugepaged() into kthread worker API
>   ring_buffer: Do not complete benchmark reader too early
>   ring_buffer: Fix more races when terminating the producer in the
>     benchmark
>   ring_buffer: Convert benchmark kthreads into kthread worker API
>   rcu: Finish folding ->fqs_state into ->gp_state
>   rcu: Store first_gp_fqs into struct rcu_state
>   rcu: Clean up timeouts for forcing the quiescent state
>   rcu: Check actual RCU_GP_FLAG_FQS when handling quiescent state
>   rcu: Convert RCU gp kthreads into kthread worker API
>   kthread: Better support freezable kthread workers
> 
>  include/linux/kthread.h              |  67 +++++
>  kernel/kthread.c                     | 544 ++++++++++++++++++++++++++++++++---
>  kernel/rcu/tree.c                    | 407 ++++++++++++++++----------
>  kernel/rcu/tree.h                    |  24 +-
>  kernel/rcu/tree_plugin.h             |  16 +-
>  kernel/rcu/tree_trace.c              |   2 +-
>  kernel/trace/ring_buffer_benchmark.c | 194 ++++++-------
>  mm/huge_memory.c                     | 116 ++++----
>  8 files changed, 1017 insertions(+), 353 deletions(-)
> 
> -- 
> 1.8.5.6
> 

