Date:   Fri, 27 Jan 2023 14:02:40 -0600
From:   Seth Forshee <sforshee@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Josh Poimboeuf <jpoimboe@...nel.org>,
        Petr Mladek <pmladek@...e.com>,
        Jason Wang <jasowang@...hat.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Jiri Kosina <jikos@...nel.org>,
        Miroslav Benes <mbenes@...e.cz>,
        Joe Lawrence <joe.lawrence@...hat.com>,
        virtualization@...ts.linux-foundation.org, kvm@...r.kernel.org,
        netdev@...r.kernel.org, live-patching@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] vhost: improve livepatch switching for heavily
 loaded vhost worker kthreads

On Fri, Jan 27, 2023 at 11:37:02AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 26, 2023 at 08:43:55PM -0800, Josh Poimboeuf wrote:
> > On Thu, Jan 26, 2023 at 03:12:35PM -0600, Seth Forshee (DigitalOcean) wrote:
> > > On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote:
> > > > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote:
> > > > > We've fairly regularly seen livepatches which cannot transition within
> > > > > kpatch's timeout period due to busy vhost worker kthreads.
> > > > 
> > > > I missed this detail. Miroslav told me that we solved something
> > > > similar some time ago; see
> > > > https://lore.kernel.org/all/20220507174628.2086373-1-song@kernel.org/
> > > 
> > > Interesting thread. I had thought about something along the lines of the
> > > original patch, but there are some ideas in there that I hadn't
> > > considered.
> > 
> > Here's another idea -- have we considered this?  Have livepatch set
> > TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then
> > have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is
> > set.
> > 
> > Not sure how scheduler folks would feel about that ;-)
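
For illustration, a minimal sketch of that idea. Both hunks are
hypothetical -- neither hook exists in this form in the kernel, and
klp_try_switch_task() is static to kernel/livepatch/transition.c:

/* Livepatch side (hypothetical): force pending kthreads into schedule() */
static void klp_kick_kthreads(void)
{
	struct task_struct *g, *task;

	read_lock(&tasklist_lock);
	for_each_process_thread(g, task) {
		if ((task->flags & PF_KTHREAD) &&
		    test_tsk_thread_flag(task, TIF_PATCH_PENDING)) {
			set_tsk_need_resched(task);
			kick_process(task);	/* IPI if currently running */
		}
	}
	read_unlock(&tasklist_lock);
}

/*
 * Scheduler side (hypothetical hook point, e.g. called on the way out of
 * a context switch): once the task is off the CPU its stack is stable,
 * so the transition can be attempted.
 */
static void klp_sched_try_switch(struct task_struct *prev)
{
	if (unlikely(test_tsk_thread_flag(prev, TIF_PATCH_PENDING)))
		klp_try_switch_task(prev);
}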
> 
> So, let me try and page all that back in.... :-)
> 
> KLP needs to unwind the stack to see if any of the patched functions are
> active; if not, flip the task to the new set.
> 
> Unwinding the stack of a task can be done when:
> 
>  - task is inactive (stable reg and stack) -- provided it stays inactive
>    while unwinding etc..
> 
>  - task is current (guarantees stack doesn't dip below where we started
>    due to being busy on top etc..)
> 
> Unwinding can NOT be done from interrupt context, because the interrupt
> can hit in the middle of setting up stack frames etc..
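
For illustration, a simplified version of the check that klp_check_stack()
in kernel/livepatch/transition.c performs, reduced here to a single
hypothetical function range (the real code walks every function in the
patch, and requires CONFIG_HAVE_RELIABLE_STACKTRACE):

#define MAX_STACK_ENTRIES 100	/* same limit the real code uses */

/*
 * Return true if @task's reliable stack trace contains no address in
 * [func_addr, func_addr + func_size). The caller must guarantee @task
 * stays blocked while we unwind, and must serialize calls (the buffer
 * is static to keep it off the stack, as in the real code).
 */
static bool klp_stack_clear_of(struct task_struct *task,
			       unsigned long func_addr, size_t func_size)
{
	static unsigned long entries[MAX_STACK_ENTRIES];
	int i, nr;

	nr = stack_trace_save_tsk_reliable(task, entries, ARRAY_SIZE(entries));
	if (nr < 0)
		return false;	/* no reliable trace -> cannot switch */

	for (i = 0; i < nr; i++) {
		if (entries[i] >= func_addr &&
		    entries[i] < func_addr + func_size)
			return false;	/* patched function still on stack */
	}
	return true;
}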
> 
> The issue at hand is that some tasks run for a long time without passing
> through an explicit check.
> 
> The thread above tried sticking something in cond_resched(), which is a
> problem for PREEMPT=y since cond_resched() is a no-op there.
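
The reason: with CONFIG_PREEMPTION the kernel can preempt wherever it
needs to, so cond_resched() compiles to nothing and offers no hook point.
Roughly, from include/linux/sched.h (ignoring the PREEMPT_DYNAMIC
variants):

#ifdef CONFIG_PREEMPTION
static inline int _cond_resched(void) { return 0; }	/* nothing to hook */
#else
extern int _cond_resched(void);		/* may call into the scheduler */
#endif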
> 
> Preempt notifiers were raised, and those would actually be nice, except
> you can only install a notifier on current and you need some memory
> allocated per task, which makes it less than ideal. Plus ...
> 
> ... putting something in finish_task_switch() wouldn't be the end of the
> world I suppose, but then you still need to force schedule the task --
> imagine it being the only runnable task on the CPU; nothing is going
> to make it actually switch.
> 
> Which then leads me to suggest something daft like this.. does that
> help?

The changes below are working well for me. The calling context holds a
read lock on tasklist_lock and therefore can't sleep, so I'm using
stop_one_cpu_nowait() with per-CPU state.
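
For reference, the API used below, from include/linux/stop_machine.h:

	typedef int (*cpu_stop_fn_t)(void *arg);
	bool stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn,
				 void *arg, struct cpu_stop_work *work_buf);

It queues @fn to run in the stopper thread on @cpu and returns without
waiting, which is why it is usable under tasklist_lock; @work_buf must
stay valid until @fn has run, hence the per-CPU cpu_stop_work.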

Based on Josh's comments it sounds like the klp_have_reliable_stack()
check probably isn't quite right, and we might want to add something
else for PREEMPT+!ORC. But I wanted to go ahead and see if this approach
seems reasonable to everyone.

Thanks,
Seth


diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index f1b25ec581e0..9f3898f02828 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -9,6 +9,7 @@
 
 #include <linux/cpu.h>
 #include <linux/stacktrace.h>
+#include <linux/stop_machine.h>
 #include "core.h"
 #include "patch.h"
 #include "transition.h"
@@ -334,9 +335,16 @@ static bool klp_try_switch_task(struct task_struct *task)
 	return !ret;
 }
 
+static int __try_switch_kthread(void *arg)
+{
+	return klp_try_switch_task(arg) ? 0 : -EBUSY;
+}
+
+static DEFINE_PER_CPU(struct cpu_stop_work, klp_stop_work);
+
 /*
  * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
- * Kthreads with TIF_PATCH_PENDING set are woken up.
+ * Kthreads with TIF_PATCH_PENDING set are preempted or woken up.
  */
 static void klp_send_signals(void)
 {
@@ -357,11 +365,22 @@ static void klp_send_signals(void)
 		 * would be meaningless. It is not serious though.
 		 */
 		if (task->flags & PF_KTHREAD) {
-			/*
-			 * Wake up a kthread which sleeps interruptedly and
-			 * still has not been migrated.
-			 */
-			wake_up_state(task, TASK_INTERRUPTIBLE);
+			if (task_curr(task) && klp_have_reliable_stack()) {
+				/*
+				 * kthread is currently running on a CPU; try
+				 * to preempt it.
+				 */
+				stop_one_cpu_nowait(task_cpu(task),
+						    __try_switch_kthread,
+						    task,
+						    per_cpu_ptr(&klp_stop_work, task_cpu(task)));
+			} else {
+				/*
+				 * Wake up a kthread which sleeps interruptibly
+				 * and still has not been migrated.
+				 */
+				wake_up_state(task, TASK_INTERRUPTIBLE);
+			}
 		} else {
 			/*
 			 * Send fake signal to all non-kthread tasks which are
