Date:   Fri, 27 Jan 2023 08:37:19 -0600
From:   Seth Forshee <sforshee@...nel.org>
To:     Petr Mladek <pmladek@...e.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Josh Poimboeuf <jpoimboe@...nel.org>,
        Jason Wang <jasowang@...hat.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Jiri Kosina <jikos@...nel.org>,
        Miroslav Benes <mbenes@...e.cz>,
        Joe Lawrence <joe.lawrence@...hat.com>,
        virtualization@...ts.linux-foundation.org, kvm@...r.kernel.org,
        netdev@...r.kernel.org, live-patching@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] vhost: improve livepatch switching for heavily
 loaded vhost worker kthreads

On Fri, Jan 27, 2023 at 01:09:02PM +0100, Petr Mladek wrote:
> There might actually be two possibilities why the transition fails
> too often:
> 
> 1. The task might be in the running state most of the time. Therefore
>    the backtrace is not reliable most of the time.
> 
>    In this case, some cooperation with the scheduler would really
>    help. We would need to stop the task and check the stack
>    when it is stopped. Something like the patch you proposed.

This is the situation we are encountering.

> 2. The task might be sleeping but almost always in a livepatched
>    function. Therefore it could not be transitioned.
> 
>    It might be the case with vhost_worker(). The main loop is "tiny".
>    The kthread probably spends most of its time processing
>    a vhost_work. And if the "works" are livepatched...
> 
>    In this case, it would help to call klp_try_switch_task(current)
>    in the main loop in vhost_worker(). It would always succeed
>    when vhost_worker() is not livepatched on its own.
> 
>    Note that even this would not help with kpatch when a single
>    vhost_work might need more than the 1 minute timeout to be processed.
> 
> > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> > index f1b25ec581e0..06746095a724 100644
> > --- a/kernel/livepatch/transition.c
> > +++ b/kernel/livepatch/transition.c
> > @@ -9,6 +9,7 @@
> >  
> >  #include <linux/cpu.h>
> >  #include <linux/stacktrace.h>
> > +#include <linux/stop_machine.h>
> >  #include "core.h"
> >  #include "patch.h"
> >  #include "transition.h"
> > @@ -334,6 +335,16 @@ static bool klp_try_switch_task(struct task_struct *task)
> >  	return !ret;
> >  }
> >  
> > +static int __stop_try_switch(void *arg)
> > +{
> > +	return klp_try_switch_task(arg) ? 0 : -EBUSY;
> > +}
> > +
> > +static bool klp_try_switch_task_harder(struct task_struct *task)
> > +{
> > +	return !stop_one_cpu(task_cpu(task), __stop_try_switch, task);
> > +}
> > +
> >  /*
> >   * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
> >   * Kthreads with TIF_PATCH_PENDING set are woken up.
> 
> Nice. I am surprised that it can be implemented so easily.

Yes, that's a neat solution. I will give it a try.

AIUI this still doesn't help for architectures without a reliable
stacktrace though, right? So we probably should only try this for
architectures which do have reliable stacktraces.
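
A minimal sketch of that gating, written as a userspace mock-up rather
than kernel code. Everything except the shape of the two quoted helpers
is an assumption made for illustration: HAVE_RELIABLE_STACKTRACE stands
in for an IS_ENABLED(CONFIG_HAVE_RELIABLE_STACKTRACE) check, the
task_struct, task_cpu(), klp_try_switch_task() and stop_one_cpu()
definitions are simplified stubs, and __stop_try_switch() is renamed to
stop_try_switch() because leading double underscores are reserved in
userspace C.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumption: stand-in for IS_ENABLED(CONFIG_HAVE_RELIABLE_STACKTRACE). */
#ifndef HAVE_RELIABLE_STACKTRACE
#define HAVE_RELIABLE_STACKTRACE 1
#endif

/* Hypothetical, heavily simplified task for this mock-up only. */
struct task_struct {
	unsigned int cpu;
	bool in_patched_func;	/* pretend result of the stack check */
	bool switched;		/* pretend patch state */
};

/* Stub: the real helper reads the CPU the task last ran on. */
static unsigned int task_cpu(const struct task_struct *task)
{
	return task->cpu;
}

/* Stub: succeeds only when the task is not inside a patched function. */
static bool klp_try_switch_task(struct task_struct *task)
{
	if (task->in_patched_func)
		return false;
	task->switched = true;
	return true;
}

/* Stub: the real stop_one_cpu() runs fn on the given CPU via the stopper. */
static int stop_one_cpu(unsigned int cpu, int (*fn)(void *), void *arg)
{
	(void)cpu;
	return fn(arg);
}

/* Same logic as __stop_try_switch() in the quoted diff. */
static int stop_try_switch(void *arg)
{
	return klp_try_switch_task(arg) ? 0 : -EBUSY;
}

/* Only fall back to the stopper when the stack trace can be trusted. */
static bool klp_try_switch_task_harder(struct task_struct *task)
{
	if (!HAVE_RELIABLE_STACKTRACE)
		return false;

	return !stop_one_cpu(task_cpu(task), stop_try_switch, task);
}
```

Building the mock-up with -DHAVE_RELIABLE_STACKTRACE=0 makes
klp_try_switch_task_harder() return false without touching the task,
which is the behaviour suggested above for architectures without
reliable stacktraces.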

Thanks,
Seth
