Message-ID: <aMxIMADtzYrJg6Pb@google.com>
Date: Thu, 18 Sep 2025 10:58:08 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Paolo Bonzini <pbonzini@...hat.com>,
Jason Wang <jasowang@...hat.com>, kvm@...r.kernel.org, virtualization@...ts.linux.dev,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 0/3] vhost_task: Fix a bug where KVM wakes an exited task
On Thu, Sep 18, 2025, Michael S. Tsirkin wrote:
> On Thu, Sep 18, 2025 at 09:52:19AM -0700, Sean Christopherson wrote:
> > On Thu, Sep 18, 2025, Michael S. Tsirkin wrote:
> > > On Thu, Sep 18, 2025 at 09:04:07AM -0700, Sean Christopherson wrote:
> > > > On Thu, Sep 18, 2025, Sebastian Andrzej Siewior wrote:
> > > > > On 2025-09-18 11:09:05 [-0400], Michael S. Tsirkin wrote:
> > > > > > So how about switching to this approach then?
> > > > > > Instead of piling up fixes like we seem to do now ...
> > > >
> > > > I don't have a strong preference for 6.17, beyond landing a fix of some kind.
> > > > I think there are three options for 6.17, in order of "least likely to break
> > > > something":
> > > >
> > > > 1. Sebastian's get_task_struct() fix
> > >
> > >
> > > I am just a bit apprehensive that we don't create a situation
> > > where we leak the task struct somehow, given the limited
> > > testing time. Can you help me get convinced that risk is 0?
> >
> > I doubt it, I share similar concerns about the lack of testing. So I guess
> > thinking about this again, #2 is probably safer since it'd only impact KVM?
>
> I can't say I understand completely how we get that state though?
> Why did the warning trigger if it's not a UAF?

It's purely a flaw in the sanity check itself, caused by the ordering in
vhost_task_fn(). As is, vhost_task_fn() marks the task KILLED before invoking
->handle_sigkill(), i.e. before vhost_worker_killed() is guaranteed to complete,
and thus before worker->killed is set. As a result, vhost can keep waking
workers that have KILLED set but haven't actually exited. That's perfectly fine,
as a UAF won't occur until do_exit() is called, and that won't happen until
->handle_sigkill() completes.
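
To make the window concrete, here is a rough single-threaded user-space model
of that ordering. It is only a sketch, not the kernel code: every model_* name
is hypothetical, the "racing" wake is simply placed inside the handler, and it
assumes vhost stops waking a worker once worker->killed is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are the real kernel structures. */
struct model_task {
	bool killed_flag;	/* models the KILLED flag on the vhost task */
	bool exited;		/* models the task having reached do_exit() */
};

struct model_worker {
	struct model_task *task;
	bool killed;		/* models worker->killed */
	int wakes_seen_killed;	/* wakes that saw KILLED on a live task */
};

/* Models a wake attempt. Assumption: no wakes once worker->killed is set. */
static bool model_wake(struct model_worker *w)
{
	if (w->killed)
		return false;

	/* Waking an exited task would be the actual use-after-free. */
	assert(!w->task->exited);

	/* This is what a KILLED sanity check would complain about. */
	if (w->task->killed_flag)
		w->wakes_seen_killed++;
	return true;
}

/* Models ->handle_sigkill(): a wake slips in before worker->killed is set. */
static void model_handle_sigkill(struct model_worker *w)
{
	(void)model_wake(w);
	w->killed = true;
}

/* Models the ordering described above for the SIGKILL path. */
static void model_task_fn_on_sigkill(struct model_worker *w)
{
	w->task->killed_flag = true;	/* 1. mark the task KILLED */
	model_handle_sigkill(w);	/* 2. handler sets worker->killed */
	w->task->exited = true;		/* 3. only now does the task exit */
}

/* Runs the sequence and returns how many wakes saw KILLED on a live task. */
int model_run(void)
{
	struct model_task t = { 0 };
	struct model_worker w = { .task = &t };

	model_task_fn_on_sigkill(&w);
	(void)model_wake(&w);	/* refused: worker->killed is already set */
	return w.wakes_seen_killed;
}
```

In this model, model_run() counts one wake that saw KILLED set, and the
assert() guarding against waking an exited task never fires. That is the
distinction here: a check on KILLED can trip without any UAF.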
> > > > 2. This series, without the KILLED sanity check in __vhost_task_wake()
> > > > 3. This series, with my fixup (with which syzbot was happy)
>