Message-ID: <CACGkMEtQPWmV297_4oak2KxGhXYgef-eevB3KsC7RDy8mSMbNA@mail.gmail.com>
Date:   Tue, 21 Jun 2022 14:25:13 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     netdev <netdev@...r.kernel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        davem <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
        erwan.yvin@...ricsson.com
Subject: Re: [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister

On Tue, Jun 21, 2022 at 2:00 PM Michael S. Tsirkin <mst@...hat.com> wrote:
>
> On Tue, Jun 21, 2022 at 11:09:45AM +0800, Jason Wang wrote:
> > On Mon, Jun 20, 2022 at 6:18 PM Michael S. Tsirkin <mst@...hat.com> wrote:
> > >
> > > On Mon, Jun 20, 2022 at 05:18:29PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 20, 2022 at 5:09 PM Michael S. Tsirkin <mst@...hat.com> wrote:
> > > > >
> > > > > On Mon, Jun 20, 2022 at 01:11:15PM +0800, Jason Wang wrote:
> > > > > > We used to do the following steps during .remove():
> > > > >
> > > > > We currently do
> > > > >
> > > > >
> > > > > > static void cfv_remove(struct virtio_device *vdev)
> > > > > > {
> > > > > >       struct cfv_info *cfv = vdev->priv;
> > > > > >
> > > > > >       rtnl_lock();
> > > > > >       dev_close(cfv->ndev);
> > > > > >       rtnl_unlock();
> > > > > >
> > > > > >       tasklet_kill(&cfv->tx_release_tasklet);
> > > > > >       debugfs_remove_recursive(cfv->debugfs);
> > > > > >
> > > > > >       vringh_kiov_cleanup(&cfv->ctx.riov);
> > > > > >       virtio_reset_device(vdev);
> > > > > >       vdev->vringh_config->del_vrhs(cfv->vdev);
> > > > > >       cfv->vr_rx = NULL;
> > > > > >       vdev->config->del_vqs(cfv->vdev);
> > > > > >       unregister_netdev(cfv->ndev);
> > > > > > }
> > > > > > This is racy since the device could be re-opened after dev_close() but
> > > > > > before unregister_netdev():
> > > > > >
> > > > > > 1) The RX vringh is cleaned up before the device is reset, so RX
> > > > > >    callbacks that are called after vringh_kiov_cleanup() will result
> > > > > >    in a UAF.
> > > > > > 2) The network stack can still try to use the TX virtqueue even after
> > > > > >    it has been deleted by del_vqs().
> > > > > >
> > > > > > Fix this by unregistering the network device first, to make sure there
> > > > > > is no device access from either the TX or the RX side.
> > > > > >
> > > > > > Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio")
> > > > > > Signed-off-by: Jason Wang <jasowang@...hat.com>
> > > > > > ---
> > > > > >  drivers/net/caif/caif_virtio.c | 6 ++----
> > > > > >  1 file changed, 2 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
> > > > > > index 66375bea2fcd..a29f9b2df5b1 100644
> > > > > > --- a/drivers/net/caif/caif_virtio.c
> > > > > > +++ b/drivers/net/caif/caif_virtio.c
> > > > > > @@ -752,9 +752,8 @@ static void cfv_remove(struct virtio_device *vdev)
> > > > > >  {
> > > > > >       struct cfv_info *cfv = vdev->priv;
> > > > > >
> > > > > > -     rtnl_lock();
> > > > > > -     dev_close(cfv->ndev);
> > > > > > -     rtnl_unlock();
> > > > > > +     /* Make sure NAPI/TX won't try to access the device */
> > > > > > +     unregister_netdev(cfv->ndev);
> > > > > >
> > > > > >       tasklet_kill(&cfv->tx_release_tasklet);
> > > > > >       debugfs_remove_recursive(cfv->debugfs);
> > > > > > @@ -764,7 +763,6 @@ static void cfv_remove(struct virtio_device *vdev)
> > > > > >       vdev->vringh_config->del_vrhs(cfv->vdev);
> > > > > >       cfv->vr_rx = NULL;
> > > > > >       vdev->config->del_vqs(cfv->vdev);
> > > > > > -     unregister_netdev(cfv->ndev);
> > > > > >  }
> > > > >
> > > > >
> > > > > This gives me pause, callbacks can now trigger after device
> > > > > has been unregistered. Are we sure this is safe?
> > > >
> > > > It looks safe: for RX, NAPI is disabled, and for TX, the tasklet is
> > > > disabled after tasklet_kill(). I can add a comment to explain this.
> > >
> > > That waits for outstanding tasklets, but does it really prevent
> > > future ones?
> >
> > I think so: it test-and-sets TASKLET_STATE_SCHED, which blocks
> > future scheduling of the tasklet.
> >
> > Thanks
>
> But then in the end it clears it, does it not?

Right, so we need to reset the device before tasklet_kill().

Thanks

>
> > >
> > > > > Won't it be safer to just keep the rtnl_lock around
> > > > > the whole process?
> > > >
> > > > It looks to me like rtnl_lock can't help in synchronizing with the
> > > > callbacks. Am I missing anything?
> > > >
> > > > Thanks
> > >
> > > Good point.
> > >
> > >
> > > > >
> > > > > >  static struct virtio_device_id id_table[] = {
> > > > > > --
> > > > > > 2.25.1
> > > > >
> > >
>
