[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190521141204.dimpnwq7zi3z2pup@steredhat>
Date: Tue, 21 May 2019 16:12:04 +0200
From: Stefano Garzarella <sgarzare@...hat.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: Jason Wang <jasowang@...hat.com>, netdev@...r.kernel.org,
Stefan Hajnoczi <stefanha@...hat.com>
Subject: Re: Question about IRQs during the .remove() of virtio-vsock driver
On Tue, May 21, 2019 at 09:56:42AM -0400, Michael S. Tsirkin wrote:
> On Tue, May 21, 2019 at 03:49:20PM +0200, Stefano Garzarella wrote:
> > On Tue, May 21, 2019 at 06:05:31AM -0400, Michael S. Tsirkin wrote:
> > > On Tue, May 21, 2019 at 11:44:07AM +0200, Stefano Garzarella wrote:
> > > > Hi Micheal, Jason,
> > > > as suggested by Stefan, I'm checking if we have some races in the
> > > > virtio-vsock driver. We found some races in the .probe() and .remove()
> > > > with the upper layer (socket) and I'll fix it.
> > > >
> > > > Now my attention is on the bottom layer (virtio device) and my question is:
> > > > during the .remove() of virtio-vsock driver (virtio_vsock_remove), could happen
> > > > that an IRQ comes and one of our callback (e.g. virtio_vsock_rx_done()) is
> > > > executed, queueing new works?
> > > >
> > > > I tried to follow the code in both cases (device unplugged or module removed)
> > > > and maybe it couldn't happen because we remove it from bus's knowledge,
> > > > but I'm not sure and your advice would be very helpful.
> > > >
> > > > Thanks in advance,
> > > > Stefano
> > >
> > >
> > > Great question! This should be better documented: patches welcome!
> >
> > When I'm clear, I'll be happy to document this.
> >
> > >
> > > Here's my understanding:
> > >
> > >
> > > A typical removal flow works like this:
> > >
> > > - prevent linux from sending new kick requests to device
> > > and flush such outstanding requests if any
> > > (device can still send notifications to linux)
> > >
> > > - call
> > > vi->vdev->config->reset(vi->vdev);
> > > this will flush all device writes and interrupts.
> > > device will not use any more buffers.
> > > previously outstanding callbacks might still be active.
> > >
> > > - Then call
> > > vdev->config->del_vqs(vdev);
> > > to flush outstanding callbacks if any.
> >
> > Thanks for sharing these useful information.
> >
> > So, IIUC between step 1 (e.g. in virtio-vsock we flush all work-queues) and
> > step 2, new IRQs could happen, and in the virtio-vsock driver new work
> > will be queued.
> >
> > In order to handle this case, I'm thinking to add a new variable
> > 'work_enabled' in the struct virtio_vsock, put it to false at the start
> > of the .remove(), then call synchronize_rcu() before to flush all work
> > queues
> > and use an helper function virtio_transport_queue_work() to queue
> > a new work, where the check of work_enabled and the queue_work are in the
> > RCU read critical section.
> >
> > Here a pseudo code to explain better the idea:
> >
> > virtio_vsock_remove() {
> > vsock->work_enabled = false;
> >
> > /* Wait for other CPUs to finish to queue works */
> > synchronize_rcu();
> >
> > flush_works();
> >
> > vdev->config->reset(vdev);
> >
> > ...
> >
> > vdev->config->del_vqs(vdev);
> > }
> >
> > virtio_vsock_queue_work(vsock, work) {
> > rcu_read_lock();
> >
> > if (!vsock->work_enabled) {
> > goto out;
> > }
> >
> > queue_work(virtio_vsock_workqueue, work);
> >
> > out:
> > rcu_read_unlock();
> > }
> >
> >
> > Do you think can work?
> > Please tell me if there is a better way to handle this case.
> >
> > Thanks,
> > Stefano
>
>
> instead of rcu tricks I would just have rx_run and tx_run and check it
> within the queued work - presumably under tx or rx lock.
>
> then queueing an extra work becomes harmless,
> and you flush it after del vqs which flushes everything for you.
>
>
Sure, the patch should be even smaller!
Thank you very much for the advice,
Stefano
Powered by blists - more mailing lists