Message-ID: <20091104172542.GC6736@linux.vnet.ibm.com>
Date: Wed, 4 Nov 2009 09:25:42 -0800
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: Gregory Haskins <gregory.haskins@...il.com>,
Eric Dumazet <eric.dumazet@...il.com>, netdev@...r.kernel.org,
virtualization@...ts.linux-foundation.org, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, mingo@...e.hu, linux-mm@...ck.org,
akpm@...ux-foundation.org, hpa@...or.com,
Rusty Russell <rusty@...tcorp.com.au>, s.hetze@...ux-ag.com
Subject: Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Wed, Nov 04, 2009 at 01:57:29PM +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 03, 2009 at 03:57:44PM -0800, Paul E. McKenney wrote:
> > On Tue, Nov 03, 2009 at 01:14:06PM -0500, Gregory Haskins wrote:
> > > Gregory Haskins wrote:
> > > > Eric Dumazet wrote:
> > > > >> Michael S. Tsirkin wrote:
> > > > >>> +static void handle_tx(struct vhost_net *net)
> > > > >>> +{
> > > > >>> +	struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
> > > > >>> +	unsigned head, out, in, s;
> > > > >>> +	struct msghdr msg = {
> > > > >>> +		.msg_name = NULL,
> > > > >>> +		.msg_namelen = 0,
> > > > >>> +		.msg_control = NULL,
> > > > >>> +		.msg_controllen = 0,
> > > > >>> +		.msg_iov = vq->iov,
> > > > >>> +		.msg_flags = MSG_DONTWAIT,
> > > > >>> +	};
> > > > >>> +	size_t len, total_len = 0;
> > > > >>> +	int err, wmem;
> > > > >>> +	size_t hdr_size;
> > > > >>> +	struct socket *sock = rcu_dereference(vq->private_data);
> > > > >>> +	if (!sock)
> > > > >>> +		return;
> > > > >>> +
> > > > >>> +	wmem = atomic_read(&sock->sk->sk_wmem_alloc);
> > > > >>> +	if (wmem >= sock->sk->sk_sndbuf)
> > > > >>> +		return;
> > > > >>> +
> > > > >>> +	use_mm(net->dev.mm);
> > > > >>> +	mutex_lock(&vq->mutex);
> > > > >>> +	vhost_no_notify(vq);
> > > > >>> +
> > > >>> +
> > > >> using rcu_dereference() and mutex_lock() at the same time seems wrong, I suspect
> > > >> that your use of RCU is not correct.
> > > >>
> > > > >> 1) rcu_dereference() should be done inside an rcu_read_lock() section, and
> > > >> we are not allowed to sleep in such a section.
> > > >> (Quoting Documentation/RCU/whatisRCU.txt :
> > > >> It is illegal to block while in an RCU read-side critical section, )
> > > >>
> > > > >> 2) mutex_lock() can sleep (i.e., block)
> > > >>
> > > >
> > > >
> > > > Michael,
> > > > I warned you that this needed better documentation ;)
> > > >
> > > > Eric,
> > > > I think I flagged this once before, but Michael convinced me that it
> > > > > was indeed "ok", if perhaps a bit unconventional. I will try to
> > > > find the thread.
> > > >
> > > > Kind Regards,
> > > > -Greg
> > > >
> > >
> > > Here it is:
> > >
> > > http://lkml.org/lkml/2009/8/12/173
> >
> > What was happening in that case was that the rcu_dereference()
> > was being used in a workqueue item. The role of rcu_read_lock()
> > was taken on by the start of execution of the workqueue item, of
> > rcu_read_unlock() by the end of execution of the workqueue item, and
> > of synchronize_rcu() by flush_workqueue(). This does work, at least
> > assuming that flush_workqueue() operates as advertised, which it appears
> > to at first glance.
> >
> > The above code looks somewhat different, however -- I don't see
> > handle_tx() being executed in the context of a work queue. Instead
> > it appears to be in an interrupt handler.
> > So what is the story? Using synchronize_irq() or some such?
> >
> > Thanx, Paul
>
> No, there has been no change (I won't be able to use a mutex in an
> interrupt handler, will I?). handle_tx is still called in the context
> of a work queue: either from handle_tx_kick or from handle_tx_net which
> are work queue items.

Ah, my mistake -- I was looking at 2.6.31 rather than latest git with
your patches.

> Can you ack this usage please?

I thought I had done so in my paragraph above, but if you would like
something a bit more formal...

    I, Paul E. McKenney, maintainer of the RCU implementation
    embodied in the Linux kernel and co-inventor of RCU, being of
    sound mind and body, notwithstanding the wear and tear inherent
    in my numerous decades' sojourn on this planet, hereby declare
    that the following usage of work queues constitutes a valid
    RCU implementation:

    1.  Execution of a full workqueue item being substituted
        for a conventional RCU read-side critical section, so
        that the start of execution of the function specified to
        INIT_WORK() corresponds to rcu_read_lock(), and the end of
        this self-same function corresponds to rcu_read_unlock().

    2.  Execution of flush_workqueue() being substituted for
        the conventional synchronize_rcu().

    The kernel developer availing himself or herself of this
    declaration must observe the following caveats:

    a.  The function specified to INIT_WORK() may only be
        invoked via the workqueue mechanism. Invoking said
        function directly renders this declaration null
        and void, as it prevents the flush_workqueue() function
        from delivering the fundamental guarantee inherent in RCU.

    b.  At some point in the future, said developer may be
        required to apply some gcc attribute or sparse annotation
        to the function passed to INIT_WORK(). Beyond that
        point, failure to comply will render this declaration
        null and void, as such failure would render inoperative
        some potential RCU-validation tools, as duly noted by
        Eric Dumazet.

    c.  This declaration in no way relieves the developer of
        the responsibility to use this and other synchronization
        mechanisms correctly, again, as duly noted by Eric
        Dumazet.

(Sorry, but, as always, I could not resist!)
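
For concreteness, the correspondence in items 1 and 2 can be sketched in
userspace C with pthreads. This is only an analogue under stated
assumptions, not kernel code: struct wq, wq_queue(), wq_flush(),
handle_work() and run_demo() are hypothetical names made up for the
illustration, the queue has a single worker thread, and wq_flush() simply
waits for the queue to go idle, which is a simplification of what
flush_workqueue() does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these names are kernel APIs. */
struct work { void (*fn)(struct work *); struct work *next; };
struct wq {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct work *head, *tail;
	int busy, stop;		/* busy: an item is executing right now */
	pthread_t thread;
};
struct config { int value; };
static _Atomic(struct config *) cur_cfg;	/* the protected pointer */
static int seen_sum;				/* touched only by work items */

/* Start of this function ~ rcu_read_lock(), its return ~ rcu_read_unlock(). */
static void handle_work(struct work *w)
{
	struct config *c = atomic_load_explicit(&cur_cfg, memory_order_acquire);

	(void)w;
	if (c)
		seen_sum += c->value;	/* c is not freed before we return */
}

static void *wq_worker(void *arg)
{
	struct wq *q = arg;

	pthread_mutex_lock(&q->lock);
	for (;;) {
		while (!q->head && !q->stop)
			pthread_cond_wait(&q->cond, &q->lock);
		if (!q->head)
			break;		/* stop requested and queue drained */
		struct work *w = q->head;
		q->head = w->next;
		q->busy = 1;
		pthread_mutex_unlock(&q->lock);
		w->fn(w);		/* the only place work functions run */
		pthread_mutex_lock(&q->lock);
		q->busy = 0;
		pthread_cond_broadcast(&q->cond);
	}
	pthread_mutex_unlock(&q->lock);
	return NULL;
}

static void wq_queue(struct wq *q, struct work *w)
{
	pthread_mutex_lock(&q->lock);
	w->next = NULL;
	if (q->head)
		q->tail->next = w;
	else
		q->head = w;
	q->tail = w;
	pthread_cond_broadcast(&q->cond);
	pthread_mutex_unlock(&q->lock);
}

/* Waits until the queue is idle; plays the role of synchronize_rcu(). */
static void wq_flush(struct wq *q)
{
	pthread_mutex_lock(&q->lock);
	while (q->head || q->busy)
		pthread_cond_wait(&q->cond, &q->lock);
	pthread_mutex_unlock(&q->lock);
}

int run_demo(void)
{
	struct wq q = { .head = NULL };
	struct work w1 = { .fn = handle_work }, w2 = { .fn = handle_work };
	struct config *oldc = malloc(sizeof(*oldc)), *newc = malloc(sizeof(*newc));

	if (!oldc || !newc)
		return -1;
	oldc->value = 1;
	newc->value = 2;
	seen_sum = 0;
	pthread_mutex_init(&q.lock, NULL);
	pthread_cond_init(&q.cond, NULL);
	atomic_store_explicit(&cur_cfg, oldc, memory_order_release);
	if (pthread_create(&q.thread, NULL, wq_worker, &q))
		return -1;
	wq_queue(&q, &w1);	/* may observe oldc or newc */
	atomic_store_explicit(&cur_cfg, newc, memory_order_release);
	wq_flush(&q);		/* "grace period": no item can still hold oldc */
	free(oldc);
	wq_queue(&q, &w2);	/* must observe newc */
	wq_flush(&q);
	pthread_mutex_lock(&q.lock);
	q.stop = 1;
	pthread_cond_broadcast(&q.cond);
	pthread_mutex_unlock(&q.lock);
	pthread_join(q.thread, NULL);
	free(newc);
	return seen_sum;	/* 3 if w1 saw oldc, 4 if it saw newc */
}
```

Calling handle_work() directly from some other thread while the pointer
is being replaced would leave wq_flush() unaware of that reader, which is
caveat (a) in miniature.
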
Thanx, Paul