Message-ID: <20110523111900.GB27212@redhat.com>
Date: Mon, 23 May 2011 14:19:00 +0300
From: "Michael S. Tsirkin" <mst@...hat.com>
To: Rusty Russell <rusty@...tcorp.com.au>
Cc: linux-kernel@...r.kernel.org, Carsten Otte <cotte@...ibm.com>,
Christian Borntraeger <borntraeger@...ibm.com>,
linux390@...ibm.com, Martin Schwidefsky <schwidefsky@...ibm.com>,
Heiko Carstens <heiko.carstens@...ibm.com>,
Shirley Ma <xma@...ibm.com>, lguest@...ts.ozlabs.org,
virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org,
linux-s390@...r.kernel.org, kvm@...r.kernel.org,
Krishna Kumar <krkumar2@...ibm.com>,
Tom Lendacky <tahm@...ux.vnet.ibm.com>, steved@...ibm.com,
habanero@...ux.vnet.ibm.com
Subject: Re: [PATCHv2 10/14] virtio_net: limit xmit polling

On Mon, May 23, 2011 at 11:37:15AM +0930, Rusty Russell wrote:
> On Sun, 22 May 2011 15:10:08 +0300, "Michael S. Tsirkin" <mst@...hat.com> wrote:
> > On Sat, May 21, 2011 at 11:49:59AM +0930, Rusty Russell wrote:
> > > On Fri, 20 May 2011 02:11:56 +0300, "Michael S. Tsirkin" <mst@...hat.com> wrote:
> > > > Current code might introduce a lot of latency variation
> > > > if there are many pending bufs at the time we
> > > > attempt to transmit a new one. This is bad for
> > > > real-time applications and can't be good for TCP either.
> > >
> > > Do we have more than speculation to back that up, BTW?
> >
> > Need to dig this up: I thought we saw some reports of this on the list?
>
> I think so too, but a reference needs to be here too.
>
> It helps to have exact benchmarks on what's being tested, otherwise we
> risk unexpected interaction with the other optimization patches.
>
> > > > struct sk_buff *skb;
> > > > unsigned int len;
> > > > -
> > > > - while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> > > > + bool c;
> > > > + int n;
> > > > +
> > > > + /* We try to free up at least 2 skbs per one sent, so that we'll get
> > > > + * all of the memory back if they are used fast enough. */
> > > > + for (n = 0;
> > > > + ((c = virtqueue_get_capacity(vi->svq) < capacity) || n < 2) &&
> > > > + ((skb = virtqueue_get_buf(vi->svq, &len)));
> > > > + ++n) {
> > > > pr_debug("Sent skb %p\n", skb);
> > > > vi->dev->stats.tx_bytes += skb->len;
> > > > vi->dev->stats.tx_packets++;
> > > > dev_kfree_skb_any(skb);
> > > > }
> > > > + return !c;
> > >
> > > This is for() abuse :)
> > >
> > > Why is the capacity check in there at all? Surely it's simpler to try
> > > to free 2 skbs each time around?
> >
> > This is in case we can't use indirect: we want to free up
> > enough buffers for the following add_buf to succeed.
>
> Sure, or we could just count the frags of the skb we're taking out,
> which would be accurate for both cases and far more intuitive.
>
> ie. always try to free up twice as much as we're about to put in.
>
> Can we hit problems with OOM? Sure, but no worse than now...
> The problem is that this "virtqueue_get_capacity()" returns the worst
> case, not the normal case. So using it is deceptive.
>

Maybe just document this?

I still believe capacity really needs to be decided
at the virtqueue level, not in the driver.
E.g. with indirect descriptors each skb uses a single ring entry:
freeing 1 small skb is always enough to make space for a large one.

I do understand that it seems a waste to reserve space for the
direct case in the ring when in practice we might have enough
space thanks to indirect. I haven't come up with a nice way to
solve this yet - but 'no worse than now :)'
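
To make the direct vs. indirect point concrete, here is a rough
sketch of the accounting. This is a toy model, not the virtio API:
every toy_* name is made up for illustration, and the value 18 for
MAX_SKB_FRAGS is only an assumption (it depends on the page size).
It assumes 2 + nr_frags sg elements per skb (header, linear data,
one per fragment), as in the 2+MAX_SKB_FRAGS logic mentioned below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not the kernel API. */
#define TOY_MAX_SKB_FRAGS 18	/* assumed value, illustration only */

/* sg elements for one skb: virtio-net header + linear data + frags. */
static unsigned int toy_sg_elems(unsigned int nr_frags)
{
	assert(nr_frags <= TOY_MAX_SKB_FRAGS);
	return 2 + nr_frags;
}

/*
 * Ring entries consumed by one skb.  With an indirect descriptor the
 * whole sg list hangs off a single ring entry; in the direct case
 * every sg element takes its own entry.
 */
static unsigned int toy_ring_entries(unsigned int nr_frags, bool indirect)
{
	return indirect ? 1 : toy_sg_elems(nr_frags);
}

/* Does freeing one used skb guarantee ring space for the next one? */
static bool toy_freeing_one_is_enough(unsigned int freed_frags,
				      unsigned int next_frags, bool indirect)
{
	return toy_ring_entries(freed_frags, indirect) >=
	       toy_ring_entries(next_frags, indirect);
}
```

With indirect, freeing a 0-frag skb makes room for an 18-frag one;
with direct it frees 2 entries where 20 may be needed. Since the
indirect allocation may fail and fall back to direct, a worst-case
capacity has to assume the direct numbers.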
> > I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make
> > sure we have enough space in the buffer. Another way to do
> > that is with a define :).
>
> To do this properly, we should really be using the actual number of sg
> elements needed, but we'd have to do most of xmit_skb beforehand so we
> know how many.
>
> Cheers,
> Rusty.

Maybe I'm confused here. IIUC the problem isn't add_buf failing
for the given skb. What we are trying to do here is stop
the queue *before xmit_skb fails*. We can't look at the
number of fragments in the current skb - the next one can be
much larger. That's why we check capacity after xmit_skb,
not before it, right?
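
Roughly, the ordering I mean looks like the sketch below. Again
this is a toy model and not the driver code: struct toy_txq,
toy_xmit() and TOY_MAX_SKB_FRAGS (assumed to be 18) are
hypothetical names, and real code would also have to deal with
completions and restarting the queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not the kernel API. */
#define TOY_MAX_SKB_FRAGS 18	/* assumed value, illustration only */

struct toy_txq {
	unsigned int free_entries;	/* free ring entries, direct case */
	bool stopped;			/* models netif_stop_queue() state */
};

/* Queue one skb; returns false if the queue is stopped or full. */
static bool toy_xmit(struct toy_txq *q, unsigned int nr_frags)
{
	unsigned int need = 2 + nr_frags;	/* hdr + linear + frags */

	assert(nr_frags <= TOY_MAX_SKB_FRAGS);
	if (q->stopped || q->free_entries < need)
		return false;
	q->free_entries -= need;

	/*
	 * Check capacity *after* adding: the size of the next skb is
	 * unknown, so stop unless a worst-case skb would still fit.
	 */
	if (q->free_entries < 2 + TOY_MAX_SKB_FRAGS)
		q->stopped = true;
	return true;
}
```

Starting with 24 free entries, a 0-frag skb leaves 22 and the queue
keeps running; a following 1-frag skb leaves 19, which is less than
20, so the queue is stopped before any add can fail.
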
--
MST