netdev - Re: [PATCH 2/2] virtio_net: remove send completion interrupts and avoid TX queue overrun through packet drop

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Fri, 25 Mar 2011 15:20:46 +1030
From:	Rusty Russell <rusty@...tcorp.com.au>
To:	"Michael S. Tsirkin" <mst@...hat.com>
Cc:	Shirley Ma <mashirle@...ibm.com>,
	Herbert Xu <herbert@...dor.hengli.com.au>, davem@...emloft.net,
	kvm@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH 2/2] virtio_net: remove send completion interrupts and avoid TX queue overrun through packet drop

On Thu, 24 Mar 2011 16:28:22 +0200, "Michael S. Tsirkin" <mst@...hat.com> wrote:
> On Thu, Mar 24, 2011 at 11:00:53AM +1030, Rusty Russell wrote:
> > > With simply removing the notify here, it does help the case when TX
> > > overrun hits too often, for example for 1K message size, the single
> > > TCP_STREAM performance improved from 2.xGb/s to 4.xGb/s.
> > 
> > OK, we'll be getting rid of the "kick on full", so please delete that on
> > all benchmarks.
> > 
> > Now, does the capacity check before add_buf() still win anything?  I
> > can't see how unless we have some weird bug.
> > 
> > Once we've sorted that out, we should look at the more radical change
> > of publishing last_used and using that to intuit whether interrupts
> > should be sent.  If we're not careful with ordering and barriers that
> > could introduce more bugs.
> 
> Right. I am working on this, and trying to be careful.
> One thing I'm in doubt about: sometimes we just want to
> disable interrupts. Should still use flags in that case?
> I thought that if we make the published index 0 to vq->num - 1,
> then a special value in the index field could disable
> interrupts completely. We could even reuse the space
> for the flags field to stick the index in. Too complex?

Making the index free-running avoids the "full or empty" confusion, plus
offers and extra debugging insight.

I think that if they really want to disable interrrupts, the flag should
still work, and when the client accepts the "publish last_idx" feature
they are accepting that interrupts may be omitted if they haven't
updated last_idx yet.

> > Anything else on the optimization agenda I've missed?
> > 
> > Thanks,
> > Rusty.
> 
> Several other things I am looking at, wellcome cooperation:
> 1. It's probably a good idea to update avail index
>    immediately instead of upon kick: for RX
>    this might help parallelism with the host.

Yes, once we've done everything else, we should measure this.  It makes
sense.

> 2. Adding an API to add a single buffer instead of s/g,
>    seems to help a bit.

This goes last, since it's kind of an ugly hack, but all internal to
Linux if we decide it's a win.

> 3. For TX sometimes we free a single buffer, sometimes
>    a ton of them, which might make the transmit latency
>    vary. It's probably a good idea to limit this,
>    maybe free the minimal number possible to keep the device
>    going without stops, maybe free up to MAX_SKB_FRAGS.

This kind of heuristic is going to be quite variable depending on
circumstance, I think, so it's a lot of work to make sure we get it
right.

> 4. If the ring is full, we now notify right after
>    the first entry is consumed. For TX this is suboptimal,
>    we should try delaying the interrupt on host.

Lguest already does that: only sends an interrupt when it's run out of
things to do.  It does update the used ring, however, as it processes
them.

This seems sensible to me, but needs to be measured separately as well.

> More ideas, would be nice if someone can try them out:
> 1. We are allocating/freeing buffers for indirect descriptors.
>    Use some kind of pool instead?
>    And we could preformat part of the descriptor.

We need some poolish mechanism for virtio_blk too; perhaps an allocation
callback which both can use (virtio_blk to alloc from a pool, virtio_net
to recycle?).

Along similar lines to preformatting, we could actually try to prepend
the skb_vnet_hdr to the vnet data, and use a single descriptor for the
hdr and the first part of the packet.

Though IIRC, qemu's virtio barfs if the first descriptor isn't just the
hdr (barf...).

> 2. I didn't have time to work on virtio2 ideas presented
>    at the kvm forum yet, any takers?

I didn't even attend.  But I think that virtio is moribund for the
moment; there wasn't enough demand and it's clear that there are
optimization unexplored in virtio1.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html