Message-ID: <CANJ5vPKaNsNkx9MLYFYBSxgfuhm_xnmJh_UYadAfv-_t6nPjbA@mail.gmail.com>
Date:	Sat, 16 Nov 2013 01:06:32 -0800
From:	Michael Dalton <mwdalton@...gle.com>
To:	"Michael S. Tsirkin" <mst@...hat.com>
Cc:	Jason Wang <jasowang@...hat.com>,
	"David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
	Eric Dumazet <edumazet@...gle.com>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Daniel Borkmann <dborkman@...hat.com>,
	Eric Northup <digitaleric@...gle.com>,
	lf-virt <virtualization@...ts.linux-foundation.org>
Subject: Re: [PATCH net-next 4/4] virtio-net: auto-tune mergeable rx buffer
 size for improved performance

Hi,

Apologies for the delay; I wanted to get answers together for all of the
open questions raised on this thread. The first patch in this patchset is
already merged, so after the merge window re-opens I'll send out new
patchsets covering the remaining 3 patches.

After reflecting on feedback from this thread, I think it makes sense to
separate out the per-receive queue page frag allocator patches from the
autotuning patch when the merge window re-opens. The per-receive queue
page frag allocator patches help deal with fragmentation (PAGE_SIZE does
not evenly divide MERGE_BUFFER_LEN), and provide benefits whether or not
auto-tuning is present. Auto-tuning can then be evaluated separately.
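As a rough back-of-the-envelope illustration of the fragmentation point
(the 1536-byte merge buffer length below is my own illustrative value,
not taken from the patch; the real MERGE_BUFFER_LEN depends on MTU and
header sizes):

```python
# Illustrative only: assumes a 4KB page and a ~1536B mergeable buffer.
PAGE_SIZE = 4096
MERGE_BUFFER_LEN = 1536

buffers_per_page = PAGE_SIZE // MERGE_BUFFER_LEN          # 2 full buffers fit
leftover = PAGE_SIZE - buffers_per_page * MERGE_BUFFER_LEN  # 1024 bytes stranded

# A shared per-receive-queue page frag allocator can carve buffers
# back-to-back across page boundaries instead of stranding the leftover
# quarter of every page.
print(buffers_per_page, leftover)  # 2 1024
```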

On Wed, 2013-11-13 at 15:10 +0800, Jason Wang wrote:
> There's one concern with EWMA. How well does it handle multiple streams
> each with different packet size? E.g there may be two flows, one with
> 256 bytes each packet another is 64K.  Looks like it can result we
> allocate PAGE_SIZE buffer for 256 (which is bad since the
> payload/truesize is low) bytes or 1500+ for 64K buffer (which is ok
> since we can do coalescing).

If multiple streams with very different packet sizes arrive on the same
receive queue, no single buffer size is ideal: large buffers cause small
packets to consume too much memory (low payload/truesize ratio), while
small buffers may somewhat reduce throughput for large packets. Because
we don't know a priori which packet will be delivered to a given receive
queue packet buffer, any size we choose will be suboptimal for some
packets when packet sizes vary significantly.
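To make that tradeoff concrete (the packet and buffer sizes here are
illustrative, not values from the patch):

```python
def payload_efficiency(packet_len, buffer_len):
    """Fraction of the posted buffer actually filled by packet data."""
    return packet_len / buffer_len

# A 256-byte packet landing in a page-sized (4096B) buffer fills only
# ~6% of it, inflating truesize; the same packet in a ~1536B buffer
# fills ~17%. But MTU-sized buffers force a 64KB GRO packet to be
# scattered across many descriptors, costing some throughput.
eff_large_buf = payload_efficiency(256, 4096)
eff_small_buf = payload_efficiency(256, 1536)
```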

> Do you have perf numbers that just without this patch? We need to know
> how much EWMA help exactly.

Great point, I should have included that in my initial benchmarking. I ran
a benchmark in the same environment as my initial results, this time with
the first 3 patches in this patchset applied but without the autotuning
patch.  The average performance over 5 runs of 30-second netperf was
13760.85 Gb/s.

> Is there a chance that est_buffer_len was smaller than or equal with len?

Yes, that is possible if the average packet length decreases.

> Not sure this is accurate, since buflen may change and several frags may
> share a single page. So the est_buffer_len we get in receive_mergeable()
> may not be the correct value.

I agree it may not be 100% accurate, but we can choose a weight that
causes the average packet size to change slowly. Even with an order-3
page, not too many packet buffers will be allocated from a single page.

On Wed, 2013-11-13 at 17:42 +0800, Michael S. Tsirkin wrote:
> I'm not sure it's useful - no one is likely to tune it in practice.
> But how about a comment explaining how was the number chosen?

That makes sense, I agree a comment is needed. The weight determines
how quickly we react to a change in packet size. As we attempt to fill
all free ring entries on refill (in try_fill_recv), I chose a large
weight so that a short burst of traffic with a different average packet
size will not substantially shift the packet buffer size for the entire
ring the next time try_fill_recv is called. I'll add a comment that
compares 64 to nearby values (32, 16).
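A minimal sketch of the kind of integer EWMA being discussed (the
avg = (avg*(weight-1) + sample) / weight update rule is my assumption of
the usual form, not the patch's exact code; the weights and packet sizes
are illustrative):

```python
def ewma_update(avg, sample, weight):
    """One EWMA step; a larger weight reacts more slowly to new samples."""
    return (avg * (weight - 1) + sample) // weight

# A short 20-packet burst of 64KB coalesced packets barely moves a
# weight-64 average that started at an MTU-sized 1500 bytes...
avg_w64 = 1500
avg_w16 = 1500
for _ in range(20):
    avg_w64 = ewma_update(avg_w64, 65536, 64)
    avg_w16 = ewma_update(avg_w16, 65536, 16)
# ...while a weight-16 average has already jumped most of the way to
# 64KB, which would reshape the whole ring on the next refill.
```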

Best,

Mike
