Message-ID: <1460939371.10638.97.camel@edumazet-glaptop3.roam.corp.google.com>
Date: Sun, 17 Apr 2016 17:29:31 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Saeed Mahameed <saeedm@...lanox.com>
Cc: "David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
Or Gerlitz <ogerlitz@...lanox.com>,
Tal Alon <talal@...lanox.com>,
Tariq Toukan <tariqt@...lanox.com>,
Eran Ben Elisha <eranbe@...lanox.com>,
Achiad Shochat <achiad@...lanox.com>
Subject: Re: [PATCH net-next V2 05/11] net/mlx5e: Support RX multi-packet
WQE (Striding RQ)
On Mon, 2016-04-18 at 00:31 +0300, Saeed Mahameed wrote:
> Performance tested on ConnectX4-Lx 50G.
> To isolate the feature under test, the numbers below were measured with
> HW LRO turned off. We verified that the performance just improves when
> LRO is turned back on.
>
> * Netperf single TCP stream:
> - BW raised by 10-15% for representative packet sizes:
> default, 64B, 1024B, 1478B, 65536B.
>
> * Netperf multi TCP stream:
> - No degradation, line rate reached.
>
> * Pktgen: packet rate raised by 5-10% for traffic of different message
> sizes: 64B, 128B, 256B, 1024B, and 1500B.
>
> * Pktgen: packet loss in bursts of small messages (64byte),
> single stream:
> - | num packets | packets loss before | packets loss after
>   |     2K      |        ~ 1K         |         0
>   |     8K      |        ~ 6K         |         0
>   |     16K     |        ~13K         |         0
>   |     32K     |        ~28K         |         0
>   |     64K     |        ~57K         |       ~24K

As I already mentioned, allocating order-5 pages and hoping the host only
receives friendly traffic is very optimistic.

A 192-byte frame claims to consume only a 192-byte frag with your new
allocation strategy (skb->truesize is kind of minimal).

In reality, it can prevent a whole 131072 bytes of memory from being
reclaimed/freed. The TCP stack will not consider such an skb as a candidate
for collapsing in case of memory pressure or a hostile peer.
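
The 131072-byte figure follows from the page order. A minimal sketch of the arithmetic, assuming 4 KiB base pages (the page size is not stated in the message, and the helper name is only illustrative):

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages, as on x86-64


def order_bytes(order: int) -> int:
    """Size in bytes of a physically contiguous block of the given page order."""
    return PAGE_SIZE << order


frame_bytes = 192               # frag size the skb reports for the small frame
pinned_bytes = order_bytes(5)   # the whole order-5 block stays allocated while one frag references it
amplification = pinned_bytes / frame_bytes

print(f"reported: {frame_bytes} B, pinned: {pinned_bytes} B, "
      f"ratio: {amplification:.0f}x")
```

Under that assumption, one small frame can keep several hundred times more memory alive than its reported frag size suggests.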

Your tests are obviously run on a freshly booted host, where all
physical memory can be consumed for networking buffers.

Even with order-3 pages, we have problems (at Facebook and Google) on
hosts that we do not reboot every day.

By the time order-5 allocations fail, it is already too late: thousands
of out-of-order TCP packets may have consumed all the memory, and the
host will die.

/proc/sys/net/ipv4/tcp_mem by default allows TCP to use up to 10% of
physical memory, assuming skb->truesize is true.

In your scheme, TCP might never notice that it uses 100% of the RAM for
packets stored in out-of-order queues, since a frag can hold 32 times
more pages than announced.
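
To make the accounting gap concrete, here is a deliberately simplified model, not the kernel's actual tcp_mem logic. It assumes TCP stops growing once the memory it has accounted reaches the limit, and that every queued skb under-reports its real footprint by the same factor. The function name and the factors are hypothetical.

```python
def worst_case_pinned_fraction(accounted_limit: float,
                               underreport_factor: float) -> float:
    """Fraction of physical memory really pinned when TCP believes it has
    reached `accounted_limit`, if every skb pins `underreport_factor` times
    what it reports. Capped at 1.0 (all of RAM)."""
    return min(1.0, accounted_limit * underreport_factor)


limit = 0.10  # the roughly 10% default mentioned above
for factor in (1, 8, 32):
    pinned = worst_case_pinned_fraction(limit, factor)
    print(f"under-report x{factor}: up to {pinned:.0%} of RAM pinned")
```

In this model, any under-reporting factor of 10 or more lets the real footprint reach all of physical memory before the accounted limit triggers.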

If you really need to allocate physically contiguous memory, have you
considered converting the order-5 pages into 32 order-0 ones?

This way, a 192-byte frame sitting in one socket would hold one order-0
page in the worst case, and TCP won't be allowed to use all physical
memory.
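
A rough comparison of the two worst cases, again assuming 4 KiB pages. The frame count is hypothetical, and the sketch assumes each queued frame is the last reference to its own block:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def worst_case_pinned(n_frames: int, order: int) -> int:
    """Bytes pinned if each frame holds the last reference to a separate
    block of the given page order."""
    return n_frames * (PAGE_SIZE << order)


n = 1000  # hypothetical number of small frames sitting in socket queues
order5 = worst_case_pinned(n, 5)  # frags carved from an intact order-5 block
order0 = worst_case_pinned(n, 0)  # same block split into 32 order-0 pages

print(f"order-5: {order5} B, order-0: {order0} B, ratio: {order5 // order0}x")
```

Splitting the block bounds the worst case at one page per frame, which is where the factor of 32 comes from.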