Message-ID: <1426105262.11398.66.camel@edumazet-glaptop2.roam.corp.google.com>
Date: Wed, 11 Mar 2015 13:21:02 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Martin Lau <kafai@...com>
Cc: Amir Vadai <amirv@...lanox.com>,
Or Gerlitz <ogerlitz@...lanox.com>, netdev@...r.kernel.org,
kernel-team@...com
Subject: Re: [Question] net/mlx4_en: Memory consumption issue with mlx4_en driver

On Wed, 2015-03-11 at 11:51 -0700, Martin Lau wrote:
> Hi,
>
> We have seen a memory consumption issue related to the mlx4 driver.
> We suspect it is related to the page order used to do the alloc_pages().
> The order starts at 3 and falls back to the next lower value on failure.
> I have copied and pasted the alloc_pages() call site at the end of the email.
>
> Are order-3 pages a hard requirement? Based on the code and its comment,
> the choice seems to be partly for functional and/or performance reasons.
> Can you share some perf test numbers for different page order allocations,
> like 3 vs 2 vs 1?
>
> It can be reproduced by:
> 1. On the netserver (receiver) host, set sysctl
>    net.ipv4.tcp_rmem='4096 125000 67108864' and net.core.rmem_max=67108864.
> 2. Start two netservers listening on 2 different ports:
>    - One for taking 1000 background netperf flows
>    - Another netserver for taking 200 netperf flows. It will be
>      suspended (ctrl-z) in the middle of the test.
> 3. Start 1000 background netperf TCP_STREAM flows.
> 4. Start another 200 netperf TCP_STREAM flows.
> 5. Suspend the netserver taking the 200 flows.
> 6. Observe the socket memory usage of the suspended netserver with 'ss -t -m'.
>    Its 200 sockets will eventually reach 64MB rmem.
>
> We observed that the total socket rmem usage reported by 'ss -t -m'
> differs hugely from /proc/meminfo. We have seen a ~6x-10x difference.
>
> Any fragment queued in the suspended socket holds a reference on
> page->_count and prevents all 8 pages from being freed.
> net.ipv4.tcp_mem does not seem to save us here, since it only
> counts skb->truesize, which is 1536 in our setup.
>
> Thanks,
> --Martin

You know, even the order-3 allocations done for regular skb allocations
will hurt you: a single copybreaked skb stored for a long time in a TCP
receive queue will hold 32KB of memory.

Even 4KB can lead to disasters.

You could lower tcp_rmem so that collapsing happens sooner.
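
As a rough back-of-the-envelope sketch of the numbers in this thread (not
taken from the driver): assume 4 KiB pages, which matches "8 pages" and
"32KB" above, and assume the worst case where one queued fragment with a
truesize of 1536 is the only thing keeping a whole allocation alive. The
function names are made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages (8 pages == 32KB, as in the thread)
TRUESIZE = 1536   # skb->truesize reported in the original question


def pinned_bytes(order, page_size=PAGE_SIZE):
    """Bytes held by one allocation of 2**order contiguous pages."""
    return page_size << order


def worst_case_amplification(order, truesize=TRUESIZE):
    """Upper bound on (memory pinned) / (memory accounted to the socket),
    assuming a single fragment pins the entire allocation."""
    return pinned_bytes(order) / truesize


if __name__ == "__main__":
    for order in (3, 2, 1, 0):
        print(
            f"order {order}: {pinned_bytes(order):6d} bytes pinned, "
            f"up to {worst_case_amplification(order):.1f}x truesize"
        )
```

Under these assumptions the order-3 bound is about 21x, so the reported
~6x-10x gap between 'ss -t -m' and /proc/meminfo is plausible, and even an
order-0 page can be pinned by an skb accounted at well under 4KB.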