netdev - Re: [Question] net/mlx4_en: Memory consumption issue with mlx4_en driver

Message-ID: <1426105262.11398.66.camel@edumazet-glaptop2.roam.corp.google.com>
Date:	Wed, 11 Mar 2015 13:21:02 -0700
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Martin Lau <kafai@...com>
Cc:	Amir Vadai <amirv@...lanox.com>,
	Or Gerlitz <ogerlitz@...lanox.com>, netdev@...r.kernel.org,
	kernel-team@...com
Subject: Re: [Question] net/mlx4_en: Memory consumption issue with mlx4_en driver

On Wed, 2015-03-11 at 11:51 -0700, Martin Lau wrote:
> Hi,
> 
> We have seen a memory consumption issue related to the mlx4 driver.
> We suspect it is related to the page order passed to alloc_pages().
> The order starts at 3, and the driver tries the next lower order on failure.
> I have copied and pasted the alloc_pages() call site at the end of the email.
> 
> Are order-3 pages really required? Based on the code and its comment,
> the reason seems to be partly functional and/or partly performance.
> Can you share some performance numbers for different allocation orders,
> such as 3 vs 2 vs 1?
> 
> It can be reproduced as follows:
> 1. On the netserver (receiver) host, run
>    sysctl -w net.ipv4.tcp_rmem='4096 125000 67108864' and
>    sysctl -w net.core.rmem_max=67108864.
> 2. Start two netservers listening on 2 different ports:
>    - One for the 1000 background netperf flows.
>    - Another for 200 netperf flows. This one is suspended (Ctrl-Z)
>      partway through the test.
> 3. Start 1000 background netperf TCP_STREAM flows.
> 4. Start another 200 netperf TCP_STREAM flows.
> 5. Suspend the netserver handling the 200 flows.
> 6. Observe the socket memory usage of the suspended netserver with 'ss -t -m'.
>    All 200 of its sockets will eventually reach 64MB of rmem.
> 
> We observed that the total socket rmem usage reported by 'ss -t -m'
> differs hugely from the usage seen in /proc/meminfo: a ~6x-10x difference.
> 
> Any fragment queued in the suspended socket holds a reference
> (page->_count) on its order-3 page and prevents all 8 pages from being freed.
> net.ipv4.tcp_mem does not seem to protect us here, since it only
> counts skb->truesize, which is 1536 bytes in our setup.
> 
> Thanks,
> --Martin

You know, even the order-3 allocations done for regular skb allocations
will hurt you: a single copybreak skb that sits for a long time in a TCP
receive queue will hold 32KB of memory.

Even 4KB can lead to disasters.

You could lower tcp_rmem so that receive-queue collapsing happens sooner.


