Message-ID: <20150611214525.GA406740@devbig257.prn2.facebook.com>
Date:	Thu, 11 Jun 2015 14:45:25 -0700
From:	Shaohua Li <shli@...com>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	Chris Mason <clm@...com>, <netdev@...r.kernel.org>,
	<davem@...emloft.net>, <Kernel-team@...com>,
	Eric Dumazet <edumazet@...gle.com>,
	David Rientjes <rientjes@...gle.com>, <linux-mm@...ck.org>
Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation

On Thu, Jun 11, 2015 at 02:22:13PM -0700, Eric Dumazet wrote:
> On Thu, 2015-06-11 at 17:16 -0400, Chris Mason wrote:
> > On 06/11/2015 04:48 PM, Eric Dumazet wrote:
> > > On Thu, 2015-06-11 at 13:24 -0700, Shaohua Li wrote:
> > >> We saw excessive memory compaction triggered by skb_page_frag_refill.
> > >> This causes performance issues. Commit 5640f7685831e0 introduces the
> > >> order-3 allocation to improve performance. But memory compaction has
> > >> high overhead. The benefit of order-3 allocation can't compensate for the
> > >> overhead of memory compaction.
> > >>
> > >> This patch makes the order-3 page allocation atomic. If there is no
> > >> memory pressure and memory isn't fragmented, the allocation will still
> > >> succeed, so we don't sacrifice the order-3 benefit here. If the atomic
> > >> allocation fails, compaction will not be triggered and we will fall back
> > >> to order-0 immediately.
> > >>
> > >> The Mellanox driver does a similar thing; if this is accepted, we must fix
> > >> the driver too.
> > >>
> > >> Cc: Eric Dumazet <edumazet@...gle.com>
> > >> Signed-off-by: Shaohua Li <shli@...com>
> > >> ---
> > >>  net/core/sock.c | 2 +-
> > >>  1 file changed, 1 insertion(+), 1 deletion(-)
> > >>
> > >> diff --git a/net/core/sock.c b/net/core/sock.c
> > >> index 292f422..e9855a4 100644
> > >> --- a/net/core/sock.c
> > >> +++ b/net/core/sock.c
> > >> @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
> > >>  
> > >>  	pfrag->offset = 0;
> > >>  	if (SKB_FRAG_PAGE_ORDER) {
> > >> -		pfrag->page = alloc_pages(gfp | __GFP_COMP |
> > >> +		pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP |
> > >>  					  __GFP_NOWARN | __GFP_NORETRY,
> > >>  					  SKB_FRAG_PAGE_ORDER);
> > >>  		if (likely(pfrag->page)) {
> > > 
> > > This is not a specific networking issue, but mm one.
> > > 
> > > You really need to start a discussion with mm experts.
> > > 
> > > Your changelog does not exactly explain what _is_ the problem.
> > > 
> > > If the problem lies in mm layer, it might be time to fix it, instead of
> > > working around the bug by never triggering it from this particular point,
> > > which is a safe point where a process is willing to wait a bit.
> > > 
> > > Memory compaction is either working as intended, or not.
> > > 
> > > If we enable it but never run it because it hurts, what is the point
> > > of enabling it?
> > 
> > networking is asking for 32KB, and the MM layer is doing what it can to
> > provide it.  Are the gains from getting 32KB contig bigger than the cost
> > of moving pages around if the MM has to actually go into compaction?
> > Should we start disk IO to give back 32KB contig?
> > 
> > I think we want to tell the MM to compact in the background and give
> > networking 32KB if it happens to have it available.  If not, fall back
> > to smaller allocations without doing anything expensive.
> 
> Exactly my point. (And I mentioned this about 4 months ago)

This is exactly what the patch tries to do. Under memory pressure the
atomic 32k allocation fails, kswapd is woken up to do compaction, and we
fall back to 4k.
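
To make the intended behaviour concrete, here is a minimal user-space
sketch of the fallback. It is not kernel code: the MODEL_* flag values,
struct model_state, model_alloc_pages() and model_frag_refill() are all
hypothetical names invented for illustration, and the stand-in allocator
simply assumes that a blocking request compacts (and then succeeds) while
a non-blocking request only wakes kswapd and fails.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this model; NOT the kernel's real values. */
#define MODEL_GFP_WAIT    0x1u  /* caller may sleep: direct compaction allowed */
#define MODEL_GFP_COMP    0x2u
#define MODEL_GFP_NOWARN  0x4u
#define MODEL_GFP_NORETRY 0x8u
#define MODEL_FRAG_ORDER  3     /* 2^3 pages, i.e. 32KB with 4KB pages */

struct model_state {
	bool contig_available;  /* is a free order-3 block available right now? */
	int compactions;        /* how often direct compaction ran */
	int kswapd_wakeups;     /* how often background work was requested */
};

/*
 * Stand-in allocator. Returns the order that was allocated, or -1 on
 * failure. Assumption of this model: order-0 always succeeds, and a
 * blocking high-order request always succeeds after compacting.
 */
static int model_alloc_pages(struct model_state *st, unsigned int gfp,
			     int order)
{
	if (order == 0 || st->contig_available)
		return order;

	if (gfp & MODEL_GFP_WAIT) {
		st->compactions++;      /* the expensive path we want to avoid */
		return order;
	}

	st->kswapd_wakeups++;           /* compaction is left to the background */
	return -1;
}

/* Mirrors the shape of the patched refill: try order-3 without waiting,
 * then fall back to order-0 with the caller's original flags. */
static int model_frag_refill(struct model_state *st, unsigned int gfp)
{
	int order = model_alloc_pages(st,
				      (gfp & ~MODEL_GFP_WAIT) | MODEL_GFP_COMP |
				      MODEL_GFP_NOWARN | MODEL_GFP_NORETRY,
				      MODEL_FRAG_ORDER);
	if (order >= 0)
		return order;

	return model_alloc_pages(st, gfp, 0);
}
```

Calling model_frag_refill() with MODEL_GFP_WAIT set and no contiguous
block available returns order 0 in this model, with compactions left at
zero and kswapd_wakeups incremented once. When a contiguous block is
available, the same call returns MODEL_FRAG_ORDER.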