netdev - Re: Kernel 4.19 network performance - forwarding/routing normal users traffic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20181105084846.GA6203@intel.com>
Date:   Mon, 5 Nov 2018 16:48:47 +0800
From:   Aaron Lu <aaron.lu@...el.com>
To:     Tariq Toukan <tariqt@...lanox.com>
Cc:     Jesper Dangaard Brouer <brouer@...hat.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        "pstaszewski@...are.pl" <pstaszewski@...are.pl>,
        "eric.dumazet@...il.com" <eric.dumazet@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "ilias.apalodimas@...aro.org" <ilias.apalodimas@...aro.org>,
        "yoel@...knet.dk" <yoel@...knet.dk>,
        "mgorman@...hsingularity.net" <mgorman@...hsingularity.net>
Subject: Re: Kernel 4.19 network performance - forwarding/routing normal
 users traffic

On Mon, Nov 05, 2018 at 08:42:33AM +0000, Tariq Toukan wrote:
> 
> On 03/11/2018 2:53 PM, Jesper Dangaard Brouer wrote:
> > 
> > On Fri, 2 Nov 2018 22:20:24 +0800 Aaron Lu <aaron.lu@...el.com> wrote:
> >>
> >> I think here is a problem - order 0 pages are freed directly to buddy,
> >> bypassing per-cpu-pages. This might be the reason lock contention
> >> appeared on free path.
> > 
> > OMG - you just found a significant issue with the network stacks
> > interaction with the page allocator!  This explains why I could not get
> > the PCP (Per-Cpu-Pages) system to have good performance, in my
> > performance networking benchmarks. As we are basically only using the
> > alloc side of PCP, and not the free side.
> >   We have spend years adding different driver level recycle tricks to
> > avoid this code path getting activated, exactly because it is rather
> > slow and problematic that we hit this zone->lock.
> > 
> 
> Oh! It has been behaving this way for too long.
> Good catch!

Thanks.

> >> Can someone apply below diff and see if lock contention is gone?
> > 
> > I have also applied and tested this patch, and yes the lock contention
> > is gone.  As mentioned is it rather difficult to hit this code path, as
> > the driver page recycle mechanism tries to hide/avoid it, but mlx5 +
> > page_pool + CPU-map recycling have a known weakness that bypass the
> > driver page recycle scheme (that I've not fixed yet).  I observed a 7%
> > speedup for this micro benchmark.
> > 
> 
> Great news. I also have a benchmark that uses orde-r0 pages and stresses 
> the zone-lock. I'll test your patch during this week.

Note this patch only helps when order-0 pages are freed through
page_frag_free().

I'll send a formal patch later.

> >   
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index e2ef1c17942f..65c0ae13215a 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -4554,8 +4554,14 @@ void page_frag_free(void *addr)
> >>   {
> >>   	struct page *page = virt_to_head_page(addr);
> >>   
> >> -	if (unlikely(put_page_testzero(page)))
> >> -		__free_pages_ok(page, compound_order(page));
> >> +	if (unlikely(put_page_testzero(page))) {
> >> +		unsigned int order = compound_order(page);
> >> +
> >> +		if (order == 0)
> >> +			free_unref_page(page);
> >> +		else
> >> +			__free_pages_ok(page, order);
> >> +	}
> >>   }
> >>   EXPORT_SYMBOL(page_frag_free);
> > 
> > Thank you Aaron for spotting this!!!
> > 
> Thanks Aaron :) !!
> 
> Does it conflict with your recent work that optimizes order-0 allocation?

No it doesn't. This patch optimize code outside of zone lock(by reducing
the need to take zone lock) while my recent work optimize code inside
the zone lock :-)