Message-ID: <20171103134020.3hwquerifnc6k6qw@techsingularity.net>
Date: Fri, 3 Nov 2017 13:40:20 +0000
From: Mel Gorman <mgorman@...hsingularity.net>
To: Tariq Toukan <tariqt@...lanox.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>,
David Miller <davem@...emloft.net>,
Jesper Dangaard Brouer <brouer@...hat.com>,
Eric Dumazet <eric.dumazet@...il.com>,
Alexei Starovoitov <ast@...com>,
Saeed Mahameed <saeedm@...lanox.com>,
Eran Ben Elisha <eranbe@...lanox.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...e.com>
Subject: Re: Page allocator bottleneck
On Thu, Nov 02, 2017 at 07:21:09PM +0200, Tariq Toukan wrote:
>
>
> On 18/09/2017 12:16 PM, Tariq Toukan wrote:
> >
> >
> > On 15/09/2017 1:23 PM, Mel Gorman wrote:
> > > On Thu, Sep 14, 2017 at 07:49:31PM +0300, Tariq Toukan wrote:
> > > > Insights: Major degradation between #1 and #2, not getting anywhere
> > > > close to line rate! The degradation is fixed between #2 and #3. This
> > > > is because the page allocator cannot keep up with the higher
> > > > allocation rate. In #2, we also see that adding rings (cores) reduces
> > > > BW (!!), as a result of increasing congestion over shared resources.
> > > >
> > >
> > > Unfortunately, no surprises there.
> > >
> > > > Congestion in this case is very clear. When monitored in perf top:
> > > >   85.58% [kernel] [k] queued_spin_lock_slowpath
> > > >
> > >
> > > While it's not proven, the most likely candidate is the zone lock
> > > and that should be confirmed using a call-graph profile. If so, then
> > > the suggestion to tune to the size of the per-cpu allocator would
> > > mitigate the problem.
> > >
> > Indeed, I tuned the per-cpu allocator and the bottleneck is relieved.
> >
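
To spell out what "tuning the per-cpu allocator" means here for anyone
following along: enlarging the per-cpu page lists (e.g. via the
vm.percpu_pagelist_fraction sysctl that existed in kernels of this era) so
that more allocations are satisfied without taking the zone lock. The sketch
below is only a simplified illustration of the idea, not the actual
mm/page_alloc.c code, and refill_from_buddy() is a made-up helper name:

	/*
	 * Illustrative only: the per-cpu list acts as a per-CPU cache in
	 * front of the buddy allocator.  The zone lock is only taken when
	 * the list runs dry and is refilled in a batch, so a larger
	 * ->high/->batch means fewer zone lock acquisitions per page.
	 */
	struct pcp_sketch {
		int count;		/* pages currently on the list */
		int high;		/* drain watermark */
		int batch;		/* pages moved per zone->lock hold */
		struct list_head list;
	};

	static struct page *pcp_alloc_sketch(struct zone *zone,
					     struct pcp_sketch *pcp)
	{
		struct page *page;

		if (list_empty(&pcp->list)) {
			/* Slow path: one zone->lock round trip buys a batch */
			spin_lock(&zone->lock);
			refill_from_buddy(zone, &pcp->list, pcp->batch);
			spin_unlock(&zone->lock);
			pcp->count += pcp->batch;
		}

		page = list_first_entry(&pcp->list, struct page, lru);
		list_del(&page->lru);
		pcp->count--;
		return page;
	}
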
>
> Hi all,
>
> After setting this task aside for a while to work on other things, I got
> back to it now and see that the good behavior I observed earlier was not
> stable.
>
> Recall: I work with a modified driver that allocates a page (4K) per packet
> (MTU=1500), in order to simulate the stress on the page allocator in 200Gbps
> NICs.
>
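For context, the page-per-packet pattern being simulated looks roughly like
the refill loop below. This is only a minimal sketch, not the actual mlx5
code; post_rx_descriptor() is a hypothetical stand-in for posting the buffer
to the RX ring:

	/*
	 * Sketch of an RX ring refill that allocates one order-0 page per
	 * packet (MTU 1500 fits in 4K).  Every descriptor costs one trip
	 * into the page allocator, which is what stresses it at 200Gbps.
	 */
	static int rx_refill_page_per_packet(struct device *dma_dev, int budget)
	{
		int filled = 0;

		while (filled < budget) {
			struct page *page = dev_alloc_pages(0); /* GFP_ATOMIC, order-0 */
			dma_addr_t dma;

			if (!page)
				break;

			dma = dma_map_page(dma_dev, page, 0, PAGE_SIZE,
					   DMA_FROM_DEVICE);
			if (dma_mapping_error(dma_dev, dma)) {
				__free_pages(page, 0);
				break;
			}

			/* hypothetical helper: post page/dma to the ring */
			post_rx_descriptor(dma, page);
			filled++;
		}

		return filled;
	}
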
There is almost nothing new in the data that hasn't been discussed before.
The suggestion to free on a remote per-cpu list would be expensive as it
would require per-cpu lists to have a lock for safe remote access. However,
I'd be curious if you could test the mm-pagealloc-irqpvec-v1r4 branch at
https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git . It's an
unfinished prototype I worked on a few weeks ago. I was going to revisit
it in about a month's time when 4.15-rc1 was out. I'd be interested in
seeing if it has a positive gain in normal page allocations without
destroying the performance of interrupt and softirq allocation contexts. The
interrupt/softirq context testing is crucial as that is something that
hurt us before when trying to improve page allocator performance.
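
To make the remote-free cost concrete: the per-cpu lists are currently only
touched by their owning CPU with IRQs disabled, so they need no lock at all.
Freeing into another CPU's list would require something like the sketch
below on every operation, which is exactly the overhead the per-cpu lists
exist to avoid (illustrative only, not code from the branch above):

	/*
	 * Illustrative only: if a page could be freed onto another CPU's
	 * per-cpu list, every fast-path operation would need a lock,
	 * turning the lockless per-CPU cache into a (smaller) contended one.
	 */
	struct remote_pcp_sketch {
		spinlock_t lock;	/* needed for cross-CPU access */
		int count;
		struct list_head list;
	};

	static void free_to_remote_pcp(struct remote_pcp_sketch *pcp,
				       struct page *page)
	{
		spin_lock(&pcp->lock);		/* paid on every free... */
		list_add(&page->lru, &pcp->list);
		pcp->count++;
		spin_unlock(&pcp->lock);

		/* ...and the owning CPU would pay it on every allocation too. */
	}
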
--
Mel Gorman
SUSE Labs