Message-ID: <20170322234004.kffsce4owewgpqnm@techsingularity.net>
Date: Wed, 22 Mar 2017 23:40:04 +0000
From: Mel Gorman <mgorman@...hsingularity.net>
To: Tariq Toukan <tariqt@...lanox.com>
Cc: Jesper Dangaard Brouer <brouer@...hat.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
akpm@...ux-foundation.org, linux-mm <linux-mm@...ck.org>,
Saeed Mahameed <saeedm@...lanox.com>
Subject: Re: Page allocator order-0 optimizations merged

On Wed, Mar 22, 2017 at 07:39:17PM +0200, Tariq Toukan wrote:
> > > > This modification may slow allocations from IRQ context slightly
> > > > but the main gain from the per-cpu allocator is that it scales
> > > > better for allocations from multiple contexts. There is an implicit
> > > > assumption that intensive allocations from IRQ contexts on multiple
> > > > CPUs from a single NUMA node are rare
> Hi Mel, Jesper, and all.
>
> This assumption contradicts regular multi-stream traffic that is naturally
> handled over close numa cores. I compared iperf TCP multistream (8 streams)
> over CX4 (mlx5 driver) with kernels v4.10 (before this series) vs
> kernel v4.11-rc1 (with this series).
> I disabled the page-cache (recycle) mechanism to stress the page allocator,
> and see a drastic degradation in BW, from 47.5 G in v4.10 to 31.4 G in
> v4.11-rc1 (34% drop).
> I noticed queued_spin_lock_slowpath occupies 62.87% of CPU time.

Can you get the stack trace for the spin lock slowpath to confirm it's
from IRQ context?
--
Mel Gorman
SUSE Labs