Message-ID: <1494858437.29205.26.camel@redhat.com>
Date: Mon, 15 May 2017 10:27:17 -0400
From: Rik van Riel <riel@...hat.com>
To: Vlastimil Babka <vbabka@...e.cz>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>
Cc: Mel Gorman <mgorman@...hsingularity.net>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/numa: use down_read_trylock for mmap_sem
On Mon, 2017-05-15 at 15:13 +0200, Vlastimil Babka wrote:
> A customer has reported a soft lockup when running a proprietary
> intensive memory stress test, where the trace on multiple CPUs looks
> like this:
>
> RIP: 0010:[<ffffffff810c53fe>]
> [<ffffffff810c53fe>] native_queued_spin_lock_slowpath+0x10e/0x190
> ...
> Call Trace:
> [<ffffffff81182d07>] queued_spin_lock_slowpath+0x7/0xa
> [<ffffffff811bc331>] change_protection_range+0x3b1/0x930
> [<ffffffff811d4be8>] change_prot_numa+0x18/0x30
> [<ffffffff810adefe>] task_numa_work+0x1fe/0x310
> [<ffffffff81098322>] task_work_run+0x72/0x90
>
> Further investigation showed that the lock contention here is on
> pmd_lock().
>
> The task_numa_work() function makes sure that only one thread is
> allowed to perform the work in a single scan period (via cmpxchg), but
> if there's a thread with mmap_sem locked for writing for several
> periods, multiple threads in task_numa_work() can build up a convoy
> waiting for mmap_sem for read and then all get unblocked at once.
>
> This patch changes the down_read() to the trylock version, which
> prevents the build up. For a workload experiencing mmap_sem contention,
> it's probably better to postpone the NUMA balancing work anyway. This
> seems to have fixed the soft lockups involving pmd_lock(), which is in
> line with the convoy theory.
>
> Signed-off-by: Vlastimil Babka <vbabka@...e.cz>
Acked-by: Rik van Riel <riel@...hat.com>