Message-ID: <05946af9f1c223bf20a43b9ced346e39e2f54cad.camel@surriel.com>
Date: Tue, 03 Dec 2024 20:43:32 -0500
From: Rik van Riel <riel@...riel.com>
To: Dave Hansen <dave.hansen@...el.com>, Mathieu Desnoyers
<mathieu.desnoyers@...icios.com>
Cc: kernel test robot <oliver.sang@...el.com>, oe-lkp@...ts.linux.dev,
lkp@...el.com, linux-kernel@...r.kernel.org, x86@...nel.org, Ingo Molnar
<mingo@...nel.org>, Linus Torvalds <torvalds@...ux-foundation.org>, Peter
Zijlstra <peterz@...radead.org>, Mel Gorman <mgorman@...e.de>
Subject: Re: [PATCH v2] x86,mm: only trim the mm_cpumask once a second

On Tue, 2024-12-03 at 16:46 -0800, Dave Hansen wrote:
> On 12/3/24 12:07, Rik van Riel wrote:
> > The tlb_flush2 threaded test does not only madvise in a
> > loop, but also mmap and munmap from inside every thread.
> >
> > This should create massive contention on the mmap_lock,
> > resulting in threads going to sleep while waiting in
> > mmap and munmap.
> >
> > https://github.com/antonblanchard/will-it-scale/blob/master/tests/tlb_flush2.c
>
> Oh, wow, it only madvise()'s a 1MB allocation before doing the
> munmap()/mmap(). I somehow remembered it being a lot larger. And, yeah,
> I see a ton of idle time which would be 100% explained by mmap_lock
> contention.
>
> Did the original workload that you care about have idle time?
>
The workloads that I care about are things like memcache,
web servers, web proxies, and other workloads that typically
handle very short requests before going idle again.

These programs have a LOT of context switches to and from
the idle task.

> I'm wondering if trimming mm_cpumask() on the way to idle but leaving it
> alone on a context switch to another thread is a good idea.
>
The problem with that is that you then have to set the bit
again when switching back to the program, which creates
contention when a number of CPUs are transitioning to and
from idle at the same time.

Atomic operations on a contended cache line from the
context switch code end up being quite visible when
profiling some workloads :)
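
To make the trade-off concrete, here is a minimal userspace sketch in C11,
not the actual kernel code. The type cpu_mask_t and the helpers
mask_set_cpu_lazy() and mask_clear_cpu() are hypothetical stand-ins for
mm_cpumask() handling, and the 64-CPU limit is an assumption made only to
keep the example short. It shows why leaving the bit set is cheap on the
switch-in path: when the bit is already there, only a plain load is needed,
and no atomic read-modify-write hits the shared cache line.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an mm's CPU mask: one bit per CPU, max 64. */
typedef struct {
    _Atomic uint64_t bits;
} cpu_mask_t;

/*
 * Mark this CPU as using the mm. Returns true only if an atomic
 * read-modify-write was issued, i.e. the bit was not yet set.
 */
static bool mask_set_cpu_lazy(cpu_mask_t *mask, unsigned int cpu)
{
    uint64_t bit = UINT64_C(1) << cpu;

    /* Read-only check: does not need exclusive ownership of the line. */
    if (atomic_load_explicit(&mask->bits, memory_order_relaxed) & bit)
        return false;

    /* Bit was clear: this write contends with every other CPU doing it. */
    atomic_fetch_or_explicit(&mask->bits, bit, memory_order_acq_rel);
    return true;
}

/* Trim this CPU from the mask; always an atomic read-modify-write. */
static void mask_clear_cpu(cpu_mask_t *mask, unsigned int cpu)
{
    atomic_fetch_and_explicit(&mask->bits, ~(UINT64_C(1) << cpu),
                              memory_order_acq_rel);
}
```

If the bit were cleared on every switch to idle, each idle round trip would
pay for two atomic operations on the shared line (one clear, one set). If
the mask is trimmed only rarely, most switches back to the program take the
read-only path.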
--
All Rights Reversed.