linux-kernel - Re: [PATCH v2] sched/core: Don't use dying mm as active_mm of kthreads

Message-ID: <89d6acc3cc5d72f750f1a77164043dbfd6e599e8.camel@surriel.com>
Date:   Mon, 29 Jul 2019 12:12:14 -0400
From:   Rik van Riel <riel@...riel.com>
To:     Waiman Long <longman@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
        Phil Auld <pauld@...hat.com>, Andy Lutomirski <luto@...nel.org>
Subject: Re: [PATCH v2] sched/core: Don't use dying mm as active_mm of
 kthreads

On Mon, 2019-07-29 at 11:37 -0400, Waiman Long wrote:
> On 7/29/19 11:03 AM, Peter Zijlstra wrote:
> > On Mon, Jul 29, 2019 at 10:51:51AM -0400, Waiman Long wrote:
> > > On 7/29/19 4:52 AM, Peter Zijlstra wrote:
> > > > On Sat, Jul 27, 2019 at 01:10:47PM -0400, Waiman Long wrote:
> > > > > It was found that a dying mm_struct, whose owning task has
> > > > > exited, can stay on as the active_mm of kernel threads for as
> > > > > long as no other user task runs on the CPUs where it is still
> > > > > the active_mm. This prolongs the lifetime of the dying mm,
> > > > > holding up memory and other resources, like swap space, that
> > > > > cannot be freed.
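As a rough illustration of the lifetime problem described in the patch, here is a toy userspace model in C. It is not kernel code: `toy_mm`, `toy_cpu` and the `toy_*` helpers are hypothetical, loosely echoing `mm_users`, `mm_count`, `mmgrab()` and `mmdrop()`. The model only tracks when the last reference goes away, not which resources the real kernel releases at each stage. What it shows is that the reference a CPU holds through its lazily kept active_mm outlives the owning task and is only dropped once a different user mm is loaded on that CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_mm {
    int users;   /* like mm_users: tasks that own this address space */
    int count;   /* like mm_count: references that keep the mm alive */
    bool freed;  /* set when the last count reference is dropped */
};

struct toy_cpu {
    struct toy_mm *active_mm;  /* mm currently loaded on this CPU */
};

static void toy_mm_init(struct toy_mm *mm)
{
    mm->users = 1;   /* the owning task */
    mm->count = 1;   /* one reference held on behalf of all users */
    mm->freed = false;
}

static void toy_mmgrab(struct toy_mm *mm)
{
    assert(!mm->freed);
    mm->count++;
}

static void toy_mmdrop(struct toy_mm *mm)
{
    assert(mm->count > 0);
    if (--mm->count == 0)
        mm->freed = true;   /* last reference gone: release (in this model) */
}

/* The owning task exits and gives up its user reference. */
static void toy_task_exit(struct toy_mm *mm)
{
    assert(mm->users > 0);
    if (--mm->users == 0)
        toy_mmdrop(mm);   /* drop the reference held for the users */
}

/* A user task with mm `next` is scheduled: the CPU loads `next`,
 * taking a reference on it and dropping the one on the previous mm. */
static void toy_cpu_load_mm(struct toy_cpu *cpu, struct toy_mm *next)
{
    struct toy_mm *prev = cpu->active_mm;

    if (prev == next)
        return;
    toy_mmgrab(next);
    cpu->active_mm = next;
    if (prev != NULL)
        toy_mmdrop(prev);
}

/* A kernel thread is scheduled: it has no mm of its own and lazily
 * borrows the loaded one, so no reference is dropped. */
static struct toy_mm *toy_cpu_run_kthread(const struct toy_cpu *cpu)
{
    return cpu->active_mm;
}
```

Replaying the reported scenario: load mm A on a CPU, let A's task exit, then run a kernel thread there. A is still not freed, because the CPU's lazy reference pins it; only loading another user mm B on that CPU releases A.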
> > > > Sure, but this has been so 'forever', why is it a problem now?
> > > I ran into this problem when running a test program that keeps on
> > > allocating and touching memory until it eventually fails because the
> > > swap space is full. After the failure, I could not rerun the test
> > > program because the swap space remained full. I finally tracked it
> > > down to the fact that the mm stayed on as the active_mm of kernel
> > > threads. I had to make sure that all the idle CPUs got a user task to
> > > run to bump the dying mm off the active_mm of those CPUs, but this is
> > > just a workaround, not a solution to this problem.
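The workaround described above can be sketched in userspace C. The helper name `kick_allowed_cpus()` is hypothetical; the calls it uses (`sched_getaffinity()`, `sched_setaffinity()`, `fork()`, `waitpid()`) are standard Linux/glibc interfaces. It forks a short-lived child per allowed CPU and pins that child to the CPU, so the CPU has to load a user mm and stop lazily holding the previous one. As the report says, this is a workaround, not a fix: each CPU ends up lazily holding the child's small mm instead, and CPUs outside the caller's affinity mask are not visited.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run a trivial user process on every CPU this
 * process is allowed to use, so that each of those CPUs switches to a
 * user mm. Returns the number of CPUs visited, or -1 on error.
 */
static int kick_allowed_cpus(void)
{
    cpu_set_t allowed;
    int kicked = 0;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    assert(CPU_COUNT(&allowed) > 0);  /* we are running, so some CPU is allowed */

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;

        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            /* Child: pin itself to `cpu`. Linux migrates a running task
             * off CPUs that are no longer allowed as part of the call, so
             * the child is on `cpu` by the time it exits. */
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            _exit(sched_setaffinity(0, sizeof(one), &one) == 0 ? 0 : 1);
        }

        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            kicked++;
    }
    return kicked;
}
```

Running the children one at a time keeps the sketch simple; a faster variant could fork them all first and reap them afterwards.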
> > The 'sad' part is that x86 already switches to init_mm on idle and we
> > only keep the active_mm around for 'stupid'.
> > 
> > Rik and Andy were working on getting that 'fixed' a while ago, not sure
> > where that went.
> 
> Good, perhaps the right thing to do is for the idle->kernel case to keep
> init_mm as the active_mm instead of reusing whatever was left behind the
> last time around.

Absolutely not. That creates heavy cache line
contention on the mm_cpumask as we switch the
mm out and back in after an idle period.

The cache line contention on the mm_cpumask
alone can take up as much as a percent of
CPU time on a very busy system with a large
multi-threaded application, multiple sockets,
and lots of context switches.
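Rik's objection can be made concrete with a toy count of cpumask updates on a single CPU. This is a hypothetical model, not the scheduler or x86 TLB code: it assumes only two address spaces (`init_mm` and one user mm A), counts one write to A's cpumask each time the CPU switches into or out of A, and ignores everything else a real switch costs. The function and enum names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the CPU is asked to run next (hypothetical toy enum). */
enum toy_next { TOY_USER_A, TOY_IDLE };

/*
 * Replay a schedule on one CPU and count writes to user mm A's cpumask.
 *   lazy == true : idle/kernel threads keep whatever mm is already loaded.
 *   lazy == false: idle/kernel threads switch to init_mm (the proposal).
 * mm ids in this model: 0 = init_mm, 1 = user mm A.
 */
static int toy_count_cpumask_writes(const enum toy_next *sched, size_t n,
                                    bool lazy)
{
    int loaded = 0;   /* the CPU starts with init_mm loaded */
    int writes = 0;

    assert(sched != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        int want;

        if (sched[i] == TOY_USER_A)
            want = 1;                 /* a task of A needs A's mm */
        else
            want = lazy ? loaded : 0; /* idle: keep current mm, or init_mm */

        if (want != loaded) {
            /* With only two mms, every switch either sets or clears this
             * CPU's bit in A's cpumask: one write to the shared line. */
            writes++;
            loaded = want;
        }
    }
    return writes;
}
```

For the schedule A, idle, A, idle, A, the lazy policy touches A's cpumask once, while switching to init_mm on every idle period touches it five times. With a large multi-threaded application, A's cpumask cache line is shared by every CPU running its threads, which is where the contention comes from; the flip side, per the patch, is that the lazy policy keeps a possibly dying mm loaded.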

-- 
All Rights Reversed.
