Open Source and information security mailing list archives
 
Date:   Thu, 03 Mar 2022 15:10:35 +0100
From:   Nicolas Saenz Julienne <nsaenzju@...hat.com>
To:     Vlastimil Babka <vbabka@...e.cz>, akpm@...ux-foundation.org
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        frederic@...nel.org, tglx@...utronix.de, mtosatti@...hat.com,
        mgorman@...e.de, linux-rt-users@...r.kernel.org, cl@...ux.com,
        paulmck@...nel.org, willy@...radead.org
Subject: Re: [PATCH 0/2] mm/page_alloc: Remote per-cpu lists drain support

On Thu, 2022-03-03 at 14:27 +0100, Vlastimil Babka wrote:
> On 2/8/22 11:07, Nicolas Saenz Julienne wrote:
> > This series replaces mm/page_alloc's per-cpu page lists drain mechanism with
> > one that allows accessing the lists remotely. Currently, only the local CPU is
> > permitted to change its per-cpu lists, and it's expected to do so, on demand,
> > whenever a process requests it by queueing a drain task on that CPU. This
> > causes problems for NOHZ_FULL CPUs and real-time systems that can't tolerate
> > any sort of interruption, and to a lesser extent inconveniences idle and
> > virtualised systems.
> > 
> > The new algorithm atomically switches the pointer to the per-cpu page lists
> > and uses RCU to make sure the old lists aren't being concurrently used before
> > draining them. Its main benefit is that it fixes the issue for good, avoiding
> > the need for configuration-based heuristics or having to modify applications
> > (i.e. using the isolation prctl being worked on by Marcelo Tosatti ATM).
> > 
> > All this with minimal performance implications: a page allocation
> > microbenchmark was run on multiple systems and architectures, generally
> > showing no performance differences; only the more extreme cases showed a 1-3%
> > degradation. See the data below. Needless to say, I'd appreciate it if
> > someone could validate my numbers independently.
> > 
> > The approach has been stress-tested: I forced 100 drains/s while running
> > mmtests' pft in a loop for a full day on multiple machines and archs (arm64,
> > x86_64, ppc64le).
> > 
> > Note that this is not the first attempt at fixing this per-cpu page lists
> > issue:
> >  - The first attempt[1] tried to conditionally change the pagesets locking
> >    scheme based on the NOHZ_FULL config. It was deemed hard to maintain, as
> >    the NOHZ_FULL code path would be rarely tested. Also, it only solves the
> >    issue for NOHZ_FULL setups, which isn't ideal.
> >  - The second[2] unconditionally switched the local_locks to per-cpu
> >    spinlocks. The performance degradation was too big.
> 
> For completeness, what was the fate of the approach to have pcp->high = 0
> for NOHZ cpus? [1] It would be nice to have documented why it wasn't
> feasible. Too much overhead for when these CPUs eventually do allocate, or
> some other unforeseen issue? Thanks.

Yes, sorry, I should've been more explicit about why I haven't gone that way yet.

Some points:
 - As I mention above, CPU isolation users aren't the only ones who care about
   this. RT and HPC do too. This is my main motivation for focusing on this
   solution, or potentially Mel's.

 - Fully disabling pcplists on nohz_full CPUs is too drastic, as isolated CPUs
   might want to retain the performance edge while not running their sensitive
   workloads. (I remember Christoph Lameter commenting about this on the
   previous RFC.)

 - So the idea would be to selectively disable pcplists upon entering the
   really 'isolated' area. This could be achieved with Marcelo Tosatti's new
   WIP prctl[1]. And if we decide the current solutions are unacceptable, I'll
   have a go at it.

Thanks!

[1] https://lore.kernel.org/lkml/20220204173554.534186379@fedora.localdomain/T/

-- 
Nicolás Sáenz
