lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <20230919162215.2cszdylo2skevnr6@suse.de>
Date:   Tue, 19 Sep 2023 17:22:15 +0100
From:   Mel Gorman <mgorman@...e.de>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Raghavendra K T <raghavendra.kt@....com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Ingo Molnar <mingo@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...hat.com>, rppt@...nel.org,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Bharata B Rao <bharata@....com>,
        Aithal Srikanth <sraithal@....com>,
        kernel test robot <oliver.sang@...el.com>,
        Sapkal Swapnil <Swapnil.Sapkal@....com>,
        K Prateek Nayak <kprateek.nayak@....com>
Subject: Re: [RFC PATCH V1 0/6] sched/numa: Enhance disjoint VMA scanning

On Tue, Sep 19, 2023 at 11:28:30AM +0200, Peter Zijlstra wrote:
> On Tue, Aug 29, 2023 at 11:36:08AM +0530, Raghavendra K T wrote:
> 
> > Peter Zijlstra (1):
> >   sched/numa: Increase tasks' access history
> > 
> > Raghavendra K T (5):
> >   sched/numa: Move up the access pid reset logic
> >   sched/numa: Add disjoint vma unconditional scan logic
> >   sched/numa: Remove unconditional scan logic using mm numa_scan_seq
> >   sched/numa: Allow recently accessed VMAs to be scanned
> >   sched/numa: Allow scanning of shared VMAs
> > 
> >  include/linux/mm.h       |  12 +++--
> >  include/linux/mm_types.h |   5 +-
> >  kernel/sched/fair.c      | 109 ++++++++++++++++++++++++++++++++-------
> >  3 files changed, 102 insertions(+), 24 deletions(-)
> 
> So I don't immediately see anything horrible with this. Mel, do you have
> a few cycles to go over this as well?

I've been trying my best to find the necessary time, and it's still on my
radar for this week. Preliminary results don't look great for the first part
of the series, up to the patch "sched/numa: Add disjoint vma unconditional
scan logic", even though other reports indicate the performance may be
fixed up later in the series. For example:

autonumabench
                                   6.5.0-rc6              6.5.0-rc6
                         sched-pidclear-v1r5   sched-forcescan-v1r5
Min       syst-NUMA02        1.94 (   0.00%)        1.38 (  28.87%)
Min       elsp-NUMA02       12.67 (   0.00%)       21.02 ( -65.90%)
Amean     syst-NUMA02        2.35 (   0.00%)        1.86 (  21.13%)
Amean     elsp-NUMA02       12.93 (   0.00%)       21.69 * -67.76%*
Stddev    syst-NUMA02        0.54 (   0.00%)        0.90 ( -67.67%)
Stddev    elsp-NUMA02        0.18 (   0.00%)        0.44 (-144.19%)
CoeffVar  syst-NUMA02       22.82 (   0.00%)       48.50 (-112.58%)
CoeffVar  elsp-NUMA02        1.38 (   0.00%)        2.01 ( -45.56%)
Max       syst-NUMA02        3.15 (   0.00%)        3.89 ( -23.49%)
Max       elsp-NUMA02       13.16 (   0.00%)       22.36 ( -69.91%)
BAmean-50 syst-NUMA02        2.01 (   0.00%)        1.45 (  27.69%)
BAmean-50 elsp-NUMA02       12.77 (   0.00%)       21.34 ( -67.04%)
BAmean-95 syst-NUMA02        2.22 (   0.00%)        1.52 (  31.68%)
BAmean-95 elsp-NUMA02       12.89 (   0.00%)       21.58 ( -67.39%)
BAmean-99 syst-NUMA02        2.22 (   0.00%)        1.52 (  31.68%)
BAmean-99 elsp-NUMA02       12.89 (   0.00%)       21.58 ( -67.39%)

                            6.5.0-rc6             6.5.0-rc6
                  sched-pidclear-v1r5  sched-forcescan-v1r5
Duration User                 5702.00              10264.25
Duration System                 17.02                 13.59
Duration Elapsed                92.57                156.30
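The percentage columns appear to be relative to the baseline kernel, with positive values meaning the patched kernel used less time. A minimal C sketch of that calculation, assuming the formula (baseline - patched) / baseline * 100, which is inferred from the numbers rather than taken from the reporting tool's source. It reproduces, for instance, 28.87% for Min syst-NUMA02 and -65.90% for Min elsp-NUMA02; variability rows such as Stddev do not reproduce exactly from the rounded figures shown, presumably because the tool works from unrounded data.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Percentage change of a "lower is better" metric relative to a baseline.
 * Positive: the patched kernel took less time. Negative: it took more.
 * Assumption: formula inferred from the table, not from the tool's code.
 */
static double pct_gain_lower_is_better(double baseline, double patched)
{
	assert(baseline != 0.0);	/* a zero baseline has no meaningful ratio */
	return (baseline - patched) / baseline * 100.0;
}

/* Print one comparison row in roughly the same layout as the table. */
static void print_row(const char *label, double baseline, double patched)
{
	printf("%-22s %8.2f %8.2f (%7.2f%%)\n", label, baseline, patched,
	       pct_gain_lower_is_better(baseline, patched));
}
```

Applied to Duration Elapsed (92.57 vs 156.30), the same formula gives about -69%, in line with the NUMA02 elapsed-time rows.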

Similar results were seen across multiple machines. It's not universally bad,
but the NUMA02 tests appear to suffer quite badly. While not realistic,
they are somewhat relevant because numa02 is likely an "adverse workload"
for the logic that skips VMAs based on PID accesses.
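To make "skips VMAs based on PID accesses" concrete, here is a simplified userspace model, loosely patterned on the idea of hashing the faulting task's PID into one bit of a per-VMA bitmap kept for a current and a previous window. The struct and function names are hypothetical, and the reset period, locking and any unconditional early scans are omitted. The hash imitates the golden-ratio multiply of the kernel's hash_32(), but should be read as an assumption rather than a copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-VMA access history: two 64-bit windows of hashed PIDs. */
struct vma_model {
	uint64_t access_pids[2];	/* [0] = previous window, [1] = current */
};

/* Map a PID onto one of 64 bits; distinct PIDs may collide. */
static unsigned int pid_bit(int pid)
{
	uint32_t h = (uint32_t)pid * 0x61C88647u;	/* golden-ratio multiply */
	return h >> (32 - 6);				/* top 6 bits: 0..63 */
}

/* Record a NUMA hinting fault taken by task 'pid' on this VMA. */
static void record_fault(struct vma_model *vma, int pid)
{
	vma->access_pids[1] |= 1ull << pid_bit(pid);
}

/* Start a new window: the current window becomes the previous one. */
static void rotate_window(struct vma_model *vma)
{
	vma->access_pids[0] = vma->access_pids[1];
	vma->access_pids[1] = 0;
}

/* Would scanning task 'pid' consider this VMA (true) or skip it (false)? */
static bool vma_recently_accessed_by(const struct vma_model *vma, int pid)
{
	uint64_t pids = vma->access_pids[0] | vma->access_pids[1];

	return (pids & (1ull << pid_bit(pid))) != 0;
}
```

In this model a task whose faults were never recorded keeps skipping the VMA, and two PIDs that hash to the same bit are indistinguishable, so a workload whose threads' access patterns do not line up with the recorded bits can be scanned much less than it needs.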

For the rest of the series, the changelogs lack detail on why those
changes helped. Patch 4's changelog is thin, and patch 6's statement that
"VMAs being accessed by more than two tasks are critical" is not helpful
either -- e.g. why are they critical? They are obviously shared VMAs and
therefore it may be the case that they need to be identified and interleaved
quickly but maybe not. Is the shared VMA that is critical a large malloc'd
area split into per-thread sections or something that is MAP_SHARED? The
changelog doesn't say, so I have to guess. There are also a bunch of
magic variables with limited explanation (e.g. why NR_ACCESS_PID_HIST=4
and SHARED_VMA_THRESH=3?), the numab fields are not documented, and the
changelogs lack supporting data. I suspect that patches 3-6 may be dealing
with regressions introduced by patch 2, particularly for NUMA02, but I'm
not certain as I didn't dedicate the necessary test time to prove that
and it's the type of information that should be in the changelog. While
there is nothing wrong with that as such, it's very hard to see how
patches 3-6 work in every case and to be certain that the various parameters
make sense. That could cause maintenance difficulties later.
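The questions about NR_ACCESS_PID_HIST and SHARED_VMA_THRESH are easier to discuss against a concrete model. The following is a hypothetical reading, not the code from the series: the access history is extended to NR_ACCESS_PID_HIST rotating windows, and a VMA counts as shared when the union of those windows has at least SHARED_VMA_THRESH bits set (i.e. more than two distinct hashed PIDs). Callers pass an already-hashed bit index, and the sketch relies on the GCC/Clang builtin __builtin_popcountll.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_ACCESS_PID_HIST 4	/* value quoted in the review */
#define SHARED_VMA_THRESH  3	/* value quoted in the review */

/* Hypothetical history: pids[0] is the oldest window, the last is current. */
struct vma_hist {
	uint64_t pids[NR_ACCESS_PID_HIST];
};

/* Record an access by a task whose PID hashed to 'bit'. */
static void hist_record(struct vma_hist *h, unsigned int bit)
{
	assert(bit < 64);
	h->pids[NR_ACCESS_PID_HIST - 1] |= 1ull << bit;
}

/* Drop the oldest window and open an empty current one. */
static void hist_rotate(struct vma_hist *h)
{
	for (int i = 0; i < NR_ACCESS_PID_HIST - 1; i++)
		h->pids[i] = h->pids[i + 1];
	h->pids[NR_ACCESS_PID_HIST - 1] = 0;
}

/* Union of every window still in the history. */
static uint64_t hist_union(const struct vma_hist *h)
{
	uint64_t all = 0;

	for (int i = 0; i < NR_ACCESS_PID_HIST; i++)
		all |= h->pids[i];
	return all;
}

/*
 * Hypothetical shared-VMA test: count distinct hashed PIDs in the history.
 * Colliding PIDs share one bit, so this can under-count tasks.
 */
static bool hist_vma_is_shared(const struct vma_hist *h)
{
	return __builtin_popcountll(hist_union(h)) >= SHARED_VMA_THRESH;
}
```

Even in this toy the parameters interact: a longer history keeps a VMA classified as shared for more rotations after its last access, and hash collisions push the count down. Those are the kinds of trade-offs a changelog would need to justify.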

My initial thinking was "There should be a standalone series that deals
*only* with scanning VMAs that had no fault activity and were skipped due to
PID hashing". These are important because a VMA may see no fault activity
because it sees no scan activity, which in turn is due to the lack of fault
activity. My series is incomplete and has no changelogs, but I pushed it anyway to

https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git/ sched-numabselective-v1r5
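The circular dependency described above shows up in a toy model in which hinting faults are only reported for VMAs the scanner has just marked, and the scanner only visits VMAs with recently recorded faults. This is a deliberately simplified sketch with hypothetical names; it ignores unconditional early scans, reset periods and partial-VMA scanning.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy VMA for illustrating the scan/fault feedback loop (hypothetical). */
struct toy_vma {
	bool recently_faulted;	/* stands in for the PID access history */
	bool armed;		/* marked by the scanner; next access faults */
	unsigned int scans;	/* how many passes actually scanned it */
};

/* One scan pass under a pure "skip if no recent faults" rule. */
static void scan_pass_skip_only(struct toy_vma *vmas, int n)
{
	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		if (!vmas[i].recently_faulted)
			continue;	/* skipped: nothing recorded */
		vmas[i].armed = true;
		vmas[i].scans++;
	}
}

/* The workload touches every VMA, but only armed VMAs report a fault. */
static void run_workload(struct toy_vma *vmas, int n)
{
	for (int i = 0; i < n; i++) {
		vmas[i].recently_faulted = vmas[i].armed;
		vmas[i].armed = false;
	}
}
```

Starting with one VMA that has recorded activity and one that does not, and alternating scan passes with the workload, the first VMA is scanned every round while the second is never scanned, because nothing ever records a fault on it.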

The first two patches simply improve the documentation on what is going
on; patch 3 adds a tracepoint for figuring out why VMAs were skipped or
not skipped. Patch 4 handles a corner case to complete the scan of a VMA
once it has started regardless of what task is doing the scanning. The
last patch scans VMAs that have seen no fault activity once active VMAs
have been scanned.
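The ordering in the last patch can be sketched on the same kind of toy model: scan VMAs with recent fault activity first and then, once those are done, the VMAs with none. Names are hypothetical, and this models only that ordering; it does not attempt the tracepoint or patch 4's "finish a VMA scan once started" change.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy VMA (hypothetical), same idea as a scan/fault feedback model. */
struct toy_vma2 {
	bool recently_faulted;	/* stands in for the PID access history */
	bool armed;		/* scanned this pass; next access faults */
	unsigned int scans;
};

/* Scan one VMA and append its index to 'order'. */
static void scan_one(struct toy_vma2 *v, int idx, int *order, int *nr)
{
	v->armed = true;
	v->scans++;
	order[(*nr)++] = idx;
}

/*
 * Pass 1 scans VMAs with recent fault activity; pass 2 then scans VMAs
 * with none, so they get a chance to fault. 'order' must hold n entries.
 * Returns the number of VMAs scanned.
 */
static int scan_pass_two_phase(struct toy_vma2 *vmas, int n, int *order)
{
	int nr = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		if (vmas[i].recently_faulted)
			scan_one(&vmas[i], i, order, &nr);

	for (int i = 0; i < n; i++)
		if (!vmas[i].recently_faulted)
			scan_one(&vmas[i], i, order, &nr);

	return nr;
}

/* The workload touches every VMA; only armed VMAs report a fault. */
static void run_workload2(struct toy_vma2 *vmas, int n)
{
	for (int i = 0; i < n; i++) {
		vmas[i].recently_faulted = vmas[i].armed;
		vmas[i].armed = false;
	}
}
```

A VMA with no recorded activity is now scanned after the active ones, takes a fault, and joins the first pass on the next round. The cost is visible too: every VMA is scanned on every pass.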

It has its weaknesses: it may be overly simplistic, and it forces
all VMAs to be scanned on every sequence, which is wasteful. It also hurts
NUMA02 performance, although not as badly as "sched/numa: Add disjoint
vma unconditional scan logic". On the plus side, it is easier to reason
about, it solves only one problem in the series and any patch on top or
modification should justify each change individually.

-- 
Mel Gorman
SUSE Labs
