Message-ID: <bc9f5209-5c59-c921-d85e-e2e54b2375db@redhat.com>
Date: Wed, 20 Apr 2022 13:34:58 -0400
From: Nico Pache <npache@...hat.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: linux-mm@...ck.org, akpm@...ux-foundation.org,
linux-kernel@...r.kernel.org, aquini@...hat.com,
shakeelb@...gle.com, llong@...hat.com, mhocko@...e.com,
hakavlad@...ox.lv
Subject: Re: [PATCH v3] vm_swappiness=0 should still try to avoid swapping
anon memory
On 4/20/22 10:01, Johannes Weiner wrote:
>> My swappiness=0 solution was a minimal approach to regaining the 'avoid swapping
>> ANON' behavior that was previously there, but as Shakeel pointed out, there may
>> be something larger at play.
>
> So with my patch and swappiness=0 you get excessive swapping on v1 but
> not on v2? And the patch to avoid DEACTIVATE_ANON fixes it?
Correct. I haven't retested the DEACTIVATE_ANON patch since I last worked on
this, but it did cure it. I can build a new kernel with it and verify
again.
The larger issue is that our workload has regressed in performance.
With V2 and swappiness=10 we are still seeing some swap, but very little tearing
down of THPs over time. With swappiness=0 it did swap some, but we are not
losing GBs of THPs (with your patch, swappiness=0 has no swap or THP issues on V2).
With V1 and swappiness=(0|10) (with and without your patch), it swaps a ton and
ultimately leads to a significant amount of THP splitting. So the longer the
system/workload runs, the less likely we are to get THPs backing the guest and
the performance gain from THPs is lost.
So your patch does help return the old swappiness=0 behavior, but only for V2.
Ideally we would like to keep swappiness>0. With my patch and swappiness=0 we
can work around this effect on V1, but any other value still results in the
THP issue.
After the workload is run with V2 and swappiness=0 the host system looks like this**:
              total        used        free      shared  buff/cache   available
Mem:      264071432   257536896      927424        4664     5607112     4993184
Swap:       4194300           0     4194300
Node 0 AnonPages:     128145476 kB    Node 1 AnonPages:     128111908 kB
Node 0 AnonHugePages: 128026624 kB    Node 1 AnonHugePages: 128090112 kB
** without your patch there is still some swap and THP splitting but nothing
like the case below.
Same workload on V1/swappiness=0 looks like this:
              total        used        free      shared  buff/cache   available
Mem:      264071432   257169500     1032612        4192     5869320     5357944
Swap:       4194300      623008     3571292
Node 0 AnonPages:     127927156 kB    Node 1 AnonPages:     127701088 kB
Node 0 AnonHugePages: 127789056 kB    Node 1 AnonHugePages: 87552000 kB
                                                            ^^^^^^^^
This leads to the performance regression I'm referring to in later workloads.
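To put a number on that drop, here is a minimal Python sketch that computes the
share of anonymous memory backed by THPs on each node. It assumes the input uses
the per-node "Node N Key: value kB" format shown above (for example, as read
from /sys/devices/system/node/node*/meminfo). The names thp_fraction and
V1_SAMPLE are illustrative only and were not part of the original testing.

```python
import re

# Matches per-node entries such as "Node 1 AnonHugePages: 87552000 kB".
LINE_RE = re.compile(r"Node\s+(\d+)\s+(\w+):\s+(\d+)\s+kB")


def thp_fraction(text):
    """Return {node: AnonHugePages / AnonPages} for each node found in text."""
    stats = {}
    for node, key, value in LINE_RE.findall(text):
        stats.setdefault(int(node), {})[key] = int(value)
    return {
        node: fields["AnonHugePages"] / fields["AnonPages"]
        for node, fields in stats.items()
        # Skip nodes where either counter is missing or AnonPages is zero.
        if fields.get("AnonPages") and "AnonHugePages" in fields
    }


# Figures copied from the V1/swappiness=0 run reported above.
V1_SAMPLE = """
Node 0 AnonPages: 127927156 kB Node 1 AnonPages: 127701088 kB
Node 0 AnonHugePages: 127789056 kB Node 1 AnonHugePages: 87552000 kB
"""

if __name__ == "__main__":
    for node, frac in sorted(thp_fraction(V1_SAMPLE).items()):
        print(f"node {node}: {frac:.1%} of anon memory is THP-backed")
```

On the V1 figures this gives roughly 99.9% for node 0 and 68.6% for node 1,
which is the gap the caret marks.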
V2 used to have a similar effect to V1, but not nearly as bad. Recent updates
upstream fixed this in V2.
The workload tests multiple FS types, so this is most likely not an FS-specific
issue either.
> If you haven't done so, it could be useful to litter shrink_node() and
> get_scan_count() with trace_printk() to try to make sense of all the
> decisions that result in it swapping.
Will do :) I was originally doing some BPF tracing that led me to find the
DEACTIVATE_ANON case.
Thanks,
-- Nico