Message-ID: <20230605180013.GD221380@cmpxchg.org>
Date:   Mon, 5 Jun 2023 14:00:13 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Charan Teja Kalla <quic_charante@...cinc.com>
Cc:     akpm@...ux-foundation.org, minchan@...nel.org,
        quic_pkondeti@...cinc.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        Suren Baghdasaryan <surenb@...gle.com>
Subject: Re: [PATCH] mm: madvise: fix uneven accounting of psi

Hi Charan,

On Thu, Jun 01, 2023 at 06:37:50PM +0530, Charan Teja Kalla wrote:
> Thanks Johannes for taking a look at it..
> 
> On 6/1/2023 3:49 AM, Johannes Weiner wrote:
> > On Wed, May 31, 2023 at 04:39:34PM +0530, Charan Teja Kalla wrote:
> >> This patch was tested on Android, on a Snapdragon SoC with 8GB RAM and
> >> 4GB of swap on zram, which has a 2GB backing device. The test case
> >> launched memory-hungry apps in sequence and proactively reclaimed each
> >> app that went to the background using madvise(MADV_PAGEOUT). We are
> >> seeing ~40% lower total values of psi mem some and full when this
> >> patch is combined with [1].
> > Does that mean those pages are thrashing, but because you clear their
> > workingset it isn't detected and reported via psi?
> > 
> 
> Seems I didn't mention the usecase clearly and let me correct it. Say we
> have the Android apps A, B, C, ... H and launching of these apps goes
> like below.
> 
> 1) Launch app A.
> 2) Launch app B.
> 3) Launch app C. At this point we consider the memory used by app A to
> be no longer in active use, so we proactively reclaim it by issuing
> MADV_PAGEOUT on its anon regions only. These pages go to swap mounted
> on zram and subsequently to the backing dev attached to the zram.
> 4) Launch app D. Proactively reclaim the anon regions of app B into
> swap and through to backing dev.
> 5) Now bring app A to the foreground. This reads app A's pages back from
> swap + backing dev (because of step 3) and also proactively reclaims
> the anon regions of app C.
> 6) Launch E --> proactive reclaim of app D to zram swap + backing dev.
> 7) Bring app B to the foreground --> read memory of app B from swap +
> backing dev and also reclaim the anon regions of app A.
> 8) Likewise, launches of apps F, C, G, D, H, E .....
> 
> If we look at steps 5, 7, ..., we are just bringing apps to the
> foreground, whose folios (if they were marked as workingset) can
> contribute to PSI events through swap_readpage(). But in reality, these
> are not real workingset folios (I think it is safe to say this), because
> it is the user who decided to reclaim these pages by issuing
> MADV_PAGEOUT, knowing that they are not going to be needed in the near
> future.
> 
> I think the other way to look at the problem is that the user can write
> a simple application that does MADV_PAGEOUT and reads the pages back in
> a loop. If at any point the folios the user is working on turn out to be
> workingset (he can just be probabilistic here), PSI events will be
> triggered though there may not be real memory pressure in the system.
>
> > I don't really get why silencing the thrashing is an improvement.
> > 
> Agreed that we shouldn't really silence the thrashing. My point is that
> we shouldn't consider the folios as thrashing if they were reclaimed by
> the user himself through MADV_PAGEOUT, under the assumption
> that __user knows they are not real working set__.  Please let me know
> if I am not making sense here.

I'm not sure I agree with this. I think it misses the point of what
the madvise is actually for.

The workingset is defined based on access frequency and available
memory. Thrashing is defined as having to read pages back shortly
after their eviction.

MADV_PAGEOUT is for the application to inform the kernel that it's
done accessing the pages, so that the kernel can accelerate their
eviction over other pages that may still be in use. This is ultimately
meant to REDUCE reclaim and paging.

However, in this case, the MADV_PAGEOUT evicts pages that are
reused afterwards and then refault. It INCREASED reclaim and paging.

Surely that's a problem? And the system would have behaved better
without the madvise() in the first place?
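
For readers who want to see the pattern under discussion in isolation,
here is a minimal userspace sketch of "page out, then reuse". It is an
illustration only, not code from the patch or from the Android test.
The function name pageout_then_reuse is hypothetical, and the fallback
value 21 for MADV_PAGEOUT is the Linux UAPI value, used here on the
assumption that Python's mmap module may not export the constant.
Whether any page really leaves memory depends on the kernel version
and on swap being configured.

```python
import mmap

# Assumption: Python's mmap module may not export MADV_PAGEOUT, so fall
# back to the Linux UAPI value (available since Linux 5.4).
MADV_PAGEOUT = getattr(mmap, "MADV_PAGEOUT", 21)


def pageout_then_reuse(num_pages=64):
    """Touch an anonymous mapping, ask for it to be paged out, read it back.

    Returns (advised, total): whether madvise() was accepted, and the sum
    of the first byte of every page after the read-back.
    """
    length = num_pages * mmap.PAGESIZE
    # Private anonymous memory, the kind of region the thread talks about.
    region = mmap.mmap(-1, length,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        # Touch every page so each one is actually backed by memory.
        for page in range(num_pages):
            region[page * mmap.PAGESIZE] = page % 256

        try:
            # "I am done with these pages, reclaim them now."
            region.madvise(MADV_PAGEOUT)
            advised = True
        except OSError:
            # Older kernel or restricted environment: advice not accepted.
            advised = False

        # Reusing the data right away: any page that was really evicted
        # has to be faulted back in from swap at this point.
        total = sum(region[page * mmap.PAGESIZE]
                    for page in range(num_pages))
        return advised, total
    finally:
        region.close()
```

The contents read back are the same either way; the only difference
the advice can make here is extra reclaim and swap-in work, since the
pages are needed again immediately.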

In fact, I would argue that the pressure spike is a great signal for
detecting overzealous madvising. If you're redefining the workingset
from access frequency to "whatever the user is saying", that will take
away an important mechanism to detect madvise bugs and unnecessary IO.
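
As a rough sketch of how that signal could be observed from userspace,
the snippet below parses the PSI memory file and compares the
cumulative stall time ("total", in microseconds) before and after a
workload. The helper names and the sample strings are hypothetical and
the numbers are made up for illustration; only the "some"/"full" line
format of /proc/pressure/memory is taken from the kernel's PSI
interface.

```python
def parse_psi(text):
    """Parse PSI file contents into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, metrics = fields[0], {}
        for item in fields[1:]:
            key, _, value = item.partition("=")
            # "total" is cumulative stall time in microseconds; the
            # avg* fields are percentages over 10s/60s/300s windows.
            metrics[key] = int(value) if key == "total" else float(value)
        result[kind] = metrics
    return result


def read_memory_psi(path="/proc/pressure/memory"):
    """Read and parse the live PSI memory file (needs a PSI-enabled kernel)."""
    with open(path) as f:
        return parse_psi(f.read())


def stall_delta_us(before, after, kind="some"):
    """Microseconds of memory stall accumulated between two snapshots."""
    return after[kind]["total"] - before[kind]["total"]


# Made-up snapshots, standing in for reads taken around a workload.
SAMPLE_BEFORE = (
    "some avg10=0.00 avg60=0.00 avg300=0.00 total=1000\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=400\n"
)
SAMPLE_AFTER = (
    "some avg10=1.25 avg60=0.30 avg300=0.06 total=251000\n"
    "full avg10=0.80 avg60=0.19 avg300=0.04 total=160400\n"
)

before = parse_psi(SAMPLE_BEFORE)
after = parse_psi(SAMPLE_AFTER)
print(stall_delta_us(before, after, "some"))  # 250000
print(stall_delta_us(before, after, "full"))  # 160000
```

On a real system, read_memory_psi() would be called before and after
the madvise-heavy phase; a large delta that disappears when the
madvise() calls are removed would point at overzealous advice.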
