Date: Thu, 28 Dec 2023 07:34:44 -0800
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Chris Li <chrisl@...nel.org>
Cc: David Rientjes <rientjes@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	linux-kernel@...r.kernel.org, linux-mm@...ck.org, Wei Xu <weixugc@...gle.com>, 
	Yu Zhao <yuzhao@...gle.com>, Greg Thelen <gthelen@...gle.com>, 
	Chun-Tse Shao <ctshao@...gle.com>, Suren Baghdasaryan <surenb@...gle.com>, Brian Geffon <bgeffon@...gle.com>, 
	Minchan Kim <minchan@...nel.org>, Michal Hocko <mhocko@...e.com>, 
	Mel Gorman <mgorman@...hsingularity.net>, Huang Ying <ying.huang@...el.com>, 
	Nhat Pham <nphamcs@...il.com>, Johannes Weiner <hannes@...xchg.org>, Kairui Song <kasong@...cent.com>, 
	Zhongkun He <hezhongkun.hzk@...edance.com>, Kemeng Shi <shikemeng@...weicloud.com>, 
	Barry Song <v-songbaohua@...o.com>, Hugh Dickins <hughd@...gle.com>
Subject: Re: [PATCH] mm: swap: async free swap slot cache entries

On Sun, Dec 24, 2023 at 2:07 PM Chris Li <chrisl@...nel.org> wrote:
>
> On Sun, Dec 24, 2023 at 1:13 PM David Rientjes <rientjes@...gle.com> wrote:
> >
> > On Sun, 24 Dec 2023, Chris Li wrote:
> >
> > > On Sat, Dec 23, 2023 at 7:01 PM David Rientjes <rientjes@...gle.com> wrote:
> > > >
> > > > On Sat, 23 Dec 2023, Chris Li wrote:
> > > >
> > > > > > How do you quantify the impact of the delayed swap_entry_free()?
> > > > > >
> > > > > > Since the free and memcg uncharge are now delayed, is there not the
> > > > > > possibility that we stay under memory pressure for longer?  (Assuming at
> > > > > > least some users are swapping because of memory pressure.)
> > > > > >
> > > > > > I would assume that since the free and uncharge itself is delayed that in
> > > > > > the pathological case we'd actually be swapping *more* until the async
> > > > > > worker can run.
> > > > >
> > > > > Thanks for raising this interesting question.
> > > > >
> > > > > First of all, the swap_entry_free() does not impact "memory.current".
> > > > > It reduces "memory.swap.current". Technically it is the swap pressure
> > > > > not memory pressure that suffers the extra delay.
> > > > >
> > > > > Secondly, we are talking about delaying up to 64 swap entries for a
> > > > > few microseconds.
> > > >
> > > > What guarantees that the async freeing happens within a few microseconds?
> > >
> > > The Linux kernel typically doesn't provide RT scheduling guarantees.
> > > You can change microseconds to milliseconds; the reasoning that
> > > follows still holds.
> > >
> >
> > What guarantees that the async freeing happens even within 10s?  Your
> > responses are implying that there is some deadline by which this freeing
> > absolutely must happen (us or ms), but I don't know of any strong
> > guarantees.
>
> I think we are in agreement there: there are no such strong guarantees
> in Linux scheduling. However, when there are free CPU resources, the
> job will get scheduled to execute in a reasonable time frame. If the
> CPU has idle resources and pending jobs are unable to run for a long
> time, I consider that a bug.
> The existing code doesn't have such a guarantee either; see my point
> below. I don't know why you are asking for such a guarantee.
>
> > If there are no absolute guarantees about when the freeing may now occur,
> > I'm asking how the impact of the delayed swap_entry_free() can be
> > quantified.
>
> Presumably each application has its own SLO metrics for monitoring
> its behavior. I am happy to take a look if any app has new SLO
> violations caused by this change.
> If you have a metric in mind, please name it so we can look at it
> together. In my current experiment and the Chromebook benchmark, I
> haven't noticed such ill effects showing up as statistically
> significant drops in the other metrics. That is not the same as saying
> such drops don't exist at all; it's just that I haven't noticed them,
> and the SLO monitoring system hasn't caught them.
>
> > The benefit of the current implementation is that there *are* strong
> > guarantees about when the freeing will occur, and the backlog cannot
> > grow exponentially before the async worker can do the freeing.
>
> I don't understand your point. Please help me. In the current code, a
> previous swapin fault releases its swap slot into the swap slot cache.
> Let's say a swap slot remains in the cache for X seconds, until the
> Nth (N < 64) swapin page fault later fills the cache and all 64 cached
> swap slots are finally freed. Are you suggesting there is some kind of
> guarantee that X is less than some fixed bound in seconds? What is that
> bound then? 10 seconds? 1 minute?
>
> BTW, there will be no exponential growth; that is guaranteed. Until
> the 64 cached entries are freed, the swapin code will take the direct
> free path for the swap slot currently in hand. The direct free path
> existed before my change.

FWIW, it's 64 * the number of CPUs.
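
To make that bound concrete: with 4 KiB pages and, say, 256 CPUs, at most
64 * 256 = 16384 entries can sit in the caches, i.e. 64 MiB of swap space
whose free/uncharge is pending. For readers following along, here is a
conceptual, standalone sketch of the batching behavior under discussion
(hypothetical names; the real logic lives in mm/swap_slots.c, and the
async hand-off is the part the patch would add):

#include <stdio.h>

#define SLOT_CACHE_SIZE 64

typedef unsigned long swp_entry;

struct slot_cache {
	swp_entry slots[SLOT_CACHE_SIZE];
	int n;
	int batch_pending;	/* full batch handed off to the async worker */
};

/* Pre-existing direct path: free one entry immediately. */
static void free_entry_now(swp_entry e)
{
	printf("direct free of entry %lu\n", e);
}

/* Stands in for the deferred work item; in the kernel this would run
 * from a workqueue, not inline as it does here. */
static void async_batch_free(struct slot_cache *c)
{
	printf("async worker frees %d cached entries\n", c->n);
	c->n = 0;
	c->batch_pending = 0;
}

static void free_swap_slot(struct slot_cache *c, swp_entry e)
{
	if (c->batch_pending) {
		/* Cache is full and waiting on the worker, so this entry
		 * takes the direct path; pending entries never exceed
		 * SLOT_CACHE_SIZE per CPU. */
		free_entry_now(e);
		return;
	}
	c->slots[c->n++] = e;
	if (c->n == SLOT_CACHE_SIZE)
		c->batch_pending = 1;	/* schedule_work() in the patch */
}

int main(void)
{
	struct slot_cache cache = { .n = 0, .batch_pending = 0 };

	for (swp_entry e = 0; e < 70; e++)
		free_swap_slot(&cache, e);	/* entries 64..69 go direct */
	async_batch_free(&cache);		/* the worker eventually runs */
	return 0;
}

Running it shows entries 0-63 filling the cache and entries 64-69 taking
the direct path while the batch waits on the worker, which is why the
pending count cannot grow past the cache size per CPU.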
