Message-ID: <CAKEwX=PD+P_wugkAJ83ti6YRo4-6QNM7HDFs+KDURVwx2JrnZg@mail.gmail.com>
Date: Thu, 12 Jun 2025 13:08:08 -0700
From: Nhat Pham <nphamcs@...il.com>
To: Kairui Song <ryncsn@...il.com>
Cc: youngjun.park@....com, linux-mm@...ck.org, akpm@...ux-foundation.org,
hannes@...xchg.org, mhocko@...nel.org, roman.gushchin@...ux.dev,
shakeel.butt@...ux.dev, cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
shikemeng@...weicloud.com, bhe@...hat.com, baohua@...nel.org,
chrisl@...nel.org, muchun.song@...ux.dev, iamjoonsoo.kim@....com,
taejoon.song@....com, gunho.lee@....com
Subject: Re: [RFC PATCH 2/2] mm: swap: apply per cgroup swap priority
mechanism on swap layer
On Thu, Jun 12, 2025 at 11:20 AM Kairui Song <ryncsn@...il.com> wrote:
>
> On Fri, Jun 13, 2025 at 1:28 AM Nhat Pham <nphamcs@...il.com> wrote:
> >
> > On Thu, Jun 12, 2025 at 4:14 AM Kairui Song <ryncsn@...il.com> wrote:
> > >
> > > On Thu, Jun 12, 2025 at 6:43 PM <youngjun.park@....com> wrote:
> > > >
> > > > From: "youngjun.park" <youngjun.park@....com>
> > > >
> > >
> > > Hi, Youngjun,
> > >
> > > Thanks for sharing this series.
> > >
> > > > This patch implements swap device selection and swap on/off propagation
> > > > when a cgroup-specific swap priority is set.
> > > >
> > > > There is one workaround to this implementation as follows.
> > > > Current per-cpu swap cluster enforces swap device selection based solely
> > > > on CPU locality, overriding the swap cgroup's configured priorities.
> > >
> > > I've been thinking about this; we could switch to a per-cgroup-per-cpu
> > > next cluster selector. The problem with the current code is that swap
> >
> > What about per-cpu-per-order-per-swap-device :-? Number of swap
> > devices is gonna be smaller than number of cgroups, right?
>
> Hi Nhat,
>
> The problem is that per-cgroup makes more sense (it was suggested to me
> on the mailing list, at the very beginning of the allocator's
> implementation, to use cgroup-level locality, but it was hard to do at
> that time). In container environments, a cgroup is a container that
> runs one type of workload, so it has its own locality. Things like
> systemd also organize different desktop workloads into cgroups. The
> whole point is the cgroup.
Yeah, I know what a cgroup represents. Which is why I mentioned in the
next paragraph that we are still making decisions per cgroup - we just
organize the per-cpu cache by swap device. This way, two cgroups with a
similar/same priority list can share the clusters, for each swapfile, on
each CPU. There will be a lot less duplication and overhead. And two
cgroups with different priority lists won't interfere with each other,
since they'll target different swapfiles.
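
For concreteness, a rough sketch of that layout (plain C; every name and
size below is made up for illustration, this is not the actual swap
allocator code):

/*
 * Illustrative only: per-cpu, per-order next-cluster hints hang off
 * each swap device rather than off each cgroup. SWAP_NR_ORDERS and
 * NR_CPUS_SKETCH are assumptions for the sketch.
 */
#define SWAP_NR_ORDERS  10
#define NR_CPUS_SKETCH  64

struct percpu_cluster_hint {
        unsigned int next[SWAP_NR_ORDERS]; /* next cluster to scan, per order */
};

struct swap_device_sketch {
        int prio;                                  /* device priority */
        struct percpu_cluster_hint hint[NR_CPUS_SKETCH];
};

/*
 * The cgroup's priority list picks the device; the per-cpu hint only
 * decides where on that device to allocate:
 *
 *      dev = pick_device(cgroup_prio_list);
 *      cluster = dev->hint[cpu].next[order];
 *
 * Two cgroups whose lists pick the same device on a CPU share the same
 * hint, so there is no per-cgroup duplication.
 */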
Unless we want to nudge the swapfiles/clusters to self-partition among
the cgroups? :) IOW, each cluster contains pages mostly from a single
cgroup (with some stragglers mixed in). I suppose that would be very
useful for swap on rotational drives, where read contiguity is
imperative, but not sure about other backends :-?
Anyway, no strong opinions to be completely honest :) Was just
throwing out some ideas. Per-cgroup-per-cpu-per-order sounds good to
me too, if it's easy to do.