Message-ID: <CACSyD1MhCaAzycSUSQfirLaLp22mcabVr3jfaRbJqFRkX2VoFw@mail.gmail.com>
Date: Thu, 19 Jun 2025 11:49:58 +0800
From: Zhongkun He <hezhongkun.hzk@...edance.com>
To: Michal Koutný <mkoutny@...e.com>
Cc: Tejun Heo <tj@...nel.org>, Waiman Long <llong@...hat.com>, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, muchun.song@...ux.dev
Subject: Re: [External] Re: [PATCH] cpuset: introduce non-blocking cpuset.mems
setting option
On Wed, Jun 18, 2025 at 5:05 PM Michal Koutný <mkoutny@...e.com> wrote:
>
> On Wed, Jun 18, 2025 at 10:46:02AM +0800, Zhongkun He <hezhongkun.hzk@...edance.com> wrote:
> > It is unnecessary to adjust memory affinity periodically from userspace,
> > as it is a costly operation.
>
> It'd always be costly when there's lots of data to migrate.
>
> > Instead, we need to shrink cpuset.mems to explicitly specify the NUMA
> > node from which newly allocated pages should come, and then migrate the
> > existing pages from userspace gradually, or let NUMA balancing adjust them.
>
> IIUC, the issue is that there's no set_mempolicy(2) for 3rd party
> threads (it only operates on current) OR that the migration path should
> be optimized to avoid those latencies -- do you know what is the
> contention point?
Hi Michal,

In our scenario, when we shrink the allowed cpuset.mems, for example from
nodes 1-3 to just nodes 2-3, there may still be a large number of pages
residing on node 1. Currently, modifying cpuset.mems triggers synchronous
memory migration, which results in prolonged and unacceptable service
downtime under cgroup v2. This behavior has become a major blocker for us
in adopting cgroup v2.
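To be concrete, the whole operation on our side is a single write of the
new node list to cpuset.mems. A minimal sketch (the cgroup path is only
an example, not our real hierarchy):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        /* example path; the real cgroup name differs in our setup */
        const char *path = "/sys/fs/cgroup/mygroup/cpuset.mems";
        const char *mems = "2-3";       /* shrink from 1-3 to 2-3 */

        /* this write currently triggers the synchronous migration
         * of the pages still on node 1, described above */
        int fd = open(path, O_WRONLY);
        if (fd < 0 || write(fd, mems, strlen(mems)) < 0)
                perror("cpuset.mems");
        if (fd >= 0)
                close(fd);
        return 0;
}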
Tejun suggested adding an interface to control the migration rate, and I
plan to try that later. However, we believe that the cpuset.memory_migrate
interface in cgroup v1 is also sufficient for our use case and is easier
to work with. :)
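For reference, a rough sketch of what I mean by migrating slowly from
userspace: move the target task's pages off node 1 in small batches via
move_pages(2). The batch size, the pause, and the step of collecting the
candidate addresses (e.g. from /proc/<pid>/numa_maps) are illustrative
assumptions here, not our actual tooling.

/* build with -lnuma */
#define _GNU_SOURCE
#include <numaif.h>             /* move_pages(), MPOL_MF_MOVE */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define BATCH    512            /* pages per step, assumed tunable */
#define PAUSE_US 10000          /* pause per step, assumed tunable */

/*
 * Move 'count' pages of task 'pid' to 'target_node', BATCH pages at a
 * time with a short pause in between, so the copy cost is spread out
 * instead of being paid in one synchronous burst on a cpuset.mems write.
 * 'pages' holds the virtual addresses collected beforehand.
 */
static int migrate_in_batches(pid_t pid, void **pages, unsigned long count,
                              int target_node)
{
        int nodes[BATCH], status[BATCH];

        for (unsigned long off = 0; off < count; off += BATCH) {
                unsigned long n = count - off < BATCH ? count - off : BATCH;

                for (unsigned long i = 0; i < n; i++)
                        nodes[i] = target_node;

                if (move_pages(pid, n, pages + off, nodes, status,
                               MPOL_MF_MOVE) < 0) {
                        perror("move_pages");
                        return -1;
                }
                usleep(PAUSE_US);       /* crude rate limiting */
        }
        return 0;
}
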
Thanks,
Zhongkun
>
> Thanks,
> Michal