Message-ID: <CACSyD1My_UJxhDHNjvRmTyNKHcxjhQr0_SH=wXrOFd+dYa0h4A@mail.gmail.com>
Date: Wed, 18 Jun 2025 10:46:02 +0800
From: Zhongkun He <hezhongkun.hzk@...edance.com>
To: Michal Koutný <mkoutny@...e.com>
Cc: Tejun Heo <tj@...nel.org>, Waiman Long <llong@...hat.com>, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, muchun.song@...ux.dev
Subject: Re: [External] Re: [PATCH] cpuset: introduce non-blocking cpuset.mems
setting option
On Tue, Jun 17, 2025 at 8:40 PM Michal Koutný <mkoutny@...e.com> wrote:
>
> Hello.
>
> On Sat, May 24, 2025 at 09:10:21AM +0800, Zhongkun He <hezhongkun.hzk@...edance.com> wrote:
> > This is a story about optimizing CPU and memory bandwidth utilization.
> > In our production environment, the application exhibits distinct peak
> > and off-peak cycles and the cpuset.mems interface is modified
> > several times within a day.
> >
> > During off-peak periods, tasks are evenly distributed across all NUMA nodes.
> > When peak periods arrive, we collectively migrate tasks to a designated node,
> > freeing up another node to accommodate new resource-intensive tasks.
> >
> > We move the tasks by modifying cpuset.cpus and cpuset.mems; memory
> > migration is optional, via the cpuset.memory_migrate interface in
> > V1. After we relocate the threads, the memory is migrated slowly
> > from userspace with the move_pages syscall, over a few minutes.
>
> Why do you need cpuset.mems at all?
> IIUC, you could configure cpuset.mems to the union of possible nodes for
> the pod and then leave the affinity adjustments up to userspace.
It should not be necessary to adjust memory affinity periodically from
userspace, as that is a costly operation. Instead, we need to shrink
cpuset.mems to explicitly specify the NUMA node from which newly
allocated pages should come, and then migrate the existing pages once,
either slowly from userspace or by leaving it to NUMA balancing.
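
To make the shrink step concrete, here is a minimal sketch in Python. It is
only an illustration under stated assumptions: parse_nodelist and shrink_mems
are hypothetical helper names, the cgroup directory is passed in by the
caller, and nothing here is part of the patch. Whether the write blocks on
page migration depends on the kernel and on the option under discussion.

```python
import os


def parse_nodelist(text):
    """Parse a kernel-style list such as "0-1,3" into a sorted list of node IDs."""
    nodes = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A single ID like "3" has no upper bound, so reuse the lower one.
        nodes.update(range(int(lo), int(hi or lo) + 1))
    return sorted(nodes)


def shrink_mems(cgroup_dir, target_nodes):
    """Restrict cpuset.mems in cgroup_dir (hypothetical path) to target_nodes.

    Returns the nodes that were dropped, i.e. the nodes whose existing
    pages would still have to be moved later (for example with
    move_pages from userspace, or by NUMA balancing).
    """
    path = os.path.join(cgroup_dir, "cpuset.mems")
    with open(path) as f:
        current = parse_nodelist(f.read())
    with open(path, "w") as f:
        f.write(",".join(str(n) for n in target_nodes))
    return [n for n in current if n not in target_nodes]
```

The returned list is what a userspace migrator would then work through at
its own pace; new allocations are already confined to the target nodes.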
Thanks,
Zhongkun
>
> Thanks,
> Michal