Message-ID: <CACSyD1OWe-PkUjmcTtbYCbLi3TrxNQd==-zjo4S9X5Ry3Gwbzg@mail.gmail.com>
Date: Sat, 24 May 2025 09:10:21 +0800
From: Zhongkun He <hezhongkun.hzk@...edance.com>
To: Tejun Heo <tj@...nel.org>
Cc: Waiman Long <llong@...hat.com>, cgroups@...r.kernel.org, linux-kernel@...r.kernel.org, 
	muchun.song@...ux.dev
Subject: Re: [External] Re: [PATCH] cpuset: introduce non-blocking cpuset.mems
 setting option

On Sat, May 24, 2025 at 12:51 AM Tejun Heo <tj@...nel.org> wrote:
>
> Hello,
>
> On Fri, May 23, 2025 at 11:35:57PM +0800, Zhongkun He wrote:
> > > Is this something you want on the whole machine? If so, would global cgroup
> > > mount option work?
> >
> > It doesn't apply to the whole machine. It is only relevant to the pod with
> > huge pages, where the service will be unavailable for over ten seconds if
> > we modify cpuset.mems. Therefore, it would be ideal if there were an
> > option to disable the migration for this special case.
>
> I suppose we can add back an interface similar to cgroup1 but can you detail
> the use case a bit? If you relocate threads without relocating memory, you'd
> be paying on-going cost for memory access. It'd be great if you can
> elaborate why such mode of operation is desirable.
>
> Thanks.

Thanks, that sounds great.

This is a story about optimizing CPU and memory bandwidth utilization.
In our production environment, the application exhibits distinct peak
and off-peak cycles, and cpuset.mems is modified several times a day.

During off-peak periods, tasks are evenly distributed across all NUMA nodes.
When peak periods arrive, we collectively migrate tasks to a designated node,
freeing up another node to accommodate new resource-intensive tasks.

We move the tasks by modifying cpuset.cpus and cpuset.mems; in cgroup v1,
memory migration is optional via the cpuset.memory_migrate interface.
After relocating the threads, we migrate the memory slowly from userspace
with the move_pages() syscall, over a few minutes.
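
To make the userspace step concrete, below is a minimal sketch of
throttled migration with move_pages(2). It is not our actual tooling:
it migrates pages of the calling process (pid 0) to a made-up target
node in small batches with a sleep in between, whereas the real tool
walks another pid's mappings; the target node, batch size and delay
are arbitrary assumptions.

/*
 * Sketch: migrate an anonymous buffer to TARGET_NODE in small batches,
 * sleeping between batches so the work is spread out over time instead
 * of happening in one synchronous burst.
 *
 * Build (assumes libnuma headers): gcc -o migrate_sketch migrate_sketch.c -lnuma
 */
#include <numaif.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define TARGET_NODE 0      /* hypothetical destination node */
#define BATCH       256    /* pages migrated per batch */

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t npages = 4096;                       /* 16 MiB with 4 KiB pages */
	char *buf = mmap(NULL, npages * page, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	void *pages[BATCH];
	int nodes[BATCH], status[BATCH];

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(buf, 1, npages * page);              /* fault the pages in */

	for (size_t done = 0; done < npages; done += BATCH) {
		size_t n = (npages - done < BATCH) ? npages - done : BATCH;

		for (size_t i = 0; i < n; i++) {
			pages[i] = buf + (done + i) * page;
			nodes[i] = TARGET_NODE;
		}
		/* pid 0 == current process; a real tool targets another pid. */
		if (move_pages(0, n, pages, nodes, status, MPOL_MF_MOVE) < 0) {
			perror("move_pages");
			return 1;
		}
		usleep(10000);  /* throttle: spread the migration over time */
	}
	printf("migrated %zu pages to node %d\n", npages, TARGET_NODE);
	return 0;
}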

In cgroup v2, writing cpuset.mems currently triggers synchronous memory
migration, leading to prolonged and unacceptable service downtime.

So we hope to add back an interface similar to cgroup v1's that makes
the migration optional.
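
For reference, the cgroup v1 flow we rely on looks roughly like the
sketch below. The cpuset path (/sys/fs/cgroup/cpuset/pod0), the CPU
range and the node number are made-up examples; the point is only that
with cpuset.memory_migrate left at 0, rewriting cpuset.cpus and
cpuset.mems relocates the tasks without touching their existing pages.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);
	ssize_t ret;

	if (fd < 0) {
		perror(path);
		return -1;
	}
	ret = write(fd, val, strlen(val));
	close(fd);
	return ret < 0 ? -1 : 0;
}

int main(void)
{
	const char *base = "/sys/fs/cgroup/cpuset/pod0";  /* hypothetical pod cpuset */
	char path[256];

	/* Keep the kernel from migrating pages when mems changes. */
	snprintf(path, sizeof(path), "%s/cpuset.memory_migrate", base);
	if (write_str(path, "0"))
		return 1;

	/* Pin the pod to the designated node's CPUs and memory. */
	snprintf(path, sizeof(path), "%s/cpuset.cpus", base);
	if (write_str(path, "0-31"))                      /* example CPU range */
		return 1;

	snprintf(path, sizeof(path), "%s/cpuset.mems", base);
	if (write_str(path, "0"))                         /* example node */
		return 1;

	return 0;
}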

Thanks.

>
> --
> tejun
