Message-ID: <CACSyD1OWe-PkUjmcTtbYCbLi3TrxNQd==-zjo4S9X5Ry3Gwbzg@mail.gmail.com>
Date: Sat, 24 May 2025 09:10:21 +0800
From: Zhongkun He <hezhongkun.hzk@...edance.com>
To: Tejun Heo <tj@...nel.org>
Cc: Waiman Long <llong@...hat.com>, cgroups@...r.kernel.org, linux-kernel@...r.kernel.org, 
	muchun.song@...ux.dev
Subject: Re: [External] Re: [PATCH] cpuset: introduce non-blocking cpuset.mems
 setting option

On Sat, May 24, 2025 at 12:51 AM Tejun Heo <tj@...nel.org> wrote:
>
> Hello,
>
> On Fri, May 23, 2025 at 11:35:57PM +0800, Zhongkun He wrote:
> > > Is this something you want on the whole machine? If so, would global cgroup
> > > mount option work?
> >
> > It doesn't apply to the whole machine. It is only relevant to the pod with
> > huge pages, where the service will be unavailable for over ten seconds if
> > we modify cpuset.mems. Therefore, it would be ideal if there were an
> > option to disable the migration for this special case.
>
> I suppose we can add back an interface similar to cgroup1 but can you detail
> the use case a bit? If you relocate threads without relocating memory, you'd
> be paying on-going cost for memory access. It'd be great if you can
> elaborate why such mode of operation is desirable.
>
> Thanks.

Thanks, that sounds great.

This is a story about optimizing CPU and memory bandwidth utilization.
In our production environment, the application exhibits distinct peak
and off-peak cycles, and cpuset.mems is modified several times a day.

During off-peak periods, tasks are evenly distributed across all NUMA nodes.
When peak periods arrive, we collectively migrate tasks to a designated node,
freeing up another node to accommodate new resource-intensive tasks.

We move the tasks by modifying cpuset.cpus and cpuset.mems; in cgroup v1,
memory migration is optional via the cpuset.memory_migrate interface.
After relocating the threads, we migrate the memory slowly from userspace
with the move_pages() syscall, over a few minutes.
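
To make the userspace step concrete, below is a minimal sketch of
throttled migration with move_pages(2). It is not our actual tooling:
it migrates pages of the calling process (pid 0) to a made-up target
node in small batches with a sleep in between, whereas the real tool
walks another pid's mappings; the target node, batch size and delay
are arbitrary assumptions.

/*
 * Sketch: migrate an anonymous buffer to TARGET_NODE in small batches,
 * sleeping between batches so the work is spread out over time instead
 * of happening in one synchronous burst.
 *
 * Build (assumes libnuma headers): gcc -o migrate_sketch migrate_sketch.c -lnuma
 */
#include <numaif.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define TARGET_NODE 0      /* hypothetical destination node */
#define BATCH       256    /* pages migrated per batch */

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t npages = 4096;                       /* 16 MiB with 4 KiB pages */
	char *buf = mmap(NULL, npages * page, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	void *pages[BATCH];
	int nodes[BATCH], status[BATCH];

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(buf, 1, npages * page);              /* fault the pages in */

	for (size_t done = 0; done < npages; done += BATCH) {
		size_t n = (npages - done < BATCH) ? npages - done : BATCH;

		for (size_t i = 0; i < n; i++) {
			pages[i] = buf + (done + i) * page;
			nodes[i] = TARGET_NODE;
		}
		/* pid 0 == current process; a real tool targets another pid. */
		if (move_pages(0, n, pages, nodes, status, MPOL_MF_MOVE) < 0) {
			perror("move_pages");
			return 1;
		}
		usleep(10000);  /* throttle: spread the migration over time */
	}
	printf("migrated %zu pages to node %d\n", npages, TARGET_NODE);
	return 0;
}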

In cgroup v2, writing cpuset.mems currently triggers synchronous memory
migration, leading to prolonged and unacceptable service downtime.

So we hope to add back an interface similar to cgroup v1's that makes
the migration optional.
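
For reference, the cgroup v1 flow we rely on looks roughly like the
sketch below. The cpuset path (/sys/fs/cgroup/cpuset/pod0), the CPU
range and the node number are made-up examples; the point is only that
with cpuset.memory_migrate left at 0, rewriting cpuset.cpus and
cpuset.mems relocates the tasks without touching their existing pages.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);
	ssize_t ret;

	if (fd < 0) {
		perror(path);
		return -1;
	}
	ret = write(fd, val, strlen(val));
	close(fd);
	return ret < 0 ? -1 : 0;
}

int main(void)
{
	const char *base = "/sys/fs/cgroup/cpuset/pod0";  /* hypothetical pod cpuset */
	char path[256];

	/* Keep the kernel from migrating pages when mems changes. */
	snprintf(path, sizeof(path), "%s/cpuset.memory_migrate", base);
	if (write_str(path, "0"))
		return 1;

	/* Pin the pod to the designated node's CPUs and memory. */
	snprintf(path, sizeof(path), "%s/cpuset.cpus", base);
	if (write_str(path, "0-31"))                      /* example CPU range */
		return 1;

	snprintf(path, sizeof(path), "%s/cpuset.mems", base);
	if (write_str(path, "0"))                         /* example node */
		return 1;

	return 0;
}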

Thanks.

>
> --
> tejun
