Message-ID: <aQl0P90Q7X7fG5q-@fedora>
Date: Tue, 4 Nov 2025 11:34:23 +0800
From: Pingfan Liu <piliu@...hat.com>
To: Juri Lelli <juri.lelli@...hat.com>
Cc: Waiman Long <llong@...hat.com>, linux-kernel@...r.kernel.org,
	cgroups@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>,
	Pierre Gondois <pierre.gondois@....com>,
	Frederic Weisbecker <frederic@...nel.org>,
	Ingo Molnar <mingo@...hat.com>, Tejun Heo <tj@...nel.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Michal Koutný <mkoutny@...e.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>
Subject: Re: [PATCHv4 2/2] sched/deadline: Walk up cpuset hierarchy to decide
 root domain when hot-unplug

On Mon, Nov 03, 2025 at 02:50:15PM +0100, Juri Lelli wrote:
> On 29/10/25 11:31, Waiman Long wrote:
> > On 10/27/25 11:43 PM, Pingfan Liu wrote:
> 
> ...
> 
> > > @@ -2891,16 +2893,32 @@ void dl_add_task_root_domain(struct task_struct *p)
> > >   		return;
> > >   	}
> > > -	rq = __task_rq_lock(p, &rf);
> > > -
> > > +	/* prevent race among cpu hotplug, changing of partition_root_state */
> > > +	lockdep_assert_cpus_held();
> > > +	/*
> > > +	 * If @p is in blocked state, task_cpu() may be not active. In that
> > > +	 * case, rq->rd does not trace a correct root_domain. On the other hand,
> > > +	 * @p must belong to an root_domain at any given time, which must have
> > > +	 * active rq, whose rq->rd traces the valid root domain.
> > > +	 */
> > > +	cpuset_get_task_effective_cpus(p, &msk);
> > > +	cpu = cpumask_first_and(cpu_active_mask, &msk);
> > > +	/*
> > > +	 * If a root domain reserves bandwidth for a DL task, the DL bandwidth
> > > +	 * check prevents CPU hot removal from deactivating all CPUs in that
> > > +	 * domain.
> > > +	 */
> > > +	BUG_ON(cpu >= nr_cpu_ids);
> > > +	rq = cpu_rq(cpu);
> > > +	/*
> > > +	 * This point is under the protection of cpu_hotplug_lock. Hence
> > > +	 * rq->rd is stable.
> > > +	 */
> > 
> > So you trying to find a active sched domain with some dl bw to use for
> > checking. I don't know enough about this dl bw checking code to know if it
> > is valid or not. I will let Juri comment on that.
> 
> So, just to refresh my understanding of this issue, the task was
> sleeping/blocked while the cpu it was running on before blocking has
> been turned off. dl_add_task_root_domain() wrongly adds its bw
> contribution to def_root_domain as it's where offline cpus are attached
> to while off. We instead want to attach the sleeping task contribution
> to the root domain that once comprised also the cpu it was running on
> before blocking. Correct?
> 

Yes, that's correct.

> If that is the case, and assuming nobody touched the sleeping task
> affinity (p->cpus_ptr), can't we just use another online cpu from

In fact, IIUC, in cpuset v2 the change will always be propagated through
the cpuset hierarchy into cpus_ptr by cpuset_update_tasks_cpumask().
(Ridong, please correct me if my understanding is wrong.)

But for cpuset v1, the update is asynchronous, so cpus_ptr is not reliable
at this point [1].

> current task affinity to get to the right root domain? Somewhat similar
> to what dl_task_offline_migration() is doing in the (!later_rq) case,
> I'm thinking.
> 

Sorry, I don't quite understand what you mean. Do you mean something
like cpumask_any_and(cpu_active_mask, p->cpus_ptr) in
dl_task_offline_migration()?

If so, that will run into the async challenge discussed in [1], where
p->cpus_ptr becomes stale and contains no active CPUs, even though the
root domain itself still has active CPUs.


So my plan is to follow Waiman's suggestion. Any further comments or
suggestions?

[1]: https://lore.kernel.org/all/aQge00u94JKGF9Tb@fedora/


Best Regards,

Pingfan

