lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250523095350.GA1215853@bytedance>
Date: Fri, 23 May 2025 17:53:50 +0800
From: Aaron Lu <ziqianlu@...edance.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Valentin Schneider <vschneid@...hat.com>,
	Ben Segall <bsegall@...gle.com>,
	K Prateek Nayak <kprateek.nayak@....com>,
	Josh Don <joshdon@...gle.com>, Ingo Molnar <mingo@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Xi Wang <xii@...gle.com>, linux-kernel@...r.kernel.org,
	Juri Lelli <juri.lelli@...hat.com>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>,
	Chengming Zhou <chengming.zhou@...ux.dev>,
	Chuyi Zhou <zhouchuyi@...edance.com>,
	Jan Kiszka <jan.kiszka@...mens.com>,
	Florian Bezdeka <florian.bezdeka@...mens.com>
Subject: Re: [PATCH 2/7] sched/fair: prepare throttle path for task based
 throttle

On Thu, May 22, 2025 at 08:40:02PM +0800, Aaron Lu wrote:
> On Thu, May 22, 2025 at 01:54:18PM +0200, Peter Zijlstra wrote:
> > On Thu, May 22, 2025 at 07:44:55PM +0800, Aaron Lu wrote:
> > > On Thu, May 22, 2025 at 12:48:43PM +0200, Peter Zijlstra wrote:
> > > > On Tue, May 20, 2025 at 06:41:05PM +0800, Aaron Lu wrote:
> > > > 
> > > > >  static void throttle_cfs_rq_work(struct callback_head *work)
> > > > >  {
> > > > > +	struct task_struct *p = container_of(work, struct task_struct, sched_throttle_work);
> > > > > +	struct sched_entity *se;
> > > > > +	struct cfs_rq *cfs_rq;
> > > > > +	struct rq *rq;
> > > > > +
> > > > > +	WARN_ON_ONCE(p != current);
> > > > > +	p->sched_throttle_work.next = &p->sched_throttle_work;
> > > > > +
> > > > > +	/*
> > > > > +	 * If task is exiting, then there won't be a return to userspace, so we
> > > > > +	 * don't have to bother with any of this.
> > > > > +	 */
> > > > > +	if ((p->flags & PF_EXITING))
> > > > > +		return;
> > > > > +
> > > > > +	scoped_guard(task_rq_lock, p) {
> > > > > +		se = &p->se;
> > > > > +		cfs_rq = cfs_rq_of(se);
> > > > > +
> > > > > +		/* Raced, forget */
> > > > > +		if (p->sched_class != &fair_sched_class)
> > > > > +			return;
> > > > > +
> > > > > +		/*
> > > > > +		 * If not in limbo, then either replenish has happened or this
> > > > > +		 * task got migrated out of the throttled cfs_rq, move along.
> > > > > +		 */
> > > > > +		if (!cfs_rq->throttle_count)
> > > > > +			return;
> > > > > +		rq = scope.rq;
> > > > > +		update_rq_clock(rq);
> > > > > +		WARN_ON_ONCE(!list_empty(&p->throttle_node));
> > > > > +		dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_SPECIAL);
> > > > > +		list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
> > > > > +		resched_curr(rq);
> > > > > +	}
> > > > > +
> > > > > +	cond_resched_tasks_rcu_qs();
> > > > >  }
> > > > 
> > > > What's that cond_resched thing about? The general plan is to make
> > > > cond_resched go away.
> > > 
> > > Got it.
> > > 
> > > The purpose is to let throttled task schedule and also mark a task rcu
> > > quiescent state. Without this cond_resched_tasks_rcu_qs(), this task
> > > will be scheduled by cond_resched() in task_work_run() and since that is
> > > a preempt schedule, it didn't mark a task rcu quiescent state.
> > > 
> > > Any suggestion here? Perhaps a plain schedule()? Thanks.
> > 
> > I am confused, this is task_work_run(), that is ran from
> > exit_to_user_mode_loop(), which contains a schedule().
>

I should probably have added that the schedule() call contained in
exit_to_user_mode_loop() is early in that loop, where the to-be-throttled
task doesn't have need_resched bit set yet.

> There is a cond_resched() in task_work_run() loop:
> 
> 		do {
> 			next = work->next;
> 			work->func(work);
> 			work = next;
> 			cond_resched();
> 		} while (work);
> 
> And when this throttle work returns with need_resched bit set,
> cond_resched() will cause a schedule but that didn't mark a task
> quiescent state...

Another approach I can think of is to add a test of task_is_throttled()
in rcu_tasks_is_holdout(). I remembered when I tried this before, I can
hit the following path:

exit_to_user_mode_loop() -> get_signal() -> throttle_task_work() -> 
do_exit() -> exit_signals() -> percpu_rwsem_wait() -> schedule() -> 
try_to_block_task() -> dequeue_task_fair().

I would like to avoid this path, because it doesn't feel right for a
throttled task to go through another dequeue again(except for the cases
like task group change, affinity change etc. are special cases that have
to be dealed with though).

It looks to me, a schedule() call(or any other form) that makes sure
this throttled task gets scheduled in its task work is the safest thing
to do.

Thoughts?

Thanks,
Aaron

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ