Message-ID: <20250523105222.GJ24938@noisy.programming.kicks-ass.net>
Date: Fri, 23 May 2025 12:52:22 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Aaron Lu <ziqianlu@...edance.com>
Cc: Valentin Schneider <vschneid@...hat.com>,
	Ben Segall <bsegall@...gle.com>,
	K Prateek Nayak <kprateek.nayak@....com>,
	Josh Don <joshdon@...gle.com>, Ingo Molnar <mingo@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Xi Wang <xii@...gle.com>, linux-kernel@...r.kernel.org,
	Juri Lelli <juri.lelli@...hat.com>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>,
	Chengming Zhou <chengming.zhou@...ux.dev>,
	Chuyi Zhou <zhouchuyi@...edance.com>,
	Jan Kiszka <jan.kiszka@...mens.com>,
	Florian Bezdeka <florian.bezdeka@...mens.com>,
	Paul McKenney <paulmck@...nel.org>
Subject: Re: [PATCH 2/7] sched/fair: prepare throttle path for task based
 throttle

On Fri, May 23, 2025 at 05:53:50PM +0800, Aaron Lu wrote:
> On Thu, May 22, 2025 at 08:40:02PM +0800, Aaron Lu wrote:
> > On Thu, May 22, 2025 at 01:54:18PM +0200, Peter Zijlstra wrote:
> > > On Thu, May 22, 2025 at 07:44:55PM +0800, Aaron Lu wrote:
> > > > On Thu, May 22, 2025 at 12:48:43PM +0200, Peter Zijlstra wrote:
> > > > > On Tue, May 20, 2025 at 06:41:05PM +0800, Aaron Lu wrote:
> > > > > 
> > > > > >  static void throttle_cfs_rq_work(struct callback_head *work)
> > > > > >  {
> > > > > > +	struct task_struct *p = container_of(work, struct task_struct, sched_throttle_work);
> > > > > > +	struct sched_entity *se;
> > > > > > +	struct cfs_rq *cfs_rq;
> > > > > > +	struct rq *rq;
> > > > > > +
> > > > > > +	WARN_ON_ONCE(p != current);
> > > > > > +	p->sched_throttle_work.next = &p->sched_throttle_work;
> > > > > > +
> > > > > > +	/*
> > > > > > +	 * If task is exiting, then there won't be a return to userspace, so we
> > > > > > +	 * don't have to bother with any of this.
> > > > > > +	 */
> > > > > > +	if ((p->flags & PF_EXITING))
> > > > > > +		return;
> > > > > > +
> > > > > > +	scoped_guard(task_rq_lock, p) {
> > > > > > +		se = &p->se;
> > > > > > +		cfs_rq = cfs_rq_of(se);
> > > > > > +
> > > > > > +		/* Raced, forget */
> > > > > > +		if (p->sched_class != &fair_sched_class)
> > > > > > +			return;
> > > > > > +
> > > > > > +		/*
> > > > > > +		 * If not in limbo, then either replenish has happened or this
> > > > > > +		 * task got migrated out of the throttled cfs_rq, move along.
> > > > > > +		 */
> > > > > > +		if (!cfs_rq->throttle_count)
> > > > > > +			return;
> > > > > > +		rq = scope.rq;
> > > > > > +		update_rq_clock(rq);
> > > > > > +		WARN_ON_ONCE(!list_empty(&p->throttle_node));
> > > > > > +		dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_SPECIAL);
> > > > > > +		list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
> > > > > > +		resched_curr(rq);
> > > > > > +	}
> > > > > > +
> > > > > > +	cond_resched_tasks_rcu_qs();
> > > > > >  }
> > > > > 
> > > > > What's that cond_resched thing about? The general plan is to make
> > > > > cond_resched go away.
> > > > 
> > > > Got it.
> > > > 
> > > > The purpose is to let throttled task schedule and also mark a task rcu
> > > > quiescent state. Without this cond_resched_tasks_rcu_qs(), this task
> > > > will be scheduled by cond_resched() in task_work_run() and since that is
> > > > a preempt schedule, it didn't mark a task rcu quiescent state.
> > > > 
> > > > Any suggestion here? Perhaps a plain schedule()? Thanks.
> > > 
> > > I am confused, this is task_work_run(), that is run from
> > > exit_to_user_mode_loop(), which contains a schedule().
> >
> 
> I should probably have added that the schedule() call contained in
> exit_to_user_mode_loop() is early in that loop, where the to-be-throttled
> task doesn't have need_resched bit set yet.

No, but if it does get set, it will get picked up at:

	ti_work = read_thread_flags();

and since TIF_NEED_RESCHED is part of EXIT_TO_USER_MODE_WORK, we'll get
another cycle, and do the schedule() thing.
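
As a rough illustration of that extra cycle, here is a toy user-space
model of the loop. It is only a sketch, not the kernel source: every
toy_* name, the flag values, and the assumption that the task work does
nothing except set the need_resched bit are made up for the example.

```c
/* Toy user-space model of the exit-to-user loop; NOT kernel code. */
#include <assert.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define TOY_TIF_NEED_RESCHED       (1UL << 0)
#define TOY_TIF_NOTIFY_RESUME      (1UL << 1)
#define TOY_EXIT_TO_USER_MODE_WORK (TOY_TIF_NEED_RESCHED | TOY_TIF_NOTIFY_RESUME)

struct toy_task {
	unsigned long flags;	/* stands in for the thread flags */
	int schedule_calls;	/* how often the loop called "schedule" */
	int work_runs;		/* how often the task work ran */
};

static unsigned long toy_read_thread_flags(struct toy_task *t)
{
	return t->flags;
}

/* Stand-in for schedule(): consume need_resched and count the call. */
static void toy_loop_schedule(struct toy_task *t)
{
	t->flags &= ~TOY_TIF_NEED_RESCHED;
	t->schedule_calls++;
}

/*
 * Stand-in for task_work_run() running the throttle work: the work
 * ends by setting need_resched, as resched_curr() would.
 */
static void toy_task_work_run(struct toy_task *t)
{
	t->flags &= ~TOY_TIF_NOTIFY_RESUME;
	t->flags |= TOY_TIF_NEED_RESCHED;
	t->work_runs++;
}

static void toy_exit_to_user_mode_loop(struct toy_task *t, unsigned long ti_work)
{
	while (ti_work & TOY_EXIT_TO_USER_MODE_WORK) {
		if (ti_work & TOY_TIF_NEED_RESCHED)
			toy_loop_schedule(t);

		if (ti_work & TOY_TIF_NOTIFY_RESUME)
			toy_task_work_run(t);

		/* Re-read the flags: work set during this pass is seen here. */
		ti_work = toy_read_thread_flags(t);
	}
	assert(!(toy_read_thread_flags(t) & TOY_EXIT_TO_USER_MODE_WORK));
}

/* Enter the loop with only the task work pending; count schedule calls. */
static int toy_schedule_calls_after_throttle_work(void)
{
	struct toy_task t = { .flags = TOY_TIF_NOTIFY_RESUME };

	toy_exit_to_user_mode_loop(&t, toy_read_thread_flags(&t));
	return t.schedule_calls;
}
```

In this model the first pass only runs the work, the re-read picks up the
freshly set need_resched bit, and the second pass does the schedule.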

> > There is a cond_resched() in task_work_run() loop:
> > 
> > 		do {
> > 			next = work->next;
> > 			work->func(work);
> > 			work = next;
> > 			cond_resched();
> > 		} while (work);

That cond_resched() is equally going away.

> > And when this throttle work returns with need_resched bit set,
> > cond_resched() will cause a schedule but that didn't mark a task
> > quiescent state...
> 
> Another approach I can think of is to add a test of task_is_throttled()
> in rcu_tasks_is_holdout(). I remembered when I tried this before, I can
> hit the following path:

So this really is about task_rcu needing something? Let me go look at
task-rcu.

So AFAICT, exit_to_user_mode_loop() will do schedule(), which will call
__schedule(SM_NONE), which then will have preempt = false and call:
rcu_note_context_switch(false) which in turn will do:
rcu_tasks_qs(current, false).

This should be sufficient, no?
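
The chain above can be sketched as a toy user-space model. This is not
kernel code: the toy_* names, the mode constants, and the single
holdout flag are assumptions made for the example, and the last step
models what I understand rcu_tasks_qs() to do (clear the holdout mark
only on a non-preempt context switch).

```c
/* Toy user-space model of the schedule -> tasks-RCU chain; NOT kernel code. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode values; only "preempt mode > none" matters here. */
#define TOY_SM_NONE    0
#define TOY_SM_PREEMPT 1

struct toy_rcu_task {
	bool rcu_tasks_holdout;	/* true while tasks-RCU still waits on this task */
};

/* Only a voluntary (non-preempt) switch counts as a quiescent state. */
static void toy_rcu_tasks_qs(struct toy_rcu_task *t, bool preempt)
{
	assert(t != 0);
	if (!preempt && t->rcu_tasks_holdout)
		t->rcu_tasks_holdout = false;
}

static void toy_rcu_note_context_switch(struct toy_rcu_task *t, bool preempt)
{
	toy_rcu_tasks_qs(t, preempt);
}

static void toy___schedule(struct toy_rcu_task *t, int sched_mode)
{
	bool preempt = sched_mode > TOY_SM_NONE;

	toy_rcu_note_context_switch(t, preempt);
}

/* Voluntary path, as taken from the exit-to-user loop. */
static void toy_voluntary_schedule(struct toy_rcu_task *t)
{
	toy___schedule(t, TOY_SM_NONE);
}

/* Preemption path, as taken from a cond_resched() style call. */
static void toy_preempt_schedule(struct toy_rcu_task *t)
{
	toy___schedule(t, TOY_SM_PREEMPT);
}

/* Report whether a holdout task is still a holdout after one switch. */
static bool toy_still_holdout_after(bool via_preempt)
{
	struct toy_rcu_task t = { .rcu_tasks_holdout = true };

	if (via_preempt)
		toy_preempt_schedule(&t);
	else
		toy_voluntary_schedule(&t);
	return t.rcu_tasks_holdout;
}
```

In the model, the voluntary path clears the holdout mark while the
preempt path leaves it set, which is the distinction the thread is about.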
