Message-ID: <20250314094249.GC1633113@bytedance>
Date: Fri, 14 Mar 2025 17:42:49 +0800
From: Aaron Lu <ziqianlu@...edance.com>
To: Chengming Zhou <chengming.zhou@...ux.dev>
Cc: Valentin Schneider <vschneid@...hat.com>,
	Ben Segall <bsegall@...gle.com>,
	K Prateek Nayak <kprateek.nayak@....com>,
	Peter Zijlstra <peterz@...radead.org>,
	Josh Don <joshdon@...gle.com>, Ingo Molnar <mingo@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	linux-kernel@...r.kernel.org, Juri Lelli <juri.lelli@...hat.com>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>,
	Chuyi Zhou <zhouchuyi@...edance.com>
Subject: Re: [RFC PATCH 2/7] sched/fair: Handle throttle path for task based
 throttle
On Fri, Mar 14, 2025 at 04:39:41PM +0800, Chengming Zhou wrote:
> On 2025/3/13 15:21, Aaron Lu wrote:
> > From: Valentin Schneider <vschneid@...hat.com>
> > 
> > Once a cfs_rq gets throttled, for all tasks belonging to this cfs_rq,
> > add a task work to them so that when those tasks return to user, the
> > actual throttle/dequeue can happen.
> > 
> > Note that since the throttle/dequeue always happens on a task basis when
> > it returns to user, it's no longer necessary for check_cfs_rq_runtime()
> > to return a value and pick_task_fair() acts differently according to that
> > return value, so check_cfs_rq_runtime() is changed to not return a
> > value.
> 
> Previously, with per-cfs_rq throttling, we used the update_curr() -> put() path
> to throttle the cfs_rq and dequeue it from the cfs_rq tree.
> 
> Now with your per-task throttling, maybe things can become simpler: we can
> just throttle_cfs_rq() (the cfs_rq subtree) during curr accounting to mark
> these throttled.

Do I understand correctly that throttle_cfs_rq() would then only mark
this hierarchy as throttled, without adding any throttle work to the
tasks in it, and that adding the throttle work is deferred to pick
time?

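
To make sure we are talking about the same thing, here is a toy
userspace sketch of that reading. It is not kernel code: every toy_*
name is made up for illustration, a plain flag stands in for the queued
task work, and the ancestor walk is just the simplest way to express
"somewhere in a throttled hierarchy".

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, not the kernel's struct cfs_rq / struct task_struct. */
struct toy_cfs_rq {
	struct toy_cfs_rq *parent;	/* NULL at the root */
	bool throttled;			/* set when this rq runs out of quota */
};

struct toy_task {
	struct toy_cfs_rq *rq;
	bool throttle_work_queued;	/* stands in for the queued task work */
};

/* Throttle time: only mark the cfs_rq, touch no tasks. */
static void toy_throttle(struct toy_cfs_rq *rq)
{
	rq->throttled = true;
}

/* True if rq or any of its ancestors is marked throttled. */
static bool toy_in_throttled_hierarchy(const struct toy_cfs_rq *rq)
{
	for (; rq; rq = rq->parent)
		if (rq->throttled)
			return true;
	return false;
}

/* Pick time: queue the throttle work lazily, at most once per task. */
static void toy_pick(struct toy_task *p)
{
	if (!p->throttle_work_queued && toy_in_throttled_hierarchy(p->rq))
		p->throttle_work_queued = true;
}
```

In this model a task that is never picked after the throttle never gets
any work queued, which is the property that would sidestep the
delayed_dequeue case.
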
> Then if we pick a task from a throttled cfs_rq subtree, we can set up task work
> for it, so we don't have to bother with the delayed_dequeue task case that
> Prateek mentioned.

If we add a check at pick time, maybe we can also drop the check at
enqueue time. One thing I'm wondering about: a task may be picked
multiple times after a single enqueue, so if we do the check at pick
time, could the overhead end up larger?

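
A trivial way to picture the concern, as a toy count only: the struct
and function below are hypothetical, and the sketch simply tallies how
often each placement of the check would run for one task that is
enqueued once and then picked several times (for example because it
keeps getting preempted without sleeping).

```c
/* Hypothetical counters; nothing here exists in the kernel. */
struct toy_stats {
	unsigned long enqueue_checks;
	unsigned long pick_checks;
};

/* One enqueue followed by `picks` picks of the same task. */
static struct toy_stats toy_simulate(unsigned long picks)
{
	struct toy_stats s = { 0, 0 };
	unsigned long i;

	s.enqueue_checks++;		/* check placed in the enqueue path */
	for (i = 0; i < picks; i++)
		s.pick_checks++;	/* check placed in the pick path */
	return s;
}
```

The count says nothing about the cost of each check, so whether the
pick-time variant is actually more expensive would still need to be
measured.
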
> WDYT?

Thanks for your suggestion. I'll try this approach and see how it turns
out.