Message-ID: <ZiwjecOGR3G4ZGbS@slm.duckdns.org>
Date: Fri, 26 Apr 2024 11:58:17 -1000
From: Tejun Heo <tj@...nel.org>
To: Joel Fernandes <joel@...lfernandes.org>
Cc: torvalds@...ux-foundation.org, mingo@...hat.com, peterz@...radead.org,
	juri.lelli@...hat.com, vincent.guittot@...aro.org,
	dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
	mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
	ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
	martin.lau@...nel.org, joshdon@...gle.com, brho@...gle.com,
	pjt@...gle.com, derkling@...gle.com, haoluo@...gle.com,
	dvernet@...a.com, dschatzberg@...a.com, dskarlat@...cmu.edu,
	riel@...riel.com, changwoo@...lia.com, himadrics@...ia.fr,
	memxor@...il.com, linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
	kernel-team@...a.com, Andrea Righi <andrea.righi@...onical.com>
Subject: Re: [PATCH 12/36] sched_ext: Implement BPF extensible scheduler class

Hello, Joel.

On Thu, Apr 25, 2024 at 05:28:40PM -0400, Joel Fernandes wrote:
> Got it. I took some time to look at it some more. Now I am wondering
> why check_preempt_curr() has to be separately implemented for a class
> and why the enqueue() handler of each class cannot take care of
> preempting curr via setting resched flags.
> 
> The only reason I can see is that, activate_task() is not always
> followed by a check_preempt_curr() and sometimes there is an
> unconditional resched_curr() happening following the call to
> activate_task().
>
> But such issues don't affect sched_ext in its current form I guess.

There's the ttwu_runnable() path, which just changes the target task's state
and then checks for preemption. That path doesn't involve enqueueing but can
still preempt. SCX might need to support this in the future too, but it
doesn't seem pressing.
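
To make the distinction concrete, here is a rough userspace sketch of the two wakeup shapes: one that enqueues and then checks for preemption, and one that only flips the task's state and checks for preemption. This is a toy model, not kernel code. Every name in it (toy_task, toy_rq, toy_check_preempt, toy_wake_via_enqueue, toy_wake_still_queued) is made up for illustration, and the "lower prio value wins" rule is an assumption standing in for whatever the class would really decide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and helpers are hypothetical, not kernel APIs. */
struct toy_task {
	int prio;       /* assumption: lower value means more important */
	bool queued;    /* still on the runqueue */
	bool runnable;  /* stands in for the task state being "running" */
};

struct toy_rq {
	struct toy_task *curr;
	bool need_resched;
	int nr_enqueues;
};

/* Stand-in for a per-class preemption check. */
static void toy_check_preempt(struct toy_rq *rq, struct toy_task *p)
{
	assert(rq->curr != NULL);
	if (p->prio < rq->curr->prio)
		rq->need_resched = true;
}

/* Wakeup of a task that had left the runqueue: enqueue, then check. */
static void toy_wake_via_enqueue(struct toy_rq *rq, struct toy_task *p)
{
	p->queued = true;
	rq->nr_enqueues++;
	p->runnable = true;
	toy_check_preempt(rq, p);
}

/*
 * Wakeup of a task that never left the runqueue: only the state changes,
 * no enqueue happens, yet the preemption check still runs.
 * Returns false if the task is not queued and needs the other path.
 */
static bool toy_wake_still_queued(struct toy_rq *rq, struct toy_task *p)
{
	if (!p->queued)
		return false;
	if (p != rq->curr)
		toy_check_preempt(rq, p);
	p->runnable = true;
	return true;
}
```

In this model, a scheduler that only sets the resched flag from its enqueue hook would never see the second kind of wakeup, since toy_wake_still_queued() leaves nr_enqueues untouched.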

> Btw, if sched_ext were to be implemented as a higher priority class
> above CFS [1], then check_preempt_curr() may preempt without even
> calling the class's check_preempt_curr() :
> 
> void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
> {
>         if (p->sched_class == rq->curr->sched_class)
>                 rq->curr->sched_class->check_preempt_curr(rq, p, flags);
>         else if (sched_class_above(p->sched_class, rq->curr->sched_class))
>                 resched_curr(rq);
> }
> 
> But if I understand, sched_ext is below CFS at the moment, so that
> should not be an issue.
>
> [1] By the way, now that I brought up the higher priority class thing,
> I might as well discuss it here :-D :
> 
> One of my use cases is about scheduling high priority latency sensitive threads:
> I think if sched_ext could have 2 classes, one lower than CFS and one
> above CFS, that would be beneficial to those who want a gradual
> transition to use scx, instead of switching all tasks to scx at once.
> 
> One reason is EAS (in CFS).  It may be beneficial for people to use
> the existing EAS for everything but latency critical tasks (much like
> how people use RT class for those). This is quite involved and
> reimplementing EAS in BPF may be quite a project. Not that it
> shouldn't be implemented that way, but EAS is about a decade old with
> all kinds of energy modeling, math and what not. Having scx higher
> than cfs alongside the lower one is less of an invasive approach than
> switching everything on the system to scx.

I see.

> Do you have any opinions on that? If it makes sense, I can work on
> such an implementation.
> 
> Another reason for this is, general purpose systems run very varied
> workloads, and big dramatic changes are likely to be reverted due to
> power and performance regressions.  Hence, the request for a higher
> scx, so that we (high priority task scx users) can take baby steps.

Yeah, as a use case, it makes sense to me. Would it suffice to be able to
choose whether it sits above or below the fair class, though?
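
As a rough illustration of what such a choice would change, here is a userspace sketch of the generic dispatch from the quoted check_preempt_curr(), with the relative order of the two classes controlled by a single switch. It is a toy model under stated assumptions. The ext_above_fair flag is hypothetical and nothing in this thread says such a knob exists, and toy_class, toy_class_rank and toy_check_preempt_curr are made-up names rather than kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: a small subset of classes, names are hypothetical. */
enum toy_class { TOY_IDLE, TOY_FAIR, TOY_EXT, TOY_RT };

enum toy_action {
	TOY_NO_PREEMPT,  /* waking task's class is below curr's class */
	TOY_ASK_CLASS,   /* same class: the class's own hook decides */
	TOY_RESCHED      /* waking task's class is above: resched right away */
};

/* Higher rank means higher priority class. ext_above_fair is a hypothetical knob. */
static int toy_class_rank(enum toy_class c, bool ext_above_fair)
{
	assert(c >= TOY_IDLE && c <= TOY_RT);
	switch (c) {
	case TOY_IDLE: return 0;
	case TOY_FAIR: return ext_above_fair ? 1 : 2;
	case TOY_EXT:  return ext_above_fair ? 2 : 1;
	case TOY_RT:   return 3;
	}
	return -1;
}

/* Mirrors the shape of the quoted dispatch: same class, else class above. */
static enum toy_action toy_check_preempt_curr(enum toy_class waking,
					      enum toy_class curr,
					      bool ext_above_fair)
{
	if (waking == curr)
		return TOY_ASK_CLASS;
	if (toy_class_rank(waking, ext_above_fair) >
	    toy_class_rank(curr, ext_above_fair))
		return TOY_RESCHED;
	return TOY_NO_PREEMPT;
}
```

With the switch set, an ext task waking while a fair task runs takes the unconditional resched branch without the ext class's hook being consulted. With it cleared, the same wakeup does not preempt at all.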

Thanks.

-- 
tejun
